I was interviewed for the women's newspaper Phụ Nữ. From left to right in the photo: DRD Director Chị Yến, my counterpart Triêu, me, and of course the MacBook in the foreground. We're seated in the corner of the DRD office. I asked the journalist to publish my email address, but sadly she did not.
More press coverage at Tuổi Trẻ and Daily.com.vn. I guess this means my AYAD project was a success.
We're famous: we've got the top hit on Google Việt Nam for "DRD", and we're on the front page of Google.com. The launch went fairly well, with demonstrations of the "zoom" layout and the JAWS screen reader, and good attendance by the press. There were some good questions, and I was exhausted by the end of it.
Partly by accident, partly by design, the new DRD website is now live. It has some remaining rough edges which I'll be ironing out this week and next. We'll have a launch press conference this Sunday, June 29. Any and all feedback is very welcome.
I've been reading a lot of accessibility articles. Most are (at best) unscientific. If you can cope with the outmoded HTML advice, the best is Joe Clark's Building Accessible Websites. I feel his treatment of colour blindness is ... excessive, in a good way.
Finally someone has cooked up a good explanation of how to stuff Flash into a webpage, and why <embed> lurches forever onward. It has convinced me that if one decides to be evil and use Flash, one must necessarily also use JavaScript.
My plan to actually put the XML part of XHTML to work, by validating the comments and general content coming into HOPE with HaXml, is mostly working, modulo some bugs here and there. Break it here: http://210.245.124.74/~drdviet/hope/ (soon to become the main DRD site, I hope).
I've been pleasantly surprised by some of the general-topic material I've found on the net recently; either Google's search results are tending away from waffle or I am becoming very adept at closing tabs and flushing the short-term memory buffer. Dive into Accessibility is straightforward and the tips are mostly directly applicable. Unfortunately it has dated a bit; I shudder to think that there are too many HTML 4 websites out there that people now want to make accessible...
I have a slight quibble with his advice on access keys (Day 15, p. 32 of the PDF), where he suggests that a stroke survivor may make use of them. I expect there are many people who struggle to make key chords, especially those who have the use of only one hand.
An accesskey is another of the W3C's ways of muddling concerns: when you type something into a web browser, what does it mean? Does the browser get the keystroke, or the page, or the form, and which form are we talking about anyway? It seems the prevailing wisdom is not to use accesskeys, as the implementations are broken; in a nutshell, this misfeature is not compositional, a property shared with all the very worst XHTML ideas.
Back in the real world we're stuck between doing something "many users will be familiar with" (yeah, I remember that) and leaving things alone. I've decided to follow the BBC's ambivalent lead here.
One of the fun bits about this project is the text munging that comes with it. The regexp libraries for Haskell have super-sophisticated do-what-I-think-you-mean interfaces and not enough (simple) use-cases in the docs. Couple that with my concerns about Unicode support and I'm stuck doing it the very old fashioned way.
OK, enough editorialising: I've written a mostly-{RFC 4180, Haskell 98}-compliant lazy CSV parser that appears to work OK on reasonably sized inputs. Existing solutions use Parsec, whose return type (Either a parse error or the result) seems to guarantee that more or less the entire output must reside in memory at some point, since nothing can be handed back until the parser knows the whole input succeeded. This might be OK for small files, but the 6 MB of Unicode data I need to import consumes a ridiculous amount of memory, even with GHC's optimiser going full-bore.
You can find it here. The licence is BSD. Couple it with the appropriate utf8-string for your GHC and it works well on UTF-8-encoded files.
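The released parser isn't reproduced here. What follows is a minimal sketch, under my own assumptions, of why a hand-written parser can be lazy where a Parsec one is not: each record is emitted as soon as its line terminator has been read, so a consumer can process it before the rest of the file is touched. Quoting follows RFC 4180 (`""` inside a quoted field is a literal quote); unlike the RFC, a bare LF or CR is also accepted as a terminator. The names `parseCSV`, `record`, `field` and `quoted` are hypothetical, not the API of the library linked above.

```haskell
-- Split the input into records lazily: each record is available as soon as
-- its terminating newline (or the end of input) has been seen.
parseCSV :: String -> [[String]]
parseCSV [] = []
parseCSV s  = let (rec, rest) = record s in rec : parseCSV rest

-- One record: fields separated by commas, ended by CRLF, LF, CR or end of input.
record :: String -> ([String], String)
record s =
  let (f, rest) = field s
  in case rest of
       (',' : rest')     -> let (fs, rest'') = record rest' in (f : fs, rest'')
       ('\r' : '\n' : r) -> ([f], r)
       ('\r' : r)        -> ([f], r)
       ('\n' : r)        -> ([f], r)
       _                 -> ([f], rest)  -- end of input, or junk after a quote

-- A field is either quoted or runs up to the next separator.
field :: String -> (String, String)
field ('"' : s) = quoted s
field s         = break (`elem` ",\r\n") s

-- Inside a quoted field, "" stands for a literal quote character.
quoted :: String -> (String, String)
quoted ('"' : '"' : s) = let (f, r) = quoted s in ('"' : f, r)
quoted ('"' : s)       = ("", s)
quoted (c : s)         = let (f, r) = quoted s in (c : f, r)
quoted []              = ("", [])  -- unterminated quote: keep what we have
```

For example, `take 2 (parseCSV (cycle "x\n"))` returns two records even though the input never ends, which no parser that must first decide between success and failure can do. Whether a given program then runs in small space still depends on how it consumes the records.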
Now, to track down a nasty memory leak somewhere in the database code... the profiler tells me SYSTEM is hanging onto some stuff, but not what SYSTEM actually is. Err, what did Fergus say again?
For the usual reasons it seemed best to use FCKeditor as an input widget for HOPE. I had hoped to provide some kind of hacker-friendly markup but time is short and convincing FCKeditor to generate it would probably require some heart surgery. So XHTML everywhere it is.
Clearly this path should lead to paranoia; we can't allow users to submit arbitrary strings, or even arbitrary XHTML. My heavyweight solution is to validate such submissions against a stripped-down XHTML DTD using HaXml. So far I've removed forms and scripts, and restricted the attributes of <a> to just href. I wish the DTD were readable; it is merely an algebraic data type after all.
Combined with some thorough string-escaping for the other inputs and a tendency to cop-out (crash) on anything that doesn't completely conform to expectations, I think we will be all right.
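HaXml's DTD machinery isn't reproduced here. As a rough, library-free illustration of the same policy, the sketch below uses a made-up tree type (`Node`, not HaXml's) and checks a whitelist of elements and attributes, along with the kind of escaping meant for the other inputs. All names, and the exact element list, are hypothetical.

```haskell
-- A toy document tree standing in for HaXml's own types (hypothetical).
data Node = Elem String [(String, String)] [Node]
          | Text String

-- Elements a submission may contain: a small, hypothetical subset of XHTML.
allowedElems :: [String]
allowedElems = ["p", "br", "em", "strong", "ul", "ol", "li", "a"]

-- Attributes permitted on each element; <a> gets href and nothing else.
allowedAttrs :: String -> [String]
allowedAttrs "a" = ["href"]
allowedAttrs _   = []

-- List every problem found; an empty list means the tree is acceptable.
-- A forbidden element is reported once, without descending into it.
violations :: Node -> [String]
violations (Text _) = []
violations (Elem name attrs kids)
  | name `notElem` allowedElems = ["element not allowed: " ++ name]
  | otherwise =
      [ "attribute not allowed on " ++ name ++ ": " ++ a
      | (a, _) <- attrs, a `notElem` allowedAttrs name ]
      ++ concatMap violations kids

-- Escape the characters that are significant in XHTML text and attribute
-- values; &#39; is used rather than &apos;, which HTML 4 lacks.
escapeXml :: String -> String
escapeXml = concatMap esc
  where
    esc '<'  = "&lt;"
    esc '>'  = "&gt;"
    esc '&'  = "&amp;"
    esc '"'  = "&quot;"
    esc '\'' = "&#39;"
    esc c    = [c]

-- Cop out on anything unexpected: crash rather than store a suspicious tree.
acceptOrDie :: Node -> Node
acceptOrDie n = case violations n of
  []    -> n
  probs -> error (unlines probs)
```

Neither this check nor a DTD looks at attribute values, so an href such as `javascript:...` would still get through; that needs a separate test.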
You can try your hand here. Any and all feedback is much appreciated.
In related news I've uploaded my FCKeditor "server-side integration" Haskell library to Hackage. Find that here.
I went to visit Marc today, at the Prince of Wales Hospital, and we got around to talking about design. Roughly speaking, it seems to me that people tend to like their webpages the same way they like their streets. In Australia, and perhaps the West in general, we want order, clearly marked lanes, pedestrian crossings, accessibility in the form of kerb cuts and beeping and flashing attention-getting devices. The footpaths are clear of stalls and coffee merchants. Conversely Asia seems to prefer craziness, where finding things is difficult but what you do find is sometimes more valuable than what you set out for. As Marc observed, one uses fifty fonts to show that one is more prosperous than the guys who only used forty-nine, and damn that street food (mystery meat) is tasty.
I'm going to have to face up to the tension between Vietnamese website aesthetics and aspirations to accessibility rather soon.
I finally managed to get HOPE to statically link. Björn does this against MySQL, so the infrastructure was there, but Debian's PostgreSQL binary packages are built with Kerberos support, and apparently libkrb5-dev no longer supports static linking. I rebuilt them with all that switched off (thanks Debian, that was pretty easy), and now the linker seems happy. The hazards of binary packaging...
So, why bother with this at all? Well, the company that will host DRD's new website apparently doesn't have GMP installed. This threw me a bit; I expected it to be missing all sorts of stuff, but GMP? I have been using GHC (whose Integer is built on GMP) for too long, it seems. Sure, I could try to arrange for the shared library to be present, or statically link just GMP, but it seemed better to err on the side of overkill and insulate HOPE from any other changes in this hostile environment.
Next up: what happens when /etc/hosts is MIA? How do I talk to the database server? I begin to understand why everyone sticks to PHP and MySQL, and why Ruby on Rails's convention-based approach is such a big deal.
I am not a JavaScript hacker, so I have no clear idea how best to use FCKeditor. My embryonic Haskell library just spits out either a textarea or some JavaScript that creates an FCKeditor instance, depending on how HTTP_USER_AGENT is set, though I can imagine someone wanting to do something fancier [*]. The POSTed data is validated against XHTML 1.0 Strict using HaXml, which seems to work well for the most part; for some reason FCKeditor uses the non-standard <embed> tag for Flash content, and I can't find a convincing reason why [**].
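A minimal sketch of that dispatch, with several assumptions: the browser test (`richEditorCapable`) is a made-up heuristic, FCKeditor is assumed to be installed under `/fckeditor/`, the field name is assumed to be a safe identifier, and the content is assumed to be escaped already. Rather than choosing strictly between a textarea and a script, this variant always emits the textarea and, for capable browsers, adds a call to FCKeditor 2.x's `ReplaceTextarea()`, so browsers without JavaScript still get a working form. None of this is the library's actual code.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Hypothetical heuristic: does this User-Agent get the rich editor?
richEditorCapable :: String -> Bool
richEditorCapable ua =
  any (`isInfixOf` map toLower ua) ["msie", "gecko", "applewebkit"]

-- Render the input widget for one form field.
editorWidget :: Maybe String  -- value of HTTP_USER_AGENT, if set
             -> String        -- form field name (assumed safe)
             -> String        -- initial content (assumed already escaped)
             -> String
editorWidget ua name content
  | maybe False richEditorCapable ua = textarea ++ script
  | otherwise                        = textarea
  where
    -- XHTML 1.0 Strict requires rows and cols on <textarea>.
    textarea = "<textarea name=\"" ++ name ++ "\" id=\"" ++ name
            ++ "\" rows=\"20\" cols=\"80\">" ++ content ++ "</textarea>"
    -- Replace the textarea above with an FCKeditor instance of the same name.
    script = unlines
      [ "<script type=\"text/javascript\">"
      , "//<![CDATA["
      , "var oFCKeditor = new FCKeditor('" ++ name ++ "');"
      , "oFCKeditor.BasePath = '/fckeditor/';"
      , "oFCKeditor.ReplaceTextarea();"
      , "//]]>"
      , "</script>"
      ]
```

In a CGI program the first argument would come from something like `lookupEnv "HTTP_USER_AGENT"` from System.Environment.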
In the not-too-distant future I will implement the connector stuff and Cabalise it.
[*] Apparently I still need to crank out an <iframe> to satisfy Internet Explorer, so we can either revert to XHTML 1.0 Transitional or generate some non-standard XHTML just for Internet Explorer. It's a tough call.
[**] It seems that recent versions of Internet Explorer (6 and 7), Mozilla-based browsers (Camino, Firefox) and Safari 3 are all happy with the <object> tag. Adobe has a "knowledge base" article full of non-reasons to use the <embed> tag. The great thing about web standards is we're all empiricists now...
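The `<object>`-only markup those browsers accept is the well-known "Flash Satay" pattern, and it validates against XHTML 1.0 Strict. A small generator for it, as one might bolt onto a Haskell page builder; `flashObject` is a hypothetical helper, and the URL is assumed to be escaped already.

```haskell
-- Markup for embedding a Flash movie with <object> alone (no <embed>).
-- The movie URL appears twice: as the object's data and as the "movie"
-- parameter that Internet Explorer's plugin looks for.
flashObject :: String -> Int -> Int -> String
flashObject url w h = concat
  [ "<object type=\"application/x-shockwave-flash\" data=\"", url, "\""
  , " width=\"", show w, "\" height=\"", show h, "\">"
  , "<param name=\"movie\" value=\"", url, "\" />"
  , "</object>"
  ]
```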
One reason I ran away from all of the CMSs implemented in PHP is PHP's (historically) crappy support for Unicode [*]. Standard Haskell, on the other hand, has required the Char type to be able to represent a Unicode code point for quite a while now. Unfortunately a few libraries are not Unicode friendly, such as just about every library that interfaces with C.
Concretely:
- HSQL needed some work to get it to talk UTF-8 to PostgreSQL.
- Most but not all of the CGI library is Unicode friendly. I don't know enough about the various RFCs to know what's encoded as what, so I don't know how to do this right. For example, how are Unicode filenames handled?
- The regexp libs are a bit of a minefield (the user-interface is quite complex, and those C libraries are unknown quantities), so I have avoided using them.
- HOPE itself is almost entirely encoding-agnostic, apart from the top-level (where it builds a CGI header for the webserver's consumption), and HaskellDB just punts around the strings fairly blindly, doing a minimal amount of escaping. Good job, Björn.
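At the C boundary the job is to turn a Haskell String, a list of code points, into bytes before handing it over. A minimal UTF-8 encoder following the bit layouts of RFC 3629 is sketched below; it is not HSQL's code, and in practice the utf8-string package's `Codec.Binary.UTF8.String.encode` does the same job.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode a String as UTF-8 bytes. Lone surrogates (which a Char can hold)
-- are not rejected in this sketch.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (map fromIntegral . encodeChar . ord)
  where
    encodeChar :: Int -> [Int]
    encodeChar c
      | c < 0x80    = [c]                                   -- 0xxxxxxx
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]     -- 110xxxxx 10xxxxxx
      | c < 0x10000 = [ 0xE0 .|. (c `shiftR` 12)            -- 1110xxxx + 2
                      , cont (c `shiftR` 6), cont c ]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18)            -- 11110xxx + 3
                      , cont (c `shiftR` 12), cont (c `shiftR` 6), cont c ]
    -- Continuation byte 10xxxxxx carrying the low six bits of its argument.
    cont x = 0x80 .|. (x .&. 0x3F)
```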
I really, really wish Haskell had a decent story about character encoding at the I/O level. Back in 2002 people seemed to get really excited about doing something about it, but that mailing list is dead now. I guess the hope is that once ByteStrings and all that are bedded down, the I/O layer can be rebuilt on efficient foundations, and fusion will take care of performance issues with codec layers and so forth.
Update: ConradP has surveyed some Haskell character munging libraries.
[*] Perl has good Unicode support, if one is happy to play the guessing game as to what format each string is in. I feel that strong typing (clearly separating characters from strings of bytes) is just what is needed here.
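To make that concrete, a sketch (the names `Bytes` and `decodeUtf8` are hypothetical) of keeping raw bytes in their own type, so that the only way to get characters out is an explicit decode that can fail rather than guess:

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Bytes from the outside world (files, sockets, C libraries). Keeping them in
-- their own type means they can't be mistaken for a String of characters.
newtype Bytes = Bytes [Word8]
  deriving (Eq, Show)

-- Decode UTF-8, returning Nothing on malformed input instead of guessing.
decodeUtf8 :: Bytes -> Maybe String
decodeUtf8 (Bytes ws) = go (map fromIntegral ws)
  where
    go :: [Int] -> Maybe String
    go [] = Just []
    go (b : bs)
      | b < 0x80              = fmap (chr b :) (go bs)
      | b >= 0xC2 && b < 0xE0 = multi 1 (b .&. 0x1F) 0x80 bs
      | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F) 0x800 bs
      | b >= 0xF0 && b < 0xF5 = multi 3 (b .&. 0x07) 0x10000 bs
      | otherwise             = Nothing  -- stray continuation or bad lead byte

    -- Read n continuation bytes (10xxxxxx) and rebuild the code point,
    -- rejecting overlong forms, UTF-16 surrogates and values past U+10FFFF.
    multi :: Int -> Int -> Int -> [Int] -> Maybe String
    multi n lead lo bs
      | length cs /= n || not (all isCont cs) = Nothing
      | c < lo || c > 0x10FFFF               = Nothing
      | c >= 0xD800 && c <= 0xDFFF           = Nothing
      | otherwise                            = fmap (chr c :) (go rest)
      where
        (cs, rest) = splitAt n bs
        c = foldl (\acc x -> (acc `shiftL` 6) .|. (x .&. 0x3F)) lead cs

    isCont x = x .&. 0xC0 == 0x80
```

Returning `Maybe String` means nothing is known until the whole input has been checked, the same trade-off noted for Parsec above; a streaming codec would report errors in-band instead.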
Halfway through the project, I begin to talk about the project.
Tue, Nov 27, 2007
So the game here is to build a CMS-style website for DRD, who are presently using an unmaintainable ASP mess. (Heh, I think that's the old ASP, not ASP.NET, but what would I know.) I decided to renovate Björn's Haskell effort, HOPE, which looked, superficially at least, pretty hackable.
Activity for these past few months:
- I tried to fix the concurrency issues. There was/is [*] a lot of confusing code that looks like it might be safe, but isn't. It might have worked if the DBMS provided sufficiently coarse-grained concurrency control and traffic was sufficiently light. (I don't claim to have fixed everything yet, and there are limits to what we can do.)
- As part of the above I hacked the daylights out of HaskellDB and HSQL, but only brought their PostgreSQL backends into line with my higher-level changes [**]. Specifically, I tried to extend their notions of a relational database to encompass constraints [***], and to add support for the serial datatype.
- HSQL seems adequate as a low-level SQL interface, at least as far as these things go in Haskell [***], so I don't know why anyone would reinvent that wheel (ask them).
- I would strongly recommend against trying to use HaskellDB, despite the heroic efforts of Björn et al. It's nice in theory but quite limited and very complex in practice. If I were to do this project over, I would drop HOPE's dependency on HaskellDB.
- I am now painfully aware of the semantic gap between Haskell and SQL databases. What we really want is serialisation and querying of algebraic data types, that is, something closer to XML technology. The only group I know that is taking persistence seriously at the typed, higher-order, etc. programming language level is the mob working on Alice/ML, and if I had a spare life I'd marry that with Benjamin C. Pierce's work of the past ten years or so and develop a mergeable, distributed, queryable storage manager for a decent language.
- Added a lot of I18N support. This is as yet incomplete, of course, and I'm not very happy with how I've done the dynamic part of it. One major outstanding issue is how best to support multilingual tagging.
- Shifted away from Björn's home-brew and somewhat buggy hmarkup to the Windows-user-friendly FCKeditor. I have my qualms about this, but I've got to consider my user base.
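HaskellDB's and HSQL's real types aren't shown here. As a hypothetical illustration of what it means for a table description to "encompass constraints" and the serial type, the sketch below makes them ordinary Haskell data and renders PostgreSQL DDL from it. Every name is made up, and the default-value expression is inserted verbatim, so it must come from trusted code, never from users.

```haskell
import Data.List (intercalate)

-- Column types, including PostgreSQL's serial pseudo-type.
data ColType = SerialCol | IntCol | TextCol

data Column = Column
  { colName    :: String
  , colType    :: ColType
  , colNotNull :: Bool
  , colDefault :: Maybe String  -- SQL expression, inserted verbatim
  }

-- Table-level constraints as first-class data rather than ad hoc SQL.
data Constraint
  = PrimaryKey [String]
  | Unique [String]
  | ForeignKey [String] String [String]  -- local columns, table, remote columns

data Table = Table String [Column] [Constraint]

renderType :: ColType -> String
renderType SerialCol = "serial"
renderType IntCol    = "integer"
renderType TextCol   = "text"

renderColumn :: Column -> String
renderColumn c = unwords $
  [colName c, renderType (colType c)]
  ++ ["NOT NULL" | colNotNull c]
  ++ maybe [] (\d -> ["DEFAULT", d]) (colDefault c)

renderConstraint :: Constraint -> String
renderConstraint (PrimaryKey cs) = "PRIMARY KEY (" ++ commas cs ++ ")"
renderConstraint (Unique cs)     = "UNIQUE (" ++ commas cs ++ ")"
renderConstraint (ForeignKey cs t rs) =
  "FOREIGN KEY (" ++ commas cs ++ ") REFERENCES " ++ t ++ " (" ++ commas rs ++ ")"

commas :: [String] -> String
commas = intercalate ", "

-- Generate a CREATE TABLE statement from the description.
createTable :: Table -> String
createTable (Table name cols cons) =
  "CREATE TABLE " ++ name ++ " ("
    ++ commas (map renderColumn cols ++ map renderConstraint cons) ++ ")"
```

Once constraints are data, the same description could in principle also be compared against the live schema, which a bridge that only ships SQL strings cannot do.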
Some of the abstractions in HOPE are fantastic, and others are head-scratching, tantalisingly close to being so. If I have the time and enough brain capacity, I'd really like to re-do the notion of resource so we can (for example) generate site maps and have fewer URL paths scattered through the code. So, good effort Björn.
If you're interested in any of this, you can take a look at the darcs repos at http://peteg.org/haskell. Please note that everything there should be considered alpha quality and under chaotic development.
[*] My changes are so pervasive that it's better to think of my version as a fork rather than a continuation. The database schema is quite different and currently requires PostgreSQL, so I doubt it is useful to any current users.
[**] This has some nasty ramifications. One is that it is unlikely that my code will be merged into the mainstream darcs repos, as I have neither the interest nor the time to fix the other backends. (I refuse to encourage anyone to use speed-over-correctness software like MySQL.) Due to this, I doubt one can use the shiny-new cabal-install to suck down the myriad dependencies of my version of HOPE, as you'll need some stuff from my repos, while other stuff may as well come from the official places.
[***] Somewhat ironic to me is that all the low-level Haskell SQL bridges I've seen have a very limited view of what a relational database is; usually the bridge just ships SQL one way and gets a list of rows back, and provides a very basic table description mechanism. I haven't seen any support for defaults, triggers, constraints (foreign keys, primary keys, uniqueness, etc.), and while there is usually support for transactions, it is difficult to figure out what that means as the bridges all try to be backend-agnostic. Conversely there are a lot of attempts at making rows and queries type-safe.