peteg's blog - AYAD - Project

Rounding up some press.

/AYAD/Project | Link

I was interviewed for the women's newspaper Phụ Nữ. Left to right in the photo: DRD Director Chị Yến, my counterpart Triêu, me, and, of course, the MacBook in the foreground. We're seated in the corner of the DRD office. I asked the journalist to publish my email address, but sadly she did not.

More press coverage at Tuổi Trẻ and Daily.com.vn. I guess this means my AYAD project was a success.

Website launch.

We're famous: we've got the top hit on Google Việt Nam for "DRD", and we're on the front page at Google.com. The launch went fairly well, with demonstrations of the "zoom" layout and the JAWS screen reader, and good attendance by the press. There were some good questions, and I was exhausted by the end of it.

New DRD website live.

Partly by accident, partly by design, the new DRD website is now live. It has some remaining rough edges which I'll be ironing out this week and next. We'll have a launch press conference this Sunday, June 29. Any and all feedback is very welcome.

Project blah blah.

I've been reading a lot of accessibility articles. Most are (at best) unscientific. If you can cope with the outmoded HTML advice, the best is Joe Clark's Building Accessible Websites. I feel his treatment of colour blindness is ... excessive, in a good way.

Finally someone has cooked up a good explanation of how to stuff Flash into a webpage, and why <embed> lurches forever onward. It has convinced me that if one decides to be evil and use Flash, one must necessarily also use JavaScript.

My plan to put the XML part of XHTML to work, by validating the comments and general content coming into HOPE with HaXml, is mostly working, modulo some bugs here and there. Break it here: http://210.245.124.74/~drdviet/hope/ (soon to become the main DRD site, I hope).

The I'm-in-a-hurry guide to accessibility.

I've been pleasantly surprised by some of the general-topic things I've found on the net recently; either Google's search results are tending away from waffle or I am becoming very adept at closing tabs and flushing my short-term memory buffer. Dive into Accessibility is straightforward and the tips are mostly directly applicable. Unfortunately it has dated a bit; I shudder to think how many HTML 4 websites out there people now want to make accessible...

I have a slight quibble with his advice on access keys (Day 15, p. 32 of the PDF), where he suggests that a stroke survivor may make use of them. I expect many people struggle to make key chords, especially those with the use of only one hand.

To accesskey or not to accesskey...

An accesskey is another of the W3C's ways of muddling concerns: when you type something into a web browser, what does it mean? Does the keystroke go to the browser, the page, or a form, and which form are we talking about anyway? The prevailing wisdom seems to be not to use accesskeys because the implementations are broken; in a nutshell, this misfeature is not compositional, a trait it shares with all the very worst XHTML ideas.
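To make "not compositional" concrete, here is a minimal Haskell sketch. The `Fragment` type and the `clashes` function are hypothetical, invented for illustration: each fragment merely records the accesskeys it binds. Two fragments that are each perfectly sensible on their own (say, a search box and a comment form that both grab "s") clash as soon as they share a page, and nothing in either fragment can tell you so.

```haskell
import Data.List (group, sort)

-- Hypothetical: a chunk of markup, reduced to the accesskeys it binds.
data Fragment = Fragment
  { fragName   :: String
  , accessKeys :: [Char]
  } deriving Show

-- Putting fragments on one page just pools their key bindings;
-- nothing stops two fragments from claiming the same key.
compose :: [Fragment] -> [Char]
compose = concatMap accessKeys

-- Keys bound more than once on the assembled page.
clashes :: [Fragment] -> [Char]
clashes = map head . filter ((> 1) . length) . group . sort . compose
```

For example, `clashes [Fragment "search" "s", Fragment "comment" "s"]` evaluates to `"s"`. The check only makes sense on the assembled page, never on a fragment in isolation, which is exactly the sense in which accesskeys don't compose.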

Back in the real world we're stuck between doing something "many users will be familiar with" (yeah, I remember that) and leaving things alone. I've decided to follow the BBC's ambivalent lead here.

Lazy CSV Parsing in Haskell.

One of the fun bits about this project is the text munging that comes with it. The regexp libraries for Haskell have super-sophisticated do-what-I-think-you-mean interfaces and not enough (simple) use cases in the docs. Couple that with my concerns about Unicode support and I'm stuck doing it the very old-fashioned way.

OK, enough editorialising; I've written a mostly-{RFC 4180, Haskell 98}-compliant lazy CSV parser that appears to work OK on reasonably sized inputs. Existing solutions use Parsec, whose return type seems to guarantee that more or less the entire output must reside in memory at some point. This might be OK for small files, but the 6 MB of Unicode data I need to import consumes a ridiculous amount of memory, even with GHC's optimiser going full-bore.

You can find it here. The licence is BSD. Couple it with the appropriate utf8-string for your GHC and it works well on UTF-8-encoded files.
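This isn't the library linked above, just a minimal sketch of the same idea in plain Haskell 98, assuming RFC 4180-style input: comma separators, double-quoted fields with `""` as an escaped quote, and CRLF or LF line endings. Rows are produced on demand, so a consumer that processes and discards them need not hold the whole file in memory. The leniencies are choices of this sketch: a blank line comes out as a single empty field, and stray characters after a closing quote are appended to the field rather than rejected.

```haskell
-- Parse CSV text into records, producing rows lazily, on demand.
parseCSV :: String -> [[String]]
parseCSV [] = []
parseCSV s  = let (row, rest) = record s in row : parseCSV rest

-- One record: fields separated by commas, ended by CRLF, LF or end of input.
record :: String -> ([String], String)
record s =
  let (f, rest) = field s
  in case rest of
       ',' : r         -> let (fs, r') = record r in (f : fs, r')
       '\r' : '\n' : r -> ([f], r)
       '\n' : r        -> ([f], r)
       _               -> ([f], [])      -- end of input

field :: String -> (String, String)
field ('"' : cs) = quoted cs
field s          = unquoted s

-- Unquoted field: runs up to a comma or a line ending.
unquoted :: String -> (String, String)
unquoted s = case s of
  []              -> ([], [])
  ',' : _         -> ([], s)
  '\n' : _        -> ([], s)
  '\r' : '\n' : _ -> ([], s)
  c : cs          -> let (f, r) = unquoted cs in (c : f, r)

-- Quoted field: commas and line breaks are literal; "" is an escaped quote.
quoted :: String -> (String, String)
quoted s = case s of
  []             -> ([], [])         -- unterminated: stop at end of input
  '"' : '"' : cs -> let (f, r) = quoted cs in ('"' : f, r)
  '"' : cs       -> unquoted cs      -- closing quote; keep any stray tail
  c : cs         -> let (f, r) = quoted cs in (c : f, r)
```

Because of the laziness, `take 1 (parseCSV ("h1,h2\n" ++ cycle "1,2\n"))` returns the header row even though the input never ends.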

Now, to track down a nasty memory leak somewhere in the database code... the profiler tells me SYSTEM is hanging onto some stuff, but not what SYSTEM actually is. Err, what did Fergus say again?

Simplifying the XHTML DTD for fun and profit.

For the usual reasons it seemed best to use FCKeditor as an input widget for HOPE. I had hoped to provide some kind of hacker-friendly markup but time is short and convincing FCKeditor to generate it would probably require some heart surgery. So XHTML everywhere it is.

Clearly this path should lead to paranoia; we can't allow users to submit arbitrary strings, or even arbitrary XHTML. My heavyweight solution is to validate such submissions against a stripped-down XHTML DTD using HaXml. So far I've removed forms and scripts, and restricted the attributes of <a> to just href. I wish the DTD were readable; it is merely an algebraic data type, after all.
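The "DTD as algebraic data type" remark can be taken fairly literally. Here is a minimal sketch, neither HaXml's API nor the actual stripped-down DTD: a hypothetical `Node` tree and a whitelist that, in the spirit of the restrictions above, has no forms or scripts and allows only `href` on `<a>`. Unlike a real DTD it checks only which elements and attributes appear, not which elements may nest inside which, and it doesn't inspect attribute values (a `javascript:` URL in `href` would pass).

```haskell
-- A tiny XHTML-ish document tree.
data Node
  = Element String [(String, String)] [Node]  -- name, attributes, children
  | Text String
  deriving Show

-- Hypothetical whitelist: allowed elements, each with its allowed
-- attributes.  Forms and scripts are simply absent; <a> gets only href.
whitelist :: [(String, [String])]
whitelist =
  [ ("p", []), ("em", []), ("strong", []), ("ul", []), ("ol", []), ("li", [])
  , ("a", ["href"])
  ]

-- Check a tree against the whitelist, reporting the first violation.
validate :: Node -> Either String ()
validate (Text _) = Right ()
validate (Element name attrs kids) =
  case lookup name whitelist of
    Nothing      -> Left ("element not allowed: " ++ name)
    Just allowed ->
      case [a | (a, _) <- attrs, a `notElem` allowed] of
        (bad : _) -> Left ("attribute not allowed on " ++ name ++ ": " ++ bad)
        []        -> mapM_ validate kids
```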

Combined with some thorough string-escaping for the other inputs and a tendency to cop out (crash) on anything that doesn't completely conform to expectations, I think we will be all right.
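For the string-escaping part, here is a minimal sketch (the function names are hypothetical) that escapes the five XML-significant characters and, in keeping with the cop-out policy, refuses outright any input containing characters that XML 1.0 forbids, rather than trying to repair it.

```haskell
-- Escape text for XHTML element content or a quoted attribute value.
escapeXml :: String -> String
escapeXml = concatMap esc
  where
    esc '<'  = "&lt;"
    esc '>'  = "&gt;"
    esc '&'  = "&amp;"
    esc '"'  = "&quot;"
    esc '\'' = "&#39;"
    esc c    = [c]

-- Characters permitted by the XML 1.0 Char production.
xmlChar :: Char -> Bool
xmlChar c = c `elem` "\t\n\r"
         || (c >= '\x20'    && c <= '\xD7FF')
         || (c >= '\xE000'  && c <= '\xFFFD')
         || (c >= '\x10000' && c <= '\x10FFFF')

-- Refuse, rather than repair, anything containing forbidden characters.
escapeInput :: String -> Either String String
escapeInput s
  | all xmlChar s = Right (escapeXml s)
  | otherwise     = Left "input contains characters not allowed in XML"
```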

You can try your hand here. Any and all feedback is much appreciated.

In related news I've uploaded my FCKeditor "server-side integration" Haskell library to Hackage. Find that here.

People like their webpages the way they like their streets.

I went to visit Marc today, at the Prince of Wales Hospital, and we got around to talking about design. Roughly speaking, it seems to me that people tend to like their webpages the same way they like their streets. In Australia, and perhaps the West in general, we want order, clearly marked lanes, pedestrian crossings, accessibility in the form of kerb cuts and beeping and flashing attention-getting devices. The footpaths are clear of stalls and coffee merchants. Conversely, Asia seems to prefer craziness, where finding things is difficult but what you do find is sometimes more valuable than what you set out for. As Marc observed, one uses fifty fonts to show that one is more prosperous than the guys who only used forty-nine, and damn that street food (mystery meat) is tasty.

I'm going to have to face up to the tension between Vietnamese website aesthetics and aspirations to accessibility rather soon.

HOPE in a host(ile) environment.

I finally managed to get HOPE to link statically. Björn does this against MySQL, so the infrastructure was there, but Debian's PostgreSQL binary packages are built with Kerberos support, and apparently libkrb5-dev no longer supports static linking. I rebuilt them with all that switched off (thanks, Debian, that was pretty easy), and now the linker seems happy. The hazards of binary packaging...

So, why bother with this at all? Well, the company that will host DRD's new website apparently doesn't have GMP installed. This threw me a bit; I expected it to be missing all sorts of stuff, but GMP? I have been using GHC for too long, it seems. Sure, I could try to arrange for the shared library to be present, or statically link just GMP, but it seemed better to err on the side of overkill in insulating HOPE from any other changes in this hostile environment.

Next up: what happens when /etc/hosts is MIA? How do I talk to the database server? I begin to understand why everyone sticks to PHP and MySQL, and why Ruby on Rails's convention-based approach is such a big deal.

Haskell server-side integration for FCKeditor.

I am not a JavaScript hacker, so I have no clear idea how best to use FCKeditor. My embryonic Haskell library just spits out either a textarea or some JavaScript that creates an FCKeditor instance depending on how HTTP_USER_AGENT is set, though I can imagine someone wanting to do something fancier [*]. The POSTed data is validated against XHTML 1.0 Strict using HaXml, which seems to work well for the most part; for some reason FCKeditor uses the non-standard <embed> tag for Flash content, and I can't find a convincing reason why [**].

In the not-too-distant future I will implement the connector stuff and Cabalise it.
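Here is a minimal sketch of the textarea-or-FCKeditor choice described above. The user-agent test is a hypothetical heuristic, not FCKeditor's own browser detection. The JavaScript side assumes FCKeditor 2.x's integration API (`new FCKeditor(name)`, `BasePath`, `Value`, `Create()`); check it against the version you have. The `/fckeditor/` base path is a placeholder. The content is hex-escaped into a JavaScript string so the script body contains no `<` or `&`, which keeps it safe whether the page is parsed as HTML or as XML.

```haskell
import Data.List (isInfixOf)

-- Hypothetical heuristic: these substrings are illustrative guesses,
-- not FCKeditor's own browser-detection rules.
richEditorCapable :: Maybe String -> Bool
richEditorCapable Nothing   = False
richEditorCapable (Just ua) =
  any (`isInfixOf` ua) ["Gecko/", "MSIE 6", "MSIE 7", "Safari/"]

-- Escape text for XHTML element content and double-quoted attributes.
escapeXhtml :: String -> String
escapeXhtml = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Encode a string as a single-quoted JavaScript literal.  '<' and '&'
-- are hex-escaped so the script body is safe under HTML or XML parsing.
jsString :: String -> String
jsString s = "'" ++ concatMap enc s ++ "'"
  where
    enc '\''     = "\\'"
    enc '\\'     = "\\\\"
    enc '\n'     = "\\n"
    enc '\r'     = "\\r"
    enc '<'      = "\\x3C"
    enc '&'      = "\\x26"
    enc '\x2028' = "\\u2028"
    enc '\x2029' = "\\u2029"
    enc c        = [c]

-- Emit either a script that creates an FCKeditor instance or a plain
-- textarea.  The BasePath value is a placeholder.
editorWidget :: Maybe String  -- value of HTTP_USER_AGENT, if set
             -> String        -- form field name
             -> String        -- initial XHTML content
             -> String
editorWidget ua name content
  | richEditorCapable ua = unlines
      [ "<script type=\"text/javascript\">"
      , "var ed = new FCKeditor(" ++ jsString name ++ ");"
      , "ed.BasePath = '/fckeditor/';"
      , "ed.Value = " ++ jsString content ++ ";"
      , "ed.Create();"
      , "</script>"
      ]
  | otherwise =
      "<textarea name=\"" ++ escapeXhtml name ++ "\" rows=\"20\" cols=\"80\">"
      ++ escapeXhtml content ++ "</textarea>"
```

In a CGI program the first argument would come from the `HTTP_USER_AGENT` environment variable, for example via `System.Environment.lookupEnv`.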

[*] Apparently I still need to crank out an <iframe> to satisfy Internet Explorer, so we can either revert to XHTML 1.0 Transitional or generate some non-standard XHTML just for Internet Explorer. It's a tough call.

[**] It seems that recent versions of Internet Explorer (6 and 7), Mozilla-based browsers (Camino, Firefox) and Safari 3 are all happy with the <object> tag. Adobe has a "knowledge base" article full of non-reasons to use the <embed> tag. The great thing about web standards is that we're all empiricists now...
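For what it's worth, here is a minimal sketch of `<object>`-only markup of the kind the footnote alludes to, in the style of the widely circulated "Flash Satay" approach: `type` and `data` attributes for standards-following browsers, plus a `movie` parameter, which as I understand it is what Internet Explorer's player looks for. The function name and its fallback text are my own; check the output in the browsers you care about.

```haskell
-- Standards-only Flash embedding: one <object>, no <embed>.
flashObject :: String  -- URL of the .swf file
            -> Int     -- width in pixels
            -> Int     -- height in pixels
            -> String  -- fallback text for browsers without Flash
            -> String
flashObject url width height fallback = concat
  [ "<object type=\"application/x-shockwave-flash\""
  , " data=\"", esc url, "\""
  , " width=\"", show width, "\" height=\"", show height, "\">"
  , "<param name=\"movie\" value=\"", esc url, "\" />"
  , esc fallback
  , "</object>"
  ]
  where
    esc = concatMap escChar
    escChar '<' = "&lt;"
    escChar '>' = "&gt;"
    escChar '&' = "&amp;"
    escChar '"' = "&quot;"
    escChar c   = [c]
```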

Lest I forget, Haskell and Unicode.

One reason I ran away from all of the CMSs implemented in PHP is PHP's (historically) crappy support for Unicode [*]. Standard Haskell, on the other hand, has required the Char type to be able to represent any Unicode code point for quite a while now. Unfortunately there are a few libraries that are not Unicode-friendly, such as just about every library that interfaces with C.

Concretely:

  • HSQL needed some work to get it to talk UTF-8 to PostgreSQL.
  • Most but not all of the CGI library is Unicode friendly. I don't know enough about the various RFCs to know what's encoded as what, so I don't know how to do this right. For example, how are Unicode filenames handled?
  • The regexp libs are a bit of a minefield (the user-interface is quite complex, and those C libraries are unknown quantities), so I have avoided using them.
  • HOPE itself is almost entirely encoding-agnostic, apart from the top-level (where it builds a CGI header for the webserver's consumption), and HaskellDB just punts around the strings fairly blindly, doing a minimal amount of escaping. Good job, Björn.
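To make "talking UTF-8" concrete: a C library sees bytes, while a Haskell `Char` is a code point, so something has to sit in between. Here is a minimal hand-rolled encoder following RFC 3629's bit layout. It's an illustration, not what HSQL actually does; in practice a library such as utf8-string is the saner choice.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one code point as 1 to 4 bytes, per RFC 3629.
-- (Lone surrogates, which Char can hold, are not rejected here.)
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. top 6, cont 0]
  | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- leading bits, shifted down past the continuation bytes
    top k = fromIntegral (n `shiftR` k)
    -- a continuation byte: 10xxxxxx carrying six bits of n
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar
```

For example, `encodeUtf8 "\x1EC7"` (the Vietnamese letter ệ) gives `[0xE1, 0xBB, 0x87]`.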

I really, really wish Haskell had a decent story about character encoding at the I/O level. Back in 2002 people seemed to get really excited about doing something about it, but that mailing list is dead now. I guess the hope is that once ByteStrings and the like have bedded down, the I/O layer can be rebuilt on efficient foundations, and fusion will take care of the performance issues with codec layers and so forth.

Update: ConradP has surveyed some Haskell character munging libraries.

[*] Perl has good Unicode support, if one is happy to play the guessing game as to what format each string is in. I feel that strong typing, clearly separating characters from strings of bytes, is just what is needed here.
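The footnote's point about strong typing can be made concrete with a `newtype`. In this minimal sketch (the names `Bytes` and `decodeUtf8` are hypothetical), undecoded bytes and decoded `String`s have different types, and the only way from one to the other is a decoder that can fail. It is strict: the whole input is validated before any characters come back, and malformed sequences (truncated, overlong, surrogates, beyond U+10FFFF) yield `Nothing` rather than a guess.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Undecoded bytes get their own type, distinct from String.
newtype Bytes = Bytes [Word8] deriving (Eq, Show)

-- Strict UTF-8 decoding: Nothing on any malformed sequence.
decodeUtf8 :: Bytes -> Maybe String
decodeUtf8 (Bytes input) = go input
  where
    go [] = Just []
    go (b : bs)
      | b < 0x80              = fmap (chr (fromIntegral b) :) (go bs)
      | b >= 0xC2 && b < 0xE0 = multi 1 0x80    (b .&. 0x1F) bs
      | b >= 0xE0 && b < 0xF0 = multi 2 0x800   (b .&. 0x0F) bs
      | b >= 0xF0 && b < 0xF5 = multi 3 0x10000 (b .&. 0x07) bs
      | otherwise             = Nothing  -- stray continuation or invalid lead

    -- Read n continuation bytes; reject overlong forms, surrogates
    -- and anything above U+10FFFF.
    multi :: Int -> Int -> Word8 -> [Word8] -> Maybe String
    multi n lo lead bs
      | length conts /= n || not (all isCont conts)           = Nothing
      | v < lo || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF) = Nothing
      | otherwise = fmap (chr v :) (go rest)
      where
        (conts, rest) = splitAt n bs
        v = foldl step (fromIntegral lead) conts
        step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)

    isCont c = c .&. 0xC0 == 0x80
```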

Halfway through the project, I begin to talk about the project.

So the game here is to build a CMS-style website for DRD, who are presently using an unmaintainable ASP mess. (Heh, I think that's the old ASP, not ASP.NET, but what would I know.) I decided to renovate Björn's Haskell effort, HOPE, which looked, superficially at least, pretty hackable.

Activity for these past few months:

  • I tried to fix the concurrency issues. There was/is [*] a lot of confusing code that looked like it might be safe, but wasn't. It might have worked if the DBMS provided coarse enough concurrency control and traffic was sufficiently light. (I don't claim to have fixed everything yet, and there are limits to what we can do.)
  • As part of the above I hacked the daylights out of HaskellDB and HSQL, but only brought their PostgreSQL backends into line with my higher-level changes [**]. Specifically, I tried to extend their notions of a relational database to encompass constraints [***], and added support for the serial datatype.
    • HSQL seems adequate as a low-level SQL interface, at least as far as these things go in Haskell [***], so I don't know why anyone would reinvent that wheel (ask them).
    • I would strongly recommend against trying to use HaskellDB, despite the heroic efforts of Björn et al. It's nice in theory but quite limited and very complex in practice. If I were to do this project over, I would drop HOPE's dependency on HaskellDB.
    • I am now painfully aware of the semantic gap between Haskell and SQL databases. What we really want is serialisation and querying of algebraic data types, that is, something closer to XML technology. The only group I know that is taking persistence seriously at the typed, higher-order, etc. programming language level is the mob working on Alice ML, and if I had a spare life I'd marry that with Benjamin C. Pierce's work of the past ten years or so and develop a mergeable, distributed, queryable storage manager for a decent language.
  • Added a lot of I18N support. This is as yet incomplete, of course, and I'm not very happy with how I've done the dynamic part of it. One major outstanding issue is how best to support multilingual tagging.
  • Shifted away from Björn's home-brew and somewhat buggy hmarkup to the Windows-user-friendly FCKeditor. I have my qualms about this, but I've got to consider my user base.
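For flavour, here is the kind of table description meant by extending the bridges' notion of a relational database, as a minimal hypothetical sketch rather than HaskellDB's or HSQL's actual API: a Haskell data type that records defaults, primary keys, uniqueness and foreign keys alongside the columns, includes PostgreSQL's `SERIAL` type, and renders a `CREATE TABLE` statement. Identifiers are assumed to be safe and are not quoted.

```haskell
import Data.List (intercalate)

-- Hypothetical column types, including PostgreSQL's SERIAL.
data ColType = Serial | IntCol | TextCol | BoolCol

data Column = Column
  { colName    :: String
  , colType    :: ColType
  , colNotNull :: Bool
  , colDefault :: Maybe String  -- SQL expression, emitted verbatim
  }

-- Table-level constraints that the bridges typically leave out.
data Constraint
  = PrimaryKey [String]
  | Unique [String]
  | ForeignKey [String] String [String]  -- local cols, table, remote cols

data Table = Table String [Column] [Constraint]

renderType :: ColType -> String
renderType Serial  = "SERIAL"
renderType IntCol  = "INTEGER"
renderType TextCol = "TEXT"
renderType BoolCol = "BOOLEAN"

renderColumn :: Column -> String
renderColumn c = unwords $
  [colName c, renderType (colType c)]
  ++ ["NOT NULL" | colNotNull c]
  ++ maybe [] (\d -> ["DEFAULT", d]) (colDefault c)

cols :: [String] -> String
cols = intercalate ", "

renderConstraint :: Constraint -> String
renderConstraint (PrimaryKey cs) = "PRIMARY KEY (" ++ cols cs ++ ")"
renderConstraint (Unique cs)     = "UNIQUE (" ++ cols cs ++ ")"
renderConstraint (ForeignKey cs t rs) =
  "FOREIGN KEY (" ++ cols cs ++ ") REFERENCES " ++ t ++ " (" ++ cols rs ++ ")"

-- Render a CREATE TABLE statement (identifiers are not quoted).
createTable :: Table -> String
createTable (Table name cs ks) =
  "CREATE TABLE " ++ name ++ " (\n  "
  ++ intercalate ",\n  " (map renderColumn cs ++ map renderConstraint ks)
  ++ "\n);"
```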

Some of the abstractions in HOPE are fantastic; others are head-scratchers, tantalisingly close to being fantastic. If I have the time and enough brain capacity, I'd really like to redo the notion of a resource so we can (for example) generate site maps and have fewer URL paths scattered through the code. So, good effort, Björn.
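One way to redo the notion of a resource, sketched minimally under my own assumptions (the `Resource` constructors and URL scheme are hypothetical, not HOPE's): make every addressable thing a constructor, and generate and parse paths in exactly one place. Site maps then fall out by mapping `toPath` over the resources, and round-tripping `fromPath . toPath` is easy to test. Tags are assumed to be URL-safe; real ones would need percent-encoding.

```haskell
-- Hypothetical: every addressable thing on the site is a constructor.
data Resource
  = Home
  | Article Int
  | Tag String      -- assumed URL-safe; real tags need percent-encoding
  | Search
  deriving (Eq, Show)

-- The single place where URL paths are generated.
toPath :: Resource -> String
toPath Home        = "/"
toPath (Article n) = "/article/" ++ show n
toPath (Tag t)     = "/tag/" ++ t
toPath Search      = "/search"

-- Split a path on '/', dropping empty segments.
segments :: String -> [String]
segments s = case break (== '/') s of
  (a, [])       -> keep a []
  (a, _ : rest) -> keep a (segments rest)
  where
    keep a xs = if null a then xs else a : xs

-- The inverse of toPath, for dispatching incoming requests.
fromPath :: String -> Maybe Resource
fromPath p = case segments p of
  []                                    -> Just Home
  ["article", n] | [(k, "")] <- reads n -> Just (Article k)
  ["tag", t]                            -> Just (Tag t)
  ["search"]                            -> Just Search
  _                                     -> Nothing

-- A site map is just every resource, rendered.
siteMap :: [Int] -> [String] -> [String]
siteMap articleIds tags =
  map toPath ([Home, Search] ++ map Article articleIds ++ map Tag tags)
```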

If you're interested in any of this, you can take a look at the darcs repos at http://peteg.org/haskell. Please note that everything there should be considered alpha quality and under chaotic development.

[*] My changes are so pervasive that it's better to think of my version as a fork rather than a continuation. The database schema is quite different and currently requires PostgreSQL, so I doubt it is useful to any current users.

[**] This has some nasty ramifications. One is that my code is unlikely to be merged into the mainstream darcs repos, as I have neither the interest nor the time to fix the other backends. (I refuse to encourage anyone to use speed-over-correctness software like MySQL.) For the same reason, I doubt one can use the shiny new cabal-install to suck down the myriad dependencies of my version of HOPE, as you'll need some stuff from my repos, while other stuff may as well come from the official places.

[***] Somewhat ironic to me is that all the low-level Haskell SQL bridges I've seen have a very limited view of what a relational database is; usually the bridge just ships SQL one way and gets a list of rows back, and provides a very basic table description mechanism. I haven't seen any support for defaults, triggers, constraints (foreign keys, primary keys, uniqueness, etc.), and while there is usually support for transactions, it is difficult to figure out what that means as the bridges all try to be backend-agnostic. Conversely there are a lot of attempts at making rows and queries type-safe.