peteg's blog - AYAD - Project - 2008 04 11 LazyCSVParser

Lazy CSV Parsing in Haskell.


One of the fun bits about this project is the text munging that comes with it. The regexp libraries for Haskell have super-sophisticated do-what-I-think-you-mean interfaces and not enough (simple) use cases in the docs. Couple that with my concerns about Unicode support and I'm stuck doing it the very old-fashioned way.

OK, enough editorialising; I've written a mostly-{RFC 4180, Haskell 98}-compliant lazy CSV parser that appears to work OK on reasonable-sized inputs. Existing solutions use Parsec, whose return type seems to guarantee that more-or-less the entire output must reside in memory at some point. This might be OK for small files, but the 6MB of Unicode data I need to import consumes a ridiculous amount of memory, even with GHC's optimiser going full-bore.

You can find it here. The licence is BSD. Couple it with the appropriate utf8-string for your GHC and it works well on UTF-8-encoded files.
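For a flavour of the idea, here is a minimal sketch of a lazy CSV parser in plain Haskell 98. It is not the code linked above: the names (parseCSV, parseRow, parseField, quoted, dropEOL) are made up for illustration, and it is lenient about malformed input (an unterminated quote just ends at the end of the input, and stray text after a closing quote spills into the next row). The point is only that each row is handed back before the rest of the input is examined, so a consumer can process and discard rows as it goes.

```haskell
-- A sketch only: hypothetical names, lenient error handling.
type Field = String
type Row   = [Field]

-- Parse CSV text into rows. Each row is produced before the
-- remaining input is inspected, so the result can be streamed.
parseCSV :: String -> [Row]
parseCSV [] = []
parseCSV s  = let (row, rest) = parseRow s in row : parseCSV rest

-- Parse one row; return it along with the input after its line ending.
parseRow :: String -> (Row, String)
parseRow s =
  case parseField s of
    (f, ',' : rest) -> let (fs, rest') = parseRow rest in (f : fs, rest')
    (f, rest)       -> ([f], dropEOL rest)

-- A field is either quoted or runs up to the next comma or line ending.
parseField :: String -> (Field, String)
parseField ('"' : s) = quoted s
parseField s         = break (`elem` ",\r\n") s

-- Inside quotes: a doubled quote is a literal quote, a single one closes.
quoted :: String -> (Field, String)
quoted ('"' : '"' : s) = let (f, rest) = quoted s in ('"' : f, rest)
quoted ('"' : s)       = ([], s)
quoted (c : s)         = let (f, rest) = quoted s in (c : f, rest)
quoted []              = ([], [])  -- unterminated quote: take what we have

-- Accept CRLF, LF or a bare CR as a line ending.
dropEOL :: String -> String
dropEOL ('\r' : '\n' : s) = s
dropEOL ('\n' : s)        = s
dropEOL ('\r' : s)        = s
dropEOL s                 = s
```

Loaded into GHCi, something like `take 2 (parseCSV (cycle "1,2\n"))` terminates even though the input is infinite, which is the property a Parsec-style "give me the whole result" interface makes hard to get. Whether this sketch actually runs in constant space on a large file depends on how the consumer uses the rows, so treat it as a starting point rather than a measured result.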

Now, to track down a nasty memory leak somewhere in the database code... the profiler tells me SYSTEM is hanging onto some stuff, but not what SYSTEM actually is. Err, what did Fergus say again?