[yunqa.de] Read/Write compressed streams

  • From: Rolf Lampa <rolf.lampa@xxxxxxxxxx>
  • To: yunqa <yunqa@xxxxxxxxxxxxx>
  • Date: Wed, 02 Jul 2008 19:49:39 +0200

Hi,

Perhaps I should have posted here first. (I just posted essentially this text to the borland...win32 NG, and then I remembered this list.)

Since I often read and crunch HUGE XML files for testing "worst case" scenarios, such as enwiki XML dumps, I wonder whether there are any VERY fast TFileStream-based readers/writers out there that can optionally read .bz2, .gzip, 7z or .zip files directly, without unzipping them first (and, well, also write directly to disk in the mentioned compressed formats)?

The readers should of course decompress the data into their internal buffers for convenient use.

Since I read and manipulate UTF-8 data, a stream that can be combined with Unicode converters would be ideal.
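
To illustrate the kind of reader described above, here is a minimal sketch in Python rather than Delphi, so it is an analogy and not a TFileStream replacement. The helper names open_text and count_lines are hypothetical, and only .bz2 and .gz are handled; 7z and .zip are left out. The standard-library bz2 and gzip modules decompress into their own buffers and decode UTF-8 on the fly, so the caller only changes the function used to open the file.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical mapping from file extension to a stdlib opener.
# Anything not listed here falls back to the built-in open().
_OPENERS = {".bz2": bz2.open, ".gz": gzip.open}


def open_text(path, mode="rt", encoding="utf-8"):
    """Open a plain, .gz or .bz2 file as a decoded text stream."""
    ext = os.path.splitext(path)[1].lower()
    opener = _OPENERS.get(ext, open)
    return opener(path, mode, encoding=encoding)


def count_lines(path):
    """Stream through the file line by line without unpacking it to disk."""
    with open_text(path) as stream:
        return sum(1 for _ in stream)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.xml.bz2")
        # Writing goes through the same helper, compressing on the fly.
        with open_text(path, "wt") as out:
            out.write("<page>\u00e5\u00e4\u00f6</page>\n" * 3)
        print(count_lines(path))
```

Whether a comparable drop-in stream class exists for Delphi is exactly the open question of this post; the sketch only shows the intended usage pattern.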

I tried looking at Ralf Junker's library, but I couldn't find anything that could read/write directly from/to the mentioned compressed files, unless I simply missed it.

I'd prefer TFileStream-compatible usage, so that I only have to modify the class name to create... (exceedingly technically skilled and lazy as I am :)

[Edit]: I do perform a lot of text manipulation on the text read by the stream object (although that is done once the text has been inserted into objects), but perhaps some of the manipulations could be done directly in the stream's read buffer? I'm thinking of some more advanced tricks where regex would apply. Speed is crucial, though. (Bringing the processing time down from about 12 days to < 3 hours is what I'm currently working on, so... ).

Regards,

// Rolf Lampa
_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa


