[yunqa.de] Re: SAX speed

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Sat, 07 Jul 2012 18:06:12 +0200

On 06.07.2012 16:14, Torsten Spindler wrote:

> I'm fairly new to Delphi (3 weeks) and I'm looking for a way to read in
> a translation memory in the TMX or XLIFF format and store it in a
> data structure of my own. I just downloaded DIXml and tried the SAX2
> example with this document:

The DIXml_SAX2 demo uses a TMemo component to list SAX events.
Unfortunately, TMemo does not scale well with very large texts. This is
what slows the demo down.

> http://opus.lingfil.uu.se/OpenSubtitles/de-en.tmx.gz
> It's 2 MB in size and contains ~70000 translated segments. The naive
> attempt with the demo app resulted in an approximate runtime of 10
> minutes before the document was loaded to the memo.

I have added a performance measurement mode to the DIXml_SAX2 demo. With
the checkbox checked, the SAX events are still triggered but not written
to TMemo. Only some parsing stats are collected. I attached the new
version to this message.
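
To illustrate the idea of a measurement mode, here is a minimal sketch
written with Python's standard xml.sax module, not DIXml and not the
demo's actual source. The names CountingHandler and count_events are
made up for this example. The callbacks still fire for every event, but
they only update counters instead of appending text to a UI control:

```python
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects parsing stats, writes nothing to a UI."""

    def __init__(self):
        super().__init__()
        self.start_tags = 0
        self.end_tags = 0
        self.text_chars = 0

    def startElement(self, name, attrs):
        # Called once per start tag (empty elements count as well).
        self.start_tags += 1

    def endElement(self, name):
        # Called once per end tag.
        self.end_tags += 1

    def characters(self, content):
        # May be called several times for one text node; only sum lengths.
        self.text_chars += len(content)


def count_events(xml_bytes):
    """Parse xml_bytes and return (start_tags, end_tags, text_chars)."""
    handler = CountingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.start_tags, handler.end_tags, handler.text_chars
```

Because no per-event text is appended to a visual control, the run time
reflects mostly the parser itself.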

> Will this runtime be significantly lowered, e.g. to less than one
> minute, when using less generic parsing callbacks? In general I need to
> detect a new <tu> tag and store the CDATA in <seg> for the language
> found in the xml:lang attribute.

In performance measurement mode, DIXml is able to parse your document in
180 milliseconds (compiled with Delphi XE2 Win64) or 220 ms (Delphi 7).
Both builds report 334458 start tags and 334458 end tags.
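
The targeted callbacks asked about above could look roughly like the
following sketch. It uses Python's standard xml.sax module rather than
DIXml, and it assumes the usual TMX layout in which each <tu> holds
<tuv xml:lang="..."> elements that each contain one <seg>. The names
TmxHandler and parse_tmx are invented for this example:

```python
import xml.sax


class TmxHandler(xml.sax.ContentHandler):
    """Hypothetical handler: one dict per <tu>, mapping language to text."""

    def __init__(self):
        super().__init__()
        self.units = []      # finished translation units
        self._unit = None    # dict for the <tu> being read, else None
        self._lang = None    # xml:lang of the current <tuv>
        self._parts = None   # text chunks of the current <seg>, else None

    def startElement(self, name, attrs):
        if name == "tu":
            self._unit = {}
        elif name == "tuv":
            # Without namespace processing the attribute name is "xml:lang".
            self._lang = attrs.get("xml:lang")
        elif name == "seg" and self._unit is not None:
            self._parts = []

    def characters(self, content):
        # Text may arrive in several chunks; collect only inside <seg>.
        if self._parts is not None:
            self._parts.append(content)

    def endElement(self, name):
        if name == "seg" and self._parts is not None:
            self._unit[self._lang] = "".join(self._parts)
            self._parts = None
        elif name == "tuv":
            self._lang = None
        elif name == "tu" and self._unit is not None:
            self.units.append(self._unit)
            self._unit = None


def parse_tmx(xml_bytes):
    """Return a list of {language: segment text} dicts, one per <tu>."""
    handler = TmxHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.units
```

All other events are ignored, so the per-event work stays small and the
result goes straight into a data structure of your own.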

Ralf