On 27.10.2011 15:48, Edwin Yip wrote:
> One more question, now I see that dixml is more suitable for my problem,
> but this led me to another question: what can DIHtmlParser do that DIXml
> doesn't?

The question is not what DIHtmlParser can do that DIXml cannot, or the other way around. The two approach parsing quite differently.

DIHtmlParser parses HTML and XML in a linear fashion. It does not build DOM structures and so does not need memory for them. It is therefore very fast, especially for large files. Document encodings are converted to Unicode on the fly, and the interface functions use UnicodeString. DIHtmlParser is customizable via Filters and Plugins. I use DIHtmlParser for HTML data extraction and modification, both simple and advanced (think state machine).

DIXml is more "intelligent" than DIHtmlParser in that it knows about XML and HTML document structures. This makes DIXml more standards-conformant, but non-standard documents can turn out to be problematic, and HTML 5 specifics are not built in. DIXml converts character encodings to UTF-8 internally and passes UTF-8 to applications. Because of its DOM facilities, data extraction and document modification work differently, but XSLT transformations are available.

So it really depends on your requirements and preferences whether DIHtmlParser or DIXml suits you better.

Ralf

_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa
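The distinction between linear parsing and DOM-based parsing can be illustrated outside Delphi. The following is a minimal Python sketch, not DIHtmlParser or DIXml code, and it makes no claim about either library's API. It uses the standard library's `html.parser.HTMLParser` (event-driven, no tree kept) and `xml.etree.ElementTree` (builds the full tree first) as stand-ins. The names `ParagraphCollector`, `extract_linear`, and `extract_dom` are hypothetical, chosen for illustration only.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

DOC = "<html><body><p class='a'>One</p><p>Two</p></body></html>"


class ParagraphCollector(HTMLParser):
    """Linear (event-driven) parsing: callbacks fire as tags and text are
    encountered, and no document tree is kept in memory."""

    def __init__(self):
        super().__init__()
        self.in_p = False   # the only state carried between events
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Keep text only while inside a <p> element.
        if self.in_p:
            self.texts.append(data)


def extract_linear(doc):
    """Collect paragraph text in a single pass over the input."""
    parser = ParagraphCollector()
    parser.feed(doc)
    parser.close()
    return parser.texts


def extract_dom(doc):
    """Build the whole element tree first, then query it."""
    root = ET.fromstring(doc)
    return [p.text for p in root.iter("p")]


if __name__ == "__main__":
    print(extract_linear(DOC))
    print(extract_dom(DOC))
```

The linear version carries only a small amount of state (here a single flag), which is the state-machine style of extraction mentioned in the reply; the tree-based version is easier to query but must hold the entire document in memory. ElementTree is also a strict XML parser, so non-conforming input such as `<p>One<br></p>` raises `ET.ParseError`, while `HTMLParser` accepts it and `extract_linear` returns `["One"]`.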