[yunqa.de] Re: DIHTMLParser and msdn

  • From: "Rael Bauer" <rael.bauer@xxxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Sun, 12 Oct 2008 02:11:31 +0200

>
> Important note: Attached files do not make it to the public archives. In
> case you missed retrieving the message via e-mail, you can temporarily
> download it from here (the file will eventually be deleted without
> further notice):
>
>   http://www.yunqa.de/delphi/downloads/DIHtmlParser_WebDownload.zip
>
>
Thanks. I've set my subscription to vacation mode (I don't want posts I'm
not interested in clogging up my inbox), so I would otherwise have missed
the attachment.


> Downloading web pages for 100% identical offline viewing requires at
> least:
>
> 1. A CSS parser to extract linked files via CSS.
>
> 2. A JavaScript engine to process an HTML DOM structure and download
> additional information if necessary.
>
> 3. Analyzer code for installed browser plugins to extract links hidden in
> applets.
>
> 4. Possibly more?
>


Thanks for the extra info. Looking at the output of some archiving tools, I
see that some don't save the .js files and still produce good output. It
makes me think that cutting out the .js files may even give better results.

Can you tell me how I can remove SCRIPT tag sections from the output HTML
in the WebDownload example?
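
To illustrate what I mean, here is a rough, library-independent Delphi
sketch that strips SCRIPT sections from an HTML string by plain text
scanning (the function name is mine, and a proper solution would
presumably use DIHtmlParser's own tag handling instead):

  uses
    SysUtils, StrUtils;

  { Remove everything from each '<script' up to and including the next
    '</script>'. This is a naive textual scan, not a real HTML parse:
    it can misfire on '<script' inside comments or attribute values. }
  function RemoveScriptSections(const Html: string): string;
  var
    StartPos, EndPos: Integer;
  begin
    Result := Html;
    StartPos := PosEx('<script', LowerCase(Result), 1);
    while StartPos > 0 do
    begin
      EndPos := PosEx('</script>', LowerCase(Result), StartPos);
      if EndPos = 0 then
      begin
        { Unterminated SCRIPT section: drop the rest of the input. }
        SetLength(Result, StartPos - 1);
        Break;
      end;
      Delete(Result, StartPos, EndPos - StartPos + Length('</script>'));
      StartPos := PosEx('<script', LowerCase(Result), StartPos);
    end;
  end;

For example, feeding it '<p>Hi<script>alert(1)</script></p>' returns
'<p>Hi</p>'.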

Thanks
Rael
