[tssg-tech] Re: <link> "Working with XML on Android"

  • From: Jim Cant <cant_jim@xxxxxxxxxxx>
  • To: tssg tech <tssg-tech@xxxxxxxxxxxxx>
  • Date: Tue, 28 Sep 2010 17:25:33 -0400

I agree with this analysis of the problem.

If I recall correctly, all the &...; entities that escape characters
meaningful in XML (<, >, &) occur in the content of the <description>
elements of the RSS.  Processing this chunk of the RSS with a regular
expression to turn it back into XML, which we can then parse, seems like the
first thing to try.

The Java String class provides a method

        replaceAll(String regex, String replacement)

which "replaces each substring of this string that matches the given regular
expression with the given replacement."  That may be of help to whoever
undertakes this.
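
A minimal sketch of how replaceAll could be chained to undo the escaping,
assuming only the five predefined XML entities need handling. The class and
method names (EntityUnescape, unescape) and the sample string are made up for
illustration and are not from the project. Note that &amp; has to be handled
last, otherwise doubly escaped text such as &amp;lt; would be unescaped twice.

```java
class EntityUnescape {
    // Turns the five predefined XML entities back into literal characters.
    // &amp; is replaced last so that "&amp;lt;" becomes "&lt;" and not "<".
    static String unescape(String escaped) {
        return escaped
                .replaceAll("&lt;", "<")
                .replaceAll("&gt;", ">")
                .replaceAll("&quot;", "\"")
                .replaceAll("&apos;", "'")
                .replaceAll("&amp;", "&");
    }

    public static void main(String[] args) {
        // Hypothetical <description> content as it might appear in the feed.
        String description = "&lt;p&gt;Jazz &amp;amp; Blues&lt;/p&gt;";
        // prints: <p>Jazz &amp; Blues</p>
        System.out.println(unescape(description));
    }
}
```

None of these patterns contain regex metacharacters, so String.replace (the
literal, non-regex variant) would likely work just as well here.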

jim
Date: Tue, 28 Sep 2010 17:09:59 -0400
From: bwchaney@xxxxxxxx
To: tssg-tech@xxxxxxxxxxxxx
Subject: [tssg-tech] Re: <link> "Working with XML on Android"






  


A few explanations:

1. The http://validator.w3.org/ site is run by the W3C (World Wide Web
Consortium), the standards organization that maintains the HTML and XML
standards, as a service to web developers.

To validate a site just click the above link and enter the URL of the
site you would like to validate, in this case
http://www.bostoneventslist.com/ and click on 'Check'. It comes up with
70 errors. 

To see what those errors mean, go to http://www.bostoneventslist.com/
and choose the menu View -> Page Source. This displays the actual
HTML generated by the site and you can see the errors (validator gives
line numbers)

Most of the validation complaints concern non-conformant XHTML syntax (the
page's header specifies 'strict'), but some are mismatched end tags, such as
a </script> end tag found without a preceding <script> tag.



2. Unfortunately, the fact that the RSS validates (it does) does not
mean that the content validates, as the RSS format just wraps the
content with every <, >, etc. converted to &lt;, &gt;, etc. (the markup
characters are 'escaped'), precisely so that the feed is not thrown off
if the content is invalid. RSS feeds must validate.

To get the XML format of the content fragments, we first have to run
them through a 'regular expression'
(http://en.wikipedia.org/wiki/Regular_expression) that turns the
&lt;, &gt;, etc. back into <, >, etc., and then
try to parse these fragments as XML.



3. To view the RSS XML, just enter the URL: 
http://www.bostoneventslist.com/rss.xml in your browser.



4. The fact that http://www.bostoneventslist.com/ does not
validate is not a direct cause of any issue with the format of the
content items, as the content items appear to be generated dynamically
(they do not show up in the page source). So, we still need to determine
whether the unescaped content items are well-formed, and if not, 'tweak'
them to be well-formed. While RSS is guaranteed to validate, I don't
believe we can rely on content (that is, the
<description></description> elements) being well-formed.

TODO: write or find an 'unescape' regular expression.
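
One possible way to test item 4, sketched with the standard javax.xml.parsers
API: wrap an already unescaped fragment in a single root element and see
whether the parser accepts it. The class name, the method name and the <root>
wrapper element are assumptions made for this sketch, and it has not been
tried against the actual BostonEventsList feed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FragmentCheck {
    // Returns true if the fragment parses as XML once it is wrapped in a
    // single root element (a fragment may hold several top-level elements).
    static boolean isWellFormed(String fragment) {
        String wrapped = "<root>" + fragment + "</root>";
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the text: not well-formed.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not a verdict on the input.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>ok</p><p>also ok</p>"));
        System.out.println(isWellFormed("<p>stray end tag</script>"));
    }
}
```

HTML-only entities such as &nbsp; are not defined in plain XML, so a fragment
that contains them would be reported as not well-formed by this check even if
its tags are balanced.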



Bea



Jim Cant wrote:


  Hey,
good news!

  

How did you validate it?

  

jim

  

  Date: Tue, 28 Sep 2010 11:11:08 -0700

From: jcarwellos@xxxxxxxxx

Subject: [tssg-tech] Re: <link> "Working with XML on Android"

To: tssg-tech@xxxxxxxxxxxxx

  

  
    
      
        BTW, the RSS validated with no errors.

        

        Julie (Dingee) Carwellos

Web and IT Project Analyst, User Experience and Interaction Designer

        LinkedIn
- http://www.linkedin.com/in/jdingeecarwellos

        

--- On Tue, 9/28/10, Julie Carwellos <jcarwellos@xxxxxxxxx>
wrote:

        

From: Julie Carwellos <jcarwellos@xxxxxxxxx>

Subject: [tssg-tech] Re: <link> "Working with XML on Android"

To: tssg-tech@xxxxxxxxxxxxx

Date: Tuesday, September 28, 2010, 6:06 PM

          

          
          
            
              
                Bea,

                

It isn't; I get a consistent 11 errors for each single-event web page
(using FireBug to validate HTML).

                

Additionally, each event page is styled with TABLEs, rather than
floating DIVs, so we can't use a handheld.css style sheet to load the
URL into a WebView and have only the event information display (using
display:none; for the outer columns). 

                

-julie

                

                Julie (Dingee)
Carwellos

Web and IT Project Analyst, User Experience and Interaction Designer

                LinkedIn
- http://www.linkedin.com/in/jdingeecarwellos

                

--- On Tue, 9/28/10, Beatrice W. Chaney <bwchaney@xxxxxxxx>
wrote:

                

From: Beatrice W. Chaney <bwchaney@xxxxxxxx>

Subject: [tssg-tech] Re: <link> "Working with XML on Android"

To: tssg-tech@xxxxxxxxxxxxx

Date: Tuesday, September 28, 2010, 4:18 PM

                  

                  
Hi,

I suspect (but haven't verified) that the BostonEventsList data might
not be well-formed.

I ran the site through the W3C validator http://validator.w3.org/ some
time ago (and again now), and it comes up with a number of errors.
Having a site be valid XHTML is a critical prerequisite to getting on
top of Google's list.

                  

If this is the case (first, need to verify that well-formedness is
really the problem) there are tidy-up utilities available, but we'd
have to see whether they are suitable for Android. 

                  

Thanks,

Bea

                  

Harry Henriques wrote:

                  
                    
                    
                    Hello,

                    

I think Bea referenced the IBM website regarding RSS parser
alternatives.  I downloaded the application from the website, and
massaged the files.  I was able to get the application to build an apk
and load into the Android Emulator.  The application is partially
working, but I could use some help debugging it.  The application
doesn't parse the BostonEventsList.  For some reason, it stops before
displaying a ListView.

                    

I delivered the work I have finished to the SVN Repository in an Android
project called MessageList.

                    

I will continue to work on it as time permits.  I've only just begun to
fight.

                    

Regards,

Harry Henriques

Java Developer

                    
                    
                  
                  
                
                
              
            
          
          

          
        
        
      
    
  
  
