[tssg-tech] Re: <link> "Working with XML on Android"

  • From: "Beatrice W. Chaney" <bwchaney@xxxxxxxx>
  • To: tssg-tech@xxxxxxxxxxxxx
  • Date: Tue, 28 Sep 2010 17:57:28 -0400

I manually extracted a sample entry and examined it in an editor (*). The 'Organizer' item appears to have a bug: it always has a <div> where a </div> should be, so a parser would be thrown off by the missing </div>.

See attached, first occurrence is at line 32

What sway do we have over the BostonEventsList developers? And should we rely on the content being well-formed, or pre-process it ourselves just in case?
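
One cheap pre-check, sketched here under assumptions: count opening versus closing <div> tags in an unescaped fragment. The class and method names are hypothetical, and a regular expression is only a rough tag counter, not a parser (it ignores comments and CDATA, for example).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any project in this thread.
final class DivBalance {
    // Matches <div ...> and </div>; group 1 is "/" for a closing tag.
    private static final Pattern DIV_TAG =
            Pattern.compile("<(/?)div\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    private DivBalance() {}

    /** Returns (number of opening div tags) minus (number of closing div tags). */
    static int unclosedDivs(String fragment) {
        int depth = 0;
        Matcher m = DIV_TAG.matcher(fragment);
        while (m.find()) {
            if (m.group().endsWith("/>")) {
                continue; // self-closing <div/> neither opens nor closes
            }
            if (m.group(1).isEmpty()) {
                depth++;  // opening tag
            } else {
                depth--;  // closing tag
            }
        }
        return depth;
    }
}
```

A non-zero result from DivBalance.unclosedDivs(fragment) would flag an entry for repair (or for skipping) before it is handed to an XML parser.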

Bea

(*) I use Notepad++, http://notepad-plus-plus.org/ which shows matching elements when you click on them.

Jim Cant wrote:

I agree with this analysis of the problem.

If I recall correctly, all the &..; strings to escape characters that are meaningful in XML (<>&) occur in the content of the <description> element of the <RSS>. The approach of processing this chunk of the RSS with a regular expression to turn it into XML which we can then parse seems like the first thing to try.

The Java String class (http://download-llnw.oracle.com/javase/6/docs/api/java/lang/String.html) provides a method
    replaceAll(String regex, String replacement)
which
    "Replaces each substring of this string that matches the given regular expression (http://download-llnw.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#sum) with the given replacement."
That may be of help to whoever undertakes this.
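
A minimal sketch of that approach, assuming only one level of escaping needs to be undone and only the common entities appear. The class and method names are made up for illustration.

```java
// Hypothetical helper, not part of any project in this thread.
final class RssUnescape {
    private RssUnescape() {}

    /** Undoes one level of XML escaping for <, >, ", ' and &. */
    static String unescapeOnce(String escaped) {
        return escaped
                .replaceAll("&lt;", "<")
                .replaceAll("&gt;", ">")
                .replaceAll("&quot;", "\"")
                .replaceAll("&#0*39;|&apos;", "'")
                // &amp; goes last; otherwise "&amp;lt;" would be
                // unescaped twice and end up as "<" instead of "&lt;".
                .replaceAll("&amp;", "&");
    }
}
```

Calling RssUnescape.unescapeOnce(text) on the raw <description> text should give back the HTML fragment. If the feed is read through an XML parser instead, the parser already returns the description text with this level of escaping undone, so the explicit pass is only needed when working on the raw feed text.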

jim
------------------------------------------------------------------------
Date: Tue, 28 Sep 2010 17:09:59 -0400
From: bwchaney@xxxxxxxx
To: tssg-tech@xxxxxxxxxxxxx
Subject: [tssg-tech] Re: <link> "Working with XML on Android"

A few explanations:
1. The http://validator.w3.org/ site is run by the W3C (World Wide Web Consortium), the standards organization that manages the HTML and XML industry standards, as a service to web developers. To validate a site, just click the above link, enter the URL of the site you would like to validate (in this case http://www.bostoneventslist.com/) and click on 'Check'. It comes up with 70 errors. To see what those errors mean, go to http://www.bostoneventslist.com/ and choose the menu View -> Page Source. This displays the actual HTML generated by the site, and you can see the errors (the validator gives line numbers). Most of the validation complaints concern non-conformant XHTML syntax (the page's header specifies 'strict'), but some are mismatched tags, such as a </script> end tag found without a preceding <script> tag.

2. Unfortunately, the fact that the RSS validates (it does) does not mean that the content validates. The RSS format just wraps the content with all the <, >, etc. converted to &lt;, &gt;, etc. (the markup characters are 'escaped'), precisely so that the feed is not thrown off if the content is invalid. RSS feeds must validate. To get the content fragments as XML, we first have to run them through a 'regular expression' (http://en.wikipedia.org/wiki/Regular_expression) that turns the &lt;, &gt;, etc. back into <, >, etc., and then try to parse these fragments as XML.

3. To view the RSS XML, just enter the URL: http://www.bostoneventslist.com/rss.xml in your browser.

4. The fact that http://www.bostoneventslist.com/ does not validate is not a direct cause of a potential issue with the format of the content items, as it appears the content items are generated dynamically (they do not show up in the page source). So we still need to determine whether the unescaped content items are well-formed, and if not, 'tweak' them to be well-formed. While RSS is guaranteed to validate, I don't believe we can rely on the content (that is, the <description></description> elements) being well-formed.
TODO: write or find an 'unescape' regular expression.
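
Once a fragment has been unescaped (by whatever means), the well-formedness question in item 4 can be checked mechanically. A rough sketch using the standard javax.xml.parsers API follows. The class name, method name, and the <root> wrapper element are assumptions for illustration; the wrapper is there because a fragment may have several top-level elements.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any project in this thread.
final class FragmentCheck {
    private FragmentCheck() {}

    /** Returns true if the fragment parses as XML once wrapped in a single root element. */
    static boolean isWellFormed(String fragment) {
        String wrapped = "<root>" + fragment + "</root>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser
            // from printing its own messages to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(wrapped)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that HTML-only entities such as &nbsp; are not predefined in XML, so a fragment that uses them would also be reported as not well-formed by this check.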

Bea

Jim Cant wrote:

    Hey, good news!

    How did you validate it?

    jim

    ------------------------------------------------------------------------
    Date: Tue, 28 Sep 2010 11:11:08 -0700
    From: jcarwellos@xxxxxxxxx <mailto:jcarwellos@xxxxxxxxx>
    Subject: [tssg-tech] Re: <link> "Working with XML on Android"
    To: tssg-tech@xxxxxxxxxxxxx <mailto:tssg-tech@xxxxxxxxxxxxx>

    BTW, the RSS validated with no errors.

    ------------------------------------------------------------------------
    Julie (Dingee) Carwellos
    Web and IT Project Analyst, User Experience and Interaction Designer
    LinkedIn <http://www.linkedin.com/in/jdingeecarwellos> -
    http://www.linkedin.com/in/jdingeecarwellos

    --- On Tue, 9/28/10, Julie Carwellos <jcarwellos@xxxxxxxxx>
    <mailto:jcarwellos@xxxxxxxxx> wrote:


        From: Julie Carwellos <jcarwellos@xxxxxxxxx>
        <mailto:jcarwellos@xxxxxxxxx>
        Subject: [tssg-tech] Re: <link> "Working with XML on Android"
        To: tssg-tech@xxxxxxxxxxxxx <mailto:tssg-tech@xxxxxxxxxxxxx>
        Date: Tuesday, September 28, 2010, 6:06 PM

        Bea,

        It isn't; I get a consistent 11 errors for each single-event
        web page (using FireBug to validate HTML).

        Additionally, each event page is styled with TABLEs, rather
        than floating DIVs, so we can't use a handheld.css style sheet
        to load the URL into a WebView and have only the event
        information display (using display:none; for the outer columns).

        -julie

        ------------------------------------------------------------------------
        Julie (Dingee) Carwellos
        Web and IT Project Analyst, User Experience and Interaction
        Designer
        LinkedIn <http://www.linkedin.com/in/jdingeecarwellos> -
        http://www.linkedin.com/in/jdingeecarwellos

        --- On Tue, 9/28/10, Beatrice W. Chaney <bwchaney@xxxxxxxx>
        <mailto:bwchaney@xxxxxxxx> wrote:


            From: Beatrice W. Chaney <bwchaney@xxxxxxxx>
            <mailto:bwchaney@xxxxxxxx>
            Subject: [tssg-tech] Re: <link> "Working with XML on Android"
            To: tssg-tech@xxxxxxxxxxxxx <mailto:tssg-tech@xxxxxxxxxxxxx>
            Date: Tuesday, September 28, 2010, 4:18 PM

            Hi,
            I suspect (but haven't verified) that the
            BostonEventsList data might not be well-formed.
            I ran the site through the W3C validator
            http://validator.w3.org/ some time ago (and again now),
            and it comes up with a number of errors. Having a site be
            valid XHTML is a critical prerequisite to getting on top of
            Google's list.

            If this is the case (first, need to verify that
            well-formedness is really the problem) there are tidy-up
            utilities available, but we'd have to see whether they are
            suitable for Android.

            Thanks,
            Bea

            Harry Henriques wrote:

                Hello,

                I think Bea referenced the IBM website regarding RSS
                parser alternatives.  I downloaded the application
                from the website, and massaged the files.  I was able
                to get the application to successfully create an apk
                and load successfully into the Android Emulator.  The
                application is partially working, but I could use some
                help debugging it.  The application doesn't parse the
                BostonEventsList.  For some reason, it stops before
                displaying a ListView.

                I delivered the work I have finished to the SVN
                Repository in an Android project called MessageList.

                I will continue to work on it as time permits.  I've
                only just begun to fight.

                Regards,
                Harry Henriques
                Java Developer



= =

<div class="event-nodeapi">
  10/14/2010 12:00 am    EST  Ending: 10/15/2010 4:30 pm </div>
<div>
<b>Time: </b>10/14/2010 12:00 am Ending : 10/15/2010 4:30 pm1</div>

<div class="field field-type-text field-field-type-of-event">
  <h3 class="field-label">Event type</h3>
  <div class="field-items">
          Engineering,
          General,
      </div>
</div>

<div class="field field-type-text field-field-link-to-website">
  <h3 class="field-label">Link to website/contact</h3>
  <div class="field-items">
      <div class="field-item"><a href = 
"http://www.portsmouthchamber.org/ecoast/local-news-events/tech-world2010.asp"; 
target="_blank">http://www.portsmouthchamber.org/ecoast/local-news-events/tech-world2010.asp</a></div>
  </div>
</div>

<div class="field field-type-text field-field-location">
  <h3 class="field-label">Location</h3>
  <div class="field-items">
      <div class="field-item">Pease International Tradeport in Portsmouth</div>
  </div>
</div>

<div class="field field-type-text field-field-event-organizer">
  <h3 class="field-label">Organizer</h3>
  <div class="field-items">
      <div class="field-item">NH High Technology Council</div>
  <div></div><br /><div>
<b>Description</b>
</div>
<p>    *  Two day conference targeted to companies that design, develop,and 
deliver technology.<br />
    * Three separate tracks designed to attract technical, business, and 
education/employment.<br />
    * Exclusive exhibit space.<br />
    * Held at the Celestica builiding at Pease Tradeport in Portsmouth, NH.<br 
/>
    * Presentation of annual InfoXchange Awards in three categories presented 
by NH Governor John Lynch.<br />
    * Networking reception for attendees, meet the technology innovators of the 
seacoast.</p><p>
</p>

<p>For more information or if interested in speaking or sponsoring 
opportunities for TechWorld 2010 please contact Salina McIntire at <a 
href="mailto:ecoast@xxxxxxxxxxxxxxxxxxxxx";>ecoast@xxxxxxxxxxxxxxxxxxxxx</a> or 
call 603-610-5514.</p><p>
</p>

<p>Day 1: Oct. 14th<br />
12:00 - 1:00pm NH High Tech Luncheon<br />
12:00 - 5:00pm Exhibit opens<br />
1:00 - 2:15pm Keynote Speaker<br />
2:15 - 5:00pm Speakers &amp;amp; demonstrations<br />
5:00 - 6:30pm Cocktail Party<br />
6:30 - 8:30pm NH High Technology Council Awards Dinner<br />
8:30 - 11:00pm After Party</p><p>
</p>

<p>      Day 2: Oct. 15th<br />
      9:00am - 4:00pm Expo opens<br />
      9:30 - 10:30am Success Stories and demonstrations<br />
      10:30 - 11:15am Speaker George Bald. DRED: Why stay, work, and play in 
NH?<br />
      12:00 - 2:00pm Keynote Luncheon<br />
      2:00 - 4:30pm Emerging Technologies and breakout sessions</p><p>
</p>

<p>Cost:  $20 - $80</p>
</description>
 
<comments>http://www.bostoneventslist.com/event/TechWorld-2010#comments</comments>
 <pubDate>Sun, 12 Sep 2010 23:11:02 +0000</pubDate>
 <dc:creator>priya</dc:creator>
 <guid isPermaLink="false">1950 at http://www.bostoneventslist.com</guid>
</item>
<item>
 <title>Second Annual Auto-ID &amp; Sensing Seminar &amp; Expo</title>
 
<link>http://www.bostoneventslist.com/event/Second-Annual-Auto-ID-Sensing-Seminar-Expo</link>
 <description><div class="event-nodeapi">
  10/13/2010 8:45 am    EST  </div><div>
<b>Time: </b>10/13/2010 8:45 am</div>

<div class="field field-type-text field-field-type-of-event">
  <h3 class="field-label">Event type</h3>
  <div class="field-items">
          Networking,
          Engineering,
          General,
      </div>
</div>

<div class="field field-type-text field-field-link-to-website">
  <h3 class="field-label">Link to website/contact</h3>
  <div class="field-items">
      <div class="field-item"><a href = 
"http://www.merrimack.edu/academics/science_engineering/ElectricalEngineering/news_events/Pages/RFID.aspx";
 
target="_blank">http://www.merrimack.edu/academics/science_engineering/ElectricalEngineering/news_events/Pages/RFID.aspx</a></div>
  </div>
</div>

<div class="field field-type-text field-field-location">
  <h3 class="field-label">Location</h3>
  <div class="field-items">
      <div class="field-item">The Rogers Center at Merrimack College
315 Turnpike St. No. Andover, MA. 01845</div>
  </div>
</div>

<div class="field field-type-text field-field-event-organizer">
  <h3 class="field-label">Organizer</h3>
  <div class="field-items">
      <div class="field-item">Merrimack College Department of Electrical 
Engineering
MIT Enterprise Forum Auto-ID &amp;amp; Sensing Solutions
UMass Lowell Nanomanufacturing Centers</div>
  <div></div><br /><div>
<b>Description</b>
</div>
<p>Last year we had a great event at MIT with 44 exhibitors of Auto-ID 
&amp;amp; Sensing solutions. This year&#039;s event has an expanded program 
with noteworthy keynote speakers, interesting panels plus 40 or more 
exhibitors.</p><p>
</p>

<p>This is a VERY low cost way to network with others and keep up with new 
developments. It will be FUN, INFORMATIVE and INTERACTIVE. Please join 
us!</p><p>
</p>

<p>Our Keynote Speakers are:<br />
Marty Meehan, Chancellor of University of Massachusetts Lowell<br />
Mark Russell, Raytheon&#039;s VP of Engineering<br />
Craig Casto, Dow Chemical, Global Leader RFID, GPS, AutoID Expertise 
Center</p><p>
</p>

<p>Also featured are three panel discussions:      </p><p>
</p>

<p>Marketing Successes and Pitfalls<br />
Tom Coyle, Moderator<br />
Roger Bridgeman, President of Bridgeman Communications<br />
Mike Liard, ABI Research<br />
Alan Sherman, Director of Marketing, OATSystems<br />
Liz Churchill, Director, Marketing &amp;amp; BusDev, Bilcare Technologies</p><p>
</p>

<p>Auto-ID Implementations<br />
Prof. Ram Nagarajan, University of Massachusetts Lowell<br />
Prof. Charles Kochakian, Merrimack College<br />
Steve Miles, MIT Enterprise Forum</p><p>
</p>

<p>Investing in Auto-ID &amp;amp; Sensing<br />
John Greaves, Moderator<br />
Ron Wagner, Sivix<br />
John Chalus, Vice President, Kinetic Advisors, LLC<br />
Registration Details  </p><p>
</p>

<p>Conference Cost (includes lunch)<br />
$25 "Early Bird" - register by 10/1<br />
$35  Through 10/12<br />
$40 "On Site" registration, 10/13<br />
Students attend free of charge: however, students MUST register online prior to 
the day of the conference.</p><p>
</p>

<p>Exposition Table Cost<br />
$200  "Early Bird" - register by 10/1<br />
$250  Through 10/12<br />
Purchase up to five conference registrations for the discounted rate of 
$20/registration. NOTE: offer is good at time of table rental.</p>
