I extracted a sample entry and examined it manually in an editor (*). It appears that the 'Organizer' item has a bug: it always has a <div> where a </div> should be, so a parser would be thrown off by the missing </div>.
See attached; the first occurrence is at line 32.

What sway do we have over the BostonEventsList developers? And should we rely on content being well-formed, or pre-process it ourselves just in case?

Bea

(*) I use Notepad++ (http://notepad-plus-plus.org/), which shows matching elements when you click on them.

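A minimal sketch of the "pre-process it ourselves" option, assuming the description content has already been unescaped. The class and method names are hypothetical, and the repair rule is only a guess based on the two attached sample items, where the stray tag shows up as "<div></div><br />" right after the Organizer field item. It would also rewrite any legitimately empty div followed by a line break, so it should be checked against more of the feed before anyone relies on it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; names are illustrative, not existing project code.
class FragmentChecker {

    // Returns true if the unescaped <description> content parses as XML
    // once it is wrapped in a single dummy root element.
    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(
                    new StringReader("<root>" + fragment + "</root>")));
            return true;
        } catch (SAXException e) {
            return false; // e.g. a missing </div>
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumption: the bug always appears as "<div></div><br />" where
    // "</div></div><br />" was intended, as in the attached sample.
    static String repairOrganizerDiv(String fragment) {
        return fragment.replace("<div></div><br />", "</div></div><br />");
    }
}
```

One possible use: call isWellFormed on each item first and apply repairOrganizerDiv only to the items that fail, so that well-formed content is never modified. Content containing HTML-only entities would still be rejected by an XML parser and would need separate handling.
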
Jim Cant wrote:
I agree with this analysis of the problem.

If I recall correctly, all the &...; strings that escape characters meaningful in XML (<, >, &) occur in the content of the <description> element of the <RSS>. Processing this chunk of the RSS with a regular expression, to turn it into XML which we can then parse, seems like the first thing to try.

The Java String class (http://download-llnw.oracle.com/javase/6/docs/api/java/lang/String.html) provides a method

    replaceAll(String regex, String replacement)

which "replaces each substring of this string that matches the given regular expression (http://download-llnw.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#sum) with the given replacement." That may be of help to whoever undertakes this.

jim

------------------------------------------------------------------------
Date: Tue, 28 Sep 2010 17:09:59 -0400
From: bwchaney@xxxxxxxx
To: tssg-tech@xxxxxxxxxxxxx
Subject: [tssg-tech] Re: <link> "Working with XML on Android"

A few explanations:

1. The http://validator.w3.org/ site is owned by the W3C standards organization (World Wide Web Consortium), which manages the HTML and XML industry standards, as a service to web developers. To validate a site, just click the above link, enter the URL of the site you would like to validate (in this case http://www.bostoneventslist.com/), and click on 'Check'. It comes up with 70 errors. To see what those errors mean, go to http://www.bostoneventslist.com/ and choose the menu View -> Page Source. This displays the actual HTML generated by the site, and you can see the errors (the validator gives line numbers). Most of the validation complaints are non-conformant XHTML syntax (its header specifies 'strict'), but some are mismatched end tags, such as </script> end tags found without a preceding <script> tag.

2. Unfortunately, the fact that the RSS validates (it does) does not mean that the content validates: the RSS format just wraps the content with all the <, >, etc. converted to &lt;, &gt;, etc. (the control characters are 'escaped'), precisely to avoid being thrown off if the content is invalid. RSS feeds must validate. To get the XML format of the content fragments, we first have to run them through a 'regular expression' (http://en.wikipedia.org/wiki/Regular_expression) that replaces the &lt;, &gt;, etc. with <, >, etc. again, and then try to parse these fragments as XML.

3. To view the RSS XML, just enter the URL http://www.bostoneventslist.com/rss.xml in your browser.

4. The fact that http://www.bostoneventslist.com/ does not validate is not a direct cause of a potential issue with the content item format, as it appears content items are generated dynamically (they do not show up in the source). So we still need to determine whether the unescaped content items are well-formed and, if not, 'tweak' them to be well-formed. While RSS is guaranteed to validate, I don't believe we can rely on content (that is, the <description></description> elements) being well-formed.

TODO: write or find an 'unescape' regular expression.

Bea

Jim Cant wrote:

Hey, good news! How did you validate it?

jim

------------------------------------------------------------------------
Date: Tue, 28 Sep 2010 11:11:08 -0700
From: jcarwellos@xxxxxxxxx
Subject: [tssg-tech] Re: <link> "Working with XML on Android"
To: tssg-tech@xxxxxxxxxxxxx

BTW, the RSS validated with no errors.
------------------------------------------------------------------------
Julie (Dingee) Carwellos
Web and IT Project Analyst, User Experience and Interaction Designer
LinkedIn: http://www.linkedin.com/in/jdingeecarwellos

--- On Tue, 9/28/10, Julie Carwellos <jcarwellos@xxxxxxxxx> wrote:

From: Julie Carwellos <jcarwellos@xxxxxxxxx>
Subject: [tssg-tech] Re: <link> "Working with XML on Android"
To: tssg-tech@xxxxxxxxxxxxx
Date: Tuesday, September 28, 2010, 6:06 PM

Bea,

It isn't; I get a consistent 11 errors for each single-event web page (using FireBug to validate HTML). Additionally, each event page is styled with TABLEs rather than floating DIVs, so we can't use a handheld.css style sheet to load the URL into a WebView and have only the event information display (using display:none; for the outer columns).

-julie

--- On Tue, 9/28/10, Beatrice W. Chaney <bwchaney@xxxxxxxx> wrote:

From: Beatrice W. Chaney <bwchaney@xxxxxxxx>
Subject: [tssg-tech] Re: <link> "Working with XML on Android"
To: tssg-tech@xxxxxxxxxxxxx
Date: Tuesday, September 28, 2010, 4:18 PM

Hi,

I suspect (but haven't verified) that the BostonEventsList data might not be well-formed. I ran the site through the W3C validator (http://validator.w3.org/) some time ago (and again now), and it comes up with a number of errors. Having a site be valid XHTML is a critical prerequisite to getting on top of Google's list.
If this is the case (we first need to verify that well-formedness is really the problem), there are tidy-up utilities available, but we'd have to see whether they are suitable for Android.

Thanks,
Bea

Harry Henriques wrote:

Hello,

I think Bea referenced the IBM website regarding RSS parser alternatives. I downloaded the application from the website and massaged the files. I was able to get the application to build an apk and load into the Android Emulator. The application is partially working, but I could use some help debugging it. It doesn't parse the BostonEventsList: for some reason, it stops before displaying a ListView.

I delivered the work I have finished to the SVN Repository in an Android project called MessageList. I will continue to work on it as time permits. I've only just begun to fight.

Regards,
Harry Henriques
Java Developer

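The 'unescape' step from the TODO earlier in this thread could be sketched with String.replaceAll, as Jim suggested. This is only a rough illustration: the class and method names are made up, and it handles just the five predefined XML entities, on the assumption that these are the only ones the feed uses to wrap the description content. The &amp; replacement is done last so that a doubly escaped "&amp;lt;" becomes "&lt;" rather than "<".

```java
// Hypothetical helper; names are illustrative, not existing project code.
class DescriptionUnescaper {

    // Undoes one level of XML escaping on the text of a <description> element.
    static String unescape(String escaped) {
        return escaped
                .replaceAll("&lt;", "<")
                .replaceAll("&gt;", ">")
                .replaceAll("&quot;", "\"")
                .replaceAll("&apos;", "'")
                // Last on purpose: keeps "&amp;lt;" from turning into "<".
                .replaceAll("&amp;", "&");
    }

    public static void main(String[] args) {
        String escaped = "&lt;div class=&quot;field-item&quot;&gt;"
                + "Speakers &amp;amp; demonstrations&lt;/div&gt;";
        System.out.println(unescape(escaped));
    }
}
```

The result still has to be checked for well-formedness before it is handed to an XML parser, since unescaping does nothing about unbalanced tags such as the Organizer <div>.
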
<div class="event-nodeapi"> 10/14/2010 12:00 am EST Ending: 10/15/2010 4:30 pm </div> <div> <b>Time: </b>10/14/2010 12:00 am Ending : 10/15/2010 4:30 pm1</div> <div class="field field-type-text field-field-type-of-event"> <h3 class="field-label">Event type</h3> <div class="field-items"> Engineering, General, </div> </div> <div class="field field-type-text field-field-link-to-website"> <h3 class="field-label">Link to website/contact</h3> <div class="field-items"> <div class="field-item"><a href = "http://www.portsmouthchamber.org/ecoast/local-news-events/tech-world2010.asp"; target="_blank">http://www.portsmouthchamber.org/ecoast/local-news-events/tech-world2010.asp</a></div> </div> </div> <div class="field field-type-text field-field-location"> <h3 class="field-label">Location</h3> <div class="field-items"> <div class="field-item">Pease International Tradeport in Portsmouth</div> </div> </div> <div class="field field-type-text field-field-event-organizer"> <h3 class="field-label">Organizer</h3> <div class="field-items"> <div class="field-item">NH High Technology Council</div> <div></div><br /><div> <b>Description</b> </div> <p> * Two day conference targeted to companies that design, develop,and deliver technology.<br /> * Three separate tracks designed to attract technical, business, and education/employment.<br /> * Exclusive exhibit space.<br /> * Held at the Celestica builiding at Pease Tradeport in Portsmouth, NH.<br /> * Presentation of annual InfoXchange Awards in three categories presented by NH Governor John Lynch.<br /> * Networking reception for attendees, meet the technology innovators of the seacoast.</p><p> </p> <p>For more information or if interested in speaking or sponsoring opportunities for TechWorld 2010 please contact Salina McIntire at <a href="mailto:ecoast@xxxxxxxxxxxxxxxxxxxxx";>ecoast@xxxxxxxxxxxxxxxxxxxxx</a> or call 603-610-5514.</p><p> </p> <p>Day 1: Oct. 
14th<br /> 12:00 - 1:00pm NH High Tech Luncheon<br /> 12:00 - 5:00pm Exhibit opens<br /> 1:00 - 2:15pm Keynote Speaker<br /> 2:15 - 5:00pm Speakers &amp; demonstrations<br /> 5:00 - 6:30pm Cocktail Party<br /> 6:30 - 8:30pm NH High Technology Council Awards Dinner<br /> 8:30 - 11:00pm After Party</p><p> </p> <p> Day 2: Oct. 15th<br /> 9:00am - 4:00pm Expo opens<br /> 9:30 - 10:30am Success Stories and demonstrations<br /> 10:30 - 11:15am Speaker George Bald. DRED: Why stay, work, and play in NH?<br /> 12:00 - 2:00pm Keynote Luncheon<br /> 2:00 - 4:30pm Emerging Technologies and breakout sessions</p><p> </p> <p>Cost: $20 - $80</p> </description> <comments>http://www.bostoneventslist.com/event/TechWorld-2010#comments</comments> <pubDate>Sun, 12 Sep 2010 23:11:02 +0000</pubDate> <dc:creator>priya</dc:creator> <guid isPermaLink="false">1950 at http://www.bostoneventslist.com</guid> </item> <item> <title>Second Annual Auto-ID & Sensing Seminar & Expo</title> <link>http://www.bostoneventslist.com/event/Second-Annual-Auto-ID-Sensing-Seminar-Expo</link> <description><div class="event-nodeapi"> 10/13/2010 8:45 am EST </div><div> <b>Time: </b>10/13/2010 8:45 am</div> <div class="field field-type-text field-field-type-of-event"> <h3 class="field-label">Event type</h3> <div class="field-items"> Networking, Engineering, General, </div> </div> <div class="field field-type-text field-field-link-to-website"> <h3 class="field-label">Link to website/contact</h3> <div class="field-items"> <div class="field-item"><a href = "http://www.merrimack.edu/academics/science_engineering/ElectricalEngineering/news_events/Pages/RFID.aspx"; target="_blank">http://www.merrimack.edu/academics/science_engineering/ElectricalEngineering/news_events/Pages/RFID.aspx</a></div> </div> </div> <div class="field field-type-text field-field-location"> <h3 class="field-label">Location</h3> <div class="field-items"> <div class="field-item">The Rogers Center at Merrimack College 315 Turnpike St. No. Andover, MA. 
01845</div> </div> </div> <div class="field field-type-text field-field-event-organizer"> <h3 class="field-label">Organizer</h3> <div class="field-items"> <div class="field-item">Merrimack College Department of Electrical Engineering MIT Enterprise Forum Auto-ID &amp; Sensing Solutions UMass Lowell Nanomanufacturing Centers</div> <div></div><br /><div> <b>Description</b> </div> <p>Last year we had a great event at MIT with 44 exhibitors of Auto-ID &amp; Sensing solutions. This year's event has an expanded program with noteworthy keynote speakers, interesting panels plus 40 or more exhibitors.</p><p> </p> <p>This is a VERY low cost way to network with others and keep up with new developments. It will be FUN, INFORMATIVE and INTERACTIVE. Please join us!</p><p> </p> <p>Our Keynote Speakers are:<br /> Marty Meehan, Chancellor of University of Massachusetts Lowell<br /> Mark Russell, Raytheon's VP of Engineering<br /> Craig Casto, Dow Chemical, Global Leader RFID, GPS, AutoID Expertise Center</p><p> </p> <p>Also featured are three panel discussions: </p><p> </p> <p>Marketing Successes and Pitfalls<br /> Tom Coyle, Moderator<br /> Roger Bridgeman, President of Bridgeman Communications<br /> Mike Liard, ABI Research<br /> Alan Sherman, Director of Marketing, OATSystems<br /> Liz Churchill, Director, Marketing &amp; BusDev, Bilcare Technologies</p><p> </p> <p>Auto-ID Implementations<br /> Prof. Ram Nagarajan, University of Massachusetts Lowell<br /> Prof. 
Charles Kochakian, Merrimack College<br /> Steve Miles, MIT Enterprise Forum</p><p> </p> <p>Investing in Auto-ID &amp; Sensing<br /> John Greaves, Moderator<br /> Ron Wagner, Sivix<br /> John Chalus, Vice President, Kinetic Advisors, LLC<br /> Registration Details </p><p> </p> <p>Conference Cost (includes lunch)<br /> $25 "Early Bird" - register by 10/1<br /> $35 Through 10/12<br /> $40 "On Site" registration, 10/13<br /> Students attend free of charge: however, students MUST register online prior to the day of the conference.</p><p> </p> <p>Exposition Table Cost<br /> $200 "Early Bird" - register by 10/1<br /> $250 Through 10/12<br /> Purchase up to five conference registrations for the discounted rate of $20/registration. NOTE: offer is good at time of table rental.</p>