[tssg-tech] Re: <link> "Working with XML on Android"

  • From: "Jim Cant" <cant_jim@xxxxxxxxxxx>
  • To: <tssg-tech@xxxxxxxxxxxxx>
  • Date: Tue, 28 Sep 2010 22:50:51 -0400

Bummer - not well formed.  I suppose in that case your suggestion of 
preprocessing might be the way to go; I suppose it's reasonable to expect that 
if the feed is not well formed, at least it's not well formed in a consistent 
manner.

The best of all possible worlds would be if BEL developed an XML schema for an 
event (or a list of events) and then passed a conforming XML document as the 
content of the <description> element of the RSS.

I don't know how much sway we have; I doubt we'll find ourselves in the best of 
all possible worlds.  But if we gave them a list of errors such as the ones you 
discovered, they might fix their feed so that it delivers well-formed 
documents.  A question for our leader.
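If we do send BEL a list of errors, a rough tag-balance report could generate one. The sketch below is a hypothetical helper (`TagBalanceReport` is our own name, not an existing library): it scans an unescaped <description> fragment with a regular expression and lists end tags that don't match the most recently opened tag, plus any tags left open. It is not a real XML parser; it assumes the fragment has no comments, CDATA sections, or '>' inside attribute values.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical tag-balance checker (a sketch, not an XML parser).
 * Assumes no comments, CDATA, or '>' inside attribute values.
 */
final class TagBalanceReport {
    // Group 1: "/" for end tags; group 2: tag name; group 3: "/" for self-closing tags.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][\\w:-]*)[^>]*?(/?)>");

    static List<String> check(String fragment) {
        List<String> errors = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>();   // stack of currently open tag names
        Matcher m = TAG.matcher(fragment);
        while (m.find()) {
            String name = m.group(2);
            if (!m.group(3).isEmpty()) {
                continue;                          // self-closing, e.g. <br />
            }
            if (m.group(1).isEmpty()) {
                open.push(name);                   // start tag
                continue;
            }
            // End tag: it must close the most recently opened tag.
            if (open.isEmpty() || !open.peek().equals(name)) {
                errors.add("unexpected </" + name + "> at offset " + m.start());
            } else {
                open.pop();
            }
        }
        // Whatever is still on the stack was never closed (innermost first).
        for (String name : open) {
            errors.add("unclosed <" + name + ">");
        }
        return errors;
    }
}
```

On a fragment containing the Organizer bug, the outer `field` and `field-items` divs should show up as two "unclosed <div>" entries, which is the kind of concrete list we could hand them.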

jim



From: Beatrice W. Chaney 
Sent: Tuesday, September 28, 2010 5:57 PM
To: tssg-tech@xxxxxxxxxxxxx 
Subject: [tssg-tech] Re: <link> "Working with XML on Android"


I extracted a sample entry and examined it manually in an editor (*). The 
'Organizer' item appears to have a bug: it always has a <div> where a </div> 
belongs, so the missing </div> would throw off a parser.
See attached; the first occurrence is at line 32.
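If the bug really is consistent, a literal string replacement may be enough to repair it before parsing. A minimal sketch, assuming (from the two sample entries only) that the Organizer field always ends with `<div></div><br />` where `</div></div><br />` was intended; `OrganizerFix` is a hypothetical name, and the pattern should be re-checked against new feed samples.

```java
/**
 * Hypothetical targeted repair for the 'Organizer' bug.
 * Assumption: the broken sequence is always exactly "<div></div><br />".
 */
final class OrganizerFix {
    private static final String BROKEN = "<div></div><br />";
    // The first <div> should have been </div>, closing the "field-items" div;
    // the following </div> then closes the enclosing "field" div.
    private static final String FIXED = "</div></div><br />";

    static String repair(String fragment) {
        // String.replace is a literal (non-regex) replacement of every occurrence,
        // so no regex escaping is needed.
        return fragment.replace(BROKEN, FIXED);
    }
}
```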

What sway do we have over the BostonEventsList developers? And should we rely 
on the content being well-formed, or pre-process it ourselves just in case?

Bea

(*) I use Notepad++, http://notepad-plus-plus.org/ which shows matching 
elements when you click on them.

Jim Cant wrote:

  I agree with this analysis of the problem.

  If I recall correctly, all the &..; strings that escape characters which are 
meaningful in XML (<, >, &) occur in the content of the <description> element of 
the RSS.  Processing this chunk of the RSS with a regular expression, to turn it 
into XML that we can then parse, seems like the first thing to try.

  The Java String class provides a method
          replaceAll(String regex, String replacement) 
  which "replaces each substring of this string that matches the given regular 
expression with the given replacement"; that may be of help to whoever 
undertakes this.
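One caution if this is done with chained replaceAll calls: &amp; has to be decoded last, otherwise "&amp;lt;" (a literal "&lt;" in the content) gets decoded twice into "<". The sketch below sidesteps the ordering problem by decoding in a single regex pass with java.util.regex.Matcher instead of replaceAll. It handles only the five predefined XML entities plus numeric character references (an assumption that the feed uses nothing else), and `Unescape` is a hypothetical class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical one-level XML unescaper (sketch). */
final class Unescape {
    // Predefined XML entities plus decimal/hex character references.
    private static final Pattern ENTITY =
            Pattern.compile("&(lt|gt|quot|apos|amp|#[0-9]+|#x[0-9a-fA-F]+);");

    static String unescape(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        // A single left-to-right pass: "&amp;lt;" becomes "&lt;", not "<".
        while (m.find()) {
            String name = m.group(1);
            String text;
            switch (name) {
                case "lt":   text = "<";  break;
                case "gt":   text = ">";  break;
                case "quot": text = "\""; break;
                case "apos": text = "'";  break;
                case "amp":  text = "&";  break;
                default:
                    // Numeric reference: &#39; or &#x27;
                    int codePoint = name.startsWith("#x")
                            ? Integer.parseInt(name.substring(2), 16)
                            : Integer.parseInt(name.substring(1));
                    text = new String(Character.toChars(codePoint));
            }
            // quoteReplacement keeps any '$' or '\' in the decoded text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, `Unescape.unescape("&lt;b&gt;Time: &lt;/b&gt;")` gives `<b>Time: </b>`.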

  jim

------------------------------------------------------------------------------
  Date: Tue, 28 Sep 2010 17:09:59 -0400
  From: bwchaney@xxxxxxxx
  To: tssg-tech@xxxxxxxxxxxxx
  Subject: [tssg-tech] Re: <link> "Working with XML on Android"

  A few explanations:
  1. The http://validator.w3.org/ site is run by the W3C (World Wide Web 
Consortium), the standards organization that manages the HTML and XML 
standards, as a service to web developers.
  To validate a site, click the above link, enter the URL of the site you would 
like to validate (in this case http://www.bostoneventslist.com/), and click 
'Check'. It comes up with 70 errors.
  To see what those errors mean, go to http://www.bostoneventslist.com/ and 
choose the menu View -> Page Source. This displays the actual HTML generated by 
the site, where you can find the errors (the validator gives line numbers).
  Most of the validation complaints are non-conformant XHTML syntax (the page 
header specifies XHTML 'Strict'), but some are mismatched end tags, such as 
</script> end tags found without a preceding <script> tag.

  2. Unfortunately, the fact that the RSS validates (it does) does not mean 
that the content validates: the RSS format just wraps the content, with all 
the <, >, etc. converted to &lt;, &gt;, etc. (the markup characters are 
'escaped'), precisely so that invalid content cannot throw off the feed. RSS 
feeds must validate.
  To get the content fragments in XML form, we first have to run them through a 
'regular expression' (http://en.wikipedia.org/wiki/Regular_expression) that 
turns &lt;, &gt;, etc. back into <, >, etc., and then try to parse these 
fragments as XML.

  3. To view the RSS XML, just enter the URL:  
http://www.bostoneventslist.com/rss.xml in your browser.

  4. The fact that http://www.bostoneventslist.com/ does not validate is not 
directly a cause of any problem with the format of the content items, since 
the content items appear to be generated dynamically (they do not show up in 
the page source). So we still need to determine whether the unescaped content 
items are well-formed and, if not, 'tweak' them to be well-formed. While the 
RSS is guaranteed to validate, I don't believe we can rely on the content (that 
is, the <description></description> elements) being well-formed.
  TODO: write or find an 'unescape' regular expression.
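Worth noting against the TODO: if the RSS itself is read with an XML parser, the parser already decodes &lt;, &gt; and &amp; in the <description> text, so a separate unescape regex may not be needed at all. A sketch using the standard javax.xml.parsers DOM API (also available on Android), assuming the feed is small enough to load whole; `DescriptionCheck` and its methods are hypothetical names. The well-formedness check wraps each fragment in a dummy root element and reports the parser's first error with a line number. HTML-only entities such as &nbsp; would also be flagged, because XML does not predefine them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

/** Hypothetical helper (sketch): extract item descriptions, then check each one. */
final class DescriptionCheck {

    static DocumentBuilder newBuilder() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Rethrow fatal errors instead of letting the default handler print to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) {}
            @Override public void error(SAXParseException e) {}
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return builder;
    }

    /** Decoded text of each <item>'s <description> (skips the channel's own description). */
    static List<String> itemDescriptions(String rssXml) throws Exception {
        Document doc = newBuilder().parse(new InputSource(new StringReader(rssXml)));
        NodeList items = doc.getElementsByTagName("item");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            NodeList d = ((Element) items.item(i)).getElementsByTagName("description");
            if (d.getLength() > 0) {
                // getTextContent() returns the text with entity references already decoded.
                result.add(d.item(0).getTextContent());
            }
        }
        return result;
    }

    /** Null if the fragment is well-formed, otherwise "line N: parser message". */
    static String firstError(String fragment) throws Exception {
        try {
            // Wrap in one root element: a fragment may have several top-level elements.
            newBuilder().parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        }
    }
}
```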

  Bea

  Jim Cant wrote:

    Hey, good news!

    How did you validate it?

    jim


----------------------------------------------------------------------------
    Date: Tue, 28 Sep 2010 11:11:08 -0700
    From: jcarwellos@xxxxxxxxx
    Subject: [tssg-tech] Re: <link> "Working with XML on Android"
    To: tssg-tech@xxxxxxxxxxxxx

          BTW, the RSS validated with no errors.


----------------------------------------------------------------------
          Julie (Dingee) Carwellos
          Web and IT Project Analyst, User Experience and Interaction Designer
          LinkedIn - http://www.linkedin.com/in/jdingeecarwellos

          --- On Tue, 9/28/10, Julie Carwellos <jcarwellos@xxxxxxxxx> wrote:


            From: Julie Carwellos <jcarwellos@xxxxxxxxx>
            Subject: [tssg-tech] Re: <link> "Working with XML on Android"
            To: tssg-tech@xxxxxxxxxxxxx
            Date: Tuesday, September 28, 2010, 6:06 PM


                  Bea,

                  It isn't; I get a consistent 11 errors for each single-event 
web page (using FireBug to validate HTML).

                  Additionally, each event page is laid out with TABLEs rather 
than floating DIVs, so we can't load the URL into a WebView with a handheld.css 
style sheet that displays only the event information (hiding the outer columns 
with display:none;). 

                  -julie


--------------------------------------------------------------
                  Julie (Dingee) Carwellos
                  Web and IT Project Analyst, User Experience and Interaction Designer
                  LinkedIn - http://www.linkedin.com/in/jdingeecarwellos

                  --- On Tue, 9/28/10, Beatrice W. Chaney <bwchaney@xxxxxxxx> wrote:


                    From: Beatrice W. Chaney <bwchaney@xxxxxxxx>
                    Subject: [tssg-tech] Re: <link> "Working with XML on Android"
                    To: tssg-tech@xxxxxxxxxxxxx
                    Date: Tuesday, September 28, 2010, 4:18 PM


                    Hi,
                    I suspect (but haven't verified) that the 
BostonEventsList data might not be well-formed.
                    I ran the site through the W3C validator 
http://validator.w3.org/ some time ago (and again now), and it comes up with a 
number of errors. Having a site be valid XHTML is a critical prerequisite to 
getting on top of Google's list.

                    If this is the case (we first need to verify that 
well-formedness is really the problem), there are tidy-up utilities available, 
but we'd have to see whether they are suitable for Android. 

                    Thanks,
                    Bea

                    Harry Henriques wrote:

                      Hello,

                      I think Bea referenced the IBM website regarding RSS 
parser alternatives.  I downloaded the application from the website and 
massaged the files.  I was able to get the application to build an apk and 
load into the Android Emulator.  The application is partially working, but I 
could use some help debugging it: it doesn't parse the BostonEventsList, and 
for some reason it stops before displaying a ListView.

                      I delivered the work I have finished to the SVN 
Repository in an Android project called MessageList.

                      I will continue to work on it as time permits.  I've only 
just begun to fight.

                      Regards,
                      Harry Henriques
                      Java Developer
                 

         



--------------------------------------------------------------------------------


<div class="event-nodeapi">
  10/14/2010 12:00 am    EST  Ending: 10/15/2010 4:30 pm </div>
<div>
<b>Time: </b>10/14/2010 12:00 am Ending : 10/15/2010 4:30 pm1</div>

<div class="field field-type-text field-field-type-of-event">
  <h3 class="field-label">Event type</h3>
  <div class="field-items">
          Engineering,
          General,
      </div>
</div>

<div class="field field-type-text field-field-link-to-website">
  <h3 class="field-label">Link to website/contact</h3>
  <div class="field-items">
      <div class="field-item"><a href = 
"http://www.portsmouthchamber.org/ecoast/local-news-events/tech-world2010.asp"; 
target="_blank">http://www.portsmouthchamber.org/ecoast/local-news-events/tech-world2010.asp</a></div>
  </div>
</div>

<div class="field field-type-text field-field-location">
  <h3 class="field-label">Location</h3>
  <div class="field-items">
      <div class="field-item">Pease International Tradeport in Portsmouth</div>
  </div>
</div>

<div class="field field-type-text field-field-event-organizer">
  <h3 class="field-label">Organizer</h3>
  <div class="field-items">
      <div class="field-item">NH High Technology Council</div>
  <div></div><br /><div>
<b>Description</b>
</div>
<p>    *  Two day conference targeted to companies that design, develop,and 
deliver technology.<br />
    * Three separate tracks designed to attract technical, business, and 
education/employment.<br />
    * Exclusive exhibit space.<br />
    * Held at the Celestica builiding at Pease Tradeport in Portsmouth, NH.<br 
/>
    * Presentation of annual InfoXchange Awards in three categories presented 
by NH Governor John Lynch.<br />
    * Networking reception for attendees, meet the technology innovators of the 
seacoast.</p><p>
</p>

<p>For more information or if interested in speaking or sponsoring 
opportunities for TechWorld 2010 please contact Salina McIntire at <a 
href="mailto:ecoast@xxxxxxxxxxxxxxxxxxxxx";>ecoast@xxxxxxxxxxxxxxxxxxxxx</a> or 
call 603-610-5514.</p><p>
</p>

<p>Day 1: Oct. 14th<br />
12:00 - 1:00pm NH High Tech Luncheon<br />
12:00 - 5:00pm Exhibit opens<br />
1:00 - 2:15pm Keynote Speaker<br />
2:15 - 5:00pm Speakers &amp;amp; demonstrations<br />
5:00 - 6:30pm Cocktail Party<br />
6:30 - 8:30pm NH High Technology Council Awards Dinner<br />
8:30 - 11:00pm After Party</p><p>
</p>

<p>      Day 2: Oct. 15th<br />
      9:00am - 4:00pm Expo opens<br />
      9:30 - 10:30am Success Stories and demonstrations<br />
      10:30 - 11:15am Speaker George Bald. DRED: Why stay, work, and play in 
NH?<br />
      12:00 - 2:00pm Keynote Luncheon<br />
      2:00 - 4:30pm Emerging Technologies and breakout sessions</p><p>
</p>

<p>Cost:  $20 - $80</p>
</description>
 
<comments>http://www.bostoneventslist.com/event/TechWorld-2010#comments</comments>
 <pubDate>Sun, 12 Sep 2010 23:11:02 +0000</pubDate>
 <dc:creator>priya</dc:creator>
 <guid isPermaLink="false">1950 at http://www.bostoneventslist.com</guid>
</item>
<item>
 <title>Second Annual Auto-ID &amp; Sensing Seminar &amp; Expo</title>
 
<link>http://www.bostoneventslist.com/event/Second-Annual-Auto-ID-Sensing-Seminar-Expo</link>
 <description><div class="event-nodeapi">
  10/13/2010 8:45 am    EST  </div><div>
<b>Time: </b>10/13/2010 8:45 am</div>

<div class="field field-type-text field-field-type-of-event">
  <h3 class="field-label">Event type</h3>
  <div class="field-items">
          Networking,
          Engineering,
          General,
      </div>
</div>

<div class="field field-type-text field-field-link-to-website">
  <h3 class="field-label">Link to website/contact</h3>
  <div class="field-items">
      <div class="field-item"><a href = 
"http://www.merrimack.edu/academics/science_engineering/ElectricalEngineering/news_events/Pages/RFID.aspx";
 
target="_blank">http://www.merrimack.edu/academics/science_engineering/ElectricalEngineering/news_events/Pages/RFID.aspx</a></div>
  </div>
</div>

<div class="field field-type-text field-field-location">
  <h3 class="field-label">Location</h3>
  <div class="field-items">
      <div class="field-item">The Rogers Center at Merrimack College
315 Turnpike St. No. Andover, MA. 01845</div>
  </div>
</div>

<div class="field field-type-text field-field-event-organizer">
  <h3 class="field-label">Organizer</h3>
  <div class="field-items">
      <div class="field-item">Merrimack College Department of Electrical 
Engineering
MIT Enterprise Forum Auto-ID &amp;amp; Sensing Solutions
UMass Lowell Nanomanufacturing Centers</div>
  <div></div><br /><div>
<b>Description</b>
</div>
<p>Last year we had a great event at MIT with 44 exhibitors of Auto-ID 
&amp;amp; Sensing solutions. This year&#039;s event has an expanded program 
with noteworthy keynote speakers, interesting panels plus 40 or more 
exhibitors.</p><p>
</p>

<p>This is a VERY low cost way to network with others and keep up with new 
developments. It will be FUN, INFORMATIVE and INTERACTIVE. Please join 
us!</p><p>
</p>

<p>Our Keynote Speakers are:<br />
Marty Meehan, Chancellor of University of Massachusetts Lowell<br />
Mark Russell, Raytheon&#039;s VP of Engineering<br />
Craig Casto, Dow Chemical, Global Leader RFID, GPS, AutoID Expertise 
Center</p><p>
</p>

<p>Also featured are three panel discussions:      </p><p>
</p>

<p>Marketing Successes and Pitfalls<br />
Tom Coyle, Moderator<br />
Roger Bridgeman, President of Bridgeman Communications<br />
Mike Liard, ABI Research<br />
Alan Sherman, Director of Marketing, OATSystems<br />
Liz Churchill, Director, Marketing &amp;amp; BusDev, Bilcare Technologies</p><p>
</p>

<p>Auto-ID Implementations<br />
Prof. Ram Nagarajan, University of Massachusetts Lowell<br />
Prof. Charles Kochakian, Merrimack College<br />
Steve Miles, MIT Enterprise Forum</p><p>
</p>

<p>Investing in Auto-ID &amp;amp; Sensing<br />
John Greaves, Moderator<br />
Ron Wagner, Sivix<br />
John Chalus, Vice President, Kinetic Advisors, LLC<br />
Registration Details  </p><p>
</p>

<p>Conference Cost (includes lunch)<br />
$25 "Early Bird" - register by 10/1<br />
$35  Through 10/12<br />
$40 "On Site" registration, 10/13<br />
Students attend free of charge: however, students MUST register online prior to 
the day of the conference.</p><p>
</p>

<p>Exposition Table Cost<br />
$200  "Early Bird" - register by 10/1<br />
$250  Through 10/12<br />
Purchase up to five conference registrations for the discounted rate of 
$20/registration. NOTE: offer is good at time of table rental.</p>
