
[arachne] Spaces in URLs
- From: Matt <mlewandowsky@xxxxxxxxxxxxx>
- To: arachne@xxxxxxxxxxxxx
- Date: Sat, 21 Apr 2007 22:46:32 -0700 (PDT)
Arachne at FreeLists---The Arachne Fan Club!
Glenn,
First and foremost, I appreciate your fix. Unfortunately, you got a bit
overexuberant, I fear. :(
As per RFC 2396, spaces MUST be escaped as %20 to be valid in a URI. Section
2.4.3 explains why. Your test page kinda demonstrates why, without realizing
it... ;) RFC 2616 provides no further restrictions on spaces in URIs. The only
mention of spaces in URIs in HTML attributes is in HTML 4.01, section 6.2, as
was quoted to you by Joe. The stripping in that case is optional, but is
common. Else <a href=" http://www.mysite.com/ "> would always break since
you're unlikely to have a resource at "%20http://www.mysite.com/%20". :) And,
of course, if you ever strip whitespace, you should do it consistently, not
just "before any http link" or by some other weird rule.
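
To make that failure case concrete, here is a minimal Python sketch of what a user agent would end up requesting if it escaped every space and never trimmed the attribute value. This is not Arachne code; the helper name `escape_without_trimming` is made up for illustration.

```python
from urllib.parse import quote

def escape_without_trimming(value: str) -> str:
    """Percent-encode spaces but do NOT trim the attribute value first."""
    # Keep ":" and "/" literal so the URL structure stays readable;
    # every space, including leading and trailing ones, becomes %20.
    return quote(value, safe=":/")

print(escape_without_trimming(" http://www.mysite.com/ "))
# prints: %20http://www.mysite.com/%20
```

The result no longer starts with a scheme, so it would most likely be treated as a relative reference to a resource that does not exist.
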
So, based on the information from the specs, *all* your fix should do is to
remove leading and trailing whitespace from the attribute values. It should not
remove spaces embedded in URLs. It's perfectly valid to have a directory named
" Files" and use an href like this, <link href="%20Files">, pointing to it.
However, <link href=" Files"> MAY (standards-definition) link to "Files", per
the HTML specs. (And probably will in practically any browser.) Of course, it's
perfectly valid to send someone off to " Files/" even though most browsers
won't do that.
So, as long as the white-space token to be removed is at the beginning or end
of a CDATA value, there's nothing wrong with it (from the view of the standards
or common practice). However, stripping spaces from within URIs rather than
escaping them isn't justifiable from any specs I'm familiar with.
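
The behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Arachne implements it: the function name `clean_href` and the exact set of characters treated as white space are my own choices, and escaping embedded spaces as %20 is shown as one way to avoid deleting them.

```python
# Assumed white-space set for trimming: space, tab, LF, CR, form feed.
HTML_WHITESPACE = " \t\n\r\f"

def clean_href(value: str) -> str:
    """Trim the ends of an attribute value; never delete embedded spaces."""
    # Only leading and trailing white space is removed.
    trimmed = value.strip(HTML_WHITESPACE)
    # Embedded spaces are escaped rather than stripped, so they survive.
    return trimmed.replace(" ", "%20")

print(clean_href(" http://www.mysite.com/ "))  # http://www.mysite.com/
print(clean_href("My Files/index.htm"))        # My%20Files/index.htm
print(clean_href("%20Files"))                  # %20Files
print(clean_href(" Files"))                    # Files
```

Note that an already escaped "%20Files" passes through untouched, while a raw " Files" is trimmed to "Files", which matches the two `<link>` cases above.
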
In any case, it seems that this fix belongs somewhere in the code other than
the URL parser... But in the case of meta refresh, perhaps the URL parser is
the correct place. (Since meta refresh is an odd exception to most rules... ;) )
Hope to have shed some light on this. And thanks for actually taking the time
to fix the space issue. I recall first noticing it in 1.4 or so (actually,
probably earlier, but I've no way to be sure anymore) and I've been manually
fixing offending URLs since. :) So, over the next year, you may save me a full
ten minutes. Multiply that by everyone to whom it never occurred that not
implementing a "MAY" in a spec may be taken as a bug, and there are some
man-hours which will now be better spent! :)
Now that I've gotten my piece in, I'll head to sleep. :)
--Matt
P.S. I've been inactive in the Arachne community for quite some time, but I've
been happily following its developments regardless. The problems this
particular "fix" may cause over time urged me to delurk momentarily. :) But
feel free to poke me regarding edge cases like this. I've practically memorized
RFCs 2616 and 2396 as well as the HTML 4.01 spec. :)
P.P.S. If anyone's counting, my last Arachne delurk was December 02, 2005...