
[arachne] Re: Spaces in URLs

  • From: Jason & Ornumar Dodd <jasorn@xxxxxxxxx>
  • To: arachne@xxxxxxxxxxxxx
  • Date: Sun, 22 Apr 2007 19:44:01 -0400
Arachne at FreeLists---The Arachne Fan Club!

Glenn McCorkle wrote:

Hi Matt,

Thank you very much for the heads-up.

I did some experimentation in Linux
and it turns out that you are 100% correct.
(as you already knew) ;-)

Leading spaces in dirnames and filenames _are_ in fact perfectly legal.

But try as I might, I could not create or rename any dirnames or filenames
with trailing spaces.

The 'fix' has now been modified to simply remove any erroneous trailing
spaces which we might happen across during our ventures through the web.

http://www.cisnet.com/glennmcc/badspace.htm
has also been updated to reflect this change.

If, however, you (or anyone else) point out that trailing spaces are
in fact also 'legal' ... the trailing space code will also be removed.

Thanks again for showing me the error of my ways. :))

I didn't make my case very well the first time, but you can also create directories with trailing spaces, at least in Linux.

Below are some directory listings taken after I created a directory with a trailing space and put a file in it. Notice that there is no directory 'trash', but there is a 'trash '.

jason@jason-laptop:~$ ls -l 'trash'
ls: trash: No such file or directory

jason@jason-laptop:~$ ls -l 'trash '
total 4
-rw-r--r-- 1 jason jason 35 2007-04-22 19:39 test_file_in_dir_trailing_space.txt

jason@jason-laptop:~$ ls -l /home/jason/trash\ /test*
-rw-r--r-- 1 jason jason 35 2007-04-22 19:39 /home/jason/trash /test_file_in_dir_trailing_space.txt
jason@jason-laptop:~$
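
For anyone who would rather reproduce this without typing the shell commands, here is a minimal Python sketch of the same experiment. The helper name `make_dir_with_trailing_space` is made up for illustration, and the result is only expected to match the listing above on a typical Linux filesystem; other platforms may treat trailing spaces differently.

```python
import os
import tempfile


def make_dir_with_trailing_space(parent: str, name: str = "trash ") -> str:
    """Create a directory whose name ends in a space and return its path.

    Hypothetical helper for illustration; 'trash ' mirrors the name used
    in the shell session above.
    """
    path = os.path.join(parent, name)
    os.mkdir(path)
    return path


if __name__ == "__main__":
    # Work inside a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as parent:
        make_dir_with_trailing_space(parent)
        # Show the exact names on disk, quotes included, so the trailing
        # space is visible.
        print(repr(os.listdir(parent)))
        # Check whether a directory named 'trash' (no space) exists.
        print(os.path.isdir(os.path.join(parent, "trash")))
```

Printing with `repr` is deliberate: a plain `print` would hide the trailing space, which is exactly the thing being tested.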


__________________________________________________________________________________

On Sat, 21 Apr 2007 22:46:32 -0700 (PDT), Matt <mlewandowsky@xxxxxxxxxxxxx> 
wrote:

Glenn,

First and foremost, I appreciate your fix. Unfortunately, you got a bit
overexuberant, I fear. :(

As per RFC 2396, spaces MUST be escaped as %20 to be valid in a URI. Section
2.4.3 explains why. Your test page kinda demonstrates why, without realizing
it... ;) RFC 2616 provides no further restrictions on spaces in URIs. The only
mention of spaces in URIs in HTML attributes is in HTML 4.01, section 6.2, as
was quoted to you by Joe. The stripping in that case is optional, but is
common. Otherwise <a href=" http://www.mysite.com/ "> would always break, since
you're unlikely to have a resource at "%20http://www.mysite.com/%20". :) And,
of course, if you ever strip whitespace, you should do it consistently, not
just "before any http link" or by some other weird rule.

So, based on the information from the specs, *all* your fix should do is to
remove leading and trailing whitespace from the attribute values. It should not
remove spaces embedded in URLs. It's perfectly valid to have a directory named
" Files" and use an href like this, <link href="%20Files">, pointing to it.
However, <link href=" Files"> MAY (in the standards sense of the word) link to "Files", per
the HTML specs. (And probably will in practically any browser.) Of course, it's
perfectly valid to send someone off to " Files/" even though most browsers
won't do that.
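
To make the intended behaviour concrete, here is a minimal Python sketch of the trimming step described above. It is not Arachne's code: the function name `trim_attr_value` and the exact whitespace set (space, tab, LF, CR, form feed) are assumptions chosen for illustration. It removes whitespace only at the two ends of an attribute value and leaves embedded spaces and existing %20 escapes alone.

```python
# Whitespace characters stripped from the ends of an attribute value.
# This set is an assumption for the sketch, not taken from Arachne.
HTML_WHITESPACE = " \t\n\r\f"


def trim_attr_value(value: str) -> str:
    """Strip leading and trailing whitespace from an attribute value.

    Embedded spaces and percent-escapes such as %20 are left untouched,
    so href="%20Files" still points at the directory " Files".
    """
    return value.strip(HTML_WHITESPACE)


if __name__ == "__main__":
    for raw in (" http://www.mysite.com/ ", " Files", "%20Files", "My Files/"):
        # repr() keeps any surviving spaces visible in the output.
        print(repr(raw), "->", repr(trim_attr_value(raw)))
```

Applying the same function to every CDATA attribute value, rather than only to values that look like http links, is what keeps the stripping consistent.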

So, as long as the white-space token to be removed is at the beginning or end
of a CDATA value, there's nothing wrong with it (from the view of the standards
or common practice). However, stripping spaces from within URIs rather than
escaping them isn't justifiable from any specs I'm familiar with.
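
As a rough illustration of "escape, don't strip", the following Python sketch percent-encodes literal spaces inside a URL instead of deleting them. The function name `escape_embedded_spaces` and the example.com URLs are hypothetical, and the sketch deliberately handles only the space character; it is not a full RFC 2396 escaper.

```python
def escape_embedded_spaces(url: str) -> str:
    """Replace each literal space with %20 and leave everything else alone.

    Existing escapes (including %20) are not touched, so running the
    function twice gives the same result as running it once.
    """
    return url.replace(" ", "%20")


if __name__ == "__main__":
    # example.com is a placeholder host used only for this sketch.
    for raw in (
        "http://example.com/My Files/a b.htm",
        "http://example.com/My%20Files/",
    ):
        print(raw, "->", escape_embedded_spaces(raw))
```

A plain string replacement is used here rather than a general-purpose quoting routine so that an already-escaped URL is not escaped a second time.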

In any case, it seems that this fix should be done somewhere other than
the URL parser... But in the case of meta refresh, perhaps the URL parser is
the correct place. (Since meta refresh is an odd exception to most rules... ;)

Hope to have shed some light on this. And thanks for actually taking the time
to fix the space issue. I recall first noticing it in 1.4 or so (actually,
probably earlier, but I've no way to be sure anymore) and I've been manually
fixing offending URLs since. :) So, over the next year, you may save me a full
ten minutes. Multiply that by everyone who never had it dawn on them that not
implementing a "MAY" in a spec may be taken as a bug, and there's some manhours
which will now be better spent! :)

Now that I've gotten my piece in, I'll head to sleep. :)

--Matt

P.S. I've been inactive in the Arachne community for quite some time, but I've
been happily following its developments regardless. The problems this
particular "fix" may cause over time urged me to delurk momentarily. :) But
feel free to poke me regarding edge cases like this. I've practically memorized
RFCs 2616 and 2396 as well as the HTML 4.01 spec. :)

P.P.S. If anyone's counting, my last Arachne delurk was December 02, 2005...
             Arachne at FreeLists
-- Arachne, The Premier GPL Web Browser/Suite for DOS --




    All trademarks and copyrights within the FreeLists archives are owned by their respective owners.
    Everything else ©2008 Avenir Technologies, LLC.