[phorm] Re: URL Validation : How?

  • From: "Rick working on the Africa Database Project" <africaproject@xxxxxxxxxxxxxxxxx>
  • To: phorm@xxxxxxxxxxxxx
  • Date: Sat, 27 Sep 2003 05:23:40 +0100

On 26 Sep 2003 at 20:38, Rick working on the Africa Da wrote:
> Sorry -- it wasn't very clear!
Perhaps still not clear enough ... !!!???

How about this ...

After trimming, the user input string should contain no whitespace.

The string should now be split into 3 sections:

1.      Scheme, eg: (http|https|ftp), followed by ://

2.      A run of arbitrary case-insensitive strings separated by single
        dots, ending with a gTLD or ccTLD (country code), optionally
        followed by a colon and port number, eg:
                mydomain.com
                www.MyDomain.net.de
                www.things.morethings.mydomain.info
                search.mydomain.org:80

3.      Begins with a forward slash, followed by nothing or almost anything

The third section could be a path to a directory or file, eg:
        /
        /index.html
        /stuff/
        /stuff/page.html
optionally followed by a question mark ? and a query/parameter string, eg:
        /stuff/?id=this&q=that
        /?id=this;q=that
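
The three sections above can be sketched as a single regular expression.
This is only an illustration, written in Python rather than PHP; the names
URL_SHAPE and split_url are made up for the example. It assumes the third
section may be absent altogether, and it checks only the overall shape: it
does not verify that the last label is a real gTLD or ccTLD.

```python
import re

# Shape check only; names here are hypothetical, not part of any library.
URL_SHAPE = re.compile(
    r"""
    (?P<scheme>https?|ftp)://               # section 1: scheme plus ://
    (?P<host>[a-z0-9-]+(?:\.[a-z0-9-]+)+)   # section 2: labels joined by single dots
    (?::(?P<port>\d+))?                     #            optional colon and port number
    (?P<rest>/\S*)?                         # section 3: slash, then nothing or almost anything
    """,
    re.IGNORECASE | re.VERBOSE,
)


def split_url(raw):
    """Trim the input and return its sections as a dict, or None if the shape is wrong."""
    match = URL_SHAPE.fullmatch(raw.strip())
    return match.groupdict() if match else None


print(split_url("http://search.mydomain.org:80/stuff/?id=this&q=that"))
print(split_url("http://www,mydomain.com"))
```

Using fullmatch means any stray whitespace left after trimming makes the
whole string fail, which matches the "no whitespace" rule.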


The PHP function parse_url ( http://uk.php.net/manual/en/function.parse-url.php )
will break a string into 8 possible components:
        scheme, host, port, user, pass, path, query, fragment

For our purposes, we might want to reject anything that contained
user, password or fragment -- but our main requirement is that the
individual components and the string as a whole should formally be a
valid URL.
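
To make the idea concrete, here is a rough sketch of the "reject user,
password or fragment" test. It uses Python's urllib.parse.urlsplit as a
stand-in for PHP's parse_url, so it is an analogy rather than the PHP code
itself; the function names components and unwanted_parts are invented for
the example.

```python
from urllib.parse import urlsplit


def components(url):
    """Return the same 8 component names that parse_url lists, via urlsplit."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "pass": parts.password,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


def unwanted_parts(url):
    """List which of user, pass and fragment are present in the URL."""
    found = components(url)
    # Absent user/pass come back as None, an absent fragment as "".
    return [name for name in ("user", "pass", "fragment")
            if found[name] not in (None, "")]


print(unwanted_parts("http://rick:secret@www.mydomain.com/index.html#top"))
print(unwanted_parts("http://www.mydomain.com/stuff/?id=this&q=that"))
```

Note that splitting a string into components says nothing about whether
each component is valid, so this covers only the rejection rule, not the
main requirement.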

We don't want to correct the users' input, for example by url-encoding
non-alphanumeric characters in a query section -- we want to tell them
that their input is not a valid URL, and ask them to correct it.

Here are some examples of erroneous input we want to catch ...
        htp://www.mydomain.com
        http//www.mydomain.com
        www.mydomain.com
        http://www . mydomain. com
        http://www,mydomain.com
        http://www.mydomain
        http://www.mydomain.com.index.html
        http://www.mydomain.com/my info.html
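
A step-by-step checker that reports the first problem found, so the user
can be told what to fix, might look something like the sketch below. It is
in Python for illustration only. The name url_problem is made up, and
KNOWN_TLDS is a deliberately tiny placeholder list; a real check would need
the full list of gTLDs and country codes. The sketch does not validate the
characters in the path or query beyond rejecting whitespace.

```python
import re

ALLOWED_SCHEMES = {"http", "https", "ftp"}
# Hypothetical placeholder list, far from complete.
KNOWN_TLDS = {"com", "net", "org", "info", "de", "uk"}
LABEL = re.compile(r"[a-z0-9-]+", re.IGNORECASE)


def url_problem(raw):
    """Return a short message for the first problem found, or None if none is found."""
    text = raw.strip()
    if re.search(r"\s", text):
        return "contains whitespace"
    scheme, sep, rest = text.partition("://")
    if not sep:
        return "missing scheme or ://"
    if scheme.lower() not in ALLOWED_SCHEMES:
        return "unknown scheme"
    # Everything up to the first slash is the host, with an optional :port.
    host, colon, port = rest.split("/", 1)[0].partition(":")
    if colon and not port.isdigit():
        return "bad port number"
    labels = host.split(".")
    if not all(LABEL.fullmatch(label) for label in labels):
        return "bad host name"
    if len(labels) < 2 or labels[-1].lower() not in KNOWN_TLDS:
        return "host does not end in a known TLD"
    return None


for candidate in ("htp://www.mydomain.com", "http://www.mydomain",
                  "http://www.mydomain.com/stuff/?id=this&q=that"):
    print(candidate, "->", url_problem(candidate))
```

The input is never altered apart from trimming; the function only reports,
which fits the aim of asking users to correct their own input.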

