[hashcash] Re: new format tweak coming...

  • From: Ben Laurie <ben@xxxxxxxxxxxxx>
  • To: hashcash@xxxxxxxxxxxxx
  • Date: Sun, 14 Mar 2004 12:39:33 +0000

Hubert Chan wrote:

> >>>>> "Justin" == Justin Mason <jm@xxxxxxxxxx> writes:
> 
> 
> Justin> Oh good, you'd be happy to provide a regex that'll do the above
> Justin> left-to-right scan to cope with
> 
> Justin>   foo:"bar \\\\\\\\\\\\\\\":baz":blargh
> 
> Justin> then.  as far as I know it can be done with perl re's, but I
> Justin> don't know the recipe off-hand ;)
> 
> What do you want the regexp to do?  Just match the value part of the
> string?  `"([^\"]|\\.)*"' should do the trick.  (Or something like that,
> depending on what needs to be escaped, etc.)  No Perl extensions
> needed.  Just straight regexp.
> 
> Breaking it down:
> Start with the first quote: "
> (
>   match anything other than a backslash or quote: [^\"]
>   (I think that a backslash within [ ] doesn't usually need to be
>   escaped, but I'm not sure.)
>     or |
>   match a backslash, followed by any other character: \\.
>   (assuming backslash needs to be escaped here)
> )
> any number of times: *
> And finally the end quote: "
> 
> This is making the simplifying assumption that a backslash followed by
> a character means that the following character is meant to be a
> literal.  If not, it's easy to change the backslash matching part (to
> something like `\\[\"]').
> 
> Not tested.  Just off the top of my head.

Doesn't work, either. You also haven't dealt with : inside the "s.
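
To make the failure concrete, here is a quick check in Python rather than Perl (chosen only for illustration; as far as I know these particular constructs behave the same in both). It is a sketch under assumptions: the input is Justin's example rebuilt with an explicit run of 15 backslashes, and `stricter` is a hypothetical variant that nobody in the thread proposed. As transcribed, [^\"] excludes only the quote, so it also matches a backslash and the scan stops at the escaped quote.

```python
import re

# Assumed input: Justin's example, with an explicit odd count (15) of
# backslashes so that the quote following them is escaped.
line = 'foo:"bar ' + "\\" * 15 + '":baz":blargh'

# The proposed pattern, transcribed as-is. Inside [...] the \" is just
# a quote, so the class still matches a backslash.
proposed = re.compile(r'"([^\"]|\\.)*"')

# Hypothetical variant: exclude the backslash from the class so that an
# escape can only be consumed as a backslash-plus-character pair.
stricter = re.compile(r'"(?:[^"\\]|\\.)*"')

proposed_match = proposed.search(line).group(0)
stricter_match = stricter.search(line).group(0)

print(proposed_match)  # ends at the escaped quote, before ":baz"
print(stricter_match)  # runs through to the real closing quote
```

Even the stricter variant only recognises a single quoted value. It does not split the line into fields, so a : inside the quotes is still a problem for anything that splits on : first.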

Before you continue to pursue this line of thought, you might like to
consider why there are at least two CPAN modules for parsing this format
(well, CSV) if it's so easy to do? Alternatively, produce a _tested_
regex that parses the whole format, not a subset.
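
For comparison, a left-to-right scan of a whole line can be written as a plain loop. The Python below is a minimal sketch: it is not a regex and not what either CPAN module does. The name `split_fields` and its rules are assumptions for illustration only: fields are separated by :, a value may be wrapped in double quotes, and inside quotes a backslash makes the next character literal. The real format may differ on any of these points.

```python
def split_fields(line, sep=":", quote='"', esc="\\"):
    """Split line on sep, honouring quoted values and backslash escapes.

    Assumed rules (not taken from any specification): quotes are removed
    from the result, and escapes are only interpreted inside quotes.
    """
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes and ch == esc:
            # A backslash inside quotes makes the next character literal.
            if i + 1 >= len(line):
                raise ValueError("dangling backslash at end of input")
            current.append(line[i + 1])
            i += 2
            continue
        if ch == quote:
            # An unescaped quote opens or closes a quoted value.
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            # A separator outside quotes ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted value")
    fields.append("".join(current))
    return fields


# Justin's example, rebuilt with an explicit run of 15 backslashes.
example = 'foo:"bar ' + "\\" * 15 + '":baz":blargh'
fields = split_fields(example)
print(fields)
```

Under these assumed rules the example yields three fields, with the escaped quote and the : kept inside the middle one. Cases such as text adjacent to a quoted value or escapes outside quotes are handled arbitrarily here, which is the sort of detail a tested parser has to pin down.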

Cheers,

Ben.

-- 
http://www.apache-ssl.org/ben.html       http://www.thebunker.net/

"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff
