Re: regex for base64 encoded string

  • From: Mark Brinsmead <mark.brinsmead@xxxxxxx>
  • To: sarath_kumar0@xxxxxxxxx
  • Date: Wed, 11 Jan 2006 21:56:26 -0700

Okay, I've never had an excuse to play with Oracle regular expressions (although they look pretty much like the old regexp's we've known and loved for decades), so here's a try:

'(^Content-Transfer-Encoding: base64$)(^$)(^(([A-Za-z0-9+/=]){4}){1,19}$)*(^$)'

I know that this is not *quite* precise, but it's pretty close, I think. Here is what I *intended* to describe with this pattern:

* One "header" line, containing the literal text "Content-Transfer-Encoding: base64".
* Exactly 1 (one) blank line.
* Zero or more lines of data, each consisting of:
    * 1 to 19 quartets of characters from the alphabet A-Z, a-z, 0-9, +, /, or =.
      Note that the '=' character is really a special case -- it can only appear as
      the last zero to two characters of the last quartet in the message, but I didn't feel
      like trying to describe *that* rule within the same regexp...
      Actually, I didn't see anything in the specification limiting output to 19
      quartets per line, but that *does* seem to be the convention...
* Exactly 1 (one) blank line. (Actually, the "UUENCODE" standard seems
  to call for a final line of the form "====", but that is obviously not what you
  are looking at...)


For convenience and readability (?) I've encoded each of the elements described above as
a subexpression.


To use this you'll need to do:
REGEXP_LIKE(my_column, '(^Content-Transfer-Encoding: base64$)(^$)(^(([A-Za-z0-9+/=]){4}){1,19}$)*(^$)', 'm');


The final 'm' parameter tells REGEXP_LIKE to treat the source string as multiple lines, so that ^ and $ match at the start and end of each line rather than only at the start and end of the whole string.

Sorry, I haven't actually tested this. If you'd care to test it and let me know how far off I was, I guess I wouldn't mind knowing. ;-)
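
For experimenting outside the database, the structure described above (header line, blank line, data lines of up to 19 quartets, optional trailing blank line) can be sketched in Python. This is only a sketch under assumptions, not the Oracle pattern itself: it uses Python's re module rather than Oracle's regex engine, the names BASE64_BLOCK and looks_like_base64_part are made up for illustration, line terminators are consumed explicitly as \r?\n instead of relying on a multi-line flag, the trailing blank line is treated as optional, and the '=' padding rule is tightened so that padding may only end the last quartet of the last data line.

```python
import re

NL = r"\r?\n"              # line terminator, consumed explicitly
B64 = r"[A-Za-z0-9+/]"     # base64 alphabet without the '=' padding character

# An ordinary data line: 1 to 19 unpadded quartets.
FULL_LINE = r"(?:" + B64 + r"{4}){1,19}"

# The last data line: up to 18 unpadded quartets, then one quartet that
# may end in one or two '=' padding characters.
LAST_LINE = (
    r"(?:" + B64 + r"{4}){0,18}"
    + r"(?:" + B64 + r"{4}|" + B64 + r"{3}=|" + B64 + r"{2}==)"
)

BASE64_BLOCK = re.compile(
    r"Content-Transfer-Encoding: base64" + NL                 # header line
    + NL                                                      # one blank line
    + r"(?:(?:" + FULL_LINE + NL + r")*" + LAST_LINE + r")?"  # data lines, if any
    + r"(?:" + NL + r"){0,2}"                                 # final newline / blank line
)


def looks_like_base64_part(text: str) -> bool:
    """Return True if the whole string has the header + base64 body shape."""
    return BASE64_BLOCK.fullmatch(text) is not None
```

Calling looks_like_base64_part on a column value tells you whether the whole value has this shape; a value with '=' anywhere except the end of the last data line, or with a data line longer than 19 quartets, is rejected.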

By the way, if you are interested in the specification for BASE64 encoding, you can find it in the Linux manpage for "uuencode", or (I think) in IEEE Standard 1003.1-2001, Section 12.2. (That should be the formal standard for the UUENCODE utility.) Note that the header you quoted does not comply with this standard but rather (I think) with the standard for MIME-encoding in e-mail messages. Still, I am reasonably sure that the specification for the actual message body remains about the same.

Cheers,
-- Mark.

P.S. I'm kind of curious as to why you wanted a REGEX for this. Chances are, for most purposes, the expression MY_COLUMN LIKE 'Content-Transfer-Encoding: base64%' would suffice, wouldn't it? (A LIKE pattern always matches from the start of the value, so no '^' anchor is needed.) Is it really necessary to verify that the encoding itself is valid? Or were you interested in how to *decode* the string? (You can learn that from the UUDECODE manpage, too.)
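
If decoding is the real goal, a minimal Python sketch follows. It assumes the header and blank line have already been stripped and only the data lines remain; the names ENCODED_LINES and decode_base64_lines are hypothetical, and the data lines are copied from the sample in the original question.

```python
import base64

# Data lines copied from the sample in the original question.
ENCODED_LINES = [
    "U2VsZWN0IFRoaXMgaXMgZmlmdGg7DQogDQpOT1RFOiBUaGUgaW5mb3JtYXRpb24gY29udGFpbmVk",
    "IGluIHRoaXMgZW1haWwgbWVzc2FnZSBpcyBjb25zaWRlcmVkIGNvbmZpZGVudGlhbCBhbmQgcHJv",
    "cHJpZXRhcnkgdG8gdGhlIHNlbmRlciBhbmQgaXMgaW50ZW5kZWQgc29sZWx5IGZvciByZXZpZXcg",
    "YW5kIHVzZSBieSB0aGUgbmFtZWQgcmVjaXBpZW50LiAgQW55IHVuYXV0aG9yaXplZCByZXZpZXcs",
    "IHVzZSBvciBkaXN0cmlidXRpb24gaXMgc3RyaWN0bHkgcHJvaGliaXRlZC4gSWYgeW91IGhhdmUg",
    "cmVjZWl2ZWQgdGhpcyBtZXNzYWdlIGluIGVycm9yLCBwbGVhc2UgYWR2aXNlIHRoZSBzZW5kZXIg",
    "YnkgcmVwbHkgZW1haWwgYW5kIGRlbGV0ZSB0aGUgbWVzc2FnZS4NCg==",
]


def decode_base64_lines(lines):
    """Join the data lines (dropping line breaks) and decode them to bytes."""
    return base64.b64decode("".join(lines))
```

Printing decode_base64_lines(ENCODED_LINES).decode("ascii") shows the original message text, which in this sample begins with "Select This is fifth;".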

sarath kumar wrote:

I got a column with the following data
Content-Transfer-Encoding: base64

U2VsZWN0IFRoaXMgaXMgZmlmdGg7DQogDQpOT1RFOiBUaGUgaW5mb3JtYXRpb24gY29udGFpbmVk
IGluIHRoaXMgZW1haWwgbWVzc2FnZSBpcyBjb25zaWRlcmVkIGNvbmZpZGVudGlhbCBhbmQgcHJv
cHJpZXRhcnkgdG8gdGhlIHNlbmRlciBhbmQgaXMgaW50ZW5kZWQgc29sZWx5IGZvciByZXZpZXcg
YW5kIHVzZSBieSB0aGUgbmFtZWQgcmVjaXBpZW50LiAgQW55IHVuYXV0aG9yaXplZCByZXZpZXcs
IHVzZSBvciBkaXN0cmlidXRpb24gaXMgc3RyaWN0bHkgcHJvaGliaXRlZC4gSWYgeW91IGhhdmUg
cmVjZWl2ZWQgdGhpcyBtZXNzYWdlIGluIGVycm9yLCBwbGVhc2UgYWR2aXNlIHRoZSBzZW5kZXIg
YnkgcmVwbHkgZW1haWwgYW5kIGRlbGV0ZSB0aGUgbWVzc2FnZS4NCg==

Does anyone know a regex for the base64 encoded string?

TIA
sarath

//www.freelists.org/webpage/oracle-l








