RE: Announcing Encoding utility

  • From: "DaShiell, Jude T. CIV NAVAIR 1490, 1, 26" <jude.dashiell@xxxxxxxx>
  • To: <programmingblind@xxxxxxxxxxxxx>
  • Date: Thu, 19 Aug 2010 13:27:16 -0400

The file command could have been used so long as email wasn't being
saved in mbox format.  The msg format, if I remember correctly, leaves
each email message in its own file in a directory off the user's $HOME
directory.  It probably would have gone through all those multiple
extensions like a blow torch through butter.  In that kind of setup, I
think a command like file * >report.mail could have gotten a
report out on all of the single-message emails that had come in.

-----Original Message-----
From: programmingblind-bounce@xxxxxxxxxxxxx
[mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of katherine
Moss
Sent: Thursday, August 19, 2010 13:11
To: programmingblind@xxxxxxxxxxxxx
Subject: RE: Announcing Encoding utility

Does that also compare to most of the 2001/2002 email malware when it
had a double extension?

-----Original Message-----
From: programmingblind-bounce@xxxxxxxxxxxxx
[mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of DaShiell,
Jude T. CIV NAVAIR 1490, 1, 26
Sent: Thursday, August 19, 2010 7:43 AM
To: programmingblind@xxxxxxxxxxxxx
Subject: RE: Announcing Encoding utility
Importance: Low

You may find such a package useful the first time you get handed a bunch
of data files on which to work.  Confidence building measures can be to
check encoding on all files to make sure the files are as described to
you.  That way you know what tool or tools to use to best handle the
files be it for searches or modification.  Just because a file has a
.csv extension on it doesn't at all guarantee its contents are actually
comma separated values; it could be an executable or actually html or
any number of other formats.  In the Linux world, one of the lighter
versions of the tools we have is called file.  You can run file prog.csv
and get back information about prog.csv, so you know what it really is
before going any further with the file.
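
To illustrate the idea of checking contents rather than trusting the
extension, here is a minimal Python 3 sketch.  It is not how the file
command works internally; the names sniff_kind and sniff_file and the
short signature list are hypothetical, chosen only for illustration.

```python
# Hypothetical helper: guess what a file really is from its first bytes,
# ignoring the extension.  The signature list is a tiny illustrative sample.
SIGNATURES = [
    (b"MZ", "Windows executable"),
    (b"\x7fELF", "ELF executable"),
    (b"PK\x03\x04", "zip archive"),
    (b"%PDF", "PDF document"),
]


def sniff_kind(data):
    """Return a rough description of what the leading bytes look like."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    # HTML has no fixed magic number, so look for a typical opening tag.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "HTML document"
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return "unknown or non-ASCII data"
    return "ASCII text"


def sniff_file(path):
    """Read the start of a file and describe it."""
    with open(path, "rb") as handle:
        return sniff_kind(handle.read(4096))
```

Calling sniff_file("prog.csv") on a renamed executable would report an
executable rather than text.  A real tool checks far more signatures
than this.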

-----Original Message-----
From: programmingblind-bounce@xxxxxxxxxxxxx
[mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of Alex Midence
Sent: Wednesday, August 18, 2010 22:36
To: programmingblind@xxxxxxxxxxxxx
Subject: Re: Announcing Encoding utility

Interesting.  I can't personally conceive of when I would need such an
app as of yet, but I'm new to developing, so that may change.  It's
still rather intriguing, however.

Alex M

On 8/18/10, Jamal Mazrui <empower@xxxxxxxxx> wrote:
> Now available at
> http://EmpowermentZone.com/Encoding.zip
>
> Encoding
> Version 1.0
> August 8, 2010
> Copyright 2010 by Jamal Mazrui
> GNU Lesser General Public License (LGPL)
> ----------
>
> Contents
>
> Description
> Installation
> Operation
> Development Notes
> ----------
>
> Description
>
> Encoding is a free, open source, command-line utility for performing
> encoding-related operations on files.  It can show the encoding of
> files, and convert between different encodings.  Batch operations are
> supported if wildcard characters are used in the file specification.
> The executable, Encoding.exe, should run on any version of Windows.
> The source code, Encoding.py, should run on other platforms as well.
>
> An encoding is an agreement about how to represent textual characters
> with computer bytes.  Characters are encoded as byte sequences that
> may be stored in disk files or computer memory.  A byte stream is
> decoded to produce characters in a human language.  If a text file is
> not readable, the reason may be that it has an encoding that was
> either not recognized or not decoded properly.  This utility may help
> with such issues, benefiting software developers or end users.  It
> works with over a hundred character encodings.
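
As a small illustration of the encode/decode idea described above, the
following Python 3 sketch encodes one word two ways and then decodes it
with the wrong encoding.  The sample word and variable names are my own
and are not taken from the Encoding utility.

```python
# The word "cafe" with an acute accent on the e, written with an escape
# so the example does not depend on the encoding of this message.
text = "caf\u00e9"

# The same characters become different byte sequences per encoding.
utf8_bytes = text.encode("utf-8")     # the accented e takes two bytes
cp1252_bytes = text.encode("cp1252")  # the accented e takes one byte

# Decoding with the matching encoding restores the original text.
round_trip = utf8_bytes.decode("utf-8")

# Decoding UTF-8 bytes as cp1252 succeeds but yields the wrong characters,
# which is the "unreadable text file" symptom.
garbled = utf8_bytes.decode("cp1252")

print(list(utf8_bytes))
print(list(cp1252_bytes))
print(garbled)
```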
> ----------
>
> Installation
>
> Unarchive Encoding.zip into a directory, e.g., into
> C:\Encoding
>
> Run Encoding.exe at a command prompt, e.g., one created by entering
> cmd.exe
>
> at the Windows Start/Run prompt.
>
> Since Encoding is developed in a cross-platform language, Python, it
> should also be possible to run the source code, Encoding.py, on other
> platforms that have a Python interpreter.
> ----------
>
> Operation
>
> The complete command-line syntax of Encoding is as follows:
>
> Encoding.exe TaskName FileSpec SourceEncoding TargetEncoding
>
> Some parameters are optional or not applicable depending on the name
> of the task.  Typing the .exe extension is optional.  Capitalization
> does not matter in task or encoding names.  The following tasks are
> supported, illustrated with example parameter values:
>
> encoding help
>
> provides a help summary.  The help parameter is assumed if no other
> valid task name is entered.
>
> encoding default
>
> provides the default language and encoding of the computer, e.g.,
> en-us cp1252
>
> which means U.S. English using code page 1252.
>
> encoding show *.txt
>
> provides the encoding of all files meeting the *.txt specification.
> If a file has a Unicode byte order mark (BOM), the encoding can be
> exactly determined.  Otherwise, the encoding is heuristically detected
> by analyzing various factors.  This is the same algorithm used by the
> Firefox web browser to detect the encoding of text.  It is usually
> correct, but not always.
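
The BOM check described above can be sketched with the BOM constants in
Python's standard codecs module.  This is only a minimal Python 3
illustration, not the utility's actual code; the function name
encoding_from_bom is hypothetical, and the heuristic fallback (done by
the third-party chardet package, per the Development Notes) is left out.

```python
import codecs

# Check the longer marks first: the UTF-32 LE mark (FF FE 00 00)
# begins with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def encoding_from_bom(data):
    """Return the encoding implied by a leading BOM, or None if absent."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

A result of None means the file has no BOM, which is the point where a
heuristic detector would have to take over.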
>
> encoding convert *.txt utf-8b
>
> converts all *.txt files to UTF-8 encoding with a BOM.  Use utf-8n to
> get utf-8 without a BOM, which is the norm on Linux and the Mac.  For
> ease of typing, the dash character (-) is optional, so utf8b or utf8n
> may be used instead.  Note that these are not official encoding names,
> but conventions to help clarify whether utf-8 is being encoded with or
> without a BOM.  Some Windows programs prefer one, while others do not.
>
> encoding convert *.txt utf8n utf8b
>
> converts *.txt files to UTF-8 with a BOM.  In this case, both a source
> and target encoding are specified.  Rather than being detected, the
> source encoding is treated as UTF-8 without a BOM.
>
> If the word 'backup' rather than 'convert' is used for the task, the
> original files will be backed up with the same names except for the
> addition of a .bak extension.
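
A conversion with an optional .bak copy might look roughly like the
Python 3 sketch below.  The function name convert_file and its
parameters are hypothetical and this is not the utility's own code.  I
am assuming that Python's standard codec names "utf-8" and "utf-8-sig"
play the roles of utf8n and utf8b respectively.

```python
import shutil


def convert_file(path, source_encoding, target_encoding, backup=False):
    """Re-encode a text file in place, optionally keeping a .bak copy."""
    if backup:
        # Same name plus a .bak extension, as described above.
        shutil.copy2(path, path + ".bak")
    # Binary mode keeps line endings exactly as they are in the file.
    with open(path, "rb") as handle:
        text = handle.read().decode(source_encoding)
    with open(path, "wb") as handle:
        handle.write(text.encode(target_encoding))
```

For example, convert_file("notes.txt", "utf-8", "utf-8-sig", backup=True)
would add a BOM to notes.txt and leave the original bytes in
notes.txt.bak.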
>
> encoding url http://python.org
>
> provides encoding information about the web page at that address.
> Encoding references are sought in the server response headers and meta
> data of the page.  A conflict between encoding references is reported.
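
The comparison between the server header and the page's meta data could
be sketched as follows in Python 3.  The function names are
hypothetical, the regular expressions are deliberately simple, and no
network access is done here: the Content-Type header value and the page
text are passed in as plain strings.

```python
import re


def charset_from_header(content_type):
    """Pull the charset out of a Content-Type header value, if present."""
    match = re.search(r"charset\s*=\s*[\"']?([\w.:-]+)",
                      content_type or "", re.I)
    return match.group(1).lower() if match else None


def charset_from_meta(html):
    """Pull the charset out of the first matching meta tag, if present."""
    match = re.search(r"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)",
                      html, re.I)
    return match.group(1).lower() if match else None


def encoding_report(content_type, html):
    """Return (header charset, meta charset, whether they conflict)."""
    header = charset_from_header(content_type)
    meta = charset_from_meta(html)
    conflict = header is not None and meta is not None and header != meta
    return header, meta, conflict
```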
>
> encoding bytes *.txt
>
> provides a list of numeric byte values, one per line, for all files
> matching the pattern.  The first line is the file name.  This is
> probably most useful when analyzing a single source file, and when
> redirecting standard output to another file that may be examined in an
> editor, e.g.,
> encoding bytes test.txt >temp.txt
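
The output layout described above (file name first, then one numeric
byte value per line) can be sketched in a few lines of Python 3.  The
function name byte_lines is hypothetical, and the exact formatting the
utility uses may differ.

```python
def byte_lines(path):
    """Return the file name followed by each byte value as a decimal string."""
    with open(path, "rb") as handle:
        data = handle.read()
    # Iterating over a bytes object in Python 3 yields integers 0-255.
    return [path] + [str(value) for value in data]
```

Printing "\n".join(byte_lines("test.txt")) would give a listing of that
general shape.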
>
> encoding chars temp.txt >test.txt
>
> provides output in a similar form except that each line shows
> information about a character rather than a byte (Unicode can
> represent a character with multiple bytes).  Each line has the Unicode
> name of the character, its numeric code point, and an ASCII equivalent
> of the character if available and different from the original
> character.  For example, the ellipsis symbol (…) has the code point
> U+2026 (8230 in decimal), and an ASCII equivalent of three consecutive
> periods (...), so it would appear as
> HORIZONTAL ELLIPSIS 8230 ...
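
A line of that form can be produced with Python's standard unicodedata
module, as in this Python 3 sketch.  The function name describe_char is
hypothetical.  The Development Notes say the utility uses the
third-party unidecode package for the ASCII equivalent; here I am
substituting NFKD normalization as a stand-in, which covers fewer
characters.

```python
import unicodedata


def describe_char(char):
    """Return 'NAME codepoint [ascii]' for a single character."""
    name = unicodedata.name(char, "UNNAMED")
    # Stand-in for unidecode: compatibility-decompose the character,
    # then drop anything that is still not ASCII.
    decomposed = unicodedata.normalize("NFKD", char)
    ascii_form = decomposed.encode("ascii", "ignore").decode("ascii")
    parts = [name, str(ord(char))]
    if ascii_form and ascii_form != char:
        parts.append(ascii_form)
    return " ".join(parts)


print(describe_char("\u2026"))
```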
>
>
> Add a SourceEncoding parameter to specify the file's encoding
> directly, rather than auto-detect it.
> ----------
>
> Development Notes
>
> The Encoding utility is developed with the Python 2.5 language from
> http://python.org
>
> The following built-in packages are used:  codecs, glob, locale, os,
> shutil, sys, and unicodedata.
>
> The following third-party packages are used:
>
> chardet -- Universal encoding detector
> http://chardet.feedparser.org
>
> encutils -- Encoding detection collection for Python
> http://cthedot.de/encutils/
>
> py2exe -- Build standalone executables for Windows
> http://py2exe.org
>
> unidecode -- Unicode transliteration in Python
> http://www.tablix.org/~avian/blog/archives/2009/01/unicode_transliteration_in_python/
>
> The batch file, RunSetup.bat, runs the py2exe script, setup.py, to
> create the stand-alone executable, Encoding.exe.
>
> I welcome feedback, suggestions, and code contributions, which will
> help this project improve over time.
>
> __________
> View the list's information and change your settings at
> //www.freelists.org/list/programmingblind
>
>
 

__________ Information from ESET NOD32 Antivirus, version of virus
signature
database 5379 (20100819) __________

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com
 
 
