[pedevel] infile settings

  • From: Rainer Riedl <mlist@xxxxxxxxx>
  • To: pedevel@xxxxxxxxxxxxx
  • Date: Tue, 30 May 2006 19:48:19 +0200

Hi,

let's see if there's still someone listening here :-)

I have the problem that I often work with files in encodings other than 
UTF-8 that are mounted via CIFS. So of course there aren't any attributes, 
and pe always needs to be told which encoding to use after loading. That 
really annoyed me, and I was looking for a quick solution other than writing 
some encoding detection, which doesn't seem to be a trivial task.

So I came up with the idea of putting the setting anywhere in the first 
line of the file. For example, the following file would be loaded with 
ISO-8859-15 encoding:

        <?php # [pe:ENC=ISO-8859-15]
        ...

The format is case insensitive, and all encodings supported by pe are 
allowed; just replace spaces with "-". If there's interest, the format could 
be extended for other settings later, e.g.

        [pe:ENC=DOS-437,TAB=4]

could set the tab width to 4 spaces.
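
To make the extended format a bit more concrete, here is a rough sketch of how 
such a tag could be split into its settings. It is not the code from pe: it 
uses plain std::string instead of BString so it can be tried anywhere, the 
name ParseInfileSettings is made up for illustration, and treating a file 
without any line break as a single line is my own assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper (not part of pe): extract the KEY=VALUE pairs from a
// "[pe:KEY=VALUE,KEY=VALUE]" tag found anywhere in the first line of `text`.
// Keys and values are upper-cased, since the format is case insensitive.
// Returns an empty map when no complete tag is present.
std::map<std::string, std::string> ParseInfileSettings(const std::string& text)
{
    std::map<std::string, std::string> settings;

    // Only the first line is searched; a file without any line break
    // is treated as one single line (assumption).
    std::string line = text.substr(0, text.find_first_of("\r\n"));
    std::transform(line.begin(), line.end(), line.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });

    const std::string::size_type start = line.find("[PE:");
    if (start == std::string::npos)
        return settings;
    const std::string::size_type end = line.find(']', start);
    if (end == std::string::npos)
        return settings;

    // Split the text between "[PE:" and "]" at commas, then at the first '='.
    const std::string body = line.substr(start + 4, end - (start + 4));
    std::string::size_type pos = 0;
    while (pos <= body.size()) {
        std::string::size_type comma = body.find(',', pos);
        if (comma == std::string::npos)
            comma = body.size();
        const std::string item = body.substr(pos, comma - pos);
        const std::string::size_type eq = item.find('=');
        if (eq != std::string::npos && eq > 0)
            settings[item.substr(0, eq)] = item.substr(eq + 1);
        pos = comma + 1;
    }
    return settings;
}
```

With the example tag, the returned map would hold "ENC" and "TAB" as keys. The 
ENC value would still have to be compared against the encoding names known 
to pe (with spaces replaced by "-"), and unknown keys could simply be ignored.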

Please tell me what you think! If you like it I will try to find out how 
commit works in svn and put the changes into the tree where it can stay 
until a real encoding detection comes up.



The changed code is in CDocIO.cpp, if someone wants to give it a try:


static int32 DetermineEncoding(const BString& str)
{
        /* HACK: Get the first line and see if there's something like 
          "[PE:ENC=<encoding>]" in there. <encoding> is the name of a
          supported encoding with spaces replaced by "-", e.g. "ISO-8859-15" */
        int32                   pos;
        BString                 line;
        int32                   enc_id = -1;
        BString                 enc_name;
        CEncodingRoster enc_roster;

        // Get the first line
        if ((pos = str.FindFirst('\n')) != B_ERROR ||
            (pos = str.FindFirst('\r')) != B_ERROR)
        {
                str.CopyInto(line, 0, pos);
                // Cut down to the beginning of the magic identifier, if present
                if ((pos = line.IFindFirst("[PE:")) != B_ERROR)
                {
                        line.Remove(0, pos+4);
                        // Find end of settings and cut the rest off
                        if ((pos = line.FindFirst(']')) != B_ERROR)
                        {
                                line.Remove(pos, line.Length()-pos);
                                // Check supported encodings
                                // (so far no other settings allowed)
                                while (enc_roster.IsSupportedEncoding(++enc_id))
                                {
                                        enc_name = enc_roster.EncodingNameByIdx(enc_id);
                                        enc_name.ReplaceAll(' ', '-');
                                        enc_name.Prepend("ENC=");
                                        if (line.ICompare(enc_name) == 0)
                                        {
                                                return enc_id;
                                        }
                                }
                        }
                }
        }
        return B_UNICODE_UTF8;
}


