Re: File sort order

On Tue, 05 Oct 2010 09:07:31 -0400
"Prof. John C Nash" <nashjc@xxxxxxxxxx> wrote:

> While the "clever" sorting might be useful to some, I have many hundreds of
> files in many directories, so a way to use a predictable, if clunky, sort
> order would be preferred for me. The case that brought this up cost me an
> hour looking for an important file that was already there, but in a place no
> indexer would ever think of. Finally found it using command line 'ls'.
>
> Is there anyone willing to collaborate with me to see if
> g_utf8_collate_key_for_filename() can be given an option to do that?

Glib's mechanism essentially wraps string normalisation, followed by a call
to the [g]libc function wcsxfrm() or (if __STDC_ISO_10646__ is not defined)
to strxfrm(). Both of those are locale-dependent.
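
As a rough illustration of the strxfrm() end of that chain (a sketch only, not
glib's code: it skips the normalisation step and the wide-character path, and
the helper names make_collate_key() and compare_by_key() are made up for this
example):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a locale-dependent sort key for s with
 * strxfrm(). The caller frees the result; NULL means allocation failed. */
char *make_collate_key(const char *s)
{
    assert(s != NULL);
    /* With a size of 0, strxfrm() only reports the key length (without NUL). */
    size_t need = strxfrm(NULL, s, 0);
    char *key = malloc(need + 1);
    if (key != NULL)
        strxfrm(key, s, need + 1);
    return key;
}

/* Hypothetical helper: compare two names through their keys.
 * The sign of the result follows the strcmp() convention. */
int compare_by_key(const char *a, const char *b)
{
    char *ka = make_collate_key(a);
    char *kb = make_collate_key(b);
    /* Fall back to a plain byte comparison if a key could not be built. */
    int result = (ka != NULL && kb != NULL) ? strcmp(ka, kb) : strcmp(a, b);
    free(ka);
    free(kb);
    return result;
}
```

With no setlocale() call the program runs in the "C" locale, where the keys
compare bytewise, so "jnrv101004.tex" sorts before "jnrv2010a.tex". After
setlocale (LC_COLLATE, "") the same calls follow whatever your locale's
collation data says.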

The 'filename' variant parses the string into clumps of chars, processes those,
then joins the results. Your data suggest that there's a problem with the way
that's working. Essentially, the 4-digit number 2010 is incorrectly ordered
before the longer number 101004. Using g_utf8_collate_key() gives a different
ordering, matching the mc listing in your original message (quoted below).
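
To make the "clumps" idea concrete, here is a hand-rolled comparison that
treats each run of digits as one number. It is only a guess at the general
approach, not what glib does internally (glib builds a key string rather than
comparing names directly), and natural_compare() is a made-up name:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare two names, treating every run of digits as a
 * single number. All other characters are compared as plain bytes.
 * Returns -1, 0 or 1. */
int natural_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            /* Skip leading zeros so "007" and "7" have the same value. */
            while (*a == '0') a++;
            while (*b == '0') b++;
            /* Measure what is left of each digit run. */
            size_t la = 0, lb = 0;
            while (isdigit((unsigned char)a[la])) la++;
            while (isdigit((unsigned char)b[lb])) lb++;
            /* Without leading zeros, the longer run is the larger number. */
            if (la != lb)
                return la < lb ? -1 : 1;
            /* Equal length: digit-by-digit order equals numeric order. */
            int c = strncmp(a, b, la);
            if (c != 0)
                return c < 0 ? -1 : 1;
            a += la;
            b += lb;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    if (*a == *b)
        return 0;               /* both names ended together */
    return *a == '\0' ? -1 : 1; /* the shorter name sorts first */
}
```

With this, natural_compare ("jnrv2010a.tex", "jnrv101004.tex") is negative
because 2010 < 101004 as numbers, whereas plain strcmp() puts jnrv101004.tex
first, as in your mc listing.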

So a solution for you might be to revert to that, by editing file 
e2_filelist.c. At about line 813 ...
                if (caseignore)
                {
                        freeme = g_utf8_casefold (buf[FILENAME], -1);
#ifdef USE_GTK2_8
                        buf[NAMEKEY] = g_utf8_collate_key_for_filename (freeme, -1);
#else
                        buf[NAMEKEY] = g_utf8_collate_key (freeme, -1);
#endif
                        g_free (freeme);
                }
                else
#ifdef USE_GTK2_8
                        buf[NAMEKEY] = g_utf8_collate_key_for_filename (buf[FILENAME], -1);
#else
                        buf[NAMEKEY] = g_utf8_collate_key (buf[FILENAME], -1);
#endif

becomes 

                if (caseignore)
                {
                        freeme = g_utf8_casefold (buf[FILENAME], -1);
                        buf[NAMEKEY] = g_utf8_collate_key (freeme, -1);
                        g_free (freeme);
                }
                else
                        buf[NAMEKEY] = g_utf8_collate_key (buf[FILENAME], -1);


Otherwise, I suspect you'd need to persuade the [g]libc maintainers and/or your 
locale's LC_COLLATE data maintainers that there's a problem. Good luck with 
that ...


> Or else a switch in emelfm2. Afraid I rarely use C or C++, but am fairly
> good at digging around and trying things.
> 
> JN
> 
> 
> On 10/05/2010 03:40 AM, tpgww@xxxxxxxxxxx wrote:
> > On Mon, 04 Oct 2010 19:26:26 -0400
> > "Prof. John C Nash"<nashjc@xxxxxxxxxx>  wrote:
> >
> >> This is an issue that is not unique to emelfm2. It also afflicts
> >> gnome-commander and I'd guess some other apps. It does not affect ls.
> >>
> >> I'm running Ubuntu Lucid 10.04.1 (Gnome). I have a directory with a number
> >> of files. Here are a few of the names:
> >>
> >> jnrv_07312010.pdf
> >> jnrv101004.tex
> >> jnrv20100811.ps
> >> jnrv2010a.tex
> >>
> >> But the sorted pane for this directory (and in Gnome-Commander) is
> >>
> >> jnrv2010a.tex
> >> jnrv101004.tex
> >> jnrv20100811.ps
> >> jnrv_07312010.pdf
> >>
> >> Does anyone know of documentation of this? I suspect some strange
> >> collating sequence rules that have been embedded somewhere that will be
> >> awkward to find. In this case, there is really no problem, but I came
> >> across this in a directory with several hundred files and couldn't find
> >> one I'd just moved into the directory, at least not for a while.
> >
> > Name-sorting relies on comparing strings created with glib functions - by 
> > default, g_utf8_collate_key(), or if built with support for gtk>= 2.8, then 
> > g_utf8_collate_key_for_filename().
> >
> > According to glib documentation, they both "depend on the current locale", 
> > but the latter treats the dot '.' as a special case, and treats numbers 
> > intelligently so that "file1" "file10" "file5" are sorted as "file1" 
> > "file5" "file10".
> >
> > Notwithstanding that claim, quite some time ago we investigated reported
> > sorting anomalies by playing with the inherent collation-string creation
> > process, but eventually concluded there's nothing reasonably practicable to
> > be done to improve it.
> >
> > Regards
> > Tom
> >
> >>
> >> For completeness, mc gets the ordering
> >>
> >> jnrv101004.tex
> >> jnrv20100811.ps
> >> jnrv2010a.tex
> >> jnrv_07312010.pdf
> >>
> >> which increases my suspicions about collate sequence etc.
> >>
> >>
> >> JN
> >>
> >>
> >> --
> >> Users can unsubscribe from the list by sending email to 
> >> emelfm2-request@xxxxxxxxxxxxx with 'unsubscribe' in the subject field or 
> >> by logging into the web interface.
> >
> >
> 
> 

