Re: File sort order

Thanks for this. The change appears to be in emelfm2 code, so it doesn't
require upstream politics.

I'll give it a try (probably in a few weeks) and, if successful, post back to
see whether it's worth putting into the common code set.

JN


On 10/06/2010 01:34 AM, tpgww@xxxxxxxxxxx wrote:
On Tue, 05 Oct 2010 09:07:31 -0400
"Prof. John C Nash"<nashjc@xxxxxxxxxx>  wrote:

While the "clever" sorting might be useful to some, I have many hundreds of
files in many directories, so a way to use a predictable, if clunky, sort order
would be preferred for me. The case that brought this up cost me an hour
looking for an important file that was already there, but in a place no indexer
would ever think of. Finally found it using command line 'ls'.

Is there anyone willing to collaborate with me to see if
g_utf8_collate_key_for_filename() can be given an option to do that?

Glib's mechanism essentially wraps string normalisation, followed by calls to
the [g]libc function wcsxfrm() or (if __STDC_ISO_10646__ is not defined)
strxfrm(). Both of those are locale-dependent.
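
To make the strxfrm() step concrete, here is a minimal sketch in plain C,
without glib. The names make_sort_key() and compare_by_key() are made up for
illustration; the real glib code also normalises the string first and may use
wcsxfrm() instead, so treat this as an approximation of the idea only.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of glib or emelfm2): build a sort key for
   `name` with strxfrm(), so that strcmp() on two keys orders them the way
   strcoll() would order the original strings in the current locale.
   Returns a malloc'd string, or NULL on allocation failure. */
char *make_sort_key(const char *name)
{
    /* With a size of 0, strxfrm() only reports the length it needs. */
    size_t need = strxfrm(NULL, name, 0) + 1;
    char *key = malloc(need);
    if (key != NULL)
        strxfrm(key, name, need);
    return key;
}

/* Compare two names through their keys; the sign of the result follows
   the strcmp() convention. Falls back to strcmp() if allocation fails. */
int compare_by_key(const char *a, const char *b)
{
    char *ka = make_sort_key(a);
    char *kb = make_sort_key(b);
    int result = (ka != NULL && kb != NULL) ? strcmp(ka, kb) : strcmp(a, b);
    free(ka);
    free(kb);
    return result;
}
```

In the "C" locale the keys compare byte by byte, so jnrv101004.tex comes before
jnrv2010a.tex. After setlocale (LC_COLLATE, "") the result depends on the
locale's collation data.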

The 'filename' variant parses the string into clumps of chars, processes those,
then joins the results. Your data suggest that there's a problem with the way
that's working. Essentially, the 4-digit number 2010 is incorrectly ordered
before the longer number 101004. Using g_utf8_collate_key() instead gives a
different ordering, in accord with your original mc example (below).

So a solution for you might be to revert to that, by editing file 
e2_filelist.c. At about line 813 ...
                if (caseignore)
                {
                        freeme = g_utf8_casefold (buf[FILENAME], -1);
#ifdef USE_GTK2_8
                        buf[NAMEKEY] = g_utf8_collate_key_for_filename (freeme, -1);
#else
                        buf[NAMEKEY] = g_utf8_collate_key (freeme, -1);
#endif
                        g_free (freeme);
                }
                else
#ifdef USE_GTK2_8
                        buf[NAMEKEY] = g_utf8_collate_key_for_filename (buf[FILENAME], -1);
#else
                        buf[NAMEKEY] = g_utf8_collate_key (buf[FILENAME], -1);
#endif

becomes

                if (caseignore)
                {
                        freeme = g_utf8_casefold (buf[FILENAME], -1);
                        buf[NAMEKEY] = g_utf8_collate_key (freeme, -1);
                        g_free (freeme);
                }
                else
                        buf[NAMEKEY] = g_utf8_collate_key (buf[FILENAME], -1);


Otherwise, I suspect you'd need to persuade the [g]libc maintainers and/or your 
locale's LC_COLLATE data maintainers that there's a problem. Good luck with 
that ...


Or else a switch in emelfm2. I'm afraid I rarely use C or C++, but I am fairly
good at digging around and trying things.

JN


On 10/05/2010 03:40 AM, tpgww@xxxxxxxxxxx wrote:
On Mon, 04 Oct 2010 19:26:26 -0400
"Prof. John C Nash"<nashjc@xxxxxxxxxx>   wrote:

This is an issue that is not unique to emelfm2. It also afflicts
gnome-commander and, I'd guess, some other apps. It does not affect ls.

I'm running Ubuntu Lucid 10.04.1 (Gnome). I have a directory with a number of
files. Here are a few of the names:

jnrv_07312010.pdf
jnrv101004.tex
jnrv20100811.ps
jnrv2010a.tex

But the sorted pane for this directory (and in Gnome-Commander) is

jnrv2010a.tex
jnrv101004.tex
jnrv20100811.ps
jnrv_07312010.pdf

Does anyone know of documentation of this? I suspect some strange collating
sequence rules that have been embedded somewhere that will be awkward to find.
In this case, there is really no problem, but I came across this in a directory
with several hundred files and couldn't find one I'd just moved into the
directory, at least not for a while.

Name-sorting relies on comparing strings created with glib functions - by
default, g_utf8_collate_key(), or if built with support for gtk >= 2.8, then
g_utf8_collate_key_for_filename().

According to glib documentation, they both "depend on the current locale", but
the latter treats the dot '.' as a special case, and treats numbers
intelligently so that "file1" "file10" "file5" are sorted as "file1" "file5"
"file10".

Notwithstanding that claim, quite some time ago we investigated reported
sorting anomalies by playing with the inherent collation-string creation
process, but eventually concluded there's nothing reasonably practicable to be
done to improve it.

Regards
Tom


For completeness, mc gets the ordering

jnrv101004.tex
jnrv20100811.ps
jnrv2010a.tex
jnrv_07312010.pdf

which increases my suspicions about collate sequence etc.


JN


--
Users can unsubscribe from the list by sending email to 
emelfm2-request@xxxxxxxxxxxxx with 'unsubscribe' in the subject field or by 
logging into the web interface.



