Re: filelist - sort order
- From: <tpgww@xxxxxxxxxxx>
- To: emelfm2@xxxxxxxxxxxxx
- Date: Tue, 14 Apr 2009 21:20:09 +1000
On Tue, 14 Apr 2009 08:34:03 +0200
Liviu Andronic <landronimirc@xxxxxxxxx> wrote:
> On Sat, Apr 11, 2009 at 6:52 AM, <tpgww@xxxxxxxxxxx> wrote:
> > I can't comment on the others, but e2 uses glib functionality which,
> > according to API information, "compares strings for ordering using the
> > linguistically correct rules for the current locale" but with special
> > treatment of any ".". Maybe it's that special treatment which is
> > mis-behaving ?
> >
> It seems there are issues with other characters:
> "R-help archive May 2004: Re: [R] Export summary statistics to latex_01.mht
> R help archive: Re: [R] Export to LaTeX.mht
> Ricci-refcard-regression.pdf
> Ricci-refcard-ts_1.pdf
> R-intro.pdf
> Rivera-Tutorial_Sweave.pdf
> R-lang.pdf
> R_language.pdf
> Rmanual.pdf
> Rnews_2004-2.pdf
> Rnews_2005-1.pdf
> Rnews_2006-2.pdf
> Rnews_2007-2.pdf
> Rnews_2007-3.pdf
> Rnews_2008-1.pdf
> [R-pkgs] Rcmdr 1.3-0 and RcmdrPlugins.TeachingDemos.mht
> R: Principal Component Analysis - dudi.pca.mht
> R_relative_statpack.pdf
> [R] RODBC fail install.html
> R tips & tricks.R
> rv.pdf
> rwiki-graphics-export.html.mht"
>
> "R help*" and "R tips*" are separated by a big bunch of documents.
> "R-intro.pdf" is between "Ricci-refcard*" and "Rivera-Tutorial*".
> Many non-"R-*" docs fall between "R-help*" and "R-intro*".
> The same goes for "R_language*" and "R_relative*".
>
> I couldn't say what would be the right way to sort all this mess, but
> as it stands it doesn't seem right. However, the terminal orders them
> similarly:
> liviu@localhost /tmp/test $ ls
> R-help archive May 2004: Re: [R] Export summary statistics to latex_01.mht
> R help archive: Re: [R] Export to LaTeX.mht
> Ricci-refcard-regression.pdf
> Ricci-refcard-ts_1.pdf
> R-intro.pdf
> Rivera-Tutorial_Sweave.pdf
> R-lang.pdf
> R_language.pdf
> Rmanual.pdf
> Rnews_2004-2.pdf
> Rnews_2005-1.pdf
> Rnews_2006-2.pdf
> Rnews_2007-2.pdf
> Rnews_2007-3.pdf
> Rnews_2008-1.pdf
> [R-pkgs] Rcmdr 1.3-0 and RcmdrPlugins.TeachingDemos.mht
> R: Principal Component Analysis - dudi.pca.mht
> R_relative_statpack.pdf
> [R] RODBC fail install.html
> R tips & tricks.R
> rv.pdf
> rwiki-graphics-export.html.mht
>
> But Thunar does not (see attached). Personally, I'd probably expect
> Thunar-style sorting.
This is from the glib source:
/*
* How it works:
*
* Split the filename into collatable substrings which do
* not contain [.0-9] and special-cased substrings. The collatable
* substrings are run through the normal g_utf8_collate_key() and the
* resulting keys are concatenated with keys generated from the
* special-cased substrings.
*
* Special cases: Dots are handled by replacing them with '\1' which
* implies that short dot-delimited substrings are before long ones,
* e.g.
*
* a\1a (a.a)
* a-\1a (a-.a)
* aa\1a (aa.a)
*
* Numbers are handled by prepending to each number d-1 superdigits
* where d = number of digits in the number and SUPERDIGIT is a
* character with an integer value higher than any digit (for instance
* ':'). This ensures that single-digit numbers are sorted before
* double-digit numbers which in turn are sorted separately from
* triple-digit numbers, etc. To avoid strange side-effects when
* sorting strings that already contain SUPERDIGITs, a '\2'
* is also prepended, like this
*
* file\21 (file1)
* file\25 (file5)
* file\2:10 (file10)
* file\2:26 (file26)
* file\2::100 (file100)
* file:foo (file:foo)
*
* This has the side-effect of sorting numbers before everything else (except
* dots), but this is probably OK.
*
* Leading digits are ignored when doing the above. To discriminate
* numbers which differ only in the number of leading digits, we append
* the number of leading digits as a byte at the very end of the collation
* key.
*
* To try avoid conflict with any collation key sequence generated by libc we
* start each switch to a special cased part with a sentinel that hopefully
* will sort before anything libc will generate.
*/
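To make the description in that comment easier to follow, here is a rough
Python sketch of the key construction. It is not glib's code: the names
sketch_key, DOT, NUM_PREFIX and SUPERDIGIT are made up for illustration, the
collatable substrings are left untransformed instead of being passed through
g_utf8_collate_key(), the sentinel is omitted, and "leading digits" is read
as "leading zeros", which is an assumption.

```python
import re

DOT = "\x01"         # replaces '.', as in the quoted comment
NUM_PREFIX = "\x02"  # prepended to every number
SUPERDIGIT = ":"     # any character that sorts after '9'

# A token is a single dot, a run of ASCII digits, or a run of anything else.
_TOKEN = re.compile(r"\.|[0-9]+|[^.0-9]+")


def sketch_key(name):
    """Build a simplified sort key following the quoted description."""
    parts = []
    tail = []  # leading-zero counts, appended at the very end of the key
    for token in _TOKEN.findall(name):
        if token == ".":
            parts.append(DOT)
        elif token[0] in "0123456789":
            digits = token.lstrip("0") or "0"  # keep one digit for "000"
            zeros = len(token) - len(digits)
            # d-1 superdigits for a d-digit number, so longer numbers sort later
            parts.append(NUM_PREFIX + SUPERDIGIT * (len(digits) - 1) + digits)
            if zeros:
                tail.append(chr(zeros))
        else:
            # Assumption: glib would use a locale collation key here.
            parts.append(token)
    return "".join(parts) + "".join(tail)


names = ["file100", "file26", "file5", "file10", "file1", "file:foo"]
print(sorted(names, key=sketch_key))
```

With plain code-point comparison of these keys, the numbered names come out
in numeric order and "file:foo" sorts after them, as in the comment's example.
In real use the result also depends on how the locale collates the
non-numeric parts, which this sketch leaves out.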
I haven't investigated whether this algorithm is truly effective, or whether
the actual results match it. Maybe the problem lies in treating the
'collatable substrings' separately and then concatenating their keys? If
anyone has time to make sense of this, consider posting a glib bug report.
For example, strings that _begin_ with a number could have an extra '/'
prepended, 111 >> (/\2111), to sort them correctly?
The backend seems to be built around wcsxfrm() or strxfrm() from [g]libc.
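One way to check how much of the ordering comes from libc alone is to sort a
few of the names with the libc transform directly. The sketch below uses
Python's locale.strxfrm(), which wraps the C library's transform function;
sort_in_locale is a made-up helper name, and the "C" locale is used only
because it is always available.

```python
import locale


def sort_in_locale(names, locale_name):
    """Sort names by libc collation keys for the given locale."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    locale.setlocale(locale.LC_COLLATE, locale_name)
    try:
        return sorted(names, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore it


names = [
    "Rivera-Tutorial_Sweave.pdf",
    "R-intro.pdf",
    "Ricci-refcard-ts_1.pdf",
    "Rmanual.pdf",
]
print(sort_in_locale(names, "C"))
```

In the "C" locale this is a plain byte-order sort, so "R-intro.pdf" comes
first. Passing the locale actually in use (it must be installed, otherwise
setlocale() raises locale.Error) may instead put "R-intro.pdf" between the
"Ricci" and "Rivera" names as in the listing above, depending on the libc
version and the locale's collation rules.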
Regards
Tom