Re: filelist - sort order

  • From: <tpgww@xxxxxxxxxxx>
  • To: emelfm2@xxxxxxxxxxxxx
  • Date: Tue, 14 Apr 2009 21:20:09 +1000

On Tue, 14 Apr 2009 08:34:03 +0200
Liviu Andronic <landronimirc@xxxxxxxxx> wrote:

> On Sat, Apr 11, 2009 at 6:52 AM,  <tpgww@xxxxxxxxxxx> wrote:
> > I can't comment on the others, but e2 uses glib functionality which, 
> > according to API information, "compares strings for ordering using the 
> > linguistically correct rules for the current locale" but with special 
> > treatment of any ".". Maybe it's that special treatment which is
> > misbehaving?
> >
> It seems there are issues with other characters:
> "R-help archive May 2004: Re: [R] Export summary statistics to latex_01.mht
> R help archive: Re: [R] Export to LaTeX.mht
> Ricci-refcard-regression.pdf
> Ricci-refcard-ts_1.pdf
> R-intro.pdf
> Rivera-Tutorial_Sweave.pdf
> R-lang.pdf
> R_language.pdf
> Rmanual.pdf
> Rnews_2004-2.pdf
> Rnews_2005-1.pdf
> Rnews_2006-2.pdf
> Rnews_2007-2.pdf
> Rnews_2007-3.pdf
> Rnews_2008-1.pdf
> [R-pkgs] Rcmdr 1.3-0 and RcmdrPlugins.TeachingDemos.mht
> R: Principal Component Analysis - dudi.pca.mht
> R_relative_statpack.pdf
> [R] RODBC fail install.html
> R tips & tricks.R
> rv.pdf
> rwiki-graphics-export.html.mht"
> 
> "R help*" and "R tips*" are separated by a big bunch of documents.
> "R-intro.pdf" is between "Ricci-refcard*" and "Rivera-Tutorial*".
> Many non-"R-*" docs land between "R-help*" and "R-intro*".
> The same goes for "R_language*" and "R_relative*".
> 
> I couldn't say what would be the right way to sort all this mess, but
> as it stands it doesn't seem right. However, the terminal orders them
> similarly:
> liviu@localhost /tmp/test $ ls
> R-help archive May 2004: Re: [R] Export summary statistics to latex_01.mht
> R help archive: Re: [R] Export to LaTeX.mht
> Ricci-refcard-regression.pdf
> Ricci-refcard-ts_1.pdf
> R-intro.pdf
> Rivera-Tutorial_Sweave.pdf
> R-lang.pdf
> R_language.pdf
> Rmanual.pdf
> Rnews_2004-2.pdf
> Rnews_2005-1.pdf
> Rnews_2006-2.pdf
> Rnews_2007-2.pdf
> Rnews_2007-3.pdf
> Rnews_2008-1.pdf
> [R-pkgs] Rcmdr 1.3-0 and RcmdrPlugins.TeachingDemos.mht
> R: Principal Component Analysis - dudi.pca.mht
> R_relative_statpack.pdf
> [R] RODBC fail install.html
> R tips & tricks.R
> rv.pdf
> rwiki-graphics-export.html.mht
> 
> But Thunar does not (see attached). Personally, I'd probably expect
> Thunar-style sorting.

This is from glib source:

  /*
   * How it works:
   *
   * Split the filename into collatable substrings which do
   * not contain [.0-9] and special-cased substrings. The collatable 
   * substrings are run through the normal g_utf8_collate_key() and the 
   * resulting keys are concatenated with keys generated from the 
   * special-cased substrings.
   *
   * Special cases: Dots are handled by replacing them with '\1' which 
   * implies that short dot-delimited substrings are before long ones, 
   * e.g.
   * 
   *   a\1a   (a.a)
   *   a-\1a  (a-.a)
   *   aa\1a  (aa.a)
   * 
   * Numbers are handled by prepending to each number d-1 superdigits 
   * where d = number of digits in the number and SUPERDIGIT is a 
   * character with an integer value higher than any digit (for instance 
   * ':'). This ensures that single-digit numbers are sorted before 
   * double-digit numbers which in turn are sorted separately from 
   * triple-digit numbers, etc. To avoid strange side-effects when 
   * sorting strings that already contain SUPERDIGITs, a '\2'
   * is also prepended, like this
   *
   *   file\21      (file1)
   *   file\25      (file5)
   *   file\2:10    (file10)
   *   file\2:26    (file26)
   *   file\2::100  (file100)
   *   file:foo     (file:foo)
   * 
   * This has the side-effect of sorting numbers before everything else (except
   * dots), but this is probably OK.
   *
   * Leading digits are ignored when doing the above. To discriminate
   * numbers which differ only in the number of leading digits, we append
   * the number of leading digits as a byte at the very end of the collation
   * key.
   *
   * To try avoid conflict with any collation key sequence generated by libc we
   * start each switch to a special cased part with a sentinel that hopefully
   * will sort before anything libc will generate.
   */
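
To make the dot and number handling concrete, here is a rough Python sketch of
only the rewriting step that the comment describes. It is not glib's code:
sketch_key is a made-up name, and the sketch skips the g_utf8_collate_key()
pass, the sentinel, and the trailing leading-digit byte. It simply compares the
rewritten strings by code point.

```python
import re

SUPERDIGIT = ":"  # any character that sorts after '9'; ':' as in the comment


def sketch_key(name: str) -> str:
    """Rewrite dots and digit runs roughly as the glib comment describes.

    Assumption: everything else is left untouched, whereas glib would run
    those parts through g_utf8_collate_key().
    """
    def rewrite(match: re.Match) -> str:
        token = match.group(0)
        if token == ".":
            return "\x01"                      # dots become '\1'
        # '\2', then d-1 superdigits, then the digits themselves
        return "\x02" + SUPERDIGIT * (len(token) - 1) + token

    return re.sub(r"\.|[0-9]+", rewrite, name)


names = ["file100", "file:foo", "file26", "file5", "file10", "file1"]
ordered = sorted(names, key=sketch_key)
for name in ordered:
    print(name)
```

Sorting by this key gives file1, file5, file10, file26, file100, file:foo,
which is the order shown in the comment's example.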

I haven't investigated whether this algorithm is truly effective, or whether
the results we see actually match what it describes. Maybe the problem lies in
splitting the name into 'collatable substrings' and then concatenating their
keys? If anyone has time to make sense of this, consider filing a glib bug
report. For example, strings that _begin_ with a number could perhaps have an
extra '/' prepended, i.e. 111 -> /\2111, to sort them correctly?

The backend seems to be built around wcsxfrm() or strxfrm() from [g]libc.
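
As a small illustration of how much the locale matters at that level, the
sketch below sorts a few of the names from Liviu's listing using strxfrm() keys
in the "C" locale, via Python's locale module. In the "C" locale strxfrm()
ordering is plain character-code order, so the space, '-' and '_' are all
significant. Rules for other locales may give punctuation and spaces little or
no weight on the first comparison pass, which would be consistent with the ls
listing quoted above, though that is an assumption here and has not been
checked against Liviu's locale.

```python
import locale

names = [
    "Rmanual.pdf",
    "R_language.pdf",
    "R tips & tricks.R",
    "Ricci-refcard-ts_1.pdf",
    "R-intro.pdf",
    "R help archive: Re: [R] Export to LaTeX.mht",
    "R-lang.pdf",
]

previous = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
locale.setlocale(locale.LC_COLLATE, "C")        # the "C" locale always exists
c_order = sorted(names, key=locale.strxfrm)     # strxfrm() result as sort key
locale.setlocale(locale.LC_COLLATE, previous)   # put the old setting back

for name in c_order:
    print(name)
```

Here the two "R ..." names end up adjacent at the top, followed by the "R-..."
names, then "R_language.pdf", and only then the names with a letter in second
position.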

Regards
Tom

