[liblouis-liblouisxml] Re: Liblouis table header

From: Bert Frees <bertfrees@xxxxxxxxx>
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Date: Tue, 21 Oct 2014 16:45:25 +0200

Michael Whapples writes:

> To me special handling seems a bit like something we should not do, or 
> at least unless we have to.

Hmm, I don't quite agree with you here. I wanted a mechanism that is extensible,
but that doesn't mean we have to make it 100% generic. There's nothing wrong
with a little special handling. There's only a handful of metadata fields that
actually matter anyway and only a part of them need special attention. The goal
is not to be as generic as possible, but to be as simple as possible (especially
for the users and the client applications). And at the same time extensible.

Some reasons for special handling are:

1. "Fuzzy" matching of locales. This is what we discussed before:
   
       #+LOCALE:en-US matches the query "locale:en"
   
   Locales could be automatically normalized, so that underscores and hyphens
   are equivalent:
   
       #+LOCALE:en-US matches the query "locale:en_US"

2. Searching with "special" keys that are not explicitly used in table headers,
   for example "min-xxx" can be used to set a lower boundary on a feature:
   
       #+GRADE:3 matches the query "min-grade:2"
   
   Or, LOCALE could be automatically split into LANGUAGE, COUNTRY and VARIANT:
   
       #+LOCALE:en-US matches the query "country:US"
   
3. For some features you'll want to define a default value (other than
   "unspecified"), and not every feature would have the same default.
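
The first two kinds of special handling above could be sketched roughly as
follows. This is only an illustration in Python, not an existing liblouis API:
the function names are hypothetical, and the rule that a more specific query
does not match a less specific table is an assumption.

```python
def normalize_locale(tag):
    """Make underscores and hyphens equivalent and fix the case of the parts."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    # Assumption: a two-letter second part is a country code and is upper-cased.
    rest = [p.upper() if len(p) == 2 and p.isalpha() else p for p in parts[1:]]
    return "-".join([language] + rest)


def locale_matches(table_locale, query_locale):
    """'Fuzzy' match: the query locale must be a prefix of the table locale."""
    table_parts = normalize_locale(table_locale).split("-")
    query_parts = normalize_locale(query_locale).split("-")
    return table_parts[:len(query_parts)] == query_parts


def split_locale(tag):
    """Derive the special keys language, country and variant from a locale."""
    parts = normalize_locale(tag).split("-")
    fields = {"language": parts[0]}
    if len(parts) > 1:
        fields["country"] = parts[1]
    if len(parts) > 2:
        fields["variant"] = "-".join(parts[2:])
    return fields


def min_matches(table_value, lower_bound):
    """A 'min-xxx' query sets a lower boundary on a numeric feature."""
    return int(table_value) >= int(lower_bound)


print(locale_matches("en-US", "en"))      # query "locale:en"
print(locale_matches("en-US", "en_US"))   # query "locale:en_US"
print(min_matches("3", "2"))              # query "min-grade:2"
print(split_locale("en-US")["country"])   # query "country:US"
```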

> So for the locale example where any locale for a language is acceptable, 
> you could use wild cards (eg. (locale:fr*) which can match fr, fr-FR, 
> fr_CA, etc).

I would like to avoid wildcards if possible. We can bring in all kinds of
generic and powerful stuff such as regexes and glob patterns but for me it's all
overkill. Apart from the locale issue, I don't think wildcards would have many
other use cases.

> However doing locale handling in the application, such as fallback 
> locales (eg. when searching for en_GB, being able to fallback to en) may 
> be the simplest option. The wild card example above would still be 
> needed for giving a general language though (it could get too much to 
> expect applications to do multiple queries for every possible locale for 
> a given language and it may miss one).
>
> I think wildcards solve another issue, whilst reducing special handling. 
> The issue of tags/keywords being handled differently.
>
> Under my plan it would be that keywords are simply keys with no value:
> #+eub:

OK, this could work, although I think it's a bit counter-intuitive. It would
mean that selecting a table with some tag would be semantically the same as
selecting a table with an empty value for some key. More intuitive would be that
in a query, the following are equivalent:

    "(<key>)" == "not (<key>:<default-value>)"

where "not" would be a hypothetical logical NOT, and where the default value
could be anything, not necessarily the empty value (e.g. "false" or "nil" or
"undefined").
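
A small Python sketch of that equivalence, assuming tables are represented as
plain dictionaries. The DEFAULTS table, the "unspecified" fallback and the
function names are hypothetical and only serve the illustration.

```python
# Hypothetical per-key defaults; keys not listed fall back to "unspecified".
DEFAULTS = {"computer": "false"}
UNSPECIFIED = "unspecified"


def default_of(key):
    return DEFAULTS.get(key, UNSPECIFIED)


def value_of(table, key):
    """Value of a key in a table, or the key's default if the table omits it."""
    return table.get(key, default_of(key))


def has_key(table, key):
    """The query "(<key>)", read as "not (<key>:<default-value>)"."""
    return value_of(table, key) != default_of(key)


print(has_key({"computer": "true"}, "computer"))
print(has_key({"locale": "fr"}, "computer"))
```

With this reading, a table that sets a key explicitly to its default value is
treated the same as a table that does not mention the key at all.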

> By using wildcards should you want to match all values rather than empty 
> values use a search like (locale:*). OK, that does slightly differ from 
> XPath's attribute handling.
>
> May be there is a question of whether * matches no value, may be it does 
> and may be we need a 1 or more wildcard (IE. +).
>
> I have not worked out a way to make localised values (eg. for 
> pretty-name) not need special handling, but I will give it thought. I 
> would not want pretty-name to be a special case incase there are other 
> keys which need localising in the future.


James Teh writes:

> As I've noted in the past, I'm very much in favour of something like 
> this in principle. I like your proposal. It's probably the most 
> versatile and extensible so far. However, it's always the tricky details 
> that we never quite manage to iron out.
>
> I have two major concerns:
>
> 1. I don't think the concept of grade applies well to all braille codes. 
> For example, grade 1 in English is uncontracted and grade 2 is 
> contracted. I believe there are other languages where grade 1 is 
> actually contracted. I think we need to come up with a concept that 
> applies more universally; e.g. computer braille, uncontracted, 
> contracted, special purpose. Of course, a given code might have more 
> than one table in each category, so the "grade" concept might still be 
> useful, but perhaps not as a primary means of searching.

You've made that clear in the past and I agree. The way I see it, we have two
options.

1. We try to come up with a universal definition of "grade", apply that
   definition to all our tables, and make sure all our users understand the new
   definition.

2. We accept the fact that grade means something else in each language. We use a
   new concept (computer vs. uncontracted vs. contracted) to categorize the
   tables, but we combine it with the language-specific grade concept as an
   alternative or more differentiating selection criterion. If somebody wants to
   select a table with a certain level of contraction (in a language that has
   more than one table for contracted braille) we can safely assume that person
   is familiar with the definition of grade in that language.

> 2. Some tables cover multiple locales; e.g. UEB would probably specify 
> multiple locales. Nothing in your proposal prohibits this, but it needs 
> to be taken into account as far as searching goes.

UEB would still only cover English though, right?

We could solve this either by allowing more than one locale in the table
header, or by making "table aliases", e.g. en-AU-g2.ctb could look like this:
    
    #+LOCALE: en-AU
    #+GRADE: 2
    include en-UEB-g2.ctb
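
To show how a client could read such an alias, here is a minimal Python sketch.
It assumes the "#+KEY: value" header syntax used in the examples of this
thread, which is still only a proposal. The parse_header function is
hypothetical, and whether metadata of the included table is inherited is left
open.

```python
def parse_header(text):
    """Collect "#+KEY: value" metadata and "include" lines from a table."""
    metadata = {}
    includes = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#+"):
            # A key without a colon (e.g. "#+COMPUTER") gets an empty value.
            key, _, value = line[2:].partition(":")
            metadata[key.strip().lower()] = value.strip()
        elif line.startswith("include "):
            includes.append(line.split(None, 1)[1])
    return metadata, includes


alias = """#+LOCALE: en-AU
#+GRADE: 2
include en-UEB-g2.ctb
"""

metadata, includes = parse_header(alias)
print(metadata)
print(includes)
```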

> 3. Minor, subjective point: Can we have display_name or friendly_name 
> instead of pretty_name? :)

Sure. Or nice_name?

> 4. It would certainly be good to take localisation of 
> pretty/friendly/display names into account as you have suggested. This 
> eliminates the need for every project to localise these itself (making 
> for wasteful duplication of effort) as is currently the case. Going 
> forward, one concern would be how to get these names localised. While 
> NVDA and other projects have translators, these use more standard 
> formats and have established workflows. We'd need to find some way to 
> make it easy for existing translators to provide localised table names.

As I said before, we're trying to cover two completely different use cases here,
discovery vs. localization, and I'm really starting to think we should take care
of localization in a whole different way. Localization strings shouldn't have
any influence on table selection results. This constrains the matching function
a lot unless you treat nice-name as a special keyword.

To illustrate the requirement above: if two tables match a query, I think you'd
want the table with the fewest features not included in the query (and not having
the default value) to be the first match. Let's take the following two tables:

table_1:

             #+LOCALE: fr
             #+COMPUTER
             #+NICENAME[fr]: ...

and table_2:

             #+LOCALE: fr
             #+NICENAME[fr]: ...
             #+NICENAME[en]: ...
             #+NICENAME[de]: ...

For the query "(locale:fr)" the expected result would be table_2 first, then
table_1, and not the other way around. This means the NICENAME fields must be
ignored.
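
The ordering described above can be sketched in Python like this. The
dictionary representation of the two tables, the "unspecified" default and the
function names are assumptions made for the illustration, not an existing
implementation.

```python
UNSPECIFIED = "unspecified"
DEFAULTS = {}  # hypothetical per-key defaults, empty in this example

tables = {
    "table_1": {"locale": "fr", "computer": "", "nicename[fr]": "..."},
    "table_2": {"locale": "fr", "nicename[fr]": "...",
                "nicename[en]": "...", "nicename[de]": "..."},
}


def extra_features(table, query, ignored_prefixes=("nicename",)):
    """Count features not in the query and not at their default value."""
    count = 0
    for key, value in table.items():
        if key.startswith(ignored_prefixes):
            continue  # localization strings must not influence the ranking
        if key in query or value == DEFAULTS.get(key, UNSPECIFIED):
            continue
        count += 1
    return count


def rank(tables, query, ignored_prefixes=("nicename",)):
    """Names of matching tables, fewest extra features first."""
    matching = [name for name, table in tables.items()
                if all(table.get(k) == v for k, v in query.items())]
    return sorted(matching,
                  key=lambda n: extra_features(tables[n], query, ignored_prefixes))


print(rank(tables, {"locale": "fr"}))
# Counting the NICENAME fields as features would reverse the order:
print(rank(tables, {"locale": "fr"}, ignored_prefixes=()))
```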

> 5. It's also worth noting that putting the localised names in the table 
> file would mean a fairly significant (potentially 50+) number of 
> key/value pairs in each table. Might this cause performance problems?

Right, that is a potential problem I've thought about too.


Larry Skutchan writes:

> It might also be useful to have a way to specify a table type. Types might
> include literary, math, and music.

Right, good idea. Although I still think tables for math and music really don't
belong in this library, and that they should move to liblouisxml (assuming you
mean MathML and MusicXML of course). But that's another topic!


Ken Perry writes:

> I wonder though if this would not be better done as an extra xml file like an
> index card sort of like how a daisy book is done.  For example the xml file
> could have all the info for a certain set of tables. It could also list all
> the tables that are included in the grouping for example for ueb it could have
> all the latten files and the braille pattern file listed as files to include
> if you want contracted ueb.  Thus the xml tool could be used not only as a
> pretty way to show the name of the tables but also a way to group sets of
> tables together so you know what tables are affected when you make changes.
> An index card could also show relationships between different tables.  And you
> wouldn't have to have info at the top of every table just a card for each
> language or braille type.

We have considered the option of a central static database before, but I believe
in the end the consensus was that the distributed/dynamic approach was the
best. (Take a look at the archives.)

The idea of showing relationships between tables is interesting. But maybe
that's a whole other subject?


Keith Creasy writes:

> As an XML fan I like the idea but really since LibLouis doesn't use XML
> anywhere else it makes more sense to me to keep it as a simple key/value
> table. Why add XML dependencies to something that has no compelling need to
> use it?

Yeah, I agree. It doesn't make the idea of a central database invalid, but I
think it does rule out the option of using XML to implement it.


Sorry for the long read :(
Bert