[liblouis-liblouisxml] Re: Liblouis table header

  • From: "Michael Whapples" <dmarc-noreply@xxxxxxxxxxxxx> (Redacted sender "mwhapples@xxxxxxx" for DMARC)
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Mon, 20 Oct 2014 20:12:41 +0100

Hello,
To me, special handling seems like something we should avoid unless we really have to.

So for the locale example, where any locale for a language is acceptable, you could use wildcards, e.g. (locale:fr*), which can match fr, fr-FR, fr_CA, etc.
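
A minimal sketch of how such a wildcard could be evaluated, assuming shell-style globbing as provided by Python's fnmatch module. The function name locale_matches and the sample locales are only illustrative and not part of liblouis.

```python
from fnmatch import fnmatchcase

def locale_matches(pattern, locale):
    """Return True if a table's locale matches a glob-style query pattern."""
    def normalize(tag):
        # Assumption: "-" and "_" are equivalent separators and case is ignored.
        return tag.replace("-", "_").lower()
    return fnmatchcase(normalize(locale), normalize(pattern))

table_locales = ["fr", "fr-FR", "fr_CA", "en_GB"]
matches = [loc for loc in table_locales if locale_matches("fr*", loc)]
print(matches)
```

Note that a plain fr* would also match any other code that merely begins with "fr", so a real implementation might prefer matching on whole subtags.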

However, doing locale handling in the application, such as fallback locales (e.g. when searching for en_GB, being able to fall back to en), may be the simplest option. The wildcard example above would still be needed for asking for a language in general, though: expecting applications to run a separate query for every possible locale of a given language would be too much, and they might miss one.
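
As an illustration of application-side fallback, here is a rough sketch. The helper names and the table index are hypothetical; a real application would build the index from whatever table metadata the library exposes.

```python
def fallback_chain(locale):
    """Yield the locale followed by progressively less specific variants."""
    parts = locale.replace("-", "_").split("_")
    for end in range(len(parts), 0, -1):
        yield "_".join(parts[:end])

def find_table(locale, tables_by_locale):
    """Return the first table whose locale appears in the fallback chain."""
    for candidate in fallback_chain(locale):
        if candidate in tables_by_locale:
            return tables_by_locale[candidate]
    return None

# Hypothetical index mapping locales to table file names.
tables = {"en": "example-en.tbl", "fr_FR": "example-fr-fr.tbl"}
print(find_table("en_GB", tables))
```

This only walks from specific to general (en_GB, then en); going the other way, from a language to all of its locales, is where the wildcard query is still needed.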

I think wildcards solve another issue while reducing special handling: the issue of tags/keywords being handled differently from key-value pairs.

Under my plan, keywords would simply be keys with no value:
#+ueb:
With wildcards, if you want to match all values rather than only empty values, use a search like (locale:*). OK, that does differ slightly from XPath's attribute handling.

Maybe there is a question of whether * matches no value. Maybe it does, and maybe we need a one-or-more wildcard (i.e. +).
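
To make the open question concrete, here is a small sketch of one possible reading, in which "*" also matches an empty value and "+" requires at least one character. The function name and the dictionary layout are assumptions for illustration only.

```python
def value_matches(pattern, value):
    """Match a header value against a literal, "*" (anything) or "+" (non-empty)."""
    if pattern == "*":
        return True          # matches even a key with no value
    if pattern == "+":
        return value != ""   # requires at least one character
    return pattern == value  # plain literal comparison

# "#+ueb:" is modelled as a key with an empty value.
header = {"ueb": "", "locale": "en-US"}
print(value_matches("*", header["ueb"]), value_matches("+", header["ueb"]))
```

Under this reading (ueb:*) finds the keyword, while (locale:+) only finds tables that actually declare a locale value.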

I have not worked out a way to make localised values (e.g. for pretty-name) not need special handling, but I will give it thought. I would not want pretty-name to be a special case, in case there are other keys which need localising in the future.

Michael Whapples
On 20/10/2014 19:51, Bert Frees wrote:
Hammer Attila writes:

Bert, this is a good idea in my opinion.
Currently, in the Orca screen reader for example, some liblouis table names are
marked for translation on the Orca side, but many tables are not.
If in the future Orca requests the table list from the louis Python3 binding
and the table list function returns the localized table name, Joanie
will be able to fill the contraction table combo box with translated
table names.
If I understand your examples correctly, the localized table name is of course
only presented when the matching system locale is in use.
For example, when I use the Hungarian locale and, as in your example, the
Afrikaans table's pretty-name is "Afrikaans ongekontrakteerde", would it be
possible to translate the Afrikaans table name for the Hungarian locale as
"afrikai", or would the English table name simply be presented when I select
a table from the contraction table combo box in the Orca preferences
dialog?
Would it be possible to extend the pretty_name tag to a pretty-name[locale] variant?
Then the header could carry translations in the following style:
pretty-name[af]="Afrikaans ongekontrakteerde"
pretty-name[hu]="afrikai (irodalmi)"
These are only examples.

`pretty-name` could be treated as a special metadata field with a special API
call associated, named something like "get_localized_pretty_name(char* table,
char* locale)". For mapping locales to strings in the table header, I had
proposed to use keywords of the form "#+pretty-name-hu", but your idea
"#+pretty-name[hu]" would work equally fine and looks a bit better.

I've now come to realize that we're trying to cover two completely different use
cases here, namely "table discovery" vs. "table name localization". Although
they are both related to metadata, I wonder if it's such a good idea to mix the
two.
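
As a rough sketch of the localization side only, the lookup suggested above for `pretty-name` could work along these lines. It assumes header fields of the form "#+pretty-name[hu]: value" and takes the header lines directly rather than a table name, so the signature differs from the C call mentioned above; none of this is an existing liblouis API.

```python
import re

# Assumed field shape: "#+key: value" or "#+key[locale]: value".
FIELD = re.compile(r"^#\+([\w-]+)(?:\[([\w-]+)\])?:\s*(.*)$")

def get_localized_pretty_name(header_lines, locale):
    """Return the pretty-name for a locale, falling back to language, then default."""
    names = {}
    for line in header_lines:
        m = FIELD.match(line.strip())
        if m and m.group(1).lower() == "pretty-name":
            names[m.group(2) or ""] = m.group(3)
    language = locale.replace("-", "_").split("_")[0]
    for key in (locale, language, ""):
        if key in names:
            return names[key]
    return None

header = [
    "#+locale: af-ZA",
    "#+pretty-name: Afrikaans Uncontracted",
    "#+pretty-name[af]: Afrikaans ongekontrakteerde",
    "#+pretty-name[hu]: afrikai (irodalmi)",
]
print(get_localized_pretty_name(header, "hu_HU"))
```

Keeping this lookup separate from the query mechanism would reflect the split between table discovery and table name localization.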

Michael Whapples writes:

Hello,
Much of that sounds quite good.

I have some questions.
1. What if one is doing a partial search of a value (eg. If asking for
(locale:en) which I might take to mean return any English table
regardless of country). The reverse may also be desired, where a less
specific match would be acceptable (eg. (locale:fr_FR) but if that
specific one cannot be matched then (locale:fr) will also be checked).
Locales come to mind because this comes up in other things (eg. Java
applications choosing locale resource bundles), but other criteria may
need this partial matching on values.

Good point. I would say we treat `locale` as a special keyword and implement
some kind of fallback mechanism. E.g. when the query is "(locale:fr_FR)", tables
with locale "fr_FR" will get the most points, then "fr_FR_*" (a variant,
e.g. "fr_FR_1694acad"), then "fr", and then possibly "fr_*"
(e.g. "fr_CA"). Applications can still check the actual locale of a matching
table and decide to not consider it a match after all.
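
The ranking described above could be sketched as follows. The point values 4 to 1 are arbitrary placeholders chosen only to preserve the order (exact, variant, bare language, other country), and the function name is hypothetical.

```python
def locale_score(query, table_locale):
    """Score a table locale against a queried locale; higher is better, 0 is no match."""
    q = query.replace("-", "_")
    t = table_locale.replace("-", "_")
    language = q.split("_")[0]
    if t == q:
        return 4  # exact match, e.g. fr_FR
    if t.startswith(q + "_"):
        return 3  # variant of the queried locale, e.g. fr_FR_1694acad
    if t == language:
        return 2  # bare language, e.g. fr
    if t.startswith(language + "_"):
        return 1  # same language, different country, e.g. fr_CA
    return 0

candidates = ["fr_CA", "fr", "en_GB", "fr_FR_1694acad", "fr_FR"]
ranked = sorted((c for c in candidates if locale_score("fr_FR", c)),
                key=lambda c: locale_score("fr_FR", c), reverse=True)
print(ranked)
```

An application that does not want fr_CA for an fr_FR query can still inspect the returned locale and discard it.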

An alternative is for applications to make several query calls and implement the
fallback mechanism themselves. (To illustrate, in CSS, several media queries
can be combined in a comma separated list. If one or more of the queries match,
the whole list matches, otherwise not.)

We could also allow applications to override the "matching function" for a
particular keyword, although that's pretty advanced usage already and I would
rather keep it as simple as possible.

Yet another approach is to allow multiple locales in a table header. For
example, a translation table for Spanish braille could possibly also be used for
Catalan. There's no way an automatic fallback mechanism would cover this case.

2. Where you mention doing matches with single keywords, why not just be
like XPath's attribute matching, just check for a key regardless of its
value instead of a separate tags field?

The single keywords were meant for things that can't really have a value (apart
maybe from values that evaluate to true or false). For this reason I also wanted
to treat them differently from key-value pairs. I wanted to avoid things like
"(locale)" matching all tables that have a locale value. (In CSS media queries,
features, as they are called there, without a value actually kind of behave like
this. E.g. "(color)" means "(min-color: 1)". But this only really makes sense if
the required value type is an integer.)

But I also see where you're coming from. It might simplify things if we could
eliminate the special-purpose keyword "#+tags".

What if we make "(some-tag)" match tables with the field "#+some-tag" but not
tables with the field "#+some-tag: some-value"?
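
A small sketch of that rule, assuming a field with an empty value ("#+some-tag:") counts the same as a bare "#+some-tag". The helper names are made up for illustration.

```python
def parse_field(line):
    """Split '#+key' or '#+key: value' into (key, value); value is None if absent."""
    body = line.strip()[2:]  # drop the leading "#+"
    key, _, value = body.partition(":")
    return key.strip().lower(), value.strip() or None

def tag_matches(tag, header_lines):
    """True only if the header has the field and the field carries no value."""
    fields = dict(parse_field(line) for line in header_lines
                  if line.strip().startswith("#+"))
    tag = tag.lower()
    return tag in fields and fields[tag] is None

print(tag_matches("some-tag", ["#+some-tag"]))
print(tag_matches("some-tag", ["#+some-tag: some-value"]))
```

With this rule "(locale)" would not match a table that declares "#+locale: fr", which avoids the behaviour described above.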

3. I imagine the API would have a query for tables function (IE. I give
it a set of key, value pairs and it gives me a table which matches).
There may be queries which could give multiple tables (eg. (grade:2)) so
the function may return multiple tables. I think this would be better
than just a single table (eg. the first matching table) as then the
application could present the options to the user.

OK, makes perfect sense.

4. Thinking back to question 1, I think the indicator value may be
useful, even with the query for tables function (IE. one could list the
tables in order of best match). Also a function to check a named table
against criteria may also be useful (IE. if the application wants to
take more control over table handling, eg. caching query results).

OK.

On 20/10/2014 15:49, Bert Frees wrote:
Hi all,

I want to bring this subject up again because we've been discussing it so many
times and I think it's about time we finally do something. To recap, we need to
develop a header format that can contain metadata about the table, and an API
for extracting metadata and querying tables.

Greg's proposal with the single-line comment on the first line was a good
start. But I'd like to have something a bit more flexible/extensible. I have
worked out something and I'd like to have your opinions and suggestions.

For my DAISY Pipeline 2 work I have been nurturing the idea of selecting
translators based on some kind of "translator query". The use case is quite the
same as we have here for liblouis. The syntax I am proposing for DAISY Pipeline
is inspired by CSS media queries. A query is basically a list of key-value pairs
or keywords. I'm not proposing to use exactly the same syntax for liblouis, but
I believe we need something similar/mappable.

Let's take Greg's example:

      #afr#1#Afrikaans Uncontracted#za#Afrikaans ongekontrakteerde

It consists of 5 metadata fields. 3 of them can be used for automatic table
selection. The other two are for pretty printing in graphical user
interfaces. Combining the two locale fields (language and country) into a single
tag, the corresponding CSS query would look like this:

      (locale:af-ZA) (grade:1)

The same key-value pairs could be put in liblouis table headers. I like the idea
of having the metadata in special comments, in order to ensure backwards
compatibility of new tables with old library versions, and so that
implementations of the liblouis table format can choose whether or not they
support metadata.

A possible syntax could be `#+<KEY>: <value>`

      #+LOCALE: af-ZA
      #+GRADE: 1
      #+PRETTY_NAME: Afrikaans ongekontrakteerde
      #+PRETTY_NAME_EN: Afrikaans Uncontracted

(This is org-mode's syntax by the way.) The keywords would be
case-insensitive. Some keywords will be standard, such as LOCALE, GRADE and
PRETTY_NAME, but I wouldn't restrict the allowed keywords in any way, in order
to keep the system flexible.
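
To show how little parsing such a header would need, here is a minimal sketch that collects `#+<KEY>: <value>` lines into a dictionary with upper-cased keys. The regular expression and the sample table text are assumptions; the exact set of characters allowed in a key has not been decided.

```python
import re

# Assumed key characters: letters, digits, "_" and "-".
METADATA = re.compile(r"^#\+([A-Za-z0-9_-]+):\s*(.*)$")

def parse_header(text):
    """Collect '#+KEY: value' lines; keys are upper-cased to make them case-insensitive."""
    metadata = {}
    for line in text.splitlines():
        m = METADATA.match(line.strip())
        if m:
            metadata[m.group(1).upper()] = m.group(2).strip()
    return metadata

sample = """\
#+LOCALE: af-ZA
#+grade: 1
#+Pretty_Name: Afrikaans ongekontrakteerde
# an ordinary comment, ignored
include some-other-table
"""
print(parse_header(sample))
```

Because every metadata line starts with "#", a library version that knows nothing about metadata would still read the table as before.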

The CSS query syntax also allows single keywords, without a value. For example:

      (ueb) (grade:2)

This could be reflected in a `#+TAGS` field with a list of space-separated
"tags":

      #+LOCALE: en-US
      #+GRADE: 2
      #+TAGS: ueb
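
A rough sketch of how a query in this syntax might be checked against such a header. It assumes the header has already been read into a dictionary with lower-case keys; parse_query and table_matches are hypothetical names.

```python
import re

def parse_query(query):
    """Parse '(key:value) (keyword)' into a dict of pairs and a set of keywords."""
    pairs, keywords = {}, set()
    for item in re.findall(r"\(([^()]*)\)", query):
        key, sep, value = item.partition(":")
        if sep:
            pairs[key.strip().lower()] = value.strip()
        else:
            keywords.add(key.strip().lower())
    return pairs, keywords

def table_matches(query, metadata):
    """True if every pair equals the header value and every keyword is in the tags."""
    pairs, keywords = parse_query(query)
    tags = set(metadata.get("tags", "").lower().split())
    return (all(metadata.get(k) == v for k, v in pairs.items())
            and keywords <= tags)

metadata = {"locale": "en-US", "grade": "2", "tags": "ueb"}
print(table_matches("(ueb) (grade:2)", metadata))
```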

I haven't really thought about the API yet. It would be nice if one could
provide a list of key-value pairs and/or single keywords, together with a table
path, and get a table name back. The keys could possibly be sorted by
importance, and the API could possibly return some kind of "matching quotient"
that indicates whether the table is a good match for the query or not.
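
One way such a "matching quotient" could be computed is sketched below. The linear weighting (first key weighs most) is purely an assumption to illustrate the idea of keys sorted by importance, and the function name is made up.

```python
def match_quotient(query, metadata, tags=frozenset()):
    """Return a score in [0, 1]; earlier query entries weigh more.

    query is a list of (key, value) tuples; value is None for a single keyword.
    """
    weights = range(len(query), 0, -1)  # e.g. 3, 2, 1 for three entries
    total = sum(weights)
    earned = 0
    for (key, value), weight in zip(query, weights):
        if value is None:
            hit = key in tags                  # single keyword
        else:
            hit = metadata.get(key) == value   # key-value pair
        if hit:
            earned += weight
    return earned / total if total else 1.0

query = [("locale", "en-US"), ("grade", "2"), ("ueb", None)]
print(match_quotient(query, {"locale": "en-US", "grade": "2"}, {"ueb"}))
print(match_quotient(query, {"locale": "en-US", "grade": "1"}))
```

An application could then list candidate tables in descending order of this value and let the user pick.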


Thoughts?
Bert
For a description of the software, to download it and links to
project pages go to http://www.abilitiessoft.com

