[liblouis-liblouisxml] Re: Liblouis table header

  • From: "Michael Whapples" <dmarc-noreply@xxxxxxxxxxxxx> (Redacted sender "mwhapples@xxxxxxx" for DMARC)
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Mon, 20 Oct 2014 16:26:10 +0100

Hello,
Much of that sounds quite good.

I have some questions.
1. What if one is doing a partial search on a value (e.g. asking for (locale:en), which I might take to mean "return any English table regardless of country")? The reverse may also be desired, where a less specific match would be acceptable (e.g. (locale:fr_FR), but if that specific one cannot be matched then (locale:fr) will also be checked). Locales come to mind because this comes up in other things (e.g. Java applications choosing locale resource bundles), but other criteria may need this partial matching on values.
2. Where you mention doing matches with single keywords, why not just do as XPath's attribute matching does and check for a key regardless of its value, instead of having a separate tags field?
3. I imagine the API would have a "query for tables" function (i.e. I give it a set of key-value pairs and it gives me a table which matches). Some queries could match multiple tables (e.g. (grade:2)), so the function may need to return multiple tables. I think this would be better than just a single table (e.g. the first matching table), as then the application could present the options to the user.
4. Thinking back to question 1, I think the indicator value may be useful, even with the query for tables function (i.e. one could list the tables in order of best match). A function to check a named table against criteria may also be useful (i.e. if the application wants to take more control over table handling, e.g. caching query results).
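
To make question 1 concrete, here is a minimal sketch of both directions of partial locale matching. The function names `locale_matches` and `locale_candidates` are hypothetical and not part of liblouis. It assumes locales are written as subtags separated by `-` or `_`.

```python
def locale_matches(requested, table_locale):
    """Prefix match: (locale:en) accepts en, en-US, en-GB, ...

    Hypothetical helper; compares subtags case-insensitively.
    """
    req = requested.replace("_", "-").lower().split("-")
    tab = table_locale.replace("_", "-").lower().split("-")
    return tab[:len(req)] == req


def locale_candidates(requested):
    """Fallback chain: fr_FR is tried first, then fr."""
    parts = requested.replace("_", "-").split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
```

With these definitions, `locale_matches("en", "en-US")` is true while `locale_matches("en-US", "en")` is false, and `locale_candidates("fr_FR")` yields `["fr-FR", "fr"]`, which a caller could try in order.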

Michael Whapples
On 20/10/2014 15:49, Bert Frees wrote:
Hi all,

I want to bring this subject up again because we've been discussing it so many
times and I think it's about time we finally do something. To recap, we need to
develop a header format that can contain metadata about the table, and an API
for extracting metadata and querying tables.

Greg's proposal with the single-line comment on the first line was a good
start. But I'd like to have something a bit more flexible/extensible. I have
worked out something and I'd like to hear your opinions and suggestions.

For my DAISY Pipeline 2 work I have been nurturing the idea of selecting
translators based on some kind of "translator query". The use case is quite the
same as we have here for liblouis. The syntax I am proposing for DAISY Pipeline
is inspired by CSS media queries. A query is basically a list of key-value pairs
or keywords. I'm not proposing to use exactly the same syntax for liblouis, but
I believe we need something similar/mappable.

Let's take Greg's example:

     #afr#1#Afrikaans Uncontracted#za#Afrikaans ongekontrakteerde

It consists of five metadata fields. Three of them can be used for automatic
table selection; the other two are for pretty printing in graphical user
interfaces. Combining the two locale fields (language and country) into a single
tag, the corresponding CSS query would look like this:

     (locale:af-ZA) (grade:1)

The same key-value pairs could be put in liblouis table headers. I like the idea
of having the metadata in special comments, in order to assure backwards
compatibility of new tables with old library versions, and so that
implementations of the liblouis table format can choose whether or not they
support metadata.

A possible syntax could be `#+<KEY>: <value>`

     #+LOCALE: af-ZA
     #+GRADE: 1
     #+PRETTY_NAME: Afrikaans ongekontrakteerde
     #+PRETTY_NAME_EN: Afrikaans Uncontracted

(This is org-mode's syntax by the way.) The keywords would be
case-insensitive. Some keywords will be standard, such as LOCALE, GRADE and
PRETTY_NAME, but I wouldn't restrict the allowed keywords in any way, in order
to keep the system flexible.
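
As an illustration of how such a header might be read, here is a minimal sketch that collects `#+<KEY>: <value>` lines from the text of a table. The function name `parse_metadata` is hypothetical and this is not an existing liblouis API. It assumes keys consist of letters, digits and underscores, and it normalizes keys to upper case to reflect the case-insensitivity described above.

```python
import re

# Matches lines such as "#+LOCALE: af-ZA"; the key charset is an assumption.
_META_LINE = re.compile(r"^#\+([A-Za-z0-9_]+):\s*(.*?)\s*$")


def parse_metadata(table_text):
    """Return a dict of metadata fields found in special comments.

    Keys are upper-cased because keywords are case-insensitive.
    Ordinary comments and table rules are ignored.
    """
    metadata = {}
    for line in table_text.splitlines():
        match = _META_LINE.match(line.strip())
        if match:
            key, value = match.groups()
            metadata[key.upper()] = value
    return metadata
```

Since unknown keys are simply stored, this sketch places no restriction on the allowed keywords.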

The CSS query syntax also allows single keywords, without a value. For example:

     (ueb) (grade:2)

This could be reflected in a `#+TAGS` field with a list of space-separated
"tags":

     #+LOCALE: en-US
     #+GRADE: 2
     #+TAGS: ueb
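
A minimal sketch of how a query such as `(ueb) (grade:2)` could be checked against these fields, assuming the metadata has already been read into a dict with upper-case keys. The function name `table_matches` and its parameters are hypothetical.

```python
def table_matches(metadata, pairs=None, keywords=()):
    """Check key-value pairs and single keywords against one table.

    metadata: dict with upper-case keys, e.g. {"GRADE": "2", "TAGS": "ueb"}
    pairs:    dict of required key-value pairs, e.g. {"grade": 2}
    keywords: iterable of tags that must appear in the TAGS field
    """
    tags = {tag.lower() for tag in metadata.get("TAGS", "").split()}
    for key, value in (pairs or {}).items():
        if metadata.get(key.upper(), "").lower() != str(value).lower():
            return False
    return all(keyword.lower() in tags for keyword in keywords)
```

For the header above, `table_matches(metadata, {"grade": 2}, ["ueb"])` would be true.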

I haven't really thought about the API yet. It would be nice if one could
provide a list of key-value pairs and/or single keywords, together with a table
path, and get a table name back. The keys could possibly be sorted by
importance, and the API could possibly return some kind of "matching quotient"
that indicates whether the table is a good match for the query or not.
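
To illustrate one possible shape for the "matching quotient", here is a minimal sketch in which earlier keys in the query weigh more than later ones. The weighting scheme, the function names `match_quotient` and `rank_tables`, and the table file names in the usage note are all assumptions made for the example, not an agreed design.

```python
def match_quotient(metadata, query):
    """Return a score between 0.0 and 1.0 for one table.

    query is a list of (key, value) pairs sorted by importance,
    most important first. With n pairs, the first has weight n
    and the last has weight 1 (an arbitrary choice).
    """
    if not query:
        return 1.0
    n = len(query)
    total = 0
    matched = 0
    for rank, (key, value) in enumerate(query):
        weight = n - rank
        total += weight
        if metadata.get(key.upper(), "").lower() == str(value).lower():
            matched += weight
    return matched / total


def rank_tables(tables, query):
    """Return names of tables with a non-zero score, best match first."""
    scored = [(match_quotient(meta, query), name) for name, meta in tables.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for score, name in scored if score > 0]
```

Given hypothetical tables `en-us-g2.ctb` (en-US, grade 2) and `en-us-g1.ctb` (en-US, grade 1), the query `[("locale", "en-US"), ("grade", "2")]` would rank the grade 2 table first and the grade 1 table second, since the latter still matches the more important locale key.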


Thoughts?
Bert
For a description of the software, to download it and links to
project pages go to http://www.abilitiessoft.com
