Michael Whapples writes:

> To me special handling seems a bit like something we should not do, or
> at least unless we have to.

Hmm, I don't quite agree with you here. I wanted a mechanism that is extensible, but that doesn't mean we have to make it 100% generic. There's nothing wrong with a little special handling. There are only a handful of metadata fields that actually matter anyway, and only some of them need special attention. The goal is not to be as generic as possible, but to be as simple as possible (especially for the users and the client applications). And at the same time extensible.

Some reasons for special handling are:

1. "Fuzzy" matching of locales. This is what we discussed before:

     #+LOCALE:en-US   matches the query   "locale:en"

   Locales could be automatically normalized, so that underscores and hyphens are equivalent:

     #+LOCALE:en-US   matches the query   "locale:en_US"

2. Searching with "special" keys that are not explicitly used in table headers, for example "min-xxx" can be used to set a lower boundary on a feature:

     #+GRADE:3   matches the query   "min-grade:2"

   Or, LOCALE could be automatically split into LANGUAGE, COUNTRY and VARIANT:

     #+LOCALE:en-US   matches the query   "country:US"

3. For some features you'll want to define a default value (other than "unspecified"), and not every feature would have the same default.

> So for the locale example where any locale for a language is acceptable,
> you could use wild cards (eg. (locale:fr*) which can match fr, fr-FR,
> fr_CA, etc).

I would like to avoid wildcards if possible. We can bring in all kinds of generic and powerful stuff such as regexes and glob patterns, but for me it's all overkill. Apart from the locale issue, I don't think wildcards would have many other use cases.

> However doing locale handling in the application, such as fallback
> locales (eg. when searching for en_GB, being able to fallback to en) may
> be the simplest option.
> The wild card example above would still be
> needed for giving a general language though (it could get too much to
> expect applications to do multiple queries for every possible locale for
> a given language and it may miss one).
>
> I think wildcards solve another issue, whilst reducing special handling.
> The issue of tags/keywords being handled differently.
>
> Under my plan it would be that keywords are simply keys with no value:
> #+eub:

OK, this could work, although I think it's a bit counter-intuitive. It would mean that selecting a table with some tag would be semantically the same as selecting a table with an empty value for some key. More intuitive would be that in a query, the following are equivalent:

    "(<key>)" == "not (<key>:<default-value>)"

where "not" would be a hypothetical logical NOT, and where the default value could be anything, not necessarily the empty value (e.g. "false" or "nil" or "undefined").

> By using wildcards should you want to match all values rather than empty
> values use a search like (locale:*). OK, that does slightly differ from
> XPath's attribute handling.
>
> May be there is a question of whether * matches no value, may be it does
> and may be we need a 1 or more wildcard (IE. +).
>
> I have not worked out a way to make localised values (eg. for
> pretty-name) not need special handling, but I will give it thought. I
> would not want pretty-name to be a special case incase there are other
> keys which need localising in the future.

James Teh writes:

> As I've noted in the past, I'm very much in favour of something like
> this in principle. I like your proposal. It's probably the most
> versatile and extensible so far. However, it's always the tricky details
> that we never quite manage to iron out.
>
> I have two major concerns:
>
> 1. I don't think the concept of grade applies well to all braille codes.
> For example, grade 1 in English is uncontracted and grade 2 is
> contracted.
> I believe there are other languages where grade 1 is
> actually contracted. I think we need to come up with a concept that
> applies more universally; e.g. computer braille, uncontracted,
> contracted, special purpose. Of course, a given code might have more
> than one table in each category, so the "grade" concept might still be
> useful, but perhaps not as a primary means of searching.

You've made that clear in the past and I agree. The way I see it, we have two options.

1. We try to come up with a universal definition of "grade", apply that definition to all our tables, and make sure all our users understand the new definition.

2. We accept the fact that grade means something different in each language. We use a new concept (computer vs. uncontracted vs. contracted) to categorize the tables, but we combine it with the language-specific grade concept as an alternative or more differentiating selection criterion. If somebody wants to select a table with a certain level of contraction (in a language that has more than one table for contracted braille) we can safely assume that person is familiar with the definition of grade in that language.

> 2. Some tables cover multiple locales; e.g. UEB would probably specify
> multiple locales. Nothing in your proposal prohibits this, but it needs
> to be taken into account as far as searching goes.

UEB would still only cover English though, right? We could solve this either by allowing more than one locale in the table header, or by making "table aliases", e.g. en-AU-g2.ctb could look like this:

    #+LOCALE: en-AU
    #+GRADE: 2
    include en-UEB-g2.ctb

> 3. Minor, subjective point: Can we have display_name or friendly_name
> instead of pretty_name? :)

Sure. Or nice_name?

> 4. It would certainly be good to take localisation of
> pretty/friendly/display names into account as you have suggested. This
> eliminates the need for every project to localise these itself (making
> for wasteful duplication of effort) as is currently the case.
> Going forward, one concern would be how to get these names localised. While
> NVDA and other projects have translators, these use more standard
> formats and have established workflows. We'd need to find some way to
> make it easy for existing translators to provide localised table names.

As I said before, we're trying to cover two completely different use cases here, discovery vs. localization, and I'm really starting to think we should take care of localization in a whole different way. Localization strings shouldn't have any influence on table selection results. This constrains the matching function a lot unless you treat nice-name as a special keyword.

To illustrate the requirement above: if two tables match a query, I think you'd want the table with the fewest features not included in the query (and not having the default value) to be the first match. Let's take the following two tables:

table_1:

    #+LOCALE: fr
    #+COMPUTER
    #+NICENAME[fr]: ...

and table_2:

    #+LOCALE: fr
    #+NICENAME[fr]: ...
    #+NICENAME[en]: ...
    #+NICENAME[de]: ...

For the query "(locale:fr)" the expected result would be table_2 first, then table_1, and not the other way around. This means the NICENAME fields must be ignored.

> 5. It's also worth noting that putting the localised names in the table
> file would mean a fairly significant (potentially 50+) number of
> key/value pairs in each table. Might this cause performance problems?

Right, that is a potential problem I've thought about too.

Larry Skutchan writes:

> It might also be useful to have a way to specify a table type. Types might
> include literary, math, and music.

Right, good idea. Although I still think tables for math and music really don't belong in this library, and that they should move to liblouisxml (assuming you mean MathML and MusicXML of course). But that's another topic!

Ken Perry writes:

> I wonder though if this would not be better done as an extra xml file like an
> index card sort of like how a daisy book is done.
> For example the xml file
> could have all the info for a certain set of tables. It could also list all
> the tables that are included in the grouping for example for ueb it could have
> all the latten files and the braille pattern file listed as files to include
> if you want contracted ueb. Thus the xml tool could be used not only as a
> pretty way to show the name of the tables but also a way to group sets of
> tables together so you know what tables are affected when you make changes.
> An index card could also show relationships between different tables. And you
> wouldn't have to have info at the top of every table just a card for each
> language or braille type.

We have considered the option of a central static database before, but I believe in the end the consensus was that the distributed/dynamic approach was the best. (Take a look at the archives.) The idea of showing relationships between tables is interesting. But maybe that's a whole other subject?

Keith Creasy writes:

> As an XML fan I like the idea but really since LibLouis doesn't use XML
> anywhere else it makes more sense to me to keep it as a simple key/value
> table. Why add XML dependencies to something that has no compelling need to
> use it?

Yeah, I agree. It doesn't make the idea of a central database invalid, but I think it does rule out the option of using XML to implement it.

Sorry for the long read :(

Bert

For a description of the software, to download it and links to project pages go to http://www.abilitiessoft.com
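
The matching rules discussed in this message (locale normalization and prefix matching, "min-" keys, and ranking by fewest extra features while ignoring NICENAME fields) can be illustrated with a rough sketch. This is not liblouis code: the function names, the dictionary representation of table headers, and the lower-cased keys are all assumptions made for illustration, and default values are left out for brevity.

```python
# Illustrative sketch only; every name here is hypothetical, not a liblouis API.
# A table header is modelled as a dict of lower-cased keys to string values.

def normalize_locale(value):
    """Treat underscores and hyphens as equivalent and ignore case."""
    return value.replace("_", "-").lower()


def locale_matches(table_locale, query_locale):
    """'en-US' matches the queries 'en' and 'en_US', but not 'fr'."""
    table_parts = normalize_locale(table_locale).split("-")
    query_parts = normalize_locale(query_locale).split("-")
    return table_parts[:len(query_parts)] == query_parts


def feature_matches(table, key, value):
    """Apply the special handling for 'locale' and 'min-xxx' keys."""
    key = key.lower()
    if key == "locale":
        return "locale" in table and locale_matches(table["locale"], value)
    if key.startswith("min-"):
        feature = key[len("min-"):]
        return feature in table and int(table[feature]) >= int(value)
    return table.get(key) == value


def extra_features(table, query):
    """Count table features not named in the query, ignoring NICENAME fields."""
    queried = set()
    for key in query:
        key = key.lower()
        queried.add(key[len("min-"):] if key.startswith("min-") else key)
    return sum(
        1 for key in table
        if key not in queried and not key.startswith("nicename")
    )


def find_tables(tables, query):
    """Return names of matching tables, fewest extra features first."""
    matches = [
        name for name, table in tables.items()
        if all(feature_matches(table, k, v) for k, v in query.items())
    ]
    return sorted(matches, key=lambda name: extra_features(tables[name], query))


# The two example tables from the message.
TABLES = {
    "table_1": {"locale": "fr", "computer": "", "nicename[fr]": "..."},
    "table_2": {
        "locale": "fr",
        "nicename[fr]": "...",
        "nicename[en]": "...",
        "nicename[de]": "...",
    },
}

print(find_tables(TABLES, {"locale": "fr"}))
```

With the two example tables, the query for locale "fr" ranks table_2 ahead of table_1, because table_1 carries one extra feature (COMPUTER) and the NICENAME fields are not counted.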