[dokuwiki] Re: DokuWiki metadata handling

  • From: "Esther Brunner" <wikidesign@xxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Tue, 11 Apr 2006 12:06:25 +0200

Hi Chris

> I don't know anything about meta data standards.  I have taken a brief
> look at the two sites you mentioned.  Can you perhaps provide a little
> background judgement as to why you mention those two in particular.
> e.g. Are there others, are those two organisations and the work they do
> widely recognised and supported (at least in comparison with any others,
> if there are any).

The OAI (open archives initiative) protocol is used by many libraries
and scientific publications to make their content available to
specialised search engines (called harvesters). The repository must
support the requests ListIdentifiers, ListRecords, GetRecord,
Identify, ListMetadataFormats and ListSets, and exchange XML records
with the harvester. I'm in the process of writing / adapting such a
script for DokuWiki.
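To make the six verbs concrete, here is a minimal sketch of the dispatch such an endpoint needs. The verb names come from the OAI-PMH protocol itself; the handler code is a placeholder, not DokuWiki code, and the abbreviated Identify response stands in for a complete OAI-PMH XML document.

```php
<?php
// Hypothetical sketch of OAI-PMH verb dispatch; not DokuWiki code.
function oai_dispatch(string $verb): string {
    // The six request types the repository must support
    $valid = ['Identify', 'ListIdentifiers', 'ListRecords',
              'GetRecord', 'ListMetadataFormats', 'ListSets'];
    if (!in_array($verb, $valid, true)) {
        // The protocol defines a badVerb error response for this case
        return '<error code="badVerb">Illegal OAI verb</error>';
    }
    if ($verb === 'Identify') {
        // A real response is a full OAI-PMH XML document; this is
        // abbreviated to a couple of the interesting elements
        return '<Identify><repositoryName>DokuWiki</repositoryName>'
             . '<protocolVersion>2.0</protocolVersion></Identify>';
    }
    // The remaining verbs would be handled analogously
    return '<!-- not implemented in this sketch -->';
}
```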

The OAI protocol is open to different ways of structuring metadata,
but Dublin Core is what they highly recommend as the 'resource
discovery lingua franca for metadata'. You probably know Dublin Core
already from the RDF Site Summary (RSS) XML used for feeds. It's an
initiative to
establish a standard for metadata, see [1].
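For illustration, this is roughly how Dublin Core elements appear in an RSS 1.0 (RDF Site Summary) item; the URL and values here are made up:

```xml
<item rdf:about="http://wiki.example.com/doku.php?id=start">
  <title>Start page</title>
  <dc:creator>Esther Brunner</dc:creator>
  <dc:date>2006-04-11</dc:date>
  <dc:format>text/html</dc:format>
</item>
```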

> Perhaps, and this is just *thinking out loud to provoke discussion*, it
> would make sense to use the generalised case of that...
> - we have an xhtml renderer for the main display of wiki pages.
> - add an RSS renderer for handling RSS feeds
> - add a meta renderer for production of meta data
> - ...

That could be an option as well, but I don't know how efficient it is
to set up a separate renderer for metadata when most of its elements
would return nothing. In the process of creating an XHTML page we get
the first heading and the TOC anyway, and p_cached_instructions() can
save them to a file. The rest of the metadata (date created and
modified, creator and contributors) can be added in the save routine.

> Efficiency is obviously a consideration and if in the case of meta data
> it isn't expected to be dependent on other sources, maybe it is ok for
> meta data to be produced at instruction generation and hived off
> separately.  If so, what format should it be stored in? (It was that
> question that started the train of thought which lead to the idea of a
> meta data renderer).
> - XML?
> - serialised PHP?
> - handled like ACL, allowing some wiki's to store it in a database for
> more sophisticated indexing and/or combination with the same information
> from other sources.

I guess a serialised PHP array is the fastest option and very
flexible. For external use, a script is needed to generate XML from
it or to store it in a database.
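A sketch of that split, keeping metadata as a serialised PHP array for speed and converting to XML only on demand; the field names are illustrative, not a fixed schema:

```php
<?php
// Fast internal storage: one serialised array per page.
// Field names here are made up for illustration.
$meta = [
    'title'   => 'DokuWiki metadata handling',
    'creator' => 'Esther Brunner',
];
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, serialize($meta));

// External use: a small script reads the array back and
// emits Dublin Core elements from it
$stored = unserialize(file_get_contents($file));
$xml = '<dc:title>' . htmlspecialchars($stored['title']) . '</dc:title>'
     . '<dc:creator>' . htmlspecialchars($stored['creator']) . '</dc:creator>';
unlink($file);
```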

> Lastly, if you're still reading :-), are there now two types of document
> meta data.
> - meta data used by dokuwiki, e.g. first heading (title), TOC, no cache,
> no toc,
> - meta data for external use, e.g. title (first heading), creator,
> keywords, TOC
>
> Does this argue for keeping a serialised version within the instruction
> list and an accessible version (location and format accessible)
> although not necessarily produced within the renderer....

In my opinion the instruction list should only contain the data
necessary to generate the page contents. But that is not the sole
purpose of metadata within DokuWiki. If useheading is on, we also
need the first heading when generating other pages and for the window
title, even before tpl_content() is called. When we don't need to
re-generate the page contents, we shouldn't have to open the
instructions file.
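Assuming metadata lives in its own serialised file beside the instruction cache, the window title lookup could then skip the instruction list entirely. The function name and file layout here are hypothetical:

```php
<?php
// Sketch: look up the first heading from a separate serialised
// metadata file, without opening the instruction list.
// Function name and file layout are hypothetical.
function meta_first_heading(string $metafile): ?string {
    if (!is_readable($metafile)) {
        return null;                       // no metadata stored yet
    }
    $meta = unserialize(file_get_contents($metafile));
    return (is_array($meta) && isset($meta['title']))
        ? $meta['title']
        : null;
}
```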

Best regards

-- esther

[1] http://www.xml.com/pub/a/2000/10/25/dublincore/index.html
