Hi everybody,

during the hackfest at the WikiFest Berlin we had a long discussion about the metadata system. As not everybody was there and nothing is really fixed yet, I thought it might be a good idea to summarize the discussion in this rather long email so everyone can participate in the not yet finished discussion and implementation.

From the beginning it was clear that SQLite is not an option, as it is only available on about 50% of all DokuWiki installations we have data about. So we searched for different solutions. An obvious one was Lucene; there is a pure PHP implementation of it in the Zend Framework. We decided against it because its documentation states that it consumes a lot of memory, that data can't be reconstructed from partially corrupted files, and that it is generally not recommended as a system for storing data. It would have allowed very nice features like a search in the produced HTML code, but that isn't what we were looking for. We found another pure PHP database called Flat File DataBase (FFDB) for PHP at http://www.sourceforge.net/projects/ffdb-php/, but it wasn't convincing, as it is more or less unmaintained and seemed to have poor performance. We also found SofaDB, which claims to be like CouchDB, but it is in an early alpha stage and seems to be more like our current metadata store implemented with JSON. Finally we had a look at http://mimesis.110mb.com/, another pure PHP key-value store, but it didn't look that promising either.

Apart from that we agreed that the current concept of storing metadata in relatively simple files along with the wiki pages fits the spirit of DokuWiki quite well and thus should be kept. We then discussed the file format of these meta files: should it be changed to JSON, and possibly should the instructions be stored as JSON, too?
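To make the format question concrete, here is what one and the same (made-up) metadata structure looks like in both formats — the field names are just an illustration, not the actual metadata schema:

```php
<?php
// A made-up metadata array, similar in shape to what DokuWiki stores per page.
$meta = array(
    'title' => 'Welcome',
    'date'  => array('created' => 1300000000, 'modified' => 1300100000),
);

// Current format: PHP's native serialization. Compact and fast, but the
// explicit type tokens and byte lengths make it hard to read and to produce
// from outside PHP.
echo serialize($meta), "\n";
// a:2:{s:5:"title";s:7:"Welcome";s:4:"date";a:2:{s:7:"created";i:1300000000;s:8:"modified";i:1300100000;}}

// Discussed alternative: JSON. Human readable and easy to write e.g. from an
// import script in another language.
echo json_encode($meta), "\n";
// {"title":"Welcome","date":{"created":1300000000,"modified":1300100000}}
```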
This would make these files more human readable, which would make debugging easier, and creating them from outside of DokuWiki, e.g. in an import script, would be easier as well. We agreed however that writing meta files from outside of DokuWiki is not a common use case and thus performance is the point that really matters.

I therefore created some performance tests using a dump of dokuwiki.org as test data. These tests, including detailed results for a couple of hundred pages, can be found at https://github.com/michitux/dokuwiki-test-serialize. In short: for instructions, json_encode seems to be twice as fast as serialize, but in decoding, unserialize is about 10% faster than json_decode. This is for the native JSON functions in PHP, which are only a module and might be disabled. That is why a pure PHP JSON implementation is already included in DokuWiki; its decoder is about 250 times slower than the native implementation. For metadata it is different: there, serialize is about 10% faster than json_encode and unserialize is almost twice as fast as json_decode. This time the pure PHP JSON decoder is about 500 times slower. These numbers convinced us to keep the current file format both for metadata and instructions. Should there ever be a reason to change the format, that would be easy to do, as it is wrapped in a few helper functions.

We did however notice that metadata isn't written where it should be written, which causes a couple of bugs. Currently the metadata is written whenever it doesn't exist and is accessed, or when the xhtml cache is not used. That means the metadata is only updated when the page content is displayed, so it isn't updated when the blogtng plugin is used, and when a title is changed while useheading is activated, the old title is displayed on the first view.
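For reference, the timings boil down to simple loops along these lines — a simplified sketch, not the actual test code (that is in the repository linked above), and the sample array here is made up; absolute numbers vary with PHP version and data:

```php
<?php
// Micro-benchmark sketch: compare serialize/unserialize against
// json_encode/json_decode on a metadata-like array.
$meta = array(
    'title'       => 'Some page title',
    'description' => array('abstract' => str_repeat('Lorem ipsum dolor sit amet. ', 20)),
    'contributor' => array('user1' => 'User One', 'user2' => 'User Two'),
);

$iterations = 10000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) $s = serialize($meta);
printf("serialize:   %.4fs\n", microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) $j = json_encode($meta);
printf("json_encode: %.4fs\n", microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) unserialize($s);
printf("unserialize: %.4fs\n", microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) json_decode($j, true);
printf("json_decode: %.4fs\n", microtime(true) - $start);
```

Note that json_decode($j, true) returns associative arrays, which is what DokuWiki's metadata code expects; decoding to stdClass objects would skew the comparison.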
Andi and I thus discussed that updating it in saveWikiText and in the indexer when the page is newer than the metadata should be enough (so the old update call would be completely replaced by these two new ones). It might be that some plugins that disable the xhtml cache rely on the metadata being rendered on every view, but we hope such a plugin doesn't exist. If you know of something that might break because of that change, please tell.

Additionally we want a quick way to search meta files for certain criteria, so that e.g. tagging could be implemented easily without additional structures. We had a look at the current DokuWiki indexer and have some quite concrete ideas on how it can be used for metadata. Of course not all metadata fields shall be indexed. The best way we could find for selecting these fields is introducing an event that populates a list, pre-filled with some default entries, that can be extended by plugins. For each of these keys an index file shall be created in the same way as the current word indexes. As for words, there shall also be a reverse index from pages to the different meta properties for removing old data. The index shall be updated in lib/exe/indexer.php whenever the metadata file is newer than a special meta-indexed file in the same directory. When reading such an index, a short check whether the corresponding page still exists shall be done, and when a page no longer exists it shall be removed from the index. We are not completely sure what to do with the indexes of uninstalled plugins, but we think just keeping them, like persistent metadata, will be the best/easiest solution.

Do you have additional ideas or comments, or do you even know the fast, low-memory, pure PHP (ideally document-oriented) database we've missed? Then please write about it!
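To make the key-selection idea a bit more tangible, here is a standalone sketch of the two pieces described above. Nothing in it is fixed: the default key list, the handler mechanism (a stand-in for the real plugin event) and the marker-file name are all assumptions for illustration only.

```php
<?php
// Sketch of the proposed key selection: a list of metadata keys to index is
// pre-filled with defaults and then handed to registered handlers, so plugins
// can append their own keys. The defaults shown here are made up.
$keys = array('title', 'relation references');

// Stand-in for the plugin event: each handler may extend the list.
$handlers = array(
    function (array &$keys) {
        $keys[] = 'subject'; // e.g. a tag plugin indexing tags stored here
    },
);
foreach ($handlers as $handler) $handler($keys);
// For each key in $keys, one index file would then be maintained in the same
// way as the current word indexes, plus a reverse index per page.

// Sketch of the update condition for lib/exe/indexer.php: re-index whenever
// the metadata file is newer than the special marker file in the same
// directory. 'meta-indexed' is a placeholder name, not a decided one.
function meta_needs_indexing($metafile, $markerfile) {
    return !file_exists($markerfile)
        || @filemtime($metafile) > @filemtime($markerfile);
}
```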
If you didn't understand anything of what I've written above (and yet read this far), you should either dig deeper into the whole metadata and indexer stuff documented at http://www.dokuwiki.org/devel:metadata and http://www.dokuwiki.org/devel:fulltextindex, or just ignore it, as hopefully everything will work so well that you won't notice anything in the next release except fewer bugs and new features.

Michael

--
DokuWiki mailing list - more info at http://www.dokuwiki.org/mailinglist