[usf-devel] embedded inline pictures

  • From: unmei <unmei@xxxxxxxxxxxx>
  • To: usf-devel@xxxxxxxxxxxxx
  • Date: Sat, 13 Aug 2005 19:35:05 +0200

Embedding lots of pictures into USF may be controversial and a very suboptimal way to have picture subtitles, agreed.
But there are few other options for picture subtitles at the current time - VobSub and that's about it.


For embedding lots of pictures into USF, the already possible way with an <embedded> top-level element is not good. When muxing, this element is separated from the timed subtitle data and stored somewhere else. *I assume* the former content of <embedded> is fully read into memory on startup, therefore seeking on the hard drive is not an issue, but there are two other problems:
1) The place where this embedded data goes is most likely not meant for storing huge amounts of data that is actually tightly related to timed events. In this case the data is even "use once and forget".
2) Pixifier loads all picture data provided in embedded <b64file> child elements into an internal "picture server", and there they are kept uncompressed! Another renderer may implement a solution where this data is not decompressed at startup but only on request, but this is then very "prerendered"-oriented and inferior for reused embedded pictures such as logos and animated pictures.


In order to provide a comparably low-resource way for prerendered subtitles, the <image> element itself should be extended to allow picture data directly inside it.
This picture data has to be base64 encoded as well, therefore there is no gain in file size. But this way the situation is handled much more according to the semantics. Prerendered subtitle pictures are used only within "their" subtitle, only during that subtitle's on-screen time. There is no reason to keep them globally accessible and accessible over the entire script time.
Furthermore, if you were to look at the "source", the USF in a text editor, it is much more "logical" to have such picture data where it is actually used - compared to having <image> elements that each hold only a filename with a distinct value and all of the picture data in a huge <embedded> element.
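
To put a number on the "no gain in file size" point: base64 writes every 3 bytes of file data as 4 characters, so the picture grows by about a third no matter whether it sits in <embedded> or in <image>. A minimal Python sketch, where the 300 zero bytes are only a stand-in for a real picture and base64_length is a helper name made up for this example:

```python
import base64

def base64_length(raw_size: int) -> int:
    """Characters needed to base64 encode raw_size bytes (padded, no line breaks)."""
    return 4 * ((raw_size + 2) // 3)

raw = bytes(300)                    # placeholder data, not a real picture
encoded = base64.b64encode(raw)

print(len(raw), len(encoded), base64_length(len(raw)))
```

Line breaks inside the encoded text, if a writer adds them, come on top of this.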



Dropped ideas

*a child element of <image> analogous or identical to the <b64file> element in <embedded>. Problem: "element either has a child element XOR it has parseable character data" - maybe depending on the presence of an attribute or only by detection. Having character data AND a child in the same <image> element would have to be prohibited - you can't have a reference plus inline data at the same time. It may be possible, but IMO it looks fishy. Also this additional element is additional overhead (only a few bytes, but it is still more complex).

*guess whether it is inline depending on the data.
Either by defining a length threshold where every content shorter than this is interpreted as a reference and every content longer is interpreted as inline data. Problem: this threshold would have to be defined and it would restrict the freedom. No long filenames and no small inline files.
This may not be too restricting, considering you seldom need filenames of 100+ characters and "reasonable" prerendered text pictures with less than maybe 300 bytes are very rare, even if it is only a 2bit encoded short word. But as you can see, the range where the threshold could be set is too large, and one day there will be that picture that is smaller :)
The other detection possibility is to take the content and try to interpret it as a reference. Either you find a matching file, then assume it was a reference, or you don't find one, in which case you try to un-base64 it and, if successful, try to detect the file type. You could also do it the other way round - trying to de-base64 it first. This approach is not good either IMO, because it is again too much guessing.
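
A small Python sketch of why detection by content is guesswork: a short filename without an extension can itself be perfectly valid base64, so "it decodes" proves nothing. The candidate names below are invented for the example, and looks_like_base64 is a hypothetical helper, not something any USF tool provides.

```python
import base64
import binascii

def looks_like_base64(text: str) -> bool:
    """True if text decodes as strict base64 (only alphabet characters, correct padding)."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except binascii.Error:
        return False

# Invented example contents of an <image> element.
candidates = ["logo", "sub1", "logo.png", "title card.png"]
results = {name: looks_like_base64(name) for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: decodes as base64 = {ok}")
```

Names containing a dot or a space fail the strict check, but "logo" and "sub1" pass it, so a detector would have to fall back on yet another guess.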




Proposal

the <image> element should be changed in the following way:

*introduce a new attribute "content" (or suggest a better name)
This attribute is optional and defaults to the value "reference".
The possible values of this attribute are "reference"|"inline[/type]".

If the value of this attribute is "reference" (either explicit or implied), the image element contains a reference (file name) like it used to be.
But if this attribute is "inline" (possibly followed by a slash and a MIME-Type or file extension), the content character data of the element is a base64 encoded file.
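
How a reader could handle the attribute, as a minimal Python sketch. The attribute name "content" and its values are taken from the proposal above; the function names and the returned dictionary layout are assumptions made for this example, and it assumes the reference is the element's character data.

```python
import base64
import xml.etree.ElementTree as ET

def parse_content_attr(value):
    """Split 'reference' | 'inline[/type]' into (mode, type_hint)."""
    if value is None or value == "reference":
        return "reference", None          # attribute missing means "reference"
    mode, _, type_hint = value.partition("/")
    if mode != "inline":
        raise ValueError(f"unknown content value: {value!r}")
    return "inline", type_hint or None    # type hint may be a MIME type or an extension

def read_image(element):
    """Return the reference name or the decoded inline file of one <image> element."""
    mode, type_hint = parse_content_attr(element.get("content"))
    text = (element.text or "").strip()
    if mode == "reference":
        return {"mode": mode, "name": text}
    # Non-strict decoding ignores whitespace and line breaks inside the data.
    return {"mode": mode, "type": type_hint, "data": base64.b64decode(text)}

payload = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
by_reference = read_image(ET.fromstring("<image>logo.png</image>"))
inline = read_image(ET.fromstring(f'<image content="inline/png">{payload}</image>'))

print(by_reference)
print(inline)
```

Splitting at the first slash only means a full MIME type such as "inline/image/png" still yields "image/png" as the type hint.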


This file is not to be assigned to a globally available and named image variable. Its scope only extends to this very <image> element and it cannot be referenced from the "outer world". It is decoded when this <image> element is first to be used, and the decoded data is unloaded when the <image> element disappears from the screen.
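
The intended lifetime could look roughly like this in Python. The class and the on_show/on_hide hooks are hypothetical names for this sketch; how a real renderer signals "appears" and "disappears" is not specified here.

```python
import base64

class InlineImage:
    """Inline picture data owned by a single <image> element (hypothetical class)."""

    def __init__(self, encoded_text: str):
        self._encoded = encoded_text   # base64 text as read from the script
        self._decoded = None           # nothing is decoded at load time

    @property
    def is_loaded(self) -> bool:
        return self._decoded is not None

    def on_show(self) -> bytes:
        """Decode on first use, when the subtitle is about to be displayed."""
        if self._decoded is None:
            self._decoded = base64.b64decode(self._encoded)
        return self._decoded

    def on_hide(self) -> None:
        """Drop the decoded data once the subtitle has left the screen."""
        self._decoded = None

picture = InlineImage(base64.b64encode(b"picture bytes").decode("ascii"))
print(picture.is_loaded)        # False: nothing decoded yet
data = picture.on_show()
print(picture.is_loaded)        # True while the subtitle is on screen
picture.on_hide()
print(picture.is_loaded)        # False again after it disappears
```

There is no global registry in this sketch, so nothing outside the element can refer to the picture.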

Mentioning the type in the content attribute may be necessary to know what to do with the encoded file data. Remember that the encoded data does not include the file name. If you do not know the file type beforehand, you may end up decoding the data only to discover you do not support that file format. Or you don't even know what format it is, until/unless you successfully find characteristics of a certain format in the file data. (Actually, detecting the format from the data is considered better than relying on the file extension, but it is more work.)
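
Finding "characteristics of a certain format" usually means checking the first bytes of the decoded file. A minimal Python sketch with a handful of well-known signatures; the list is only an illustration and not the set of formats any particular renderer supports, and the function names are made up for this example.

```python
# Leading bytes ("magic numbers") of a few common picture formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]

def sniff_type(data: bytes):
    """Return a format name based on the file's leading bytes, or None if unknown."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None

def resolve_type(data: bytes, declared=None):
    """Prefer what the data says; fall back to the type given in the content attribute."""
    return sniff_type(data) or declared

print(resolve_type(b"\x89PNG\r\n\x1a\n....", declared="png"))
print(resolve_type(b"GIF89a....", declared="png"))
print(resolve_type(b"unknown data", declared="ops"))
```

The declared type is still useful as a fallback for formats without a signature in the table.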



Comment

The proposal is not strictly backward compatible.
But IMO it is about as safe as possible to use.
If a parser that does not know about inline data reads an <image> element with inline data:
*it encounters an attribute it does not know. Possibly it will stop already here, falling into some sort of error handling.
*If it continues, it reads the content in. It may be unlucky that it encounters a (supposed) filename that is several kilobytes long, but it should have handling for this case. Then it either realises that this cannot be a filename because some of the characters found are not valid in a filename, or it tries to find a file of this name. If the supposed filename is long, it is almost impossible that it finds such a file. If the supposed filename is very short, there may be a slight chance it finds a match.


BUT an inline file will almost always be at least a couple of hundred bytes long. The smallest sub picture I found is the word "Gut." in 2bit PNG or OPS - this is 300 bytes (= 400 bytes in base64). How large is the chance of finding a file in the current path or in <embedded> whose name matches all the 100+ characters? It is about (number of files in search path)/(192^(length)). 192^10 is already ~ 70*10^21, 192^100 is ~ 2*10^228. The chance that tomorrow will not happen is probably orders of magnitude bigger than the chance that a useful base64 encoded file matches a filename.
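
The estimate can be checked with a few lines of Python, which handles the huge integers exactly. The figure of 192 possible characters per position is taken from the text above; the count of 1000 files in the search path is an arbitrary assumption for the example.

```python
import math

ALPHABET = 192          # possible characters per position, as assumed in the text
FILES_IN_PATH = 1000    # arbitrary example value

def log10_match_chance(files: int, length: int, alphabet: int = ALPHABET) -> float:
    """log10 of files / alphabet**length, kept as a logarithm to avoid float underflow."""
    return math.log10(files) - length * math.log10(alphabet)

print(f"192^10  = {192**10:.2e}")
print(f"192^100 has {len(str(192**100))} decimal digits")
print(f"log10(chance) for a 100 character name: {log10_match_chance(FILES_IN_PATH, 100):.1f}")
print(f"log10(chance) for a 400 character name: {log10_match_chance(FILES_IN_PATH, 400):.1f}")
```

Even with a thousand files to match against, a 100 character name leaves a chance in the region of 10^-225.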

I don't have file format recognition by only looking at the data in Pixifier yet. I do rely on file extensions so far. But I will surely try to implement this better solution by the time I support inline image data (for the formats I support at all).

You are encouraged to comment on this proposal or suggest improvements. I am not fixed on this particular way, but I feel like I will have to implement inline pictures one way or another sometime soon.

Again, no answer means "go on" ;)

regards, unmei

http://usf.corecodec.org
