[gpodder-devel] Non-human readable directory and file names

pdedecker at gmail.com (Pieter De Decker) · Wed, 31 Oct 2007 15:11:37 +0100

Alright, I see your point now. :)

How about adding an option to the right-click menu of each channel
saying "Open channel folder"?

Pieter

---
Thomas Perl wrote:
> Hello, Jay, Ionut and Pieter!
> 
> This mail is not intended to be rude or harsh, I just want to bring up
> real problems with using content from RSS files as base for file naming.
> If you can come up with a stable, sane and secure scheme for creating
> human-readable file names for all possible RSS feeds, please tell me :)
> 
> On Wed, 2007-10-31 at 11:03 +0000, Jay Bradley wrote:
>> I was wondering why gpodder stores the downloads in crazily named 
>> directories? I realise that it is partly to ensure unique directories
>> so there are no clashes but it means that it is impossible to browse 
>> through the podcast files manually. I know I can sync to a filesystem
>> so I do this for my mp3 player but I also normally use a soft link to
>> the podcast downloads directory for my mythtv installation as well. 
>> Currently I'm changing the device directory and syncing to my mp3
>> player and changing the device directory again to a separate directory
>> for mythtv. If the directory names were human readable then it would
>> save me a lot of hassle.
> 
> I see you have read the mailing list and are aware of the alternatives
> (MP3 Player sync).
> 
> Anyway, this topic has been discussed several times on this list; I
> guess it's time for a FAQ on the gPodder website. ;)
> 
> First of all, here are some relevant postings related to the topic.
> Please read through them to get an overview of what has been proposed
> and discussed already:
> 
> https://lists.berlios.de/pipermail/gpodder-devel/2006-November/000283.html
> https://lists.berlios.de/pipermail/gpodder-devel/2007-June/000723.html
> https://lists.berlios.de/pipermail/gpodder-devel/2007-July/000756.html
> 
> Script that tries to solve that problem:
> 
> http://lists.berlios.de/pipermail/gpodder-devel/2007-August/000911.html 
> 
> I'm going to describe the problem you mention a bit further...
> 
> Basically, it's hard to create human-readable names because of the
> nature of RSS feeds. It's like with HTML - if browsers were going to
> reject non-standard HTML, all documents on the web would adhere to the
> standards, but thanks to such "useful" features as quirks mode, browsers
> try to fix the shortcomings of bad markup in the parser code.
> 
> But the problem with RSS feeds doesn't lie in bad markup. Most of the
> time, it's that fields are not set (no <title> element in <item>),
> fields have an empty value (<title> exists, but is empty) or fields are
> used in very odd ways (just recently, we had a feed where <title>
> contained a description of the episode, a very long string).
> 
> There are two options here:
> 
>  a) reject any feeds that have no title, have a too long title or have 
>     some other weird properties that are not usual RSS practice
>  b) accept all feeds and try to make the best of "what we have"
> 
> gPodder takes route "b)", so we have to be prepared to accept feeds
> without <title>. As you can read in the November 2006 post above (I
> think one of the initial thoughts about hashed filenames), hashing feed
> and episode URLs always gives us strings with some sane and stable
> properties:
> 
>  1.) (high probability of) uniqueness
>  2.) sane length (even fixed, but at least not empty or too long)
>  3.) sane alphabet (hexadecimal, i.e. only the characters 0-9 and a-f)
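A minimal Python sketch of properties 2.) and 3.), using the standard `hashlib` module. The example URLs are made up, and hashing the UTF-8 encoding of the URL string is an assumption about how gPodder feeds the URL to MD5:

```python
import hashlib

def url_hash(url: str) -> str:
    """Return the MD5 hex digest of a URL: always 32 characters from 0-9a-f."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()

feeds = [
    "http://example.com/podcast.rss",
    "http://example.org/index.xml",
    "http://example.net/feed",  # even a feed with an empty <title> has a URL
]

for url in feeds:
    name = url_hash(url)
    # 2.) fixed length and 3.) hexadecimal alphabet, whatever the feed contains
    assert len(name) == 32
    assert all(c in "0123456789abcdef" for c in name)
    print(name, url)
```

Uniqueness (property 1.) is only probabilistic: two different URLs could in theory share a digest, but for a personal subscription list this is not a practical concern.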
> 
> So, for every given URL (and _every_ feed has a URL), we have a sane
> "ID" that we can use to identify that feed.
> 
> When depending on human-readable strings (i.e. title, etc..) we run into
> several problems:
> 
>  i.) what is the directory name of feeds with "<title></title>"??
>  ii.) what is the directory name of feed A with title "radio x podcast" 
>       when there already is a feed B with title "radio x podcast"?
>  iii.) what is the directory name of a feed with a loooong title?
>  iv.) what is the directory name of a feed with a title in Chinese
>       characters (off the top of my head, e.g. "???") when using FAT32
>       as the file system?
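To make problems i.) to iv.) concrete, here is a deliberately naive, hypothetical title-based naming function in Python (`naive_dirname` is not part of gPodder). It assumes the conservative choice of keeping only a small ASCII alphabet so the result is safe on FAT32, and an arbitrarily chosen 64-character limit:

```python
import re

def naive_dirname(title: str, max_len: int = 64) -> str:
    """Hypothetical title-based directory name: keep only ASCII letters,
    digits, spaces, dots, dashes and underscores, then truncate."""
    cleaned = re.sub(r"[^A-Za-z0-9 ._-]", "", title).strip()
    return cleaned[:max_len]

# i.)   an empty <title> gives an empty (unusable) name
print(repr(naive_dirname("")))                                        # ''
# ii.)  two different feeds titled "radio x podcast" map to the same name
print(naive_dirname("radio x podcast") == naive_dirname("radio x podcast"))  # True
# iii.) a misplaced description is cut off at the limit
print(len(naive_dirname("A very long description " * 10)))            # 64
# iv.)  a title made only of Chinese characters is reduced to nothing
print(repr(naive_dirname("中文播客")))                                  # ''
```

Each case needs its own fallback (a hash, a counter suffix, a transliteration table), which is why a scheme that works for every feed is harder than it first looks.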
> 
> We might be able to create a unique filename for a podcast episode from
> the basename of its URL, but is the basename of the podcast feed's URL
> always unique? It might be "index.xml" or "podcast.rss".
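A quick illustration with Python's standard `urllib.parse` and `posixpath` modules (the URLs are invented): episode basenames are often distinctive, while feed basenames repeat across unrelated feeds.

```python
import posixpath
from urllib.parse import urlparse

def url_basename(url: str) -> str:
    """Return the last path component of a URL, ignoring query and fragment."""
    return posixpath.basename(urlparse(url).path)

episode = "http://example.com/media/episode-042.mp3"
feed_a = "http://radio-a.example.com/podcast/index.xml"
feed_b = "http://radio-b.example.org/feeds/index.xml"

print(url_basename(episode))                       # episode-042.mp3
print(url_basename(feed_a), url_basename(feed_b))  # index.xml index.xml
```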
> 
>> I never understand why some programs add a layer of complexity which 
>> removes the user one step from their files. I believe programs should
>> be as transparent as possible to allow people to do what they like
>> with the data produced by that program.
> 
> gPodder is transparent in that the user doesn't have to care about the
> directory layout, as the user can use the gPodder GUI to browse and
> listen to feeds - all feed information is displayed in the GUI.
> 
> You can always determine feed and episode info for given hashes:
> 
>  -> Hash (md5) the URLs in ~/.config/gpodder/channels.opml
>  -> MD5 of URL = directory name of feed
> 
>  -> Open the file "index.xml" in the feed download directory
>  -> Hash (md5) the URLs in that file
>  -> MD5 of URL + extension of basename of URL = filename of episode
> 
> In pseudo-code, this is something like:
> 
> opml_file = $HOME + '/.config/gpodder/channels.opml'
> 
> ( ... feed_url is to be obtained from opml_file ... )
> feed_directory = gpodder_download_dir + '/' + md5sum( feed_url )
> feed_index = feed_directory + '/index.xml'
> 
> ( ... episode_url is to be obtained from feed_index ... )
> extension = file_extension_of( basename( episode_url ) )
> episode_name = feed_directory + '/' + md5sum( episode_url ) + extension
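The pseudo-code above could be turned into something like the following Python sketch. Several details are assumptions rather than confirmed gPodder behaviour: subscriptions are read from `<outline xmlUrl="...">` elements, episode URLs are taken from `<enclosure url="...">` elements in index.xml, the MD5 is computed over the UTF-8 encoded URL, query strings are ignored when taking the extension, and the download directory is hypothetical. Sample OPML and RSS strings stand in for the real files so the script runs anywhere:

```python
import hashlib
import os
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

def md5sum(text: str) -> str:
    # Assumption: the hash covers the URL string as stored, UTF-8 encoded
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def feed_urls_from_opml(opml_data):
    # Assumption: each subscription is an <outline xmlUrl="..."> element
    root = ET.fromstring(opml_data)
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]

def episode_urls_from_rss(rss_data):
    # Assumption: the episode URL is the url attribute of <enclosure>
    root = ET.fromstring(rss_data)
    return [e.get("url") for e in root.iter("enclosure") if e.get("url")]

def feed_directory(download_dir: str, feed_url: str) -> str:
    # MD5 of feed URL = directory name of feed
    return os.path.join(download_dir, md5sum(feed_url))

def episode_filename(feed_dir: str, episode_url: str) -> str:
    # MD5 of episode URL + extension of its basename (e.g. ".mp3", or "")
    basename = posixpath.basename(urlparse(episode_url).path)
    extension = os.path.splitext(basename)[1]
    return os.path.join(feed_dir, md5sum(episode_url) + extension)

SAMPLE_OPML = """<opml version="1.1"><body>
<outline text="Example" xmlUrl="http://example.com/podcast.rss"/>
</body></opml>"""

SAMPLE_RSS = """<rss version="2.0"><channel><title>Example</title>
<item><title>Episode 1</title>
<enclosure url="http://example.com/media/ep1.mp3" type="audio/mpeg" length="0"/></item>
</channel></rss>"""

if __name__ == "__main__":
    download_dir = "/tmp/gpodder-downloads"  # hypothetical download directory
    for feed_url in feed_urls_from_opml(SAMPLE_OPML):
        feed_dir = feed_directory(download_dir, feed_url)
        # On a real system, parse os.path.join(feed_dir, "index.xml") here
        for episode_url in episode_urls_from_rss(SAMPLE_RSS):
            print(episode_filename(feed_dir, episode_url), "<-", episode_url)
```

On a real installation you would pass the contents of ~/.config/gpodder/channels.opml and of each feed's index.xml (for example `open(path, "rb").read()`) instead of the sample strings, then compare the printed paths with what is actually on disk.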
> 
>> Using the non-human readable directory and filenames stops users from
>> accessing their files except through one program (gpodder) which is a
>> shame.
> 
> You can use the above method to find more information (metadata) for the
> files than you can with human readable directories, including the title
> and description of episodes.
> 
>> I've looked through the source code but cannot find where the
>> directory names are set. I'm an okay programmer so could do this
>> myself if someone could point me in the right direction. I'd do it to
>> just my local copy if this wasn't something anyone else would be
>> interested in.
> 
> Please, by all means try to do it. If it works for all RSS feeds, I
> would be very happy to merge it into gPodder, as it would be the better
> solution than what we have now. But because of the reasons I mentioned
> above, I am very skeptical whether this is possible at all.
> 
> The directory name for channels is determined by the "get_filename"
> function of the class "podcastChannel" in "src/gpodder/libpodcasts.py".
> The attributes that _should_ be available when this function is called
> are "url", "title" and "description" (i.e. "self.title").
> 
> The filename for an episode is determined by the "local_filename"
> function of the class "podcastItem" in "src/gpodder/libpodcasts.py".
> Only the "url" attribute is guaranteed to be available; for all other
> properties, the best possible value is extracted from the RSS feed. You
> can expect the "title" value to be somewhat identifying, but not unique.
> You also have to be aware that the "title" value _could_ be very long
> (think of a description field value that has been misplaced).
> 
>> I realise there may be some other reason why the names are non-human 
>> readable so if I've missed it then please could someone let me know.
> 
> Apart from the practical reasons I mentioned above, there is no deeper
> reason why hashes were chosen. It was a simple and straightforward
> solution to a problem for which we have not yet found a better one.
> 
> It would be quite cool if you could come up with something friendlier :)
> 
> If you want, please send us the changes you make to give gPodder a
> human-readable directory structure. It would be a nice-to-have patch for
> interested people :)
> 
> 
> Thanks and Good Luck!
> Thomas
> _______________________________________________
> gpodder-devel mailing list
> gpodder-devel at lists.berlios.de
> https://lists.berlios.de/mailman/listinfo/gpodder-devel
