Sorry to comment on my own message, but now that I think of it a bit
more, why not just use the entire URL of the actual podcast instead of
the URL of the feed?

For example, the podcast downloaded from:
http://www.hbo.com/video/podcasts/billmaher/637314_dl.mp3

would be stored at the following location in the filesystem:
www.hbo.com/video/podcasts/billmaher/637314_dl.mp3

Every podcast must have a unique URL, so you know it's always unique,
and always exists.

Chris


Chris McCabe wrote:
> Here's a quick thought:
>
> How about creating the directory name from the URL of the feed, which
> will always exist? So for example, the podcasts from the feed:
> http://www.hbo.com/apps/podcasts/podcast.xml?a=2
>
> would all be saved in the directory (relative to the download directory):
> www.hbo.com/apps/podcasts/podcast.xml?a=2/
>
> It would end up creating a few more directory levels than are really
> necessary, but it would be guaranteed to be unique, and would make it
> easy to find the podcasts. It would also automatically group feeds
> from the same website together.
> You would still have the problem of naming each individual podcast
> from that feed, but at least half the problem is solved.
>
> For naming the podcasts, one easy scheme would be to name it with the
> release date of the podcast, or if not available, the download date,
> with an extra number to make it unique if necessary. For example:
> 2007.10.31.001.mp3
>
> This has the advantage that the alphabetical directory listing will
> list the podcasts in order. It has the disadvantage that you wouldn't
> be able to match with certainty podcasts to filenames without
> additional information.
>
>
> Just some thoughts.
>
> Chris
>
>
> Thomas Perl wrote:
>> Hello, Jay, Ionut and Pieter!
>>
>> This mail is not intended to be rude or harsh, I just want to bring up
>> real problems with using content from RSS files as base for file naming.
>> If you can come up with a stable, sane and secure scheme for creating
>> human-readable file names for all possible RSS feeds, please tell me :)
>>
>> On Wed, 2007-10-31 at 11:03 +0000, Jay Bradley wrote:
>>
>>> I was wondering why gpodder stores the downloads in crazily named
>>> directories? I realise that it is partly to ensure unique directories
>>> so there are no clashes but it means that it is impossible to browse
>>> through the podcast files manually. I know I can sync to a filesystem
>>> so I do this for my mp3 player but I also normally use a soft link to
>>> the podcast downloads directory for my mythtv installation as well.
>>> Currently I'm changing the device directory and syncing to my mp3
>>> player and changing the device directory again to a separate directory
>>> for mythtv. If the directory names were human readable then it would
>>> save me a lot of hassle.
>>>
>>
>> I see you have read the mailing list and are aware of the alternatives
>> (MP3 Player sync).
>>
>> Anyway, this topic has been discussed several times on this list, I
>> guess it's time for a FAQ on the gPodder website.. ;)
>>
>> First of all, here are some relevant postings related to the topic.
>> Please read through them to get an overview of what has been proposed
>> and discussed already:
>>
>> https://lists.berlios.de/pipermail/gpodder-devel/2006-November/000283.html
>>
>> https://lists.berlios.de/pipermail/gpodder-devel/2007-June/000723.html
>> https://lists.berlios.de/pipermail/gpodder-devel/2007-July/000756.html
>>
>> Script that tries to solve that problem:
>>
>> http://lists.berlios.de/pipermail/gpodder-devel/2007-August/000911.html
>> I'm going to describe the problem you mention a bit further...
>>
>> Basically, it's hard to create human-readable names because of the
>> nature of RSS feeds.
>> It's like with HTML - if browsers rejected non-standard HTML, all
>> documents on the web would adhere to the standards, but thanks to
>> such "useful" features as quirks mode, browsers try to fix the
>> shortcomings of bad markup in the parser code.
>>
>> But the problems with RSS feeds don't lie in bad markup. Most of the
>> time, fields are not set (no <title> element in <item>), fields have
>> an empty value (<title> exists, but is empty) or fields are badly
>> misused (just recently, we had a feed where <title> contained a
>> description of the episode, a very long string).
>>
>> There are two options here:
>>
>>   a) reject any feeds that have no title, have a too long title or
>>      have some other weird properties that are not usual RSS practice
>>   b) accept all feeds and try to make the best of "what we have"
>>
>> gPodder tries to go the "b)" route and so we have to be prepared to
>> accept feeds without <title>. As you can read from the November 2006
>> post above (I think one of the initial thoughts about hashed filenames),
>> hashing feed and episode URLs always gives us strings that have some
>> sane and stable properties:
>>
>>   1.) (high probability of) uniqueness
>>   2.) sane length (even fixed, but at least not empty or too long)
>>   3.) sane alphabet (hexadecimal, i.e. only the characters 0-9 and a-f)
>>
>> So, for every given URL (and _every_ feed has a URL), we have a sane
>> "ID" that we can use to identify that feed.
>>
>> When depending on human-readable strings (i.e. title, etc..) we run into
>> several problems:
>>
>>   i.) what is the directory name of feeds with "<title></title>"??
>>   ii.) what is the directory name of feed A with title "radio x
>>        podcast" when there already is a feed B with title "radio x
>>        podcast"?
>>   iii.) what is the directory name of a feed with a loooong title?
>>   iv.) what is the directory name of a feed with Chinese characters as
>>        title (from the top of my head, imagine (e.g.
"???") when >> using FAT32 as file system? >> >> We might be able to create a unique filename for a podcast episode from >> the basename of its url, but is there always an unique basename of the >> podcast feed? It might be "index.xml" or "podcast.rss". >> >> >>> I never understand why some programs add a layer of complexity which >>> removes the user one step from their files. I believe programs should >>> be as transparent as possible to allow people to do what they like >>> with the data produced by that program. >>> >> >> gPodder is transparent in that the user doesn't have to care about the >> directory layout, as the user can use the gPodder GUI to browse and >> listen to feeds - all feed information is displayed in the GUI. >> >> You can always determine feed and episode info for given hashes: >> >> -> Hash (md5) the URLs in ~/.config/gpodder/channels.opml >> -> MD5 of URL = directory name of feed >> >> -> Open the file "index.xml" in the feed download directory >> -> Hash (md5) the URLs in that file >> -> MD5 of URL + extension of basename of URL = filename of episode >> >> In pseuco-code, this is something like: >> >> opml_file = $HOME + '/.config/gpodder/channels.opml' >> >> ( ... feed_url is to be obtained from opml_file ... ) >> feed_directory = gpodder_download_dir + '/' + md5sum( feed_url ) >> feed_index = feed_directory + '/index.xml' >> >> ( ... episode_url is to be obtained from feed_index ... ) >> extension = file_extension_of( basename( episode_url ) ) >> episode_name = feed_directory + '/' + md5sum ( episode_url ) + extension >> >> >>> Using the non-human readable directory and filenames stops users from >>> accessing their files except through one program (gpodder) which is a >>> shame. >>> >> >> You can use the above method to find more information (metadata) for the >> files than you can with human readable directories, including the title >> and description of episodes. 
>>
>>
>>> I've looked through the source code but cannot find where the
>>> directory names are set. I'm an okay programmer so could do this
>>> myself if someone could point me in the right direction. I'd do it to
>>> just my local copy if this wasn't something anyone else would be
>>> interested in.
>>>
>>
>> Please, by all means try to do it. If it works for all RSS feeds, I
>> would be very happy to merge it into gPodder, as it would be a better
>> solution than what we have now. But because of the reasons I mentioned
>> above, I am very skeptical that this is possible at all.
>>
>> The directory name for channels is determined by the "get_filename"
>> function of the class "podcastChannel" in "src/gpodder/libpodcasts.py".
>> The attributes that _should_ be available when this function is called
>> are "url", "title" and "description" (i.e. "self.title").
>>
>> The filename for an episode is determined by the "local_filename"
>> function of the class "podcastItem" in "src/gpodder/libpodcasts.py".
>> Only the "url" attribute is guaranteed to be available. For all other
>> properties, the best possible value is extracted from the RSS feed;
>> you can expect the "title" value to be somewhat identifying, but not
>> unique. You also have to be aware that the "title" value _could_ be very
>> long (think of a description field value that has been misplaced).
>>
>>
>>> I realise there may be some other reason why the names are non-human
>>> readable so if I've missed it then please could someone let me know.
>>>
>>
>> Apart from the practical reasons I mentioned above, there is no real
>> reason why the hashes are chosen. It was a simple and straightforward
>> solution to a problem for which we have not yet found a better solution.
>>
>> It would be quite cool if you could come up with something friendlier :)
>>
>> If you want, please send the modifications you make to make gPodder's
>> directory structure human-readable.
>> It will be a nice-to-have patch for
>> interested people :)
>>
>>
>> Thanks and Good Luck!
>> Thomas
>> _______________________________________________
>> gpodder-devel mailing list
>> gpodder-devel at lists.berlios.de
>> https://lists.berlios.de/mailman/listinfo/gpodder-devel
>>
>
>
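
A minimal Python sketch of the URL-based layout proposed at the top of
this message, together with the date-based episode names from the
earlier message, might look like the following. The function names
url_to_relative_path and dated_filename, the "keep" parameter, and the
decision to percent-encode unusual characters and ".." segments are all
assumptions for illustration; nothing here is taken from gPodder.

```python
import datetime
import posixpath
from urllib.parse import quote, urlsplit


def _safe_segment(segment, keep):
    """Percent-encode one path segment so it cannot escape the tree."""
    if segment in (".", ".."):
        # Encode the dots so the segment is not a directory reference.
        return segment.replace(".", "%2E")
    return quote(segment, safe=keep)


def url_to_relative_path(url, keep="?=&"):
    """Map a URL to host/path[?query], relative to the download dir."""
    parts = urlsplit(url)
    segments = [parts.netloc] + [s for s in parts.path.split("/") if s]
    if parts.query:
        # Keep the query so two feeds served by one script stay distinct.
        segments[-1] += "?" + parts.query
    return posixpath.join(*(_safe_segment(s, keep) for s in segments))


def dated_filename(day, extension, taken):
    """Return the first free name of the form YYYY.MM.DD.NNN + extension."""
    counter = 1
    while True:
        name = "%s.%03d%s" % (day.strftime("%Y.%m.%d"), counter, extension)
        if name not in taken:
            return name
        counter += 1


print(url_to_relative_path(
    "http://www.hbo.com/video/podcasts/billmaher/637314_dl.mp3"))
print(dated_filename(datetime.date(2007, 10, 31), ".mp3", set()))
```

The default keeps "?", "=" and "&" literally so the result matches the
example paths in this thread, but FAT32 does not allow "?" in names;
passing keep="" percent-encodes those characters instead. Very long
URLs can still exceed file system name limits, which this sketch does
not address.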