At 12:01 6.11.2002 +0000, James wrote:
>In the Zip format, both the file headers and directory records contain the
>whole path. This makes the notional movement of files from one directory
>to another rather tricky since it means rewriting the whole archive.

Including updating, for example, all the StartAddr values in the Central
Directory.

>Suppose, the central directory contained an indexed list of paths
>(directories). A non-zero value for this index in a directory record would
>indicate which directory the file currently belonged to. Of course, each
>path entry would have to consist of three fields being
> ParentIndex : Word32; { index of parent directory }
> Length : Word16;
> Name : variable length char string;

What do you mean by ParentIndex? Do you mean to store the path
Documents\November in two entries?

>The file header need only contain the original file name. All rename/path
>changes could be stored in the central directory entry exclusively. This
>means that the name located in a central directory entry might be
>different to that stored in the corresponding file header. This is ok but
>some system of error checking would probably be a good idea.

Looks good. I first thought of putting only some id in the file header, but
for redundancy I later came to the conclusion that it might not be bad to
have a (the original) filename available here. It might even be an
advantage to have the original filename available after a rename, because
it gives some extra information.

I also would not want to be too dependent on an index; I want to avoid a
single point of failure. Clear instructions must be included with the
format on how an application should interpret these fields.

Since the id into the path index is fixed size, it could easily be put in
the file header as well, and updating this field on a change could be
optional.
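To make the quoted path index concrete, here is a minimal Python sketch. It assumes the answer to the ParentIndex question is yes: each entry holds a single directory name plus the index of its parent, so Documents\November takes two entries. The names PathEntry, resolve_dir and full_name, the 1-based indexing, and the use of 0 for "no parent" are assumptions for illustration, not part of any agreed format.

```python
from dataclasses import dataclass


@dataclass
class PathEntry:
    parent_index: int  # assumption: 0 means "no parent" (top level)
    name: str          # a single directory name, not a whole path


def resolve_dir(paths, index, sep="\\"):
    """Follow ParentIndex links up to the top level and join the names."""
    parts = []
    seen = set()
    while index != 0:
        # Guard against out-of-range indexes and parent loops in a damaged index.
        if index in seen or not 1 <= index <= len(paths):
            raise ValueError(f"corrupt path index: {index}")
        seen.add(index)
        entry = paths[index - 1]  # entries are numbered from 1
        parts.append(entry.name)
        index = entry.parent_index
    return sep.join(reversed(parts))


def full_name(paths, dir_index, file_name, sep="\\"):
    """Build the current full name of a file from its directory index."""
    directory = resolve_dir(paths, dir_index, sep)
    return directory + sep + file_name if directory else file_name


paths = [PathEntry(0, "Documents"), PathEntry(1, "November")]
print(full_name(paths, 2, "report.txt"))  # Documents\November\report.txt

# Renaming a directory touches one index entry, not every file record.
paths[0].name = "Docs"
print(full_name(paths, 2, "report.txt"))  # Docs\November\report.txt
```

Under these assumptions, moving a file means changing one fixed-size index in its directory record, and renaming a directory means editing one path entry. The loop check matters because a damaged index could otherwise send a reader into an endless walk.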
(I'm not sure whether duplicating the path index id in the file header
would be a good idea or not.)

>This also means that empty directories would no longer need to be
>represented by zero-length files.

I like this solution for empty directories.

>This still leaves the thorny issues of adding/deleting files (space
>recovery and fragmentation) but it is potentially a step towards creating
>a versatile archive format.

I have been checking the RAR format and, if I understand correctly, it does
not use a Central Directory at all. It just contains different kinds of
blocks. New blocks are added at the end and blocks to delete are marked for
deletion. On the next rewrite of the file these blocks are not copied.
(BTW, RAR is licensed; only readers can be written freely.)

I would like to use a Central Directory though, especially for big files.
However, we could consider whether a Central Directory could be optional
for small, simple files (Z++ Light).

Solutions for space reuse I have used in the past (mostly for object
databases) are:

1) Keep information on unused space in the header and reuse it when writing
   new blocks. To avoid fragmentation I had a policy of taking a gap that
   was a close fit and big enough for the object; otherwise the object
   would be written at the end. This worked well for small objects (at
   least 85 % of the file was always in use, without any rewrite ever), but
   I'm not sure if it will work for the bigger compressed file objects.

2) Mark blocks for deletion and keep an empty-space counter in the header;
   if a certain threshold is reached, the file is rewritten.

Are there other suggestions?

Blocks

I like the idea of writing blocks, each with a signature and some general
block info. This helps to recover when a file gets corrupted (the same
concept is used for resynchronization in network communication). The
program can search for signatures and try to recover block by block.
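Going back to space-reuse option 1) for a moment, the close-fit policy could be sketched roughly as follows in Python. The function name allocate, the (offset, length) gap list, and the max_waste threshold that defines "close fit" are all assumptions for illustration; the value 0.25 is arbitrary and not derived from the 85 % figure.

```python
def allocate(gaps, size, file_end, max_waste=0.25):
    """Pick a place for a new block of `size` bytes.

    gaps      -- list of (offset, length) free areas kept in the header
    file_end  -- current end of the archive
    max_waste -- assumption: a gap is a "close fit" if the leftover space
                 is at most this fraction of the block size
    Returns (offset, new_gaps, new_file_end).
    """
    best = None
    for i, (_, length) in enumerate(gaps):
        # Smallest gap that is still big enough for the block.
        if length >= size and (best is None or length < gaps[best][1]):
            best = i

    if best is not None and gaps[best][1] - size <= max_waste * size:
        offset, length = gaps[best]
        new_gaps = gaps[:best] + gaps[best + 1:]
        if length > size:
            # Keep the unused tail of the gap on the free list.
            new_gaps.insert(best, (offset + size, length - size))
        return offset, new_gaps, file_end

    # No close fit: append at the end and leave the gaps untouched.
    return file_end, list(gaps), file_end + size


gaps = [(100, 50), (300, 22), (500, 200)]
print(allocate(gaps, 20, 1000))   # (300, [(100, 50), (320, 2), (500, 200)], 1000)
print(allocate(gaps, 120, 1000))  # (1000, [(100, 50), (300, 22), (500, 200)], 1120)
```

The second call shows the trade-off: the 200-byte gap is big enough for a 120-byte block, but using it would leave an 80-byte remainder, so the block goes to the end of the file instead. Whether this still pays off for large compressed objects is exactly the open question raised above.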
Basic block info could be:

block signature - indicates the type of block and the (possible) start of
                  a new block
block size      - for copying alien blocks (unknown to the version the
                  program is using) on a rewrite, and for recovery
block flags     - mark for in use/not in use, other flags, ..
block crc       - validity check of the block, also useful for recovery
...

Gerard.

PS. I don't want to be on an island, but I also don't want to pollute the
delphizip list. Because this list is more a support list for Delphi Zip,
what do you think (Eric, Russell, and others): should we discuss these
things here (where the potential users of the format are) or move to
another list?
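As a rough illustration of the block info listed above, here is a small Python sketch of a block header plus a signature scan for recovery. Everything concrete in it is an assumption: the signature bytes, the field widths, the little-endian layout, the FLAG_IN_USE bit, and the choice to compute the CRC over the payload only. It uses the standard struct and zlib modules.

```python
import struct
import zlib

SIGNATURE = b"ZPB\x01"            # hypothetical block signature
HEADER = struct.Struct("<4sIHI")  # signature, payload size, flags, crc32
FLAG_IN_USE = 0x0001              # hypothetical "in use" flag bit


def pack_block(payload, flags=FLAG_IN_USE):
    """Prefix a payload with signature, size, flags and CRC."""
    header = HEADER.pack(SIGNATURE, len(payload), flags, zlib.crc32(payload))
    return header + payload


def recover_blocks(data):
    """Scan for signatures and yield (offset, flags, payload) for every
    block whose size and CRC check out; everything else is skipped."""
    pos = 0
    while True:
        pos = data.find(SIGNATURE, pos)
        if pos < 0 or pos + HEADER.size > len(data):
            return  # no further complete header in the data
        _, size, flags, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + size]
        if len(payload) == size and zlib.crc32(payload) == crc:
            yield pos, flags, payload
            pos = start + size  # continue after the valid block
        else:
            pos += 1  # false hit or damaged block: resynchronize


archive = pack_block(b"first") + b"\xff" * 7 + pack_block(b"second")
damaged = bytearray(archive)
damaged[HEADER.size] ^= 0xFF  # corrupt the first payload byte
print([p for _, _, p in recover_blocks(bytes(damaged))])  # [b'second']
```

The size field is what lets a program copy or skip a block type it does not understand, and the CRC is what tells a real block from a chance occurrence of the signature bytes inside compressed data.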