[delphizip] Re: RE : New format - internal directory structure

  • From: Gerard <gslurink@xxxxxx>
  • To: delphizip@xxxxxxxxxxxxx
  • Date: Wed, 06 Nov 2002 17:11:55 +0200

At 12:01 6.11.2002 +0000, James wrote:
>In the Zip format, both the file headers and directory records contain the 
>whole path. This makes the notional movement of files from one directory 
>to another rather tricky since it means rewriting the whole archive.

Including updating, for example, all the StartAddr values in the Central
Directory.


>Suppose, the central directory contained an indexed list of paths 
>(directories). A non-zero value for this index in a directory record would 
>indicate which directory the file currently belonged to. Of course, each 
>path entry would have to consist of three fields being
>   ParentIndex : Word32; { index of parent directory }
>   Length : Word16;
>   Name : variable length char string;

What do you mean by ParentIndex? Do you mean to store the path
Documents\November in two entries?
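
To make the question concrete, here is a minimal sketch of how I read the
proposal, under my own assumptions: index 0 means "archive root", entries
are numbered from 1, and each entry holds only its own name plus the index
of its parent. The names PathEntry and full_path are made up for
illustration and are not part of any existing format.

```python
from dataclasses import dataclass

@dataclass
class PathEntry:
    parent_index: int  # assumption: 0 means "no parent" (archive root)
    name: str          # one path component only, e.g. "November"

def full_path(table, index, sep="\\"):
    """Rebuild a full path by walking parent links up to the root (index 0)."""
    parts = []
    seen = set()
    while index != 0:
        if index in seen:
            # A damaged table could contain a loop; refuse to spin forever.
            raise ValueError("cycle in path table")
        seen.add(index)
        entry = table[index]
        parts.append(entry.name)
        index = entry.parent_index
    return sep.join(reversed(parts))

# Documents\November stored as two entries, the second pointing at the first.
table = {
    1: PathEntry(parent_index=0, name="Documents"),
    2: PathEntry(parent_index=1, name="November"),
}
```

With this layout, renaming "Documents" means changing one entry, and every
file that refers to index 2 follows along without being touched.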


>The file header need only contain the original file name. All rename/path 
>changes could be stored in the central directory entry exclusively. This 
>means that the name located in a central directory entry might be 
>different to that stored in the corresponding file header. This is ok but 
>some system of error checking would probably be a good idea.

Looks good. I first thought of putting only some id in the file header, but
for redundancy I later came to the conclusion that it might not be bad to
have the original filename available here. It might even be an advantage
to have the original filename available after a rename, because it gives
some extra information. I also would not want to be too dependent on an
index; I want to avoid a single point of failure.
Clear instructions on how an application should interpret these fields
must be included with the format.
Since the id into the path index is fixed size, it could easily be put in
the file header as well, and updating this field on a change could be
optional. (I'm not sure whether this would be a good idea or not.)

>This also means that empty directories would no longer need to be 
>represented by zero-length files.

I like this solution for empty directories.


>This still leaves the thorny issues of adding/deleting files (space 
>recovery and fragmentation) but it is potentially a step towards creating 
>a versatile archive format.


I have been checking the RAR format and, if I understand correctly, it does
not use a Central Directory at all. It just contains different kinds of
blocks. New blocks are added at the end and blocks to delete are marked for
deletion. On the next rewrite of the file these blocks are not copied.
(BTW, RAR is licensed; only readers can be written for free.)
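
As a toy illustration of that append / mark / rewrite cycle (this is only
the scheme as I described it above, kept in memory; it is not RAR's actual
layout, and Block and BlockLog are hypothetical names):

```python
from dataclasses import dataclass

@dataclass
class Block:
    kind: str              # e.g. "file" or "comment"; illustrative only
    payload: bytes
    deleted: bool = False  # set instead of physically removing the block

class BlockLog:
    """Append-only list of blocks; deletion is just a flag until a rewrite."""

    def __init__(self):
        self.blocks = []

    def append(self, kind, payload):
        self.blocks.append(Block(kind, payload))
        return len(self.blocks) - 1   # position of the new block

    def mark_deleted(self, position):
        self.blocks[position].deleted = True

    def dead_bytes(self):
        # Space that a rewrite would recover.
        return sum(len(b.payload) for b in self.blocks if b.deleted)

    def rewrite(self):
        # Copy only the live blocks, in their original order.
        self.blocks = [b for b in self.blocks if not b.deleted]
```

Note that positions change after rewrite(), so anything that refers to a
block by position would have to be updated at that point.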

I would like to use a Central Directory though, especially for big files.
However, we could consider making the Central Directory optional for
small, simple files (Z++ Light).

Solutions for space reuse I have used in the past (mostly for object
databases) are:

1) Keep information on unused space in the header and reuse it when writing
new blocks. To avoid fragmentation I had a policy of taking a gap that was
big enough for the object and a close fit; otherwise the object would be
written at the end. This worked well for small objects (at least 85 % of
the file was always in use without any rewrite ever), but I'm not sure if it
will work for the bigger compressed file objects.

2) Mark blocks for deletion and keep an empty-space counter in the header;
once a certain threshold is reached the file is rewritten.

Are there other suggestions ?
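
A rough sketch of both policies, to show what I mean. The gap list is a
plain list of (offset, length) pairs, and the two tuning values (max_slack
for what counts as a "close fit", threshold for when to rewrite) are
arbitrary numbers I picked for the example, not values from my old code.

```python
def place(gaps, size, file_end, max_slack=0.25):
    """Pick an offset for a new object of `size` bytes.

    Policy 1: use the smallest gap that is big enough and wastes at most
    max_slack * size bytes; otherwise write at the end of the file.
    Returns (offset, new_file_end) and updates `gaps` in place.
    """
    best = None
    for i, (_, length) in enumerate(gaps):
        if length >= size and length - size <= max_slack * size:
            if best is None or length < gaps[best][1]:
                best = i
    if best is None:
        return file_end, file_end + size      # no close fit: append
    offset, length = gaps.pop(best)
    if length > size:
        gaps.append((offset + size, length - size))  # keep the leftover
    return offset, file_end

def should_rewrite(gaps, file_end, threshold=0.15):
    """Policy 2: rewrite once the unused fraction of the file exceeds threshold."""
    free = sum(length for _, length in gaps)
    return file_end > 0 and free / file_end > threshold
```

The two can be combined: place() keeps small holes from piling up, and
should_rewrite() catches the case where they pile up anyway.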

Blocks
I like the idea of writing blocks, each with a signature and some general
block info. This helps recovery when a file gets corrupted (the same
concept is used for resynchronization in network communication). The
program can search for signatures and try to recover block by block. Basic
block info could be:
block signature - indicates the type of block and the (possible) start of a
new block
block size - for copying alien blocks (unknown to the version the program is
using) on a rewrite and recovery
block flags - mark for use/not in use, other flags, ..
block crc   - validity check of the block, also useful for recovery
...
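
To show how recovery by signature search could work, here is a small
sketch. Everything concrete in it is an assumption made for the example:
the 4-byte signature value, the field order and widths, little-endian
encoding, and a CRC-32 that covers the payload only. A real format would
also need one signature per block type; this uses a single one.

```python
import struct
import zlib

SIGNATURE = b"ZPP\x01"            # hypothetical 4-byte block marker
HEADER = struct.Struct("<4sIHI")  # signature, payload size, flags, crc32
FLAG_IN_USE = 0x0001              # hypothetical flag bit

def pack_block(payload, flags=FLAG_IN_USE):
    """Return header + payload for one block."""
    header = HEADER.pack(SIGNATURE, len(payload), flags, zlib.crc32(payload))
    return header + payload

def recover(data):
    """Yield (offset, flags, payload) for every block whose CRC checks out.

    Scans for the signature; a candidate that is truncated or fails its
    CRC is skipped by resuming the search one byte further on.
    """
    pos = 0
    while True:
        pos = data.find(SIGNATURE, pos)
        if pos < 0 or pos + HEADER.size > len(data):
            return
        _, size, flags, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + size]
        if len(payload) == size and zlib.crc32(payload) == crc:
            yield pos, flags, payload
            pos = start + size     # jump past the good block
        else:
            pos += 1               # false hit or damaged block: resync
```

Because the size is in the header, a program can copy a block it does not
understand without looking inside it, and the scan above never trusts a
size field unless the CRC over the payload agrees.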

Gerard.

PS. I don't want to be on an isle, but I also don't want to pollute the
delphizip list. Because this list is more for support of Delphi Zip, what do
you think (Eric, Russell, and others): should we discuss these things here
(where the potential users of the format are) or move to another list?



