DATA Chunk

See more on Chunks in general.

See the DATA-nd chunk.

The DATA chunk holds the actual content of an archived file. Generally (see Footnotes), each file in the archive is stored as one DATA instance.

For example, suppose a ZIP2 archive is created with three files: foo.txt, bar.txt, and baz.txt. The archiver would create three instances of DATA, storing the complete contents of foo.txt in DATA#1, bar.txt in DATA#2, and baz.txt in DATA#3. These may then be emitted as three separate chunks, each with its payload compressed and/or encrypted. Or, the original data may be concatenated together and compressed/encrypted as one chunk labeled DATA#1-3.

Or, perhaps DATA#1 is too large to fit on the specified media even after compression, so it is split into two chunks, DATA#1p1 and DATA#1p2, each of which is compressed/encrypted independently and emitted into a different part of a split archive.

The point is, each file is stored in one DATA instance, not necessarily one physical chunk having one chunk header. Instances may be combined or split, so any number of physical chunks is possible. But there is still logically one DATA#1 instance in the archive containing the contents of foo.txt, regardless of whether it was “solidly packed” with other DATA instances, or split into many smaller parts.
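The relationship between logical instances and physical chunks can be sketched in a few lines of Python. This is only an illustration: the class name, its fields, and the helper function are hypothetical and are not defined by the format. The labels it prints follow the notation used in the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhysicalChunk:
    """One emitted chunk (hypothetical model, not part of the format)."""
    first: int                  # first DATA instance number covered
    last: int                   # last DATA instance number covered (inclusive)
    part: Optional[int] = None  # part number if an instance was split, else None

    def label(self) -> str:
        if self.first == self.last:
            name = f"DATA#{self.first}"
        else:
            name = f"DATA#{self.first}-{self.last}"
        return name if self.part is None else f"{name}p{self.part}"


def chunks_for_instance(chunks, instance):
    """Return every physical chunk that carries all or part of one instance."""
    return [c for c in chunks if c.first <= instance <= c.last]


# The three layouts described in the text.
separate = [PhysicalChunk(1, 1), PhysicalChunk(2, 2), PhysicalChunk(3, 3)]
solid = [PhysicalChunk(1, 3)]
split = [PhysicalChunk(1, 1, part=1), PhysicalChunk(1, 1, part=2),
         PhysicalChunk(2, 2), PhysicalChunk(3, 3)]

for layout in (separate, solid, split):
    print([c.label() for c in chunks_for_instance(layout, 1)])
```

In every layout there is still exactly one logical DATA#1; only the number of physical chunks that must be read to recover it changes.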

The association of a stored file in the archive with a distinct DATA instance number is a fundamental principle of the ZIP2 archive structure. Other information, such as the file name and attributes, is associated using this same number.

DATA Chunk structure

Flags

Flag                        Usage
a (correlated)              Cleared.
b (subtype)                 Cleared.
r (range of instances)      Used in the general manner.
p (multi-part) & c          Used in the general manner.
y (payload specification)   Used in the general manner.
i (instance sizes)          Set when r is set, clear otherwise.
n (pointer)                 Cleared. DATA-nd has a distinct meaning, documented below.
d (redundant)               Cleared. DATA-nd has a distinct meaning, documented below.

The y flag is typically present, as this payload contains the principal data (the file content) that is to be compressed and/or encrypted.

DATA-nd Chunk

DATA-nd indicates that the specified instances are in a different file in a multi-part archive.

Flags

DATA-nd may only have the r (range) flag set, besides the n and d flags. When the n flag is used, the d flag must also be used.
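As a rough sketch, the rule above can be written as a check on a set of flag letters. Representing flags as a Python set of single-letter strings is an assumption made for illustration; the actual bit layout of the flags is not described in this section.

```python
def is_valid_data_nd_flags(flags):
    """Check a DATA-nd flag set: n and d are required, r is the only optional flag.

    `flags` is a set of single-letter flag names (a hypothetical representation).
    """
    required = {"n", "d"}
    allowed = required | {"r"}
    return required <= flags <= allowed


print(is_valid_data_nd_flags({"n", "d"}))
print(is_valid_data_nd_flags({"n", "d", "r"}))
print(is_valid_data_nd_flags({"n", "d", "y"}))
```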

DATA-nd Chunk Payload

The Payload contains a single uintV that specifies the file number of the multi-part archive that is expected to contain all the instances noted by this chunk.

Usage Notes

The DATA-nd chunk is used to inform the program that DATA chunks may be found in a different part of a multi-part archive, without having to scan all the files first. So, if the user invokes the program giving it one disc in the set, and the part on that disc contains a DATA-nd#45 for example, stating that DATA#45 may be expected in file 47, then the implementation can prompt for disc number 47 and not have to ask for every disc in turn until it finds the right one.

Note that this is only a hint. If the program scans part 47 and does not find DATA#45 in it, that is not an error. Perhaps that file was updated and not all the discs were rewritten. To this end, file 47 of the multi-part archive may have another DATA-nd#45 as a “forwarding address”, and the user is again prompted for a different disc. Or, with no further information to go on, the program has no choice but to start looking at all the discs in turn until it finds it, which is exactly what it would have to do if there were no DATA-nd chunk in the first place.

An implementation that chases forwarding DATA-nd chunks must watch for cycles and treat a DATA-nd instance that closes a cycle as if there were no such DATA-nd instance (that is, it gives up).

An implementation that prompts for media change must provide a way for the user to say “I don’t have it”. For DATA-nd chunks, cancelling the media change or not finding an expected file must be treated similarly to not finding the corresponding DATA chunk, as opposed to being a fatal error. That is, the implementation can give up and search in other files.
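The hint-chasing behaviour described in these notes might look something like the following. This is a minimal sketch under stated assumptions: `locate_data` and the `read_part` callback are hypothetical names, and the callback's return shape is invented for the example. It covers the three "give up" cases from the text: no further hint, a hint that closes a cycle, and a cancelled or failed media change.

```python
def locate_data(instance, start_file, read_part):
    """Follow DATA-nd hints for one DATA instance.

    read_part(file_number, instance) is a hypothetical callback. It returns
    None if the user cancels the media change or the file cannot be found;
    otherwise it returns a tuple (has_data, hint), where has_data is True if
    that part contains the DATA instance and hint is the file number named
    by a DATA-nd chunk for it, or None if there is no such chunk.

    Returns the file number that holds the instance, or None to tell the
    caller to fall back to searching the other files in turn.
    """
    visited = set()
    current = start_file
    while current is not None and current not in visited:
        visited.add(current)
        part = read_part(current, instance)
        if part is None:
            return None      # "I don't have it": not fatal, just give up
        has_data, hint = part
        if has_data:
            return current
        current = hint       # follow the forwarding address, if any
    return None              # no more hints, or the hint closed a cycle


# Usage: part 1 forwards to 47, which forwards to 12, which holds DATA#45.
parts = {1: (False, 47), 47: (False, 12), 12: (True, None)}
print(locate_data(45, 1, lambda f, i: parts.get(f)))
```

Returning None rather than raising keeps every failure mode equivalent to the DATA-nd chunk never having existed, which matches the "only a hint" rule.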

It is suggested that DATA-nd chunks be written to the same file in a multi-part archive that contains the INDX#0 chunk.

Examples

For example, the following chunk states that DATA#17 can be expected in file number 3.

05  ; size of chunk
18 01  ; DATA-nd type
11  ; Instance #17
03  ; Payload is the number 3
2E  ; checksum

This one states that DATA#1 through DATA#42 inclusive may be found in file number 4. Note that file number 4 of the multi-part archive doesn’t have to contain a corresponding DATA-r#1-42 chunk, but can contain individual chunks for each instance or whatever.

06  ; size of chunk
98 01  ; DATA-ndr
01 2A  ; Instances #1 through 42
04  ; Payload is the number 4
70  ; checksum
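The two byte sequences above can be picked apart with a small decoder. This is a sketch that handles only these examples and rests on several assumptions not stated in this section: every field fits in a single byte, the size byte counts the bytes that follow it, and the r flag corresponds to the 0x80 bit of the first type byte (inferred from the difference between 18 01 and 98 01). The checksum is returned but not verified, since its algorithm is not described here. The function name is hypothetical.

```python
def decode_data_nd(raw):
    """Decode one of the example DATA-nd chunks (single-byte fields assumed)."""
    size = raw[0]
    body = raw[1:1 + size]
    if len(body) != size:
        raise ValueError("truncated chunk")
    type_bytes, rest = body[:2], body[2:]
    has_range = bool(type_bytes[0] & 0x80)  # assumption inferred from the examples
    if has_range:
        first, last, rest = rest[0], rest[1], rest[2:]
    else:
        first = last = rest[0]
        rest = rest[1:]
    if len(rest) != 2:
        raise ValueError("unexpected chunk length")
    target_file, checksum = rest[0], rest[1]
    return {
        "first_instance": first,
        "last_instance": last,
        "target_file": target_file,
        "checksum": checksum,  # carried through, not verified
    }


print(decode_data_nd(bytes.fromhex("05 18 01 11 03 2E")))
print(decode_data_nd(bytes.fromhex("06 98 01 01 2A 04 70")))
```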

Footnotes

Each file in the archive is stored as one DATA instance

It’s also possible for a compression or encryption algorithm to use additional DATA instances, so each entry actually points to a main DATA instance but may refer to others as well.



Page content copyright 2003 by John M. Dlugosz. Home:http://www.dlugosz.com, email:mailto:john@dlugosz.com