ENDX Chunk

See more on Chunks in general.

This is an “end of file” marker, used to facilitate embedding a ZIP2 file inside a larger data stream. In particular, the data file can be appended to an executable to produce a self-extracting executable that can still be recognised as a ZIP2 file and manipulated.

When a ZIP2 manipulation program is presented with a file that doesn’t begin with the expected non-chunk header and signature, it can still easily see whether this ENDX Chunk is at the end of the file, and from there unambiguously locate the beginning of the archive data. This is designed for self-extracting archive programs.

ENDX Chunk structure

Special Rule

This must be the last chunk in a file. By definition, it is the last chunk, since an implementation must stop reading when ENDX is encountered.

Size

The Size must use the 1-byte encoding form (so, it’s limited to 127 bytes).

Flags

The d (redundant) Flag may be set. All other flags must be cleared. So, there can be no payload specification, no continuation, etc.

The meaning of the d (redundant) flag is only significant when the zip2 file is embedded in a larger file. That is, the offset doesn’t equal the file seek position, or this ENDX chunk is not the physical end of the file.

If the d (redundant) flag is set, then a program that manipulates the zip2 file can assume that it may alter the file, including insertions and deletions, as long as the stuff before and after the zip2 data stream is kept intact. Anything after the ENDX is moved as the file is lengthened or shortened, so whatever immediately followed the ENDX continues to do so. A self-extracting archive will be so-flagged, so a program may update the archive without destroying the self-extracting features, even if the attached executable is written for a different platform.

In contrast, if the d flag is not set, then a program may not alter the zip2 data without special knowledge of the surrounding context. That is, changing the zip2 content presumably requires changing stuff in the surrounding data as well.

Instance Number

The Instance Number must be 0. There is only one instance per archive file (in a multi-part archive, each portion file can have its own).

ENDX Chunk Payload

The Payload is a single uintV containing this ENDX chunk’s own offset. Knowing this and the actual file seek position at which it was found, a program can know the base position of the archive within the larger file.
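The arithmetic can be sketched as follows. This is only an illustration in Python; the function name `archive_base` and its parameter names are hypothetical and are not part of the ZIP2 specification.

```python
def archive_base(endx_seek_pos: int, endx_payload_offset: int) -> int:
    """Return the physical file position where the archive data begins.

    endx_seek_pos:       physical position at which the ENDX chunk was found.
    endx_payload_offset: the offset stored in the ENDX Payload, relative to
                         the start of the archive data.
    """
    base = endx_seek_pos - endx_payload_offset
    if base < 0:
        # The chunk cannot claim to be further into the archive than it is
        # into the physical file.
        raise ValueError("ENDX offset is larger than its position in the file")
    return base


# Numbers taken from the example at the end of this page:
# ENDX found at 0x2234, Payload value 0x1234.
print(hex(archive_base(0x2234, 0x1234)))  # 0x1000
```

A base of 0 means the archive is not embedded at all: the ENDX offset equals the file seek position.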

This is used for two purposes. If the zip2 data is not at the beginning of the physical file but is at the physical end of the file (that is, there is stuff before it but not after it) then a program can easily find the beginning without scanning the whole file for the signature.

If there is stuff after the zip2 content, then the ENDX chunk tells the program where to stop. The non-chunk header also indicates the logical end of the data stream, so either mechanism is optional and the other will still allow file content to follow the zip2 data.

Usage Notes

To look for the archive data in a self-extracting archive (or other case where the file doesn’t begin with the expected signature), look at the last few bytes of the file for the ENDX chunk.

The ENDX Chunk is at least 6 bytes long, and only a few bytes longer when the Payload value needs a longer encoding; the number it contains is necessarily smaller than the file size. An implementation prepared to cope with 64-bit file sizes would use (at largest) a 9-byte form for this value, so the ENDX Chunk must be between 6 and 14 bytes long.

The first byte of the ENDX Chunk will be the Chunk’s own (remaining) length, which is known to end at the physical end of file. So, scan backwards from EOF-6 through EOF-14 until you find a byte equal to (N-1) at position (filesize-N).

Once a candidate length byte is found, check that the next 3 bytes are 10 04 00 (for ENDX-d) or 00 04 00 (for d flag cleared). If that is found, then validate the Checksum. If that checks, you found the chunk at the end of the file. Otherwise, keep scanning as the length byte may have been a false hit.

Of course, a file that is not a ZIP2 file at all may coincidentally end in this pattern, too. So, the program shouldn’t accept it until the contents of the Payload are found to make sense; that is, to point to the signature at the beginning of the archive data.
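The backward scan described above might look like the following Python sketch. The function names are hypothetical. The checksum algorithm and the full uintV decoding are not described on this page, so checksum validation is left to a caller-supplied function (`checksum_ok`, an assumption), and the final check that the Payload points at the signature is not shown.

```python
from typing import Callable, Iterator, Optional, Tuple

ENDX_D = bytes([0x10, 0x04, 0x00])      # ENDX with d flag set, Instance 0
ENDX_PLAIN = bytes([0x00, 0x04, 0x00])  # ENDX with d flag cleared, Instance 0


def find_endx_candidates(tail: bytes) -> Iterator[Tuple[int, bool]]:
    """Yield (chunk_length, d_flag) for each plausible ENDX chunk ending at EOF.

    `tail` holds the last bytes of the file (14 or more for full coverage).
    """
    for n in range(6, 15):            # total chunk length N: 6 through 14
        if n > len(tail):
            break
        start = len(tail) - n         # corresponds to position (filesize - N)
        if tail[start] != n - 1:      # length byte counts the remaining bytes
            continue
        header = tail[start + 1:start + 4]
        if header == ENDX_D:
            yield n, True
        elif header == ENDX_PLAIN:
            yield n, False


def find_endx(tail: bytes,
              checksum_ok: Callable[[bytes], bool]) -> Optional[Tuple[int, bool]]:
    """Return the first candidate whose checksum validates, or None."""
    for n, d_flag in find_endx_candidates(tail):
        if checksum_ok(tail[len(tail) - n:]):
            return n, d_flag
    return None                       # every length byte was a false hit


# The last 7 bytes of the example file at the end of this page.
tail = bytes.fromhex("06 10 04 00 92 34 AD")
# Placeholder only: a real implementation must verify the Checksum byte.
print(find_endx(tail, checksum_ok=lambda chunk: True))  # (7, True)
```

Splitting candidate discovery from checksum validation keeps the part that this page fully specifies (the length byte and the three bytes after it) separate from the parts that depend on other pages of the specification.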

An ENDX-d chunk’s Payload value must match the "Offset to ENDX" value in the non-chunk header, if present. A plain ENDX (no d flag) might play tricks with this value. Imagine the file contains [stuff1] [ZIP2 archive] [stuff2] [fake ENDX]. The fake ENDX points to the beginning of the ZIP2 archive data, but the archive itself states its real end, before stuff2. So, the end as noted in the non-chunk header or in a (different) scanned ENDX chunk should take precedence over an ENDX pointer found at the physical end-of-file. For an ENDX-d, no such tricks are allowed and both values must match.
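The precedence rule can be sketched in Python as below. The function name is hypothetical, and representing a missing header value as `None` is an assumption made for illustration; this page does not say how the header encodes an absent "Offset to ENDX".

```python
from typing import Optional


def logical_end_offset(d_flag: bool,
                       endx_payload: int,
                       header_endx_offset: Optional[int]) -> int:
    """Return the offset (relative to the archive start) of the real ENDX.

    d_flag:             whether the ENDX found at end-of-file has the d flag.
    endx_payload:       the offset stored in that ENDX chunk's Payload.
    header_endx_offset: "Offset to ENDX" from the non-chunk header, or None
                        if the header does not give one (assumed encoding).
    """
    if header_endx_offset is None:
        # Nothing to compare against; the scanned ENDX is all we have.
        return endx_payload
    if d_flag and header_endx_offset != endx_payload:
        # For ENDX-d, both values must match.
        raise ValueError("ENDX-d Payload does not match the header's Offset to ENDX")
    # For a plain ENDX the header value takes precedence over the pointer
    # found at the physical end of the file.
    return header_endx_offset


print(hex(logical_end_offset(True, 0x1234, 0x1234)))   # 0x1234
print(hex(logical_end_offset(False, 0x3000, 0x1234)))  # 0x1234
```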

Examples

Here is a ZIP2 file that is concatenated after a 4K program file to make a self-extracting archive.

0000: code file
 ...
1000: 5A 49 50 32   ; "ZIP2"
1004: 1A 04 FE  ; separator
1007: 92 34  ; offset of ENDX= 0x1234
1009: 00  ; offset to TOCN not given
 ...
2234: 06  ; 6 more bytes in the chunk
2235: 10 04  ; ENDX-d
2237: 00  ; no Instance number
2238: 92 34  ; my own relative offset
223a: AD  ; checksum
=== EOF.  File length is 0x223B bytes.


Page content copyright 2003 by John M. Dlugosz. Home:http://www.dlugosz.com, email:mailto:john@dlugosz.com