See more on Chunks in general.

This is an “Eight Dot Three Name” chunk, used to store “8.3”-style short name aliases.

What is an “Eight Dot Three” Name?

Microsoft’s DOS operating system and versions of Windows prior to 1995 placed very severe limits on names used in the file system. A name is composed of 8-bit characters that may be upper-case ASCII letters, digits, characters with codes > 127, space, or any of $ % ' - _ @ ~ ` ! ( ) { } ^ # &. A file or directory name is composed of two parts: the base name is limited to 8 characters, and the extension is limited to 3 characters. The two parts are separated by a dot. There is no distinction between a base name followed by a trailing dot and the same base name without it; both mean a name with no extension.
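As a rough illustration of these rules, the following Python sketch checks a candidate short name against the character set and the 8 + 3 length limits. The function names are hypothetical, and rejecting an empty base name is an assumption that the text above does not state.

```python
# Punctuation the text lists as legal, besides letters, digits, space and codes > 127.
ALLOWED_PUNCTUATION = b"$%'-_@~`!(){}^#&"

def is_legal_83_char(code: int) -> bool:
    """True if a single byte value is allowed by the rules quoted above."""
    return (
        0x41 <= code <= 0x5A              # upper-case ASCII letters
        or 0x30 <= code <= 0x39           # digits
        or code > 127                     # code-page dependent characters
        or code == 0x20                   # space
        or code in ALLOWED_PUNCTUATION
    )

def is_legal_83_name(name: bytes) -> bool:
    """Check the length limits and character set of a short name."""
    base, _dot, ext = name.partition(b".")
    if b"." in ext:                       # more than one dot
        return False
    # Assumption: an empty base name is treated as illegal.
    if not 1 <= len(base) <= 8 or len(ext) > 3:
        return False
    return all(is_legal_83_char(c) for c in base + ext)
```

A name with a trailing dot, such as b"ABC.", passes this check just as b"ABC" does, in line with the rule that the two are not distinguished.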

To preserve compatibility, Windows associates an alias with file names that are not compatible with the old naming system. For example, if an old program doesn’t accept a file name of “C:\Program Files\longname.tar.gz”, you can give it (possibly) “C:\PROGRA~1\LONGNA~1.GZ” instead. The alias is simply another name, essentially a hard link, automatically generated for every file or directory name.

If the Subtype is 70 (as opposed to 69), then the alias is necessary for the correct restoration of the file and the ZIP2 extractor should use suitable technology of the target platform and file system to emulate this (e.g. a hard link).

8D3N Chunk

8D3N Chunk structure

Flags

| Flag | Usage |
|---|---|
| a (correlated) | Set. |
| b (subtype) | Set. |
| r (range of instances), p (multi-part) & c, y (payload specification) | All used in the general manner. |
| i (instance sizes) | Cleared. |
| n (pointer), d (redundant) | Cleared. 8D3N-nd has a distinct meaning, documented below. |

The y flag may be present. An 8D3N-r containing several instances is designed to compress well, as it will typically contain mostly repetitions of the same byte.

Subtype

The Subtype may be either 69 or 70. The record structure is the same either way. This is a case where the Subtype is being used as a flag that affects all instances in a chunk that uses the -r flag to hold many small instances.

A Subtype of 69 means that the short name aliases are being provided for completeness or convenience only. A Subtype of 70, on the other hand, indicates that the aliases are really necessary for the semantics of the restored files, and it is an error not to restore the alias along with the regular name. See the usage notes for more information.

Instance Number

The Instance Number matches the Instance Number of the DATA it applies to, as with any a-flagged chunk type.

The choice between two Subtypes is being used as a flag to apply to all instances in a chunk that uses the -r flag to hold many small instances. As such, a ZIP2 archive is ill-formed if the same instance number is present twice, with different Subtypes, even though that is possible in the underlying Chunk storage system.
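A reader could check for this condition along the following lines. This is only a sketch: it assumes the chunk directory has already been reduced to (instance number, Subtype) pairs, and the function name is hypothetical.

```python
def find_subtype_conflicts(records):
    """records: iterable of (instance_number, subtype) pairs.

    Returns the sorted instance numbers that appear with more than one
    Subtype, which makes the archive ill-formed.
    """
    seen = {}          # instance number -> first Subtype seen for it
    conflicts = set()
    for instance, subtype in records:
        if seen.setdefault(instance, subtype) != subtype:
            conflicts.add(instance)
    return sorted(conflicts)
```

The sketch does not flag an instance number that repeats with the same Subtype; whether that is acceptable is not covered here.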

8D3N Chunk Payload

The Payload contains 11 bytes stating the 8 characters of the base name and 3 characters of the extension.

Note that the dot is not stored. If a field is shorter than the allowed 8 or 3 characters, it is right-padded with 0 bytes. The FAT filesystem uses a single-byte character set that is a superset of ASCII, but the interpretation of characters outside the ASCII range varies with the DOS code page configured on the computer reading the files. File names are matched byte for byte, even though the same bytes may display differently on different machines.

For example, the short name alias “ABCDEFGH.TXT” would be represented as hex 41 42 43 44 45 46 47 48 54 58 54, “HELLO.H” as hex 48 45 4C 4C 4F 00 00 00 48 00 00, and “DIRNAME” as hex 44 49 52 4E 41 4D 45 00 00 00 00.
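The layout can be sketched in Python as a pair of helpers that pack an alias into the 11-byte form and unpack it again. The function names are hypothetical, and latin-1 is used only as a stand-in that maps each character to one byte; real code-page handling is outside the scope of this sketch.

```python
def pack_83(alias: str) -> bytes:
    """Pack an alias such as 'HELLO.H' into the 11-byte payload layout."""
    base, _dot, ext = alias.partition(".")
    if len(base) > 8 or len(ext) > 3:
        raise ValueError("not an 8.3 name: " + alias)
    # Right-pad each field with 0 bytes; the dot itself is not stored.
    return (base.encode("latin-1").ljust(8, b"\x00")
            + ext.encode("latin-1").ljust(3, b"\x00"))

def unpack_83(payload: bytes) -> str:
    """Recover the alias from an 11-byte payload (not difference-encoded)."""
    if len(payload) != 11:
        raise ValueError("payload must be exactly 11 bytes")
    base = payload[:8].rstrip(b"\x00").decode("latin-1")
    ext = payload[8:].rstrip(b"\x00").decode("latin-1")
    return base + "." + ext if ext else base
```

Applied to the three names above, pack_83 yields the same byte sequences as given in the text.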

In addition, the stored bytes may contain only the differences from a predicted string based on the regular name. The record is still 11 bytes, but many of the characters will be the same special code value and this will compress well.

The backslash character (‘\’), code value hex 5C, is used to mean “the predicted character in this position”.

| regular name | prediction | actual value | stored |
|---|---|---|---|
| nametoolong.txt | NAMETO~1.TXT | NAMETO~1.TXT | \\\\\\\\\\\ |
| nametoolongtoo.txt | NAMETO~1.TXT | NAMETO~2.TXT | \\\\\\\2\\\ |
| my π digits.dat | MYDIGI~1.DAT | PIDIGITS.DAT | PI\\\\TS\\\ |

The backslash is used as the special character because it’s the one character we positively know cannot be part of the file name, since it is used as the directory separator. Other characters, such as those with ASCII codes below 32, are not allowed either, but you never know what you might really find.
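The difference encoding amounts to a byte-by-byte comparison of two 11-byte fields. The sketch below assumes both the predicted and the actual alias are already in the 11-byte layout (8 bytes of base name, 3 of extension, no dot); the function names are hypothetical.

```python
PREDICTED = 0x5C  # backslash: "the predicted character in this position"

def encode_diff(predicted: bytes, actual: bytes) -> bytes:
    """Replace every byte that matches the prediction with a backslash."""
    if len(predicted) != 11 or len(actual) != 11:
        raise ValueError("both fields must be 11 bytes")
    return bytes(PREDICTED if a == p else a for p, a in zip(predicted, actual))

def decode_diff(predicted: bytes, stored: bytes) -> bytes:
    """Recover the actual alias bytes from the stored record."""
    if len(predicted) != 11 or len(stored) != 11:
        raise ValueError("both fields must be 11 bytes")
    return bytes(p if s == PREDICTED else s for p, s in zip(predicted, stored))
```

For example, encode_diff(b"NAMETO~1TXT", b"NAMETO~2TXT") leaves only the "2" standing, and decode_diff with the same prediction restores the original 11 bytes.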

The Predicted Alias

The prediction is generated in the following manner:

First, separate the regular name at the last dot. So, “my π.yy.dat” will work on “my π.yy” as the base name and “dat” as the extension.

Next, delete all characters that are not printable ASCII characters in the range of codes hex 21 through hex 7E inclusive. Also remove all dots (code hex 2E). This gives us “myyy” and “dat”.

Truncate the base name to 6 characters and the extension to 3, if necessary.

Convert lower case letters to upper case. This gives “MYYY” and “DAT”.

If the base name is shorter than 3 characters, append “FFFF”.

Append “~1” (hex 7E 31) to the base name. This produces “MYYY~1” in our example.
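The steps above can be sketched in Python as follows. The function name is hypothetical. Two details are assumptions rather than part of the description: a name with no dot is treated as having an empty extension, and the result is joined with a dot for readability, although the dot is not stored in the payload.

```python
def predict_alias(regular_name: str) -> str:
    """Return the predicted 8.3 alias for a regular (long) name."""
    # Step 1: separate the name at the last dot.
    base, dot, ext = regular_name.rpartition(".")
    if not dot:
        # Assumption: no dot at all means the whole name is the base name.
        base, ext = regular_name, ""

    def clean(part: str) -> str:
        # Step 2: keep only printable ASCII hex 21..7E, and remove dots.
        return "".join(c for c in part if 0x21 <= ord(c) <= 0x7E and c != ".")

    # Steps 3 and 4: truncate to 6 and 3 characters, then convert to upper case.
    base = clean(base)[:6].upper()
    ext = clean(ext)[:3].upper()

    # Step 5: a base name shorter than 3 characters gets "FFFF" appended.
    if len(base) < 3:
        base += "FFFF"

    # Step 6: append the "~1" tail.
    base += "~1"
    return base + "." + ext if ext else base
```

With this sketch, predict_alias("my π.yy.dat") gives "MYYY~1.DAT", and both "nametoolong.txt" and "nametoolongtoo.txt" give "NAMETO~1.TXT".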

Usage Notes

The Subtype indicates whether the 8.3 names are actually required for correct functioning of the extracted files. For example, suppose there are two stored files, “Calc Π Digits.cpp” and “Calc Π Digits.h”. The former contains the text #include "CALCDI~1.H" because the compiler or other tool didn’t like a file name containing spaces or characters outside the current code page, or had some other issue with it. Using the 8.3 alias is a handy work-around for these things.

When the files are extracted, they will not function properly unless the .h file has the expected alias. On a Windows machine, we can’t rely on luck to give the files the same aliases they had on the original system. On a different platform, we need to emulate the feature by using a symbolic link, a hard link, or whatever is available.

The 8D3N chunk has a Subtype of 70 in the above scenario to indicate that establishing the alias is necessary to give the files the proper meaning. Non-Windows platforms should emulate the feature. Windows systems should give an error if the alias could not be restored properly.

On the other hand, an 8D3N chunk with a Subtype of 69 indicates a more casual use of this attribute. Perhaps the aliases are there for convenience only, or for completeness because the archive is being used as a backup. Other platforms may safely ignore 8D3N chunks of Subtype 69 without any diagnostic, and Windows platforms should generate a warning if the aliases are not restored as expected.

Of course, a stored file does not have to have an associated 8D3N record at all, in which case Windows systems will take no special steps and will naturally get whatever alias the OS assigns when restoring. This raises the possibility that, when extracting multiple files, the luck-of-the-draw assignment for one file may prevent the required assignment for a subsequent file. So, it is recommended that implementations restore files with associated 8D3N-ab(70) chunks first, then files with 8D3N-ab(69) chunks, then files without 8D3N records.
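The recommended ordering can be expressed as a simple stable sort. The sketch below assumes each file has been reduced to a (name, subtype) pair, with None standing for a file that has no 8D3N record; both the representation and the function name are hypothetical.

```python
def restore_order(files):
    """Order files so that required aliases are claimed first.

    files: iterable of (name, subtype), where subtype is 70, 69 or None.
    """
    priority = {70: 0, 69: 1, None: 2}
    # sorted() is stable, so files keep their archive order within each group.
    return sorted(files, key=lambda f: priority[f[1]])
```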

Examples

Need to add examples.

8D3N-nd Chunk

A portion number, as with DATA.


Page content copyright 2003 by John M. Dlugosz. Home: http://www.dlugosz.com, email: john@dlugosz.com