KHSH Chunk

KHSH stands for “Keyed Hash”. It is used as an authentication/validation mechanism.

See more on Chunks in general.

Algorithm for Keyed Hash

Updated 18 September 2003

The keyed hash value is computed using CrypHash (K || m) where K is the key, m is the content to perform the keyed hash upon, and || is concatenation. This is used instead of the traditional HMAC (RFC 2104) because of uniformity. By using CrypHash everywhere, this documentation does not have to explain whether/how an HMAC is affected by a global change to the hashing algorithm used. The design of HMAC avoided problems with existing hash functions, and the definition of CrypHash avoids the same issues. CrypHash is defined as SHAd-256, which hashes the result of hashing the message. HMAC does the same thing, with the addition of using the key again for the second hash. Using the key again is to avoid collisions, needed when hash functions were 128 bits. Using a 256-bit hash primitive avoids this need.

A message shorter than MinMessSize=1024 bytes is iterated through the keyed hash several times. This is to prevent a keyed hash on a very short chunk from facilitating a rapid brute-force search for a password. That is, it guarantees a minimum amount of work required to verify that a key is correct, but doesn’t have a penalty if the message is already longer.

Formally, let the number of iterations N equal the smallest integer such that the message length times N ≥ MinMessSize. Let K_0 be the K presented as input, and let K_n = CrypHash (K_(n-1) || m) for n = 1 through N. Return K_N. Note that for longer messages, N becomes 1 and the function is exactly the same as the simpler case presented initially.

Here is the same definition as pseudocode:

	function KHSH ( K, m )
		N ← round_up_to_integer (MinMessSize ÷ length(m))
		repeat N times
			K ← CrypHash (concatenate (K, m))
		return K
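
The definition can also be sketched in Python. This is a minimal sketch rather than a reference implementation: it assumes that CrypHash is SHA-256 applied to the SHA-256 digest of its input (following the SHAd-256 description above), that K and m are byte strings, and that an empty message is rejected because N is undefined for it. The names cryp_hash, iterations, and khsh are illustrative only.

```python
import hashlib

MIN_MESS_SIZE = 1024  # MinMessSize from the text, in bytes


def cryp_hash(data: bytes) -> bytes:
    """SHAd-256 as read from the text: SHA-256 of the SHA-256 digest."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def iterations(message_length: int) -> int:
    """Smallest N such that message_length * N >= MIN_MESS_SIZE."""
    if message_length <= 0:
        # Assumption: the text does not define N for an empty message.
        raise ValueError("N is undefined for an empty message")
    return -(-MIN_MESS_SIZE // message_length)  # ceiling division


def khsh(key: bytes, message: bytes) -> bytes:
    """Iterate K <- CrypHash(K || m) exactly N times and return the final K."""
    k = key
    for _ in range(iterations(len(message))):
        k = cryp_hash(k + message)
    return k
```

With these definitions a 100-byte message is hashed 11 times, while any message of 1024 bytes or more is hashed once.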

KHSH-ab Chunk for extracted file

Flags

Flag                       Usage
a (correlated)             Set. If cleared, see plain KHSH instead.
b (subtype)                Must be set.
r (range of instances)     Allowed.
i (instance sizes)         Set when r is set.
p (multi-part) & c         Used in the general manner.
y (payload specification)
n (pointer)                KHSH-abnd points to the portion file that contains the chunk, in the usual manner.
d (redundant)

Subtype

Subtype  Description
66       Chunk contains the keyed hash of the associated DATA instance.
67       Chunk contains a list of required Chunk IDs and a computed keyed hash value for their combined content.
68       Chunk lists instance numbers of other (non-a-flagged) KHSH chunks.

Instance Number

The Instance Number matches the Instance Number of the DATA it applies to, as with any a-flagged chunk type.

KHSH-a Chunk Payload (subtype 66)

The Payload contains the key instance for the hash, and the keyed hash of the data.

Key Instance

This is a uintV indicating the instance number of a KEYD (key definition) chunk.

Keyed Hash Value

The hash is computed over the original contents of the DATA instance, before it is compressed and otherwise transformed and broken into one or more chunks. That is, compute the hash on the file after extracting it and compare against this value.

The result of the hashing algorithm may be truncated to any desired length greater than 3. The size of the payload defines the significance stored, as the Keyed Hash Value takes up the remaining space in the payload. You can thus trade off security for file size. However, an implementation must provide a mechanism to establish a minimum-acceptable length, and KHSH values shorter than this will be considered un-authenticated.
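
A hedged sketch of how a reader might compare a truncated value against a freshly computed hash follows. It assumes that truncation keeps the leading bytes of the hash output (the text does not say which bytes are kept), and the function name and the four result strings are hypothetical. Reporting a mismatch before the minimum-length test is a design choice of this sketch, not something the text requires.

```python
import hmac

MIN_STORED_LENGTH = 4  # the text requires a length greater than 3


def check_truncated(stored: bytes, computed: bytes, min_acceptable: int) -> str:
    """Classify a stored (possibly truncated) keyed hash value.

    stored         -- the Keyed Hash Value bytes taken from the payload
    computed       -- the full hash output computed by the reader
    min_acceptable -- the minimum length configured in the implementation
    """
    if len(stored) < MIN_STORED_LENGTH or len(stored) > len(computed):
        return "malformed"
    # Assumption: a truncated value is a prefix of the full hash output.
    if not hmac.compare_digest(stored, computed[:len(stored)]):
        return "mismatch"
    if len(stored) < min_acceptable:
        return "unauthenticated"  # matches, but too short to be trusted
    return "authenticated"
```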

KHSH-ab(67) Chunk Payload

The payload is structured exactly the same as the plain KHSH-b(67) chunk, consisting of a Key Instance, a list of Chunk IDs, and the keyed hash value for the combined Chunks.

The difference is that this -a form is tied to the matching DATA instance, and the list of Chunk IDs indicates chunks that must be present (and correct) when extracting this file from the archive.

KHSH-ab(68) Chunk Payload

This does not contain a keyed hash value, but rather provides a level of indirection. It serves to associate multiple KHSH records with the same DATA chunk. Using KHSH-ab(67), there could only be one such chunk with the corresponding instance number. Plain KHSH-b(67) is the same type of record but with an arbitrarily assigned instance number. You can have as many of those as you like. This (subtype 68) form lists which plain ones apply.

The payload contains one or more uintV values, which are interpreted as instance numbers of plain KHSH-b(67) chunks. The union of all the Chunk IDs listed in those indirectly referenced chunks is the list of chunks that must be present (and correct).

The same Chunk ID may be listed in multiple records, both referred to from this chunk. One use is to give the same list of Chunk IDs in each record, but computed for different keys. Another use is to break up the list of chunks into smaller lists.
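
The union described above can be sketched as follows. The representation is an assumption made for illustration: Chunk IDs are modelled as plain strings, and the plain records are held in a dictionary keyed by instance number. Neither the function name nor this data model comes from the format itself.

```python
from typing import Dict, List, Sequence


def mandatory_chunks(indirect_instances: Sequence[int],
                     plain_records: Dict[int, List[str]]) -> List[str]:
    """Union of the Chunk IDs listed by every referenced plain record.

    indirect_instances -- instance numbers read from the subtype 68 payload
    plain_records      -- hypothetical map: instance number -> listed Chunk IDs
    """
    required: List[str] = []
    seen = set()
    for instance in indirect_instances:
        if instance not in plain_records:
            # A referenced record that cannot be found cannot be verified.
            raise KeyError(f"referenced KHSH record #{instance} is missing")
        for chunk_id in plain_records[instance]:
            if chunk_id not in seen:  # the same ID may appear in several records
                seen.add(chunk_id)
                required.append(chunk_id)
    return required
```

The result preserves first-seen order only for readability; the text defines the list as a union, so order carries no meaning here.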

KHSH Chunk for other target chunks

Flags

Flag                       Usage
a (correlated)             Cleared. If set, see KHSH-a instead.
b (subtype)                Must be set.
r (range of instances)     Allowed.
i (instance sizes)         Set when r is set.
p (multi-part) & c         Used in the general manner.
y (payload specification)
n (pointer)                KHSH-bnd points to the portion file that contains the chunk, in the usual manner.
d (redundant)

Subtype

Set to 67.

Instance Number

The Instance Number is matched against values listed in KHSH-ab(68) chunks. Instance numbers hex 40 through hex FFF (64 through 4095) are used to mean global or common KHSH chunks that are always checked, without needing to be listed as part of the associated KHSH-a chunk.
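
As a small illustration of the reserved range, an extractor might pick out the always-checked records like this. The function names are hypothetical, and the sketch assumes the instance numbers of the plain KHSH chunks have already been collected.

```python
from typing import Iterable, List

GLOBAL_FIRST = 0x40   # 64
GLOBAL_LAST = 0xFFF   # 4095


def is_global_instance(instance: int) -> bool:
    """True when the instance number falls in the reserved global range."""
    return GLOBAL_FIRST <= instance <= GLOBAL_LAST


def global_instances(instances: Iterable[int]) -> List[int]:
    """Instance numbers of plain KHSH chunks that are always checked."""
    return sorted(i for i in instances if is_global_instance(i))
```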

KHSH Chunk Payload

The Payload contains a key instance for the hash, a list of target chunk IDs, and a computed keyed hash value for the target chunks.

Key Instance

This is a uintV indicating the instance number of a KEYD (key definition) chunk.

Target Chunk ID List

This contains a uintV indicating the number of Chunk IDs that follow, followed by that many Chunk IDs. Each Chunk ID is identical to the Chunk ID definition used by the TOCN chunk.

Keyed Hash Value

The hash algorithm is applied over the entire content of the target chunks, from the first byte of the size through the checksum inclusive. It is computed as if the target chunks are concatenated together in the order in which they are listed in the Chunk ID List.

The output may be truncated to any desired length greater than 3. The size of the payload defines the significance stored, as the Keyed Hash Value takes up the remaining space in the payload. You can thus trade off security for file size. However, an implementation must provide a mechanism to establish a minimum-acceptable length, and KHSH values shorter than this will be considered un-authenticated.
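
The concatenation rule can be sketched as below. The sketch assumes each target chunk is available as its raw bytes (from the first byte of the size through the checksum) in a dictionary keyed by a string Chunk ID, and that the caller supplies the keyed hash function. Both the data model and the names are hypothetical.

```python
from typing import Callable, Mapping, Sequence


def hash_target_chunks(key: bytes,
                       chunk_ids: Sequence[str],
                       raw_chunks: Mapping[str, bytes],
                       keyed_hash: Callable[[bytes, bytes], bytes]) -> bytes:
    """Keyed hash over the target chunks, joined in listed order."""
    parts = []
    for chunk_id in chunk_ids:
        if chunk_id not in raw_chunks:
            # A listed chunk that is absent cannot be authenticated.
            raise KeyError(f"target chunk {chunk_id} is missing")
        parts.append(raw_chunks[chunk_id])
    # Order matters: the chunks are hashed as one concatenated message.
    return keyed_hash(key, b"".join(parts))
```

Because the chunks are treated as a single message, listing the same chunks in a different order yields a different value.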

Usage for all KHSH forms

Usage Notes

The proper use of KHSH chunks is necessary for proper security when using encrypted data chunks. Because the nature of the ZIP2 file is to have everything separate and related without explicit cross-references, the KHSH is used to tie all related chunks together to detect tampering.

When extracting a file from the archive, suppose DATA#42 is used for the file data. To perform the security check, look for KHSH-ab#42. This will contain (possibly with more indirection) a list of all related chunks and authentication for those chunks.

To use this security feature, follow two rules:

  1. Assure that every chunk mentioned by the KHSH-ab#42 chunk exists and gives a correct keyed hash value; and
  2. Assure that every chunk touched in the process of restoring file #42 is authenticated in some manner. Having a valid KHSH using the same base key that was used to encrypt DATA#42 is one way to authenticate a chunk.

An archive is ill-formed if there are multiple KHSH-ab#42 chunks with subtype 67 and 68, even though it may be legal in the underlying file structure since the chunks have different subtypes. However, there may exist a KHSH-ab(66)#42 as well.

If KHSH-ab#42 is found with subtype 67, then that chunk directly contains the list of mandatory chunks that must be checked. If instead KHSH-ab#42 is found with subtype 68, there is another layer of indirection, and note that this indirect list is subject to rule 2 above.

There may be any number of plain KHSH chunks, possibly checking the same chunks with the same or different keys. However, an implementation will not necessarily be aware of them for purposes of meeting Rule 2 above. Rather, the process is kicked off by the KHSH-a chunk with the matching instance number. The implementation does not scan or verify all KHSH chunks just because they are there, although it does check plain KHSH-b(67) chunks with Instance Numbers 64 through 4095.

The same chunk ID may be listed in different KHSH chunks, using different keys. Only one of them needs to have the key available for Rule 1 above to be satisfied. If a keyed hash is listed with additional keys and those keys are not accessible (e.g. password not entered), there is no issue. However, if multiple keyed hash values are given with known keys, all the ones that can be checked (because the key is accessible) must be checked and if any one fails the chunk is considered to have failed authentication.
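
The rule in the preceding paragraph can be expressed compactly. In this sketch each KHSH record that lists the chunk contributes one result: True if the hash matched, False if it did not, and None if the key was not accessible. That three-valued encoding and the function name are assumptions made for illustration.

```python
from typing import Iterable, Optional


def chunk_authenticated(results: Iterable[Optional[bool]]) -> bool:
    """Apply the multi-key rule to one chunk.

    At least one record must be checkable, and every checkable record
    must match; records whose key is not accessible are ignored.
    """
    checked = [r for r in results if r is not None]
    return bool(checked) and all(checked)
```

For example, one match plus one inaccessible key authenticates the chunk, while one match plus one mismatch does not.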

If the DATA#42 chunk (or chunks, if the p (multi-part) flag is used) is not mentioned among the chunk IDs listed (possibly indirectly) by the KHSH-ab#42 chunk, then the KHSH-ab(66)#42 is implicitly needed. If all relevant DATA chunks are listed, then a KHSH-ab(66)#42 is not required, but is still verified if present and uses a known key.
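
A minimal sketch of that decision, again modelling Chunk IDs as strings purely for illustration (the function name is hypothetical):

```python
from typing import Iterable


def needs_data_hash(data_chunk_ids: Iterable[str],
                    listed_chunk_ids: Iterable[str]) -> bool:
    """True when a subtype 66 record is implicitly required.

    data_chunk_ids   -- IDs of the physical DATA chunks used for the file
    listed_chunk_ids -- IDs listed, directly or indirectly, by KHSH-ab
    """
    # Required unless every relevant DATA chunk is already on the list.
    return not set(data_chunk_ids) <= set(listed_chunk_ids)
```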

Examples

Example 1

There exists a Chunk with ID KHSH-ab(67)#42. It lists the IDs of SECW#42, DATA#42, and TIME#42, and the keyed hash for the content of those chunks performed using the same key that is used to encrypt DATA#42.

When the user extracts the file corresponding to DATA#42, the program looks for KHSH-ab(either 67 or 68)#42, so finds this chunk. The three nominated chunks (SECW#42, DATA#42, and TIME#42) are read and the keyed hash verified. If those three chunks don’t exist, it is a security error. Note that DATA#42 might not exist, even though that’s what started the search! The check is done against an actual physical chunk with the exact matching ID. The DATA#42 information might be stored as part of a DATA-r#40-45, or there may be several DATA-p#42 chunks, or both r (range) and p (part) features could be used together for an even more complex situation. In these cases, the content of DATA#42 isn’t stored in a single physical chunk with that ID. So in this example, the extractor is expecting a single DATA#42 chunk, and if it finds something else instead it is an error.

If the extraction program encounters other chunks that are associated with DATA#42 (because the -a flag is set and the instance number matches), this could also be a security issue, but might not be. If the content of that chunk does not affect the writing of the extracted file, then it is not an error. For example, an ICON or CMNT is used by the user interface when viewing the archive, but does not affect the restored file. UNIX and MACF chunks are ignored by this extractor running on a Windows system, so don’t affect the extracted file.

In general, note that chunks having Type 1 are things that are restored as part of the file, though may be applicable to different platforms or specifically disabled by the user. Chunks having Type 7 are not restored as part of the file.

If the extraction were more complex, and referred to a FORM chunk, this would be a security error because the FORM chunk is not on the list and not authenticated.

What about the INDX and ROOT chunks? These are used to formulate the file name to restore to, and it could be bad news to restore to the wrong file, perhaps overwriting something important. Those chunks were not mentioned in the KHSH-ab(67)#42. They certainly could have been, but that would be inefficient since every DATA chunk would need to re-scan the same INDX, and the ID would need to be present in every KHSH-a chunk, increasing its size.

So, those common chunks can be checked using a global KHSH. The extractor program sees a chunk KHSH-b(67)#64, and sure enough it lists INDX#0 and ROOT#0, and gives a keyed hash for them. So how did the extractor know to check KHSH-b(67)#64 without checking every KHSH chunk in the archive? Because plain KHSH (without the a flag set) reserves instance numbers in the usual reserved range (hex 40 through hex FFF, that is, 64 through 4095) for common things that should always be checked. #64 was the first one it looked at, so the search was brief.

Since we said the DATA chunk is encrypted, there must be a KEYD chunk involved somewhere. The KEYD chunks will typically be authenticated with global KHSH-b(67) chunks on the same key.

Example 1b

Suppose the situation in Example 1 were the same, except that DATA#42 was not encrypted. Then, it is not a security error if all the listed chunks are not validated, though the implementation can certainly still treat that as a warning. The implementation is not required to do this validation at all, or even solicit the key from the user, though the user may provide the key and have the implementation validate the contents anyway. Note that digital signatures (SIGN chunks) also provide authentication, so if you want to authenticate without encrypting, this might be a better choice. Using a SIGN for this purpose also allows the authentication to take place without prompting for a key.

Example 2

DATA chunks not listed, checked implicitly via b(66).


Page content copyright 2003 by John M. Dlugosz. Home: http://www.dlugosz.com, email: mailto:john@dlugosz.com