Skip to content

Method: CRC hash

Robert Jordan edited this page May 14, 2021 · 3 revisions

Method: CRC hash

CRC is a common method used by the Majiro engine to store text names in 4-byte or 8-byte data types, in-order to speed up comparisons and save space. Although some usages still compare text with the original data, many will only rely on comparing the hash values.

Implementation

Both the CRC-32 and CRC-64 implementations used in Majiro function in the same manor. The only difference is the integer width, and polynomial value.

CRC-32

Standard CRC-32, as seen in Zlib.

Width: 32 bits
Polynomial: 0xEDB88320
Init/XorOut: 0xFFFFFFFF
RefIn/RefOut: true

// Calculate value of Crc32Table[index]
// Where num is an index: 0 - 255
static uint Calculate32(int index) {
    const uint poly = 0xEDB88320u;
    uint num = (uint)index;
    for(int j = 0; j < 8; j++) {
        if((num & 0x1) != 0)  // Check LSB
            num = (num >> 1) ^ poly;
        else
            num >>= 1;
    }
    return num;
}

public static uint Crc32(byte[] data, uint init = 0) {
    var crc = ~init;  // Init/XorOut
    foreach(byte b in data)
        crc = (crc >> 8) ^ Calculate32((byte)(crc ^ b));
    return ~crc;  // XorOut
}

CRC-64

Non-standard CRC-64!

Notes on in-engine implementation

In assembly, this uses the polynomial: 0x85E1C3D753D46D27, and bitshifts by 1 after the XOR with poly. This behavior is identical to the normal reverse CRC-64 implementation with the common forward polynomial: 0x42F0E1EBA9EA3693.

Width: 64 bits
Polynomial: 0x42F0E1EBA9EA3693
Init/XorOut: 0xFFFFFFFFFFFFFFFF
RefIn/RefOut: true

// Calculate value of Crc64Table[index]
// Where num is an index: 0 - 255
static ulong Calculate64(int index) {
    const ulong poly = 0x42F0E1EBA9EA3693ul;
    ulong num = (ulong)index;
    for(int j = 0; j < 8; j++) {
        if((num & 0x1) != 0)  // Check LSB
            num = (num >> 1) ^ poly;
        else
            num >>= 1;
    }
    return num;
}

public static ulong Crc64(byte[] data, ulong init = 0) {
    var crc = ~init;  // Init/XorOut
    foreach(byte b in data)
        crc = (crc >> 8) ^ Calculate64((byte)(crc ^ b));
    return ~crc;  // XorOut
}

Tables

Lookup tables are generally used with CRC to speed up processing.

static uint[]  Table32 = Enumerable.Range(0, 256)
                   .Select(Calculate32).ToArray();

static ulong[] Table64 = Enumerable.Range(0, 256)
                   .Select(Calculate64).ToArray();

All usages of Calculate can then be replaced with its respective Table:

foreach(byte b in data)
    crc = (crc >> 8) ^ Table[(byte)(crc ^ b)];

Reverse engineering

There are many cases when determining the original data used to generate a CRC hash is desirable. Most commonly when trying to understand the behavior or usage of an identifier.

Currently, CRC-64 is only used by Majiro for file names in newer Arc archive versions. These always contain the original name, so only CRC-32 reversing implementations will be used here.

static byte InverseIndex32(int msByte) {
    for(int i = 0; i < 256; i++) {
        if((Calculate32(i) >> 24) == (byte)msByte)
            return (byte)i;
    }
    throw new ArgumentOutOfRangeException();
}

static byte[] Index32 = Enumerable.Range(0, 256)
               .Select(InverseIndex32).ToArray();

static uint InverseCrc32(byte[] data, uint init) {
    uint crc = ~init;  // XorOut
    foreach(byte b in data.Reverse())
        uint index = Index32[crc >> 24];
        crc = ((crc ^ Table32[index]) << 8) | (index ^ b);
    return ~crc;  // Init/XorOut
}

See also

External links

Clone this wiki locally