-
Notifications
You must be signed in to change notification settings - Fork 3
Method: CRC hash
CRC is a common method used by the Majiro engine to store text names in 4-byte or 8-byte data types, in-order to speed up comparisons and save space. Although some usages still compare text with the original data, many will only rely on comparing the hash values.
Both the CRC-32 and CRC-64 implementations used in Majiro function in the same manor. The only difference is the integer width, and polynomial value.
Standard CRC-32, as seen in Zlib.
Width: 32 bits
Polynomial: 0xEDB88320
Init/XorOut: 0xFFFFFFFF
RefIn/RefOut: true
// Calculate value of Crc32Table[index]
// Where num is an index: 0 - 255
static uint Calculate32(int index) {
const uint poly = 0xEDB88320u;
uint num = (uint)index;
for(int j = 0; j < 8; j++) {
if((num & 0x1) != 0) // Check LSB
num = (num >> 1) ^ poly;
else
num >>= 1;
}
return num;
}
public static uint Crc32(byte[] data, uint init = 0) {
var crc = ~init; // Init/XorOut
foreach(byte b in data)
crc = (crc >> 8) ^ Calculate32((byte)(crc ^ b));
return ~crc; // XorOut
}
Non-standard CRC-64!
Notes on in-engine implementation
In assembly, this uses the polynomial: 0x85E1C3D753D46D27
, and bitshifts by 1 after the XOR with poly. This behavior is identical to the normal reverse CRC-64 implementation with the common forward polynomial: 0x42F0E1EBA9EA3693
.
Width: 64 bits
Polynomial: 0x42F0E1EBA9EA3693
Init/XorOut: 0xFFFFFFFFFFFFFFFF
RefIn/RefOut: true
// Calculate value of Crc64Table[index]
// Where num is an index: 0 - 255
static ulong Calculate64(int index) {
const ulong poly = 0x42F0E1EBA9EA3693ul;
ulong num = (ulong)index;
for(int j = 0; j < 8; j++) {
if((num & 0x1) != 0) // Check LSB
num = (num >> 1) ^ poly;
else
num >>= 1;
}
return num;
}
public static ulong Crc64(byte[] data, ulong init = 0) {
var crc = ~init; // Init/XorOut
foreach(byte b in data)
crc = (crc >> 8) ^ Calculate64((byte)(crc ^ b));
return ~crc; // XorOut
}
Lookup tables are generally used with CRC to speed up processing.
static uint[] Table32 = Enumerable.Range(0, 256)
.Select(Calculate32).ToArray();
static ulong[] Table64 = Enumerable.Range(0, 256)
.Select(Calculate64).ToArray();
All usages of Calculate
can then be replaced with its respective Table
:
foreach(byte b in data)
crc = (crc >> 8) ^ Table[(byte)(crc ^ b)];
There are many cases when determining the original data used to generate a CRC hash is desirable. Most commonly when trying to understand the behavior or usage of an identifier.
Currently, CRC-64 is only used by Majiro for file names in newer Arc archive versions. These always contain the original name, so only CRC-32 reversing implementations will be used here.
static byte InverseIndex32(int msByte) {
for(int i = 0; i < 256; i++) {
if((Calculate32(i) >> 24) == (byte)msByte)
return (byte)i;
}
throw new ArgumentOutOfRangeException();
}
static byte[] Index32 = Enumerable.Range(0, 256)
.Select(InverseIndex32).ToArray();
static uint InverseCrc32(byte[] data, uint init) {
uint crc = ~init; // XorOut
foreach(byte b in data.Reverse())
uint index = Index32[crc >> 24];
crc = ((crc ^ Table32[index]) << 8) | (index ^ b);
return ~crc; // Init/XorOut
}