Skip to content

Latest commit

 

History

History
95 lines (52 loc) · 6.4 KB

REFERENCE.md

File metadata and controls

95 lines (52 loc) · 6.4 KB

Types

hll

The HLL data structure. Casts between bytea and hll are supported, should you choose to generate the contents of the hll outside of the normal means. See STORAGE.markdown.

SELECT hll_cardinality(E'\\xDEADBEEF');

OR

SELECT hll_cardinality(E'\\xDEADBEEF'::hll);

hll_hashval

Represents a hashed data value. Backed by a 64-bit integer (int8in). Typically only output by the hll_hash_* functions. bigint and integer can both be cast to it if you want to skip hashing those values with the typical 123::hll_hashval. Note that an integer that is cast will also be cast, with sign extension, to a 64-bit integer.

Defaults Functions

All defaults for the hll_empty and hll_add_agg functions are in the C file, not in the SQL control file. The defaults can be changed (per connection) with:

SELECT hll_set_defaults(log2m, regwidth, expthresh, sparseon);

This returns a 4-tuple with the values of the prior defaults in the same order as the arguments.

Basic Operational Functions

hll_cardinality(hll) - returns NULL if the hll's type is UNDEFINED. Returns a double precision floating point value otherwise. The prefix operator # may be used as shorthand.

hll_union(hll, hll) - returns the union (as an hll) of two hlls. The infix operator || may be used as shorthand.

hll_add(hll, hll_hashval) - adds the hll_hashval to the hll and returns the new representation of the hll. The infix operator || may be used as shorthand, like hll || hll_hashval or hll_hashval || hll.

hll_empty([log2m[, regwidth[, expthresh[, sparseon]]]]) - returns an empty hll of the specified parameters. Any number of the parameters may be left blank and the default values will be used. See hll_set_defaults.

hll_eq(hll, hll) - returns a boolean indicating whether the two hlls match when their binary representations are compared. The infix operator = may be used as shorthand.

hll_ne(hll, hll) - returns a boolean indicating whether the two hlls do not match when their binary representations are compared. The infix operator <> may be used as shorthand.

hll_union_agg(hll) - aggregate function for hlls that unions the hlls in the input set and returns the hll representing their union.

hll_add_agg(hll_hashval, [log2m[, regwidth[, expthresh[, sparseon]]]]) - aggregate function for hll_hashvals that inserts each element in the input set into an hll whose parameters are specified by the four optional arguments. If any of the four optional arguments are not specified, the defaults set with hll_set_defaults() will be used. Returns the hll representing the input set.

Debugging Functions

hll_print(hll) - pretty-prints the hll in a different way based on its type.

Metadata Functions

hll_schema_version(hll) - returns the schema version value (integer) of the hll.

hll_type(hll) - returns the schema version-specific type value (integer) of the hll. See the storage specification (v1.0.0) for more details.

hll_regwidth(hll) - returns the register bit-width (integer) of the hll.

hll_log2m(hll) - returns the log-base-2 of the number of registers of the hll. If the hll is not of type FULL or SPARSE it returns the log2m value which would be used if the hll were promoted.

hll_expthresh(hll) - returns a 2-tuple of the specified and effective EXPLICIT promotion cutoffs for the hll. The specified cutoff and the effective cutoff will be the same unless expthresh has been set to 'auto' (-1). In that case the specified value will be -1 and the effective value will be the implementation-dependent number of explicit values that will be stored before an EXPLICIT hll is promoted.

hll_sparseon(hll) - returns 1 if the SPARSE representation is enabled for the hll, and 0 otherwise.

Override Functions

SELECT hll_set_output_version(int) - sets the output schema version to the specified value and returns the previous value. The value set only applies within your connection.

SELECT hll_set_max_sparse(int) - sets the maximum number of materialized registers in a SPARSE hll before it is promoted to a FULL hll for all hlls that have sparseon enabled. If -1 is provided, the cutoff will be determined based on storage efficiency and is implementation-dependent. If 0 is provided, the SPARSE representation will be skipped and FULL will be used instead. If any value greater than zero or less than 2^log2m is provided, promotion will occur after that number of materialized registers. If any value greater than or equal to 2^log2m is used, promotion to FULL will never occur.

Hash Functions

All values inserted into an hll should be hashed, and as a result hll_add and hll_add_agg only accept hll_hashvals. We do not recommend hashing floating point values raw as their bit-representation is not well-suited to hashing. Consider converting them to a reproducible, comparable binary representation (such as the IEEE 754-2008 interchange format) before hashing.

All the hll_hash_* functions below accept a seed value, which defaults to 0. We discourage negative seeds in order to maintain hashed-value compatibility with the Google Guava implementation of the 128-bit version of Murmur3. Negative hash seeds will produce a warning when used.

hll_hash_boolean(boolean) - hashes the boolean value into a hll_hashval.

hll_hash_smallint(smallint) - hashes the smallint value into a hll_hashval.

hll_hash_integer(integer) - hashes the integer value into a hll_hashval.

hll_hash_bigint(bigint) - hashes the bigint value into a hll_hashval.

hll_hash_bytea(bytea) - hashes the bytea value into a hll_hashval.

hll_hash_text(text) - hashes the text value into a hll_hashval.

hll_hash_any(scalar) - hashes any PG data type by resolving the type dynamically and dispatching to the correct function for that type. This is significantly slower than the type-specific hash functions, and should only be used when the input type is not known beforehand.