Rank-based outcome space #421
Labels: encodings (related to the Encodings API), enhancement (new feature or request that is non-breaking), new outcome space
Describe the feature you'd like to have
While implementing the Chatterjee coefficient of correlation in Associations.jl, I needed to compute ranks of an input vector, according to some method. That made me think: why not have an encoding and an outcome space in ComplexityMeasures.jl that does the same? I imagine something like this:
The encoding takes the input data as its argument, which allows it to encode each raw value `xi` onto an index `idx_i`, and to decode that index back onto `xi`. If `length(x) == length(rank(x))`, then there are as many outcomes as there are values in `x` (so the outcome space is a bit useless for complexity quantification). However, in datasets with repetitions, we can have `length(rank(x)) << length(x)`. Then each rank index is an outcome, and we can meaningfully estimate counts/probabilities by counting how many input data points map to each rank index.
Cite scientific papers related to the feature/algorithm
I have no references atm. This may or may not have been done by someone before.
If possible, sketch out an implementation strategy
Relatively straightforward: follow the dev docs on implementing the encoding and outcome space.
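To make the encode/decode idea from the feature description concrete, here is a minimal sketch using only Base Julia. It assumes dense ranking (tied values share one rank), which is just one possible ranking method. The names `DenseRankEncoding`, `rank_encoding`, `encode_rank`, `decode_rank` and `rank_probabilities` are hypothetical and are not part of the ComplexityMeasures.jl API; a real implementation would hook into the package's own encoding and outcome space interfaces as described in the dev docs.

```julia
# Hypothetical sketch: none of these names exist in ComplexityMeasures.jl.
struct DenseRankEncoding{T}
    levels::Vector{T}  # sorted unique values of the input data
end

# Build the encoding from the input data (assumption: dense ranking).
rank_encoding(x::AbstractVector) = DenseRankEncoding(sort(unique(x)))

# Encode a raw value onto its rank index, i.e. its position among the
# sorted unique values of the input data.
function encode_rank(e::DenseRankEncoding, xi)
    idx = searchsortedfirst(e.levels, xi)
    if idx > length(e.levels) || e.levels[idx] != xi
        throw(ArgumentError("value $xi does not occur in the input data"))
    end
    return idx
end

# Decode a rank index back onto the raw value it represents.
decode_rank(e::DenseRankEncoding, idx::Integer) = e.levels[idx]

# Count how many input points map to each rank index, then normalise
# the counts to relative frequencies.
function rank_probabilities(x::AbstractVector)
    e = rank_encoding(x)
    counts = zeros(Int, length(e.levels))
    for xi in x
        counts[encode_rank(e, xi)] += 1
    end
    return counts ./ length(x)
end

x = [3, 1, 3, 2, 1, 3]   # six data points, three distinct values
e = rank_encoding(x)
probs = rank_probabilities(x)
```

With repeated values, as in `x` above, there are fewer outcomes than data points, which is the case where the counts carry information.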
A few notes:
`T <: RankType`). StatsBase.jl implements a few such ranking methods we can use for inspiration. I also implemented a randomization-tie-breaking variant for the Chatterjee coefficient that we would use.
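One way the ranking method could be selected by type is sketched below, using only Base and the `Random` standard library. Only the name `RankType` comes from the notes above; the subtypes `DenseRank` and `RandomTieRank` and the function `rank_values` are hypothetical. The random tie-breaking shown here is a generic shuffle-then-stable-sort approach, not necessarily what Associations.jl does for the Chatterjee coefficient.

```julia
using Random

# Hypothetical type hierarchy; only the name `RankType` appears in the notes.
abstract type RankType end

struct DenseRank <: RankType end  # tied values share one rank: 1, 2, 3, ...

struct RandomTieRank{R<:AbstractRNG} <: RankType
    rng::R                        # source of randomness for breaking ties
end

# Dense ranking: equal values receive the same rank.
function rank_values(::DenseRank, x::AbstractVector)
    levels = sort(unique(x))
    return [searchsortedfirst(levels, xi) for xi in x]
end

# Ordinal ranking with ties broken at random. The indices are shuffled
# first and then sorted by value; `sortperm` is stable, so tied values
# keep their shuffled (random) relative order.
# Assumption: `x` uses standard 1-based indexing.
function rank_values(r::RandomTieRank, x::AbstractVector)
    perm = shuffle(r.rng, collect(eachindex(x)))
    order = perm[sortperm(x[perm])]   # data indices, in ascending value order
    ranks = similar(order)
    ranks[order] = 1:length(x)        # invert the ordering to get ranks
    return ranks
end

x = [2.0, 1.0, 2.0, 1.0, 3.0]
dense = rank_values(DenseRank(), x)
randomized = rank_values(RandomTieRank(MersenneTwister(1)), x)
```

With dense ranking, repeated values collapse onto shared outcomes. With random tie-breaking every data point gets a distinct rank, so that variant yields as many outcomes as data points.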