The fasthplus
R package provides fast approximations for metrics of
discordance or dissimilarity.
The metric G+ was introduced by W. T. Williams in 1971 as a way to measure the discordance or dissimilarity between two different classifications (where the classification consists of distance matrix and a set of predicted labels for each observation).
Here, we introduce the H+, a discordance metric modified from G+. This metric can be used (1) to evaluate the discordance between two arbitrary sets or (2) to evaluate label fitness (clustering) for a generalized dissimilarity matrix.
At present, our package is available only via github installation using the devtools
package.
library(devtools)
install_github(repo="ntdyjack/fasthplus", ref = "main")
After installation, the package can be loaded into R
library(fasthplus)
The main functions in the fasthplus package are hpe()
and hpb()
.
The hpe()
function accepts either (1) two sets (A
and B
) or
(2) a distance matrix (D
) and set of labels (L
).
With additional arguments alg
(algorithm choice) alpha
and gammas
(see vignette).
The hpb()
function accepts (1) a data matrix (D
) and set of labels (L
).
To run the hpe()
function with two sets (A
and B
) and the number of p
+ 1 percentiles:
a <- rnorm(n=500,mean=0)
b <- rnorm(n=500,mean=1)
h <- hpe(A=a,B=b,p=101,alg=1)
To run the hpe()
and hpb()
with D
(dissimilarity or data respectively) and set of labels (L
):
# Two sets
a <- sapply(1:100, function(i) rnorm(n=50,mean=0.0,sd=1))
b <- sapply(1:100, function(i) rnorm(n=50,mean=0.0,sd=1))
x <- t(cbind(a,b))
# Create a set of labels
l <- c(rep(0,100),rep(1,100))
#hpb estimate
hpb(D=x,L=l,t=10,r=10)
# Calculate dissimilarity matrix
d <- dist(x)
#hpe estimate
hpe(D=d,L=l,p=251)
Please use https://github.com/ntdyjack/fasthplus/issues to submit issues, bug reports, and comments.