Injecting StatsBase functions into any SQLite database in Julia.
Makes it possible to call
select MEDIAN(fieldname) from tablename
in Julia where median is defined in Julia and related packages and the function is injected to use within SQLite. Database file is not modified.
julia> using Pkg
julia> Pkg.add("Sqlite3Stats")
using SQLite
using Sqlite3Stats
using DataFrames
# Any SQLite database
# In our case, it is dbfile.db
db = SQLite.DB("dbfile.db")
# Injecting functions
Sqlite3Stats.register_functions(db)
using SQLite
using Sqlite3Stats
using DataFrames
db = SQLite.DB("dbfile.db")
# Injecting functions
Sqlite3Stats.register_functions(db)
# 1st Quartile
result = DBInterface.execute(db, "select Q1(num) from table") |> DataFrame
# 2st Quartile
result = DBInterface.execute(db, "select Q2(num) from table") |> DataFrame
# Median (Equals to Q2)
result = DBInterface.execute(db, "select MEDIAN(num) from table") |> DataFrame
# 3rd Quartile
result = DBInterface.execute(db, "select Q3(num) from table") |> DataFrame
# QUANTILE
result = DBInterface.execute(db, "select QUANTILE(num, 0.25) from table") |> DataFrame
result = DBInterface.execute(db, "select QUANTILE(num, 0.50) from table") |> DataFrame
result = DBInterface.execute(db, "select QUANTILE(num, 0.75) from table") |> DataFrame
# Covariance
result = DBInterface.execute(db, "select COV(num, other) from table") |> DataFrame
# Pearson Correlation
result = DBInterface.execute(db, "select COR(num, other) from table") |> DataFrame
# Spearman Correlation
result = DBInterface.execute(db, "select SPEARMANCOR(num, other) from table") |> DataFrame
# Kendall Correlation
result = DBInterface.execute(db, "select KENDALLCOR(num, other) from table") |> DataFrame
# Median Absolute Deviations
result = DBInterface.execute(db, "select MAD(num) from table") |> DataFrame
# Inter-Quartile Range
result = DBInterface.execute(db, "select IQR(num) from table") |> DataFrame
# Skewness
result = DBInterface.execute(db, "select SKEWNESS(num) from table") |> DataFrame
# Kurtosis
result = DBInterface.execute(db, "select KURTOSIS(num) from table") |> DataFrame
# Geometric Mean
result = DBInterface.execute(db, "select GEOMEAN(num) from table") |> DataFrame
# Harmonic Mean
result = DBInterface.execute(db, "select HARMMEAN(num) from table") |> DataFrame
# Maximum absolute deviations
result = DBInterface.execute(db, "select MAXAD(num) from table") |> DataFrame
# Mean absolute deviations
result = DBInterface.execute(db, "select MEANAD(num) from table") |> DataFrame
# Mean squared deviations
result = DBInterface.execute(db, "select MSD(num) from table") |> DataFrame
# Mode
result = DBInterface.execute(db, "select MODE(num) from table") |> DataFrame
# WMEAN for weighted mean
result = DBInterface.execute(db, "select WMEAN(num, weights) from table") |> DataFrame
# WMEDIAN for weighted mean
result = DBInterface.execute(db, "select WMEDIAN(num, weights) from table") |> DataFrame
# Entropy
result = DBInterface.execute(db, "select ENTROPY(probs) from table") |> DataFrame
# Slope (a) of linear regression y = b + ax
result = DBInterface.execute(db, "select LINSLOPE(x, y) from table") |> DataFrame
# Intercept (b) of linear regression y = b + ax
result = DBInterface.execute(db, "select LININTERCEPT(x, y) from table") |> DataFrame
This family of functions implement QXXX(), PXXX(), and RXXX() for a probability density or mass function XXX. Q for quantile, p for propability or cdf value, R for random number.
QNORM(p, mean, stddev)
returns the quantile value
whereas
PNORM(q, mean, stddev)
returns
and RNORM(mean, stddev)
draws a random number from a Normal distribution with mean mean
( stddev
(
and
# Quantile of Normal Distribution with mean 0 and standard deviation 1
result = DBInterface.execute(db, "select QNORM(0.025, 0.0, 1.0) from table") |> DataFrame
# Probability of Normal Distribution with mean 0 and standard deviation 1
result = DBInterface.execute(db, "select PNORM(-1.96, 0.0, 1.0) from table") |> DataFrame
# Random number drawn from a Normal Distribution with mean * and standard deviation 1
result = DBInterface.execute(db, "select RNORM(0.0, 1.0) from table") |> DataFrame
Note that Q, P, and R prefix correspond to Quantile, CDF (Probability), and Random (number), respectively.
QT(x, dof)
,PT(x, dof)
,RT(dof)
for Student-T DistributionQCHISQ(x, dof)
,PCHISQ(x, dof)
,RCHISQ(dof)
for ChiSquare DistributionQF(x, dof1, dof2)
,PF(x, dof1, dof2)
,RF(dof1, dof2)
for F DistributionQPOIS(x, lambda)
,RPOIS(x, lambda)
,RPOIS(lambda)
for Poisson DistributionQBINOM(x, n, p)
,PBINOM(x, n, p)
,RBINOM(n, p)
for Binomial DistributionQUNIF(x, a, b)
,PUNIF(x, a, b)
,RUNIF(a, b)
for Uniform DistributionQEXP(x, theta)
,PEXP(x, theta)
,REXP(theta)
for Exponential DistributionQBETA(x, alpha, beta)
,PGAMMA(x, alpha, beta)
,RGAMMA(alpha, beta)
for Beta DistributionQCAUCHY(x, location, scale)
,PCAUCHY(x, location, scale)
,RCAUCHY(location, scale)
for Cauchy DistributionQGAMMA(x, alpha, theta)
,PGAMMA(x, alpha, theta)
,RGAMMA(alpha, theta)
for Gamma DistributionQFRECHET(x, alpha)
,PFRECHET(x, alpha)
,RFRECHET(alpha)
for Frechet DistributionQPARETO(x, alpha, theta)
,PPARETO(x, alpha, theta)
,RPARETO(alpha, theta)
for Pareto DistributionQWEIBULL(x, alpha, theta)
,PWEIBULL(x, alpha, theta)
,RWEIBULL(alpha, theta)
for Weibull Distribution
JB(x)
for Jarque-Bera Normality Test (returns the p-value)
The package mainly uses the register
function. For example, a single variable
function MEDIAN
is registered as
SQLite.register(db, [],
(x,y) -> vcat(x, y),
x -> StatsBase.quantile(x, 0.50),
name = "MEDIAN")
whereas, the two-variable function COR
is registered as
SQLite.register(db, Array{Float64, 2}(undef, (0, 2)),
(x, a, b) -> vcat(x, [a, b]'),
x -> StatsBase.cor(x[:,1], x[:,2]),
name = "COR", nargs = 2)
for Pearson's correlation coefficient.