Johnson-Lindenstrauss transform (JLT), random projection (RP), fast Johnson-Lindenstrauss transform (FJLT), and randomized Hadamard transform (RHT) in python 3.x
Supports linear mappings and radial basis function (RBF) mappings (a.k.a. Random Fourier Features) that reduce dimensionality while preserving the square of the ℓ2-norm (i.e., squared pairwise Euclidean distances) between points.
Created by: Ben Fauber, Dell Technologies, 02Apr2023
Provides python 3.x functions based on the Johnson-Lindenstrauss (JL) lemma. The Johnson-Lindenstrauss Transform (JLT) preserves pair-wise distances with bounded error ε.
At a high level, the Johnson-Lindenstrauss transform (JLT) is a dimensionality-reduction technique: it maps an input of d dimensions into k dimensions, where k << d.

Specifically, a Johnson-Lindenstrauss transform (JLT) guarantees that for any 0 < ε < 1 and any set X of m points in R^d, there exists a linear map f : R^d -> R^k with k = O(ε^-2 log m) such that, for all u, v in X,

$$(1 - \varepsilon)\lVert u - v \rVert_2^2 \leq \lVert f(u) - f(v) \rVert_2^2 \leq (1 + \varepsilon)\lVert u - v \rVert_2^2$$

The map can be written simply as f(x) = Φx, where Φ is a k-by-d random matrix, suitably scaled; the methods implemented here differ in how Φ is constructed and applied.

The figures below illustrate: 1) JLT algorithm runtimes; and 2) preservation of the square of the ℓ2-norm across the implemented methods.
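For intuition, here is a minimal, self-contained sketch of a dense Gaussian JLT (independent of this repository's code), sized with the same Dasgupta-Gupta bound that the `k=None` default relies on:

```python
import numpy as np

rng = np.random.default_rng(21)
m, d, eps = 200, 10_000, 0.25

# Dasgupta-Gupta: k >= 4*ln(m) / (eps^2/2 - eps^3/3) suffices for m points
k = int(np.ceil(4.0 * np.log(m) / (eps**2 / 2.0 - eps**3 / 3.0)))

X = rng.standard_normal((m, d))                 # m points in d dimensions
Phi = rng.standard_normal((k, d)) / np.sqrt(k)  # scaled Gaussian random matrix
Y = X @ Phi.T                                   # m points in k dimensions

# squared pairwise distances are preserved to within a factor of (1 +/- eps)
i, j = rng.choice(m, size=2, replace=False)
ratio = np.sum((Y[i] - Y[j]) ** 2) / np.sum((X[i] - X[j]) ** 2)
print(f"distortion ratio: {ratio:.3f} (should lie in [{1-eps}, {1+eps}])")
```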
JLT has applications in linear mappings, random projections, locality-sensitive hashing (LSH), matrix sketching, low-rank matrix approximations, and sparse recovery.
For an overview of JLT methods, their variations, and applications, see the survey article "An Introduction to Johnson–Lindenstrauss Transforms" (2021) by Casper Benjamin Freksen.
Python 3.x packages: `math`, `numpy`, `scipy.sparse`, and `fht` (https://github.com/nbarbey/fht)
- Clone the `linearMapping.py` python file to your working directory using either:

  - Command line:

    ```
    git clone https://github.com/dell/jlt.git
    ```

  - or Jupyter Notebook:

    ```python
    import os, sys
    path = os.getcwd()
    os.chdir(path)
    !git clone https://github.com/dell/jlt.git
    sys.path.insert(0, os.path.join(path, 'jlt'))  # portable path join
    ```
- Import the module into your script:

  ```python
  [in]> from linearMapping import linearMapping, rbfMapping
  ```
Produces a linear mapping of the input vector or array from `d` dimensions into `k` dimensions, typically applied where `k` << `d`. `k` is determined automatically (i.e., `k=None`) via the method of Dasgupta and Gupta, using the user-defined `eps` (ε) value.

```python
[in]> linearMapping(A, k=None, eps=0.1, method='FJLT', p=2, random_seed=21)
[out]> # d-to-k linear mapping of A
```
`A` is the input vector or array.

`method` accepts one of several variants of the JLT: `JLT`, `SparseRP`, `VerySparseRP`, `FJLT`, or `SRHT`. See the References section below for more details on each method.

`p` is the ℓp-norm (1 or 2) used by the `FJLT` method.

`random_seed` is the random seed value for the generator function that randomizes the Gaussian and/or the row-selector function, based on the `method` employed.

Defaults are: `k=None`, `eps=0.1`, `method='FJLT'`, `p=2`, and `random_seed=21`. The code is fully commented -- variables and helper functions are further defined within the PY file.
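To make the default `method='FJLT'` and the related `SRHT` more concrete, here is a compact, self-contained sketch of the subsampled randomized Hadamard transform. This is not this repository's implementation (which presumably relies on the fast `fht` dependency); `scipy.linalg.hadamard` builds a dense matrix and is used here only for clarity:

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(21)
d, k = 1024, 64                        # d must be a power of 2 for Hadamard
x = rng.standard_normal(d)

D = rng.choice([-1.0, 1.0], size=d)    # random sign-flip diagonal
H = hadamard(d) / np.sqrt(d)           # orthonormal Walsh-Hadamard matrix
rows = rng.choice(d, size=k, replace=False)  # uniform row sampling (cf. swor)

y = np.sqrt(d / k) * (H @ (D * x))[rows]     # SRHT: y = sqrt(d/k) * R H D x
print(np.sum(x**2), np.sum(y**2))  # squared l2-norm preserved in expectation
```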
The user can further edit the code to specify sampling with replacement (`swr`) or sampling without replacement (`swor`) for faster or more accurate calculations, respectively. NOTE: `swor` is recommended when solving for inverse matrices with iterative solvers (e.g., compressed sensing applications).
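A hypothetical end-to-end example (the call matches the defaults above; treating the input as a samples-by-features `numpy` array, and the output as an array with the same row order, is an assumption):

```python
import numpy as np
from linearMapping import linearMapping

rng = np.random.default_rng(21)
A = rng.standard_normal((500, 4096))   # 500 samples in d=4096 dimensions

# k=None: k is sized automatically from eps via the Dasgupta-Gupta bound
B = linearMapping(A, k=None, eps=0.1, method='FJLT', p=2, random_seed=21)
print(A.shape, '->', B.shape)          # d-to-k mapping, k << d

# pairwise squared distances should be preserved to within roughly eps
orig = np.sum((A[0] - A[1]) ** 2)
proj = np.sum((B[0] - B[1]) ** 2)
print(f"squared-distance ratio: {proj / orig:.3f}")
```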
Produces a radial basis function (RBF) mapping (a.k.a. Random Fourier Features) of the input vector or array from `d` dimensions into `k` dimensions, typically applied where `k` << `d`. `k` is determined automatically (i.e., `k=None`) via the method of Dasgupta and Gupta, using the user-defined `eps` (ε) value.

```python
[in]> rbfMapping(A, k=None, method='SRHT-RFF', gamma=1.0, random_seed=21)
[out]> # d-to-k radial basis function mapping of A
```
`A` is the input vector or array.

`method` accepts two variants of the RBF: `RFF` or `SRHT-RFF`. See the References section below for more details on each method.

`gamma` is the standard deviation of the Gaussian distribution.

`random_seed` is the random seed value for the generator function that randomizes the Gaussian and/or the row-selector function, based on the `method` employed.

Defaults are: `k=None`, `method='SRHT-RFF'`, `gamma=1.0`, and `random_seed=21`. The code is fully commented -- variables and helper functions are further defined within the PY file.
The user can further edit the code to specify sampling with replacement (`swr`) or sampling without replacement (`swor`) for faster or more accurate calculations, respectively. NOTE: `swor` is recommended when solving for inverse matrices with iterative solvers (e.g., compressed sensing applications).
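A hypothetical usage sketch for `rbfMapping`. The check against the exact Gaussian kernel assumes the common RFF convention in which `gamma` is the standard deviation of the sampled frequencies, so that `Z[i] @ Z[j]` approximates `exp(-gamma**2 * ||A[i] - A[j]||**2 / 2)`; verify the exact parameterization against the comments in the PY file:

```python
import numpy as np
from linearMapping import rbfMapping

rng = np.random.default_rng(21)
A = 0.1 * rng.standard_normal((100, 64))  # 100 nearby samples, d=64 dimensions

Z = rbfMapping(A, k=None, method='SRHT-RFF', gamma=1.0, random_seed=21)

# inner products of mapped rows should approximate the RBF kernel value
# (kernel parameterization assumed; see note above)
approx = Z[0] @ Z[1]
exact = np.exp(-(1.0 ** 2) * np.sum((A[0] - A[1]) ** 2) / 2)
print(f"approx={approx:.4f}  exact={exact:.4f}")
```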
`JLT`
W. B. Johnson and J. Lindenstrauss, "Extensions of Lipschitz mappings into a Hilbert space." Contemp. Math. 1984, 26, 189-206. link to paper

`SparseRP`
D. Achlioptas, "Database-friendly random projections: Johnson-Lindenstrauss with binary coins." J. Comput. Syst. Sci. 2003, 66(4), 671-687. link to paper

`VerySparseRP`
P. Li, T. J. Hastie, and K. W. Church, "Very sparse random projections." KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2006, 287–296. link to paper

`FJLT`
N. Ailon and B. Chazelle, "Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform." STOC '06, May 21–23, 2006, Seattle, Washington, USA. link to paper

`SRHT`
F. Krahmer and R. Ward, "New and improved Johnson-Lindenstrauss embeddings via the restricted isometry property." SIAM J. Math. Anal. 2011, 43(3), 1269–1281. link to paper

`SRHT`
N. Ailon and E. Liberty, "An Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform." ACM Trans. Algorithms 2013, 9(3), 1–12. link to paper

`RFF`
A. Rahimi and B. Recht, "Random Features for Large-Scale Kernel Machines." NeurIPS 2007. link to paper

`SRHT-RFF`
Y. Cherapanamjeri and J. Nelson, "Uniform Approximations for Randomized Hadamard Transforms with Applications." STOC 2022: Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, 659–671. link to paper

Automatic determination of `k`
S. Dasgupta and A. Gupta, "An elementary proof of the Johnson-Lindenstrauss Lemma." 1999. link to paper

Tight lower bounds for `k`
K. G. Larsen and J. Nelson, "Optimality of the Johnson-Lindenstrauss Lemma." FOCS 2017: IEEE 58th Annual Symposium on Foundations of Computer Science. link to paper
```bibtex
@misc{FauberJLT2023,
  author = {Fauber, B. P.},
  title = {Johnson-Lindenstrauss Transforms},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/dell/jlt}}
}
```