
Supervised (kernel) PCA #235

Open
n0rdp0l opened this issue Sep 30, 2024 · 4 comments
Labels: enhancement (New feature or enhancement)

Comments


n0rdp0l commented Sep 30, 2024

Is your feature request related to a problem? Please describe.

Scikit-learn does not currently support supervised Principal Component Analysis (PCA), as it is not considered broadly applicable in the general context of machine learning (see this discussion). However, within climate science, supervised dimensionality reduction techniques are more commonly used to capture dependencies between variables, as evidenced by studies like this one and this one.

Describe the solution you'd like

I would like to propose the integration of supervised dimensionality reduction mechanisms into xeofs, or an evaluation of the feasibility of implementing these techniques. This would allow users in specialized fields, like climate science, to leverage the power of supervised PCA for more effective data modeling.

Additional context

If there is interest in integrating supervised PCA or similar methods, I would be happy to contribute to the implementation and assist in adapting the techniques to fit the structure and goals of xeofs.
Here is the original paper referenced in the scikit-learn discussion, and here is a tutorial paper that already includes Python code implementing the techniques.


nicrie commented Oct 1, 2024

Hi @n0rdp0l, thanks for the suggestion and the provided details! I wasn't familiar with supervised PCA, but I’ve briefly skimmed the papers and it does seem quite interesting. I agree that this could be valuable for the climate/forecasting community. From what I’ve seen so far, it looks feasible to implement SPCA within the existing xeofs framework.

Unfortunately, I’m a bit tied up over the next few months, so I’m unlikely to be able to lead this effort. However, I’d be happy to offer guidance and support to ensure a smooth integration into xeofs.

Here are a few starting points (though please note, I’m not yet fully familiar with the internals of supervised PCA and I’m unsure about the exact preprocessing steps for target data Y):

  • You’ll probably want to inherit from the BaseModelSingleSet class. We would need to extend the fit and fit_transform methods to include an optional Y argument.
  • According to the paper, an eigendecomposition of Q is needed, and the provided implementation uses np.linalg.eigh. xeofs has a general SVD class, which operates similarly to scikit-learn’s PCA, with the added advantage of supporting Dask and complex-valued data. If you prefer simplicity over potential performance gains by using eigh, you can use the existing SVD class -- just compute matrix Q and pass it in. The U_ and s_ attributes will give you your eigenvectors and eigenvalues, respectively.
  • In the paper/code, it is assumed that rows in X are features and columns are samples. Note that in BaseModelSingleSet, the preprocessed data (self.data["input_data"]) is provided in n_samples x n_features format.
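
To make these points a bit more concrete, here is a rough NumPy sketch of the decomposition step, assuming the HSIC-based formulation $Q = X H L H X^T$ from the linked papers (with $H$ the centering matrix and $L$ a kernel matrix on the targets $Y$). The function name and the linear target kernel are illustrative choices only; in xeofs one could equally compute $Q$ and hand it to the existing SVD class to get U_ and s_.

```python
import numpy as np

def supervised_pca_components(X, Y, n_modes):
    """Sketch of the decomposition step of (linear) supervised PCA.

    X : (n_samples, n_features) array, as provided by the preprocessor
        in BaseModelSingleSet (self.data["input_data"]).
    Y : (n_samples, n_targets) target array.

    Note: the paper assumes X is (n_features, n_samples), hence the
    transpose below.
    """
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    L = Y @ Y.T                           # linear kernel on the targets (illustrative choice)
    Xt = X.T                              # paper orientation: features x samples
    Q = Xt @ H @ L @ H @ Xt.T             # (n_features, n_features), symmetric

    # Q is symmetric, so eigh applies; it returns eigenvalues in ascending order
    evals, evecs = np.linalg.eigh(Q)
    order = np.argsort(evals)[::-1][:n_modes]
    return evecs[:, order], evals[order]  # analogous to U_ and s_ of the SVD class

# illustrative usage with random data
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))        # n_samples x n_features
Y = rng.standard_normal((100, 1))
components, eigenvalues = supervised_pca_components(X, Y, n_modes=3)
scores = X @ components                   # component scores, n_samples x n_modes
```

Dask support, complex-valued data, and whatever preprocessing Y needs are left out here; routing Q through the SVD class instead of np.linalg.eigh would be the place to handle those.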

That’s what I have in mind for now. If you decide to move forward with this, I’d be more than happy to assist further.

nicrie added the enhancement label on Oct 1, 2024
nicrie changed the title from "suoervised (kernel) PCA" to "Supervised (kernel) PCA" on Oct 1, 2024

n0rdp0l commented Oct 14, 2024

Hey @nicrie,

Thanks for the quick response! I'll be applying this technique in a research project focused on a climate ML pipeline that also uses xarray. That will take precedence over the coming weeks, but afterward I should be in a good position, both in terms of time and understanding, to implement this in xeofs. I'll keep you posted.

I was particularly interested in the kernelized version of this approach. However, one challenge I'm encountering is that in kernel (supervised) PCA, the projection directions U aren't directly available (page 10). From what I understand, this happens because the SVD is performed on the kernel matrix rather than on the mapped data in the higher-dimensional space. This seems to complicate the kernelized implementation, especially since xeofs' component scores and resulting plots depend on those projection directions, right?

That said, I noticed you previously worked on ROCK PCA, where you were able to extract U using SVD, so I’m wondering if perhaps I’ve misunderstood something here.

Edit: After digging into this a bit more, I saw that you mentioned having had issues with the ROCK implementation. I might be wrong about this, but the MATLAB code of ROCK uses eig(), which extracts the right eigenvectors, i.e. V, while the Python adaptation you used extracts the left eigenvectors U via numpy.linalg.svd().


nicrie commented Oct 16, 2024

That's a pretty good point you raise there. I hadn't thought too much about the kernelized version, but now that you mention it, I do remember that this was a tricky part...

Disclaimer: I'm not an expert in kernel PCA.

I will start answering your message from the end:

> the MATLAB code of ROCK uses eig(), which extracts the right eigenvectors, i.e. V, while the Python adaptation you used extracts the left eigenvectors U via numpy.linalg.svd().

I'm not sure what you mean by right eigenvectors. Perhaps I'm wrong here, but let's consider a data covariance (or kernel) matrix $X$ with eigenvectors $V$ and the eigenvalues collected in a diagonal matrix $\Lambda$; then

$XV = V \Lambda $

right? Then, right-multiplying the equation by $V^T$ (and using $V V^T = I$ for orthonormal eigenvectors) gives you

$X = V \Lambda V^T$

so the left singular vectors of $X$ equal the eigenvectors of $X$. For complex data (as in the general case of ROCK-PCA) the transpose should be replaced by the conjugate transpose. In any case, the difference between the left and right singular vectors is just a (conjugate) transpose.
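
A quick numerical check of this identity, with a random symmetric positive semi-definite matrix purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
X = A @ A.T                          # symmetric (PSD) covariance/kernel-type matrix

U, s, Vh = np.linalg.svd(X)          # X = U @ diag(s) @ Vh
evals, V = np.linalg.eigh(X)         # X V = V diag(evals), ascending order
evals, V = evals[::-1], V[:, ::-1]   # reorder to match the SVD (descending)

print(np.allclose(s, evals))                     # singular values == eigenvalues
print(np.allclose(np.abs(U.T @ V), np.eye(6)))   # eigenvectors == left singular vectors (up to sign)
print(np.allclose(U, Vh.T))                      # left == right singular vectors for symmetric PSD X
```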

> the projection directions U aren't directly available (page 10). From what I understand, this happens because the SVD is performed on the kernel matrix rather than the mapped data in the higher-dimensional space.

Yeah, that's also my understanding. I cannot currently open the tutorial paper you linked to, but from my understanding of kernel PCA, the eigenvectors of the kernel matrix are the non-linear kernel PCs, i.e. the time series. This is because kernel PCA works with the dual representation of the problem. In general, with kernel PCA, you don't have direct access to the spatial patterns. With standard (linear) PCA, you can just project your data onto the eigenvectors (no matter whether they represent the spatial or the temporal patterns, similar to PCA S-mode vs. T-mode) to obtain the corresponding missing part. In kernel PCA, as far as I know, you cannot, which leads to what is often called the pre-image problem.

And it's here that my superficial understanding quickly gets muddy. If I remember correctly, the ROCK-PCA paper simply used the linear projection, as in standard PCA, as an approximation of the spatial patterns. Another approach, by A. Hannachi et al., is to build composite maps based on the kernel PCs (e.g. Fig. 4c-f).
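
For illustration only, here is a rough NumPy sketch of that dual-representation view and of the linear-projection workaround for the patterns; the RBF kernel and the median-heuristic bandwidth are placeholder choices, not what ROCK-PCA or the tutorial paper prescribe:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 500))      # n_samples (time) x n_features (space)

# Kernel matrix between samples (illustrative RBF kernel with a median-heuristic bandwidth)
sq_dists = (
    np.sum(X**2, axis=1)[:, None]
    + np.sum(X**2, axis=1)[None, :]
    - 2 * X @ X.T
)
K = np.exp(-sq_dists / (2 * np.median(sq_dists)))

# Double-center the kernel matrix
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H

# Eigenvectors of the centered kernel matrix give the (nonlinear) kernel PCs,
# i.e. the time series (dual representation)
evals, alphas = np.linalg.eigh(Kc)
evals, alphas = evals[::-1], alphas[:, ::-1]                 # descending order
pcs = alphas[:, :3] * np.sqrt(np.clip(evals[:3], 0, None))   # scaled kernel PCs

# Exact spatial patterns are not recoverable (pre-image problem);
# a simple workaround is a linear projection of the data onto the PCs
patterns_approx = X.T @ pcs                                  # n_features x n_modes
```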


n0rdp0l commented Oct 30, 2024

Hi @nicrie,

Thanks for the explanation!

While trying to wrap my head around the kernelized approach, I might have confused myself a bit by taking "the projection directions U aren't directly available" too literally. I realized that the roles of U and V in the SVD can, of course, swap depending on whether the input matrix is transposed, which is exactly the case in xeofs, as you mentioned.

Additionally, I overlooked that in the ROCK code the SVD is applied to the kernel matrix rather than to the (mapped) $\mathbf{\Psi}$ matrix in kernel PCA that I was thinking of. Since the kernel matrix is symmetric (Hermitian in the complex case), its SVD simplifies to:

$$ \mathbf{V} \cdot \mathbf{\Sigma} \cdot \mathbf{V}^\top $$

instead of the more general:

$$ \mathbf{U} \cdot \mathbf{\Sigma} \cdot \mathbf{V}^\top $$

i.e., the left and right singular vectors are essentially the same (up to a transpose), as you mentioned above.
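
A tiny numerical check of this, using a random Hermitian positive semi-definite matrix as a stand-in for the kernel matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
K = A @ A.conj().T                                   # Hermitian PSD "kernel" matrix

U, s, Vh = np.linalg.svd(K)                          # K = U @ diag(s) @ Vh
print(np.allclose(U, Vh.conj().T))                   # True: left == right singular vectors
print(np.allclose(K, U @ np.diag(s) @ U.conj().T))   # i.e. K = V Sigma V^H
```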

Apologies for the confusion!
