
Supervised (kernel) PCA #235

Open
n0rdp0l opened this issue Sep 30, 2024 · 4 comments
Labels: enhancement (New feature or enhancement)

Comments


n0rdp0l commented Sep 30, 2024

Is your feature request related to a problem? Please describe.

Scikit-learn does not currently support supervised Principal Component Analysis (PCA), as it is not considered broadly applicable in the general context of machine learning (see this discussion). However, within climate science, supervised dimensionality reduction techniques are more commonly used to capture dependencies between variables, as evidenced by studies like this one and this one.

Describe the solution you'd like

I would like to propose the integration of supervised dimensionality reduction mechanisms into xeofs, or an evaluation of the feasibility of implementing these techniques. This would allow users in specialized fields, like climate science, to leverage the power of supervised PCA for more effective data modeling.

Additional context

If there is interest in integrating supervised PCA or similar methods, I would be happy to contribute to the implementation and assist in adapting the techniques to fit the structure and goals of xeofs.
Here is the original paper referenced in the scikit-learn discussion, and here is a tutorial paper that already includes Python code implementing the techniques.


nicrie commented Oct 1, 2024

Hi @n0rdp0l, thanks for the suggestion and the provided details! I wasn't familiar with supervised PCA, but I’ve briefly skimmed the papers and it does seem quite interesting. I agree that this could be valuable for the climate/forecasting community. From what I’ve seen so far, it looks feasible to implement SPCA within the existing xeofs framework.

Unfortunately, I’m a bit tied up over the next few months, so I’m unlikely to be able to lead this effort. However, I’d be happy to offer guidance and support to ensure a smooth integration into xeofs.

Here are a few starting points (though please note, I’m not yet fully familiar with the internals of supervised PCA and I’m unsure about the exact preprocessing steps for target data Y):

  • You’ll probably want to inherit from the BaseModelSingleSet class. We would need to extend the fit and fit_transform methods to include an optional Y argument.
  • According to the paper, an eigendecomposition of Q is needed, and the provided implementation uses np.linalg.eigh. xeofs has a general SVD class, which operates similarly to scikit-learn’s PCA, with the added advantage of supporting Dask and complex-valued data. If you prefer simplicity over potential performance gains by using eigh, you can use the existing SVD class -- just compute matrix Q and pass it in. The U_ and s_ attributes will give you your eigenvectors and eigenvalues, respectively.
  • In the paper/code, it is assumed that rows in X are features and columns are samples. Note that in BaseModelSingleSet, the preprocessed data (self.data["input_data"]) is provided in n_samples x n_features format.
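
To make these points a bit more concrete, here is a rough NumPy sketch of the decomposition step, assuming the HSIC-based formulation $Q = X H L H X^T$ from the linked papers (with $H$ the centering matrix and $L$ a kernel matrix on the targets $Y$). The function name and the linear target kernel are illustrative choices only; in xeofs one could equally compute $Q$ and hand it to the existing SVD class to get U_ and s_.

```python
import numpy as np

def supervised_pca_components(X, Y, n_modes):
    """Sketch of the decomposition step of (linear) supervised PCA.

    X : (n_samples, n_features) array, as provided by the preprocessor
        in BaseModelSingleSet (self.data["input_data"]).
    Y : (n_samples, n_targets) target array.

    Note: the paper assumes X is (n_features, n_samples), hence the
    transpose below.
    """
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    L = Y @ Y.T                           # linear kernel on the targets (illustrative choice)
    Xt = X.T                              # paper orientation: features x samples
    Q = Xt @ H @ L @ H @ Xt.T             # (n_features, n_features), symmetric

    # Q is symmetric, so eigh applies; it returns eigenvalues in ascending order
    evals, evecs = np.linalg.eigh(Q)
    order = np.argsort(evals)[::-1][:n_modes]
    return evecs[:, order], evals[order]  # analogous to U_ and s_ of the SVD class

# illustrative usage with random data
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))        # n_samples x n_features
Y = rng.standard_normal((100, 1))
components, eigenvalues = supervised_pca_components(X, Y, n_modes=3)
scores = X @ components                   # component scores, n_samples x n_modes
```

Dask support, complex-valued data, and whatever preprocessing Y needs are left out here; routing Q through the SVD class instead of np.linalg.eigh would be the place to handle those.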

That’s what I have in mind for now. If you decide to move forward with this, I’d be more than happy to assist further.

nicrie added the enhancement label on Oct 1, 2024
nicrie changed the title from "suoervised (kernel) PCA" to "Supervised (kernel) PCA" on Oct 1, 2024

n0rdp0l commented Oct 14, 2024

Hey @nicrie,

Thanks for the quick response! I'll be applying this technique in a research project focused on a climate ML pipeline that also uses xarray. That will take precedence over the coming weeks, but afterward I should be in a good position, both in terms of time and understanding, to implement this in xeofs. I'll keep you posted.

I was particularly interested in the kernelized version of this approach. However, one challenge I'm encountering is that in kernel (supervised) PCA, the projection directions U aren't directly available (page 10). From what I understand, this happens because the SVD is performed on the kernel matrix rather than on the mapped data in the higher-dimensional space. This seems to complicate the kernelized implementation, especially since xeofs' component scores and resulting plots depend on those projection directions, right?

That said, I noticed you previously worked on ROCK PCA, where you were able to extract U using SVD, so I’m wondering if perhaps I’ve misunderstood something here.

Edit: After digging into this a bit more, I saw that you mentioned having had issues with the ROCK implementation. I might be wrong about this, but the MATLAB code of ROCK uses eig(), which extracts the right eigenvectors, i.e. V, while the Python adaptation you used extracts the left eigenvectors U via numpy.linalg.svd().


nicrie commented Oct 16, 2024

That's a pretty good point you raise there. I hadn't thought too much about the kernelized version, but now that you mention it, I do remember that this was a tricky part...

Disclaimer: I'm not an expert in kernel PCA.

I will start answering your message from the end:

> the MATLAB code of ROCK uses eig(), which extracts the right eigenvectors, i.e. V, while the Python adaptation you used extracts the left eigenvectors U via numpy.linalg.svd().

I'm not sure what you mean by right eigenvectors. Perhaps I'm wrong here, but let's consider a data covariance (or kernel) matrix $X$ with eigenvectors $V$ and the eigenvalues collected in a diagonal matrix $\Lambda$; then

$XV = V \Lambda $

right? Then, right-multiplying the equation by $V^T$ (and using $V V^T = I$ for orthonormal eigenvectors) gives you

$X = V \Lambda V^T$

so the left singular vectors of $X$ equal the eigenvectors of $X$. For complex data (as in the general case of ROCK-PCA) the transpose should be replaced by the conjugate transpose. In any case, the difference between the left and right singular vectors is just a (conjugate) transpose.
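
A quick numerical check of this identity, with a random symmetric positive semi-definite matrix purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
X = A @ A.T                          # symmetric (PSD) covariance/kernel-type matrix

U, s, Vh = np.linalg.svd(X)          # X = U @ diag(s) @ Vh
evals, V = np.linalg.eigh(X)         # X V = V diag(evals), ascending order
evals, V = evals[::-1], V[:, ::-1]   # reorder to match the SVD (descending)

print(np.allclose(s, evals))                     # singular values == eigenvalues
print(np.allclose(np.abs(U.T @ V), np.eye(6)))   # eigenvectors == left singular vectors (up to sign)
print(np.allclose(U, Vh.T))                      # left == right singular vectors for symmetric PSD X
```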

> the projection directions U aren't directly available (page 10). From what I understand, this happens because the SVD is performed on the kernel matrix rather than the mapped data in the higher-dimensional space.

Yeah, that's also my understanding. I cannot currently open the tutorial paper you linked to, but from my understanding of kernel PCA, the eigenvectors of the kernel matrix are the non-linear kernel PCs, i.e. the time series. This is because kernel PCA works with the dual representation of the problem. In general, with kernel PCA, you don't have direct access to the spatial patterns. With standard (linear) PCA, you can just project your data onto the eigenvectors (no matter whether they represent the spatial or the temporal patterns, similar to PCA S-mode vs. T-mode) to obtain the corresponding missing part. In kernel PCA, as far as I know, you cannot, which leads to what is often called the pre-image problem.

And it's here that my superficial understanding quickly gets muddy. If I remember correctly, the ROCK-PCA paper simply used the linear projection, as in standard PCA, as an approximation of the spatial patterns. Another approach, by A. Hannachi et al., is to build composite maps based on the kernel PCs (e.g. Fig. 4c-f).
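
For illustration only, here is a rough NumPy sketch of that dual-representation view and of the linear-projection workaround for the patterns; the RBF kernel and the median-heuristic bandwidth are placeholder choices, not what ROCK-PCA or the tutorial paper prescribe:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 500))      # n_samples (time) x n_features (space)

# Kernel matrix between samples (illustrative RBF kernel with a median-heuristic bandwidth)
sq_dists = (
    np.sum(X**2, axis=1)[:, None]
    + np.sum(X**2, axis=1)[None, :]
    - 2 * X @ X.T
)
K = np.exp(-sq_dists / (2 * np.median(sq_dists)))

# Double-center the kernel matrix
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H

# Eigenvectors of the centered kernel matrix give the (nonlinear) kernel PCs,
# i.e. the time series (dual representation)
evals, alphas = np.linalg.eigh(Kc)
evals, alphas = evals[::-1], alphas[:, ::-1]                 # descending order
pcs = alphas[:, :3] * np.sqrt(np.clip(evals[:3], 0, None))   # scaled kernel PCs

# Exact spatial patterns are not recoverable (pre-image problem);
# a simple workaround is a linear projection of the data onto the PCs
patterns_approx = X.T @ pcs                                  # n_features x n_modes
```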


n0rdp0l commented Oct 30, 2024

Hi @nicrie,

Thanks for the explanation!

While trying to wrap my head around the kernelized approach, I might have confused myself a bit by taking "the projection directions U aren't directly available" too literally. I realized that the roles of U and V in the SVD can, of course, swap depending on whether the input matrix is transposed, which is exactly the case in xeofs, as you mentioned.

Additionally, I overlooked that in the ROCK code the SVD is applied to the kernel matrix rather than to the (mapped) $\mathbf{\Psi}$ matrix in kernel PCA that I was thinking of. Since the kernel matrix is symmetric (Hermitian in the complex case), its SVD simplifies to:

$$ \mathbf{V} \cdot \mathbf{\Sigma} \cdot \mathbf{V}^\top $$

instead of the more general:

$$ \mathbf{U} \cdot \mathbf{\Sigma} \cdot \mathbf{V}^\top $$

i.e., the left and right singular vectors are essentially the same (up to a transpose), as you mentioned above.
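
A tiny numerical check of this, using a random Hermitian positive semi-definite matrix as a stand-in for the kernel matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
K = A @ A.conj().T                                   # Hermitian PSD "kernel" matrix

U, s, Vh = np.linalg.svd(K)                          # K = U @ diag(s) @ Vh
print(np.allclose(U, Vh.conj().T))                   # True: left == right singular vectors
print(np.allclose(K, U @ np.diag(s) @ U.conj().T))   # i.e. K = V Sigma V^H
```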

Apologies for the confusion!
