I blink once, and one month has passed. I blink twice, and years have gone by.
Imagine being tasked with analyzing vast datasets of celestial objects. Traditional methods such as linear regression or classification might struggle to identify subtle patterns or underlying structures within this high-dimensional data. To overcome this, the team needs a technique that can uncover hidden relationships between spectral signatures, potentially revealing new classes of celestial objects or clarifying how existing ones evolve.
Manifold learning is a dimensionality reduction technique that seeks to uncover the underlying low-dimensional structure of high-dimensional data. Unlike linear methods like PCA, manifold learning assumes that the data lies on a nonlinear manifold embedded in a higher-dimensional space.
The goal of manifold learning is to unroll or flatten this manifold into a lower-dimensional space while preserving the intrinsic geometric relationships between data points. This is analogous to flattening a rolled-up Swiss roll (see below), where nearby points on the roll remain close in the flattened version.
There are numerous manifold learning techniques, and for this example we will apply several of them, one at a time, to an S-curve dataset. Here is a list of their goals and visualizations:
Multidimensional Scaling (MDS)
Goal: preserves pairwise distances between data points.
Visualization: shows a reasonable representation of the S-curve, but with some distortion, especially in the denser regions.
Isomap
Goal: preserves geodesic distances (shortest path distances) between data points.
Visualization: provides a better representation of the S-curve than MDS, capturing the underlying manifold structure more accurately.
Locally Linear Embedding (LLE)
Goal: preserves local linear relationships between neighboring data points.
Visualization: effectively captures the S-curve structure, with a clear representation of the underlying manifold.
Hessian LLE
Goal: improves upon LLE by incorporating second-order information.
Visualization: similar to LLE, Hessian LLE provides a good representation of the S-curve, potentially with slightly better local structure preservation.
Modified LLE
Goal: addresses the regularization problem of standard LLE by using multiple weight vectors in each neighborhood.
Visualization: shows a reasonable representation of the S-curve, but without significant visual differences compared to LLE.
LTSA (Local Tangent Space Alignment)
Goal: constructs a local tangent-space coordinate system around each data point and aligns these local systems into a global embedding, preserving local linear relationships while also respecting the global structure of the data.
Visualization: captures the S-curve structure, comparable to LLE and its variants.
Laplacian Eigenmaps
Goal: preserves local neighborhood structure by using the eigenvectors of the graph Laplacian.
Visualization: effectively captures the S-curve, with a clear representation of the underlying manifold.
t-SNE
Goal: preserves local structure by matching pairwise-similarity distributions between the high- and low-dimensional spaces, allowing for non-linear embeddings.
Visualization: often provides visually appealing and informative low-dimensional representations.
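As a minimal sketch of how the techniques above are applied in practice, the snippet below runs several of them on an S-curve dataset with scikit-learn. The neighbor count and output dimensionality are illustrative choices, not tuned values.

```python
from sklearn import datasets, manifold

# 300 three-dimensional points lying on a two-dimensional S-shaped manifold
X, color = datasets.make_s_curve(n_samples=300, random_state=0)

# Illustrative parameter choices, not tuned values
n_neighbors, n_components = 10, 2

models = {
    "MDS": manifold.MDS(n_components=n_components, random_state=0),
    "Isomap": manifold.Isomap(n_neighbors=n_neighbors, n_components=n_components),
    "LLE": manifold.LocallyLinearEmbedding(
        n_neighbors=n_neighbors, n_components=n_components, method="standard"),
    "Hessian LLE": manifold.LocallyLinearEmbedding(
        n_neighbors=n_neighbors, n_components=n_components, method="hessian"),
    "Modified LLE": manifold.LocallyLinearEmbedding(
        n_neighbors=n_neighbors, n_components=n_components, method="modified"),
    "LTSA": manifold.LocallyLinearEmbedding(
        n_neighbors=n_neighbors, n_components=n_components, method="ltsa"),
    "Laplacian Eigenmaps": manifold.SpectralEmbedding(
        n_neighbors=n_neighbors, n_components=n_components),
}

results = {}
for name, model in models.items():
    # Every technique maps the 3D points down to a 2D embedding
    results[name] = model.fit_transform(X)
    print(name, results[name].shape)
```

Each `fit_transform` call returns a 2D embedding of the same 300 points, so the techniques can be compared side by side on identical input.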
Here is an array of techniques applied to a Swiss Roll dataset:
And here is another array of techniques applied to a severed sphere dataset:
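For the Swiss Roll, scikit-learn provides a ready-made generator, so one of the techniques can be sketched on it directly; Isomap is used here purely as an example of "unrolling" the sheet.

```python
from sklearn import datasets, manifold

# Swiss roll: 3D points lying on a rolled-up 2D sheet
X, color = datasets.make_swiss_roll(n_samples=300, random_state=0)

# Isomap "unrolls" the sheet by preserving geodesic distances
Y = manifold.Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(Y.shape)
```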
The new modules used for this instance of manifold learning are matplotlib's NullFormatter and sklearn's manifold module. The former removes the tick labels (the values along each axis); the latter provides ready-made implementations of these algorithms for a rapid setup.
As in the 24th July entry, the 3D cluster plot requires x-, y-, and z-axis values…
For the flat S-curve, you only need two axis values instead.
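A hedged sketch of this plotting setup: the 3D scatter of the S-curve takes all three axis values, while the flattened embedding takes only two, with NullFormatter stripping the tick labels. Isomap stands in here for whichever embedding is being plotted, and the output filename is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter
from sklearn import datasets, manifold

X, color = datasets.make_s_curve(n_samples=300, random_state=0)

fig = plt.figure(figsize=(10, 4))

# 3D cluster plot: requires x-, y-, and z-axis values
ax3d = fig.add_subplot(1, 2, 1, projection="3d")
ax3d.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap=plt.cm.Spectral)

# Flat S-curve: only two axis values after embedding
Y = manifold.Isomap(n_neighbors=10, n_components=2).fit_transform(X)
ax2d = fig.add_subplot(1, 2, 2)
ax2d.scatter(Y[:, 0], Y[:, 1], c=color, cmap=plt.cm.Spectral)
ax2d.xaxis.set_major_formatter(NullFormatter())  # hide tick labels
ax2d.yaxis.set_major_formatter(NullFormatter())

fig.savefig("s_curve_embedding.png")
```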
The same applies to the other nine perplexity plots below. On the topic of perplexity: as perplexity increases from one plot to the next, the representation of the S-curve becomes more spread out and less defined. Lower perplexity values tend to preserve local structure better, while higher values focus on capturing global patterns and can increasingly distort the original S-curve shape.
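The perplexity sweep described above can be sketched as a simple loop; the specific perplexity values are illustrative, not the ones used for the ten plots.

```python
from sklearn import datasets, manifold

X, color = datasets.make_s_curve(n_samples=300, random_state=0)

# Embed the same data at increasing perplexity values and compare
for perplexity in (5, 30, 50, 100):
    tsne = manifold.TSNE(n_components=2, perplexity=perplexity, random_state=0)
    Y = tsne.fit_transform(X)
    print(f"perplexity={perplexity}: embedding shape {Y.shape}")
```

Plotting each `Y` side by side reproduces the progression from tightly preserved local structure at low perplexity to a more diffuse, globally oriented layout at high perplexity.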
To recap for human interpretation: the perplexity here is a t-SNE hyperparameter, not the language-modelling metric of the same name. It roughly sets the effective number of nearest neighbours each point considers when the embedding is built, balancing attention between local and global aspects of the data; small values emphasise tight local neighbourhoods, while large values pull in more distant points.