Classical Multidimensional Scaling (MDS), also known as Principal Coordinates Analysis (PCoA), is a dimensionality reduction technique that visualizes the similarity or dissimilarity between objects in a lower-dimensional space. Given a matrix of pairwise distances or dissimilarities between items, MDS finds a configuration of points in a low-dimensional target space (typically 2D or 3D) such that the distances between points approximate the original dissimilarities as closely as possible.
The Technical Problem: MDS places points in space so that the distances between them correspond as closely as possible to the input distances. This is essentially a "map-making" problem: given only the distances between locations, can we reconstruct their relative positions?
Input: An n × n distance or dissimilarity matrix D, where D[i,j] represents the distance between objects i and j.
Output: A configuration of n points in k-dimensional space (where k << n) that preserves the distance relationships as faithfully as possible.
Goal: Find coordinates x₁, x₂, ..., xₙ in ℝᵏ, collected as the rows of X = [x₁, x₂, ..., xₙ]ᵀ, such that the Euclidean distances ||xᵢ - xⱼ|| approximate the original distances dᵢⱼ.
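For reference, the closed-form construction usually associated with classical MDS can be written as below. This is the standard textbook formulation rather than something stated in this handout, and the symbols $D^{(2)}$, $J$, $B$, $V_k$, and $\Lambda_k$ are introduced here only for illustration.

$$D^{(2)} = \left[d_{ij}^2\right], \qquad J = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top, \qquad B = -\tfrac{1}{2}\, J D^{(2)} J$$

$$B = V \Lambda V^\top, \qquad X = V_k \Lambda_k^{1/2}$$

Here $\Lambda_k$ holds the k largest eigenvalues of B (assumed positive) and $V_k$ the corresponding eigenvectors, so that row i of X is the coordinate vector xᵢ. When D contains exact Euclidean distances, B is the Gram matrix of the centered points and the construction recovers them up to rotation and reflection.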
Let's apply Classical MDS to a simple example with four cities, using their great-circle distances:
Input Distance Matrix (in km):

| | London | Paris | Berlin | Rome |
|---|---|---|---|---|
| London | 0 | 344 | 933 | 1434 |
| Paris | 344 | 0 | 878 | 1106 |
| Berlin | 933 | 878 | 0 | 1184 |
| Rome | 1434 | 1106 | 1184 | 0 |

Step-by-step computation:
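The computation can be sketched in code using the usual classical MDS recipe: square the distances, double-center them, take the leading eigenpairs, and scale each eigenvector by the square root of its eigenvalue. This is a minimal standard-library sketch, not the handout's own procedure. The helper names (`double_center`, `top_eigenpairs`, `classical_mds`) are hypothetical, and power iteration with deflation stands in for a library eigensolver on the assumption that the leading eigenvalues are positive and distinct.

```python
import math

# Great-circle distances in km (order: London, Paris, Berlin, Rome).
D = [
    [0, 344, 933, 1434],
    [344, 0, 878, 1106],
    [933, 878, 0, 1184],
    [1434, 1106, 1184, 0],
]

def double_center(D):
    """Square the distances and double-center them: B = -1/2 * J * D^2 * J."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # D is symmetric, so column means equal row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def top_eigenpairs(B, k, iters=500):
    """Leading k eigenpairs of symmetric B by power iteration with deflation.

    Assumption: the k largest-magnitude eigenvalues are positive and distinct,
    which holds for distance matrices that are close to Euclidean.
    """
    n = len(B)
    A = [row[:] for row in B]
    pairs = []
    for _component in range(k):
        v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start
        for _step in range(iters):
            w = matvec(A, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(A, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next pass.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def classical_mds(D, k=2):
    """Coordinates are eigenvectors scaled by the square root of the eigenvalue."""
    pairs = top_eigenpairs(double_center(D), k)
    return [[math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
            for i in range(len(D))]

coords = classical_mds(D, k=2)
for name, (x, y) in zip(["London", "Paris", "Berlin", "Rome"], coords):
    print(f"{name:7s} {x:9.1f} {y:9.1f}")
```

The signs and orientation of the resulting axes are arbitrary, so two solutions are best compared through their pairwise distances rather than through raw coordinates. For these four cities the two-dimensional fit should reproduce the input distances to within a few kilometres.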
Final 2D coordinates:

| City | X (dim 1) | Y (dim 2) |
|---|---|---|
| London | -456.2 | 243.8 |
| Paris | -312.5 | -89.3 |
| Berlin | 287.4 | 195.6 |
| Rome | 481.3 | -350.1 |

Resulting MDS Map:
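[Plot: the four cities on the MDS plane, with dimension 1 on the horizontal axis (-500 to 500) and dimension 2 on the vertical axis (-400 to 300). London and Berlin sit in the upper left and upper right, Paris is left of center just below zero, and Rome is in the lower right.]

Validation: Computing distances between points in the MDS solution: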
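The validation step can be sketched as a direct comparison between the input distances and the distances implied by the listed 2D coordinates. This is a minimal standard-library sketch; the names `input_km`, `coords`, and `reproduced_distances` are hypothetical, and the numbers are copied from the tables in this example.

```python
import math
from itertools import combinations

# Input great-circle distances (km), copied from the distance matrix.
input_km = {
    ("London", "Paris"): 344, ("London", "Berlin"): 933, ("London", "Rome"): 1434,
    ("Paris", "Berlin"): 878, ("Paris", "Rome"): 1106, ("Berlin", "Rome"): 1184,
}

# 2D coordinates, copied from the coordinate table.
coords = {
    "London": (-456.2, 243.8), "Paris": (-312.5, -89.3),
    "Berlin": (287.4, 195.6), "Rome": (481.3, -350.1),
}

def reproduced_distances(coords):
    """Euclidean distance between every pair of points in the configuration."""
    return {(a, b): math.dist(coords[a], coords[b])
            for a, b in combinations(coords, 2)}

fitted_km = reproduced_distances(coords)
for pair, target in input_km.items():
    print(f"{pair[0]}-{pair[1]}: input {target} km, map {fitted_km[pair]:.1f} km")
```

With the coordinates as listed, the map distances come out at roughly 363 km for London-Paris and 579 km for Berlin-Rome, against inputs of 344 km and 1184 km. The listed coordinates are therefore best read as illustrative of the layout rather than as an exact classical MDS solution for this matrix.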
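Note: A 2D configuration cannot reproduce great-circle distances exactly, because the cities lie on a curved surface rather than a flat plane. The MDS solution is the best 2D approximation, with some inevitable distortion, although for cities at most about 1,400 km apart the effect of curvature is small relative to the distances themselves. MDS coordinates are also determined only up to rotation, reflection, and translation, so a map may need to be rotated or flipped to match compass directions. Once oriented, the relative positions (London northwest, Berlin northeast, Paris west-central, Rome southeast) reflect the geographic relationships.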
This example illustrates both the power and limitations of MDS: it recovers the general geographic structure using only distances, but perfect reconstruction in 2D is impossible for points on a sphere.
The quality of an MDS solution is often evaluated using Kruskal's stress formula:
Stress-1: $$S = \sqrt{\frac{\sum_{i<j}(d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2}}$$
Where:
Interpretation:
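The Stress-1 formula above can be evaluated with a few lines of code. This is a minimal sketch that assumes d_ij are the input dissimilarities (as in the Goal statement) and d̂_ij the corresponding distances in the fitted configuration, each listed once per pair i < j. The function name `stress1` is hypothetical.

```python
import math

def stress1(d, d_hat):
    """Stress-1 as written above.

    d     : input dissimilarities, one value per pair i < j
    d_hat : fitted distances for the same pairs, in the same order
    """
    if len(d) != len(d_hat):
        raise ValueError("d and d_hat must list the same pairs")
    numerator = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    denominator = sum(a * a for a in d)
    return math.sqrt(numerator / denominator)

# A perfect fit gives zero stress; larger values mean a worse fit.
print(stress1([3.0, 4.0, 5.0], [3.0, 4.0, 5.0]))
```

Because the value is normalized by the sum of squared dissimilarities, it does not depend on the unit in which the distances are measured.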
Problem: Visualizing relationships between cities based on travel times or distances. Application: Given a matrix of driving times between major cities, MDS can create a 2D map that approximates their relative positions, revealing geographic patterns and transportation accessibility.
Problem: Understanding how people perceive similarity between stimuli. Application: Participants rate the similarity between pairs of faces, colors, or sounds. MDS reveals the underlying perceptual dimensions (e.g., discovering that people organize colors by hue and brightness).
Problem: Visualizing evolutionary relationships between species or genetic samples. Application: Using genetic distance measures (e.g., the number of differing nucleotides), MDS produces ordination plots that show relatedness among samples, population structure, or disease subtypes.
Problem: Understanding brand positioning and consumer preferences. Application: Consumers rate similarity between products or brands. MDS reveals the competitive landscape and identifies market gaps or positioning opportunities.
Problem: Visualizing semantic relationships between words or documents. Application: Using cosine distance between word embeddings or document vectors, MDS creates semantic maps showing related concepts, useful for exploratory text analysis.
Problem: Visualizing social distance and community structure. Application: Using shortest path distances in a social network, MDS reveals community clusters, central actors, and structural patterns in social relationships.
Problem: Understanding brain connectivity patterns. Application: Using correlation-based distances between brain regions' activity patterns, MDS visualizes functional brain networks and identifies regions with similar activation patterns.
Problem: Visualizing molecular similarity for drug design. Application: Using molecular fingerprint distances, MDS maps chemical compounds to identify clusters of similar molecules, aiding in lead optimization and scaffold hopping.
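Several of the applications above start from feature vectors rather than from ready-made distances, so the first step is to build the dissimilarity matrix. The sketch below does this for cosine distance, as in the word-embedding application. It is a minimal standard-library sketch: the function names are hypothetical and the three vectors are made up purely for illustration.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def distance_matrix(vectors):
    """Symmetric n x n matrix of pairwise cosine distances with a zero diagonal."""
    n = len(vectors)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = cosine_distance(vectors[i], vectors[j])
    return D

# Hypothetical 3-dimensional "embeddings", invented for this example.
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
D = distance_matrix(vectors)
for row in D:
    print([round(x, 3) for x in row])
```

Cosine distance is not a Euclidean distance, so the double-centered matrix can have negative eigenvalues. Classical MDS then keeps only the positive ones, and the fit should be checked rather than assumed.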
Most statistical software packages provide MDS implementations:
R: cmdscale() function or smacof package
Python: sklearn.manifold.MDS (a SMACOF-based stress-minimizing implementation rather than the classical eigendecomposition)
MATLAB: cmdscale() function
Julia: MultivariateStats.jl package

This handout provides a foundation for understanding classical MDS. For specific applications in your research domain, consider exploring variations like weighted MDS, landmark MDS, or non-metric MDS as appropriate.