Self-Organizing Maps (Kohonen Maps): How They Work and When to Use Them

By: Werda Shermeen

Published: June 19, 2026

High-dimensional data is the norm in analytics, not the exception. Customer records with dozens of attributes. Genomic datasets with thousands of gene expression values. Sensor streams with hundreds of concurrent measurements. The challenge is not storing or processing this data; it is extracting structure from it in a way that humans can understand and act on.

Self-organizing maps (SOMs) are one of the oldest and most practically useful tools for this problem. Developed by Finnish professor Teuvo Kohonen in the 1980s, they project high-dimensional data onto a two-dimensional grid in a way that preserves the topological relationships in the original data. (Source: Kohonen, T., “Self-organized formation of topologically correct feature maps,” Biological Cybernetics, Vol. 43, pp. 59–69, 1982)

This guide explains what SOMs are, how the Kohonen algorithm works step by step, how SOMs compare to alternative approaches like k-means and PCA, where they are genuinely useful, and where they fall short.

What Is a Self-Organizing Map?

A self-organizing map is a type of artificial neural network that performs unsupervised learning; it does not require labelled training data.

Its purpose is dimensionality reduction with topology preservation. Given high-dimensional input data, the SOM learns to map that data onto a two-dimensional grid of neurons (called a lattice or map) such that data points that are similar in the original high-dimensional space end up near each other on the 2D grid.

This topology-preserving property is the key distinction. A linear projection such as PCA keeps the directions of greatest variance, so points that are far apart along a discarded direction can land next to each other in the reduced space. A SOM is trained so that similar inputs map to the same or neighbouring neurons, although the preservation is approximate rather than guaranteed. The spatial layout of the resulting map therefore encodes meaningful similarity relationships.

The result is a “map” in the literal sense: a two-dimensional representation where proximity indicates similarity, allowing a human analyst to navigate and explore the structure of a dataset that would otherwise be incomprehensible at its native dimensionality.

The Algorithm: How a Kohonen SOM Learns

The SOM training algorithm is iterative and competitive. Understanding it requires understanding three concepts: the lattice, the Best Matching Unit, and the neighbourhood function.

The lattice

The SOM consists of a 2D grid of neurons (also called nodes or units). Each neuron has a weight vector of the same dimensionality as the input data. If your input data has 50 features, every neuron in the lattice has a 50-dimensional weight vector.

Common grid shapes are rectangular and hexagonal. Hexagonal grids are preferred for most applications because each neuron has six equidistant neighbours rather than the four cardinal neighbours plus four diagonal neighbours of a rectangular grid, producing more uniform coverage of the input space.

Grid size is a hyperparameter. A common rule of thumb is to use approximately 5 * sqrt(N) neurons, where N is the number of training data points. (Source: Vesanto, J. and Alhoniemi, E., “Clustering of the Self-Organizing Map,” IEEE Transactions on Neural Networks, Vol. 11, No. 3, 2000) Larger grids capture finer structure but take longer to train and can overfit sparse data.
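As a rough illustration of that rule of thumb, the sketch below turns a training-set size into a suggested neuron count and grid shape. The function name suggest_grid and the choice of a square grid are assumptions made for this example; the heuristic itself only suggests a total number of neurons.

```python
import math

def suggest_grid(n_samples: int) -> tuple:
    """Return (rows, cols) for a square grid with roughly 5 * sqrt(N) neurons."""
    target_neurons = 5 * math.sqrt(n_samples)
    # Assumption: a square grid. The side length is rounded up so the grid
    # holds at least the target number of neurons.
    side = math.ceil(math.sqrt(target_neurons))
    return side, side

rows, cols = suggest_grid(1000)
print(rows, cols, rows * cols)  # 13 13 169
```

For 1,000 training points the heuristic gives about 158 neurons, which a 13 x 13 grid covers. Treat the result as a starting point for tuning rather than a fixed answer.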

Training: competition, cooperation, and adaptation

SOM training iterates through three stages for each input vector in the training set.

Competition: For a given input vector, compute the Euclidean distance between that vector and the weight vector of every neuron in the lattice. The neuron whose weight vector is closest to the input is designated the Best Matching Unit (BMU). This is the “winner takes all” competition step.
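The competition step can be sketched in a few lines. This version uses plain Python lists for readability (a real implementation would normally vectorize the search with NumPy), and the function name find_bmu and the tiny 2x2 lattice are hypothetical values chosen for illustration.

```python
import math

def find_bmu(lattice, x):
    """Return ((row, col), distance) for the neuron whose weights are closest to x.

    lattice[row][col] holds that neuron's weight vector as a list of floats.
    """
    best_pos, best_dist = None, math.inf
    for r, row in enumerate(lattice):
        for c, weights in enumerate(row):
            d = math.dist(weights, x)  # Euclidean distance
            if d < best_dist:
                best_pos, best_dist = (r, c), d
    return best_pos, best_dist

# A hypothetical 2x2 lattice with 3-dimensional weight vectors.
lattice = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]],
]
pos, dist = find_bmu(lattice, [0.9, 0.8, 1.0])
print(pos, round(dist, 3))  # (1, 1) 0.224
```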

Cooperation: The BMU and the neurons in its neighbourhood are identified. The neighbourhood is defined by a radius that shrinks over training iterations — initially covering a large portion of the map, gradually narrowing to just the immediate neighbours of the BMU.

Adaptation: The weight vectors of the BMU and all neurons within the neighbourhood radius are updated and moved closer to the input vector. Neurons closer to the BMU are updated more strongly than those at the edge of the neighbourhood. This is controlled by the neighbourhood function, typically a Gaussian that decreases with distance from the BMU.
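One common way to write this update is shown below. The notation is assumed here rather than taken from the text above: x(t) is the input at step t, w_i is the weight vector of neuron i, r_i is its position on the grid, c is the index of the BMU, alpha(t) is the learning rate, and sigma(t) is the neighbourhood radius.

```latex
w_i(t+1) = w_i(t) + \alpha(t)\, h_{c,i}(t)\, \bigl(x(t) - w_i(t)\bigr),
\qquad
h_{c,i}(t) = \exp\!\left(-\frac{\lVert r_c - r_i \rVert^2}{2\,\sigma(t)^2}\right)
```

With the Gaussian form, h equals 1 at the BMU itself and falls smoothly toward 0 as grid distance grows, so the BMU moves the most and distant neurons barely move.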

This process repeats for every training example, across many epochs. As training progresses, the neighbourhood radius and the learning rate both decay. The map gradually self-organizes — neurons drift toward the clusters in the data, and the topological structure of those clusters is reflected in the spatial layout of the map.
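Putting the three stages and the decay together, a minimal online training loop might look like the sketch below. It assumes a rectangular grid, a Gaussian neighbourhood, and an exponential decay schedule with an arbitrary constant of 3; the function name train_som and all default parameter values are illustrative choices, not a reference implementation.

```python
import numpy as np

def train_som(data, rows, cols, n_epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM online; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    n_samples, dim = data.shape
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    # Initialise each neuron from a randomly chosen training sample.
    picks = rng.integers(0, n_samples, size=rows * cols)
    weights = data[picks].reshape(rows, cols, dim).astype(float)
    # Grid coordinates of every neuron, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )

    total_steps = n_epochs * n_samples
    step = 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(n_samples)]:
            # Learning rate and radius both decay as training progresses.
            frac = step / total_steps
            lr = lr0 * np.exp(-3.0 * frac)
            sigma = sigma0 * np.exp(-3.0 * frac)

            # Competition: the neuron closest to x is the BMU.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)

            # Cooperation: Gaussian neighbourhood around the BMU on the grid.
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))

            # Adaptation: move every neuron toward x, scaled by lr and h.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

# Synthetic data: two well-separated blobs in three dimensions.
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal(0.0, 0.1, size=(100, 3)),
    rng.normal(1.0, 0.1, size=(100, 3)),
])
weights = train_som(data, rows=5, cols=5)
print(weights.shape)  # (5, 5, 3)
```

Because every update is a weighted average of the old weight and the input, the prototypes always stay within the range of the training data.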

Convergence and the final map

After training converges, each neuron in the lattice represents a prototype, a typical example of the region of input space it covers.

A new, unseen input vector is placed on the map by finding its BMU, the neuron whose weight vector is closest to it. That neuron’s position on the 2D grid is the data point’s mapping.

The trained map can be visualized as a heat map (where colour encodes some property of the prototype, e.g., distance to neighbours) or used for cluster assignment (where contiguous regions of the map that activate for similar inputs define clusters).

SOM vs K-Means vs PCA: When to Use Which

SOMs occupy a specific niche among unsupervised learning methods. Understanding the tradeoffs with k-means and PCA determines when a SOM is the right tool.

Dimension	SOM	K-Means	PCA
Output type	2D spatial map of prototypes	Cluster labels and centroids	Linear projections / principal components
Topology preservation	Yes — spatial layout encodes similarity	No — centroids are independent	Partial — global structure only
Number of clusters	Not required in advance; emerges from map	Must be specified (k)	Not applicable — produces continuous projections
Interpretability	High — visual map can be explored directly	Medium — centroids are interpretable	Medium — requires understanding of principal components
Nonlinear structure	Captures nonlinear relationships in data	Assumes roughly spherical clusters	Linear only; misses curved or manifold structure
Sensitivity to outliers	Moderate — neighbourhood smoothing helps	High — outliers pull centroids	Moderate — affected by high-variance outlier dimensions
Computational cost	Higher than k-means; scales with map size	Lower; fast for moderate k	Moderate; dominated by eigendecomposition
Best for	Exploratory analysis of complex high-dim data	Well-defined cluster assignment at scale	Linear feature extraction and noise reduction

The SOM’s primary advantage over k-means is that you do not need to specify the number of clusters in advance, and the resulting map reveals structure at multiple scales: you can identify a few broad clusters by looking at large regions of the map, or finer sub-clusters by examining local structure within those regions.

Its advantage over PCA is the ability to capture nonlinear structure. PCA is a linear technique that finds the directions of maximum variance in the data, which works well when the underlying structure is roughly linear. Data with curved, manifold, or otherwise nonlinear organization calls for a nonlinear method such as a SOM.

The tradeoff is computational cost and a less direct output. K-means is faster, and its cluster labels are simpler to use downstream, whereas a trained map still has to be read or post-processed before it yields cluster assignments. For large-scale production applications where speed matters, k-means is usually preferred. SOMs are most valuable for exploratory analysis: understanding the structure of a dataset before applying more targeted methods.

Practical Use Cases for Self-Organizing Maps

Customer segmentation

Customer segmentation is one of the most common SOM applications in data analytics.

A retailer with millions of customers and dozens of purchase behavior features can use a SOM to project all customers onto a 2D map. Similar customers cluster near each other. The resulting map allows analysts to visually explore the segmentation, identifying a high-value segment in one corner and a price-sensitive casual buyer segment in another, and to understand the gradual transition between them.

The advantage over k-means is that the map shows how segments relate to each other. A k-means result tells you there are five clusters. A SOM tells you that cluster 2 and cluster 4 are more similar to each other than either is to cluster 1, and that there are customers that sit on the boundary between them.

Anomaly detection

After a SOM is trained on normal data, anomalous inputs are poorly represented by the lattice: even the closest neuron has a weight vector that is far from the input.

This property makes SOMs useful for anomaly and fraud detection. Network intrusion patterns that do not match normal traffic profiles activate unusual regions of the map. Financial transactions with unusual feature combinations land far from any high-density cluster.

The quantisation error, the distance between an input and its BMU, serves as an anomaly score. A high quantisation error indicates a data point unlike anything the SOM saw during training, which is a signal for further investigation.
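A minimal sketch of that scoring step follows. The prototypes are hand-written stand-ins for a trained map, and the threshold rule (the largest error seen on normal data) is an assumption made for this example; in practice a high percentile of the training errors is a common alternative.

```python
import numpy as np

def quantisation_errors(weights, data):
    """Distance from each row of data to its BMU in weights (rows, cols, dim)."""
    prototypes = weights.reshape(-1, weights.shape[-1])  # (n_neurons, dim)
    dists = np.linalg.norm(data[:, None, :] - prototypes[None, :, :], axis=-1)
    return dists.min(axis=1)

# Hypothetical trained prototypes: a 2x2 map in a 2-feature space.
weights = np.array([[[0.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [1.0, 1.0]]])
normal_data = np.array([[0.1, 0.0], [0.9, 1.0], [0.0, 0.9], [1.0, 0.1]])
new_points = np.array([[0.05, 0.95], [3.0, 3.0]])

# Assumption: flag anything above the largest error seen on normal data.
threshold = quantisation_errors(weights, normal_data).max()
scores = quantisation_errors(weights, new_points)
flags = (scores > threshold).tolist()
print(flags)  # [False, True]
```

The first new point sits close to an existing prototype and passes; the second is far from every prototype and is flagged.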

Document and text clustering

Document maps, SOMs trained on text features (TF-IDF vectors, word embeddings), produce spatial organizations where semantically related documents cluster near each other.

This is useful for organising large document collections (scientific literature, customer feedback, news archives) for exploratory navigation. A user can scan regions of the map to understand what topics are covered, then drill into regions of interest.

Kohonen’s early demonstrations used SOMs to organize Finnish phonemes spatially, and the same approach has since been applied extensively to document collections, including news categorization, patent mapping, and customer feedback analysis. (Source: Kohonen, T., “Self-Organizing Maps,” 3rd ed., Springer Series in Information Sciences, 2001)

Bioinformatics and gene expression analysis

Gene expression data is inherently high-dimensional: a typical microarray or RNA-seq experiment measures the expression level of thousands of genes across a much smaller number of samples.

SOMs are used to cluster genes with similar expression profiles across experimental conditions. Genes that cluster together on the map often participate in the same biological pathways or respond to the same stimuli. This spatial organization of gene expression patterns is a starting point for hypothesis generation about molecular mechanisms.

Visualization of complex datasets

Beyond specific analytical tasks, SOMs are used as a general exploratory visualization tool for any high-dimensional dataset where understanding the overall structure is the first priority.

The resulting 2D map can be colored by any attribute (average feature values, class labels from external data, outcome rates) to reveal how that attribute varies across the data space. Different coloring schemes applied to the same trained map reveal different aspects of structure without retraining.
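One way to build such a coloring is to average an attribute over the samples that land on each neuron. The sketch below assumes the BMU positions have already been computed from a trained map; the function name attribute_map and the example values are hypothetical.

```python
import numpy as np

def attribute_map(bmu_positions, values, rows, cols):
    """Mean of values over the samples mapped to each neuron.

    Neurons that received no samples are set to NaN so they can be
    drawn as blank cells.
    """
    sums = np.zeros((rows, cols))
    counts = np.zeros((rows, cols))
    for (r, c), v in zip(bmu_positions, values):
        sums[r, c] += v
        counts[r, c] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)

# Hypothetical BMU positions for three samples on a 2x2 map,
# colored by a made-up numeric attribute.
bmu_positions = [(0, 0), (0, 0), (1, 1)]
spend = [1.0, 3.0, 10.0]
colors = attribute_map(bmu_positions, spend, rows=2, cols=2)
print(colors)
```

Calling the function again with a different attribute and the same BMU positions gives a second view of the same map, with no retraining involved.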

Strengths and Limitations of SOMs

Strengths

No pre-specified number of clusters: The map discovers structure from the data without requiring the analyst to guess k in advance.
Topology preservation: The spatial layout of the map encodes meaningful similarity. Adjacent map regions contain similar data points.
Robust to noise: The neighbourhood update mechanism provides a smoothing effect that makes SOMs less sensitive to individual outliers than k-means.
Interpretable visual output: The 2D map is directly interpretable by analysts without specialist knowledge of the underlying algorithm.
Flexible distance metrics: SOMs can be adapted to non-Euclidean distances and non-vectorial data types, including sequences and strings.

Limitations

Computationally expensive: Training a large SOM on a large dataset is slower than k-means. For production-scale clustering applications, k-means or approximate nearest-neighbour methods are usually faster.
Sensitive to initialization and hyperparameters: Different random initializations can produce different maps. Grid size, learning rate schedule, and neighbourhood function parameters all affect the result and require tuning.
No probabilistic output: Unlike Gaussian mixture models, SOMs do not provide probability estimates for cluster membership. A data point is assigned to its BMU deterministically.
Difficult to evaluate objectively: Unlike supervised learning, there is no single “correct” SOM. Evaluation requires domain knowledge to assess whether the discovered structure is meaningful.
Fixed topology: The grid topology is fixed before training. If the data has a natural structure that does not fit a rectangular or hexagonal grid, the SOM may not capture it accurately.

Implementing a SOM in Python

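The minisom library (MiniSom) is one of the most commonly used Python implementations of self-organizing maps. It provides a small NumPy-based API with support for rectangular and hexagonal topologies, training with samples presented in random or sequential order, and helper methods such as the distance map and quantization error. Plotting is left to a library such as matplotlib.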

The basic workflow: initialize the SOM with grid dimensions and input dimensionality; train on a normalized input matrix; retrieve the BMU for each data point to assign cluster membership; visualize the trained map using the distance map (u-matrix) to reveal cluster boundaries.

The u-matrix visualization colors each neuron by its average distance to its neighbours. Low values indicate neurons in dense cluster regions; high values indicate neurons at cluster boundaries. Which of the two appears lighter or darker depends on the colormap chosen for the plot. This visualization is the primary tool for identifying how many meaningful clusters the SOM has found.
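To make the calculation concrete, here is a small sketch that computes a u-matrix directly from a weight array. It assumes a rectangular grid with at least two neurons and uses only the four edge-sharing neighbours; library implementations may use a different neighbour set or normalize the result.

```python
import numpy as np

def u_matrix(weights):
    """Average distance from each neuron to its 4-connected grid neighbours."""
    rows, cols, _ = weights.shape
    um = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(np.linalg.norm(weights[r, c] - weights[nr, nc]))
            um[r, c] = np.mean(dists)
    return um

# Hypothetical 1x4 map with one feature: two tight groups of prototypes.
weights = np.array([[[0.0], [0.1], [5.0], [5.1]]])
um = u_matrix(weights)
print(um.round(2))
```

The two outer neurons get a low value (about 0.1) and the two inner neurons a high one (about 2.5), which marks the boundary between the two groups.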

For data normalization: SOMs are distance-based and therefore sensitive to feature scales. Normalize all input features to the same range before training, using either zero mean and unit variance or min-max scaling to [0, 1].
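Both scaling options are short to write with NumPy. The sketch below is one possible version; the function names zscore and minmax and the example features are hypothetical, and constant columns are handled by leaving them at zero rather than dividing by zero.

```python
import numpy as np

def zscore(data):
    """Scale each column to zero mean and unit variance."""
    std = data.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros
    return (data - data.mean(axis=0)) / std

def minmax(data):
    """Scale each column to the range [0, 1]."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns become all zeros
    return (data - lo) / span

# Hypothetical features on very different scales: age in years, income in dollars.
raw = np.array([[25.0, 30_000.0], [40.0, 90_000.0], [55.0, 60_000.0]])
scaled_z = zscore(raw)
scaled_mm = minmax(raw)
print(scaled_mm)
```

Without scaling, the income column would dominate every Euclidean distance and the age column would have almost no influence on which neuron wins. When new data is mapped later, reuse the statistics computed on the training set instead of recomputing them.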

SOMs in the Context of Modern AI

Self-organizing maps predate the deep learning era by several decades. In the age of transformers and large language models, it is reasonable to ask whether they remain relevant.

They do, but in a specific role. SOMs are not competitive with deep learning for supervised tasks like image classification or natural language understanding. They are competitive with k-means and other classical unsupervised methods for exploratory analysis of structured tabular data.

In 2026, a common pattern is to use deep learning or embedding models to produce compact, meaningful vector representations of complex data (text documents, images, customer behavior sequences), and then apply a SOM to the resulting embeddings to produce an interpretable spatial map of the embedding space.

This combination captures the representational power of deep learning and the interpretability of the SOM: the embedding model extracts meaningful features, and the SOM reveals the structure of those features in a form that a human analyst can navigate.

Final Thoughts

Self-organizing maps are one of the most enduring algorithms in data analysis: introduced in the 1980s and still actively used and extended in 2026, for good reason.

They solve a genuine problem: making the structure of high-dimensional data visible to humans. When the question is “what does this data look like?” rather than “which cluster does this point belong to?”, a SOM is often the most informative tool available. Their value is greatest in the exploratory phase of data work, before you know what questions to ask. A SOM does not answer questions; it reveals what questions are worth asking.

For data teams building analytics capabilities, customer segmentation models, or anomaly detection systems, understanding where SOMs fit relative to k-means, PCA, and modern embedding-based methods is part of choosing the right tool for the specific analytical task.

If you are building data infrastructure or analytical platforms and want to discuss how unsupervised learning methods fit into your analytics architecture, Data Pilot’s data strategy consulting helps teams make these design decisions based on the actual structure of their data and the analytical questions they need to answer.