Viscovery self-organizing maps (SOMs)
Understanding SOM visualisation
A SOM is arguably one of the most compact ways to represent a data distribution. Because SOMs represent complex data in an intuitive two-dimensional perceptual space, data dependences are easy to understand once you are familiar with the map visualisation. The following example provides an intuitive explanation of the basics of Viscovery visualisation.
Imagine 1,000 people on a football field. We define a number of attributes (e.g. gender, age, family status, income) and ask the people on the field to move closer to the other people who are most similar to them according to these attributes. After a while, everyone on the field is surrounded by people who share similar attribute values. This configuration is an example of a two-dimensional representation of multi-dimensional data points.
Now imagine that, looking over the crowd, you ask everyone to raise a coloured flag according to their age (blue for under 20, green for 20 to 29, yellow for 30 to 39, orange for 40 to 49, and red for 50 and over). The pattern of colours you see corresponds to the distribution of the attribute “Age” across the football field. Next, you ask the crowd to stay in place and raise a coloured flag according to their income, and so on for the other attributes. For each attribute, you take a photo of the colour distribution on the field. These colour patterns correspond to the colour-coded maps visualised in the Viscovery software.
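The flag colours in this example are simply a binning of the age attribute. A minimal Python sketch of that colour coding follows; the function names, the field positions and the ages are hypothetical, and only the age bands come from the example above.

```python
def age_flag(age: float) -> str:
    """Return the flag colour for an age, using the bands from the football-field example."""
    if age < 20:
        return "blue"
    if age < 30:
        return "green"
    if age < 40:
        return "yellow"
    if age < 50:
        return "orange"
    return "red"


def snapshot(people: dict[tuple[int, int], float]) -> dict[tuple[int, int], str]:
    """'Photograph' the field: map each (row, column) position to the flag raised there."""
    return {pos: age_flag(age) for pos, age in people.items()}


if __name__ == "__main__":
    # Hypothetical positions (row, column) and ages of four people on the field.
    field = {(0, 0): 17, (0, 1): 24, (1, 0): 38, (1, 1): 55}
    print(snapshot(field))
    # → {(0, 0): 'blue', (0, 1): 'green', (1, 0): 'yellow', (1, 1): 'red'}
```

Repeating the snapshot for income or any other attribute, with the people kept in place, yields one "photo" per attribute.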
Finally, you can put all the photos side by side and inspect the dependences. For example, you might see clusters of younger people (blue/green) as well as clusters of older people (orange/red). You might also detect a correlation between age clusters and income clusters, e.g. higher incomes occurring in the older groups. Continuing in this manner, you will discover further relationships among the defined attributes.
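In a trained SOM, each of these "photos" corresponds to a component plane: the values of one attribute across all map nodes. One simple way to put a number on a visual dependence such as age versus income is to correlate two component planes node by node. The sketch below assumes the reference vectors are available as a NumPy array with one column per attribute; the function name and the six-node map are hypothetical, and this is not a description of how Viscovery itself quantifies dependences.

```python
import numpy as np


def component_correlation(reference_vectors: np.ndarray, i: int, j: int) -> float:
    """Pearson correlation between component planes i and j.

    reference_vectors has shape (n_nodes, n_attributes); column k holds the
    value of attribute k at every node, i.e. the k-th component plane.
    """
    return float(np.corrcoef(reference_vectors[:, i], reference_vectors[:, j])[0, 1])


if __name__ == "__main__":
    # Hypothetical 6-node map with attributes [age, income]; values are illustrative only.
    nodes = np.array([
        [18.0, 12_000.0],
        [25.0, 25_000.0],
        [33.0, 31_000.0],
        [41.0, 42_000.0],
        [47.0, 45_000.0],
        [58.0, 52_000.0],
    ])
    r = component_correlation(nodes, 0, 1)
    print(f"age/income correlation across nodes: {r:.2f}")
```

Note that this weights every node equally, regardless of how many records it represents; weighting by node hit counts would be a natural refinement.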
Analytical applications based on SOMs
The SOM representation and visualisation are powerful instruments for data modelling and exploration. However, the visualisation described above is just the starting point for much more extensive and in-depth data mining and predictive modelling. By combining the compact SOM data representation with the strengths of classical statistics, Viscovery offers an approach to data analysis and predictive modelling that is unique in terms of intuition and effectiveness. The following examples, chosen from a multitude of analytics capabilities, give an overview of some prominent fields of application.
Clustering
SOMs simplify clustering and allow the user to identify homogeneous data groups visually. In Viscovery, several clustering algorithms (SOM Single Linkage, Ward, and SOM-Ward) are available for building clusters automatically.
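To illustrate how the map topology can constrain clustering, the sketch below implements a generic contiguity-constrained variant of Ward's method: it starts with one cluster per map node and repeatedly merges the pair of clusters that are adjacent on the grid and whose merge increases the within-cluster sum of squares the least. This is an assumption-based illustration, not Viscovery's SOM-Ward implementation, whose details are not described here; the function name, the neighbour dictionary and the optional node weights are hypothetical.

```python
import numpy as np


def contiguous_ward(vectors, neighbours, k, weights=None):
    """Agglomerative Ward clustering restricted to clusters that are adjacent on the map.

    vectors    : (n_nodes, d) array of reference vectors
    neighbours : dict mapping node -> iterable of adjacent nodes (assumed symmetric)
    k          : number of clusters to stop at
    weights    : optional (n_nodes,) node sizes, e.g. hit counts (default: 1 each)
    Returns an (n_nodes,) array of cluster labels 0..k-1.
    """
    n = len(vectors)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    members = {c: {c} for c in range(n)}          # each node starts as its own cluster
    size = {c: w[c] for c in range(n)}
    centroid = {c: np.asarray(vectors[c], dtype=float) for c in range(n)}
    adjacent = {c: set(neighbours[c]) - {c} for c in range(n)}

    while len(members) > k:
        best = None
        for a in members:
            for b in adjacent[a]:
                if a < b:
                    diff = centroid[a] - centroid[b]
                    # Ward's merge cost: increase in the within-cluster sum of squares.
                    cost = size[a] * size[b] / (size[a] + size[b]) * float(diff @ diff)
                    if best is None or cost < best[0]:
                        best = (cost, a, b)
        if best is None:
            break  # the remaining clusters are not connected on the grid
        _, a, b = best
        # Merge cluster b into cluster a.
        total = size[a] + size[b]
        centroid[a] = (size[a] * centroid[a] + size[b] * centroid[b]) / total
        size[a] = total
        members[a] |= members.pop(b)
        for c in adjacent.pop(b):
            if c != a:
                adjacent[c].discard(b)
                adjacent[c].add(a)
                adjacent[a].add(c)
        adjacent[a].discard(b)
        del size[b], centroid[b]

    labels = np.empty(n, dtype=int)
    for label, cluster in enumerate(sorted(members)):
        for node in members[cluster]:
            labels[node] = label
    return labels


if __name__ == "__main__":
    # Hypothetical 1-D "map" of six nodes in a chain; node i is adjacent to i-1 and i+1.
    vecs = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
    nbrs = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(vecs)] for i in range(len(vecs))}
    print(contiguous_ward(vecs, nbrs, k=2))  # → [0 0 0 1 1 1]
```

Because only adjacent clusters may merge, every resulting cluster occupies a connected region of the map, which keeps the clusters easy to read visually.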
Prediction
Viscovery combines the non-linear data representation of the SOM with linear statistical prediction methods applied to each homogeneous sub-group, improving prediction accuracy.
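As a rough illustration of this idea, the sketch below fits a separate ordinary-least-squares model for each segment and predicts every record with the model of its own segment. It assumes the segment labels have already been obtained, for example from a clustering of the SOM; the function names are hypothetical, and this generic local-regression scheme is not Viscovery's own (patented) procedure.

```python
import numpy as np


def fit_local_models(X, y, segment):
    """Fit one ordinary-least-squares model (with intercept) per segment.

    X: (n, d) attributes, y: (n,) target, segment: (n,) segment label per record.
    Returns a dict mapping segment label -> coefficient vector [intercept, b_1, ..., b_d].
    """
    models = {}
    for s in np.unique(segment):
        mask = segment == s
        A = np.column_stack([np.ones(mask.sum()), X[mask]])
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models[s] = coef
    return models


def predict_local(models, X, segment):
    """Predict each record with the linear model of its own segment.

    Assumes every label in `segment` was seen during fitting.
    """
    y_hat = np.empty(len(X))
    for s, coef in models.items():
        mask = segment == s
        y_hat[mask] = coef[0] + X[mask] @ coef[1:]
    return y_hat


if __name__ == "__main__":
    # Hypothetical data with a kink at x = 0: no single straight line fits it,
    # but one line per segment does.
    x = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)
    y = np.where(x[:, 0] < 0, 2 * x[:, 0], -3 * x[:, 0] + 1)
    seg = (x[:, 0] >= 0).astype(int)
    models = fit_local_models(x, y, seg)
    print(np.abs(predict_local(models, x, seg) - y).max())
```

The non-linearity is carried by the segmentation, while each local model stays simple and interpretable.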
Data representation
Data are highly compressed using statistical methods, allowing a single map that uses only a few megabytes of space to represent databases that are orders of magnitude larger.
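The saving comes from the fact that a map's size depends on the number of nodes and attributes, not on the number of records it represents. The back-of-the-envelope calculation below makes this concrete under hypothetical sizes (10 million records, a 2,000-node map, 50 numeric attributes stored as 8-byte floats); it counts only the reference vectors, whereas a real map file would also store additional statistics per node.

```python
def matrix_bytes(rows: int, columns: int, bytes_per_value: int = 8) -> int:
    """Raw size of a dense numeric table (8 bytes per value = 64-bit floats)."""
    return rows * columns * bytes_per_value


# Hypothetical sizes, chosen only to illustrate the arithmetic.
n_records, n_nodes, n_attributes = 10_000_000, 2_000, 50

data_size = matrix_bytes(n_records, n_attributes)   # the raw records
map_size = matrix_bytes(n_nodes, n_attributes)      # one reference vector per node

print(f"data: {data_size / 1e9:.1f} GB, map: {map_size / 1e6:.1f} MB, "
      f"ratio: {data_size // map_size}x")
# → data: 4.0 GB, map: 0.8 MB, ratio: 5000x
```

Doubling the number of records doubles the data size but leaves the map size unchanged.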
Real-time classification
New data can be located in the map extremely quickly (up to 100,000 previously unseen data records can be classified per second), allowing real-time assessment of new data.
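Locating a new record in the map amounts to finding its best-matching unit: the node whose reference vector is closest to the record. A vectorised NumPy sketch follows, assuming Euclidean distance on attributes that have already been scaled; the function name is hypothetical, and the sketch makes no claim about matching Viscovery's throughput.

```python
import numpy as np


def best_matching_units(reference_vectors: np.ndarray, records: np.ndarray) -> np.ndarray:
    """Index of the nearest reference vector (Euclidean) for each record.

    reference_vectors: (n_nodes, d); records: (n_records, d).
    """
    # ||x - m||^2 = ||x||^2 - 2 x.m + ||m||^2. The ||x||^2 term is the same for
    # every node, so it can be dropped when only the argmin is needed.
    cross = records @ reference_vectors.T                                  # (n_records, n_nodes)
    m_sq = np.einsum("ij,ij->i", reference_vectors, reference_vectors)    # (n_nodes,)
    return np.argmin(m_sq[None, :] - 2.0 * cross, axis=1)


if __name__ == "__main__":
    # Hypothetical three-node map and three new records.
    nodes = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    new_records = np.array([[1.0, 1.0], [9.0, -1.0], [2.0, 8.0]])
    print(best_matching_units(nodes, new_records))  # → [0 1 2]
```

Processing records in batches as one matrix product, rather than one record at a time, is what makes this kind of lookup fast in NumPy.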
Specific benefits of the Viscovery solution
Viscovery is the leading commercial solution for data mining applications based on SOMs. Advantages in terms of technological superiority include the following:
- Quick and concise model creation, even for voluminous data sets
- Superb visualisation of complex data and dependences
- Integration of conventional statistics with innovative methods of data representation
- Intuitive representation of abstract models and analysis results
- Integration of expert knowledge during the modelling process
- Outstanding prediction accuracy due to a patented procedure for the extraction of non-linear relations
- Full workflow orientation
Self-organising Maps - Technical
Self-organizing maps (SOMs, also referred to as Kohonen maps) are used to create an ordered representation of multi-dimensional data that simplifies complexity and reveals meaningful relationships. SOMs are a particularly robust form of unsupervised neural network that, since their introduction by Prof. Teuvo Kohonen in the early 1980s, have been the technological basis of countless applications as well as the subject of many thousands of publications.
The SOM method can be viewed as a non-parametric regression technique that converts multi-dimensional data spaces into lower-dimensional abstractions. Much as a regression plane is an abstraction of the original data, a SOM generates a representation of the data distribution, with the crucial difference that this representation is non-linear.
For data mining purposes, it has become standard to approximate the SOM by a two-dimensional hexagonal grid. Each “node” on the grid is associated with a so-called “reference vector” that points to a distinct region of the original data space. Starting from sets of numerical, multivariate data, these reference vectors gradually adapt to the intrinsic shape of the data distribution, such that the reference vectors of neighbouring nodes point to adjacent regions of the data space. The order on the grid thus reflects the neighbourhood structure of the data, so that features of the data distribution can be read directly from the landscape that emerges on the grid.
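The adaptation process described above can be sketched with the basic online Kohonen rule: for each record, find the best-matching node, then pull that node and its neighbours on the grid towards the record, with a learning rate and neighbourhood radius that shrink over time. This is a minimal Python/NumPy illustration, not Viscovery's training procedure; the hexagonal layout, the Gaussian neighbourhood, the decay schedules and all parameter defaults are assumptions.

```python
import numpy as np


def hex_grid(rows: int, cols: int) -> np.ndarray:
    """2-D positions of the nodes of a hexagonal grid; odd rows are shifted by half a cell."""
    r, c = np.divmod(np.arange(rows * cols), cols)
    x = c + 0.5 * (r % 2)
    y = r * np.sqrt(3.0) / 2.0
    return np.column_stack([x, y])


def train_som(data, rows=10, cols=10, epochs=20, lr0=0.5, sigma0=None, sigma_end=0.5, seed=0):
    """Online Kohonen training on a hexagonal grid (assumed schedules, see lead-in).

    data: (n_records, d) numerical attributes, assumed already scaled.
    Returns (reference_vectors, grid_positions), shapes (rows*cols, d) and (rows*cols, 2).
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    grid = hex_grid(rows, cols)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every reference vector with a randomly drawn record.
    weights = data[rng.integers(0, len(data), rows * cols)].copy()
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                          # learning rate decays linearly
            sigma = sigma0 * (sigma_end / sigma0) ** frac    # radius shrinks geometrically
            # Best-matching unit: the node whose reference vector is closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighbourhood, measured on the grid rather than in data space.
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Move the BMU and its grid neighbours towards x.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, grid


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    points = rng.random((300, 2))                    # hypothetical 2-D data in the unit square
    W, G = train_som(points, rows=5, cols=5, epochs=10)
    grid_dist = np.linalg.norm(G[:, None] - G[None, :], axis=2)
    data_dist = np.linalg.norm(W[:, None] - W[None, :], axis=2)
    adjacent = (grid_dist > 0) & (grid_dist < 1.01)  # direct neighbours are 1 apart on the grid
    print("mean distance, adjacent nodes:", data_dist[adjacent].mean())
    print("mean distance, all node pairs:", data_dist[grid_dist > 0].mean())
```

After training, reference vectors of adjacent nodes should lie noticeably closer together in data space than arbitrary pairs of nodes, which is the ordering property described above.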
This powerful method of data representation is provided by many leading data mining suites on the market. In the Viscovery system, the SOM method is the basis on which a multitude of analytical and statistical techniques are applied. Viscovery systematically combines SOMs with classical statistical methods in an intuitive visual environment that allows anyone to understand the resulting analytical model, regardless of their statistics background; there is no need to be familiar with the basic Kohonen algorithm. In Viscovery, the details of the SOM creation process are shielded from the user, who is guided through the application in an environment of well-balanced settings and defaults.
In Viscovery, the data representation contained in the trained SOM is systematically converted for use across a broad spectrum of visualisation techniques. When Viscovery is used to evaluate dependences, investigate properties of the data distribution, search for clusters, or monitor new data, to mention just a few options, an intuitive and inspiring interactive process emerges.
In addition to the capabilities for data exploration, Viscovery employs a multitude of statistical techniques for the creation and application of classification and prediction models, all embedded in a workflow-guided project environment.
The Viscovery data mining products offer comprehensive technical features for the generation of predictive models, such as scoring models or segmentations, as well as their application and real-time integration into an operational environment.
Much of the theoretical background and many of the innovative algorithms in the field of SOMs are owed to Prof. Kohonen, who, as Head of the Laboratory of Computer and Information Science at the Helsinki University of Technology, contributed prominently to the creation, evolution, and spread of SOM technology. The originator of several new concepts, Prof. Kohonen is the author of hundreds of scientific papers as well as several textbooks. His many contributions to scientific progress have received numerous awards and honours.
Please call +44 (0)1733 890790 or click here to request more details and/or a web-based demonstration of Viscovery.
Content on this web page is reproduced with kind permission of Viscovery Software GmbH, Vienna, Austria.
Viscovery® is a registered trademark of Viscovery Software GmbH in Austria and other countries. Tech4T is an independent reselling partner for Viscovery Software GmbH in the United Kingdom.