|
What is Data
Data
is the life-blood of every business, but why?
Simply because
customer knowledge lies at the heart of every organisation
and used correctly can make sales, marketing, business
and customer development more productive.
This intelligence
however is seldom visible or held in a format that
is easy to interpret and apply. Very often it also
has to be derived from some kind of data transformation
or analysis process.
Therefore any company
or organisation that wants to benefit from its in-house
intelligence needs to look to its data.
The following gives
a very brief overview as to some of the basics but
if you want further explanation please give Tech4T
a call on +44 (0)1733 890790 and we will give you
some free advice.
Postcodes and postal geography
The Royal
Mail maintains a UK-wide system of postcodes to
identify postal delivery areas. As most people know
their postcode, varying organisations use this as
their main geographic reference when collecting
data. This reference can be related to any geographic
unit used for statistical production, such as a
district or electoral ward. In addition, when de-duplicating
customer and prospect name and address records,
the Postcode can be used to help qualify the match
process. Postal geography is thus very valuable.
The UK Postcode
has 5 components. 4 are visible for addressing purposes
as set out below, the 5th - delivery point suffix
(or address key) is used by the Royal Mail to define
a unique address delivery point.
Postcode structure
Postcodes
are alphanumeric references comprising an outward
code of 2-4 characters and an inward code of 3 characters.
For example:
PO16 7DZ PO16 = outward
code 7DZ = inward code
The postcode
is structured hierarchically, supporting 4 levels
of geographic unit:
As of
April 2010, the breakdown is as follows:
|
Geographic unit
|
Number in UK
|
Approx delivery points
|
|
PO - Postcode area (one or two characters)
|
124
|
231,425
|
|
PO16 - Postcode district (one or two numbers)
NB. Sometimes presented as a number followed
by a single character in London for example
- EC1W
|
2,983
|
9,620
|
|
PO16 7 - Postcode sector
|
11,198
|
2,562
|
|
PO16 7DZ - Unit postcode
|
1,755,127
|
16.35
|
These
1.75 million postcodes cover over 28 million
delivery points and comprise 1.6 million small user and
0.25 million large user postcodes (see below).
Of the 1.75 million Postcodes, there are circa
93,000
PO Box or non geographic Postcodes.
Unit postcodes
Unit
postcodes are the base unit of postal geography
and fall into two types:
Large
user postcodes: allocated to single addresses receiving
at least 500 mail items per day (e.g. business addresses).
Small
user postcodes: collections of (usually) adjacent
addresses. A single small user postcode may contain
up to 80 addresses, but 16 is a more typical number.
Note:
It is possible for large buildings with many separate
delivery points (e.g. a tower block) to have more
than one unit postcode within the building.
Postcode structure recoding
As postcode
components can vary in length, to enable appropriate
alignment for sorting (for de-duplication or data
analysis) a fixed structure is preferred.
One option is as follows:
Left justify Postcode area (one or two characters)
in positions 1 and 2
Right justify postcode district (1 or 2 digits
or 1 digit and a letter for London) in position
3 and 4
Sector (single digit) goes in position 5
Unit, 2 letters, go in position 6 and 7
B1
2NZ becomes B__12NZ B11 3NQ becomes B_113NQ
What does data
look like?
Data is information that has been captured and then
stored electronically in a variety of ways and formats.
e.g. information (data) captured from:
-
Data entry screens when processing orders
-
From
Customer Relationship Management (CRM) incorporating
sales lead systems
-
Accounts
-
Web-based systems
-
Disparate customer or acquisition databases
-
Technical
support systems
-
Electronic points of sale (tills that capture
customer information)
-
Market
research surveys
-
Spreadsheets
-
Mailing lists, etc.
How is data stored - terminology
Data is stored in files as rows and columns;
these files are often referred to as tables
or lists.
·
A row is usually referred to as a
record or case (in statistical
terms). A row could represent a single customer
record or a sales transaction - all information
about a single customer or a single sales transaction
in a single row
·
A column is usually referred to as
a field or a variable
(in statistical terms). A column could represent
a single piece of information within a row (record).
For example, ‘Title’, ‘FirstName’, ‘Surname’, ‘Address
Line 1’, etc., are all fields
Where data is stored as text (ASCII), columns
are either separated by either a delimiter or are
aligned in fixed positions, known as a fixed field
format.
A delimiter is usually a tab, comma, pipe '|', right
square bracket ']' and to avoid columns being incorrectly
split, text - such as address lines - that may contain
a character used as a delimiter, are enclosed with
quotes. Also a header record is often included which
describes the data.
Below is an example
data file of a
comma separated file
with surrounding quotes
and a header record:
|
"URN","Name","Address1","Address2","Town","Postcode"
|
|
"1","Joe
Brown","Flat
3, The White Cottage","Oscar
Road","Newtown","BD1
4LW"
|
|
"2","Heny
Smith","11
Chestnut Drive","","Peterborough","PE1
2UT"
|
Other file formats
are unique to the software being used to store the
data and can usually be identified by the file extension.
Text (ASCII) files typically have the extension
.txt or .dat or .tab or
.csv. dBASE files use .dbf, MS Excel
.xls, SPSS .sav, MS Access .mdb
and so on.
The information stored in the columns and rows can
be held in varying formats - the three most common
being ‘Text’, ‘Values’ and ‘Dates’.
Often constraints are applied when information is
captured to ensure data is always stored in the
correct format. e.g. forcing UPPER case when capturing
a Postcode; Proper case (upper and lower casing)
on names and addresses; forcing the correct number
of decimal places for numeric data such as sales
values; selecting items to enter from a pick list
(look-up file or list) to ensure only allowed items
can be added such as a list of products, media source
codes, gender descriptions, etc.
The UK Postcode
has its own unique structure (as above) and in itself is a
powerful marketing, analysis and selection source.
Database
A
database comprises a collection of files (or tables)
that are linked in a particular way to eliminate
repeating data and facilitate extraction of selected
information to form a particular customised ‘view’
or to create a new file containing just the selected
fields and data.
Where multiple databases need to be incorporated
in a company wide information system, a large database
is created to act as a central store - taking its
data feeds from company-wide operational databases.
This is termed a data warehouse. A subset of this,
designed for a particular department or a specific
application, is termed a data mart.
The usual database structure is termed a relational
(or normalised) database where, for example, an
address for a company is held only once in one table
(or file), and all contacts within that company
(stored in a separate table) relate to that address
record.
The example above shows a few fields from a customer
record (one record per customer) linked to multiple
sales transactions.

The two tables link on what is termed a URN - unique
reference number. This is the main way a record
(or case) is identified - by its URN.
For analytics and fast data interrogation, however,
(to find, for example, all customers who spent over
£1000 last month) the less tables that need to be
interrogated will result in a faster result.
Therefore data files need to be combined in as few
tables as possible with summarised and repeated
data, This is also how data is best presented for
statistical analysis.
This process, termed de-normalisation, can also
include file flattening (reducing for example, a
number of sales transactions into a single customer
summarised record).
Technologies4Targeting usually de-normalise data
for analysis and targeting.
Using your data
The process of
transforming your data into intelligence can be
complex and best left to data experts. Our website
denotes much of its space to this process but the
following steps are usually what most companies
need to follow:
-
Define your
business and marketing goals
-
Identify what
intelligence you need to meet your goals
-
Work out what
data might give you the basis of the intelligence
you need and identify all relevant sources
-
Extract the
data - samples to start with - and work out
how it does or doesn't link together. Then once
you are clear what you need, extract and audit
the data, identifying and labelling what each
piece of information is, what it relates to
and why it was captured if possible
-
Then validate
and clean any addresses and standardise formats
of all other fields - especially any coding
or descriptions
-
Next you need
to remove duplicate records and merge the varying
data files, even if they don't link and remembering
not to lose any information from removed records,
or, leaving any transactions that were previously
linked to a deleted contact without a parent!
-
Now enhance
the data using external lists and suppression
files
-
Next comes
the transformation process to prepare the data
for analysis. Hopefully you will now have a
single view of all data that, with the right
analytics, can provide the intelligence you
need
-
Nearly done.
Finish the analysis - keeping in mind your business
goals - and create the output. This could be
simple reports but more likely it will a file
of some kind to help you apply your findings
-
Finally make
the intelligence work for you company - whatever
it takes!
|