|
INTRODUCTION
Data is the lifeblood of every business, but why? Simply because customer knowledge lies at the heart of every organisation and, used correctly, can make sales, marketing, business and customer development more productive. This intelligence, however, is seldom visible or held in a format that is easy to interpret and apply. Very often it has to be derived through some kind of data transformation or analysis process. Any company or organisation that wants to benefit from its in-house intelligence therefore needs to look to its data.
The following gives a very brief overview of some of the basics. If you want further explanation, please give Tech4T a call on +44 (0)1733 890790 and we will give you some free advice.
WHAT DOES DATA LOOK LIKE?
Data is information that has been captured and then stored electronically in a variety of ways and formats, e.g. information (data) captured from:
HOW IS IT STORED - TERMINOLOGY
Data is stored in files as rows and columns; these files are often referred to as tables or lists.
· A row is usually referred to as a record or case (in statistical terms).
· A column is usually referred to as a field or a variable (in statistical terms).
Where data is stored as text (ASCII), columns are either separated by a delimiter or aligned in fixed positions, known as a fixed field format. A delimiter is usually a tab, comma, pipe '|' or right square bracket ']'. To avoid columns being incorrectly split, text that may contain a character used as a delimiter, such as address lines, is enclosed in quotes. A header record, which describes the data, is also often included. Below is an example of a comma separated file with surrounding quotes and a header record:
"URN","Name","Address1","Address2","Town","Postcode"
Other file formats are unique to the software being used to store the data and can usually be identified by the file extension. Text (ASCII) files typically have the extension .txt, .dat, .tab or .csv. dBASE files use .dbf, MS Excel .xls, SPSS .sav, MS Access .mdb and so on.
The information stored in the columns and rows can be held in varying formats - the three most common being ‘Text’, ‘Values’ and ‘Dates’. Constraints are often applied when information is captured to ensure data is always stored in the correct format, e.g.:
· forcing UPPER case when capturing a Postcode;
· forcing Proper case (upper and lower casing) on names and addresses;
· forcing the correct number of decimal places for numeric data such as sales values;
· selecting items from a pick list (look-up file or list) to ensure only allowed items can be added, such as a list of products, media source codes, gender descriptions, etc.
The UK Postcode has its own unique structure and is in itself a powerful marketing, analysis and selection source.
POSTCODES AND POSTAL GEOGRAPHY
The Royal Mail maintains a UK-wide system of postcodes to identify postal delivery areas. As most people know their postcode, many organisations use it as their main geographic reference when collecting data. This reference can be related to any geographic unit used for statistical production, such as a district or electoral ward. In addition, when de-duplicating customer and prospect name and address records, the Postcode can be used to help qualify the match process. Postal geography is thus very valuable.
The UK Postcode has 5 components. Four are visible for addressing purposes, as set out below; the fifth, the delivery point suffix (or address key), is used by the Royal Mail to define a unique address delivery point.
Postcode structure
Postcodes are alphanumeric references comprising an outward code of 2-4 characters and an inward code of 3 characters.
For example: PO16 7DZ
PO16 = outward code
7DZ = inward code
The postcode is structured hierarchically, supporting 4 levels of geographic unit:
As of September 2004, the breakdown is as follows:
These 1.76 million postcodes cover over 27.5 million delivery points and comprise 1.55 million small user and 0.21 million large user postcodes (see below).
Unit postcodes
Unit postcodes are the base unit of postal geography and fall into two types:
· Large user postcodes: allocated to single addresses receiving at least 500 mail items per day (e.g. business addresses).
· Small user postcodes: collections of (usually) adjacent addresses. A single small user postcode may contain up to 100 addresses, but 15 is a more typical number.
Note: It is possible for large buildings with many separate delivery points (e.g. a tower block) to have more than one unit postcode within the building.
Postcode structure recoding
As postcode components can vary in length, a fixed structure is preferred so that they align appropriately for sorting (for de-duplication or data analysis). One option is as follows:
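As a minimal sketch of the idea, the Python below splits a postcode into its outward and inward codes and then pads each component to a fixed width. The 8-character layout used here (2-character area, 2-digit district number, 1-character district letter, sector digit, 2 unit letters) is an assumption for illustration and not necessarily the structure Tech4T uses; the function names are hypothetical, and non-standard postcodes such as GIR 0AA are not handled.

```python
import re

# Assumed pattern for standard UK postcodes: area letters, district digits,
# optional district letter, then the sector digit and two unit letters.
# Special cases (e.g. GIR 0AA) are deliberately not handled in this sketch.
POSTCODE_RE = re.compile(r"^([A-Z]{1,2})(\d{1,2})([A-Z]?)\s*(\d)([A-Z]{2})$")


def parse_postcode(raw):
    """Return (area, district_number, district_letter, sector, unit)."""
    match = POSTCODE_RE.match(raw.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised postcode: {raw!r}")
    return match.groups()


def split_postcode(raw):
    """Return the (outward, inward) codes, e.g. ('PO16', '7DZ')."""
    area, number, letter, sector, unit = parse_postcode(raw)
    return area + number + letter, sector + unit


def recode_fixed(raw):
    """Recode to a hypothetical fixed 8-character layout for sorting.

    Area is left-justified to 2 characters, the district number is
    zero-padded to 2 digits, and a missing district letter becomes a space.
    """
    area, number, letter, sector, unit = parse_postcode(raw)
    return f"{area:<2}{int(number):02d}{letter or ' '}{sector}{unit}"
```

Sorting on the recoded value, for example `sorted(postcodes, key=recode_fixed)`, places PO2 before PO16, whereas a plain text sort of the raw postcodes would place PO16 first.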
DATABASE
Where multiple databases need to be incorporated in a company-wide information system, a large database is created to act as a central store, taking its data feeds from company-wide operational databases. This is termed a data warehouse. A subset of this, designed for a particular department or a specific application, is termed a data mart.
The usual database structure is termed a relational (or normalised) database where, for example, an address for a company is held only once in one table (or file), and all contacts within that company (stored in a separate table) relate to that address record.
The example above shows a few fields from a customer record (one record per customer) linked to multiple sales transactions. The two tables link on what is termed a URN - unique reference number. The URN is the main way a record (or case) is identified.
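The link between a customer table and a sales table can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample rows, the field names other than URN, and the function names are all hypothetical. It groups sales transactions by URN and attaches a count and total to each customer record.

```python
from collections import defaultdict

# Hypothetical sample rows; only the URN link comes from the text above.
customers = [
    {"URN": 1001, "Name": "Customer A", "Postcode": "PO16 7DZ"},
    {"URN": 1002, "Name": "Customer B", "Postcode": "PO16 7DZ"},
    {"URN": 1003, "Name": "Customer C", "Postcode": "PO16 7DZ"},
]
sales = [
    {"URN": 1001, "Value": 25.00},
    {"URN": 1001, "Value": 10.50},
    {"URN": 1002, "Value": 40.00},
]


def group_sales_by_urn(sales_rows):
    """Index the sales table by URN so each customer can find its rows."""
    grouped = defaultdict(list)
    for row in sales_rows:
        grouped[row["URN"]].append(row)
    return grouped


def summarise_customers(customer_rows, sales_rows):
    """Return one row per customer with its order count and total value."""
    grouped = group_sales_by_urn(sales_rows)
    summary = []
    for customer in customer_rows:
        transactions = grouped.get(customer["URN"], [])
        summary.append({
            **customer,
            "Orders": len(transactions),
            "TotalValue": sum(t["Value"] for t in transactions),
        })
    return summary
```

Calling `summarise_customers(customers, sales)` keeps one record per customer, including customers with no sales, because the customer table drives the loop and the sales table is only looked up by URN.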
Therefore data files need to be combined into as few tables as possible, with summarised and repeated data. This is also how data is best presented for statistical analysis. This process, termed de-normalisation, can also include file flattening (reducing, for example, a number of sales transactions into a single summarised customer record). Technologies4Targeting usually de-normalise data for analysis and targeting.
USING YOUR DATA
The process of transforming your data into intelligence can be complex and is best left to data experts. Our website devotes much of its space to this process, but the following steps are usually what most companies need to follow:
|