Why: Information stored in a database needs some pre-processing before modelling engines can use it. Traditionally, the statistician performs much of this tedious work by hand.

How: K2C builds consistent coding schemes for all nominal, ordinal and continuous attributes of a training data set in order to analyze a given business question. The actual encoding depends on the components that use the data. Most of the time, categorical (nominal and ordinal) variables are transformed into numerical values. Continuous variables are either normalized or recoded with piecewise continuous transforms in order to detect any non-linearity in the data.

Benefits for the business user: Encoding is one of the most delicate phases of the data mining process, because it determines the performance of the processes that follow. K2C encodes the data automatically and quickly, without manual intervention. Continuous variables are automatically sorted into meaningful ranges, a process called binning. For example, when studying purchasing behaviour, a single age range of 8 to 13 years can be more meaningful than the two ranges 5 to 10 and 11 to 15 years. By drilling down on the variable contribution graph, the business user can see whether a given value has a positive or negative impact on the business question.

Benefits for the Data Mining expert: K2C automates and speeds up data encoding, which accelerates the entire Data Mining process. K2C groups categories and identifies robust segments of continuous variables that give the best compromise between fit and robustness.

Benefits for the Integration specialist and IT: K2C is automatically integrated within the KXEN Analytic Framework whenever a component such as K2R requires encoded data. No additional integration work is needed for encoding data, replacing missing values, or processing out-of-range values.

Copyright © 2002–2003 KXEN Inc. All rights reserved.
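The page does not describe K2C's actual algorithms, so the Python sketch below only illustrates the general kinds of encoding it mentions, using simple, common stand-ins: target-mean encoding turns a nominal variable into numbers (with rare categories pooled into one group), equal-frequency binning and z-score standardization handle continuous variables, and fixed rules cover missing and out-of-range values. Note that the binning here is unsupervised, unlike the fit-versus-robustness trade-off K2C is described as making. All function names, the `MISSING` marker and the parameters (`min_count`, `n_bins`) are hypothetical and are not part of any KXEN API.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Hypothetical marker for a missing value; not a KXEN convention.
MISSING = None


def fit_target_encoding(categories, targets, min_count=1):
    """Map each nominal category to the mean target seen for it in training data.

    Categories seen fewer than `min_count` times get no entry of their own, so they
    are pooled with missing and unseen (out-of-range) categories under `default`.
    """
    sums, counts = {}, {}
    for c, y in zip(categories, targets):
        if c is MISSING:
            continue
        sums[c] = sums.get(c, 0.0) + y
        counts[c] = counts.get(c, 0) + 1
    table = {c: sums[c] / counts[c] for c in sums if counts[c] >= min_count}
    default = sum(targets) / len(targets)  # overall target mean
    return table, default


def apply_target_encoding(values, table, default):
    # Missing, rare and never-seen categories all fall back to the overall mean.
    return [table.get(v, default) for v in values]


def fit_equal_frequency_bins(values, n_bins):
    """Learn interior cut points so each bin holds roughly the same number of rows."""
    xs = sorted(v for v in values if v is not MISSING)
    if not xs:
        return []
    return sorted({xs[(i * len(xs)) // n_bins] for i in range(1, n_bins)})


def apply_bins(values, cuts):
    """Return a bin index in 0..len(cuts); missing values get the extra code -1.

    Values below or above the training range land in the first or last bin.
    """
    return [-1 if v is MISSING else bisect_right(cuts, v) for v in values]


def fit_standardizer(values):
    xs = [v for v in values if v is not MISSING]
    return mean(xs), (pstdev(xs) or 1.0)  # guard against a constant column


def apply_standardizer(values, mu, sigma):
    # A missing value is replaced by the training mean, i.e. 0 after scaling.
    return [0.0 if v is MISSING else (v - mu) / sigma for v in values]


if __name__ == "__main__":
    ages = [8, 9, 10, 11, 12, 13, 25, 30, 35, 40, 45, 60]
    cuts = fit_equal_frequency_bins(ages, 3)
    print(cuts, apply_bins([5, 12, 20, 70, None], cuts))

    table, default = fit_target_encoding(["north", "south", "north", "east"], [1, 0, 1, 1])
    print(apply_target_encoding(["north", "south", "west", None], table, default))
```

In this example the twelve training ages yield the cut points [12, 35], so the out-of-range ages 5 and 70 fall into the first and last bins, and the unseen region "west" and the missing region are both encoded with the overall target mean, 0.75. The key design choice is that every scheme is learned once on training data and then applied unchanged, which keeps the coding consistent between training and scoring.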