|
Why: The information needed to build predictive models is often spread across two sources: a table of “static” information, such as customer demographics or equipment specifications, and a log of transactions, such as purchase history, service call history or equipment alarms. To build predictive models, this data must be compressed and combined into a single row per case that represents both the static reference information and the event history.

How: KEL creates aggregates over user-defined periods. The period length can be a day, week, month, etc. Periods are computed from a reference date that can be fixed or specific to each reference case (e.g., the date of first purchase for a customer). KEL is configurable, so you can specify which aggregates to compute (min, max, sum, count, etc.).

Benefits for the business user: KEL does not require programming to perform this sophisticated aggregation. Because KEL is fast, several aggregation options can be tested ad hoc to find the most meaningful one.

Benefits for the Data Mining expert: KEL enables the Data Mining professional to include additional historical data in the analysis process, resulting in better models. KEL is fast and can handle very large data sets.

Benefits for the Integration specialist and IT: Only one pass over the log table is required, using an efficient internal data representation. Transactional aggregates can be built in minutes instead of days, and can be used to prototype permanent ETL processes. No changes to the underlying schema are required.

Example: For CRM, the most valuable information is how a customer has interacted with a company and its products. This information is typically stored as a purchase history or call center log. When performing an analysis to predict customer churn, a customer's actions relative to the time they left can be critical for maximizing model quality. This requires an event aggregation based on the churn date.
Customers churn at different times, so aggregating on a fixed date, such as January 2001, is not necessarily meaningful for the analysis. In this case, the count of purchases, the count of complaint calls, and the sum of purchases could be automatically aggregated for each month in the year before the churn date. Once KEL has done this, K2R could be used to predict churn.

Example: In a different scenario, when predicting machine part failure, the static information about a particular piece of equipment (lot number, manufacture date, etc.) is not nearly as important as how the equipment has been used. KEL can use the operating logs, which record conditions such as temperature and pressure. Again, a relative date, such as the day the equipment entered service, is the appropriate reference point for aggregation. A series of alarms in a new machine can mean something very different from the same set of alarms in a ten-year-old machine. Alarm counts, along with maximum pressure and temperature for each quarter over the first five years of service life, could be automatically created by KEL. In this case, K2S might additionally be used to create segments of equipment with high and low risk of failure.
|
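
The period-based aggregation described under "How" can be illustrated with a small sketch. This is not KEL code, and the text does not show KEL's actual interface: the function `aggregate_events`, its parameters, the fixed 30-day period standing in for a "month", and the sample data are all assumptions made for illustration. It reads the event log once and produces one row per case, with count, sum and max for each period counted backwards from that case's own reference date (a churn date, for instance).

```python
from collections import defaultdict
from datetime import date


def aggregate_events(events, reference_dates, period_days=30, n_periods=3):
    """Build one row per entity with count/sum/max per period before its reference date.

    events: iterable of (entity_id, event_date, value) tuples (the log table).
    reference_dates: dict of entity_id -> reference date (hypothetical churn dates here).
    Period 1 is the `period_days` days just before the reference date,
    period 2 is the block before that, and so on.
    """
    # Start with empty aggregates so entities without events still get a row.
    rows = {
        entity: {p: {"count": 0, "sum": 0.0, "max": None}
                 for p in range(1, n_periods + 1)}
        for entity in reference_dates
    }

    # Single pass over the event log.
    for entity, event_date, value in events:
        ref = reference_dates.get(entity)
        if ref is None:
            continue  # event for an entity missing from the reference table
        days_before = (ref - event_date).days
        if days_before <= 0:
            continue  # event on or after the reference date
        period = (days_before - 1) // period_days + 1
        if period > n_periods:
            continue  # event too old for the requested history window
        agg = rows[entity][period]
        agg["count"] += 1
        agg["sum"] += value
        agg["max"] = value if agg["max"] is None else max(agg["max"], value)

    # Flatten to one flat row per entity, e.g. {"count_p1": 2, "sum_p1": 50.0, ...}.
    return {
        entity: {f"{name}_p{p}": val
                 for p, agg in periods.items()
                 for name, val in agg.items()}
        for entity, periods in rows.items()
    }


# Hypothetical sample data: each customer has its own reference (churn) date.
reference_dates = {"A": date(2001, 6, 1), "B": date(2001, 9, 15)}
events = [
    ("A", date(2001, 5, 20), 40.0),  # 12 days before A's reference date
    ("A", date(2001, 5, 5), 10.0),   # 27 days before
    ("A", date(2001, 4, 10), 25.0),  # 52 days before
    ("B", date(2001, 9, 1), 99.0),   # 14 days before B's reference date
    ("B", date(2001, 1, 1), 5.0),    # older than the 3-period window, ignored
]
rows = aggregate_events(events, reference_dates)
```

Because each event is placed relative to its own case's reference date, customers who churned in different months end up with comparable "period 1", "period 2", ... columns, which is the point of aggregating on a relative rather than a fixed date.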