Phase 1: Business understanding

The initial phase focuses on understanding the project objectives and requirements and converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.

Adapting this phase to address context changes and model reuse handling involves:

  • Adding new specialised tasks for identifying long-term business goals for reusability with respect to context changes.
  • Determining both data mining goals and success criteria when addressing a context-aware data mining problem.
  • Performing an initial assessment of available context-aware techniques and describing the intended plan for achieving the data mining and business goals.


  • Task: The first task is to thoroughly understand, from a business perspective, what the client really wants to accomplish, and thus try to gain as much insight as possible into the business goals for data mining. For that it is necessary to gather background information about the current business situation, document specific business objectives and agree upon criteria used to determine data mining success from a business perspective.
  • Outputs:
    • Background: Record the information that is known about the organization's business situation: determine the organizational structure, identify the problem area and describe any solutions currently used to address the business problem.
    • Business objectives: Describe the customer's primary objective, as agreed upon by the project sponsors and the other business units affected by the results.
    • Business success criteria: Define the nature of business success for the data mining project from the business point of view. The criteria should be as precise as possible and objectively measurable.
    • Reusability, Adaptability and Versatility Goals: Identify, from a long-term business perspective, the prerequisites and future outlook: do the business goals involve reusability, adaptability, and versatility (i.e., should the solution perform well over a range of different operating contexts)?
  • Task: Once the goal is clearly defined, this task involves more detailed fact-finding about all of the resources, constraints, assumptions and other factors that should be considered in determining the data analysis goal and project plan.
  • Outputs:
    • Inventory of resources: Accurate list of the resources available to the project, including: personnel, data sources, computing resources and software.
    • Requirements, assumptions and constraints: List all requirements of the project (schedule of completion, security and legal restrictions, quality, etc.), list the assumptions made by the project (economic factors, data quality assumptions, non-checkable assumptions about the business upon which the project rests, etc.) and list the constraints on the project (availability of resources, technological and logical constraints, etc.).
    • Risks and contingencies: List of the risks or events that might occur to delay the project or cause it to fail (scheduling, financial, data, results, etc.) and list of the corresponding contingency plans.
    • Terminology: Compile a glossary of technical terms (business and data mining terminology) and buzzwords that need clarification.
    • Costs and benefits: Construct a cost-benefit analysis for the project (comparing the estimated costs with the potential benefit to the business if it is successful).
  • Task: Translate the business goals (in business terminology) into data mining goals (in technical terms).
  • Outputs:
    • Data mining and context-aware goals: Describe the type of data mining problem and make an initial exploration of how the different contexts are going to be used. Describe the technical goals and the desired outputs of the project that enable the achievement of the business objectives.
    • Data mining and context-aware success criteria: Define the criteria for a successful outcome to the project in technical terms: describe the methods for model and context assessment, benchmarks, subjective measurements, etc.
  • Task: Describe the intended plan for achieving the data mining goals and thereby achieving the business goals. The plan should specify the business goals, the data mining goals (including reusability, adaptability, and versatility), resources, risks, and the schedule for all phases of data mining, and should include an initial selection of tools and techniques.
  • Outputs:
    • Project plan: List the stages to be executed in the project, together with duration, resources required, inputs, outputs and dependencies. Where possible make explicit the large-scale iterations in the data mining process, for example repetitions of the modeling and evaluation phases.
    • Initial assessment of tools and techniques: At the end of the first phase, the project also performs an initial assessment of tools and techniques, including the initial identification of contexts (changes) and the context-aware techniques to deal with them.
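
As an illustration of how a context-aware success criterion from this phase might be written down in checkable form, here is a minimal Python sketch. It assumes the criterion has been agreed as "the chosen metric must reach a minimum score in at least a given fraction of the operating contexts". The names ContextResult and meets_success_criteria, the store contexts, the threshold and the scores are all hypothetical and are not part of the methodology itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextResult:
    """Score of one model in one operating context (hypothetical structure)."""
    context: str   # label of the operating context, e.g. a store or a season
    score: float   # value of the agreed metric; higher is assumed to be better


def meets_success_criteria(results, min_score, min_fraction=1.0):
    """Check a versatility-style success criterion across contexts.

    Returns (passed, failing_contexts). `passed` is True when at least
    `min_fraction` of the contexts reach `min_score`.
    """
    if not results:
        raise ValueError("at least one context result is required")
    failing = [r.context for r in results if r.score < min_score]
    n_passing = len(results) - len(failing)
    return n_passing >= min_fraction * len(results), failing


# Made-up scores for three made-up contexts, for illustration only.
results = [
    ContextResult("store_A", 0.84),
    ContextResult("store_B", 0.79),
    ContextResult("store_C", 0.71),
]

# Strict criterion: every context must reach the threshold.
passed, failing = meets_success_criteria(results, min_score=0.75, min_fraction=1.0)
print(passed, failing)
```

Relaxing min_fraction (for example to 0.6) expresses a weaker goal in which the solution only has to hold up in most contexts. Which variant is appropriate is a business decision that belongs with the reusability, adaptability and versatility goals.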

Next phase –> Phase 2: Data Understanding


Legend for the different representations of original and new/enhanced tasks and outputs: