Big data is giving enterprises unprecedented opportunities for business improvement. Data sets generated by a wide range of enterprise activities, together with data drawn from social media and Internet of Things infrastructures, can be used to discover hidden knowledge and gain insights for reaching optimal business decisions.
Such knowledge extraction also drives the development of machine learning capabilities that boost automation and accelerate decision making. However, data science is not simply about finding patterns in a dataset; it also means identifying recurring patterns that help solve real business problems. So, a data scientist must know how to leverage domain knowledge.
Importance of Domain Knowledge in Data Mining
Domain knowledge is important in all types of data mining processes. For example, the popular CRISP-DM (Cross-Industry Standard Process for Data Mining) model comprises several phases that rely on domain knowledge, including:
- Business Understanding phase, to formulate the data mining problem from a business viewpoint. Domain knowledge in this phase helps in articulating tangible business problems and challenges.
- Data Understanding phase, for inspecting and visualising the data properly. Here, domain knowledge gives an idea of how well the data represents the problem and whether it is free from bias.
- Modelling phase, wherein different data mining and machine learning models are used and analysed to get insights into solutions that will solve the problem.
- Evaluation phase, where the different data mining models are evaluated in terms of their suitability for the problem at hand.
Domain knowledge is also extremely important where learning depends on a set of past observations. Even though the available data is derived from a real-life setting, it may not always represent the scenarios a model will face in practice. A model that learns the quirks of such data rather than the underlying pattern runs into what is technically called the overfitting problem.
Overfitting occurs when a machine learning model performs very well on its training data but performs poorly on additional or new data sets. Although there are “rules of thumb” for detecting and mitigating overfitting, a true solution is not possible without domain knowledge. Perhaps that is why companies operating in similar business environments and industries get very different returns from a similar investment in big data analytics and data mining.
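
The gap between training and new-data performance can be made concrete with a small experiment. The sketch below is only an illustration under stated assumptions: the data is synthetic, the linear relationship `y = 2x + 1` and its noise level are hypothetical, and the helper names (`make_data`, `fit_line`, `fit_memorizer`, `mse`) are invented for this example. It compares a model that simply memorises the training points with a straight-line fit, the kind of simpler model a practitioner might choose when domain knowledge suggests the relationship is roughly linear.

```python
import random


def make_data(n, rng, noise=2.0):
    """Draw n noisy samples of a hypothetical relationship y = 2x + 1."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """1-nearest-neighbour: answer with the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model on the given samples."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(100, rng)
test_x, test_y = make_data(500, rng)  # stands in for "new" data

models = {
    "line": fit_line(train_x, train_y),
    "memorizer": fit_memorizer(train_x, train_y),
}
errors = {
    name: (mse(model, train_x, train_y), mse(model, test_x, test_y))
    for name, model in models.items()
}
for name, (train_err, test_err) in errors.items():
    print(f"{name:10s} train MSE = {train_err:.2f}  test MSE = {test_err:.2f}")
```

The memorising model reproduces its training targets exactly, so its training error says nothing about how it will behave on new data. Comparing training and held-out error in this way is one of the common rules of thumb for detecting overfitting, while deciding which simpler model is plausible for the problem is where domain knowledge comes in.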
