A few thoughts on Data Preparation for #Analytics

It is Friday and it is time for a blog post.
A typical analytics project spends 70% to 80% of its time preparing the data. Achieving the right data quality and the right data format is a primary success factor for an analytics project.
What makes this task so knowledge intensive, and why does it take a multifaceted skill set to carry it out?
I will give a quick, simple example of how “functional knowledge”, as distinct from technical knowledge, matters when preparing data. There is a functional distinction between missing data and non-existing data.
For example, consider a customer data set. If the customer is married and the age of the spouse is not available, this is missing data. If the customer is single, the age of the spouse is non-existing. In the data mart these two scenarios need to be represented differently so that the analytic model behaves properly.
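One way to keep the two scenarios apart is an explicit status field next to the value. The sketch below is only an illustration of that idea: the names `Customer`, `SpouseAgeStatus` and `spouse_age_status` are made up for this post and do not come from any particular data mart or tool.

```python
from enum import Enum
from typing import NamedTuple, Optional


class SpouseAgeStatus(Enum):
    KNOWN = "known"                    # married, age recorded
    MISSING = "missing"                # married, age not recorded
    NOT_APPLICABLE = "not_applicable"  # single, so there is no spouse age


class Customer(NamedTuple):
    customer_id: int
    married: bool
    spouse_age: Optional[int]


def spouse_age_status(customer: Customer) -> SpouseAgeStatus:
    """Classify an empty spouse age as missing or non-existing."""
    if not customer.married:
        return SpouseAgeStatus.NOT_APPLICABLE
    if customer.spouse_age is None:
        return SpouseAgeStatus.MISSING
    return SpouseAgeStatus.KNOWN


# Hypothetical sample rows, invented for illustration.
customers = [
    Customer(1, married=True, spouse_age=41),
    Customer(2, married=True, spouse_age=None),
    Customer(3, married=False, spouse_age=None),
]

statuses = {c.customer_id: spouse_age_status(c) for c in customers}
```

Customers 2 and 3 both have an empty spouse age, but the status column tells the model (and the analyst) that only customer 2 has a gap that could be filled.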
How the missing data is dealt with during preparation (data imputation techniques) has an impact on the results of the analytical models.
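To make that concrete, here is a small sketch of mean imputation for the spouse-age example. It assumes mean imputation purely for illustration (it is one technique among many, not a recommendation), and the record layout and the `impute_spouse_age` helper are hypothetical. The point is that only genuinely missing values are filled, while non-existing ones are left alone.

```python
from statistics import mean

# Hypothetical sample rows, invented for illustration.
records = [
    {"id": 1, "married": True, "spouse_age": 40},
    {"id": 2, "married": True, "spouse_age": 50},
    {"id": 3, "married": True, "spouse_age": None},   # missing
    {"id": 4, "married": False, "spouse_age": None},  # non-existing
]


def impute_spouse_age(rows):
    """Fill missing spouse ages with the mean of the known ones.

    Rows for single customers are never imputed, and every row gets a
    flag recording whether its value was imputed.
    """
    known = [r["spouse_age"] for r in rows
             if r["married"] and r["spouse_age"] is not None]
    fill = mean(known) if known else None

    result = []
    for row in rows:
        row = dict(row)  # copy, so the input is not modified
        needs_fill = (row["married"]
                      and row["spouse_age"] is None
                      and fill is not None)
        row["spouse_age_imputed"] = needs_fill
        if needs_fill:
            row["spouse_age"] = fill
        result.append(row)
    return result


imputed = impute_spouse_age(records)
```

Had the single customer's empty value been imputed as well, the model would be trained on a spouse age that does not exist, which is exactly the kind of distortion the functional distinction is meant to prevent.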
Dr. Gerhard Svolba of SAS has written extensively on data preparation as well as data quality for analytics, and this presentation gives more details on the subject.
I wrote an earlier blog post dealing with these challenges in the “Big data” world: http://technofunctionalconsulting.blogspot.in/2012/10/data-munging-in-big-data-world.html
