5 Steps to Effective Data Analytics
The first things people generally think of when someone says data analytics are spreadsheets, graphs, charts, analysis software, and reporting and visualization tools. If that is the case with you too, then your approach to data analytics is not headed in the right direction.
In this short article, I want to explain some key aspects that will help you take the best approach to any data analytics problem. Please note that I do not intend to go into the technical nitty-gritty here, but rather to discuss the general approach that should be kept in mind for performing effective analytics.
Performing effective data analytics is a three-phase process:
1) Understanding phase
2) Planning phase
3) Execution phase
Understanding phase: This is the most vital phase, and it has three steps:
a) Understanding the end goal: The first step in doing good analytics is to have a crystal-clear understanding of the business objective or goal that has to be achieved using the data. For example, is the data to be used to design a metrics dashboard, to generate actionable insights, or to build a model that predicts a business outcome? Not having a proper understanding of this end goal will result in investing a lot of effort and budget in the wrong direction, and in processing a large amount of data while generating little value for the business users.
b) Understanding the scope of data: Another vital step is to outline the scope of the data. Datasets may have hundreds of fields and millions of rows, but not all the fields will be relevant to the business objective. Hence it is important to learn from the business team the right scope of data to be used; otherwise analysts will end up analysing hundreds of attributes that may make little sense to the business users.
c) Understanding the data quality: Once the above two steps are done, the focus should shift to understanding the quality of the data. As we all know, the quality of the data determines the quality of the business decisions taken based on it. This step is vital so that all the data challenges can be identified and addressed. I do not want to go into technicalities, but here is an example for ease of understanding: assume that there is a column for a KPI in which several rows are blank/Null and several others are zero. Should the Nulls be replaced by zeroes? Should the zeroes be replaced by Nulls? Should the column be left as it is? Each of these scenarios would result in a different mean value for the KPI. Hence these kinds of challenges should be identified, and proper fixes should be defined in discussion with the business team.
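To make the Null-versus-zero question concrete, here is a minimal sketch in Python. The KPI values are made up purely for illustration, and None stands in for a blank/Null cell; the point is only that the three treatments give three different means.

```python
from statistics import mean

# Hypothetical KPI column (illustrative values only).
# None represents a blank/Null cell.
kpi = [10.0, 0.0, None, 20.0, None, 0.0]


def mean_ignoring_nulls(values):
    """Average only the cells that actually hold a value."""
    present = [v for v in values if v is not None]
    return mean(present)


# Option 1: leave the column as it is (Nulls are skipped, zeroes count).
as_is = mean_ignoring_nulls(kpi)

# Option 2: replace Nulls with zeroes (every row now counts).
nulls_as_zero = mean_ignoring_nulls([0.0 if v is None else v for v in kpi])

# Option 3: replace zeroes with Nulls (only non-zero rows count).
zeros_as_null = mean_ignoring_nulls([None if v == 0 else v for v in kpi])

print("as is:         ", as_is)          # 7.5
print("Nulls -> zero: ", nulls_as_zero)  # 5.0
print("zeroes -> Null:", zeros_as_null)  # 15.0
```

Which of the three numbers is the right one depends on what a blank and a zero actually mean in the business context, which is why the fix has to come out of a discussion with the business team rather than from the analyst alone.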
Planning phase: This phase covers the key aspects that must be settled before the execution phase starts. It has two steps:
a) Planning for operational steps: In this step it is important to plan all the operational steps, including:
i) identifying all the joining conditions to be applied,
ii) identifying the data fixes and transformations to be performed,
iii) identifying the models to be used and the filtering conditions to be applied,
iv) ensuring that correct data types are used for the relevant fields.
It is also important to identify which steps can be done in parallel and which ones have to be sequential. A proper plan of the steps has to be created and followed.
b) Planning for resources: Plan for the analysts, tools and software to be used, based on the requirements and the budget available. Identifying resources with the right skills is going to be extremely critical. Identify the best set of tools for the budget available (this may be more than one tool, but ensure that the right tools are used for the right purpose to gain maximum mileage).
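Going back to step a) of this phase, the question of which operational steps can run in parallel and which must be sequential can be sketched with a small dependency plan. The step names and their dependencies below are hypothetical, and the sketch assumes Python 3.9 or later, where the standard library provides graphlib.TopologicalSorter.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plan: each step maps to the steps that must finish before it.
plan = {
    "fix_data_types": set(),
    "apply_data_fixes": set(),
    "join_tables": {"fix_data_types", "apply_data_fixes"},
    "apply_filters": {"join_tables"},
    "fit_model": {"apply_filters"},
}


def execution_batches(dependencies):
    """Group steps into batches; steps within a batch do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything that can start now
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock the next one
    return batches


batches = execution_batches(plan)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

In this made-up plan the two steps in the first batch could be given to different analysts at the same time, while the join, the filters and the model each have to wait for the step before them.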
Execution phase: This should be the most lightweight phase, where the primary activity is to execute the steps and activities identified in the previous phases. One more step that is vital in any data analytics work is to verify the counts and numbers in the output. Many times people tend to ignore minor count mismatches when dealing with large data sets. However, this can prove fatal later, as these mismatches may stem not only from calculation errors but also from logical errors.
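As a rough illustration of such a count check, the sketch below joins a few made-up order rows to a made-up customer lookup and then reconciles row counts and totals between input and output. All names and values are hypothetical; in practice the same idea applies to whatever tool performs the join.

```python
# Hypothetical input data (illustrative only).
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 120.0},
    {"order_id": 2, "customer_id": "C2", "amount": 80.0},
    {"order_id": 3, "customer_id": "C9", "amount": 45.0},  # no matching customer
]
customer_region = {"C1": "North", "C2": "South"}

# Inner join: orders without a matching customer silently drop out.
joined = [
    {**order, "region": customer_region[order["customer_id"]]}
    for order in orders
    if order["customer_id"] in customer_region
]

# Keep the dropped rows so that the mismatch can be explained, not ignored.
unmatched = [o for o in orders if o["customer_id"] not in customer_region]


def counts_reconcile(input_rows, output_rows, dropped_rows):
    """Every input row must be either in the output or accounted for as dropped."""
    return len(input_rows) == len(output_rows) + len(dropped_rows)


input_total = sum(o["amount"] for o in orders)
output_total = sum(o["amount"] for o in joined)
missing_amount = input_total - output_total

print("rows in:", len(orders), "rows out:", len(joined), "dropped:", len(unmatched))
print("counts reconcile:", counts_reconcile(orders, joined, unmatched))
print("amount missing from output:", missing_amount)
```

Here a one-row mismatch turns out to be a logical issue (an order whose customer is not in the lookup) rather than an arithmetic slip, which is exactly the kind of case that is easy to wave away on a large data set.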
Remember that the more effort is spent in the previous phases, the less effort is needed in the last phase. Hence, for effective data analytics, it is very important to perform the first five steps (three in the understanding phase and two in the planning phase) effectively and efficiently.