What is data science? Here’s how to understand the subject
Data science converts data assets into data products. Not surprisingly, demand is very high for this hot new profession.
Many articles have argued that data should be treated as an organisational asset, and many others focus on deriving value by analysing structured data and Big Data to gain deep business insights. Companies are now going a step further and turning their data assets into products and services that carry commercial value in their own right.
Data science is what makes this conversion of data assets into data products possible, so it is no surprise that the profession is in great demand. IDC predicts a need for 181,000 such professionals in the US by 2018, and a requirement for five times that number of positions with data management and interpretation capabilities. McKinsey Global Institute estimates the shortage of data scientists in 2018 at 190,000. Glassdoor listed data scientist as the top job of 2016 among its 25 best jobs in America. The average salary of a data scientist is likely to be 50-80% higher than that of a business analyst.
What is data science?
Data science helps uncover hidden patterns in large volumes of structured and unstructured data, patterns that can then be deployed commercially. What differentiates it from standard business intelligence is that the analyst does not know in advance what to look for and instead sets out to discover patterns of commercial value.
Take, for instance, an attempt to classify video clips as political, sports, humour, self-improvement and so on without manually opening each file. To achieve this, the data scientist will need to study numerous, often subjective, characteristics of each clip, such as voice modulation, sentiment, colour, the text obtained through speech-to-text and natural language processing (NLP), or other parameters as yet unknown, in order to detect repeatable patterns that allow the clips to be classified accurately.
As a discipline of study, data science combines the technology of data analysis with visualisation, statistics, mathematics and knowledge of business priorities. Statistics plays a central role in fitting patterns to data sets. With descriptive statistics one can summarise, categorise and describe what the available data shows, while inferential statistics helps in deducing possibilities beyond the available data. Statistical techniques provide quantitative insight; sound business knowledge translates that insight into business outcomes.
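The difference between descriptive and inferential statistics can be shown with a short Python sketch. Everything in it is illustrative: the `orders` figures are made-up daily order values, the function names are hypothetical, and the interval uses a simple normal approximation (for a sample this small, a t-distribution would usually be preferred).

```python
import math
import statistics
from statistics import NormalDist


def describe(sample):
    """Descriptive statistics: summarise only the data in hand."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),
    }


def mean_confidence_interval(sample, confidence=0.95):
    """Inferential statistics: an approximate range for the mean of the
    wider population, using a normal approximation (an assumption)."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical daily order values, invented for illustration.
orders = [120, 135, 128, 150, 142, 138, 131, 147, 125, 144]

summary = describe(orders)
low, high = mean_confidence_interval(orders)

print(summary)
print(f"Approximate 95% interval for the long-run mean: {low:.1f} to {high:.1f}")
```

The summary describes the ten observed days and nothing more; the interval is a statement, with stated uncertainty, about days that were not observed.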
Given that we are now generating more data than ever, the need to identify patterns in such huge volumes of data has never been greater. And with technological advances, doing so is becoming ever more feasible. There is therefore little excuse for organisations to overlook the value that can be gained through data science.
Turning data into a product
Creating a data product involves several steps: defining a problem; postulating the desired outcome; determining the data required for analysis and ensuring its cleanliness, completeness and authenticity; using statistics, visualisation techniques and domain knowledge to analyse the data from several perspectives and uncover patterns or trends; designing experiments, once a pattern is observed, to confirm its accuracy and repeatability in different scenarios; and representing the successful pattern as an algorithm or model that a machine can learn and use for analysis.
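As a rough illustration of these steps, the Python sketch below cleans a small data set, fits a pattern to one part of it, checks that the pattern repeats on data held back from the fit, and wraps the result as a reusable model. The records (advertising spend against sales), the function names and the choice of a straight-line fit are all assumptions made for the example, not a prescribed method.

```python
import statistics


def clean(records):
    """Keep only complete, plausible records (both values present and non-negative)."""
    return [
        (x, y)
        for x, y in records
        if x is not None and y is not None and x >= 0 and y >= 0
    ]


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    """Average absolute gap between predicted and observed values."""
    return statistics.fmean(abs(model(x) - y) for x, y in points)


# Hypothetical (ad_spend, sales) records; None marks a missing value.
records = [
    (1, 11), (2, 19), (3, None), (4, 41), (5, 52), (6, 58),
    (7, 71), (8, 79), (None, 30), (9, 92), (10, 99),
]

cleaned = clean(records)      # ensure completeness of the data
train = cleaned[::2]          # data used to find the pattern
holdout = cleaned[1::2]       # data held back to test repeatability

slope, intercept = fit_line(train)


def model(x):
    """The pattern, packaged so that it can be reused on new inputs."""
    return slope * x + intercept


holdout_error = mean_abs_error(model, holdout)
print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")
print(f"mean absolute error on held-out data: {holdout_error:.2f}")
```

Holding back part of the data is one simple way to run the confirming experiment: a pattern that fits only the records it was derived from is not yet a product.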
Some considerations for creating a data product
Quality of data: To succeed with a data product, it is essential to have quality data, so that the patterns being fitted are not obscured by errant or outlying values. Clean, relevant data shortens the pattern identification cycle and improves the data product's chances of success.
A viable business model: Typically, data products are bundled with other offerings that generate revenue. It is therefore essential to estimate the additional value the data product will bring to those offerings and whether that value justifies the effort spent on creating it.
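The effect of a single errant value can be seen in a few lines of Python. This is only a sketch: the readings are invented, `remove_outliers` is a hypothetical helper, and the 1.5 x IQR rule it applies is one common convention rather than the only way to screen data. An outlier is not always an error, so in practice flagged values deserve a look before being discarded.

```python
import statistics


def remove_outliers(values, k=1.5):
    """Drop values lying more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical readings; 500 stands in for a data-entry error.
raw = [48, 52, 50, 49, 51, 53, 47, 50, 500]
cleaned = remove_outliers(raw)

print("mean with the errant value:   ", statistics.mean(raw))
print("mean without the errant value:", statistics.mean(cleaned))
```

One bad record out of nine is enough to double the average here, which is the kind of distortion that hides a genuine pattern.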
Benefits of data science
Data science has critical uses in many key sectors. In the financial domain, it is being used to uncover fraud and to test the risk models that evaluate credit risk. In retail, targeted offers to prospective buyers are increasing conversion rates. In fact, data science finds applications in every industry that generates data. Traditional businesses, too, are using data science to build data products that will propel their growth.
The writer, Jay Shah, is associate vice-president and ERP head, Nihilent Technologies