I was recently having an informal discussion with a colleague at work about predictive analytics and the kinds of possibilities that lie ahead. It got quite interesting as the discussion moved towards the statistics-as-a-science side of things. He said that it’s not really forecasting when you can never tell something with 100% certainty. I reasoned that if I can be 80% sure of something, then why not use the estimate rather than just guessing at random? I might be wrong 2 times out of 10, but in all probability (pun intended), I will also be right 8 times out of 10. So why not view the glass as half full? The other point is that forecasting results should never be viewed as point estimates by managers, and even less so by data scientists. A confidence interval in statistics (strictly speaking, a prediction interval when we are talking about forecasts) gives a range within which the forecasted value would fall, say, 95% of the time. The actual value might still land outside this range 5 times out of 100. So intuition and other qualitative factors have their place once these estimates are created with statistical backing, and the manager or concerned decision maker can then draw their conclusions from both. The point is that predictive analytics is not a substitute for judgment, but a complementary addition to the decision making process.
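To make the interval idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: the data is simulated, the values are assumed to be independent and roughly normal, and the function names (`prediction_interval`, `coverage`) and the numbers (mean 100, standard deviation 15) are made up for this example. For a forecast, the range described above is usually called a prediction interval, so that is the name used here.

```python
import random
import statistics


def prediction_interval(history, z=1.96):
    """Approximate 95% prediction interval for the next value.

    Assumes the history is i.i.d. and roughly normal, which is a
    simplifying assumption for illustration, not a general forecasting method.
    """
    n = len(history)
    mean = statistics.fmean(history)
    sd = statistics.stdev(history)
    # The extra 1/n term accounts for uncertainty in the estimated mean.
    half_width = z * sd * (1 + 1 / n) ** 0.5
    return mean - half_width, mean + half_width


def coverage(trials=2000, n=200, seed=42):
    """Fraction of simulated trials where the next value lands in the interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Hypothetical data: 200 past observations around 100 with spread 15.
        history = [rng.gauss(100, 15) for _ in range(n)]
        low, high = prediction_interval(history)
        next_value = rng.gauss(100, 15)
        hits += low <= next_value <= high
    return hits / trials


if __name__ == "__main__":
    print(f"Share of new values inside the interval: {coverage():.3f}")
```

Run as a script, this should print a share close to 0.95: the interval catches most new values, and still misses roughly 5 out of 100, which is exactly why the estimate is a range and not a single number.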
Besides, I added, some of the breakthrough innovations making news right now are a result of machine learning, which draws its fundamentals from statistics. Two recent examples from the news: Singapore launches world’s first ‘self-driving’ taxi service (http://bit.ly/2bkStZL), and Analysing cricket through a baseball lens (http://es.pn/2c2R49G).
He did somewhat agree with these points, but then said that if a person tries a statistical model and it turns out wrong, he wouldn’t go back for a fourth time! Perhaps he has had a few bad experiences to start off with. Still, it does point to a key concern that business users and managers have with regard to predictive analytics.
So, what is ‘Data Science’?
I heard this at a conference where a few academics were discussing data science, and it rings bluntly true: anything which has to call itself a science probably isn’t. Data science seems to be more of an applied field, wherein concepts from mathematics and statistics are applied and built upon using software. So maybe ‘Data Engineering’ would be a better name?
Anyway, ‘data scientist’ does describe an individual who possesses skills from several fields, such as statistics, software development, and databases. Hence, it’s primarily a ‘business term’ through which organizations identify such individuals. An academic would probably not be called this, but would instead go by statistician, machine learning or AI researcher, and so on. They use similar concepts as in data science, but focus on applying them to their specific domain.
Regardless, the skills are extremely transferable. We see a large number of new data scientists with academic backgrounds in software, databases and mathematics. Perhaps a better explanation can be given here of how the skills are transferable, but I’m short on time, so maybe later. With AI, the number of higher-skilled jobs is expected to go up, which is contrary to what skeptics say when they see analytics tools that automate huge chunks of the data modelling process. You still need people for building such tools and for R&D, right?
An apt example: we recently had a presentation where one market leader demonstrated their tool. As expected, it was quite impressive, considering the decades of cumulative development which must have gone into it. However, our manager kept asking about many intricate details of the data dashboards and automated charts, and whether they could be achieved. The presenter of course said it’s not only possible but easy to do. What he did not mention was that such customized solutions would need base development, and are not magically available in the tool. These entail much higher costs than what is estimated. It’s like a beautiful farm house which you cannot escape. You have a beautiful garden and toys to play with, but you had better stick with them. The moment you want to personalize it, which you obviously would after using it for a while, that’s when the limitations surface. You will not always be a newbie who is happy with the probabilities given by a simple logistic regression. At some point you will want to play with the model and try new algorithms (or maybe even understand what’s happening inside!).
This is where a data scientist could come into the picture in a business environment. Data scientists are meant to sit alongside business experts, providing customized and in-depth analytical and data modelling solutions, which are like gold nuggets for top-level management decision making. Data scientists will soon be (or are already moving towards) working in tandem with domain experts, instead of sitting with the IT department.
This shows that data scientists cannot just be technically skilled. They also need to understand the business perspective of their applications, and how their data models need to fit into and benefit an entire business process. For example, a recommendation system might add value to one function, but how does the business process benefit from it as a whole? That is the kind of question data scientists will move towards answering.
So basically, ‘Data Science’ might not be the most technically correct term, but it now does strike a chord in terms of the interdisciplinary skills expected by an organization. Besides, it does sound quite sexy.