This is an extract of answers provided on Quora on the subject of Machine Learning.
How do you explain Machine Learning and Data Mining to non-Computer Science people?
Suppose you go shopping for mangoes one day. The vendor has laid out a cart full of mangoes. You can handpick the mangoes, the vendor weighs them, and you pay a fixed rate in rupees per kg (a typical story in India).
Obviously, you want to pick the sweetest, ripest mangoes for yourself (since you are paying by weight and not by quality). How do you choose the mangoes?
You remember your grandmother saying that bright yellow mangoes are sweeter than pale yellow ones. So you make a simple rule: pick only from the bright yellow mangoes. You check the color of the mangoes, pick the bright yellow ones, pay up, and return home. Happy ending?
Life is complicated
Suppose you go home and taste the mangoes. Some of them are not as sweet as you’d like. You are worried. Apparently, your grandmother’s wisdom is insufficient. There is more to mangoes than just color.
After a lot of pondering (and tasting different types of mangoes), you conclude that the bigger, bright yellow mangoes are guaranteed to be sweet, while the smaller, bright yellow mangoes are sweet only half the time (i.e., if you buy 100 bright yellow mangoes, 50 big and 50 small, then all 50 big ones will be sweet, while on average only 25 of the 50 small ones will turn out to be sweet).
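To make the arithmetic in that parenthetical concrete, here is a small Python sketch that counts sweet mangoes per size. The basket is made up to match the proportions in the story; it is not real tasting data.

```python
from collections import Counter

# Invented basket matching the story: 100 bright yellow mangoes,
# 50 big (all sweet) and 50 small (25 sweet, 25 not).
mangoes = (
    [("big", True)] * 50
    + [("small", True)] * 25
    + [("small", False)] * 25
)

def sweet_rate_by_size(basket):
    """Return the fraction of sweet mangoes for each size group."""
    totals = Counter(size for size, _ in basket)
    sweet = Counter(size for size, is_sweet in basket if is_sweet)
    return {size: sweet[size] / totals[size] for size in totals}

rates = sweet_rate_by_size(mangoes)
overall = sum(is_sweet for _, is_sweet in mangoes) / len(mangoes)
print(rates)    # {'big': 1.0, 'small': 0.5}
print(overall)  # 0.75
```

Overall, 75 of the 100 mangoes are sweet, but knowing the size sharpens the guess to 100% (big) or 50% (small).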
You are happy with your findings, and you keep them in mind the next time you go mango shopping. But next time at the market, you see that your favorite vendor has gone out of town. You decide to buy from a different vendor, who supplies mangoes grown in a different part of the country. Now you realize that the rule you had learnt (that big, bright yellow mangoes are the sweetest) no longer applies. You have to learn from scratch. You taste a mango of each kind from this vendor and realize that the small, pale yellow ones are in fact the sweetest of all.
Now, a distant cousin visits you from another city. You decide to treat her to mangoes. But she mentions that she doesn’t care about the sweetness of a mango; she only wants the juiciest ones. Once again, you run your experiments, tasting all kinds of mangoes, and realize that the softer ones are juicier.
Now, you move to a different part of the world. Here, mangoes taste surprisingly different from your home country. You realize that the green mangoes are in fact tastier than the yellow ones.
You marry someone who hates mangoes. She loves apples instead. You go apple shopping. Now, all your accumulated knowledge about mangoes is worthless. You have to learn everything about the correlation between the physical characteristics and the taste of apples, by the same method of experimentation. You do it, because you love her.
Enter computer programs
Now, imagine that all this while, you were writing a computer program to help you choose your mangoes (or apples). You would write rules of the following kind:
if (color is bright yellow and size is big and sold by favorite vendor): mango is sweet.
if (soft): mango is juicy.
You would use these rules to choose the mangoes. You could even send your younger brother with this list of rules to buy the mangoes, and you would be assured that he will pick only the mangoes of your choice.
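Written as an actual program, the two hand-made rules might look like the sketch below. The `Mango` fields, the vendor name and the sample cart are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

FAVORITE_VENDOR = "Ramesh"  # hypothetical name for your usual vendor

@dataclass
class Mango:
    color: str   # e.g. "bright yellow", "pale yellow", "green"
    size: str    # "big" or "small"
    soft: bool
    vendor: str

def is_sweet(m: Mango) -> bool:
    # Hand-written rule 1, taken straight from your tasting experiments.
    return (m.color == "bright yellow"
            and m.size == "big"
            and m.vendor == FAVORITE_VENDOR)

def is_juicy(m: Mango) -> bool:
    # Hand-written rule 2: soft mangoes are juicy.
    return m.soft

cart = [
    Mango("bright yellow", "big", False, FAVORITE_VENDOR),
    Mango("bright yellow", "small", True, FAVORITE_VENDOR),
    Mango("pale yellow", "big", True, "another vendor"),
]
sweet_picks = [m for m in cart if is_sweet(m)]
juicy_picks = [m for m in cart if is_juicy(m)]
print(len(sweet_picks), len(juicy_picks))  # 1 2
```

Every new finding (another vendor, another country, a cousin who wants juicy mangoes, apples instead of mangoes) means editing these functions by hand.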
But every time you make a new observation from your experiments, you have to manually modify the list of rules. You have to understand the intricate details of all the factors affecting the quality of mangoes. If the problem gets complicated enough, it can get really difficult to make accurate rules by hand that cover all possible types of mangoes. Your research could earn you a PhD in Mango Science (if there is one).
But not everyone has that kind of time.
Enter Machine Learning algorithms
ML algorithms are an evolution of ordinary algorithms. They make your programs “smarter” by allowing them to learn automatically from the data you provide.
You take a random sample of mangoes from the market (training data) and make a table of the physical characteristics of each mango, like color, size, shape, the part of the country it was grown in, the vendor who sold it, etc. (features), along with the sweetness, juiciness and ripeness of that mango (output variables). You feed this data to a machine learning algorithm (classification/regression), and it learns a model of the correlation between a mango’s physical characteristics and its quality.
Next time you go to the market, you measure the characteristics of the mangoes on sale (test data) and feed them to the ML algorithm. It uses the model computed earlier to predict which mangoes are sweet, ripe and/or juicy. The algorithm may internally use rules similar to the ones you wrote by hand earlier (e.g., a decision tree), or it may use something more involved, but for the most part you don’t need to worry about that.
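Here is a minimal sketch of “learning the rules from the table” in Python, assuming a tiny invented training set. It swaps in the simplest possible decision tree, a single split (often called a decision stump): for each feature it predicts the majority outcome for each value and keeps the feature that makes the fewest mistakes on the training data. Real decision-tree learners keep splitting and use better split criteria; this is an illustration, not a full implementation.

```python
from collections import Counter, defaultdict

# Invented training data: features of tasted mangoes plus the observed outcome.
training = [
    ({"color": "bright yellow", "size": "big"}, True),
    ({"color": "bright yellow", "size": "big"}, True),
    ({"color": "bright yellow", "size": "small"}, True),
    ({"color": "bright yellow", "size": "small"}, False),
    ({"color": "pale yellow", "size": "big"}, False),
    ({"color": "pale yellow", "size": "small"}, False),
]

def fit_stump(rows):
    """Learn a one-feature rule: for each feature, predict the majority
    label per value, count training mistakes, keep the best feature."""
    best = None
    for feature in rows[0][0]:
        labels_by_value = defaultdict(list)
        for x, y in rows:
            labels_by_value[x[feature]].append(y)
        rule = {v: Counter(ys).most_common(1)[0][0]
                for v, ys in labels_by_value.items()}
        errors = sum(rule[x[feature]] != y for x, y in rows)
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best[0], best[1]

def predict(model, x, default=False):
    """Apply the learned rule; unseen values fall back to `default`."""
    feature, rule = model
    return rule.get(x[feature], default)

model = fit_stump(training)
print(model)  # ('color', {'bright yellow': True, 'pale yellow': False})
print(predict(model, {"color": "bright yellow", "size": "small"}))  # True
```

On this made-up data the stump rediscovers a rule close to your grandmother’s (bright yellow means sweet). Feed it a different vendor’s table and it may pick a different feature, with no hand editing.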
Voilà, you can now shop for mangoes with great confidence, without worrying about the details of how to choose the best ones. What’s more, you can make your algorithm improve over time (online learning), so that its accuracy improves as it reads more training data and it corrects itself when it makes a wrong prediction. But the best part is that you can use the same algorithm to train different models, one each for predicting the quality of apples, oranges, bananas, grapes, cherries and watermelons, and keep all your loved ones happy 🙂
And that is Machine Learning for you.
Analogy 1: Growing up in the World
Since childhood you have been meeting, observing and interacting with people. Their behavior and the impression they leave on you get stored in your brain, which becomes a huge data center. You keep adding more data as you meet new people. Soon you are able to guess what your experience will be with the next person you meet. The person smiles a lot, wears spectacles and has short hair. You become friendly with him because other smiling people who wear specs have been good to you. Then a big 6-foot man with a beard and a broken tooth comes along and, as a kid, you run away. This is all part of Data Mining within your brain.
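The “guess from similar people you have met” idea resembles a k-nearest-neighbour classifier. A minimal sketch, assuming each person is reduced to three invented yes/no traits and that a newcomer is judged by a majority vote among the k most similar people you remember:

```python
from collections import Counter

# Invented memories: (smiles, wears_specs, big_and_bearded) -> was friendly
memories = [
    ((1, 1, 0), True),
    ((1, 0, 0), True),
    ((1, 1, 0), True),
    ((0, 0, 1), False),
    ((0, 1, 1), False),
]

def distance(a, b):
    """Hamming distance: the number of traits that differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_guess(person, memory, k=3):
    """Majority label among the k most similar remembered people."""
    nearest = sorted(memory, key=lambda item: distance(person, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_guess((1, 1, 0), memories))  # True  (smiling, wears specs)
print(knn_guess((0, 0, 1), memories))  # False (big, bearded, not smiling)
```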
As you grow up, you realize that spectacles, beards and size are not the only things that can tell you what people are like. You begin to see their position in society and their behavior in new situations. So the relevant attributes may change. Your algorithms improve by themselves. This is machine learning.
Analogy 2: Belief in Astrology
This is all my supposition, and I’m working to verify my belief. The digits of your birth date add up to what psychics, astrologers and numerologists call a Birth Number. Hundreds of years ago, they would have noticed patterns linking a person’s personality to his birth number. For example, people with birth number X are good at making up weird yet interesting analogies (like this one). People with birth number X are bad at relationships. As they met more people with birth number X who were in poor relationships, it added to the “support” and “confidence” of their data. Then, after meeting a person with birth number X who had been happily married for 20 years to a person with a certain birth number Y, they adapted their rules of prediction. This kept improving until they were able to predict personality types up to 99% of the time.
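“Support” and “confidence” are standard measures for association rules in data mining. For a rule like “birth number X => poor relationship”, support is the fraction of all records in which both parts hold, and confidence is the fraction of birth-number-X records in which the rule holds. The records below are invented purely to show the arithmetic, with 7 standing in for X.

```python
# Invented records: (birth_number, in_poor_relationship)
records = [
    (7, True), (7, True), (7, True), (7, False),
    (3, False), (3, True), (5, False), (5, False),
]

def support_and_confidence(rows, birth_number):
    """Measure the rule 'birth number == birth_number  =>  poor relationship'."""
    n_antecedent = sum(1 for b, _ in rows if b == birth_number)
    n_both = sum(1 for b, poor in rows if b == birth_number and poor)
    # Share of ALL records in which the whole rule holds.
    support = n_both / len(rows)
    # Share of matching-birth-number records in which the rule holds.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

print(support_and_confidence(records, 7))  # (0.375, 0.75)
```

The happily married person with birth number X adds a record where the rule fails, which lowers its confidence.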
So this is again a combination of Data Mining and Machine Learning.
P.S. I still don’t believe in astrology.
Analogy 3: Business Management
<a deeper explanation>
You collect a lot of data from the processes of your large retail store. Whenever someone makes a purchase, the computer at the billing counter adds a record to your database. I’m a regular visitor at your store, and today I bought an expensive pen and a writing pad. Two records are created:
<Aditya bought Item # 12220 (Pen) today as a part of transaction # 222333>
<Aditya bought Item # 12243 (Pad) today as a part of transaction # 222333>
The first step is organizing the data. You check whether there are any records like
<Aditya bought Item # 000000 (nothing) today as a part of transaction # 222333>
You will get rid of them. This is called CLEANING.
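Cleaning can be as simple as filtering out records that carry no real purchase. A minimal sketch, assuming records are stored as tuples and that item # 000000 is the placeholder for “nothing”, as in the example record above:

```python
# Hypothetical billing records: (customer, item_id, item_name, transaction_id)
records = [
    ("Aditya", 12220, "Pen", 222333),
    ("Aditya", 12243, "Pad", 222333),
    ("Aditya", 0, "nothing", 222333),
]

PLACEHOLDER_ITEM = 0  # item # 000000 stands for "nothing was bought"

def clean(rows):
    """Drop records that do not describe a real purchase."""
    return [r for r in rows if r[1] != PLACEHOLDER_ITEM]

cleaned = clean(records)
print(len(cleaned))  # 2
```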
The records in the database will be stored in a Data Warehouse, which is a large database arranged in a way that makes it easier to find useful results.
Now you perform CLUSTERING.
You find the transactions, people, or products that belong together in a group. For example, the people who buy stationery from your store will form one cluster. You can use this to discover interesting patterns, such as: people who buy expensive pads buy cheap pens, and don’t buy anything else that a regular housewife buys.
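The text does not name a clustering algorithm; k-means is one common choice, swapped in here for illustration. The sketch assumes each customer is summarized by two invented numbers (monthly stationery spend and monthly grocery spend) and uses fixed starting centroids so the result is reproducible.

```python
import math

# Invented customers: (monthly stationery spend, monthly grocery spend) in $
customers = [(40, 5), (35, 10), (45, 0), (2, 120), (5, 150), (0, 130)]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its points. Starting centroids are given."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # math.dist (Python 3.8+) is the Euclidean distance.
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
print(clusters)  # [[(40, 5), (35, 10), (45, 0)], [(2, 120), (5, 150), (0, 130)]]
```

The two clusters correspond to “stationery buyers” and “grocery buyers”. In practice you would also scale the columns and try several random starting points.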
Then comes CLASSIFICATION
If a person who is 23 years old, has no job and doesn’t earn much comes to your store looking for a new computer, he’s probably not going to buy it.
A person who bought an expensive $100 pen usually purchases a few novels too. You can use this information to rearrange your store so that the pens section is far from the novels section. The person will then have to walk across several aisles, which may lead them to buy something else on the way.
Last year, when you put up a Christmas sale with a 20% discount on all products for kids, you made a surplus profit of $50,000. Depending on the products and their current quantities and costs, you may find that this year you can earn up to $60,000 of profit even with a 30% discount.
So you are in the market trying to buy a mango. Which one will you buy: the yellow one or the green one? The big one or the small one? You have never eaten a mango, so how do you know which ones are sweet and which ones are sour?
- Have you ever eaten any other fruit? Then you know that good apples/oranges are brightly colored and that stinky apples/oranges are bad. You could conclude that any fruit that is bright, big and not stinky is probably good, and then go buy some bright, big, non-stinky mangoes. (transfer learning)
- What if you’ve never had any fruit at all? All you could do is put similar-looking, similar-smelling, similarly sized mangoes into buckets, without concluding which bucket is sweet or sour. Then you can buy one fruit from each bucket and be assured (hopefully) that not all of them are sour. (unsupervised clustering)
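The “bucket similar mangoes, then buy one from each bucket” idea can be sketched with plain grouping, assuming each mango is described by three invented categorical traits. This only groups exact matches; a clustering algorithm such as k-means would also group mangoes that are merely close to each other.

```python
from collections import defaultdict

# Invented observations: (color, smell, size) for the mangoes on the cart.
cart = [
    ("yellow", "strong", "big"),
    ("green", "faint", "small"),
    ("yellow", "strong", "big"),
    ("green", "faint", "big"),
    ("yellow", "strong", "big"),
]

def bucket(mangoes):
    """Group mango indices by identical traits. No sweet/sour labels used."""
    groups = defaultdict(list)
    for index, traits in enumerate(mangoes):
        groups[traits].append(index)
    return dict(groups)

buckets = bucket(cart)
one_from_each = [indices[0] for indices in buckets.values()]
print(one_from_each)  # [0, 1, 3]
```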
So you bought some mangoes, one of the mangoes you ate was sweet, and you conclude that *this* mango is sweet.
That’s a pretty useless conclusion, because you have already eaten *this* mango. Useful conclusions are ones you can draw about other mangoes: the ones you have not yet paid your hard-earned money for.
So the next time you go mango shopping and want to buy equally (or even more) sweet mangoes, you may do one or more of the following things.
- Look for and buy mangoes that have the same color/smell as the sweet mango you ate earlier (clustering-based classification; this is supervised learning if you have eaten at least 100 mangoes, or semi-supervised learning if you have 100 mangoes with you but have eaten only 1 or 2 so far).
- The more mangoes you eat, the truer your conclusions will be. (reinforcement learning)
- You need to eat not just more mangoes but mangoes with diverse features, and sometimes even try the ones you predict to be sour. For example, Totapuri mangoes are good when they are green, not when they are yellow; and if the yellow, sweet-smelling mangoes you bought from a bad vendor were sour, do you blame the vendor or the color/smell? Otherwise you fall into “once bitten, twice shy”, or even reach a point of no return because of a single bad experience.
- Do not buy mangoes on Friday the 13th, because all the ones your friend bought on that day were sour and he fell sick. (overfitting)
- Wear the same shirt you wore the other day when you went mango shopping and all the mangoes were sweet. (overfitting)
- You conclude that the only good mangoes are those from that specific vendor that weigh exactly the same as your last sweet one and have exactly the same color and smell. You are not completely wrong, but how will you find more such mangoes? (curse of dimensionality)
- Give every mango a score as follows: bad smell (-3 points), good smell (+3 points), good color (+2 points), bad color (-2 points), good shape (+1 point). Compared with the first approach above (matching color/smell), this kind of scoring may be more useful if, say, the mangoes were being auctioned instead of sold at a fixed price and you needed to work out the right price to pay. (regression)
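The point scheme in the last item is a linear score, a weighted sum of features, which has the same shape as a linear regression model (here the weights are hand-picked from the list rather than learned). The feature names and the pricing rule in `bid` are hypothetical, added only to show how a score could become an auction price.

```python
# Points from the list above; the feature names are our own labels.
POINTS = {
    "bad_smell": -3,
    "good_smell": 3,
    "good_color": 2,
    "bad_color": -2,
    "good_shape": 1,
}

def score(features):
    """Weighted sum: add up the points of every feature the mango has."""
    return sum(POINTS[f] for f in features)

def bid(features, base_price=50, rupees_per_point=5):
    """Hypothetical auction rule: turn the score into a price, never below 0."""
    return max(0, base_price + rupees_per_point * score(features))

print(score({"good_smell", "good_color", "good_shape"}))  # 6
print(score({"bad_smell", "good_color"}))                 # -1
print(bid({"good_smell", "good_color", "good_shape"}))    # 80
```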
Ok, this is all just general learning; what is “machine” about it? Let’s say you created a website called mangobook dot com and somehow convinced a billion people in the world to post pictures of every mango they buy and then comment on the photo about the taste after eating it. You now have information about hundreds of billions of mangoes from around the world, more than any single person could eat in a thousand lifetimes. A person reading all these posts could reach conclusions similar to the ones above. For example, you could learn that “only people who live in Bangalore like Totapuri mangoes” and then sell this information to advertisers for profit. Since it is impossible for a single person to read all the posts on your website, you would make computers do this reading and learning. And the best part is that you don’t have to write any new code: you just talk to someone from internet-movies dot com or amazing-online-products dot com, get their code, and run it over your mango data instead of their movie viewership data or product purchase data. That’s all.
But how does this code work?
- OK, so we said we have common code for learning over mango data and movie data, which means we first need to convert both into a single format. Let’s create a spreadsheet with one row for every mango that was eaten or movie that was seen, where the columns can only take “true” or “false” values. For mangoes, these columns could be “mango’s weight was less than 200 grams”, “mango’s weight was more than 500 grams”, “mango’s color was around 570 nanometer wavelength +/- 50”, “mango variety is Alphonso”, “mango was bought on Friday the 13th”, “mango buyer lives in Bangalore”, etc. Note that this is just one way of doing it, and that with this approach you should not create columns like “mango weight was 102.324 grams”; if you do, you will suffer from the curse of dimensionality. Note also that, in spite of all the hype around machine learning, your domain knowledge of mangoes and your creativity become very important in deciding which columns to use. (feature extraction, feature selection)
- There will be one more final column for each row that is named “buyer loved the mango she ate”.
- Now we can simply calculate probabilities and conditional probabilities. If there are 100 rows in your spreadsheet and 30 of them have “true” in the column named “buyer loved the mango she ate”, then 30/100 = 0.3 is the prior probability that a random mango is good. Now look at these 30 rows only and count how many have both “mango’s color was around 570 nanometer wavelength +/- 50” and “mango variety is Alphonso” as true; say there are 20 such rows, so 20 of the 30 loved mangoes (about 67%) were yellow Alphonsos. To turn that into a buying rule, also check how common the combination is overall: if, say, it appears in only 25 of the 100 rows, then 20 of those 25 (80%) were loved, versus 30% of mangoes in general. What did you learn? Prefer Alphonso mangoes that are 570-nanometer-wavelength yellow. You could extend this to all combinations of columns. (Naive Bayes, nearest neighbour, decision trees)
- Remember the regression example earlier in this article, where we assigned points like bad smell (-3 points), good smell (+3 points), good color (+2 points), bad color (-2 points), good shape (+1 point)? How did we get those points? Initially you don’t know, so you give every column the same points, say +1. Then you consider the columns one by one and check whether changing that column’s points by +1 or -1 gives a better separation of the mangoes, that is, all bad mangoes getting negative totals and all good mangoes getting positive totals. You keep changing the points until the separation is good enough or until you are tired. (perceptron, gradient descent, hyperplane, support vector machine, kernel trick, …)
- One last warning: if you have eaten only a few mangoes, and all the sweet ones had a good smell and most (but not all) of them were yellow, you might conclude that “good smell” and “sweet” are equivalent, assigning +100 points to “good smell” and 0 points to every other feature. On the other hand, you might also be tempted to assign at least -1 point to “was mango bought on Friday the 13th”. Neither of these is likely to be good for your analysis. (regularization of feature weights: use as few features as possible (Occam’s razor), but don’t rely heavily on a single feature)
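The spreadsheet steps can be sketched end to end on a tiny invented table; the column names are hypothetical shortenings of the ones above. Counting gives the prior and conditional probabilities. For the point adjustment, this sketch swaps in the classic perceptron rule: whenever a mango lands on the wrong side of zero, it adds 1 to (or subtracts 1 from) the points of every column that is true for that mango, plus a bias term, instead of trying one column at a time.

```python
# Hypothetical true/false columns (1 = true, 0 = false), one row per mango.
COLUMNS = ("good_smell", "yellow_570nm", "alphonso", "friday_13th")
rows = [
    # (column values, buyer loved the mango she ate)
    ((1, 1, 1, 0), True),
    ((1, 1, 0, 0), True),
    ((1, 0, 1, 1), True),
    ((0, 1, 0, 0), False),
    ((0, 0, 1, 0), False),
    ((0, 0, 0, 1), False),
]

def prior(rows):
    """P(loved): fraction of rows where the buyer loved the mango."""
    return sum(loved for _, loved in rows) / len(rows)

def conditional(rows, column, loved=True):
    """P(column is true | loved == loved), by counting rows."""
    j = COLUMNS.index(column)
    subset = [x for x, y in rows if y == loved]
    return sum(x[j] for x in subset) / len(subset)

def train_perceptron(rows, epochs=20):
    """Perceptron rule: for each mango on the wrong side of zero, add its
    column values to the points (loved) or subtract them (not loved)."""
    weights = [0] * len(COLUMNS)
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for x, loved in rows:
            target = 1 if loved else -1
            total = bias + sum(w * xi for w, xi in zip(weights, x))
            if target * total <= 0:  # wrong side, or exactly on the fence
                weights = [w + target * xi for w, xi in zip(weights, x)]
                bias += target
                mistakes += 1
        if mistakes == 0:  # every mango is on the correct side
            break
    return weights, bias

print(prior(rows))                      # 0.5
print(conditional(rows, "good_smell"))  # 1.0
weights, bias = train_perceptron(rows)
print(weights, bias)                    # [3, 0, 1, 0] -2
```

On this made-up table the perceptron leans mostly on `good_smell` (3 points) with a little on `alphonso`, the kind of heavy reliance on one feature that the warning above describes; regularization would penalize such large weights.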
One really last thing: you might say that most of the learning in this article was intuitive and based on gut feeling, which is fine unless you are the CEO of North American Mango Imports Corporation. Then you have a lot at stake, and you had better not make mistakes in identifying good mangoes, lest your company go bankrupt. You need to understand the risk (probability) of going bankrupt when you buy millions of dollars’ worth of mangoes from India that you may not be able to resell in the US. That’s when math comes to the rescue (training loss, generalization loss, Hoeffding’s inequality, etc.). Also, machines don’t have intuition or gut feeling, so they need a structure within which they can develop and measure different hypotheses (gradient descent, for example), and these structures also need to be practical enough to implement, not just pure math (stochastic gradient descent, surrogate loss functions).
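Hoeffding’s inequality is one way to put a number on that risk. For one fixed rule with a 0/1 error on each mango, and N mangoes tasted independently at random, P(|training error − true error| > ε) ≤ 2·exp(−2ε²N). The sketch below evaluates the bound and inverts it to estimate how many mangoes you would need to taste; the ε and δ values are arbitrary examples.

```python
import math

def hoeffding_bound(n_samples, epsilon):
    """Upper bound on P(|training error - true error| > epsilon) for ONE
    fixed rule scored 0/1 on n_samples independently drawn mangoes."""
    return 2 * math.exp(-2 * epsilon ** 2 * n_samples)

def samples_needed(epsilon, delta):
    """Smallest n for which hoeffding_bound(n, epsilon) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

n = samples_needed(epsilon=0.05, delta=0.01)
print(n, hoeffding_bound(n, 0.05))
```

With ε = 0.05 and δ = 0.01 this gives 1,060 mangoes. The bound holds for a single rule chosen before looking at the data; if you try many rules and keep the best one, the right-hand side has to be multiplied by the number of rules (a union bound), which is where the more advanced math comes in.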