The Concepts of Overfitting, Underfitting, Bias & Variance in Machine Learning, in Simple English

Author: Analytics Educator

Aspiring data scientists sometimes find the concepts of overfitting and underfitting a little difficult to grasp. In data analytics, one cannot afford to move ahead without understanding these concepts well, so let's simplify them for all future business analytics professionals.

In our school days, there were mainly three types of students:

Studious: They studied well and grasped the concepts taught in class. On top of that, they had good problem-solving skills and applied what they had learnt in class to solve problems during exams.

Average: They studied hard but could not remember all the concepts well. They found learning difficult and were slow learners. Their strength was perseverance: they never gave up easily, but they had limited ability and weak problem-solving skills.

Not interested: They were never good at studies, nor did they try to learn. Their interests were chatting, movies and video games. They would fail in almost every subject and then approach the teacher for private tuition so that they could at least pass.

The Teacher: The teacher would wait for the "Not interested" students to approach him, so that he could charge them handsomely for private tuition with a minimum guarantee that they would at least pass the exam.

Now they appear for the school exam

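The teacher himself would set the question paper and mark the answer sheets, leaving no stone unturned to make his private-tuition students look good. Hence, the results used to be: the Studious scored high, the Average got a mediocre result, and the Not interested, thanks to the teacher's help, also scored high.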

Now they appear for the board exam

The board exam produces different results.

The Studious student studies as usual, applies her problem-solving skills and scores about the same as she did in the school exam.
The Average student struggles and perhaps gets a marginally improved but still mediocre result. He remains a slow learner, so he achieves the most he can with his limited capabilities.
However, the Not interested student scores significantly lower in the board exam, since he no longer has the teacher's help and can hardly solve problems he has never seen before.

How do we relate this example to machine learning?

Compare the school exam and the board exam with our training and test datasets, respectively: the school exam is like your training data, and the board exam is like your test data.
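The mapping can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: a toy dataset whose correct answer is 1 when x is at least 50, a fixed split that holds back every third question as the "board exam", and three hypothetical toy models (`Memorizer`, `MajorityGuesser`, `ThresholdLearner`) that loosely behave like the Not interested, Average and Studious students. None of these names come from a real library.

```python
# Toy data (assumption): question x in 0..99, correct answer is 1 when x >= 50, else 0.
DATA = [(x, int(x >= 50)) for x in range(100)]

# Every third question is held back for the "board exam" (test set); the rest form
# the "school exam" (training set). A real project would shuffle before splitting.
TEST = [(x, y) for x, y in DATA if x % 3 == 0]
TRAIN = [(x, y) for x, y in DATA if x % 3 != 0]


def accuracy(model, data):
    """Fraction of (x, y) pairs the model answers correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


# Hypothetical toy models for illustration only.
class Memorizer:
    """'Not interested': memorizes the exact training answers and nothing else."""

    def fit(self, train):
        self.answers = dict(train)
        return self

    def __call__(self, x):
        # An unseen question gets a blind guess of 0.
        return self.answers.get(x, 0)


class MajorityGuesser:
    """'Average': too simple; always gives the most common training answer."""

    def fit(self, train):
        ones = sum(y for _, y in train)
        self.label = int(2 * ones > len(train))  # a tie falls back to 0
        return self

    def __call__(self, x):
        return self.label


class ThresholdLearner:
    """'Studious': learns the underlying rule, a single cut-off on x."""

    def fit(self, train):
        highest_zero = max(x for x, y in train if y == 0)
        lowest_one = min(x for x, y in train if y == 1)
        self.cut = (highest_zero + lowest_one) / 2  # midpoint between the classes
        return self

    def __call__(self, x):
        return int(x >= self.cut)


def evaluate(model):
    """Fit on TRAIN, then return (training accuracy, test accuracy)."""
    model.fit(TRAIN)
    return accuracy(model, TRAIN), accuracy(model, TEST)


if __name__ == "__main__":
    for model in (Memorizer(), MajorityGuesser(), ThresholdLearner()):
        train_acc, test_acc = evaluate(model)
        print(f"{type(model).__name__:16} train={train_acc:.2f} test={test_acc:.2f}")
```

Run as a script, it should report train/test accuracies of 1.00/0.50 for `Memorizer`, 0.50/0.50 for `MajorityGuesser` and 1.00/1.00 for `ThresholdLearner`: the memorizer aces the questions it has already seen but only guesses on new ones, the guesser is too simple to do well anywhere, and the threshold learner picks up the actual rule.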

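If your model shows high accuracy on the training data but its accuracy drops significantly on the test data, it is overfitting, i.e., it has high variance: like the Not interested student, it does well only on questions it has already seen. If it shows low accuracy on both the training and the test data, it is underfitting, i.e., it has high bias, like the Average student. If it shows consistently high accuracy on both, like the Studious student, it is the best fit, which every data scientist wants.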
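The rule of thumb above can also be written as a small, hypothetical Python helper. The thresholds `good=0.80` (what counts as "high" accuracy) and `max_gap=0.10` (how large a drop from training to test is tolerated) are illustrative assumptions, not standard values; sensible choices depend on the problem and the metric.

```python
def diagnose_fit(train_acc, test_acc, good=0.80, max_gap=0.10):
    """Classify a model from its training and test accuracy.

    `good` and `max_gap` are illustrative thresholds (assumptions), not standard values.
    """
    gap = train_acc - test_acc
    if train_acc >= good and gap > max_gap:
        return "overfit (high variance)"  # aces the school exam, slips in the board exam
    if train_acc < good and test_acc < good:
        return "underfit (high bias)"     # mediocre in both exams
    if train_acc >= good and test_acc >= good:
        return "good fit"                 # consistently high in both exams
    return "inconclusive"                 # e.g. a small drop that still ends below `good`


if __name__ == "__main__":
    print(diagnose_fit(0.99, 0.62))
    print(diagnose_fit(0.58, 0.55))
    print(diagnose_fit(0.93, 0.91))
```

The sample calls use made-up accuracies, not measured results; they should print "overfit (high variance)", "underfit (high bias)" and "good fit" respectively.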

End note

We hope you now understand the concepts of overfitting and underfitting. If you have any confusion or need more clarification, please feel free to write to us at analyticseducator@gmail.com and we will try to clarify your doubts.