Aspiring data scientists sometimes find it a little difficult to grasp the concepts of overfitting and underfitting. In data analytics, one can’t afford to move ahead without understanding these concepts well. So let’s simplify them for all future business analytics professionals.
In our school days, there were primarily three types of students:
Studious: They studied well and grasped the concepts taught in class. On top of that, they had good problem-solving skills and applied what they had learned in class to solve problems in the exam.
Average: They studied hard but could not remember all the concepts well. They found learning difficult and were slow learners. Their strength was perseverance: they never gave up easily, but their ability and problem-solving skills were weak.
Not interested: They were never good at studies, nor did they try to be. Their interests lay in chatting, movies, and video games. They would fail almost every subject and then approach the teacher for private tuition so that they could at least pass.
The Teacher: The teacher would wait for the "Not interested" students to approach him, so that he could charge them handsomely for private tuition with a minimum guarantee that they would at least pass the exam.
The teacher himself would set and grade the school exam papers, and he would leave no stone unturned to make his private-tuition students look good. Hence, the results used to look like this:
The board exam, however, would produce somewhat different results.
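Now compare the school exam and the board exam with our training and test datasets, respectively: the school exam is like the training data, and the board exam is like the test data. A model is fitted on the training data, much as the tuition students were prepared for the teacher's own paper, while the test data, like the board exam, contains questions the model has never seen.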
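In code, the "board exam" is simply data held back while the model is trained. Below is a minimal sketch of a hold-out split in plain Python; the helper name `train_test_split_simple`, the 80/20 ratio, and the fixed seed are illustrative assumptions, not something this article prescribes.

```python
import random

def train_test_split_simple(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split off a held-out test set.

    Hypothetical helper for illustration; libraries such as scikit-learn
    provide a fuller train_test_split.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]    # the "board exam": never used for fitting
    train = shuffled[n_test:]   # the "school exam": used to fit the model
    return train, test

data = list(range(100))
train, test = train_test_split_simple(data)
print(len(train), len(test))  # 80 20
```

The key design point is that the two sets never overlap, so the test score measures performance on genuinely unseen data.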
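If a model shows high accuracy on the training data but its accuracy drops significantly on the test data, the model is overfit (high variance): it has learned the particulars of the training data rather than the general pattern. If accuracy is low on both the training and test data, the model is underfit (high bias): it is too simple to capture the pattern even in the data it was trained on. If accuracy is consistently high on both, the model is a good fit, which is what data scientists aim for.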
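The three rules above can be written as a small decision function. This is a sketch under stated assumptions: the function name `diagnose_fit` is hypothetical, and the cut-offs for "high" accuracy (0.80) and for a "significant" drop (0.10) are illustrative values, since sensible thresholds depend on the problem and the metric.

```python
def diagnose_fit(train_acc, test_acc, good_acc=0.80, max_gap=0.10):
    """Classify a model from its training and test accuracy.

    good_acc and max_gap are assumed thresholds for illustration only.
    """
    gap = train_acc - test_acc
    # High training accuracy that falls sharply on test data: overfitting.
    if train_acc >= good_acc and gap > max_gap:
        return "overfit (high variance)"
    # Low accuracy on both sets: underfitting.
    if train_acc < good_acc and test_acc < good_acc:
        return "underfit (high bias)"
    # Consistently high accuracy on both sets: good fit.
    if train_acc >= good_acc and test_acc >= good_acc:
        return "good fit"
    # Anything else does not match the three rules cleanly.
    return "inconclusive"

print(diagnose_fit(0.98, 0.70))  # overfit (high variance)
print(diagnose_fit(0.60, 0.58))  # underfit (high bias)
print(diagnose_fit(0.92, 0.89))  # good fit
```

The gap is checked first, so a model that scores well on both sets but loses much more than `max_gap` on the test data is still flagged as overfit.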
We hope this has made overfitting and underfitting clearer. If anything is still confusing or you need further clarification, please feel free to write to us at analyticseducator@gmail.com, and we will try to clear up your doubts.