What Is Underfitting

Learn about underfitting in data modeling, a common problem where a model is too simple to capture the underlying patterns in the data, leading to poor performance.


Definition of Underfitting

Underfitting occurs when a statistical or machine learning model is too simple to capture the underlying trend or patterns in the training data. Because the model fails to learn even the training data, it performs poorly on both the training set and new, unseen data. Underfitting is characterized by high bias and low variance: the model makes strong, overly simple assumptions and its predictions change little across different training samples.

Causes and Characteristics

Common causes of underfitting include using a model that is not complex enough for the data (e.g., fitting a linear model to non-linear data), insufficient training duration, or insufficient features (input variables) for the model to learn from. An underfit model typically shows low accuracy or high error rates on the training set, indicating it hasn't learned the basic relationships.

Example of Underfitting

Imagine trying to model the relationship between a person's age and their income using a simple straight line (linear regression) when the actual relationship is parabolic (income increases, then plateaus, then slightly decreases after a certain age). The straight line would be too simple, failing to capture the initial rise and later plateau/decline, thus underfitting the data.
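The age/income example above can be sketched numerically. The snippet below uses hypothetical synthetic data with a parabolic relationship and fits a straight line to it with NumPy; the high training error illustrates underfitting (the exact coefficients and noise level are illustrative assumptions, not real income data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: income rises with age, then plateaus and
# declines -- a parabolic relationship, as described in the example above.
age = rng.uniform(20, 70, 200)
income = -0.05 * (age - 55) ** 2 + 80 + rng.normal(0, 2, 200)

# Fit a straight line (degree-1 polynomial) -- too simple for this curve.
linear_coeffs = np.polyfit(age, income, deg=1)
linear_pred = np.polyval(linear_coeffs, age)

# The training error stays well above the noise level because the line
# cannot bend to follow the rise and later decline in the data.
linear_mse = np.mean((income - linear_pred) ** 2)
print(f"Linear model training MSE: {linear_mse:.2f}")
```

Note that the error is measured on the training data itself: a large training error despite fitting on that very data is the hallmark of underfitting.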

Addressing Underfitting

To combat underfitting, one can increase model complexity (e.g., using polynomial regression instead of linear, or a more complex neural network architecture). Adding more relevant features, reducing regularization, or training the model for a longer period (with caution against overfitting) are also effective strategies.
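One of these remedies, increasing model complexity, can be sketched by comparing polynomial fits of different degrees on hypothetical parabolic data (the data-generating function is an illustrative assumption): the degree-2 model matches the true shape, so its training error drops to near the noise floor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parabolic data, as in the age/income example.
x = rng.uniform(20, 70, 200)
y = -0.05 * (x - 55) ** 2 + 80 + rng.normal(0, 2, 200)

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, deg=degree)
    return np.mean((y - np.polyval(coeffs, x)) ** 2)

print(f"degree 1 MSE: {train_mse(1):.2f}")  # underfit: high error
print(f"degree 2 MSE: {train_mse(2):.2f}")  # matches the curve: low error
```

The caution in the text still applies in the other direction: pushing the degree much higher would eventually start fitting the noise, i.e. overfitting.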

Frequently Asked Questions

How is underfitting different from overfitting?
What is 'high bias' in the context of underfitting?
Can a model suffer from both underfitting and overfitting?
Why is underfitting considered a problem?