K-Nearest Neighbors (KNN) is a simple yet effective machine-learning algorithm that classifies a data point by the majority label among its k nearest neighbors in feature space. The choice of k directly controls how the model balances bias and variance: a small k follows the training data very closely, while a large k averages over a wide neighborhood.
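
To make the voting rule concrete, here is a minimal from-scratch sketch (illustrative only; it is not the Weka setup used below, and the class and method names are my own):

```java
import java.util.*;

// Minimal majority-vote KNN over double[] feature vectors (illustrative sketch).
class SimpleKnn {
    static String classify(double[][] train, String[] labels, double[] query, int k) {
        // Sort training indices by Euclidean distance to the query point.
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> dist(train[i], query)));

        // Count the labels of the k nearest neighbors and return the most frequent one.
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < k; i++)
            votes.merge(labels[idx[i]], 1, Integer::sum);
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }
}
```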

Dataset Used: Iris Dataset (contains 150 samples across 3 flower species — Setosa, Versicolor, and Virginica)
Features: Sepal length, Sepal width, Petal length, Petal width
Distance Metric: Euclidean Distance
Validation Technique: 10-Fold Cross-Validation
Software: Weka
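
For anyone who prefers code to the Weka GUI, the same experiment can be sketched against Weka's Java API. This assumes the Iris data is available locally as iris.arff; IBk is Weka's KNN classifier and uses Euclidean distance by default:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class KnnIrisCV {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff"); // file path is an assumption
        data.setClassIndex(data.numAttributes() - 1);  // species is the last attribute

        for (int k : new int[] {1, 15, 30, 50}) {
            IBk knn = new IBk(k);                      // KNN with k neighbors, Euclidean by default
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(knn, data, 10, new Random(1)); // 10-fold cross-validation
            System.out.printf("k=%d  acc=%.2f%%  kappa=%.2f  MAE=%.4f  RMSE=%.4f%n",
                    k, eval.pctCorrect(), eval.kappa(),
                    eval.meanAbsoluteError(), eval.rootMeanSquaredError());
        }
    }
}
```

Fixing the Random seed keeps the fold assignment reproducible between runs.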

k     Accuracy (%)   Kappa   MAE      RMSE
1     95.33          0.93    0.0399   0.1747
15    96.67          0.95    0.0510   0.1371
30    95.33          0.93    0.0842   0.1737
50    91.33          0.87    0.1640   0.2330

Accuracy: Percentage of correctly classified samples.

Cohen’s Kappa: Measures agreement between predicted and actual labels, adjusting for chance.

MAE (Mean Absolute Error): Average absolute difference between the predicted class probabilities and the true (0/1) class membership.

RMSE (Root Mean Squared Error): Square root of the mean of those squared differences; it penalizes large errors more heavily than MAE.
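
For reference, the standard formulas behind these metrics, in my own notation (p_o and p_e are the observed and chance agreement; y_ij is 1 if instance i belongs to class j and 0 otherwise; p̂_ij is the predicted probability; to my understanding, Weka averages the probability errors over all N instances and C classes):

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
\mathrm{MAE} = \frac{1}{NC}\sum_{i=1}^{N}\sum_{j=1}^{C}\left|\hat{p}_{ij} - y_{ij}\right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{NC}\sum_{i=1}^{N}\sum_{j=1}^{C}\left(\hat{p}_{ij} - y_{ij}\right)^{2}}
```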

Visualization: accuracy, Kappa, MAE, and RMSE plotted against k (chart not reproduced here).

Interpretation:

  • Accuracy and Kappa peak at k = 15 and decline beyond it.
  • RMSE is also lowest at k = 15; MAE is lowest at k = 1 and grows with k, but taken together the metrics point to k = 15 as the most balanced choice.

My Conclusion

When k is too small, the algorithm reacts to noise, causing overfitting.
When k is too large, it smooths over important local patterns, leading to underfitting.
An intermediate k = 15 offers the best trade-off between these two extremes.
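
Rather than trying k values by hand, the search could also be automated with Weka's CVParameterSelection meta-classifier, which sweeps an option over a range using cross-validation. A sketch, assuming the same iris.arff file and my reading of the "name lower upper steps" parameter format:

```java
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.CVParameterSelection;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class PickK {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        CVParameterSelection search = new CVParameterSelection();
        search.setClassifier(new IBk());
        search.setNumFolds(10);              // 10-fold CV for each candidate k
        search.addCVParameter("K 1 50 50");  // sweep -K from 1 to 50 in 50 steps
        search.buildClassifier(data);

        // Options of the best IBk found, e.g. a -K value near 15 on this data.
        System.out.println(Utils.joinOptions(search.getBestClassifierOptions()));
    }
}
```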
