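Evaluating machine learning models means measuring their performance: testing them on data the model has not seen (test data), using appropriate metrics (accuracy, precision, recall, and so on). Proper evaluation is essential for knowing whether a model is actually effective and reliable.
Evaluating on unseen data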
→ evaluate on a TEST set the model did NOT train on → measures GENERALIZATION (real performance)
→ training accuracy alone is misleading (a model can memorize training data)
→ train/validation/test split; cross-validation → reliable performance estimates
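
A minimal sketch of the split and cross-validation ideas above, using only the Python standard library. The helper names (`train_test_split`, `k_fold_indices`), the 20% test fraction, and the toy `rows` list are assumptions chosen for illustration; they are not taken from any particular library.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows, then hold out the last test_fraction as the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; every sample lands in exactly one validation fold."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Toy data (hypothetical): ten sample IDs standing in for real rows.
rows = list(range(10))
train_rows, test_rows = train_test_split(rows, test_fraction=0.2)

# Cross-validate on the training portion only; the test set stays untouched.
folds = list(k_fold_indices(len(train_rows), k=4))
```

In use, a model would be fitted on each `train_idx` and scored on the matching `val_idx`, with the fold scores averaged. The held-out `test_rows` are scored once, at the very end, so the final number reflects data that played no part in training or tuning.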
Common metrics
CLASSIFICATION:
ACCURACY → % correct (but misleading for IMBALANCED data, e.g. if 99% of cases are 'not fraud', always predicting 'not fraud' scores 99% yet catches no fraud)
PRECISION → of predicted positives, how many are actually positive (avoid false positives)
RECALL → of actual positives, how many were found (avoid false negatives/missing cases)
F1 → harmonic mean of precision and recall (high only when both are high)
CONFUSION MATRIX → true/false positives/negatives breakdown
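
The classification metrics above all derive from the four confusion-matrix counts. The sketch below computes them in plain Python. The function names, the convention of returning 0.0 when a denominator is zero, and the toy fraud labels (1 fraud case in 100) are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy imbalanced data (hypothetical): 1 fraud case among 100 transactions,
# scored against a "model" that always predicts 'not fraud' (0).
y_true = [1] + [0] * 99
y_pred = [0] * 100
always_negative = classification_metrics(y_true, y_pred)
```

Here `always_negative["accuracy"]` comes out at 0.99 while recall and F1 are both 0.0, which is the imbalance trap in concrete form: the model looks accurate and finds none of the cases that matter.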
REGRESSION:
MAE, MSE/RMSE → average prediction error (how far off predictions are); MSE/RMSE square the errors, so large misses weigh more
→ choose metrics that fit the problem (accuracy isn't always right)
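
For the regression metrics listed above, a small sketch of how MAE, MSE and RMSE are computed from the same list of errors. The function name `regression_errors` and the four sample values are assumptions for illustration only.

```python
import math


def regression_errors(y_true, y_pred):
    """Return MAE, MSE and RMSE for paired true and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [pred - actual for actual, pred in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}


# Toy values (hypothetical): two exact predictions, one off by 1, one off by 2.
y_true_reg = [3.0, 5.0, 2.0, 7.0]
y_pred_reg = [2.0, 5.0, 4.0, 7.0]
reg = regression_errors(y_true_reg, y_pred_reg)
```

With these values MAE is 0.75 and MSE is 1.25, so RMSE (about 1.12) sits above MAE. The gap comes from squaring: the single error of 2 contributes four times as much to MSE as the error of 1, which is why RMSE is the usual pick when large misses are especially costly.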
