Performance on new data will almost always be somewhat worse than on
training data, regardless of whether your model is overfitted or not.
The ML 101 method of identifying overfitting is to plot your metric
value by epoch (or by # of trees in your case) on an evaluation or
validation dataset and look at the average slope of the curve towards
the end. Assuming a metric where higher is better (e.g. accuracy): if
the curve is still going up, you're underfitting; if it's going down,
you're overfitting; and if it's flat, you've found the sweet spot.
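A rough sketch of that plot-the-curve idea, assuming a gradient-boosted
ensemble so there is a "# of trees" axis; the synthetic dataset and the
50-point tail window are just placeholders for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Placeholder data; substitute your own train/validation split.
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# staged_predict yields predictions after each added tree, so we can
# trace the validation metric as the ensemble grows.
val_acc = [accuracy_score(y_val, pred)
           for pred in model.staged_predict(X_val)]

# Average slope near the end of the curve: positive -> still improving
# (underfitting), negative -> getting worse (overfitting), ~0 -> flat.
tail = np.array(val_acc[-50:])
slope = np.polyfit(np.arange(len(tail)), tail, 1)[0]
print(f"best # of trees ~ {int(np.argmax(val_acc)) + 1}, "
      f"tail slope = {slope:.2e}")
```

You'd normally feed val_acc to matplotlib and eyeball the curve rather
than fit a slope, but the number makes the "average slope towards the
end" wording concrete.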
As for feature importance, many decision tree libraries can provide
this info. I know xgboost does. I've never worked with sklearn, but a
cursory search turned up this page:
https://stackoverflow.com/questions/49170296/scikit-learn-feature-importance-calculation-in-decision-trees
So it appears to be available in sklearn as well.
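As that page discusses, fitted sklearn trees expose a
feature_importances_ attribute (impurity-based scores, one per
feature, summing to 1). A minimal sketch on a built-in dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Built-in dataset just for illustration; use your own features.
data = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# One impurity-based importance score per feature; higher means the
# feature contributed more to reducing impurity across the tree's splits.
for name, imp in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

Note these are impurity-based importances computed on the training
data, which is a different (and cheaper) notion than SHAP values.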
HTH
--Tony
On 1/2/2022 3:45 PM, pranav@xxxxxxxxxxxxxxxxx wrote:
Hi all,
I have built a decision tree in python using the sklearn library. How do I
evaluate if it is overfitting or underfitting?
I also need to do some feature engineering in terms of determining
which features contribute most to my model. I am thinking about using
SHAP values, but when I print them I get numbers that I am not sure
how to interpret.
Pranav
** To leave the list, click on the immediately-following link:-
** [mailto:program-l-request@xxxxxxxxxxxxx?subject=unsubscribe]
** If this link doesn't work then send a message to:
** program-l-request@xxxxxxxxxxxxx
** and in the Subject line type
** unsubscribe
** For other list commands such as vacation mode, click on the
** immediately-following link:-
** [mailto:program-l-request@xxxxxxxxxxxxx?subject=faq]
** or send a message, to
** program-l-request@xxxxxxxxxxxxx with the Subject:- faq