[program-l] Re: Python: How do I evaluate if my decision tree is overfitting

  • From: Tony Malykh <anton.malykh@xxxxxxxxx>
  • To: program-l@xxxxxxxxxxxxx, pranav@xxxxxxxxxxxxxxxxx
  • Date: Mon, 3 Jan 2022 16:01:07 -0800

Performance on new data will almost always be somewhat worse than on training data, regardless of whether your model is overfitted or not.
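The ML 101 way to identify overfitting is to plot your metric on an evaluation (validation) dataset against the number of epochs (or the number of trees for a boosted ensemble; for a single decision tree like yours, the tree depth) and look at the average slope of the curve towards the end. Assuming a higher-is-better metric such as accuracy: if the curve is still going up, you're underfitting; if it's going down, you're overfitting; and if it's flat, you've found the sweet spot. For a lower-is-better metric such as loss, the directions flip.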
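For a single sklearn tree, one way to get that curve is sklearn's validation_curve, sweeping max_depth. This is a minimal sketch, assuming scikit-learn is installed; the bundled breast-cancer dataset stands in for your own data, and the depth range 1-15 and 5-fold CV are arbitrary choices. Accuracy is higher-is-better, so a validation score that falls at larger depths (while the training score keeps climbing) points to overfitting.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): replace with your own X, y.
X, y = load_breast_cancer(return_X_y=True)
depths = list(range(1, 16))  # arbitrary sweep of tree depths

# Cross-validated training and validation accuracy for each max_depth.
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
    scoring="accuracy",
)

# Average over the CV folds: one number per depth.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

for d, tr, va in zip(depths, train_mean, val_mean):
    # A widening train/validation gap is another sign of overfitting.
    print(f"max_depth={d:2d}  train={tr:.3f}  val={va:.3f}  gap={tr - va:.3f}")

best_depth = depths[int(np.argmax(val_mean))]
print("best max_depth by validation accuracy:", best_depth)
```

Plotting val_mean against depths (e.g. with matplotlib) gives the curve described above; the "sweet spot" is roughly where it flattens out.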
As for feature importance, many decision tree libraries can provide this information; I know xgboost does. I've never worked with sklearn, but a cursory search turned up this page:
https://stackoverflow.com/questions/49170296/scikit-learn-feature-importance-calculation-in-decision-trees
So it appears to be available in sklearn as well.
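A minimal sketch of what that looks like in sklearn, assuming scikit-learn is installed and again using the bundled breast-cancer dataset as a stand-in; max_depth=4 and the 25% hold-out split are arbitrary. A fitted tree exposes impurity-based importances via feature_importances_ (computed on the training data, summing to 1). sklearn.inspection.permutation_importance is a separate technique that measures how much held-out accuracy drops when one feature is shuffled, which is often a useful cross-check.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): replace with your own features and target.
data = load_breast_cancer()
X_train, X_val, y_train, y_val = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0, stratify=data.target
)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# Impurity-based importances, derived from the training data; they sum to 1.
impurity = tree.feature_importances_

# Permutation importances on held-out data: mean drop in accuracy
# when each feature's column is randomly shuffled.
perm = permutation_importance(tree, X_val, y_val, n_repeats=10, random_state=0)

# Show the five features with the highest impurity-based importance.
order = np.argsort(impurity)[::-1][:5]
for i in order:
    print(
        f"{data.feature_names[i]:<25} "
        f"impurity={impurity[i]:.3f}  permutation={perm.importances_mean[i]:.3f}"
    )
```

If the two rankings disagree badly, the permutation numbers on held-out data are usually the safer guide, since impurity-based scores only reflect how the tree used each feature during training.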

HTH

--Tony


On 1/2/2022 3:45 PM, pranav@xxxxxxxxxxxxxxxxx wrote:

Hi all,

I have built a decision tree in Python using the sklearn library. How do I
evaluate whether it is overfitting or underfitting?

I also need to do some feature engineering in terms of determining which
features are contributing most to my model. I am thinking about using
SHAP values, but when I print them I get numbers that I am not sure how to
interpret.

Pranav

** To leave the list, click on the immediately-following link:-
** [mailto:program-l-request@xxxxxxxxxxxxx?subject=unsubscribe]
** If this link doesn't work then send a message to:
** program-l-request@xxxxxxxxxxxxx
** and in the Subject line type
** unsubscribe
** For other list commands such as vacation mode, click on the
** immediately-following link:-
** [mailto:program-l-request@xxxxxxxxxxxxx?subject=faq]
** or send a message, to
** program-l-request@xxxxxxxxxxxxx with the Subject:- faq