[program-l] Re: Python: How do I evaluate if my decision tree is overfitting

  • From: Tony Malykh <anton.malykh@xxxxxxxxx>
  • To: program-l@xxxxxxxxxxxxx, pranav@xxxxxxxxxxxxxxxxx
  • Date: Tue, 4 Jan 2022 14:05:52 -0800

You can use my AudioChart NVDA add-on. If you use Jupyter, I've also seen a similar sonification toolkit for it.

Alternatively, just compute the average of your metric over the last n epochs and compare it to the average over the previous n epochs.
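Here is a minimal sketch of that averaging idea in Python. It assumes you already have one validation score per epoch (or per added tree) in a list. The function names `trend` and `describe`, the tolerance `tol`, and the sample scores are hypothetical. It prints one word instead of drawing a chart, so a screen reader can read the result directly.

```python
from statistics import mean


def trend(scores, n):
    """Return mean(last n scores) minus mean(the n scores just before them)."""
    if n <= 0 or len(scores) < 2 * n:
        raise ValueError("need n > 0 and at least 2 * n scores")
    recent = scores[-n:]
    previous = scores[-2 * n:-n]
    return mean(recent) - mean(previous)


def describe(scores, n, tol=1e-3):
    """Turn the trend into a single word; tol is a hypothetical noise threshold."""
    delta = trend(scores, n)
    if delta > tol:
        return "rising"
    if delta < -tol:
        return "falling"
    return "flat"


# Hypothetical validation accuracies, one per epoch (or per added tree).
val_accuracy = [0.70, 0.75, 0.80, 0.82, 0.83, 0.83, 0.82, 0.81]
print(describe(val_accuracy, n=2))  # prints "falling"
```

With a higher-is-better metric such as accuracy, "falling" hints at overfitting and "rising" means the model is still improving. For a loss, read it the other way round. How large `tol` should be depends on how noisy your scores are.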


On 1/4/2022 2:52 AM, pranav@xxxxxxxxxxxxxxxxx wrote:

Hi Tony,
<snip> ML 101 method of identifying overfitting is plotting your metric value by
epoch (or by # of trees in your case) on evaluation or validation dataset and
looking at average slope of the curve towards the end. If the curve is going up
then you're underfitting; if it's going down then you're overfitting. And if it's
flat,

PL] How do I do this as a blind programmer? Should I use audio graphs, or are
there some statistics I can use?

Pranav
-----Original Message-----
From: program-l-bounce@xxxxxxxxxxxxx <program-l-bounce@xxxxxxxxxxxxx> On Behalf Of Tony Malykh
Sent: Tuesday, January 4, 2022 5:31 AM
To: program-l@xxxxxxxxxxxxx; pranav@xxxxxxxxxxxxxxxxx
Subject: [program-l] Re: Python: How do I evaluate if my decision tree is overfitting

Performance on new data is almost always going to be worse than on training
data, regardless of whether your model is overfitted or not.
The ML 101 method of identifying overfitting is to plot your metric value by
epoch (or by # of trees in your case) on an evaluation or validation dataset and
look at the average slope of the curve towards the end. If the curve is going up,
you're underfitting; if it's going down, you're overfitting; and if it's flat,
you've found the sweet spot. (This assumes a higher-is-better metric such as
accuracy; for a loss such as log loss, the directions are reversed.)
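A single sklearn `DecisionTreeClassifier` has no epochs and only one tree, so one way to get the same kind of curve is to vary `max_depth` instead. The sketch below works under that assumption: it uses scikit-learn's `validation_curve` with 5-fold cross-validation, and the bundled breast-cancer dataset stands in for your own `X` and `y`. It prints the results as text rather than plotting them.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Placeholder data; substitute your own feature matrix and labels.
X, y = load_breast_cancer(return_X_y=True)
depths = list(range(1, 11))

# Each returned array has shape (len(depths), number of CV folds).
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
    scoring="accuracy",
)

mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)

# One line per depth, so a screen reader can step through the "curve".
for depth, tr, va in zip(depths, mean_train, mean_val):
    print(f"max_depth={depth:2d}  train={tr:.3f}  validation={va:.3f}  gap={tr - va:.3f}")

best_depth = depths[int(mean_val.argmax())]
print(f"Best mean validation accuracy at max_depth={best_depth}")
```

If training accuracy keeps climbing while validation accuracy levels off or drops, and the `gap` column grows, the deeper trees are probably overfitting. The depth with the best validation score is a reasonable starting point for `max_depth`.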
As for feature importance, many decision tree libraries can provide this
info; I know xgboost does. I've never worked with sklearn, but a cursory search
turned up this page:
https://stackoverflow.com/questions/49170296/scikit-learn-feature-importance-calculation-in-decision-trees
So it appears to be available in sklearn as well.
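In scikit-learn, a fitted `DecisionTreeClassifier` exposes impurity-based importances through its `feature_importances_` attribute. Here is a minimal sketch. It again uses the bundled breast-cancer dataset as a placeholder, and `max_depth=4` is an arbitrary choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# Placeholder data; substitute your own features, labels and feature names.
data = load_breast_cancer()
model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(data.data, data.target)

# Pair each feature name with its importance and sort from most to least important.
ranked = sorted(
    zip(data.feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

# Print the top five as plain text.
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```

The scores are non-negative and sum to 1, so each one reads as that feature's share of the tree's total impurity reduction. scikit-learn's documentation notes that impurity-based importances can be biased toward high-cardinality features. It suggests `sklearn.inspection.permutation_importance` as an alternative.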

HTH

--Tony


On 1/2/2022 3:45 PM, pranav@xxxxxxxxxxxxxxxxx wrote:
Hi all,

I have built a decision tree in Python using the sklearn library. How
do I evaluate whether it is overfitting or underfitting?

I also need to do some feature engineering in terms of determining
which features are contributing most to my model. I am thinking about
using SHAP values, but when I print them, I get numbers that I am
not sure how to interpret.

Pranav

** To leave the list, click on the immediately-following link:-
** [mailto:program-l-request@xxxxxxxxxxxxx?subject=unsubscribe]
** If this link doesn't work then send a message to:
** program-l-request@xxxxxxxxxxxxx
** and in the Subject line type
** unsubscribe
** For other list commands such as vacation mode, click on the
** immediately-following link:-
** [mailto:program-l-request@xxxxxxxxxxxxx?subject=faq]
** or send a message, to
** program-l-request@xxxxxxxxxxxxx with the Subject:- faq