[program-l] Re: Python: How do I evaluate if my decision tree is overfitting

From: Tony Malykh <anton.malykh@xxxxxxxxx>
To: program-l@xxxxxxxxxxxxx, pranav@xxxxxxxxxxxxxxxxx
Date: Tue, 4 Jan 2022 14:05:52 -0800

You can use my AudioChart NVDA add-on. I also saw a similar sonification toolkit for Jupyter, if you use Jupyter.

Alternatively, just take the average of your metric over the last n epochs and compare it to the average over the previous n epochs.
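
A minimal sketch of that comparison, using only the standard library. The function name compare_recent_windows and the accuracy numbers are made up for illustration; they are assumptions, not something from this thread. It assumes a metric where higher is better (such as accuracy), so a negative result means the validation score has been dropping lately.

```python
from statistics import mean


def compare_recent_windows(scores, n):
    """Return mean(last n scores) minus mean(the n scores before those)."""
    if n <= 0 or len(scores) < 2 * n:
        raise ValueError("need n > 0 and at least 2 * n scores")
    recent = mean(scores[-n:])
    previous = mean(scores[-2 * n:-n])
    return recent - previous


# Hypothetical validation accuracy per epoch (invented numbers).
val_accuracy = [0.61, 0.70, 0.76, 0.80, 0.82, 0.83, 0.83, 0.82, 0.80, 0.79]

delta = compare_recent_windows(val_accuracy, n=3)
# Positive: still improving. Near zero: flat. Negative: getting worse.
print(f"change between the last two windows: {delta:+.3f}")
```

The sign of the printed number carries the same information as the slope at the end of the plot, so nothing needs to be looked at or listened to.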

On 1/4/2022 2:52 AM, pranav@xxxxxxxxxxxxxxxxx wrote:

Hi Tony,
<snip>
ML 101 method of identifying overfitting is plotting your metric value by
epoch (or by # of trees in your case) on evaluation or validation dataset and
looking at average slope of the curve towards the end. If the curve is going up
then you're underfitting; if it's going down then you're overfitting. And if it's
flat, then you found sweet spot.
[PL] How do I do this as a blind programmer? Should I use audio graphs or are
there some statistics I can use?

Pranav
-----Original Message-----
From: program-l-bounce@xxxxxxxxxxxxx <program-l-bounce@xxxxxxxxxxxxx> On Behalf
Of Tony Malykh
Sent: Tuesday, January 4, 2022 5:31 AM
To: program-l@xxxxxxxxxxxxx; pranav@xxxxxxxxxxxxxxxxx
Subject: [program-l] Re: Python: How do I evaluate if my decision tree is
overfitting

Performance on new data is almost always going to be worse than on training
data, regardless of whether your model is overfitted or not.
The ML 101 method of identifying overfitting is plotting your metric value by
epoch (or by # of trees in your case) on an evaluation or validation dataset and
looking at the average slope of the curve towards the end. For a metric where
higher is better, such as accuracy: if the curve is still going up then you're
underfitting; if it's going down then you're overfitting. And if it's flat,
then you've found the sweet spot.
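
A rough sketch of how such a curve could be produced for a single sklearn decision tree. A lone tree has no epochs or tree count, so max_depth stands in as the complexity axis here; that substitution, the breast cancer demo dataset, and the 70/30 split are all assumptions for illustration. The scores are printed as text rather than plotted.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Demo data bundled with sklearn; swap in your own X and y.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = []  # (depth, training accuracy, validation accuracy)
for depth in range(1, 11):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    val_acc = tree.score(X_val, y_val)
    results.append((depth, train_acc, val_acc))
    print(f"depth {depth:2d}: train {train_acc:.3f}, validation {val_acc:.3f}")
```

Reading down the validation column gives the same picture as the plot: if it keeps rising with depth the tree is probably too shallow, and if it falls while the training column keeps rising the tree is probably too deep.
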
As for feature importance, many decision tree libraries can provide you this
info. I know xgboost does. Never worked with sklearn, but a cursory search
revealed this page:
https://stackoverflow.com/questions/49170296/scikit-learn-feature-importance-calculation-in-decision-trees
So it appears to be available in sklearn as well.
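
For what it's worth, a small sketch of reading those importances from a fitted sklearn tree through its feature_importances_ attribute. The demo dataset and max_depth=4 are arbitrary choices for illustration, not a recommendation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# Demo data bundled with sklearn; swap in your own features and labels.
data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(data.data, data.target)

# feature_importances_ holds one impurity-based score per feature.
ranked = sorted(
    zip(data.feature_names, tree.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

# Print the five highest-scoring features, most important first.
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```

The scores are normalized to sum to 1, so each one can be read as that feature's share of the total impurity reduction in this particular tree.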

HTH

--Tony

On 1/2/2022 3:45 PM, pranav@xxxxxxxxxxxxxxxxx wrote:

Hi all,

I have built a decision tree in python using the sklearn library. How
do I evaluate if it is overfitting or underfitting?

I also need to do some feature engineering in terms of determining
which features are contributing most to my model. I am thinking about
using the shap values but if I print them, I get numbers which I am
not sure how to interpret.

Pranav

** To leave the list, click on the immediately-following link:-
** [mailto:program-l-request@xxxxxxxxxxxxx?subject=unsubscribe]
** If this link doesn't work then send a message to:
** program-l-request@xxxxxxxxxxxxx
** and in the Subject line type
** unsubscribe
** For other list commands such as vacation mode, click on the
** immediately-following link:-
** [mailto:program-l-request@xxxxxxxxxxxxx?subject=faq]
** or send a message, to
** program-l-request@xxxxxxxxxxxxx with the Subject:- faq


