diff --git a/notebooks/ensemble_ex_03.ipynb b/notebooks/ensemble_ex_03.ipynb
index 895d786c5..f9d1e4590 100644
--- a/notebooks/ensemble_ex_03.ipynb
+++ b/notebooks/ensemble_ex_03.ipynb
@@ -101,20 +101,21 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Both gradient boosting and random forest models improve when increasing the\n",
-    "number of trees in the ensemble. However, the scores reach a plateau where\n",
-    "adding new trees just makes fitting and scoring slower.\n",
+    "Random forest models improve when increasing the number of trees in the\n",
+    "ensemble. However, the scores reach a plateau where adding new trees just\n",
+    "makes fitting and scoring slower.\n",
     "\n",
-    "To avoid adding new unnecessary tree, unlike random-forest gradient-boosting\n",
+    "Gradient boosting models overfit when the number of trees is too large. To\n",
+    "avoid adding unnecessary trees, gradient boosting (unlike random forests)\n",
     "offers an early-stopping option. Internally, the algorithm uses an\n",
     "out-of-sample set to compute the generalization performance of the model at\n",
     "each addition of a tree. Thus, if the generalization performance is not\n",
     "improving for several iterations, it stops adding trees.\n",
     "\n",
     "Now, create a gradient-boosting model with `n_estimators=1_000`. This number\n",
-    "of trees is certainly too large. Change the parameter `n_iter_no_change` such\n",
-    "that the gradient boosting fitting stops after adding 5 trees that do not\n",
-    "improve the overall generalization performance."
+    "of trees is certainly too large. Change the parameter `n_iter_no_change`\n",
+    "such that the gradient boosting fitting stops early, once 5 additional\n",
+    "trees no longer improve the overall generalization performance."
    ]
   },
   {
diff --git a/notebooks/ensemble_sol_03.ipynb b/notebooks/ensemble_sol_03.ipynb
index 7fc5dae16..4906e1b55 100644
--- a/notebooks/ensemble_sol_03.ipynb
+++ b/notebooks/ensemble_sol_03.ipynb
@@ -129,20 +129,21 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Both gradient boosting and random forest models improve when increasing the\n",
-    "number of trees in the ensemble. However, the scores reach a plateau where\n",
-    "adding new trees just makes fitting and scoring slower.\n",
+    "Random forest models improve when increasing the number of trees in the\n",
+    "ensemble. However, the scores reach a plateau where adding new trees just\n",
+    "makes fitting and scoring slower.\n",
     "\n",
-    "To avoid adding new unnecessary tree, unlike random-forest gradient-boosting\n",
+    "Gradient boosting models overfit when the number of trees is too large. To\n",
+    "avoid adding unnecessary trees, gradient boosting (unlike random forests)\n",
     "offers an early-stopping option. Internally, the algorithm uses an\n",
     "out-of-sample set to compute the generalization performance of the model at\n",
     "each addition of a tree. Thus, if the generalization performance is not\n",
     "improving for several iterations, it stops adding trees.\n",
     "\n",
     "Now, create a gradient-boosting model with `n_estimators=1_000`. This number\n",
-    "of trees is certainly too large. Change the parameter `n_iter_no_change` such\n",
-    "that the gradient boosting fitting stops after adding 5 trees that do not\n",
-    "improve the overall generalization performance."
+    "of trees is certainly too large. Change the parameter `n_iter_no_change`\n",
+    "such that the gradient boosting fitting stops early, once 5 additional\n",
+    "trees no longer improve the overall generalization performance."
    ]
   },
   {
@@ -167,7 +168,7 @@
    "source": [
     "We see that the number of trees used is far below 1000 with the current\n",
     "dataset. Training the gradient boosting model with the entire 1000 trees would\n",
-    "have been useless."
+    "have been detrimental."
    ]
   },
   {
diff --git a/python_scripts/ensemble_ex_03.py b/python_scripts/ensemble_ex_03.py
index 72f8f362c..cecb9484a 100644
--- a/python_scripts/ensemble_ex_03.py
+++ b/python_scripts/ensemble_ex_03.py
@@ -64,20 +64,21 @@
 # Write your code here.
 
 # %% [markdown]
-# Both gradient boosting and random forest models improve when increasing the
-# number of trees in the ensemble. However, the scores reach a plateau where
-# adding new trees just makes fitting and scoring slower.
+# Random forest models improve when increasing the number of trees in the
+# ensemble. However, the scores reach a plateau where adding new trees just
+# makes fitting and scoring slower.
 #
-# To avoid adding new unnecessary tree, unlike random-forest gradient-boosting
+# Gradient boosting models overfit when the number of trees is too large. To
+# avoid adding unnecessary trees, gradient boosting (unlike random forests)
 # offers an early-stopping option. Internally, the algorithm uses an
 # out-of-sample set to compute the generalization performance of the model at
 # each addition of a tree. Thus, if the generalization performance is not
 # improving for several iterations, it stops adding trees.
 #
 # Now, create a gradient-boosting model with `n_estimators=1_000`. This number
-# of trees is certainly too large. Change the parameter `n_iter_no_change` such
-# that the gradient boosting fitting stops after adding 5 trees that do not
-# improve the overall generalization performance.
+# of trees is certainly too large. Change the parameter `n_iter_no_change`
+# such that the gradient boosting fitting stops early, once 5 additional
+# trees no longer improve the overall generalization performance.
 
 # %%
 # Write your code here.
diff --git a/python_scripts/ensemble_sol_03.py b/python_scripts/ensemble_sol_03.py
index a72542464..55f882443 100644
--- a/python_scripts/ensemble_sol_03.py
+++ b/python_scripts/ensemble_sol_03.py
@@ -86,20 +86,21 @@
 )
 
 # %% [markdown]
-# Both gradient boosting and random forest models improve when increasing the
-# number of trees in the ensemble. However, the scores reach a plateau where
-# adding new trees just makes fitting and scoring slower.
+# Random forest models improve when increasing the number of trees in the
+# ensemble. However, the scores reach a plateau where adding new trees just
+# makes fitting and scoring slower.
 #
-# To avoid adding new unnecessary tree, unlike random-forest gradient-boosting
+# Gradient boosting models overfit when the number of trees is too large. To
+# avoid adding unnecessary trees, gradient boosting (unlike random forests)
 # offers an early-stopping option. Internally, the algorithm uses an
 # out-of-sample set to compute the generalization performance of the model at
 # each addition of a tree. Thus, if the generalization performance is not
 # improving for several iterations, it stops adding trees.
 #
 # Now, create a gradient-boosting model with `n_estimators=1_000`. This number
-# of trees is certainly too large. Change the parameter `n_iter_no_change` such
-# that the gradient boosting fitting stops after adding 5 trees that do not
-# improve the overall generalization performance.
+# of trees is certainly too large. Change the parameter `n_iter_no_change`
+# such that the gradient boosting fitting stops early, once 5 additional
+# trees no longer improve the overall generalization performance.
 
 # %%
 # solution
@@ -110,7 +111,7 @@
 # %% [markdown] tags=["solution"]
 # We see that the number of trees used is far below 1000 with the current
 # dataset. Training the gradient boosting model with the entire 1000 trees would
-# have been useless.
+# have been detrimental.
 
 # %% [markdown]
 # Estimate the generalization performance of this model again using the
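For reference, the early-stopping behaviour the reworded cells describe can be checked with a minimal sketch. This is not part of the patch: the dataset loading and split below assume the California-housing setup used elsewhere in this exercise, and `gbdt` is an illustrative name.

```python
# Minimal sketch, not part of the patch: illustrates the early stopping that
# the reworded markdown cells describe. The dataset and split mirror the
# exercise's assumed setup.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

data, target = fetch_california_housing(return_X_y=True, as_frame=True)
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0
)

# n_iter_no_change=5 enables early stopping: the estimator internally holds
# out `validation_fraction` (10% of the training data by default) and stops
# adding trees once the validation score has not improved over 5 consecutive
# iterations, instead of fitting all 1_000 trees.
gbdt = GradientBoostingRegressor(n_estimators=1_000, n_iter_no_change=5)
gbdt.fit(data_train, target_train)

# n_estimators_ reports how many trees were actually kept.
print(f"Trees fitted before early stopping: {gbdt.n_estimators_}")
```

With `n_iter_no_change=5`, the fitted `n_estimators_` typically lands far below 1000, which is the observation the updated solution cells refer to.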