As previously, these slides are intended to be read in conjunction with the "Isolet Demo" from the course repo.
sklearn.ensemble API

RandomForestClassifier from sklearn.ensemble implements RF for classification; RandomForestRegressor can be used for regression problems.

from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import RandomForestClassifier
Key parameters:

- n_estimators - how many trees to use
- max_depth - how far down to grow the trees; min_samples_split and min_samples_leaf are alternatives
- max_features - maximum number of features to consider at each split
- max_samples - fit each tree to a subsample rather than the full bootstrap sample

For example:

RandomForestClassifier(
n_estimators=500, # number of trees
criterion='entropy',
max_depth=None,
max_features='sqrt',
oob_score=True, # score on out-of-bag samples (use CV otherwise)
max_samples=0.5, # smaller yields more regularization
n_jobs=2
)
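
As an aside, here is a minimal sketch of fitting this estimator and reading its out-of-bag score; the synthetic dataset is a hypothetical stand-in for the Isolet features used in the demo.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data; the demo uses the real Isolet dataset.
X, y = make_classification(n_samples=2000, n_features=100,
                           n_informative=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=500,
    criterion='entropy',
    max_features='sqrt',
    oob_score=True,
    max_samples=0.5,
    n_jobs=2,
    random_state=0,
)
rf.fit(X, y)

# Accuracy on the out-of-bag samples: a built-in estimate of
# generalization error, so no separate cross-validation is needed.
print(rf.oob_score_)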
The default max_features='sqrt' has been found to work well.

Gradient boosting is provided by GradientBoostingClassifier and GradientBoostingRegressor; the xgboost implementation is also popular.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingClassifier
Key parameters:

- n_estimators - the number of boosting rounds (i.e. the number of trees)
- learning_rate - value by which to scale each new regressor
- subsample - fraction of the sample to use in each boosting round
- max_depth - maximum depth of trees; see also min_impurity_decrease, min_samples_split, and min_samples_leaf for implicit control of tree depth
- max_features - maximum number of features to consider at each split

For example:

gb1 = GradientBoostingClassifier(
loss='log_loss', # called 'deviance' in scikit-learn < 1.1
n_estimators=100, # number of trees
learning_rate=.1,
subsample=1,
max_depth=16,
max_features='sqrt',
verbose=0
)
The .staged_predict_proba() method can be used to get probability estimates from the working model after each boosting round ("stage").
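
For instance, here is a minimal sketch (under the same synthetic-data assumption as above, not from the demo) of using staged probabilities to track held-out loss per round and pick a good n_estimators:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; substitute the Isolet features in practice.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

gb = GradientBoostingClassifier(n_estimators=100, learning_rate=.1,
                                max_features='sqrt', random_state=0)
gb.fit(X_tr, y_tr)

# staged_predict_proba yields one probability estimate per boosting
# round ("stage"); the round with the lowest held-out loss suggests
# where to stop adding trees.
val_loss = [log_loss(y_val, proba)
            for proba in gb.staged_predict_proba(X_val)]
print('best round:', int(np.argmin(val_loss)) + 1)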