As before, these slides are intended to be read in conjunction with the "Isolet Demo" from the course repo.
sklearn.ensemble API. RandomForestClassifier from sklearn.ensemble implements RF for classification; RandomForestRegressor can be used for regression problems.

from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor
n_estimators - how many trees to use
max_depth - how far down to grow the trees; min_samples_split and min_samples_leaf are alternatives
max_features - maximum number of features to consider at each split
max_samples - fit each tree to a subsample rather than a full bootstrap sample

RandomForestClassifier(
n_estimators=500, # number of trees
criterion='entropy',
max_depth=None,
max_features='sqrt',
oob_score=True, # use CV otherwise
max_samples=0.5, # smaller yields more regularization
n_jobs=2
)
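A minimal sketch of fitting such a forest and reading off the out-of-bag accuracy; make_classification is only a synthetic stand-in for the Isolet features, and the variable names are placeholders, not from the demo:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# toy data standing in for the Isolet features
X, y = make_classification(n_samples=2000, n_features=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=500,
    criterion='entropy',
    max_features='sqrt',
    oob_score=True,      # out-of-bag estimate, no separate CV needed
    max_samples=0.5,
    n_jobs=2,
    random_state=0,
)
rf.fit(X_train, y_train)

print(rf.oob_score_)             # out-of-bag accuracy estimate
print(rf.score(X_test, y_test))  # held-out accuracy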
max_features='sqrt' has been found to work well in practice.

sklearn.ensemble also implements gradient boosting via GradientBoostingClassifier and GradientBoostingRegressor; the xgboost implementation is also popular.

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import GradientBoostingRegressor
n_estimators - the number of boosting rounds (i.e. the number of trees)
learning_rate - value by which to scale each new regressor
subsample - fraction of the sample to use in each boosting round
max_depth - maximum depth of trees; see also min_impurity_decrease, min_samples_split, and min_samples_leaf for implicit control of tree depth
max_features - maximum number of features to consider at each split

gb1 = GradientBoostingClassifier(
loss='deviance',  # log loss; renamed to 'log_loss' in newer sklearn versions
n_estimators=100, # number of trees
learning_rate=.1,
subsample=1,
max_depth=16,
max_features='sqrt',
verbose=0
)
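A minimal sketch of fitting this model, reusing X_train, y_train from the earlier forest sketch. The xgboost lines below are a rough equivalent, not the same algorithm parameter-for-parameter; they assume xgboost is installed, and the two libraries differ in defaults:

gb1.fit(X_train, y_train)
print(gb1.score(X_test, y_test))

# roughly equivalent model in xgboost (assumes `pip install xgboost`)
import xgboost as xgb
xgb1 = xgb.XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    subsample=1.0,
    max_depth=16,
)
xgb1.fit(X_train, y_train)
print(xgb1.score(X_test, y_test))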
.staged_predict_proba() can be used to get class-probability estimates
from the working model after each boosting round ("stage").
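For example, a sketch of using the staged predictions to choose the number of boosting rounds by held-out log loss, reusing gb1 and the test split from the sketches above:

import numpy as np
from sklearn.metrics import log_loss

# test-set log loss after each boosting round
stage_losses = [
    log_loss(y_test, proba)
    for proba in gb1.staged_predict_proba(X_test)
]
best_round = int(np.argmin(stage_losses)) + 1
print(f"best number of rounds: {best_round}")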