Decision Trees
- class sealion.decision_trees.DecisionTree(max_branches=inf, min_samples=1)
Decision Trees are powerful algorithms that build a tree by looking at the data and finding which questions are best to ask. For example, if you are trying to predict whether somebody has cancer, the tree may ask, “Do they have cells with a size >= 5?” or “Is there any history of smoking or drinking in the family?” These algorithms can fit most datasets easily and are quite capable of overfitting (luckily, we offer some parameters to help with that).
If you are going to use categorical features with this Decision Tree, make sure to one-hot encode them. You can do that with the one_hot() function in the utils module.
This Decision Tree class doesn’t support regression or any labels that have continuous values; it is only for classification tasks. The reason is that most curves and lines can be modeled better with regression algorithms (Linear, Polynomial, Ridge, etc. are all available in the regression module) or neural networks. Decision Trees will typically overfit on such tasks.
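To make the idea of "finding which questions are best to ask" concrete, here is a minimal pure-Python sketch. It is not sealion's implementation: it assumes questions of the form "feature >= threshold" scored by weighted Gini impurity (the description above does not say which criterion the library uses), and the helpers one_hot_encode, gini, and best_question are hypothetical names made up for this illustration. In particular, one_hot_encode is a stand-in, not the one_hot() function from the utils module.

```python
from collections import Counter


def one_hot_encode(values):
    """Hypothetical helper (NOT sealion's one_hot()): turn categories into 0/1 columns."""
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories] for value in values]
    return encoded, categories


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_question(rows, labels):
    """Return (feature_index, threshold, score) for the best 'feature >= threshold' question.

    The score is the sample-weighted Gini impurity of the two groups; lower is better.
    Using Gini here is an assumption made for this sketch.
    """
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            yes = [label for row, label in zip(rows, labels) if row[feature] >= threshold]
            no = [label for row, label in zip(rows, labels) if row[feature] < threshold]
            if not yes or not no:
                continue  # this question does not separate anything
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Toy data: a numeric cell size plus a categorical smoking-history feature.
cell_sizes = [2, 3, 6, 7, 5, 1]
smoking = ["no", "yes", "no", "yes", "no", "no"]
labels = [0, 0, 1, 1, 1, 0]

encoded, categories = one_hot_encode(smoking)
rows = [[size] + flags for size, flags in zip(cell_sizes, encoded)]

feature, threshold, score = best_question(rows, labels)
print(f"best question: feature {feature} >= {threshold} (weighted Gini {score})")
```

On this toy data the question "cell size >= 5" separates the two classes perfectly, so it is picked over either one-hot column. A full tree would repeat the same search on each resulting group.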
- __init__(max_branches=inf, min_samples=1)
- Parameters
max_branches – maximum number of branches for the decision tree
min_samples – minimum number of samples in a branch for the branch to split
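One way to read these two parameters is as a stopping rule that is checked before a branch is allowed to split. The sketch below illustrates that reading only: may_split is a hypothetical function, and the exact comparisons (>= for min_samples, < for max_branches) are assumptions rather than the library's confirmed behavior.

```python
import math


def may_split(n_samples, n_branches, max_branches=math.inf, min_samples=1):
    """Hypothetical stopping rule: split only if both limits allow it.

    n_samples  - number of samples that reached this branch
    n_branches - number of branches the tree already has
    """
    enough_samples = n_samples >= min_samples      # assumed: inclusive lower bound
    room_for_branches = n_branches < max_branches  # assumed: strict upper bound
    return enough_samples and room_for_branches


# With the defaults (max_branches=inf, min_samples=1) nothing stops the tree from growing.
default_ok = may_split(n_samples=3, n_branches=10)

# Raising min_samples keeps small branches from splitting further.
too_few = may_split(n_samples=3, n_branches=10, min_samples=5)

# Capping max_branches stops growth once the budget is used up.
tree_full = may_split(n_samples=50, n_branches=8, max_branches=8)

print(default_ok, too_few, tree_full)
```

Under this reading, lowering max_branches or raising min_samples both make the tree smaller, which is how they help against overfitting.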
- average_branches()
- Returns
an estimate of how many branches the decision tree has right now with its current parameters.
- evaluate(x_test, y_test)
- Parameters
x_test – 2D testing data.
y_test – 1D testing labels.
- Returns
accuracy score.
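As a point of reference, an accuracy score is usually the fraction of predictions that match the true labels. The sketch below assumes that definition; accuracy is a hypothetical standalone function written for illustration, not the code that evaluate() runs internally.

```python
def accuracy(y_pred, y_test):
    """Fraction of predictions equal to the true labels (assumed definition)."""
    if len(y_pred) != len(y_test):
        raise ValueError("y_pred and y_test must have the same length")
    if len(y_test) == 0:
        raise ValueError("cannot compute accuracy on empty inputs")
    correct = sum(1 for predicted, actual in zip(y_pred, y_test) if predicted == actual)
    return correct / len(y_test)


# Three of the four predictions match the labels.
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(score)
```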
- fit(x_train, y_train)
- Parameters
x_train – 2D training data (make sure to one-hot encode categorical features)
y_train – training labels
- Returns
- give_tree(tree)
- Parameters
tree – a tree made by you or given by give_best_tree in the RandomForest module
- Returns
None; the tree you give becomes the tree used by this decision tree.
- predict(x_test)
- Parameters
x_test – 2D prediction data.
- Returns
predictions in 1D vector/list.
- return_tree()
- Returns
the tree inside (if you want to look at it)
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions given by model
y_test – actual labels
- Returns
an image of the predictions and the labels.