Decision Trees

class sealion.decision_trees.DecisionTree(max_branches=inf, min_samples=1)

Decision Trees are powerful algorithms that build a tree by looking at the data and finding which questions are best to ask. For example, if you are trying to predict whether somebody has cancer, the tree may ask, “Do they have cells with a size >= 5?” or “Is there any history of smoking or drinking in the family?” These algorithms can easily fit most datasets and are also very capable of overfitting (luckily, we offer some parameters to help with that).

If you are going to use categorical features with this Decision Tree, make sure to one-hot encode them. You can do that with the one_hot() function in the utils module.

This Decision Tree class doesn’t support regression or labels with continuous values; it is only for classification tasks. This is because most curves and lines are modeled better by regression algorithms (Linear, Polynomial, Ridge, etc. are all available in the regression module) or by neural networks, and Decision Trees will typically overfit on such tasks.
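
Below is a minimal end-to-end sketch of the workflow described above. The class path comes from the signature at the top of this page; the toy dataset, the use of NumPy arrays, and the chosen parameter values are illustrative assumptions, not part of the library.

   import numpy as np
   from sealion.decision_trees import DecisionTree

   # Toy, already-numeric data: each row is [cell_size, smoking_history] (illustrative).
   x_train = np.array([[5, 1], [2, 0], [7, 1], [1, 0], [6, 1], [3, 0]])
   y_train = np.array([1, 0, 1, 0, 1, 0])  # 1D classification labels

   x_test = np.array([[4, 1], [2, 0]])
   y_test = np.array([1, 0])

   # Cap the tree's growth to reduce overfitting (values are arbitrary examples).
   clf = DecisionTree(max_branches=10, min_samples=2)
   clf.fit(x_train, y_train)

   predictions = clf.predict(x_test)        # 1D vector of predicted labels
   accuracy = clf.evaluate(x_test, y_test)  # accuracy score on the test set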

__init__(max_branches=inf, min_samples=1)
Parameters
  • max_branches – maximum number of branches for the decision tree

  • min_samples – minimum number of samples in a branch for the branch to split
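
As a quick sketch, the defaults let the tree grow freely, while tighter values restrict it (the specific numbers below are arbitrary):

   from sealion.decision_trees import DecisionTree

   unrestricted = DecisionTree()                              # max_branches=inf, min_samples=1
   restricted = DecisionTree(max_branches=20, min_samples=5)  # limits growth to curb overfitting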

average_branches()
Returns

an estimate of how many branches the decision tree has right now with its current parameters.
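
For example, after fitting you might check how large the tree grew (the toy data below is an assumption):

   import numpy as np
   from sealion.decision_trees import DecisionTree

   clf = DecisionTree(max_branches=50)
   clf.fit(np.array([[1, 0], [2, 1], [3, 0], [4, 1]]), np.array([0, 0, 1, 1]))
   print(clf.average_branches())  # estimated branch count under the current parameters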

evaluate(x_test, y_test)
Parameters
  • x_test – 2D testing data.

  • y_test – 1D testing labels.

Returns

accuracy score.

fit(x_train, y_train)
Parameters
  • x_train – training data, 2D (make sure to one-hot encode categorical features)

  • y_train – training labels

Returns

None.
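
For instance, a categorical feature such as color can be expanded into one-hot columns before fitting. The encoding below is written out by hand for illustration; the one_hot() helper in the utils module can produce it, though its exact signature is not shown on this page, and the toy data is an assumption.

   import numpy as np
   from sealion.decision_trees import DecisionTree

   # Feature layout: [is_red, is_green, is_blue, size] - the color category is one-hot encoded.
   x_train = np.array([[1, 0, 0, 4],
                       [0, 1, 0, 2],
                       [0, 0, 1, 7],
                       [1, 0, 0, 1]])
   y_train = np.array([1, 0, 1, 0])  # 1D labels

   clf = DecisionTree(min_samples=2)
   clf.fit(x_train, y_train)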

give_tree(tree)
Parameters

tree – a tree built by you or returned by give_best_tree in the RandomForest module

Returns

None; the tree you give simply becomes the tree used by this decision tree.
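
A minimal sketch of handing a tree to give_tree is below; here the tree comes from another fitted DecisionTree via return_tree(), and a tree returned by give_best_tree in the RandomForest module would be passed in the same way. The toy data is an assumption.

   import numpy as np
   from sealion.decision_trees import DecisionTree

   x_train = np.array([[1, 0], [2, 1], [3, 0], [4, 1]])
   y_train = np.array([0, 0, 1, 1])

   source = DecisionTree()
   source.fit(x_train, y_train)
   tree = source.return_tree()   # grab the fitted tree

   clf = DecisionTree()
   clf.give_tree(tree)           # clf now uses that tree for its predictions
   predictions = clf.predict(np.array([[2, 0], [4, 1]]))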

predict(x_test)
Parameters

x_test – 2D prediction data.

Returns

predictions as a 1D vector/list.

return_tree()
Returns

the internal tree (if you want to inspect it).
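
For example, you could fit a model and print its internal tree to inspect it (toy data assumed):

   import numpy as np
   from sealion.decision_trees import DecisionTree

   clf = DecisionTree()
   clf.fit(np.array([[1, 0], [2, 1], [3, 0], [4, 1]]), np.array([0, 0, 1, 1]))
   print(clf.return_tree())  # the internal tree structure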

visualize_evaluation(y_pred, y_test)
Parameters
  • y_pred – predictions given by the model

  • y_test – actual labels

Returns

a plot comparing the predictions and the actual labels.
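
A sketch of visualizing results, feeding the output of predict() back in (toy data assumed):

   import numpy as np
   from sealion.decision_trees import DecisionTree

   clf = DecisionTree()
   clf.fit(np.array([[1, 0], [2, 1], [3, 0], [4, 1]]), np.array([0, 0, 1, 1]))

   x_test, y_test = np.array([[2, 0], [4, 1]]), np.array([0, 1])
   y_pred = clf.predict(x_test)
   clf.visualize_evaluation(y_pred, y_test)  # shows a plot of predictions vs. actual labels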