Decision Trees

class sealion.decision_trees.DecisionTree(max_branches=inf, min_samples=1)

Decision Trees are powerful algorithms that build a tree by looking at the data and finding which questions are best to ask. For example, if you are trying to predict whether somebody has cancer, the tree may ask, “Do they have cells with a size >= 5?” or “Is there any history of smoking or drinking in the family?” These algorithms can easily fit most datasets and are also very capable of overfitting (luckily, we offer some parameters to help with that).

If you are going to use categorical features with this Decision Tree, make sure to one-hot encode them. You can do that with the one_hot() function in the utils module.

This Decision Tree class doesn’t support regression or labels with continuous values; it is only for classification tasks. This is because most curves and lines are modeled better by regression algorithms (Linear, Polynomial, Ridge, etc. are all available in the regression module) or by neural networks, and Decision Trees will typically overfit on such tasks.
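
Below is a minimal end-to-end sketch of the workflow described above. The class path comes from the signature at the top of this page; the toy dataset, the use of NumPy arrays, and the chosen parameter values are illustrative assumptions, not part of the library.

   import numpy as np
   from sealion.decision_trees import DecisionTree

   # Toy, already-numeric data: each row is [cell_size, smoking_history] (illustrative).
   x_train = np.array([[5, 1], [2, 0], [7, 1], [1, 0], [6, 1], [3, 0]])
   y_train = np.array([1, 0, 1, 0, 1, 0])  # 1D classification labels

   x_test = np.array([[4, 1], [2, 0]])
   y_test = np.array([1, 0])

   # Cap the tree's growth to reduce overfitting (values are arbitrary examples).
   clf = DecisionTree(max_branches=10, min_samples=2)
   clf.fit(x_train, y_train)

   predictions = clf.predict(x_test)        # 1D vector of predicted labels
   accuracy = clf.evaluate(x_test, y_test)  # accuracy score on the test set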

__init__(max_branches=inf, min_samples=1)
Parameters
  • max_branches – maximum number of branches for the decision tree

  • min_samples – minimum number of samples in a branch for the branch to split
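
As a quick sketch, the defaults let the tree grow freely, while tighter values restrict it (the specific numbers below are arbitrary):

   from sealion.decision_trees import DecisionTree

   unrestricted = DecisionTree()                              # max_branches=inf, min_samples=1
   restricted = DecisionTree(max_branches=20, min_samples=5)  # limits growth to curb overfitting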

average_branches()
Returns

an estimate of how many branches the decision tree has right now with its current parameters.
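
For example, after fitting you might check how large the tree grew (the toy data below is an assumption):

   import numpy as np
   from sealion.decision_trees import DecisionTree

   clf = DecisionTree(max_branches=50)
   clf.fit(np.array([[1, 0], [2, 1], [3, 0], [4, 1]]), np.array([0, 0, 1, 1]))
   print(clf.average_branches())  # estimated branch count under the current parameters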

evaluate(x_test, y_test)
Parameters
  • x_test – 2D testing data.

  • y_test – 1D testing labels.

Returns

accuracy score.

fit(x_train, y_train)
Parameters
  • x_train – training data, 2D (make sure to one-hot encode categorical features)

  • y_train – training labels

Returns

None.
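
For instance, a categorical feature such as color can be expanded into one-hot columns before fitting. The encoding below is written out by hand for illustration; the one_hot() helper in the utils module can produce it, though its exact signature is not shown on this page, and the toy data is an assumption.

   import numpy as np
   from sealion.decision_trees import DecisionTree

   # Feature layout: [is_red, is_green, is_blue, size] - the color category is one-hot encoded.
   x_train = np.array([[1, 0, 0, 4],
                       [0, 1, 0, 2],
                       [0, 0, 1, 7],
                       [1, 0, 0, 1]])
   y_train = np.array([1, 0, 1, 0])  # 1D labels

   clf = DecisionTree(min_samples=2)
   clf.fit(x_train, y_train)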

give_tree(tree)
Parameters

tree – a tree built by you or returned by give_best_tree in the RandomForest module

Returns

None; the tree you give simply becomes the tree used by this decision tree.
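
A minimal sketch of handing a tree to give_tree is below; here the tree comes from another fitted DecisionTree via return_tree(), and a tree returned by give_best_tree in the RandomForest module would be passed in the same way. The toy data is an assumption.

   import numpy as np
   from sealion.decision_trees import DecisionTree

   x_train = np.array([[1, 0], [2, 1], [3, 0], [4, 1]])
   y_train = np.array([0, 0, 1, 1])

   source = DecisionTree()
   source.fit(x_train, y_train)
   tree = source.return_tree()   # grab the fitted tree

   clf = DecisionTree()
   clf.give_tree(tree)           # clf now uses that tree for its predictions
   predictions = clf.predict(np.array([[2, 0], [4, 1]]))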

predict(x_test)
Parameters

x_test – 2D prediction data.

Returns

predictions as a 1D vector/list.

return_tree()
Returns

the internal tree (if you want to inspect it).
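
For example, you could fit a model and print its internal tree to inspect it (toy data assumed):

   import numpy as np
   from sealion.decision_trees import DecisionTree

   clf = DecisionTree()
   clf.fit(np.array([[1, 0], [2, 1], [3, 0], [4, 1]]), np.array([0, 0, 1, 1]))
   print(clf.return_tree())  # the internal tree structure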

visualize_evaluation(y_pred, y_test)
Parameters
  • y_pred – predictions given by the model

  • y_test – actual labels

Returns

a plot comparing the predictions and the actual labels.
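
A sketch of visualizing results, feeding the output of predict() back in (toy data assumed):

   import numpy as np
   from sealion.decision_trees import DecisionTree

   clf = DecisionTree()
   clf.fit(np.array([[1, 0], [2, 1], [3, 0], [4, 1]]), np.array([0, 0, 1, 1]))

   x_test, y_test = np.array([[2, 0], [4, 1]]), np.array([0, 1])
   y_pred = clf.predict(x_test)
   clf.visualize_evaluation(y_pred, y_test)  # shows a plot of predictions vs. actual labels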