Utilities
- sealion.utils.confusion_matrix(y_pred, y_test, plot=True)
Confusion matrices are a commonly used tool for seeing how well your model did and where it can improve.
A confusion matrix is just a matrix counting how many samples of each class were classified as each other class. So if your classifier predicted
[0, 1]
and the correct answers were
[1, 0]
- then you would get one 0 predicted as a 1, one 1 predicted as a 0, and no 0s predicted as 0s or 1s predicted as 1s. By default this method will plot the confusion matrix, but if you don't want that, set plot to False. Some warnings: make sure that y_pred and y_test are both 1D and that class labels start at 0. Also make sure that every prediction in y_pred exists in y_test. This will probably not be a big deal with traditional machine learning datasets.
To really understand this function, try it out - it'll become clear with the visualization.
- Parameters
y_pred – predictions (1D)
y_test – labels (1D)
plot – whether or not to plot, default True
- Returns
the matrix, and show a visualization of the confusion matrix (if plot = True)
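The counting logic behind a confusion matrix can be sketched in a few lines of NumPy. This is only an illustration of what the returned matrix contains - the helper name confusion_counts is made up for this sketch and is not part of sealion:

```python
import numpy as np

def confusion_counts(y_pred, y_test):
    # Rows index the true class, columns the predicted class:
    # matrix[i, j] counts how many samples of class i were predicted as class j.
    n_classes = int(max(np.max(y_pred), np.max(y_test))) + 1
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for true, pred in zip(y_test, y_pred):
        matrix[true, pred] += 1
    return matrix

# The example from above: predictions [0, 1] against labels [1, 0]
print(confusion_counts(np.array([0, 1]), np.array([1, 0])))
# [[0 1]
#  [1 0]] - both samples land off the diagonal, i.e. misclassified
```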
- sealion.utils.one_hot(indices, depth)
So you've got a feature in a dataset where a certain value represents a certain category. For example, it could be that 1 represents sunny, 2 represents rainy, and 0 represents cloudy. Well, what's the problem? If you feed this to your model, it's going to think that 2 and 1 are similar because they are just 1 apart - despite the fact that they are really just categories. To fix that you can feed in your features - say it's a list like
[2, 2, 1, 0, 1, 0]
and it will be one-hot encoded to whatever depth you please. Here's an example (it'll make it very clear):
>>> features_weather = [1, 2, 2, 2, 1, 0]
>>> one_hot_features = one_hot(features_weather, depth=3)  # depth is 3 here because you have 3 categories - rainy, sunny, cloudy
>>> one_hot_features
[[0, 1, 0],  # 1 is at index 1
 [0, 0, 1],  # 2 is at index 2
 [0, 0, 1],  # 2 is at index 2
 [0, 0, 1],  # 2 is at index 2
 [0, 1, 0],  # 1 is at index 1
 [1, 0, 0]]  # 0 is at index 0
That's our feature data, one-hot encoded. Make sure to set the depth parameter correctly.
For things like this, playing around will help.
- Parameters
indices – features_weather or something similar as shown above
depth – How many categories you have
- Returns
one-hotted features
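The transformation itself is simple enough to sketch with NumPy indexing. This is a minimal illustration of the behavior described above, not sealion's own code, and one_hot_sketch is a made-up name:

```python
import numpy as np

def one_hot_sketch(indices, depth):
    # Put a 1 at position indices[i] of row i, zeros everywhere else.
    encoded = np.zeros((len(indices), depth), dtype=int)
    encoded[np.arange(len(indices)), indices] = 1
    return encoded

features_weather = [1, 2, 2, 2, 1, 0]
print(one_hot_sketch(features_weather, depth=3))
```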
- sealion.utils.revert_one_hot(one_hot_data)
Say from the
one_hot()
function you've gotten data like this:
[[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
and you want to change it back to its original form. Use this function - it will turn that one-hotted data above to
[2, 0, 1, 2, 0]
- which is just the index of the one in each data point of the list. Hence it reverts the one_hot transformation.
- Parameters
one_hot_data – data in one_hot form. Must be a numpy array (2D)!
- Returns
index of the one for each of the data points in a 1D numpy array
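The reversion described above is equivalent to taking an argmax along each row, which you can verify directly with NumPy (an illustration of the behavior, not sealion's internals):

```python
import numpy as np

one_hot_data = np.array([[0, 0, 1],
                         [1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1],
                         [1, 0, 0]])
# The column index of the 1 in each row recovers the original label.
reverted = np.argmax(one_hot_data, axis=1)
print(reverted)  # [2 0 1 2 0]
```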
- sealion.utils.revert_softmax(softmax_data)
Say from the softmax function (in neural networks) you've gotten something like this:
[[0.2, 0.3, 0.5], [0.8, 0.1, 0.1], [0.15, 0.8, 0.05], [0.3, 0.3, 0.4], [0.7, 0.15, 0.15]]
and you want to change it back to its pre-softmax form. Use this function - it will turn that softmax-ed data above to
[2, 0, 1, 2, 0]
- which is just the index of the highest value in each data point of the list. Hence it reverts the softmax transformation.
- Parameters
softmax_data – data in softmax form. Must be a numpy array (2D)!
- Returns
index of the highest value for each of the data points in a 1D numpy array
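As with revert_one_hot, this amounts to an argmax over each row of probabilities - a sketch of the behavior with NumPy, not sealion's internals:

```python
import numpy as np

softmax_data = np.array([[0.2, 0.3, 0.5],
                         [0.8, 0.1, 0.1],
                         [0.15, 0.8, 0.05],
                         [0.3, 0.3, 0.4],
                         [0.7, 0.15, 0.15]])
# The most probable class in each row is the predicted label.
predictions = np.argmax(softmax_data, axis=1)
print(predictions)  # [2 0 1 2 0]
```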
- sealion.utils.precision(tp, fp, tn, fn)
Precision is simply a measure of how much of the data we said was positive is actually positive. Used mostly for binary classification.
- Parameters
tp – number of true positives
fp – number of false positives
tn – number of true negatives
fn – number of false negatives
- Returns
precision metric
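In formula terms, precision only needs the true-positive and false-positive counts. A plain-Python sketch - precision_sketch is an illustrative name, and the tn/fn arguments are kept only to mirror the documented signature:

```python
def precision_sketch(tp, fp, tn, fn):
    # precision = true positives / everything predicted positive
    return tp / (tp + fp)

# 8 correct positive calls out of 10 positive calls total
print(precision_sketch(tp=8, fp=2, tn=85, fn=5))  # 0.8
```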
- sealion.utils.recall(tp, fp, tn, fn)
Recall is a measure of how much of the data that is actually positive was classified as positive. There's a tradeoff between precision and recall: as you predict fewer positives to increase precision, you decrease recall, because more truly positive points end up labeled negative (each of these is known as a false negative).
- Parameters
tp – number of true positives
fp – number of false positives
tn – number of true negatives
fn – number of false negatives
- Returns
recall metric
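Recall uses the true-positive and false-negative counts instead. Again a plain-Python sketch with an illustrative name, keeping the unused fp/tn arguments only to mirror the documented signature:

```python
def recall_sketch(tp, fp, tn, fn):
    # recall = true positives / everything actually positive
    return tp / (tp + fn)

# 8 positives found out of 10 actual positives
print(recall_sketch(tp=8, fp=5, tn=85, fn=2))  # 0.8
```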
- sealion.utils.f1_score(precision, recall)
The f1_score is the harmonic mean of the precision and recall scores. It is used to assemble them into one score. The harmonic mean is often used to combine information about different measures, even if they are in different units.
- Parameters
precision – precision score
recall – recall score
- Returns
f1_score
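The harmonic mean is a standard formula, sketched below with an illustrative function name (not sealion's own code):

```python
def f1_sketch(precision, recall):
    # Harmonic mean: 2 * p * r / (p + r), equivalently 2 / (1/p + 1/r).
    return 2 * precision * recall / (precision + recall)

print(f1_sketch(0.5, 0.5))  # 0.5
# With an imbalanced pair like (0.9, 0.1), the harmonic mean stays low (about 0.18),
# which is why f1 punishes models that trade one metric away for the other.
```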