Utilities

sealion.utils.confusion_matrix(y_pred, y_test, plot=True)

Confusion matrices are a commonly used tool for seeing how well your model did and where it can improve.

A confusion matrix is just a matrix counting how often each class was classified as each other class. So if your classifier predicted [0, 1] and the correct answers were [1, 0], you would get one 0 predicted as a 1, one 1 predicted as a 0, and no 0s predicted as 0s or 1s predicted as 1s. By default this method will plot the confusion matrix; if you don’t want that, just set plot to False.

Some warnings - make sure that y_pred and y_test are both 1D and that the class labels start at 0. Also make sure that every class predicted in y_pred also appears in y_test. This will probably not be a big deal with traditional machine learning datasets.

To really understand this function, try it out - it’ll become clear with the visualization.
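
For example, a minimal sketch of a call (the exact format of the returned matrix may differ):

>>> from sealion.utils import confusion_matrix
>>> y_pred = [0, 1, 1, 0, 1] # every class predicted here also appears in y_test
>>> y_test = [0, 1, 0, 0, 1]
>>> matrix = confusion_matrix(y_pred, y_test, plot=False) # just get the matrix back, skip the plot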

Parameters
  • y_pred – predictions (1D)

  • y_test – labels (1D)

  • plot – whether or not to plot, default True

Returns

the confusion matrix; also shows a visualization of it (if plot = True)

sealion.utils.one_hot(indices, depth)

So you’ve got a feature in a dataset where a certain value represents a certain category. For example, it could be that 1 represents sunny, 2 represents rainy, and 0 represents cloudy. Well, what’s the problem? If you feed this to your model, it’s going to think that 2 and 1 are similar because they are just 1 apart - despite the fact that they are really just categories. To fix that, you can feed in your features - say it’s a list like [2, 2, 1, 0, 1, 0] - and they will be one-hot encoded to whatever depth you please.

Here’s an example (it’ll make it very clear):

>>> from sealion.utils import one_hot
>>> features_weather = [1, 2, 2, 2, 1, 0]
>>> one_hot_features = one_hot(features_weather, depth=3) # depth is 3 here because you have 3 categories - rainy, sunny, cloudy
>>> one_hot_features
[[0, 1, 0],  # 1 -> one at index 1
 [0, 0, 1],  # 2 -> one at index 2
 [0, 0, 1],  # 2 -> one at index 2
 [0, 0, 1],  # 2 -> one at index 2
 [0, 1, 0],  # 1 -> one at index 1
 [1, 0, 0]]  # 0 -> one at index 0

That’s our feature data, one-hot encoded with a one at the index of each category. Make sure to set the depth param correctly.

For things like this, playing around will help.

Parameters
  • indices – the categorical features to encode (e.g. features_weather as shown above)

  • depth – How many categories you have

Returns

one-hotted features

sealion.utils.revert_one_hot(one_hot_data)

Say from one_hot() you’ve gotten data that looks like this:

[[0, 0, 1],
 [1, 0, 0],
 [0, 1, 0],
 [0, 0, 1],
 [1, 0, 0]]

and you want to change it back to its original form. Use this function - it will turn the one-hotted data above into [2, 0, 1, 2, 0], which is just the index of the one in each data point. Hence it reverts the one_hot transformation.
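
A minimal sketch of a call with the data above (note the 2D numpy array requirement):

>>> import numpy as np
>>> from sealion.utils import revert_one_hot
>>> one_hot_data = np.array([[0, 0, 1],
...                          [1, 0, 0],
...                          [0, 1, 0],
...                          [0, 0, 1],
...                          [1, 0, 0]])
>>> revert_one_hot(one_hot_data) # gives back [2, 0, 1, 2, 0] as a 1D numpy array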

Parameters

one_hot_data – data in one_hot form. Must be a numpy array (2D)!

Returns

index of the one for each of the data points in a 1D numpy array

sealion.utils.revert_softmax(softmax_data)

Say from the softmax function (in neural networks) you’ve gotten something like this:

[[0.2, 0.3, 0.5],
 [0.8, 0.1, 0.1],
 [0.15, 0.8, 0.05],
 [0.3, 0.3, 0.4],
 [0.7, 0.15, 0.15]]

and you want to change it back into class labels. Use this function - it will turn the softmax-ed data above into [2, 0, 1, 2, 0], which is just the index of the highest value in each data point (the argmax). Hence it effectively reverts the softmax transformation.
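
A minimal sketch of a call with the probabilities above (again, a 2D numpy array):

>>> import numpy as np
>>> from sealion.utils import revert_softmax
>>> softmax_data = np.array([[0.2, 0.3, 0.5],
...                          [0.8, 0.1, 0.1],
...                          [0.15, 0.8, 0.05],
...                          [0.3, 0.3, 0.4],
...                          [0.7, 0.15, 0.15]])
>>> revert_softmax(softmax_data) # gives back [2, 0, 1, 2, 0] as a 1D numpy array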

Parameters

softmax_data – data in softmax form (the output probabilities). Must be a numpy array (2D)!

Returns

index of the highest value for each of the data points in a 1D numpy array

sealion.utils.precision(tp, fp, tn, fn)

Precision is simply a measure of how much of the data we predicted as positive is actually positive. Used mostly for binary classification.
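
Assuming the standard formula precision = tp / (tp + fp), a quick sketch of a call (tn and fn don’t appear in that formula, but the signature takes them):

>>> from sealion.utils import precision
>>> precision(tp=40, fp=10, tn=45, fn=5) # 40 / (40 + 10) = 0.8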

Parameters
  • tp – number of true positives

  • fp – number of false positives

  • tn – number of true negatives

  • fn – number of false negatives

Returns

precision metric

sealion.utils.recall(tp, fp, tn, fn)

Recall is a measure of how much of the data that is actually positive was classified as positive. There’s a tradeoff between precision and recall: as you predict more positives you tend to increase recall but decrease precision, and as you predict fewer positives you tend to increase precision but decrease recall (the actual positives you label as negative are known as false negatives).
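
Assuming the standard formula recall = tp / (tp + fn), continuing the numbers from the precision example above:

>>> from sealion.utils import recall
>>> recall(tp=40, fp=10, tn=45, fn=5) # 40 / (40 + 5) ≈ 0.89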

Parameters
  • tp – number of true positives

  • fp – number of false positives

  • tn – number of true negatives

  • fn – number of false negatives

Returns

recall metric

sealion.utils.f1_score(precision, recall)

The f1_score is the harmonic mean of the precision and recall scores. It is used to assemble them into one score. The harmonic mean is often used to combine information about different measures, even if they are in different units.
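
Concretely, the harmonic mean works out to f1 = 2 * precision * recall / (precision + recall). A quick sketch with the numbers from the precision and recall examples above:

>>> from sealion.utils import f1_score
>>> f1_score(0.8, 0.89) # 2 * 0.8 * 0.89 / (0.8 + 0.89) ≈ 0.84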

Parameters
  • precision – precision score

  • recall – recall score

Returns

f1_score