SeaLion
The first machine learning framework that encourages learning ML concepts instead of memorizing class functions.
Installation
SeaLion is provided on PyPI, and can be installed with Pip:
pip install sealion
On the first import, SeaLion will compile Cython files. This will take a while (around 30 seconds).
Regression
This module contains all of the regression modules you will need.
- class sealion.regression.LinearRegression
While KNNs (K-Nearest-Neighbors) may be the simplest ML algorithms out there, Linear Regression is probably the one you heard first. You may have used it on a TI-84 before - all it does is it fit a line to the data. It does this through the gradient descent algorithm or a neat normal equation. We will use the normal equation as usually it works just the same and is faster, but for much larger datasets you should explore neural networks or dimensionality reduction (check the algos there.) The equation in school taught is
y = mx + b
, but we’ll denote it asy_hat = m1x1 + m2x2 ... mNxN + b
. The hat is meant to resemble the predictions, and the reason we do it fromm1...mN
is because our data can be in N dimensions, not necessarily one.Some other things to know is that your data for
x_train
should always be 2D. 2D means that it is[[]]
. This doesn’t mean the data is necessarily 2D (this could look like[[1, 2], [2, 3]]
) - but just means that its lists inside lists.y_train
is your labels, or the “correct answers” so it should be in a 1D list, which is just a list. This library assumes just a bit of knowledge about this - and it isn’t too difficult - so feel free to search this up.Another thing to note here is that for our library you can enter in numpy arrays of python lists, but you will always get numpy arrays back (standard practice with other libs, too.)
A lot of the methods here are consistent and same to a lot of the other classes of the library, so reading this will make it a lot easier down the line.
The goal of this module is it for it to be a useful algorithm, but I also hope this is inspiring to your journey of machine learning. It isn’t nearly as hard as it seems.
- evaluate(x_test, y_test)
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
r^2
score
- fit(x_train, y_train)
- Parameters
x_train – 2D training data
y_train – 1D training labels
- Returns
None
- predict(x_test)
- Parameters
x_test – 2D prediction data
- Returns
predictions in a 1D numpy array
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions from the predict() method
y_test – labels for the data
- Returns
a matplotlib image of the predictions and the labels (“correct answers”) for you to see how well the model did.
- class sealion.regression.LogisticRegression(accuracy_desired=0.95, learning_rate=0.01, max_iters=1000, show_acc=True)
Logistic Regression is in the sweet spot of being easy to understand and useful. Despite having “regression” in the name, what it does is binary classification. Say you had a dataset of people’s heights and weights, along with other attributes (we call them features) and you wanted to predict whether they were healthy (0) or unhealthy (1). You may choose to use logistic regression as this task is binary classification (classifying 2 categories.) Make sure your labels are 0 and 1.
Logistic Regression doesn’t have a closed form solution, so we’ll have to use the gradient descent algorithm. It may take longer but we’ve provided a progress bar for you to see how it’s going.
You may want to look into is the sigmoid function, it’s what really is at the core of distinguishing logistic and linear regression. It’ll make more sense after you look at the differences in their output equations.
With that in mind, we can get started!
- __init__(accuracy_desired=0.95, learning_rate=0.01, max_iters=1000, show_acc=True)
- Parameters
accuracy_desired – the accuracy at which the fit() method can stop
learning_rate – how fast gradient descent should run
max_iters – how many iterations of gradient descent are allowed
show_acc – whether or not to show the accuracy in the fit() method
- evaluate(x_test, y_test)
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
what % of its predictions on x_test were correct
- fit(x_train, y_train)
- Parameters
x_train – training data (2D)
y_train – training labels (1D)
- Returns
- predict(x_test)
- Parameters
x_test – prediction data (2D)
- Returns
predictions in a 1D vector/list
- reset_params()
Run this after the fit() method has been run please. :return: Nothing, just redoes the weight init for all parameters.
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions from the predict() method
y_test – labels for the data
- Returns
a matplotlib image of the predictions and the labels (“correct answers”) for you to see how well the model did.
- class sealion.regression.SoftmaxRegression(num_classes, accuracy_desired=0.95, learning_rate=0.01, max_iters=1000, show_acc=True)
Once you know logistic regression, softmax regression is a breeze. Logistic regression is really a specific type of softmax regression, where the number of classes is equal to 2. Whereas logistic regression only can predict for 2 classes (0 and 1), softmax regression can predict for N number of classes. This can be 5, 3, or even a thousand! To define the number of classes in this model, insert the
num_classes
parameter in the init. ALL parameters in this class are the same as in logistic regression except for that argument.Another note, if you use softmax regression with 2 classes - you just end up using logistic regression. In general you should use logistic regression if there are only 2 classes as it is faster and optimized as such.
If you’re interested in the theory, the primary change is from the sigmoid function to the softmax function. Also look into the log loss and crossentropy loss, both of them are at the heart of softmax and logistic regression. Maybe interesting to read up on that.
- __init__(num_classes, accuracy_desired=0.95, learning_rate=0.01, max_iters=1000, show_acc=True)
:param num_classes : number of classes your dataset has :param accuracy_desired: the accuracy at which the fit() method can stop :param learning_rate: how fast gradient descent should run :param max_iters: how many iterations of gradient descent are allowed :param show_acc: whether or not to show the accuracy in the fit() method
- evaluate(x_test, y_test)
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
what % of its predictions on x_test were correct
- fit(x_train, y_train)
- Parameters
x_train – training data (2D)
y_train – training labels (1D)
- Returns
- predict(x_test)
- Parameters
x_test – prediction data (2D)
- Returns
predictions in a 1D vector/list
- reset_params()
Run this after the fit() method has been run please. :return: Nothing, just redoes the weight init for all parameters.
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions from the predict() method
y_test – labels for the data
- Returns
a matplotlib image of the predictions and the labels (“correct answers”) for you to see how well the model did.
- class sealion.regression.RidgeRegression(alpha=0.5)
Imagine you have a dataset with 1000 features. Most of these features will usually be irrelevant to the task your solving; only a few of them will really matter. If you don’t want to use Dimensionality Reduction (see the algorithms there), you may want to consider using this. What ridge regression does is try to keep the weights as small as possible, a.k.a. regularization. This is because if a weight of a feature is not needed you want it to be 0 - you don’t want it to be 0.01 because of overfitting or the particular instances of the training data. Therefore it will work well with many features as it reduces the weights, hence making the model overfit less and generalize more (do we really need those 0.1s and 0.2s in the weights?) As StatQuest said, it “desensitizes” the training data (highly recommend you watch that video - regularization can be tough.)
You’ll be glad to know that this uses a closed form solution (generally much faster than iterative gradient descent.) There are other algorithms for regularization in regression like Lasso and Elastic Net, a combination of Lasso and Ridge, that are available in this module.
There’s only one parameter you need to worry about, which is alpha. It is simply how much to punish the model for its weights (especially unnecessary ones). It’s typically set between 0 and 1.
- __init__(alpha=0.5)
Set the alpha parameter for the model.
- Parameters
alpha – default 0.5, ranges from 0 - 1
- evaluate(x_test, y_test)
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
r^2 score
- fit(x_train, y_train)
- Parameters
x_train – 2D training data
y_train – 1D training labels
- Returns
- predict(x_test)
- Parameters
x_test – 2D prediction data
- Returns
predictions in a 1D numpy array
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions from the predict() method
y_test – labels for the data
- Returns
a matplotlib image of the predictions and the labels (“correct answers”) for you to see how well the model did.
- class sealion.regression.LassoRegression(alpha=0.5, accuracy_desired=0.95, learning_rate=0.01, max_iters=10000, show_acc=True)
Regularizer for regression models just like Ridge Regression. A few notable differences, but for the most part will do the same thing. Lasso regression tries to minimize the weights just like ridge regression, but one of its big differences is its tendency to make the weights of the regression model 0. This greatly decreases overfitting by making sure unnecessary features aren’t considered in the model.
Another difference is the use of gradient descent instead of a closed form solution like Ridge Regression. It shares the same alpha parameter to determine how much you want to “punish” (i.e. reduce) the weights, especially those not needed.
- __init__(alpha=0.5, accuracy_desired=0.95, learning_rate=0.01, max_iters=10000, show_acc=True)
- Parameters
alpha – penalty for the weights, applied to both lasso and ridge components
Check above documentation for all other parameters.
- evaluate(x_test, y_test)
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
r^2 score for the predictions of x_test and y_test
- fit(x_train, y_train)
- Parameters
x_train – training data (must be 2D)
y_train – training labels (must be 1D)
- Returns
None, just the model has the weights and biases stored
- predict(x_test)
- Parameters
x_test – testing data (must be 2D)
- Returns
predictions (1D vector/list)
- reset_params()
Resets all weights and biases of the model. :return: none
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions from the predict() method
y_test – labels for the data
- Returns
a matplotlib image of the predictions and the labels (“correct answers”) for you to see how well the model did.
- class sealion.regression.ElasticNet(l1_r=0.5, alpha=0.5, accuracy_desired=0.95, learning_rate=0.01, max_iters=1000, show_acc=True)
Elastic Net is a combination of Ridge and Lasso Regression. It implements both penalties, and you just decide how much weight each should have. The parameter
l1_r
in the__init__
of this class is the amount of “importance” lasso regression has (specifically the regularization term), on a scale from 0 - 1. If lasso regression gets an “importance” of 0.7, then we give the ridge regression part of this model an “importance” 0.3. Uses gradient descent, as there is no closed form solution.- __init__(l1_r=0.5, alpha=0.5, accuracy_desired=0.95, learning_rate=0.01, max_iters=1000, show_acc=True)
- Parameters
l1_r – The weight that lasso regression gets in this model. Default 0.5, but setting it higher tips the scale. Setting it to 0 or 1 makes it just ridge or lasso regression.
alpha – penalty for the weights, applied to both lasso and ridge components
Check above documentation for all other parameters.
- evaluate(x_test, y_test)
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
r^2 score for the predictions of x_test and y_test
- fit(x_train, y_train)
- Parameters
x_train – training data (must be 2D)
y_train – training labels (must be 1D)
- Returns
None, just the model has the weights and biases stored
- predict(x_test)
- Parameters
x_test – testing data (must be 2D)
- Returns
predictions (1D vector/list)
- reset_params()
Resets all weights and biases of the model. :return: none
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions from the predict() method
y_test – labels for the data
- Returns
a matplotlib image of the predictions and the labels (“correct answers”) for you to see how well the model did.
- class sealion.regression.ExponentialRegression
Say you’ve got some curved data. How can a line possibly fit that data? Glad you ask - that’s why this class exists. Exponential Regression does something very simple. All it does is just take the log transformation of exponential data, so it becomes a line and model that. Because now we can find
y_transformed_log = mx + b
, we can turn thaty_transformed_log
back intoy
by raising it to powere
.- evaluate(x_test, y_test)
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
r^2 score
- fit(x_train, y_train)
- Parameters
x_train – training data (2D)
y_train – training labels (1D)
- Returns
- predict(x_test)
- Parameters
x_test – prediction data (2D)
- Returns
predictions in a 1D vector/list
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions from the predict() method
y_test – labels for the data
- Returns
a matplotlib image of the predictions and the labels (“correct answers”) for you to see how well the model did.
- class sealion.regression.PolynomialRegression
Polynomial Regression is like an extended version of Linear Regression (sort of like Exponential Regression.) All it does is turn the equation from y = m1x1 + m2x2 … mNxN + bias to y = m1x1^1 + m2x2^2 … mNxN^N + bias. It just adds those power combinations to make the regression module model the data better. Neural networks can also model those functions similarily.
Please normalize your data with this module with the (X - mu) / sigma or just by dividing by the maximum value. This will help with faster convergence.
Not as famous as some other regression algorithms, so may you need a bit of experimentation.
- evaluate(x_test, y_test)
- Parameters
x_test – data to be evaluated on
y_test – labels
- Returns
r^2 score
- fit(x_train, y_train)
- Parameters
x_train – training data (2D)
y_train – training labels (1D)
- Returns
- predict(x_test)
- Parameters
x_test – points to be predicted on. Has to be stored in 2D array, even if just one data point. Ex.
[[1, 1]]
or[[1, 1], [2, 2], [3, 3]]
- Returns
flattened numpy array of your data going through the forward pass (that’s rounded as it’s either 0 or 1)
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions from the predict() method
y_test – labels for the data
- Returns
a matplotlib image of the predictions and the labels (“correct answers”) for you to see how well the model did.
Neural Networks
Models
- class sealion.neural_networks.models.NeuralNetwork(layers=None)
This class is very rich and packed with methods, so I figured it would be best to have this tutorial guide you on a tutorial on the “hello world” of machine learning.
There are two ways to initialize this model with its layers, through the
__init__()
or through theadd()
methods.The first way:
>>> from sealion import neural_networks as nn >>> layers = [nn.layers.Flatten(), nn.layers.Dense(784, 64, activation=nn.layers.ReLU()), ... nn.layers.Dense(64, 32, activation=nn.layers.ReLU())] >>> nn.layers.Dense(32, 10, activation=nn.layers.Softmax())] >>> model = nn.models.NeuralNetwork(layers)
Or you can go through the second way:
>>> from sealion import neural_networks as nn >>> model = nn.models.NeuralNetwork() >>> model.add(nn.layers.Flatten()) >>> model.add(nn.layers.Dense(784, 64, activation=nn.layers.ReLU())) >>> model.add(nn.layers.Dense(64, 32, activation=nn.layers.ReLU())) >>> model.add(nn.layers.Dense(32, 10, activation=nn.layers.ReLU()))
Either way works just fine.
Next, you will want to perhaps see how complex the model is (having too many parameters means a model can easily overfit), so for that you can just do:
>>> num_params = model.num_parameters() # returns an integer >>> assert num_params == 52650
Looks like our model will be pretty complex. Next up is finalizing and training.
>>> model.finalize(loss=nn.loss.CrossEntropy(), optimizer=nn.optimizers.Adam())
Here we use cross-entropy loss for this classification problem and the Adam optimizer.
Onto training. Assuming our data is
60k * 28 * 28
(sounds a lot like MNIST) in the variable X_train and we have y_train is a one-hot encoded matrix size60k * 10
(10 classes) we can do :>>> model.train(X_train, y_train, epochs=20) # train for 20 epochs
which will work fine. Here we have a batch_size of 32 (default), and the way we make this run fast is by making batch_size datasets, and training them in parallel via multithreading.
If you want to change batch_size to 18 you could do:
>>> model.train(X_train, y_train, epochs=20, batch_size=18) # train for 20 epochs, batch_size 18
If you want the gradients to be calculated over the entire dataset (this will be much longer) you can do:
>>> model.full_batch_train(X_train, y_train, epochs=20)
Lastly there’s also something known as mini-batch gradient descent, which just does gradient descent but randomly chooses a percent of the dataset for gradients to be calculated upon. This cannot be parallelized, but still runs fast:
>>> model.mini_batch_train(X_train, y_train, N=0.1) # here we take 10% of X_train or 6000 data-points ... # randomly selected for calculating gradients at a time
All of these methods have a show_loop parameter on whether you want to see the tqdm loop, which is set true by default.
Now that we have trained our model, time to test and use it. To evaluate it, given X_test (shape
10k * 28 * 28
) and y_test (shape10k * 10
), we can feed this into ourevaluate()
function:>>> model.evaluate(X_test, y_test)
This gives us the loss, which can be not so interpretable - so we have some other options.
If you are doing a classification problem as we are here you may do instead:
>>> model.categorical_evaluate(X_test, y_test)
which just gives what percent of X_test was classified correctly.
For regression just do:
>>> model.regression_evaluate(X_test, y_test)
which gives the
r^2
value of the predictions on X_test. If you are doing this make sure that y_test is not one-hot encoded.To predict on data we can just do:
>>> predictions = model.predict(X_test)
and if we want this in reverted one-hot-encoded form all we need to do is this:
>>> from utils import revert_softmax >>> predictions = revert_softmax(predictions)
Storing around 53,000 parameters in matrices isn’t much, but for good practice let’s store it. Using the give_parameters() method we can save our weights in a list:
>>> parameters = model.give_parameters()
Then we may want to pickle this using the given method, into a file called “MNIST_pickled_params”:
>>> file_name = "MNIST_pickled_params" >>> model.pickle_params(FILE_NAME=file_name)
Now this is the beauty - let’s say a few weeks from now we come back and realize we probably should train for 20 epochs. BUT - we don’t want to have to restart training (imagine this was a really big network.)
So we can just load the weights using the pickle module, build the same model architecture, insert the params, and hit train()!!
>>> import pickle >>> with open('MNIST_pickled_params.pickle', 'rb') as f : >>> params = pickle.load(f)
Now we can create the model structure (must be the EXACT same):
>>> model = nn.models.NeuralNetwork() >>> model.add(nn.layers.Flatten()) >>> model.add(nn.layers.Dense(784, 64, activation = nn.layers.ReLU())) >>> model.add(nn.layers.Dense(64, 32, activation = nn.layers.ReLU())) >>> model.add(nn.layers.Dense(32, 10, activation = nn.layers.ReLU()))
Obviously we want to finalize our model for good practice:
>>> model.finalize(loss = nn.loss.CrossEntropy(), optimizer = nn.optimizers.Adam())
and we can enter in the weights & biases using the enter_parameters():
>>> model.enter_parameters(params) #must be params given from the give_parameters() method
and train!
>>> model.train(X_train, y_train, epochs = 100) #let's try 100 epochs this time
The reason this is such a big deal is because we don’t have to start training from scratch all over again. The model will not have to start from 0% accuracy, but may start with 80% given the params belonging to our partially-trained model we loaded from the pickle file. This is of course not a big deal for something like MNIST but will be for bigger model architectures, or datasets.
Of course there’s a lot more things we could’ve changed, but I think that’s pretty good for now!
- __init__(layers=None)
- finalize(loss, optimizer)
Both of these have to be the classes for the loss and optimizations
Layers
- class sealion.neural_networks.layers.Layer
Base Layer class. All layers inherit from this.
- backward(grad)
This method takes in a gradient (e.x
∂l/∂Z2
) and returns its grad (e.x.dL/dA1
)
- forward(inputs)
This method saves the inputs, and returns its outputs
- class sealion.neural_networks.layers.Flatten
This would be better explained as Image Data Flattener. Let’s say your dealing with MNIST (check out the examples on github) - you will have data that is 60000 by 28 by 28. Neural networks can’t work with data like that - it has to be a matrix. What you could do is take that
60000 * 28 * 28
data and apply this layer to make the data60000 * 784
. 784 because28 * 28 = 784
and we just “squished” this matrix to one layer. If you have data that has colors, e.g.60000 * 28 * 28 * 3
, applying this layer would turn it into a60000 * 2352
matrix,2352 = 28 * 28 * 3
.An example for how this would work with something like MNIST (the grayscale
60k * 28 * 28
dataset) is shown below.>>> from sealion import neural_networks as nn >>> from sealion.neural_networks.models import NeuralNetwork >>> model = NeuralNetwork() >>> model.add(nn.layers.Flatten()) # always, always add that on the first layer >>> model.add(nn.layers.Dense(784, whatever_output_size, more_args)) # now put that data to ``28*28 = 784``. >>> # Do more cool stuff...
- class sealion.neural_networks.layers.Dropout(dropout_rate)
Dropout is one of the most well-known regularization techniques there is. Imagine you were working on a coding project with about 200 people. If we just relied on one person to know how to compile, another person to know how to debug, then what happens when those special people leave for a day, or worse leave forever?
Now I know that seems to have no connection to dropout, but here’s how it does. In dropout this sort of scenario is prevented. “Dropping out”, or setting to 0, some of the outputs of some of the neurons means that the model will have to learn that it can’t just depend on one neuron to learn the most important features. This means has each neuron learn some features, some other features, etc. and can’t just depend on one node. The model will become more robust and generalize better with Dropout as every neuron now has a better set of weights. Normally due to dropout, it will be applied in training, but then “reverse-applied” in testing. Dropout will make the training accuracy go down a bit, but remember in the end it’s testing on real-world data that matters.
There’s a parameter dropout_rate on what percent (from 0 to 1 here) you want each neuron in the layer you are at to become 0. This is essentially the chance of dropping out any given neuron, or usually what percent of neurons will be dropped out. Typical values range from 0.1 to 0.5. Example below.
Let’s say you’ve gotten your models up so far:
>>> model.add(nn.layers.Flatten()) >>> model.add(nn.layers.Dense(128, 64, activation = nn.layers.ReLU()))
And now you want to add dropout. Well just add that layer like such:
>>> dropout_probability = 0.2 >>> model.add(nn.layers.Dropout(dropout_probability))
This will just mean that about 20% of the neurons coming from this first input layer will be dropped out. A higher dropout rate may not always lead to better generalization, but usually will decrease training accuracy.
In dropout remember 2 things, not just one matter. The probability, and the layers its applied at. Experimentation is key.
- class sealion.neural_networks.layers.Dense(input_size: int, output_size: int, activation=None, weight_init='xavier')
This is the class that you will depend on the most. This class is for creating the fully-connected layers that make up Deep Neural Networks - where each neuron in a previous layer is connected to each layer in the next. Feel free to watch a couple of youtube tutorials from 3Blue1Brown (if you like calculus :) or others to get a better understanding of these parameters, or just look at the examples on my github.
The main method of course is the init. You will have to define the input_size, the output_size, and the activation and weight initialization are optional parameters.
In a neural network the number of nodes starting in a layer are known as the input_size here (fan_in in papers), and the number of nodes in the output of a layer is the output_size here (fan_out in papers.) The activation parameter is for the activation/non-linearity function you want your layers to go through. If you don’t understand what I just meant, sorry - but another way to think about it is that its a way to change the outputs of your network a bit for it to fit you dataset a little better (looking at a graph will help.) The default is no activation (some people call that Linear), so you’ll need to add that yourself. I’ll get to weight init in a bit. Examples below.
To add a layer, here’s how it’s done:
>>> from sealion import neural_networks as nn >>> model = nn.models.NeuralNetwork() >>> model.add(nn.layers.Dense(128, 64)) # input_size = 128, output_size = 64
This sets up a neural network with 128 incoming nodes (that means we have 128 numeric features), and then 64 output nodes.
Let’s say we wanted an activation function, like a relu or sigmoid (there are a bunch to choose from this API.) You could add that layer here like such:
>>> model = nn.models.NeuralNetwork() # clear all existing layers >>> model.add(nn.layers.Dense(128, 64, activation=nn.layers.Sigmoid())) # all outputs go through the sigmoid function
Onto weight initalization! The jargon starts …. now. What weight init does is make the weights come from a special distribution (typically Gaussian) with a given standard deviation based on the input and output size. The reason this is done is because you don’t want the weights to be initalized too big or else the gradients in backpropagation may cause the model to go way off and become NaNs or infinity (this is known as exploding gradients.) For neural networks that are small that solve datasets like XOR or even breast cancer this isn’t a problem, but for deep neural networks for problems like MNIST this is a huge concern. The weight_init you will want to use is also dependent on what activation you are using. Most activation functions will do well with Xavier Glorot, so that is set as the default. You can choose to also use He for ReLU, LeakyReLU, ELU, or other variants of the ReLU activation. For SELU, you may choose to use LeCun weight initalization.
The possible choices are:
>>> "xavier" # no activation, logistic/sigmoid, softmax, and tanh >>> "he" # relu + similar variants >>> "lecun" # selu >>> "none" # if you want to do this
To set this you can just do:
>>> model = nn.models.NeuralNetwork() # clear all existing layers >>> model.add(nn.layers.Dense(128, 64, activation=nn.layers.ReLU(), weight_init="he"))
Sorry for so much documentation, but this really is the class you will call the most.
- class sealion.neural_networks.layers.BatchNormalization(input_size: int, momentum=0.9, epsilon=0.001, lr=0.01)
Batch Normalization is a frequently used technique in today’s models (especially in CNNs).
Often times you are told to normalize your data (make it bell-curved shaped) before you feed it to your model - that’s because normalization makes it easier for the model to learn because the loss curve is smoother (thus getting to the minima is easier and faster.) So, why don’t we apply normalization to the inputs of all of the other layers (aside from the input layer)? That’s what batch normalization does.
I got this explanation from CodeEmporium
In order for a given hidden layer to normalize its inputs it needs to know the mean and variance (just standard deviation squared) of the inputs it usually gets. Note that this does change (because the parameters of prior layers change), so the mean and variance is not necessarily the same throughout training. To do this we need to use a moving average to approximate the mean and variance of the given inputs fed to a B.N. layer we would like to normalize.
While the B.N. first uses this mean and variance to normalize its inputs to a standard normal distribution, where the mean is 0 and the standard deviation is 1, it is not guaranteed that this distribution is optimal as inputs for the succeeding layer. So the B.N. layer also learns a mean and variance, called beta and gamma respectively, to adjust the standard normal distribution (it got by normalizing its inputs) before feeding it to the next layer.
Onto the parameters:
input_size: number of features the B.N. layer needs to normalize (just output_size of the Dense layer above)
momentum: how slowly to change the mean and variance that the B.N. layer uses for normalization. A higher momentum means that the mean and the variance the B.N. layer uses will change very slowly, whereas a lower momentum means that the mean and the variance the B.N. layer uses will change very quickly in response to changes in the B.N.’s inputs (and their mean and variance). Default 0.9.
epsilon: tiny value needed in normalization if the variance ever becomes 0 to protect against 0 division errors. Default 0.001.
lr: learning rate for the B.N. layer’s learning of the normalized mean and variance. Default 0.01.
To add Batch Normalization to your model simply do:
>>> import sealion as sl >>> model = sl.neural_networks.models.NeuralNetwork() >>> model.add(...) # add whatever Dense Layers >>> model.add(sl.neural_networks.layers.BatchNormalization(input_size = 5, momentum = 0.9, lr = 0.01))
Note you may want to experiment on whether to place a Batch Normalization layer after an activation or before, etc. I don’t know too much about this, but handsonml said this so I might as well too.
Activations
- class sealion.neural_networks.layers.Tanh
Uses the tanh activation, which squishes values from -1 to 1.
- class sealion.neural_networks.layers.Sigmoid
Uses the sigmoid activation, which squishes values from 0 to 1.
- class sealion.neural_networks.layers.Swish
Uses the swish activation, which is sort of like ReLU and sigmoid combined. It’s really just
f(x) = x * sigmoid(x)
. Not as used as other activation functions, but give it a try!
- class sealion.neural_networks.layers.ReLU
The most well known activation function and the pseudo-default almost. All it does is turn negative values to 0 and keep the rest. Basically
f(x) = max(x, 0)
- class sealion.neural_networks.layers.LeakyReLU(leak=0.01)
Variant of the ReLU activation, just allows negatives values to be something more like 0.01 instead of 0, which means the neuron is “dead” as
0 * anything
is 0.The leak is the slope of how low negative values you are willing to tolerate. Usually set from 0.001 to 0.2, but the default of 0.01 works usually quite well.
- class sealion.neural_networks.layers.ELU(alpha=1)
Solves the similar dying activation problem. The default of 1 for alpha works quite well in practice, so you won’t need to change it much.
- class sealion.neural_networks.layers.SELU
Special type of activation function, that will “self-normalize” (have a mean of 0, and a standard deviation of 1) its outputs. This self-normalization typically leads to faster convergence.
If you are using this activation function make sure weight_init is set = “lecun” in whichever layer applied. It also need its inputs (in the beginning and all throughout) to be standardized (mu = 0, sigma = 1) for it to work, so make sure to get that taken care of. You can do that by standardizing your inputs and then always using SELU activation function (remember the “lecun” part!).
- class sealion.neural_networks.layers.PReLU(lr=0.0001, momentum=0.9)
The PReLU activation function is essentially the same thing as the LeakyReLU activation, except that the “leak” parameter is learnt during training. To learn this parameter, gradient descent is used - so you have a learning rate and momentum parameter. Both are on a scale of 0 to 1. We initialize this “leak” parameter to 0.25 at the first iteration.
- class sealion.neural_networks.layers.Softmax
Softmax activation function, used for multi-class (2+) classification problems. Make sure to use crossentropy with softmax, and it is only meant for the last layer!
Losses
- class sealion.neural_networks.loss.Loss
Base loss class.
- class sealion.neural_networks.loss.MSE
MSE stands for mean-squared error, and its the loss you’ll want to use for regression. To set it in the model.finalize() method just do:
>>> from sealion import neural_networks as nn >>> model = nn.models.NeuralNetwork(layers_list) >>> model.finalize(loss=nn.loss.MSE(), optimizer=...)
and you’re all set!
- class sealion.neural_networks.loss.CrossEntropy
This loss function is for classification problems. I know there’s a binary log loss and then a multi-category cross entropy loss function for classification, but they’re essentially the same thing so I thought using one class would make it easier. Remember to use one-hot encoded data for this to work (check out utils).
If you are using this loss function, make sure your last layer is Softmax and vice versa. Otherwise, annoying error messages will occur.
To set this in the
model.finalize()
method do:>>> from sealion import neural_networks as nn >>> model = nn.models.NeuralNetwork() >>> # ... add the layers ... >>> model.add(nn.layers.Softmax()) # last layer has to be softmax >>> model.finalize(loss=nn.loss.CrossEntropy(), optimizer=...)
and that’s all there is to it.
Optimizers
- class sealion.neural_networks.optimizers.Optimizer
Base optimizer class. All optimizers extend from this.
- class sealion.neural_networks.optimizers.GD(lr=0.001, clip_threshold=inf)
The simplest optimizer - you will quickly outgrow it. All you need to understand here is that the learning rate is just how fast you want the model to learn (default 0.001) set typically from 1e-6 to 0.1. A higher learning rate may mean the model will struggle to learn, but a slower learning rate may mean the model will but will also have to take more time. It is probably the most important hyperparameter in all of today’s machine learning.
The clip_threshold is simply a value that states if the gradients is higher than this value, set it to this. This is to prevent too big gradients, which makes training harder. The default for all these optimizers is infinity, which just means no clipping - but feel free to change that. You’ll have to experiment quite a bit to find a good value. This method is known as gradient clipping.
In the
model.finalize()
method:>>> model.finalize(loss=..., optimizer=nn.optimizers.GD(lr=0.5, clip_threshold=5)) # here the learning >>> # learning rate is 0.5 and the threshold for clipping is 5.
- class sealion.neural_networks.optimizers.Momentum(lr=0.001, momentum=0.9, nesterov=False, clip_threshold=inf)
If you are unfamiliar with gradient descent, please read the docs on that in GD class for this to hopefully make more sense.
Momentum optimization is the exact same thing except with a little changes. All it does is accumulate the past gradients, and go in that direction. This means that as it makes updates it gains momentum and the gradient updates become bigger and bigger (hopefully in the right direction.) Of course this will be uncontrolled on its own, so a momentum parameter (default 0.9) exists so the previous gradients sum don’t become too big. There is also a nesterov parameter (default false, but set that true!) which sees how the loss landscape will be in the future, and makes its decisions based off of that.
An example:
>>> momentum = nn.optimizers.Momentum(lr=0.02, momentum=0.3, nesterov=True) # learning rate is 0.2, momentum at 0.3, and we have nesterov! >>> model.finalize(loss=..., optimizer=momentum)
There’s also a clip_threshold argument which you implements gradient clipping, an explanation you can find in the GD() class’s documentation.
Usually though this works really good with SGD…
- class sealion.neural_networks.optimizers.SGD(lr=0.001, momentum=0.0, nesterov=False, clip_threshold=inf)
SGD stands for Stochastic gradient descent, which means that it calculates its gradients on random (stochastic) picked samples and their predictions. The reason it does this is because calculating the gradients on the whole dataset can take a really long time. However ML is a tradeoff and the one here is that calculating gradients on just a few samples means that if those samples are all outliers it can respond poorly, so SGD will train faster but not get as high an accuracy as Gradient Descent on its own.
Fortunately though, there are work arounds. Implementing momentum and nesterov with SGD means you get faster training and also the convergence is great as now the model can go in the right direction and generalize instead of overreact to hyperspecific training outliers. By default nesterov is set to False and there is no momentum (set to 0.0), so please change that as you please.
To use this optimizer, just do:
>>> model.finalize(loss=..., optimizer=nn.optimizers.SGD(lr=0.2, momentum=0.5, nesterov=True, clip_threshold=50))
Here we implemented SGD optimization with a learning rate of 0.2, a momentum of 0.5 with nesterov’s accelerated gradient, and also gradient clipping at 50.
- class sealion.neural_networks.optimizers.AdaGrad(lr=0.001, nesterov=False, clip_threshold=inf, e=1e-10)
Slightly more advanced optimizer, an understanding of momentum will be invaluable here. AdaGrad and a whole plethora of optimizers use adaptive gradients, or an adaptive learning rate. This just means that it will assess the landscape of the cost function, and if it is steep it will slow it down, and if it is flatter it will accelerate. This is a huge deal for avoiding gradient descent from just going into a steep slope that leads to a local minima and being stuck, or gradient descent being stuck on a saddle point.
The only new parameter is e, or this incredibly small value that is meant to prevent division by zero. It’s set to 1e-10 by default, and you probably won’t ever need to think about it.
As an example:
>>> model.finalize(loss=..., optimizer=nn.optimizers.AdaGrad(lr=0.5, nesterov=True, clip_threshold=5))
AdaGrad is not used in practice much as often times it stops before reaching the global minima due to the gradients being too small to make a difference, but we have it anyways for your enjoyment. Better optimizers await!
- class sealion.neural_networks.optimizers.RMSProp(lr=0.001, beta=0.9, nesterov=False, clip_threshold=inf, e=1e-10)
RMSprop is a widely known and used algorithm for deep neural network. All it does is solve the problem AdaGrad has of stopping too early by not scaling down the gradients so much. It does through a beta parameter, which is set to 0.9 (does quite well in practice.) A higher beta means that past gradients are more important, and a lower one means current gradients are to be valued more.
An example:
>>> model.finalize(loss=..., optimizer=nn.optimizers.RMSprop(nesterov=True, beta=0.9))
Of course there is the nesterov, clipping threshold, and
e
parameter all for you to tune.
- class sealion.neural_networks.optimizers.Adam(lr=0.001, beta1=0.9, beta2=0.999, nesterov=False, clip_threshold=inf, e=1e-10)
Most popularly used optimizer, typically considered the default just like ReLU for activation functions. Combines the ideas of RMSprop and momentum together, meaning that it will adapt to different landscapes but move in a faster direction. The beta1 parameter (default 0.9) controls the momentum optimization, and the beta2 parameter (default 0.999) controls the adaptive learning rate parameter here. Once again higher betas mean the past gradients are more important.
Often times you won’t know what works best - so hyperparameter tune.
As an example:
>>> model.finalize(loss=..., optimizer=nn.optimizers.Adam(lr=0.1, beta1=0.5, beta2=0.5))
Adaptive Gradients may not always work as good as SGD or Nesterov + Momentum optimization. For MNIST, I have tried both and there’s barely a difference. If you are using Adam optimization and it isn’t working maybe try using nesterov with momentum instead.
- class sealion.neural_networks.optimizers.Nadam(lr=0.001, beta1=0.9, beta2=0.999, clip_threshold=inf, e=1e-10)
Nadam optimization is the same thing as Adam, except there’s nesterov updating. Basically this class is the same as the Adam class, except there is no nesterov parameter (default true.)
As an example:
>>> model.finalize(loss=..., optimizer=nn.optimizers.Nadam(lr=0.1, beta1=0.5, beta2=0.5))
- class sealion.neural_networks.optimizers.AdaBelief(lr=0.001, beta1=0.9, beta2=0.999, nesterov=False, clip_threshold=inf, e=1e-10)
AdaBelief is a recently popular optimizer that I thought to implement after checking in with the original paper. The way it works is by establishing the general “belief” (prediction) in where the gradient will go. If the belief and the actual given gradient are very similar, there will be a large change in the weights (larger gradients), whereas if they are very different, there will be a small change in the weights. This has worked well in practice, and its results are comparable to Adam.
For those interested in the original paper, go here : https://arxiv.org/pdf/2010.07468.pdf
Decision Trees
- class sealion.decision_trees.DecisionTree(max_branches=inf, min_samples=1)
Decision Trees are powerful algorithms that create a tree by looking at the data and finding what questions are best to ask? For example if you are trying to predict whether somebody has cancer it may ask, “Do they have cells with a
size >= 5
?” or “Is there any history of smoking or drinking in the family?” These algorithms can easily fit most datasets, and are very well capable of overfitting (luckily we offer some parameters to help with that.)If you are going to be using categorical features with this Decision Tree, make sure to one hot encode it. You can do that with the
one_hot()
function in the utils module.This Decision Tree class doesn’t support regression, or any labels that have continuous values. It is only for classification tasks. The reason this is such is because most curves and lines can be formed or modeled better with regression algorithms (Linear, Polynomial, Ridge, etc. are all available in the regression module) or neural networks. Decision Trees will typically overfit on such tasks.
- __init__(max_branches=inf, min_samples=1)
- Parameters
max_branches – maximum number of branches for the decision tree
min_samples – minimum number of samples in a branch for the branch to split
- average_branches()
- Returns
an estimate of how many branches the decision tree has right now with its current parameters.
- evaluate(x_test, y_test)
- Parameters
x_test – 2D testing data.
y_test – 1D testing labels.
- Returns
accuracy score.
- fit(x_train, y_train)
- Parameters
x_train – training data - 2D (make sure to one_hot categorical features)
y_train – training labels
- Returns
- give_tree(tree)
- Parameters
tree – a tree made by you or given by give_best_tree in the RandomForest module
- Returns
None, just that the tree you give is now the tree used by the decision tree.
- predict(x_test)
- Parameters
x_test – 2D prediction data.
- Returns
predictions in 1D vector/list.
- return_tree()
- Returns
the tree inside (if you want to look at it)
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions given by model
y_test – actual labels
- Returns
an image of the predictions and the labels.
Ensemble Learning
- class sealion.ensemble_learning.RandomForest(num_classifiers=20, max_branches=inf, min_samples=1, replacement=True, min_data=None)
Random Forests may seem intimidating but they are super simple. They are just a bunch of Decision Trees that are trained on different sets of the data. You give us the data, and we will create those different sets. You may choose for us to sample data with replacement or without, either way that’s up to you. Keep in mind that because this is a bunch of Decision Trees, classification is only supported (avoid using decision trees for regression - it’s range of predictions is limited.) The random forest will have each of its decision trees predict on data and just choose the most common prediction (not the average.)
Enjoy this module - it’s one of our best.
- __init__(num_classifiers=20, max_branches=inf, min_samples=1, replacement=True, min_data=None)
- Parameters
num_classifiers – Number of decision trees you want created.
max_branches – Maximum number of branches each Decision Tree can have.
min_samples – Minimum number of samples for a branch in any decision tree (in the forest) to split.
replacement – Whether or not any of the data points in different chunks/sets of data can overlap.
min_data – Minimum number of data there can be in any given data chunk. Each classifier is trained on a chunk of data, and if you want to make sure each chunk has 3 points for example you can set min_data = 3. It’s default is 50% of the amount of data, the None is just a placeholder.
- evaluate(x_test, y_test)
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
accuracy score
- fit(x_train, y_train)
- Parameters
x_train – 2D training data
y_train – 1D training labels
- Returns
- give_best_tree(x_test, y_test)
You give it the data and the labels, and it will find the tree in the forest that does the best. Then it will return that tree. You can then take that tree and put it into the DecisionTree class using the give_tree method.
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
tree that performs the best (dictionary data type)
- predict(x_test)
- Parameters
x_test – testing data (2D)
- Returns
Predictions in 1D vector/list.
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions from the predict() method
y_test – labels for the data
- Returns
a matplotlib image of the predictions and the labels (“correct answers”) for you to see how well the model did.
- class sealion.ensemble_learning.EnsembleClassifier(predictors, classification=True)
Aside from random forests, voting/ensemble classifiers are also another popular way of ensemble learning. How it works is by training multiple different classifiers (you choose!) and predicting the most common class (or the average for regression - more on that later.) Pretty simple actually, and works quite effectively. This module also can tell you the best classifier in a group with its get_best_predictor(), so that could be useful. Similar to
give_best_tree()
in the random forest module, what it does is give the class of the algorithm that did the best on the data you gave it. This can also be used for rapid hypertuning on the exact same module (giving the same class but with different parameters in the init.)Example:
>>> from sealion.regression import SoftmaxRegression >>> from sealion.naive_bayes import GaussianNaiveBayes >>> from sealion.nearest_neighbors import KNearestNeighbors >>> ec = EnsembleClassifier({'algo1': SoftmaxRegression(num_classes=3), 'algo2': GaussianNaiveBayes(), 'algo3': KNearestNeighbors()}, ... classification=True) >>> ec.fit(X_train, y_train) >>> y_pred = ec.predict(X_test) # predict >>> ec.evaluate_all_predictors(X_test, y_test) algo1 : 95% algo2 : 90% algo3 : 75% >>> best_predictor = ec.get_best_predictor(X_test, y_test) # get the best predictor >>> print(best_predictor) # is it Softmax Regression, Gaussian Naive Bayes, or KNearestNeighbors that did the best? <regression.SoftmaxRegression object at 0xsomethingsomething> >>> y_pred = best_predictor.predict(X_test) # looks like softmax regression, let's use it
Here we first important all the algorithms we are going to be using from their respective modules. Then we create an ensemble classifier by passing in a dictionary where each key stores the name, and each value stores the algorithm.
Classification = True
by default, so we didn’t need to put that (if you want regression put it to False. A good way to rememberclassification = True
is the default is that this is an EnsembleCLASSIFIER.)We then fitted that and got it’s predictions. We saw how well each predictor did (that’s where the names come in) through the
evaluate_all_predictors()
method. We could then get the best predictor and use that class. Note that this class will ONLY use algorithms other than neural networks, which should be plenty. This is because neural networks have a differentevaluate()
method and typically will be more random in performance than other algorithms.I hope that example cleared anything up. The
fit()
method trains in parallel (thanks joblib!) so it’s pretty fast. As usual, enjoy this algorithm!- __init__(predictors, classification=True)
- Parameters
predictors – dict of
{name (string): algorithm (class)}
. See example above.classification – is it a classification or regression task? default classification - if regression set this to False.
- evaluate(x_test, y_test)
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
accuracy score
- evaluate_all_predictors(x_test, y_test)
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
None, just prints out the name of each algorithm in the predictors dict fed to the __init__ and its score on the data given.
- fit(x_train, y_train)
- Parameters
x_train – 2D training data
y_train – 1D training labels
- Returns
- get_best_predictor(x_test, y_test)
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
the class of the algorithm that did best on the given data. look at the above example if this doesn’t make sense.
- predict(x_test)
- Parameters
x_test – testing data (2D)
- Returns
Predictions in 1D vector/list.
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions from the predict() method
y_test – labels for the data
- Returns
a matplotlib image of the predictions and the labels (“correct answers”) for you to see how well the model did.
Mixtures
- class sealion.mixtures.GaussianMixture(n_clusters=5, retries=3, max_iters=200, kmeans_init=True)
Gaussian Mixture Models are really just a fancier extension of KMeans. Make sure you know KMeans before you read this. If you are unfamiliar with KMeans, feel free to look at the examples on GitHub for unsupervised clustering or look at the unsupervised clustering documentation (which contains the KMeans docs.) You may also want to know what a gaussian (normal) distribution is.
In KMeans, you make the assumption that your data is in spherical clusters, all of which are the same shape. What if your data is instead in such circles, but also maybe ovals that are thin, tall, etc. Basically what if your clusters are spherical-esque but of different shapes and sizes.
This difference can be measured by the standard deviation, or variance, of each of the clusters. You can think of each cluster as a gaussian distribution, each with a different mean (similiar to the centroids in KMeans) and standard deviation (which effects how skinny/wide it is.) This model is called a mixture, because it essentially is just a mixture of gaussian functions. Some of the clusters maybe bigger than others (aside from shape), and this is also taken into account with a mixture weight - a coefficient that is multiplied to each gaussian distribution.
With this gaussian distribution, you can then take any points and assign them to the cluster they have the highest change of being in. Because this is probabilities, you can say that a given data point has a 70% chance of being in this class, 20% this class, 5% this class, and 5% this other class - which is something you cannot do with KMeans. The parameters of the gaussian distribution are learnt using an algorithm known as Expectation Maximization, which you may want to learn about (super cool)!
You can also do anomaly detection with Gaussian Mixture Models! You do this by looking at the probability each data point belonged to the cluster it had the highest chance of being in. We can call this list of a probability that a data point belonged to the cluster it was chosen to for all data points “confidences”. If a given data points confidence is in the lowest _blank_ (you set this) percent of confidences, it’s marked as an anomaly. This _blank_ is from 1 to 100, and is essentially what percent of your data you think are outliers.
To find the best value for n_clusters, we also have a visualize_elbow_curve method to do that for you. Check the docs below if you’re interested!
That’s a lot, so let’s take a look at its methods!
NOTE: X SHOULD BE 2D FOR ALL METHODS
- __init__(n_clusters=5, retries=3, max_iters=200, kmeans_init=True)
- Parameters
n_clusters – number of clusters assumed in the data
retries – number of times the algorithm will be tried, and then the best solution picked
- Max_iters
maximum number of iterations that the algorithm will be ran for each retry
- Kmeans_init
whether the centroids found by using KMeans should be used to initialize the means in this Gaussian Mixture. This will only work if your data is in more than one dimension. If you are using a lot of retries, this may not be a good option as it may lead to the same solution over and over again.
- aic()
Returns the AIC metric (lower is better.)
- anomaly_detect(X, threshold)
- Parameters
X – prediction data
threshold – what percent of the data you believe is an outlier
- Returns
whether each data point in X is an anomaly based on whether its confidence in the cluster it was assigned to is in the lowest threshold percent of all of the confidences.
- bic()
Returns the BIC metric (lower is better.)
- confidence_samples(X)
This method essentially gives the highest probability in the rows of probabilities in the matrix that is the output of the
soft_predict()
method.Translation:
It’s telling you the probability each data point had in the cluster it had the largest probability in (and ultimately got assigned to), which is really telling you how confident it is that a given data point belongs to the data.
- Parameters
X – prediction data
- fit(X)
this method finds the parameters needed for Gaussian Mixtures to function
:param X : training data
- predict(X)
this method returns the cluster each data point in X belongs to
:param X : prediction data
- soft_predict(X)
this method returns the probability each data point belonged to each cluster. This is stored in a matrix with the length of X rows and the width of the amount of clusters.
:param X : prediction data
- visualize_clustering(color_dict)
This method will not work for data that has only 1 dimension (univariate.) It will plot the data you just gave in the fit() method.
- Parameters
color_dict – parameter of the label a cluster was assigned to to its color (must be matplotlib compatible) The color dict could be
{0 : "green", 1 : "blue", 2 : "red"}
for example.
- visualize_elbow_curve(min_n_clusters=2, max_n_clusters=5)
This method tries different values for n_cluster, from min_n_cluster to max_n_cluster, and then plots their AIC and BIC metrics. Finding the n_cluster that leads to the “elbow” is probably the optimal n_cluster value.
- Parameters
min_n_clusters – the minimum value of n_clusters to be tried
max_n_clusters – the max value of n_clusters to be tried
Nearest Neighbors
- class sealion.nearest_neighbors.KNearestNeighbors(k=5, regression=False)
Arguably the easiest machine learning algorithm to make and understand. Simply looks at the k closest points (you give these points) for a data point you want to predict on, and if the majority of the closest k points are class X, it will predict back class X. The k number is decided by you, and should be odd (what happens if there’s a tie?). If used for regression (we support that), it will just take the average of all of their values. If you are going to use regression, make sure to set regression = True - otherwise you will get a very low score in the evaluate() method as it will assume it’s for classification (KNNs typically are.)
A great introduction into ML - maybe consider using this and then seeing if you can beat it with your own version from scratch.If you can - please send it on GitHub!
Other than that, enjoy!
- __init__(k=5, regression=False)
- Parameters
k – number of points for a prediction used in the algorithm
regression – are you using regression or classification (if classification - do nothing)
- evaluate(x_test, y_test)
- Parameters
x_test – testing data (2D)
y_test – testing labels (1D)
- Returns
accuracy score (r^2 score if regression = True)
- fit(x_train, y_train)
- Parameters
x_train – 2D training data
y_train – 1D training labels
- Returns
- predict(x_test)
- Parameters
x_test – 2D prediction data
- Returns
predictions in 1D vector/list
- visualize_evaluation(y_pred, y_test)
- Parameters
y_pred – predictions given by model, 1D vector/list
y_test – actual labels, 1D vector/list
Visualize the predictions and labels to see where the model is doing well and struggling.
Utilities
- sealion.utils.confusion_matrix(y_pred, y_test, plot=True)
Confusion matrices are an often used tool for seeing how well your model did and where it can improve.
A confusion matrix is just a matrix with the number of a certain class classified as another. So if your classifier predicted
[0, 1]
and the correct answers were[1, 0]
- then you would get 1 zero predicted as a 1, 1 one predicted as a 0, and no 0s predicted as 0s and 1s predicted as 1s. By default this method will plot the confusion matrix, but if you don’t want it just set it to False.Some warnings - make sure that y_pred and y_test are both 1D and start at 0. Also make sure that all predictions in y_pred exist in y_test. This will probably not be a big deal with traditional datasets for machine learning.
To really understand this function try it out - it’ll become clear with the visualization.
- Parameters
y_pred – predictions (1D)
y_test – labels (1D)
plot – whether or not to plot, default True
- Returns
the matrix, and show a visualization of the confusion matrix (if plot = True)
- sealion.utils.one_hot(indices, depth)
So you’ve got a feature in a data where a certain value represents a certain category. For example it could be that 1 represents sunny, 2 represents rainy, and 0 represents cloudy. Well what’s the problem? If you feed this to your model - it’s going to think that 2 and 1 are similar, because they are just 1 apart - despite the fact that they are really just categories. To fix that you can feed in your features - say it’s a list like
[2, 2, 1, 0, 1, 0]
and it will be one hot encoded to whatever depth you please.Here’s an example (it’ll make it very clear):
>>> features_weather = [1, 2, 2, 2, 1, 0] >>> one_hot_features = one_hot(features_weather, depth = 3) #depth is three here because you have 3 categories - rainy, sunny, cloudy >>> one_hot_features [[0, 1, 0], #1 is at first index [0, 0, 1], #2 is at second index [0, 0, 1], #2 is at second index [0, 0, 1], #2 is at second index [0, 1, 0], #1 is at first index [1, 0, 0]] #0 is at 0 index
That looks like our data features, one hot encoded at whatever index. Make sure to set the depth param correctly.
For these such things, play around - it will help.
- Parameters
indices – features_weather or something similar as shown above
depth – How many categories you have
- Returns
one-hotted features
- sealion.utils.revert_one_hot(one_hot_data)
Say from the
one_hot()
data you’ve gotten something like this :[[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
and you want to change it back to its original form. Use this function - it will turn that one-hotted data above to
[2, 0, 1, 2, 0]
- which is just the index of the one in each data point of the list. Hence it is reverting the one_hot transformation.- Parameters
one_hot_data – data in one_hot form. Must be a numpy array (2D)!
- Returns
index of the one for each of the data points in a 1D numpy array
- sealion.utils.revert_softmax(softmax_data)
Say from the softmax function (in neural networks) you’ve gotten something like this :
[[0.2, 0.3, 0.5], [0.8, 0.1, 0.1], [0.15, 0.8, 0.05], [0.3, 0.3, 0.4], [0.7, 0.15, 0.15]]
and you want to change it back to its pre-softmax form. Use this function - it will turn that softmax-ed data above to
[2, 0, 1, 2, 0]
- which is just the index of the one in each data point of the list. Hence it is reverting the softmax transformation.- Parameters
softmax_data – data in one_hot form. Must be a numpy array (2D)!
- Returns
index of the one for each of the data points in a 1D numpy array
- sealion.utils.precision(tp, fp, tn, fn)
Precision is simply a measure of how much of the data we said are positive are actually positive. Used mostly for binary classification.
- Parameters
tp – number of true positives
fp – number of false positives
tn – number of true negatives
fn – number of false negatives
- Returns
precision metric
- sealion.utils.recall(tp, fp, tn, fn)
Recall is a measure of how much of the data that is actually positive was classified to be positive. There’s a tradeoff between precision and recall, because as you decrease precision by predicting less positives, you increase recall by predicting more negatives (that are really positive - this is known as a false negative.)
- Parameters
tp – number of true positives
fp – number of false positives
tn – number of true negatives
fn – number of false negatives
- Returns
recall metric
- sealion.utils.f1_score(precision, recall)
The f1_score is harmonic mean between the precision and recall scores. It is used to assemble them into one score. The harmonic mean is used often times to combine information about different measures, even if they are in different units.
:param precision : precision score :param recall : recall score :return: f1_score