Saving & Training Multiple Models

Subscribe to Tech with Tim

YouTube

In this tutorial we will be creating multiple models and saving ones that generate the best scores. We will also be plotting our data points to see a graphical representation of data correlation.

Installing Packages

In this tutorial we will need to install one more module called matplotlib. We will also be using the module pickle that does not need to be installed.

Similarly to before we will activate our environment using activate "environment name" and then type pip install matplotlib in the command prompt.

Importing Modules

We need to import pickle and matplotlib as well as all of the previous modules before starting.

#Import Libraries
import numpy as np
import pandas as pd
from sklearn import linear_model
import sklearn
from sklearn.utils import shuffle
import matplotlib.pyplot as plt
from matplotlib import style
import pickle

Saving Our Model

To save our model we will write to a new file using pickle.dump().

with open("studentgrades.pickle", "wb") as f:
    pickle.dump(linear, f)

# linear is the name of the model we created in the last tutorial
# it should be defined above this

Loading Our Model

Once we've saved our model we can load it in using the following two lines. Now you can remove the code that creates and trains our model as we are simply loading in an existing one from our pickle file.

pickle_in = open("studentgrades.pickle", "rb")
linear = pickle.load(pickle_in)

# Now we can use linear to predict grades like before

Training Multiple Models

You may have noticed that our models vary in accuracy. This is because when we split the data into training and testing data it is divided differently each time. Since our model trains very quickly it may be worth training multiple models and saving the best one. We can do this in the following way.

# TRAIN MODEL MULTIPLE TIMES FOR BEST SCORE
best = 0
for _ in range(20):
    x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)

    linear = linear_model.LinearRegression()

    linear.fit(x_train, y_train)
    acc = linear.score(x_test, y_test)
    print("Accuracy: " + str(acc))
    
    # If the current model has a better score than one we've already trained then save it
    if acc > best:
        best = acc
        with open("studentgrades.pickle", "wb") as f:
            pickle.dump(linear, f)

Plotting Our Data

To get a visual representation of our data we can plot it using the matplotlib library we installed earlier. We are going to use a scatter plot to visualize our data.

# Drawing and plotting model
plot = "failures" # Change this to G1, G2, studytime or absences to see other graphs
plt.scatter(data[plot], data["G3"]) 
plt.legend(loc=4)
plt.xlabel(plot)
plt.ylabel("Final Grade")
plt.show()

Full Code

#Import Library
import numpy as np
import pandas as pd
from sklearn import linear_model
import sklearn
from sklearn.utils import shuffle
import matplotlib.pyplot as plt
from matplotlib import style
import pickle

style.use("ggplot")

data = pd.read_csv("student-mat.csv", sep=";")

predict = "G3"

data = data[["G1", "G2", "absences","failures", "studytime","G3"]]
data = shuffle(data) # Optional - shuffle the data

x = np.array(data.drop([predict], 1))
y =np.array(data[predict])
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)


# TRAIN MODEL MULTIPLE TIMES FOR BEST SCORE
best = 0
for _ in range(20):
    x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)

    linear = linear_model.LinearRegression()

    linear.fit(x_train, y_train)
    acc = linear.score(x_test, y_test)
    print("Accuracy: " + str(acc))

    if acc > best:
        best = acc
        with open("studentgrades.pickle", "wb") as f:
            pickle.dump(linear, f)

# LOAD MODEL
pickle_in = open("studentgrades.pickle", "rb")
linear = pickle.load(pickle_in)


print("-------------------------")
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
print("-------------------------")

predicted= linear.predict(x_test)
for x in range(len(predicted)):
    print(predicted[x], x_test[x], y_test[x])


# Drawing and plotting model
plot = "failures"
plt.scatter(data[plot], data["G3"])
plt.legend(loc=4)
plt.xlabel(plot)
plt.ylabel("Final Grade")
plt.show()

Introduction

Linear Regression P.1

Linear Regression P.2

Saving & Training Multiple Models

KNN P.1 – Irregular Data

KNN P.2 – How Does it Work?

KNN P.3 – Implementation

SVM P.1 – Loading Sklearn Datasets

SVM P.2 – How Does A SVM Work?

SVM P.3 – Implementation

K Means Clustering P.1 – How it Works

K Means Clustering P.2 – Implementing K Means

Next lessonKNN P.1 – Irregular Data

Design & Development by