Creating a Simple Model

This guide covers creating a simple model with Terrene using the Titanic dataset.

Step 1: Acquire Training Data

For this guide we will use the Titanic dataset from Kaggle. It lists passengers who were on the Titanic along with some information about each of them, such as which cabin they stayed in, how old they were, and whether they had a sibling on board, and it records whether or not they survived.

Our goal for this tutorial is to create a model that, based on the training data, can predict whether a hypothetical passenger would have survived the Titanic had they been on board.

Step 2: Create a Reducer

First, log in to your account and click New Reducer in the menu on the left-hand side.

After creating a new reducer, add the code that parses the training data. Because the data is a simple CSV file, all we have to write is the following:

import pandas
# Parse the uploaded CSV and assign the resulting DataFrame back to content.
content = pandas.read_csv(content)

To learn more about writing custom reducers, please read the reducers part of the documentation.

Now test that your reducer works properly by uploading the training data you downloaded in step 1.

Once you click Test Run, a pane should open on the left-hand side displaying a chunk of your parsed data.
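Before uploading anything, the same parsing step can be tried locally. The sketch below is only an illustration: the rows are a small made-up sample in the shape of the Titanic file, and it assumes that content reaches the reducer as something pandas.read_csv can open (here, an in-memory text buffer). How Terrene actually passes content to a reducer is not specified in this guide.

```python
import io
import pandas

# Illustrative sample only; the real file has more rows and columns.
sample_csv = (
    "PassengerId,Survived,Pclass,Sex,Age\n"
    "1,0,3,male,22\n"
    "2,1,1,female,38\n"
    "3,1,3,female,26\n"
)

# Assumption: `content` is a file-like object that read_csv can consume.
content = io.StringIO(sample_csv)

# The reducer body from this guide.
content = pandas.read_csv(content)

# Inspect the parsed result: number of rows/columns and the column names.
print(content.shape)
print(list(content.columns))
```

If the shape and column names look right locally, the same two reducer lines should behave the same way on the full file, provided the assumption about content holds.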

Step 3: Create a TFrame

To train our model, we first need to store the training data in a TFrame. To do so, click New TFrame in the left-hand menu. After creating the TFrame, click the upload icon in its top menu.

Then select the training data and the reducer you just created, and click Upload Data.

Your parsed data should now be displayed in the TFrame.

Step 4: Create a Predictive Model

To create a predictive model, select the Survived column in the TFrame. This will mark Survived as the target variable for your model. The target variable is what your model will try to predict.

Your target variable has to be a numerical value. If you are trying to predict a categorical value, please convert it to a numerical value in the reducer code.
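To make the target/input distinction concrete, here is a minimal pandas sketch of the same split done by hand. It is not how Terrene does it internally; the frame and its values are a made-up sample.

```python
import pandas

# Illustrative sample; column names follow the Titanic dataset.
frame = pandas.DataFrame({
    "Pclass": [3, 1, 3],
    "Age": [22, 38, 26],
    "Survived": [0, 1, 1],
})

# The target variable: the column the model tries to predict.
target = frame["Survived"]

# The input variables: every remaining column.
inputs = frame.drop(columns=["Survived"])

print(list(inputs.columns))
print(target.tolist())
```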
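As a sketch of such a conversion, the snippet below maps a text-valued target to 0 and 1 with pandas. The "yes"/"no" labels are hypothetical (the Kaggle Survived column is already numerical), and in a real reducer content would come from the parsing step rather than being built by hand.

```python
import pandas

# Hypothetical data: the target is stored as text instead of 0/1.
content = pandas.DataFrame({
    "Age": [22, 38, 26],
    "Survived": ["no", "yes", "yes"],
})

# Replace each category with a number so the target column is numerical.
content["Survived"] = content["Survived"].map({"no": 0, "yes": 1})

print(content["Survived"].tolist())
```

Note that map turns any label missing from the dictionary into NaN, so it is worth checking the column for missing values after converting.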

Terrene will automatically select the remaining variables as your input variables. It will automatically encode non-numerical input variables and will also do some feature engineering on the data.
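This guide does not say which encoding Terrene applies. For intuition only, the sketch below shows one common way to encode a non-numerical column, one-hot encoding with pandas.get_dummies; treat it as an illustration of the idea, not as Terrene's method.

```python
import pandas

# Illustrative sample with one non-numerical input variable.
inputs = pandas.DataFrame({
    "Age": [22, 38, 26],
    "Sex": ["male", "female", "female"],
})

# One-hot encode "Sex": one 0/1 column per distinct category.
encoded = pandas.get_dummies(inputs, columns=["Sex"], dtype=int)

print(list(encoded.columns))
print(encoded["Sex_male"].tolist())
```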

After the pane opens up on the left, unselect "Unnamed: 0", "PassengerId", "Name", "Fare", and "Ticket" from the input variables, because these variables are mostly unique to each passenger and we would not have them to supply when asking the model for predictions.

This will create your model. Once redirected to the model page, scroll down to the Train Model section and select the TFrame to train your model with.

After you select a TFrame, Terrene will suggest algorithms to use. Pick the first one, then click Start Training.

Terrene will then start training your model and will show the progress in real time. Once the training stops, your model is trained. To test your model, click Populate Demo Data, then Predict, to make some predictions.
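One way to spot mostly unique columns yourself is to compare the number of distinct values in each column with the number of rows. The following is a rough sketch on made-up rows; the 0.9 threshold is an arbitrary assumption, not a Terrene setting.

```python
import pandas

# Illustrative sample; column names follow the Titanic dataset.
frame = pandas.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Pclass": [3, 1, 3, 1],
})

# Fraction of distinct values per column; 1.0 means every row differs.
unique_ratio = frame.nunique() / len(frame)

# Assumed cutoff: treat columns above 0.9 as identifiers, not features.
mostly_unique = unique_ratio[unique_ratio > 0.9].index.tolist()

# Drop those columns from the input variables.
inputs = frame.drop(columns=mostly_unique)

print(mostly_unique)
print(list(inputs.columns))
```

A column that is different on almost every row works like an identifier, so a model can memorize it without learning anything that carries over to a new passenger.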

Next Steps

Now that you know how to make predictive models with Terrene, we suggest checking out the full predictive models documentation to learn about probabilistic models, anomaly detection, and the other model types that Terrene supports.