Scheduling Automatic Model Training

This guide shows how to create a pipeline that automatically fetches new training data and retrains your models.

Problem Statement

  1. Update the training data from a remote source every 30 minutes.

  2. Retrain the predictive model on the updated training data after every update.

Step 1: Write a Reducer to Pull New Data

To learn more about creating and customizing reducers, see the reducers documentation and the simple model creation guide.

Terrene supports the requests Python package. Assuming you have an API endpoint that returns new training data, you can create a reducer with the following code:

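import pandas
import requests
content = pandas.read_content(content)  # read the existing data inside the reducer (not a standard pandas call; see the reducers documentation)
res = requests.get('https://my-api-endpoint', timeout=10)
res.raise_for_status()  # stop instead of merging an error response
new_data = pandas.DataFrame(res.json()['data'])
# save the new data by adding it to the old data (DataFrame.append was removed in pandas 2.0)
content = pandas.concat([content, new_data], ignore_index=True)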
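The same logic can be tried outside Terrene with the sketch below, which separates the HTTP call from the merge so the merge can be checked without a network. It assumes, as the reducer above does, that the endpoint returns JSON shaped like {"data": [...]}. The function names fetch_payload and merge_new_data are hypothetical, and dropping exact duplicate rows is an added assumption, useful only if consecutive fetches can return overlapping records.

```python
import pandas as pd


def fetch_payload(url: str, timeout: float = 10.0) -> dict:
    """Fetch new rows from a (hypothetical) endpoint that returns {"data": [...]}."""
    import requests  # imported here so merge_new_data works even without requests installed

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly instead of merging an error page
    return response.json()


def merge_new_data(existing: pd.DataFrame, payload: dict) -> pd.DataFrame:
    """Append the rows in payload["data"] to the existing training data."""
    new_data = pd.DataFrame(payload["data"])
    combined = pd.concat([existing, new_data], ignore_index=True)
    # Assumption: identical rows are repeats from overlapping fetches, so keep one copy.
    return combined.drop_duplicates(ignore_index=True)
```

Inside the reducer, the equivalent would be content = merge_new_data(content, fetch_payload('https://my-api-endpoint')).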

Step 2: Create a Pipeline

Click New Pipeline in the left menu to create a new pipeline.

Once you have navigated to the pipeline page, do the following in the Insert a New Block section on the right-hand side:

  1. Select your training data TFrame

  2. Select the reducer you created in step 1

  3. Select Override Existing Data

  4. Set the context to "{}" (an empty context; for more information, see the reducers documentation)

  5. Click on Insert Block
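Because the Step 1 reducer already returns the old rows plus the new ones, overriding the TFrame is what keeps each row stored once. The plain-Python sketch below models that reasoning under an assumption: that without Override Existing Data the reducer's output would be added to the data already stored. run_reducer_block, the override flag, and fetch_and_append are hypothetical stand-ins, not Terrene's API.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Reducer = Callable[[List[Row]], List[Row]]


def run_reducer_block(stored: List[Row], reducer: Reducer, override: bool) -> List[Row]:
    """Hypothetical model of a reducer block writing its output back to a TFrame."""
    output = reducer(stored)  # the reducer sees the current rows and returns old + new
    if override:
        return output  # replace the stored rows with the reducer's output
    return stored + output  # assumed non-override behaviour: add the output to what is stored


def fetch_and_append(rows: List[Row]) -> List[Row]:
    """Stand-in for the Step 1 reducer: existing rows plus one newly fetched row."""
    return rows + [{"x": len(rows) + 1}]
```

With override=False, the single original row ends up stored twice after just one run, which is the duplication the override avoids under this assumption.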

Now insert another block by doing the following:

  1. Select your predictive model

  2. Select Model Training as your resource action

  3. Select your training data TFrame

  4. For Model Trainer, select the first suggested trainer

  5. Click on Insert Block
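The two blocks are meant to run in the order they were inserted, so the training block sees the TFrame as the reducer block left it. The minimal sketch below illustrates that ordering, assuming blocks run sequentially and share one data store. run_pipeline, update_block, and train_block are hypothetical, and the "model" is just the mean of a y column so the example needs no machine learning library.

```python
from statistics import mean
from typing import Callable, Dict, List

State = Dict[str, object]
Block = Callable[[State], None]


def update_block(state: State) -> None:
    """Stand-in for block 1: combine stored rows with newly fetched ones and override."""
    rows = list(state["training_data"])
    rows.extend(state.pop("incoming", []))
    state["training_data"] = rows


def train_block(state: State) -> None:
    """Stand-in for block 2: fit a trivial model (mean of y) on all current rows."""
    state["model"] = mean(row["y"] for row in state["training_data"])


def run_pipeline(blocks: List[Block], state: State) -> State:
    """Run blocks strictly in insertion order, sharing one state dict."""
    for block in blocks:
        block(state)
    return state
```

Swapping the two blocks would train on the data from before the fetch (in the test data, a mean of 2.0 instead of 3.0), which is why the reducer block goes first.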

To test your pipeline, click Run Pipeline in the bottom menu and check that it works. To schedule a recurring run of your pipeline, click Schedule New Job, select the interval at which the job should recur (every 30 minutes for this problem), and click Create.
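The interval itself is set in the Schedule New Job dialog. The sketch below only illustrates what a 30-minute interval means in practice, assuming runs fall on a fixed grid of start + k × interval; next_runs is a hypothetical helper, not part of Terrene.

```python
from datetime import datetime, timedelta
from typing import List


def next_runs(start: datetime, interval: timedelta, now: datetime, count: int) -> List[datetime]:
    """Return the next `count` run times on the grid start + k * interval strictly after `now`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if now < start:
        first = start
    else:
        # Whole intervals already elapsed since start, plus one for the upcoming slot.
        elapsed = (now - start) // interval
        first = start + (elapsed + 1) * interval
    return [first + i * interval for i in range(count)]
```

For a job starting at midnight with a 30-minute interval, checking at 01:10 gives next runs at 01:30, 02:00, and 02:30.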

Next Steps

Now that you know how to create recurring pipelines, we suggest taking a look at the guide on invoking pipelines programmatically.