Creating a Simple Pipeline

This guide covers creating a pipeline that automatically fetches new training data and retrains our models.

Problem Statement

  1. Update the training data from a remote source every 30 minutes

  2. Retrain the predictive model on the updated training data after every update

Step 1: Writing a Reducer to Pull New Data

To learn more about creating reducers and making custom reducers, please read the reducers documentation and simple model creation guide.

Terrene supports the requests Python package. Assuming you have an API endpoint that returns new training data, you can create a reducer with the following code:

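import pandas
import requests

# Read the existing data inside the reducer.
# NOTE: read_content is not a standard pandas function; it is assumed here
# to be provided by the Terrene reducer environment.
content = pandas.read_content(content)

# Fetch the new training data from the API endpoint.
res = requests.get("https://my-api-endpoint").json()
new_data = pandas.DataFrame(res["data"])

# Save the new data by adding it to the old data.
# pandas.concat is used because DataFrame.append was removed in pandas 2.0.
content = pandas.concat([content, new_data])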
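
The reducer above indexes res["data"] directly, so a malformed response would surface as a KeyError or an oddly shaped DataFrame. A minimal sketch of a guard that could run before the DataFrame is built is shown here. The helper name extract_rows and the payload shape (a JSON object with a "data" list of row objects) are assumptions for illustration, not part of Terrene or of any particular API.

```python
def extract_rows(payload):
    """Return the list of row dicts from an API payload, or raise ValueError.

    Assumes (hypothetically) that the endpoint returns {"data": [{...}, ...]}.
    """
    if not isinstance(payload, dict) or "data" not in payload:
        raise ValueError('response must be a JSON object with a "data" key')
    rows = payload["data"]
    if not isinstance(rows, list):
        raise ValueError('"data" must be a list of rows')
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            raise ValueError(f"row {i} is not an object")
    return rows


# Made-up payload for illustration; in the reducer this would be the result
# of requests.get(...).json(), and rows would be passed to pandas.DataFrame.
sample = {"data": [{"x": 1.0, "y": 0}, {"x": 2.5, "y": 1}]}
rows = extract_rows(sample)
print(len(rows))
```

Failing early with a clear message makes a broken endpoint easier to diagnose than an error raised later during training.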

Step 2: Create a Pipeline

Click on New Pipeline on the left menu to create a new pipeline.

On the pipeline page, under the Insert a New Block section on the right-hand side, do the following:

  1. Select your training data TFrame

  2. Select the reducer you created in step 1

  3. Set the context to "{}" (an empty context; for more information, see the reducers documentation)

  4. Click on Insert Task
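
The context "{}" used in step 3 is an empty JSON object. As a rough illustration of the difference between an empty and a non-empty context, the sketch below parses a context string and falls back to defaults for any missing key. How Terrene passes the context to a reducer is not described in this guide, so the function parse_context, the keys, and the default values are all hypothetical.

```python
import json

# Hypothetical defaults; neither key is defined by Terrene.
DEFAULTS = {"endpoint": "https://my-api-endpoint", "timeout_seconds": 10}


def parse_context(context_text):
    """Parse a JSON context string and overlay it on the defaults."""
    parsed = json.loads(context_text)
    if not isinstance(parsed, dict):
        raise ValueError("context must be a JSON object")
    settings = dict(DEFAULTS)   # copy so the defaults are never mutated
    settings.update(parsed)     # keys present in the context win
    return settings


empty = parse_context("{}")                          # nothing overridden
custom = parse_context('{"timeout_seconds": 30}')    # one key overridden
print(empty)
print(custom)
```

With an empty context every setting keeps its default, which is why "{}" is a reasonable choice when the reducer needs no parameters.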

Now insert another block by doing the following:

  1. Select your predictive model

  2. Select Train Model as your resource action

  3. Select your training data TFrame

  4. For model trainer, select the first one that is suggested to you

  5. Click on Insert Task

To test your pipeline, click Run Pipeline on the bottom menu and confirm that it runs successfully.
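
Conceptually, the pipeline built in this guide runs its two blocks in order: the reducer refreshes the training data, then the model is retrained on the result. The sketch below models that ordering in plain Python so the flow is easier to reason about. It is not Terrene's implementation, and fetch_new_rows, train_model, and run_pipeline are hypothetical stand-ins with made-up data.

```python
def fetch_new_rows():
    """Stand-in for the reducer: returns newly fetched rows (made-up data)."""
    return [{"x": 3.0, "y": 1}]


def train_model(rows):
    """Stand-in for Train Model: records how many rows it was trained on."""
    return {"trained_on": len(rows)}


def run_pipeline(existing_rows):
    """Run the two blocks in order so retraining sees the updated data."""
    updated = existing_rows + fetch_new_rows()  # block 1: update training data
    model = train_model(updated)                # block 2: retrain on the update
    return updated, model


data, model = run_pipeline([{"x": 1.0, "y": 0}, {"x": 2.5, "y": 1}])
print(model)
```

The ordering matters: if the training block ran first, the model would be fitted on stale data and the newly fetched rows would only be used on the following run.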