Reducers

This page covers the technical details of implementing your own reducers.

Reducers are hosted functions on Terrene that load, transform, and extract your data. A reducer can run any code that can be executed within a Docker container. Reducers can also be chained together, for example to do the following:

  1. Load the data using Python

  2. Manipulate the data using Java

  3. Further manipulate the data using C++

  4. Upload the data to a webserver using Python
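
The chaining idea above can be sketched locally as a purely illustrative analogy. This is not Terrene's API: the stage functions and the run_chain helper are hypothetical names, and each stage simply takes the previous stage's pandas DataFrame and returns a new one.

```python
from functools import reduce

import pandas


def load_stage(_):
    # Hypothetical first stage: produce the initial data.
    return pandas.DataFrame({"value": [1, 2, 3]})


def scale_stage(frame):
    # Hypothetical second stage: transform the data from the previous stage.
    return frame.assign(value=frame["value"] * 2)


def filter_stage(frame):
    # Hypothetical third stage: keep only rows above a threshold.
    return frame[frame["value"] > 2].reset_index(drop=True)


def run_chain(stages, initial=None):
    # Feed the output of each stage into the next one, in order.
    return reduce(lambda data, stage: stage(data), stages, initial)


result = run_chain([load_stage, scale_stage, filter_stage])
```

On Terrene each stage would be a separate reducer, possibly written in a different language; the sketch only shows how the output of one step becomes the input of the next.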

Currently, only the Python kernel is supported, but we are working on adding more kernels. If there are kernels other than Python 3 that you would like to see on Terrene, please let us know.

Supported Packages (Python 3 Kernel)

  • numpy

  • scipy

  • requests

  • joblib

  • terrene

  • pandas


The above list is by no means definitive, and we plan to add more packages to the default kernel over time. We also plan to allow users to submit their own Python Dockerfiles in the future.

Concepts

content Variable

  • If a reducer is invoked with a file or a TFrame, the path to the file or the raw data of the TFrame will be passed to the reducer as a global variable named content.

  • If both a file and a TFrame are supplied to the reducer, then the content variable will be the path to the file.

For example, to write a reducer that takes a CSV file and parses it, do the following:

import pandas
# content holds the path to the supplied CSV file; parse it into a DataFrame
content = pandas.read_csv(content)

When a TFrame is passed to a reducer, Terrene first converts the TFrame to a CSV file and uploads it to the reducer. For example, to write a reducer that takes a TFrame and adds a new column, you can do the following:

import pandas
# The TFrame arrives as CSV, so it is parsed like any other CSV input
content = pandas.read_csv(content)
# Derive a new column from an existing one
content['new_col'] = content['old_col'] * 2

Reducer Output

To output something from the reducer, assign it to the content variable. Only pandas DataFrames are valid reducer outputs.
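
As a minimal sketch of the output rule, the snippet below reads CSV input, derives a column, and leaves a DataFrame in content. The double_column helper and its parameter names are hypothetical, and the io.StringIO object is only a local stand-in so the snippet runs outside Terrene; inside a real reducer, content would already be defined.

```python
import io

import pandas


def double_column(source, old_col="old_col", new_col="new_col"):
    # Hypothetical helper: parse CSV input and add a derived column.
    frame = pandas.read_csv(source)
    if old_col not in frame.columns:
        raise KeyError(f"expected column {old_col!r} in the input")
    frame[new_col] = frame[old_col] * 2
    return frame


# Local stand-in for the value Terrene would provide (assumption for this sketch).
content = io.StringIO("old_col\n1\n2\n3\n")

# Reassign content to a DataFrame so that it becomes the reducer's output.
content = double_column(content)
```

Keeping the transformation in a function makes it straightforward to test locally before uploading the reducer.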

Execution Context

Reducers accept context variables whenever they are invoked. These context variables are exposed as environment variables at runtime and can be accessed as follows:

import os
# os.environ.get returns None if the variable is not set
my_context = os.environ.get('CONTEXT_VARIABLE_NAME')
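
Because environment variables are always strings, a reducer may want a default value and a type conversion when reading its context. The following is a minimal sketch; get_context and the variable names BATCH_SIZE and UNSET_CONTEXT_VARIABLE are hypothetical, and the os.environ assignment only simulates a context variable for local runs.

```python
import os


def get_context(name, default=None, cast=str):
    # Hypothetical helper: read a context variable, falling back to a
    # default when it is unset and converting it with cast otherwise.
    raw = os.environ.get(name)
    if raw is None:
        return default
    return cast(raw)


# Simulate a context variable locally (assumption for this sketch).
os.environ["BATCH_SIZE"] = "25"

batch_size = get_context("BATCH_SIZE", default=10, cast=int)
missing = get_context("UNSET_CONTEXT_VARIABLE", default="fallback")
```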