Reducers are hosted functions on Terrene that can be used to load, transform, and extract your data. A reducer can run any code that can be executed within a Docker container. Reducers can also be chained together to do, for example, the following:
Load the data using Python
Manipulate the data using Java
Further manipulate the data using C++
Upload the data to a webserver using Python
Currently, only the Python kernel is supported, but we are working on adding more kernels. If there are kernels other than Python 3 that you would like to see on Terrene, please let us know.
The default Python kernel includes the following packages:
numpy
scipy
requests
joblib
terrene
pandas
The above list is by no means definitive, and we plan to add more packages to the default kernel over time. We also plan to allow users to submit their own Python Dockerfiles in the future.
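Because the package list may change, it can be useful to check what is importable before relying on it. The following is a minimal sketch that uses only the Python standard library; the `find_missing` helper and the `DEFAULT_PACKAGES` name are illustrative and not part of Terrene.

```python
import importlib.util

# Package names taken from the list above; adjust as the default kernel changes.
DEFAULT_PACKAGES = ["numpy", "scipy", "requests", "joblib", "terrene", "pandas"]


def find_missing(packages):
    """Return the names in `packages` that cannot be imported in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(DEFAULT_PACKAGES)
print("missing packages:", missing or "none")
```

Run inside a reducer, this would report any listed package that is not installed in the kernel; run locally, it reports what your own environment lacks.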
If a reducer is invoked with a file or a TFrame, the path to the file or the raw data of the TFrame will be passed to the reducer as a global variable named content.
If both a file and a TFrame are supplied to the reducer, then the content variable will be the path to the file.
For example, to write a reducer that takes a CSV file and parses it, do the following:
import pandas
content = pandas.read_csv(content)
When a TFrame is passed to a reducer, Terrene first converts the TFrame to a CSV file and uploads it to the reducer. For example, to write a reducer that takes a TFrame and adds a new column, you can do the following:
import pandas
content = pandas.read_csv(content)
content['new_col'] = content['old_col'] * 2
To output something from the reducer, assign it to the content variable. Only pandas DataFrames are valid outputs of a reducer.
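Since only DataFrames are valid outputs, a reducer that computes a single value needs to wrap that value before assigning it to content. The sketch below illustrates one way to do this. It assumes an input CSV with a column named `old_col`, and because nothing injects `content` outside Terrene, it simulates the input with a temporary file; the `metric` and `value` column names are made up for the example.

```python
import os
import tempfile

import pandas

# Outside Terrene nothing injects `content`, so simulate it with a temporary CSV file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("old_col\n1\n2\n3\n")
content = handle.name  # on Terrene this global would already hold the file path

# Reducer body: read the input, then compute a summary value.
frame = pandas.read_csv(content)
os.remove(content)  # clean up the simulated input file
total = frame["old_col"].sum()

# A bare number is not a DataFrame, so wrap the result before assigning it to `content`.
content = pandas.DataFrame({"metric": ["total"], "value": [total]})
```

On Terrene, only the lines from `pandas.read_csv(content)` onward (minus the cleanup) would be needed, since `content` is supplied at invocation.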
Reducers accept a context variable whenever they are invoked. Context variables are exposed as environment variables at runtime and can be accessed as follows:
import os
my_context = os.environ.get('CONTEXT_VARIABLE_NAME')
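Environment variables are always strings, so numeric context values need to be converted, and a variable that was not supplied comes back as None. The sketch below shows one way to handle both cases. The `read_context` helper and the `SCALE_FACTOR` and `ROW_LIMIT` names are hypothetical, and the variable is set by hand here only to simulate an invocation.

```python
import os


def read_context(name, default, cast=str):
    """Return the context variable `name` converted with `cast`, or `default` if unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return cast(raw)


# Simulate the context locally; on Terrene these would be set at invocation.
os.environ["SCALE_FACTOR"] = "2.5"
os.environ.pop("ROW_LIMIT", None)  # make sure this one is unset for the example

scale = read_context("SCALE_FACTOR", default=1.0, cast=float)
limit = read_context("ROW_LIMIT", default=100, cast=int)  # unset, so the default is used
print(scale, limit)
```

Keeping the conversion in one helper means a missing or malformed context value fails in one predictable place rather than deep inside the data manipulation code.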