The Splice ML Manager API

This topic describes the methods available in the ML Manager API; you’ll find examples of each of these in the Using ML Manager topic, in this chapter.

You can use Splice ML Manager with Python in Zeppelin notebooks, using our pyspark interpreter and our MLManager class in your program to manipulate experiments.

This topic contains the following sections: Getting Started with the ML Manager API, and ML Manager Methods.

Getting Started with the ML Manager API

To get started with MLManager, you need to:

  1. Create your MLManager instance
  2. Establish a connection to your database
  3. Create an experiment
  4. Create a run
  5. Run your experiment(s)

1. Create your MLManager instance

To use ML Manager, you need to first create a class instance:

%spark.pyspark
from splicemachine.ml.management import MLManager
manager = MLManager()

2. Connect to Your Database

You can establish a connection to your database using our Native Spark Datasource, which is encapsulated in the SpliceMLContext object. Once you’ve connected, you can use the SpliceMLContext to perform database inserts, selects, upserts, updates and many more functions, all directly from Spark, without any required serialization.

%spark.pyspark
from splicemachine.spark.context import SpliceMLContext
splice = SpliceMLContext(spark)

3. Create an Experiment

This code creates a new experiment and sets it as the active experiment:

%spark.pyspark
manager.create_experiment('myFirstExperiment')
manager.set_active_experiment('myFirstExperiment')

4. Create a Run

We use a method of our MLManager object to create a new run:

manager.create_new_run(user_id='firstrun')

5. Run Your Experiment

The Using ML Manager topic in this chapter provides a complete example of running machine learning experiments with ML Manager in a Zeppelin notebook.
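Steps 3 and 4 above can be wrapped in one small helper. The following is a minimal sketch rather than part of the ML Manager API: the function name `start_run` and its arguments are hypothetical, and `manager` is assumed to be the MLManager instance created in step 1. It calls only the methods shown in this topic.

```python
def start_run(manager, experiment_name, user_id):
    """Create an experiment, make it active, and start a new run under it.

    Hypothetical helper: `manager` is assumed to be an MLManager instance
    created as in step 1 of this topic.
    """
    # Step 3: create the experiment and make it the active experiment.
    manager.create_experiment(experiment_name)
    manager.set_active_experiment(experiment_name)
    # Step 4: create a run under the active experiment.
    manager.create_new_run(user_id=user_id)
```

In a Zeppelin notebook, you might call this as `start_run(manager, 'myFirstExperiment', 'firstrun')` in a paragraph that begins with the `%spark.pyspark` directive.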

ML Manager Methods

This section describes the methods of the MLManager class, in three subsections: Experiment and Run Methods, Logging Methods, and Tagging Methods.

Experiment and Run Methods

This section describes the ML Manager methods for working with experiments and runs:

create_experiment

Use the create_experiment method to create and name an experiment.

manager.create_experiment( experiment_name )

experiment_name

A string name or integer ID you want to use for the experiment.

Example:

manager.create_experiment( 'myFirstExperiment')


create_new_run

Use the create_new_run method to create a new run under the currently active experiment, and to make the new run the currently active run.

manager.create_new_run( run_name )

run_name

The name of the user creating the run.

Example:

manager.create_new_run( 'myNewRun')


reset_run

Use the reset_run method to reset the current run. This deletes logged parameters, metrics, artifacts, and other information associated with the run.

manager.reset_run( )

Example:

manager.reset_run( )


set_active_experiment

Use the set_active_experiment method to make an existing experiment the active experiment. All new runs will be created under this experiment.

manager.set_active_experiment( experiment_name )

experiment_name

The string name of the existing experiment that you want to make the active experiment.

Example:

manager.set_active_experiment( 'myFirstExperiment')


set_active_run

Use the set_active_run method to set a previous run as the active run under the current experiment; this allows you to log metadata for a completed run.

manager.set_active_run( run_name )

run_name

The string name of the run you want to make the active run.

Example:

manager.set_active_run( 'myNewRun')
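Logging metadata for a completed run, as described above, might look like the following sketch. The helper name `log_metric_for_run` is hypothetical, and `manager` is assumed to be an MLManager instance whose active experiment contains the named run.

```python
def log_metric_for_run(manager, run_name, metric_name, metric_value):
    """Make `run_name` the active run, then log one metric for it.

    Hypothetical helper; `manager` is assumed to be an MLManager instance
    with an active experiment that contains the named run.
    """
    manager.set_active_run(run_name)
    # log_metric expects a numeric value, so coerce to float first.
    manager.log_metric(metric_name, float(metric_value))
```

Note that the named run remains the active run after the helper returns, so any later logging calls also apply to it.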


Logging Methods

This section describes the ML Manager methods for logging models, parameters, metrics, and artifacts:

log_artifact

Use the log_artifact method to log a local file or directory as an artifact of the currently active run.

manager.log_artifact( local_path, artifact_path )

local_path

The path to the file that you want written to your artifacts URI.

artifact_path

Optional. The subdirectory of your artifacts URI to which you want the artifact written.

Example:

manager.log_artifact( '/tmp/myRunData' )
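As an illustration of producing a file to log, the sketch below writes a small JSON summary to a temporary directory and passes its path to log_artifact. The helper name `log_summary_artifact` and the file name are hypothetical, and the sketch assumes that log_artifact finishes copying the file before it returns.

```python
import json
import os
import tempfile


def log_summary_artifact(manager, summary, filename='summary.json'):
    """Write `summary` (a JSON-serializable dict) to a file and log it.

    Hypothetical helper; `manager` is assumed to be an MLManager instance
    with an active run.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, filename)
        with open(local_path, 'w') as f:
            json.dump(summary, f, indent=2)
        # Log while the temporary directory still exists. This assumes
        # log_artifact copies the file before returning.
        manager.log_artifact(local_path)
```

If that assumption does not hold in your environment, write the file to a location that persists for the life of the notebook instead.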


log_artifacts

Use the log_artifacts method to log the contents of a local directory as artifacts of the currently active run.

manager.log_artifacts( local_dir, artifact_path )

local_dir

The path to the directory of files that you want written to your artifacts URI.

artifact_path

Optional. The subdirectory of your artifacts URI to which you want the artifacts written.

Example:

manager.log_artifacts( '/tmp/myRunInfo' )


log_metric

Use the log_metric method to log a (key, numeric-value) pair for the currently active run. You can update a metric throughout the course of the run, and you can subsequently view the metric’s history.

manager.log_metric( metric_name, metric_value )

metric_name

A string naming the metric to log.

metric_value

The double-precision numeric value to log for the metric.

Example:

# log how long the model took
manager.log_metric('time', time_taken)
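The example above assumes that `time_taken` has already been computed. One way to obtain it is sketched below. The helper name `log_elapsed_time` is hypothetical, `work` stands for any zero-argument callable (for example, a function that fits your model), and `manager` is assumed to be an MLManager instance with an active run.

```python
import time


def log_elapsed_time(manager, work, metric_name='time'):
    """Run `work()` and log how long it took, in seconds, as a metric.

    Hypothetical helper; `work` is any zero-argument callable.
    """
    start = time.perf_counter()
    result = work()
    time_taken = time.perf_counter() - start
    # Log the elapsed wall-clock time for the currently active run.
    manager.log_metric(metric_name, time_taken)
    return result
```

The helper returns whatever `work()` returns, so a call such as `model = log_elapsed_time(manager, fit_model)` keeps the fitted model available, where `fit_model` is your own function.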


log_model

Use the log_model method to log a model for the currently active run.

manager.log_model( model, module )

model

The fitted pipeline/model (in Spark) that you want to log.

module

The module that the model is part of; for example, mlflow.spark or mlflow.sklearn.

Example:

# save model to MLflow for deployment
manager.log_model( model, 'mlflow.sklearn' )


log_param

Use the log_param method to log a (key, string-value) pair for the currently active run.

manager.log_param( param_name, param_value )

param_name

A string naming the parameter to log.

param_value

The string value to log for the parameter.

Example:

manager.log_param('classifier', 'neural network')
manager.log_param('maxIter', '100')
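Because log_param takes a string value, hyperparameters kept in a dictionary need to be converted before logging. The following sketch shows one way to do that. The helper name `log_params` is hypothetical, and `manager` is assumed to be an MLManager instance with an active run.

```python
def log_params(manager, params):
    """Log each entry of a dict with log_param, one call per entry.

    Hypothetical helper; values are converted with str() because
    log_param is documented as taking a string value.
    """
    for name, value in params.items():
        manager.log_param(name, str(value))
```

For example, `log_params(manager, {'classifier': 'neural network', 'maxIter': 100})` makes the same two calls as the example above.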


log_spark_model

Use the log_spark_model method to save an MLlib model you’ve created to MLflow, for future deployment.

manager.log_spark_model( model )

model

The fitted pipeline/model you want to log.

Example:

# save the pipeline and model to s3
model.save('s3a://myModels/myFirstModel')
# save model to MLflow for deployment
manager.log_spark_model(model)


Tagging Methods

This section describes the ML Manager methods for tagging:

set_tag

Use the set_tag method to set the value of a tag for the current run. Tags are specific pieces of information associated with a run, such as the project ID, the version ID, or the deployable status.

manager.set_tag( tag_name, tag_value )

tag_name

The name of the tag you want to assign a value to for the current run.

tag_value

The string value to assign to the tag.

Example:

manager.set_tag('projectId', 'myNewProject')