Splice Machine ML Manager
The Splice ML Manager is an integrated machine learning (ML) platform that minimizes data movement and enables enterprises to deliver better decisions faster by continuously training models on the most up-to-date available data. With Splice ML Manager, data science teams can produce more, and more predictive, models, facilitated by the ability to:
- Experiment frequently using diverse parameters to compare model effectiveness
- Leverage up-to-date operational data to train models concurrently
- Minimize the movement of data by running the models on your cluster’s Spark executors
- Compress the time from model deployment to action
Splice ML Manager provides end-to-end life-cycle management for your ML models, thereby streamlining and accelerating the design and deployment of intelligent applications using real-time data. Through its tight integration with the Splice Machine data platform, ML Manager reduces data movement, which empowers data scientists to run more experiments in a limited amount of time: deriving feature vectors with more signal, and comparing algorithms with varied parameters to build better models.
The Splice ML Manager facilitates machine learning development within Zeppelin notebooks. Here are some of its key features:
- ML Manager runs directly on Apache Spark, allowing you to complete massive jobs in parallel.
- Our native `PySpliceContext` lets you directly access the data in your database and efficiently convert it to and from a Spark DataFrame, with no serialization or deserialization required.
- MLflow is integrated directly into your Splice Machine cluster, to facilitate tracking of your entire Machine Learning workflow.
- After you have found the best model for your task, you can easily deploy it live to AWS SageMaker to make predictions in real time.
- As new data flows in, updating your model is a simple matter of returning to your notebook, creating new runs, and redeploying with a single click.
The Splice ML Manager leverages MLflow and Amazon SageMaker and, like MLflow, is organized around the concept of runs:
A run is the execution of some data science code; each run can record different types of information, including:
| Item | What's Recorded | Examples |
|------|-----------------|----------|
| Metrics | Model output metrics, recorded as key-value pairs that map string names to numeric values. | *F1 score, AUC, Precision, Recall, R²* |
| Parameters | Model input parameters, recorded as key-value pairs of strings. | *Num Trees, Preprocessing Steps, Regularization* |
| Models | Fitted pipelines or models, saved so that you can subsequently deploy them to SageMaker. | |
| Tags | Key-value strings that record specific pieces of information associated with a run. | *project, version, deployable status* |
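The example metrics above are standard classification measures. As a minimal sketch in plain Python (no external libraries; the function name is hypothetical, not part of the ML Manager API), here is how precision, recall, and F1 score might be computed from a run's predictions before being logged:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for a binary classifier's predictions."""
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical labels and predictions for six test rows.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
p, r, f1 = precision_recall_f1(y_true, y_pred)  # each is 0.75 here
```

In a real workflow these values would be logged against the current run so they can be compared across runs in an experiment.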
For more information about logging information in the ML Manager, see the ML Manager API topic.
ML Manager organizes runs into experiments; each experiment groups together the runs for a specific task. For example, you might try a different machine learning model in each run, then compare results across the experiment.
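The relationship between experiments and runs can be pictured with a small plain-Python sketch. This is purely illustrative: the class and method names are hypothetical and are not the ML Manager API.

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    """One execution of data science code, with the information it recorded."""
    params: dict = field(default_factory=dict)   # e.g. {"num_trees": 100}
    metrics: dict = field(default_factory=dict)  # e.g. {"f1": 0.91}
    tags: dict = field(default_factory=dict)     # e.g. {"version": "v2"}

@dataclass
class Experiment:
    """Groups the runs for a specific task so they can be compared."""
    name: str
    runs: list = field(default_factory=list)

    def best_run(self, metric):
        # Compare runs by a shared metric, such as F1 score.
        return max(self.runs, key=lambda r: r.metrics.get(metric, float("-inf")))

# Two runs of the same task with different models, grouped in one experiment.
exp = Experiment("churn-prediction")
exp.runs.append(Run(params={"model": "random_forest"}, metrics={"f1": 0.88}))
exp.runs.append(Run(params={"model": "gbt"}, metrics={"f1": 0.91}))
best = exp.best_run("f1")  # the gradient-boosted-trees run
```

This mirrors the comparison workflow described above: run several variations, record parameters and metrics for each, then select the best-performing run for deployment.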
Using ML Manager
The other topics in this section will help you start using ML Manager: