Creating a New Splice Machine Cluster

When you first visit your new Splice Machine Cloud Manager dashboard, you’ll see the initial dashboard view, which prompts you to create a new cluster:

Click Create New Cluster to start the process of provisioning your Splice Machine cluster. You’ll then need to:

  1. Configure Cluster Parameters for data sizing, cluster power, and backup frequency.
  2. Configure Cluster Access and Options for your users.
  3. Set Up Payment for your Splice Machine cluster.
  4. Start Using Splice Machine!

Configure Cluster Parameters

You use the Create New Cluster screen to provision your cluster:

If you have subscribed to Splice Machine via the AWS Marketplace, your costs will be estimated on an hourly basis instead of a monthly basis:

Screen Help

Many components of the Create New Cluster screen, like those on most of our Cloud Manager screens, include small information buttons. Click one to display a pop-up that describes the component.

For example, here are the pop-ups from the Create New Cluster screen:

Click the information button again to dismiss a pop-up.

About the Cluster Parameters

You’ll notice several sliders that you can adjust to modify the configuration of your cluster. As you move these sliders, you’ll see how the estimated monthly costs for your cluster change. Here are explanations of the adjustments you can make to your cluster provisioning:

Note that you can come back and modify your cluster configuration in the future, so you’re not stuck forever with your initial settings.

Cluster Name

Supply whatever name you want for your Splice Machine cluster.

Cloud Provider

You can select which cloud provider is hosting your cluster by clicking the current provider name, which drops down a list of choices.

If you have subscribed to Splice Machine via the AWS Marketplace, your costs will be estimated on an hourly basis instead of on a monthly basis.

Region

You can select in which region your cluster will reside by clicking the current region name, which drops down a list of choices.

Internal Dataset (TB)

Move the slider to modify your estimate of how large your database will be.

Internal Dataset is the amount of data that you will be storing within your Splice Machine database.

Dedicated Storage

Select this checkbox to have us provision dedicated storage for your database instance, which does add cost.

Leave this unselected to have your database instance stored on shared hardware.

External Dataset (TB)

Move the slider to modify your estimate of how large your external dataset will be.

External Dataset is the amount of data that you will be accessing from external data sources, using features such as external tables and our virtual table interface.

OLTP Splice Units

Move the slider to modify your estimate of how much processing power you need for transactional activity, involving quick inserts, lookups, updates, and deletes. More OLTP units means more region servers in your cluster.

OLAP Splice Units

Move the slider to modify your estimate of how much processing power you need for running longer queries, typically analytical queries. More OLAP units means more Spark executors.

Notebook Spark Units

Move the slider to modify your estimate of how many Spark units should be utilized by the Splice Machine Native Spark Datasource and other external uses of Spark libraries, such as MLlib.

Enable

Select this checkbox to enable the Splice Machine ML Manager, which provides access to our Model Workflow and Deployment integration and additional Machine Learning libraries.

Frequency

Select how frequently you want Splice Machine to back up your database. You can select Hourly, Daily, or Weekly; each selection displays additional backup timing and retention options:

Hourly:

Daily:

Weekly:

A Splice Unit is a measure of processing work; one unit currently translates (approximately) to 2 virtual CPUs and 16 GB of memory.
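To get a rough feel for what a slider setting means in hardware terms, the approximate conversion above can be written as a small calculation. This is only a sketch: the function and class names are hypothetical, the example unit counts are made up, and the 2 vCPU / 16 GB figures are the approximate values quoted in this guide, not an exact sizing guarantee.

```python
from dataclasses import dataclass

# Approximate resources per Splice Unit, as quoted in this guide.
VCPUS_PER_SPLICE_UNIT = 2
MEMORY_GB_PER_SPLICE_UNIT = 16


@dataclass(frozen=True)
class ApproxResources:
    """Rough hardware equivalent of a number of Splice Units."""
    vcpus: int
    memory_gb: int


def approx_resources(splice_units: int) -> ApproxResources:
    """Convert a Splice Unit count into approximate vCPUs and memory (GB)."""
    if splice_units < 0:
        raise ValueError("splice_units must be non-negative")
    return ApproxResources(
        vcpus=splice_units * VCPUS_PER_SPLICE_UNIT,
        memory_gb=splice_units * MEMORY_GB_PER_SPLICE_UNIT,
    )


if __name__ == "__main__":
    # Hypothetical slider settings, chosen only for illustration.
    for label, units in {"OLTP": 4, "OLAP": 8}.items():
        res = approx_resources(units)
        print(f"{label}: {units} units ~ {res.vcpus} vCPUs, {res.memory_gb} GB memory")
```

Because the guide describes the conversion as approximate and current, treat the results as an estimate for comparing settings rather than as a statement of what will actually be provisioned.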

Modifying Cluster Parameters

We recommend that you spend a few minutes experimenting with modifying the cluster parameters; you’ll notice that as you increase various values, the estimated monthly cost of your cluster changes.

When you’re satisfied with your cluster configuration parameters, click the Next button to set up access to your cluster.

You’ll notice that when you increase some values, Splice Machine may indicate that the current setting for a parameter clashes with a change that you’ve made. For example, in the following image, we have increased the Internal Dataset size to 20 TB, and as a result the Cluster Power values are no longer adequate to support that large a dataset, as indicated by the striping:

Splice Machine will not allow you to create your cluster if any of your values clash. You can click the vertical bar at the end of the striping to instantly set the parameter to the required value.

If you don’t correct the required setting and attempt to advance to the Next screen, you’ll see an error message and will be unable to advance until you do correct it.

Configure Cluster Access and Options

Once you’ve configured your cluster, click the Next button to display the Cluster Access and Options screen. The following image shows the pop-up help for the different access methods:

You can set up your cluster for access to your Amazon Virtual Private Cloud (VPC) by selecting the Client VPC connectivity required option and providing your VPC account ID.

You need to configure AWS Identity and Access Management (IAM) for your cluster to allow Splice Machine to access selected S3 folders; this is described in our Configuring an S3 bucket for Splice Machine Access tutorial.

For more information about Amazon VPC, see https://aws.amazon.com/vpc/.

For more information about Amazon IAM, see https://aws.amazon.com/iam/.

You can change the number of Zeppelin instances available on your cluster, and you can adjust how much Java memory is allocated for the Spark Interpreter in each instance. Multiple Zeppelin instances allow multiple users to develop and run notebooks independently.

You can also add our Machine Learning Manager to your cluster (at an additional cost) by clicking the Enable button in the ML Manager section at the bottom of this screen. The Splice Machine ML Manager facilitates machine learning development by integrating MLflow, Amazon SageMaker deployment, additional Machine Learning libraries, and our database.

After setting up any options and access methods, confirm that you accept our terms and conditions, then click the Launch button. This takes you to the Payment screen, unless you’ve subscribed to Splice Machine from the AWS Marketplace or have already set up a payment method for your account.

Set Up Payment

When you click the Launch button, one of the following happens:

  • If you subscribed to Splice Machine via the AWS Marketplace, or you already have a payment method set up on your account, you’ll land on your dashboard and will be notified when your cluster has been initialized.
  • If you don’t yet have a payment method set up, you’ll land on the Payment screen, where you can choose one of three payment methods:

Credit Card

ACH Electronic Transfer

Authorization Code

Modifying Payment Information

If you ever need to change your Splice Machine payment information, you can update it in the Billing Activity tab of the Account screen; just click the Update button to revisit the Payment screen:

If you’ve purchased Splice Machine through the AWS Marketplace, change your billing credentials in the Marketplace instead.

Start Using Your Database!

After your cluster spins up, which typically requires about 10 minutes, you can load your data into your Splice Machine database and start running queries.

The easiest way to get going with your new database is to use our Zeppelin Notebook interface, with which you can quickly run queries and generate different visualizations of your results, all without writing any code. We’ve provided a number of useful Zeppelin tutorials, including one that walks you through setting up a schema, creating tables, loading data, and then running queries.

Note that your data must be in an AWS S3 bucket before you can import it into your Splice Machine database: