Creating a New Splice Machine Cluster

When you first visit your new Splice Machine Cloud Manager dashboard, you’ll see the initial dashboard view, which prompts you to create a new cluster.

Click Create New Cluster to start the process of provisioning your Splice Machine cluster. You’ll then need to:

  1. Configure Cluster Parameters for data sizing, cluster power, and backup frequency.
  2. Configure Cluster Access for your users.
  3. Set Up Payment for your Splice Machine cluster.
  4. Start Using Splice Machine!

Configure Cluster Parameters

You use the Create New Cluster screen to provision your cluster.

You can select your cloud provider from the Cloud Provider drop-down, just beneath the Cluster Name field.

If you have subscribed to Splice Machine via the AWS Marketplace, your costs will be estimated on an hourly basis instead of a monthly basis.

Screen Help

Many of the components of the Create Cluster screen, like those on most of our Cloud Manager screens, include a small information button that you can click to display a pop-up describing that component.

About the Cluster Parameters

You’ll notice several sliders that you can adjust to modify the configuration of your cluster. As you move these sliders, you’ll see how the estimated monthly costs for your cluster change. You can come back and modify your cluster configuration later, so you’re not locked into your initial settings.

Here are explanations of the adjustments you can make to your cluster provisioning:

Cluster Name

Supply whatever name you want for your Splice Machine cluster.

Region

Select the AWS region in which your cluster will reside by clicking the currently selected region name, which drops down a list of choices.

Internal Dataset (TB)

Move the slider to modify your estimate of how large your database will be.

Internal Dataset is the amount of data that you will be storing within your Splice Machine database.

External Dataset (TB)

Move the slider to modify your estimate of how large your external dataset will be.

External Dataset is the amount of data that you will be accessing from external data sources, using features such as external tables and our virtual table interface.

OLTP Splice Units

Move the slider to modify your estimate of how much processing power you need for transactional query processing. More OLTP units means more region servers in your cluster.

OLAP Splice Units

Move the slider to modify your estimate of how much processing power you need for analytical query processing. More OLAP units means more Spark executors.

Frequency

Select how frequently you want Splice Machine to back up your database. You can select Hourly, Daily, or Weekly; each selection displays additional backup timing and retention options.

A Splice Unit is a measure of processing work; one unit currently translates (approximately) to 2 virtual CPUs and 16 GB of memory.
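
To get a rough feel for what a slider setting means in hardware terms, you can apply the approximate conversion above yourself. The following is a minimal sketch, not part of the Cloud Manager: the function name and the sample slider values are hypothetical, and the per-unit figures are only the approximate ones quoted above, so treat the results as estimates.

```python
from typing import Tuple

# Approximate figures quoted in the text: 1 Splice Unit ~ 2 virtual CPUs and 16 GB of memory.
VCPUS_PER_UNIT = 2
MEMORY_GB_PER_UNIT = 16


def approximate_resources(splice_units: int) -> Tuple[int, int]:
    """Return the approximate (virtual CPUs, memory in GB) for a number of Splice Units."""
    if splice_units < 0:
        raise ValueError("splice_units must be non-negative")
    return splice_units * VCPUS_PER_UNIT, splice_units * MEMORY_GB_PER_UNIT


# Hypothetical slider settings, chosen only for illustration.
oltp_units, olap_units = 4, 8

oltp_vcpus, oltp_memory_gb = approximate_resources(oltp_units)
olap_vcpus, olap_memory_gb = approximate_resources(olap_units)

print(f"OLTP: roughly {oltp_vcpus} vCPUs and {oltp_memory_gb} GB of memory")
print(f"OLAP: roughly {olap_vcpus} vCPUs and {olap_memory_gb} GB of memory")
```

Because the conversion is approximate and may change, use it only to compare settings against each other; the cost estimate shown on the screen remains the figure to rely on.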

Modifying Cluster Parameters

We recommend that you spend a few minutes experimenting with modifying the cluster parameters; you’ll notice that as you increase various values, the estimated monthly cost of your cluster changes.

When you’re satisfied with your cluster configuration parameters, click the Next button to set up access to your cluster.

You’ll notice that when you increase some values, Splice Machine may indicate that the current setting for a parameter clashes with a change that you’ve made. For example, if you increase the Internal Dataset size to 20 TB, the Cluster Power values may no longer be adequate to support that large a dataset; the screen indicates this with striping on the affected sliders.

Splice Machine will not allow you to create your cluster if any of your values clash. You can click the vertical bar at the end of the striping to instantly set the parameter to the required value.

If you don’t correct the required setting and try to advance to the next screen, you’ll see an error message and will be unable to advance until you correct it.

Configure Cluster Access

Once you’ve configured your cluster, click the Next button to display the Cluster Access screen. The screen provides pop-up help for each of the different access methods.

You can set up your cluster for access to your Amazon Virtual Private Cloud (VPC) by selecting the Client VPC connectivity required option and providing your VPC account ID.

You need to configure AWS Identity and Access Management (IAM) for your cluster to allow Splice Machine to access selected S3 folders; this is described in our Configuring an S3 bucket for Splice Machine Access tutorial.

For more information about Amazon VPC, see https://aws.amazon.com/vpc/.

For more information about Amazon IAM, see https://aws.amazon.com/iam/.

After setting up any access methods, confirm that you accept our terms and conditions, then click the Launch button. This takes you to the Payment screen, unless you’ve subscribed to Splice Machine from the AWS Marketplace or have already set up a payment method for your account.

Set Up Payment

When you click the Launch button, one of the following happens:

  • If you subscribed to Splice Machine via the AWS Marketplace, or you already have a payment method set up on your account, you’ll land on your dashboard and will be notified when your cluster has been initialized.
  • If you don’t yet have a payment method set up, you’ll land on the Payment screen, where you can choose one of three payment methods:

Credit Card

ACH Electronic Transfer

Authorization Code

Modifying Payment Information

If you ever need to change your Splice Machine payment information, you can update it in the Billing Activity tab of the Account screen; just click the Update button to revisit the Payment screen.

If you’ve purchased Splice Machine through the AWS Marketplace, change your billing credentials in the Marketplace instead.

Start Using Your Database!

After your cluster spins up, which typically requires about 10 minutes, you can load your data into your Splice Machine database and start running queries.

The easiest way to get going with your new database is to use our Zeppelin Notebook interface, with which you can quickly run queries and generate different visualizations of your results, all without writing any code. We’ve provided a number of useful Zeppelin tutorials, including one that walks you through setting up a schema, creating tables, loading data, and then running queries.

Note that your data must be in an AWS S3 bucket before you can import it into your Splice Machine database.