Creating a New Splice Machine Cluster
When you first visit your new Splice Machine Cloud Manager dashboard, you’ll see the initial dashboard view, which prompts you to create a new cluster:
Click Create New Cluster to start the process of provisioning your Splice Machine cluster. You’ll then need to:
- Configure Cluster Parameters for data sizing, cluster power, and backup frequency.
- Configure Cluster Access and Options for your users.
- Set Up Payment for your Splice Machine cluster.
- Start Using Splice Machine!
Configure Cluster Parameters
You use the Create New Cluster screen to provision your cluster:
If you have subscribed to Splice Machine via the AWS Marketplace, your costs will be estimated on an hourly basis instead of a monthly basis:
Many components of the Create New Cluster screen, like most of our Cloud Manager screens, include small information buttons that you can click to display a pop-up describing the component.
For example, here are the pop-ups from the Create New Cluster screen:
Click the information button again to dismiss a pop-up.
About the Cluster Parameters
You’ll notice several sliders that you can adjust to modify the configuration of your cluster. As you move these sliders, you’ll see how the estimated monthly costs for your cluster change. Here are explanations of the adjustments you can make to your cluster provisioning:
Note that you can come back and modify your cluster configuration in the future, so you’re not stuck forever with your initial settings.
- Cluster name: Supply whatever name you want for your Splice Machine cluster.
- Cloud provider: Select which cloud provider hosts your cluster by clicking the current provider name, which drops down a list of choices.
- Region: Select the region in which your cluster will reside by clicking the current region name, which drops down a list of choices.
- Internal dataset: Move the slider to adjust your estimate of how large your database will be.
- Dedicated storage: Select this checkbox to have us provision dedicated storage for your database instance, which adds cost. Leave it unselected to have your database instance stored on shared hardware.
- External dataset: Move the slider to adjust your estimate of how large your external dataset will be.
- OLTP: Move the slider to adjust your estimate of how much processing power you need for transactional activity, involving quick inserts, lookups, updates, and deletes. More OLTP units means more region servers in your cluster.
- OLAP: Move the slider to adjust your estimate of how much processing power you need for running longer queries, typically analytical queries. More OLAP units means more Spark executors.
- Spark: Move the slider to adjust your estimate of how many Spark units should be used by the Splice Machine Native Spark Datasource and other external uses of Spark libraries, such as MLlib.
- ML Manager: Select this checkbox to enable the Splice Machine ML Manager, which provides access to our Model Workflow and Deployment integration and additional Machine Learning libraries.
- Backup frequency: Select how frequently you want Splice Machine to back up your database.
A Splice Unit is a measure of processing work; one unit currently translates (approximately) to 2 virtual CPUs and 16 GB of memory.
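To get a feel for what the slider settings mean in hardware terms, the arithmetic can be sketched as below. This is a rough illustration only: it assumes (hypothetically) that every OLTP, OLAP, and Spark unit counts as one Splice Unit, and it uses the approximate 2 vCPU / 16 GB figures quoted above. The function name `approx_resources` is made up for this example; Splice Machine decides the actual allocation.

```python
# Approximate conversion factors from the Splice Unit definition above;
# these are current, approximate values and may change.
VCPUS_PER_UNIT = 2
MEMORY_GB_PER_UNIT = 16


def approx_resources(oltp_units: int, olap_units: int, spark_units: int = 0) -> dict:
    """Roughly translate slider settings into vCPUs and memory.

    Hypothetical assumption: each OLTP, OLAP, and Spark unit is one
    Splice Unit. The real allocation is determined by Splice Machine.
    """
    if min(oltp_units, olap_units, spark_units) < 0:
        raise ValueError("unit counts must be non-negative")
    total = oltp_units + olap_units + spark_units
    return {
        "splice_units": total,
        "vcpus": total * VCPUS_PER_UNIT,
        "memory_gb": total * MEMORY_GB_PER_UNIT,
    }


print(approx_resources(oltp_units=4, olap_units=2, spark_units=1))
# {'splice_units': 7, 'vcpus': 14, 'memory_gb': 112}
```

Because the per-unit figures are approximate, treat the result as a ballpark for comparing configurations, not as a sizing guarantee.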
Modifying Cluster Parameters
We recommend that you spend a few minutes experimenting with modifying the cluster parameters; you’ll notice that as you increase various values, the estimated monthly cost of your cluster changes.
When you’re satisfied with your cluster configuration parameters, click the Next button to set up access to your cluster.
You’ll notice that when you increase some values, Splice Machine may indicate that the current setting for a parameter clashes with a change that you’ve made. For example, in the following image, we have increased the Internal Dataset size to 20 TB, and as a result the Cluster Power values are no longer adequate to support that large a dataset, as indicated by the striping:
Splice Machine will not allow you to create your cluster if any of your values clash. You can click the vertical bar at the end of the striping to instantly set the parameter to the required value.
If you click Next without correcting a clashing setting, you’ll see an error message and won’t be able to advance until you correct it.
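The clash rule described above behaves like a simple validation gate. The sketch below models it under stated assumptions: `ClusterSetting`, `find_clashes`, `snap_to_required`, and `can_advance` are hypothetical names, not a Cloud Manager API, and the `required_min` values are invented inputs; in the real screen, Splice Machine derives the required values from your other settings.

```python
from dataclasses import dataclass


@dataclass
class ClusterSetting:
    """Hypothetical model of one slider; not a Cloud Manager API."""
    name: str
    value: int
    required_min: int  # minimum implied by other settings (assumed input)


def find_clashes(settings):
    """Return the names of settings whose value is below the required minimum."""
    return [s.name for s in settings if s.value < s.required_min]


def snap_to_required(setting):
    """Mimic clicking the bar at the end of the striping: raise to the minimum."""
    setting.value = max(setting.value, setting.required_min)


def can_advance(settings):
    """The Next button only proceeds when no setting clashes."""
    return not find_clashes(settings)


# Illustrative values only.
settings = [
    ClusterSetting("OLTP units", value=4, required_min=8),
    ClusterSetting("OLAP units", value=4, required_min=4),
]
print(find_clashes(settings))  # ['OLTP units']
for s in settings:
    snap_to_required(s)
print(can_advance(settings))   # True
```

Snapping only ever raises a value, which matches the screen's behavior of setting a parameter to the required value rather than lowering the change that caused the clash.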
Configure Cluster Access and Options
Once you’ve configured your cluster, click the Next button to display the Cluster Access and Options screen. The following image includes the pop-up help for the different access methods:
You can set up your cluster to access your Amazon Virtual Private Cloud (VPC) by selecting the Client VPC connectivity required option and providing your VPC account ID.
You need to configure AWS Identity and Access Management (IAM) for your cluster to allow Splice Machine to access selected S3 folders; this is described in our Configuring an S3 Bucket for Splice Machine Access tutorial.
For more information about Amazon VPC, see https://aws.amazon.com/vpc/.
For more information about Amazon IAM, see https://aws.amazon.com/iam/.
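For orientation, an IAM policy that grants read access to one S3 folder generally has the shape sketched below, built here with Python's standard `json` module. This is a generic, read-only example, not the exact policy Splice Machine requires (import features such as a bad-records directory may also need write access); the bucket name, the prefix, and the helper `s3_read_policy` are hypothetical. Follow the Configuring an S3 Bucket for Splice Machine Access tutorial for the actual permissions and for how to attach them.

```python
import json


def s3_read_policy(bucket: str, prefix: str) -> dict:
    """Build a generic read-only IAM policy for one S3 prefix (illustrative only)."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only keys under the chosen prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow reading objects stored under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and folder names.
print(json.dumps(s3_read_policy("my-example-bucket", "splice-data/"), indent=2))
```

Note that `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to object ARNs, which is why the policy uses two statements.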
You can change the number of Zeppelin instances available on your cluster, and you can adjust how much Java memory is allocated for the Spark Interpreter in each instance. Multiple Zeppelin instances allow multiple users to develop and run notebooks independently.
You can also add our Machine Learning Manager (at an additional cost) to your cluster by clicking the Enable button in the ML Manager section at the bottom of this screen. The Splice Machine ML Manager facilitates machine learning development by integrating MLflow, Amazon SageMaker deployment, additional Machine Learning libraries, and our database.
After setting up any options and access methods, confirm that you agree to the terms and conditions, then click the Launch button. This takes you to the Payment screen unless you subscribed to Splice Machine from the AWS Marketplace or have already set up a payment method for your account.
Set Up Payment
When you click the Launch button, one of the following happens:
- If you subscribed to Splice Machine via the AWS Marketplace, or you already have a payment method set up on your account, you’ll land on your dashboard and will be notified when your cluster has been initialized.
- If you don’t yet have a payment method set up, you’ll land on the Payment screen, where you can choose one of three payment methods:
  - ACH Electronic Transfer
Modifying Payment Information
If you ever need to change your Splice Machine payment information, you can update it in the Billing Activity tab of the Account screen; just click the Update button to revisit the Payment screen:
If you purchased Splice Machine through the AWS Marketplace, change your billing credentials in the Marketplace instead.
Start Using Your Database!
After your cluster spins up, which typically requires about 10 minutes, you can load your data into your Splice Machine database and start running queries.
The easiest way to get going with your new database is to use our Zeppelin Notebook interface, with which you can quickly run queries and generate different visualizations of your results, all without writing any code. We’ve provided a number of useful Zeppelin tutorials, including one that walks you through setting up a schema, creating tables, loading data, and then running queries.
Note that your data must be in Azure Storage or an AWS S3 bucket before you can import it into your Splice Machine database:
- For information about uploading data to S3, please check our Uploading Data to an S3 Bucket tutorial. You may need to configure your Amazon IAM permissions to allow Splice Machine to access your bucket; see our Configuring an S3 Bucket for Splice Machine Access tutorial.
- To configure Azure Storage for use with Splice Machine, see our Using Azure Storage tutorial.
- Once you’ve got your data uploaded, you can follow our Ingestion Best Practices topic to load that data into Splice Machine.
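As a preview of what loading from S3 can look like, the sketch below builds the text of a call to Splice Machine's `SYSCS_UTIL.IMPORT_DATA` system procedure. The parameter order and the use of `null` for defaults reflect our understanding of that procedure and are assumptions here; verify them against the Ingestion Best Practices topic before use. The helpers `sql_literal` and `import_data_call`, the schema and table names, and the `s3a://` path are all hypothetical. The generated statement would then be run from Zeppelin or another SQL client.

```python
def sql_literal(value):
    """Render a Python value as a SQL string literal, or null for None."""
    if value is None:
        return "null"
    return "'" + str(value).replace("'", "''") + "'"


def import_data_call(schema, table, path, column_delimiter=",",
                     bad_records_allowed=0, bad_record_dir=None):
    """Build a SYSCS_UTIL.IMPORT_DATA call string (parameter order assumed)."""
    args = [
        sql_literal(schema),            # schemaName
        sql_literal(table),             # tableName
        "null",                         # insertColumnList: all columns
        sql_literal(path),              # fileOrDirectoryName, e.g. an s3a:// URL
        sql_literal(column_delimiter),  # columnDelimiter
        "null",                         # characterDelimiter (default)
        "null",                         # timestampFormat (default)
        "null",                         # dateFormat (default)
        "null",                         # timeFormat (default)
        str(int(bad_records_allowed)),  # badRecordsAllowed
        sql_literal(bad_record_dir),    # badRecordDirectory (None -> null)
        "true",                         # oneLineRecords
        "null",                         # charset (default)
    ]
    return "CALL SYSCS_UTIL.IMPORT_DATA(" + ", ".join(args) + ");"


# Hypothetical schema, table, and bucket path.
print(import_data_call("SPLICE", "MY_TABLE", "s3a://my-example-bucket/data/"))
```

Escaping single quotes in `sql_literal` keeps file paths or delimiters that contain quotes from breaking the generated statement.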