Splice Machine Troubleshooting and Best Practices

This topic provides troubleshooting guidance for the following issues that you may encounter with your Splice Machine database:

Restarting Splice Machine After HMaster Failure

If you run Splice Machine without redundant HMasters, and you lose your HMaster, follow these steps to restart Splice Machine:

  1. Restart the HMaster node
  2. Restart every HRegion Server node

Slow Restart After Forced Shutdown

We have seen a situation where HMaster doesn’t exit when you attempt a shutdown, and a forced shutdown is used. The forced shutdown means that HBase may not be able to flush all data and delete all write-ahead logs (WALs); as a result, it can take longer than usual to restart HBase and Splice Machine.

Splice Machine now sets the HBase Graceful Shutdown Timeout to 10 minutes, which should be plenty of time. If the shutdown is still hanging up after 10 minutes, a forced shutdown is appropriate.

Updating Stored Query Plans after a Splice Machine Update

When you install a new version of your Splice Machine software, you need to make these two calls:
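
Assuming the standard SYSCS_UTIL system procedures for updating stored statements and emptying the statement cache (verify the exact procedure names against your release), the calls look like this from the splice> prompt:

splice> CALL SYSCS_UTIL.SYSCS_UPDATE_METADATA_STORED_STATEMENTS();
splice> CALL SYSCS_UTIL.SYSCS_EMPTY_STATEMENT_CACHE();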

These calls will update the stored metadata query plans and purge the statement cache, which is required because the query plan APIs have changed. This is true for both minor (patch) releases and major new releases.

Increasing Parallelism for Spark Shuffles

You can adjust the minimum parallelism for Spark shuffles by adjusting the value of the splice.olap.shuffle.partitions configuration option.

This option is similar to the spark.sql.shuffle.partitions option, which configures the number of partitions used when shuffling data for joins or aggregations; however, spark.sql.shuffle.partitions may allow fewer partitions than is optimal for certain operations.

Specifically, increasing the number of shuffle partitions with the splice.olap.shuffle.partitions option is useful when performing operations on small tables that generate large intermediate datasets; more, smaller partitions allow these operations to run with better parallelism.

The default value of splice.olap.shuffle.partitions is 200.
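
For example, to raise the minimum shuffle parallelism, you could set this option to a higher value such as 800 (an illustrative number; tune it for your workload and cluster). A minimal sketch, assuming the option is configured in hbase-site.xml like the other splice.* options in this topic:

<property>
    <name>splice.olap.shuffle.partitions</name>
    <value>800</value>
</property>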

Increasing Memory Settings for Heavy Analytical Work Loads

If you are running heavy analytical loads or running OLAP jobs on very large tables, you may want to increase these property settings in your hbase-site.xml file:

Property                             Default Value   Recommendation for Heavy Analytical Loads
splice.olap_server.memory            1024 MB         Set to the same value as the HMaster heap size
splice.olap_server.memoryOverhead    512 MB          Set to 10% of splice.olap_server.memory
splice.olap_server.virtualCores      1 vCore         4 vCores
splice.olap_server.external          true            true
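
For example, assuming a hypothetical HMaster heap size of 8192 MB, the recommended settings would look like this in hbase-site.xml:

<property>
    <name>splice.olap_server.memory</name>
    <value>8192</value>
</property>
<property>
    <name>splice.olap_server.memoryOverhead</name>
    <value>820</value>  <!-- roughly 10% of splice.olap_server.memory -->
</property>
<property>
    <name>splice.olap_server.virtualCores</name>
    <value>4</value>
</property>
<property>
    <name>splice.olap_server.external</name>
    <value>true</value>
</property>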

Force Compaction to Run on Local Region Server

Splice Machine attempts to run database compaction jobs on a Spark executor that is co-located with the serving Region Server. If it cannot find a local executor after a period of time, Splice Machine uses whatever Spark executor it can get. To force use of a local executor, adjust the splice.spark.dynamicAllocation.minExecutors configuration option.

To do so:

  • Set the value of splice.spark.dynamicAllocation.minExecutors to the number of Region Servers in your cluster.
  • Set the value of splice.spark.dynamicAllocation.maxExecutors to a value equal to or greater than that number.

Adjust these settings in the Java Config Options section of your HBase Master configuration.

The default option settings are:

-Dsplice.spark.dynamicAllocation.minExecutors=0
-Dsplice.spark.dynamicAllocation.maxExecutors=12

For a cluster with 20 Region Servers, you would set these to:

-Dsplice.spark.dynamicAllocation.minExecutors=20
-Dsplice.spark.dynamicAllocation.maxExecutors=20

Kerberos Configuration Option

If you’re using Kerberos, you need to add this option to your HBase Master Java Configuration Options:

-Dsplice.spark.hadoop.fs.hdfs.impl.disable.cache=true

Resource Management for Backup Jobs

Splice Machine backup jobs use a Map Reduce job to copy HFiles; this process may hang if the resources required for the Map Reduce job are not available from YARN. To make sure the resources are available, follow these three configuration steps:

  1. Configure minimum executors for Splice Spark
  2. Verify that adequate vcores are available for Map Reduce tasks
  3. Verify that adequate memory is available for Map Reduce tasks

Configure the minimum number of executors allocated to Splice Spark

You need to make sure that both of the following relationships between configuration settings hold true:

(splice.spark.dynamicAllocation.minExecutors + 1) < (yarn.nodemanager.resource.cpu-vcores * number_of_nodes)
(splice.spark.dynamicAllocation.minExecutors * (splice.spark.yarn.executor.memoryOverhead + splice.spark.executor.memory) + splice.spark.yarn.am.memory) < (yarn.nodemanager.resource.memory-mb * number_of_nodes)
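
For example, on a hypothetical four-node cluster with yarn.nodemanager.resource.cpu-vcores=16 and yarn.nodemanager.resource.memory-mb=65536, and with Splice Spark configured with splice.spark.dynamicAllocation.minExecutors=4, splice.spark.executor.memory=8192 (MB), splice.spark.yarn.executor.memoryOverhead=2048, and splice.spark.yarn.am.memory=1024, both relationships hold:

(4 + 1) = 5 < (16 * 4) = 64
(4 * (2048 + 8192) + 1024) = 41984 < (65536 * 4) = 262144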

The actual number of executors allocated to Splice Spark may be less than the value specified in splice.spark.dynamicAllocation.minExecutors because of memory constraints in the container. Once Splice Spark is launched, YARN allocates the actual executor count and memory to Splice Spark. You need to verify that enough vcores and memory remain available for Map Reduce tasks.

Verify that adequate vcores are available

The Map Reduce application master requires the following number of vcores:

yarn.app.mapreduce.am.resource.cpu-vcores * splice.backup.parallelism

There must be at least this many additional vcores available to execute Map Reduce tasks:

max{mapreduce.map.cpu.vcores, mapreduce.reduce.cpu.vcores}

Thus, the total number of vcores that must be available for Map Reduce jobs is:

yarn.app.mapreduce.am.resource.cpu-vcores * splice.backup.parallelism + max{mapreduce.map.cpu.vcores, mapreduce.reduce.cpu.vcores}
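
For example, with hypothetical settings of yarn.app.mapreduce.am.resource.cpu-vcores=1, splice.backup.parallelism=4, mapreduce.map.cpu.vcores=1, and mapreduce.reduce.cpu.vcores=1, you would need:

1 * 4 + max{1, 1} = 5 vcores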

Verify that adequate memory is available

The Map Reduce application master requires this much memory:

yarn.scheduler.minimum-allocation-mb * splice.backup.parallelism

There must be at least this much memory available to execute Map Reduce tasks:

yarn.scheduler.minimum-allocation-mb

Thus, the total amount of memory that must be available for Map Reduce jobs is:

yarn.scheduler.minimum-allocation-mb * (splice.backup.parallelism + 1)
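
For example, with a hypothetical yarn.scheduler.minimum-allocation-mb of 1024 and splice.backup.parallelism of 4, you would need:

1024 * (4 + 1) = 5120 MB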

Bulk Import of Very Large Datasets with Spark 2.2

When using Splice Machine with Spark 2.2 on Cloudera, bulk import of very large datasets can fail due to excessive direct memory usage. Use the following settings to resolve this issue:

Update Shuffle-to-Mem Setting

Modify the following setting in the Cloudera Manager’s Java Configuration Options for HBase Master:

-Dsplice.spark.reducer.maxReqSizeShuffleToMem=134217728

Update the YARN User Classpath

Modify the following settings in the Cloudera Manager’s YARN (MR2 Included) Service Environment Advanced Configuration Snippet (Safety Valve):

YARN_USER_CLASSPATH=/opt/cloudera/parcels/SPARK2/lib/spark2/yarn/spark-2.2.0.cloudera1-yarn-shuffle.jar:/opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala-library-2.11.8.jar
YARN_USER_CLASSPATH_FIRST=true

Using Bulk Import on a KMS-Enabled Cluster

If you are a Splice Machine On-Premise Database customer and want to use bulk import on a cluster with Cloudera Key Management Service (KMS) enabled, you must complete these extra configuration steps:

  1. Change the permissions of /hbase to 711.
  2. Configure permissions for hbase.rootdir.perms to 711 by adding this property to hbase-site.xml on HMaster:
    <property>
        <name>hbase.rootdir.perms</name>
        <value>711</value>
    </property>
    
  3. Make sure that the bulkImportDirectory is in the same encryption zone as HBase.
  4. Add these properties to hbase-site.xml to enable secure Apache HBase BulkLoad and to put its staging directory in the same encryption zone as HBase:
    <property>
        <name>hbase.bulkload.staging.dir</name>
        <value><YourStagingDirectory></value>
    </property>
    <property>
        <name>hbase.coprocessor.region.classes</name>
        <value>org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint</value>
    </property>

    Replace <YourStagingDirectory> with the path to your staging directory, and make sure that directory is in the same encryption zone as HBase; for example:

        <value>/hbase/load/staging</value>
    

For more information about KMS, see https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_sg_kms.html.

Assigning Full Administrative Privileges to Users

The default administrative user ID in Splice Machine is splice. If you want to configure other users to have the same privileges as the splice user, follow these steps:

  1. Add an LDAP admin_group mapping for the splice user to both:

    • the HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml
    • the HBase Client Advanced Configuration Snippet (Safety Valve) for hbase-site.xml

    This maps members of the admin_group to the same privileges as the splice user:

    <property>
     <name>splice.authentication.ldap.mapGroupAttr</name>
     <value>admin_group=splice</value>
    </property>
  2. Assign the admin_group to a user by specifying cn=admin_group in the user definition. For example, we'll add a user named myUser with these attributes:

    dn: cn=admin_group,ou=Users,dc=splicemachine,dc=colo
    sn: MyUser
    objectClass: inetOrgPerson
    userPassword: myPassword
    uid: myUser

    You need to make sure that the name you specify for cn is exactly the same as the name (the value to the left of the equality symbol) in the splice.authentication.ldap.mapGroupAttr property value. Matching is case sensitive!

    Now myUser belongs to the admin_group group, and thus gains all privileges associated with that group.

Verifying the Configuration

We can now run a few tests to verify that the user myUser has full administrative privileges. Suppose:

  • userA and userB are regular LDAP users
  • userA owns the schema userA in your Splice Machine database
  • userB owns the schema userB in your Splice Machine database
  • the userA schema has a table named t1, the contents of which are shown below
  • the userB schema has a table named t2, the contents of which are shown here:
splice> select * from userA.t1;
T1
-------------
1
2
3

3 rows selected
splice> select * from userB.t2;
T2
--------------
1
2
3

3 rows selected

Now we’ll run two tests using the splice> command line interpreter:

Test 1: Verify that myUser can access schemas and tables belonging to both users

  1. Connect to Splice Machine as myUser:

    connect 'jdbc:splice://localhost:1527/splicedb;user=myUser;password=myUserPassWord' as myuser_con;
    
  2. Verify that you can select from both schemas:

    splice> select * from userA.t1;
    T1
    -------------
    1
    2
    3
    3 rows selected
    splice> select * from userB.t2;
    T2
    --------------
    1
    2
    3
    3 rows selected
    
  3. Make sure that while connected to myuser_con, you can also perform table operations such as insert, delete, update, and drop table. Also make sure you can create and drop schemas.
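
    For example, while connected as myuser_con, a quick check might look like this; TESTADMIN is a hypothetical schema created just for this test:

    splice> create schema testadmin;
    0 rows inserted/updated/deleted
    splice> create table testadmin.t1 (a1 int);
    0 rows inserted/updated/deleted
    splice> insert into testadmin.t1 values (1);
    1 row inserted/updated/deleted
    splice> update testadmin.t1 set a1 = 2;
    1 row inserted/updated/deleted
    splice> delete from testadmin.t1;
    1 row inserted/updated/deleted
    splice> drop table testadmin.t1;
    0 rows inserted/updated/deleted
    splice> drop schema testadmin restrict;
    0 rows inserted/updated/deleted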

Test 2: Verify that myUser can grant privileges to other users.

We’ll test this by granting privileges on schema userB to userA and confirming that userA can access the schema.

splice> connect 'jdbc:splice://localhost:1527/splicedb;user=userA;password=userAPassword' as userA_con;
splice> select * from userB.t2;
ERROR 42502: User 'USERA' does not have SELECT permission on column 'A2' of table 'USERB'.'T2'.
splice> set connection myuser_con;
splice> grant all privileges on schema userB to userA;
0 rows inserted/updated/deleted
splice> set connection userA_con;
splice> select * from userB.t2;
A2
-----------
1
2
3

3 rows selected

Assigning to Multiple Groups

You can assign these privileges to multiple groups by specifying a comma-separated list in the splice.authentication.ldap.mapGroupAttr property. For example, changing its definition to this:

<property>
    <name>splice.authentication.ldap.mapGroupAttr</name>
    <value>admin_group=splice,cdl_group=splice</value>
</property>

means that members of both the admin_group and cdl_group groups have the same privileges as the splice user.
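
For example, a user assigned to the cdl_group group would be defined with cn=cdl_group, following the same pattern shown earlier (the attribute values here are hypothetical):

    dn: cn=cdl_group,ou=Users,dc=splicemachine,dc=colo
    sn: AnalystUser
    objectClass: inetOrgPerson
    userPassword: analystPassword
    uid: analystUser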