Best Practices: Troubleshooting On-Premise Bulk Import

This section contains tips for troubleshooting the ingestion of data into our On-Premise Database product.

These issues do not apply if you're using the Splice Machine Database-as-a-Service product.

Using Bulk Import on a KMS-Enabled Cluster

If you are a Splice Machine On-Premise Database customer and want to use bulk import on a cluster with Cloudera Key Management Service (KMS) enabled, you must complete these extra configuration steps:

  1. Make sure that the bulkImportDirectory is in the same encryption zone as HBase.
  2. Add these properties to hbase-site.xml to load secure Apache BulkLoad and to put its staging directory in the same encryption zone as HBase:


    Replace <YourStagingDirectory> with the path to your staging directory, and make sure that directory is in the same encryption zone as HBase; for example:
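    A sketch of what these hbase-site.xml entries look like (the coprocessor class and staging-directory property names below are standard HBase secure BulkLoad settings, but verify them against your HBase version; /hbase/bulkload-staging is a hypothetical example path):

    ```xml
    <!-- Loads the secure BulkLoad endpoint (standard HBase coprocessor;
         confirm the class name for your HBase version). -->
    <property>
      <name>hbase.coprocessor.region.classes</name>
      <value>org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint</value>
    </property>
    <!-- Staging directory for secure BulkLoad; must be inside the same
         encryption zone as HBase. The path shown is an example only. -->
    <property>
      <name>hbase.bulkload.staging.dir</name>
      <value>/hbase/bulkload-staging</value>
    </property>
    ```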


For more information about KMS, see the Cloudera Key Management Service documentation.

Bulk Import: Out of Heap Memory

If you run out of heap memory while bulk importing an extremely large amount of data with our On-Premise product, you can resolve the issue by setting the HBase client's hfile.block.cache.size property to a very small value. We recommend this setting:
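As a sketch, the entry in the client-side hbase-site.xml would look like this (the value shown is illustrative; any near-zero value effectively disables the client-side block cache):

```xml
<!-- Client-side hbase-site.xml only: shrink the block cache to
     effectively nothing. 0.000001 is an illustrative near-zero value. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.000001</value>
</property>
```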


This setting should be applied only to the HBase client.

Bulk Import: Network Timeout

If you encounter a network timeout during bulk ingestion with our On-Premise product, you can resolve it by increasing the value of the timeout property as follows:
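The specific property is not preserved in this text. As an illustration only, raising the HDFS client socket timeout (a real HDFS client property, though whether it is the one intended here is an assumption) would look like:

```xml
<!-- Illustrative only: dfs.client.socket-timeout is a standard HDFS
     client property, in milliseconds. The property and value the
     original documentation intended may differ. -->
<property>
  <name>dfs.client.socket-timeout</name>
  <value>600000</value>
</property>
```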

Bulk Import: Out of Direct Memory (Cloudera)

When using the On-Premise version of Splice Machine with Spark on Cloudera, bulk imports of very large datasets can fail due to excessive direct memory usage. Apply the following settings to resolve this issue:

  1. Update the Shuffle-to-Mem Setting

    Modify the following setting in the Cloudera Manager’s Java Configuration Options for HBase Master:
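    The original setting is not preserved in this text. Splice Machine's embedded Spark options are typically passed as -Dsplice.spark.* Java options, and Spark's spark.maxRemoteBlockSizeFetchToMem property forces shuffle blocks above the given size to be fetched to disk rather than into direct memory; as an assumption, the change might look like:

    ```
    # Appended to "Java Configuration Options for HBase Master" in Cloudera
    # Manager. The splice.spark. prefix and the 100 MB threshold are
    # assumptions, not confirmed values.
    -Dsplice.spark.maxRemoteBlockSizeFetchToMem=100000000
    ```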

  2. Update the YARN User Classpath

    Modify the following settings in the Cloudera Manager’s YARN (MR2 Included) Service Environment Advanced Configuration Snippet (Safety Valve):
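    The original entries are not preserved in this text. YARN's launch scripts support the YARN_USER_CLASSPATH and YARN_USER_CLASSPATH_FIRST environment variables; a sketch of the snippet, with a hypothetical parcel path that you would replace with your actual Splice Machine installation path, is:

    ```
    # YARN (MR2 Included) Service Environment Advanced Configuration Snippet.
    # The parcel path below is hypothetical; substitute your install path.
    YARN_USER_CLASSPATH=/opt/cloudera/parcels/SPLICEMACHINE/lib/*
    YARN_USER_CLASSPATH_FIRST=true
    ```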

  3. Due to how YARN manages memory, you need to modify your YARN configuration when bulk importing large datasets. Make this change in your YARN configuration, in the ResourceManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml:
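    The specific property is not preserved in this text. One standard yarn-site.xml change for memory-hungry jobs is raising the scheduler's maximum container allocation; as an assumption, with an illustrative value:

    ```xml
    <!-- Illustrative: raises the largest container YARN will grant.
         The property and the 32 GB value are assumptions; the original
         documentation may have intended a different setting. -->
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>32768</value>
    </property>
    ```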


    You may also need to temporarily make this additional configuration update as a workaround for memory allocation issues. Note that this update is not recommended for production usage, as it affects all YARN jobs and could cause your cluster to become unstable:
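    A common workaround matching this description is disabling YARN's physical and virtual memory enforcement (assumption: this is the update the original text intended). Because it affects every container on the cluster, revert it after the import completes:

    ```xml
    <!-- Temporary workaround only: disables memory enforcement for ALL
         YARN containers and can destabilize the cluster. Not for
         production use. -->
    <property>
      <name>yarn.nodemanager.pmem-check-enabled</name>
      <value>false</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>
    ```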