Configuring Performance of Your Splice Machine Database
This section contains best practice and troubleshooting information related to modifying configuration options to fine-tune database performance with our On-Premise Database product, in these topics:
- Resolving Periodic Spikes in HBase Read Times
- Increasing Parallelism for Spark Shuffles
- Increasing Memory Settings for Heavy Analytical Work Loads
- Force Compaction to Run on Local Region Server
Resolving Periodic Spikes in HBase Read Times
If you’re using Cloudera and you closely monitor your read request queues to stay on top of your cluster load, you might observe a spike in reads every 30 minutes. Cloudera Manager enables an HBase Region Health Canary that pings every server once every 30 minutes. As long as you are not experiencing any throughput problems, these spikes are harmless. If you want to get rid of the spikes, you can disable this monitoring, as follows:
- In Cloudera Manager, navigate to HBase service -> Configuration -> Monitoring.
- Deselect (uncheck) HBase Region Health Canary.
Increasing Parallelism for Spark Shuffles
You can adjust the minimum parallelism for Spark shuffles by changing the value of the splice.olap.shuffle.partitions configuration option.
This option is similar to the spark.sql.shuffle.partitions option, which configures the number of partitions to use when shuffling data for joins or aggregations; however, the spark.sql.shuffle.partitions option is set to allow a lower number of partitions than is optimal for certain operations.
Specifically, increasing the number of shuffle partitions with the splice.olap.shuffle.partitions option is useful when performing operations on small tables that generate large intermediate datasets: a larger number of smaller partitions allows those operations to run with better parallelism.
The default value of
Increasing Memory Settings for Heavy Analytical Work Loads
If you are running heavy analytical loads or running OLAP jobs on very large tables, you may want to increase these property settings in your
| Property | Default Value | Recommendations for Heavy Analytical Loads |
| splice.olap_server.memory | 1024 MB | Set to the same value as HMaster heap size |
| splice.olap_server.memoryOverhead | 512 MB | Set to 10% of splice.olap_server.memory |
| splice.olap_server.virtualCores | 1 vCore | 4 vCores |
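The recommendations in the table reduce to simple arithmetic once you know your HMaster heap size. The following Python sketch shows one way to derive the three values. The function name `recommended_olap_settings` and the 8192 MB heap size are hypothetical, chosen only for illustration, and rounding the 10% overhead down to a whole number of megabytes is an assumption rather than a documented rule.

```python
def recommended_olap_settings(hmaster_heap_mb: int) -> dict:
    """Derive the heavy-analytical-load values from the table above.

    hmaster_heap_mb: HMaster heap size in MB (supplied by you; not read from the cluster).
    """
    if hmaster_heap_mb <= 0:
        raise ValueError("hmaster_heap_mb must be a positive number of megabytes")

    memory_mb = hmaster_heap_mb      # same value as the HMaster heap size
    overhead_mb = memory_mb // 10    # 10% of memory, rounded down (assumption)

    return {
        "splice.olap_server.memory": memory_mb,
        "splice.olap_server.memoryOverhead": overhead_mb,
        "splice.olap_server.virtualCores": 4,
    }


# Hypothetical example: an HMaster heap of 8192 MB.
for name, value in recommended_olap_settings(8192).items():
    print(f"{name} = {value}")
```

The script only computes values; you still need to apply them in your cluster's configuration.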
Force Compaction to Run on Local Region Server
Splice Machine attempts to run database compaction jobs on an executor that is co-located with the serving Region Server. If it cannot find a local executor after a period of time, Splice Machine uses whatever Spark executor it can get. To force use of a local executor, you can adjust the splice.spark.dynamicAllocation.minExecutors configuration option.
To do so:
- Set the value of splice.spark.dynamicAllocation.minExecutors to the number of Region Servers in your cluster.
- Set the value of splice.spark.dynamicAllocation.maxExecutors to a value equal to or greater than that number.
Adjust these settings in the Java Config Options section of your HBase Master configuration.
The default option settings are:
For a cluster with 20 Region Servers, you would set these to:
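As a rough sketch of the rule described in this section, the following Python helper builds the two option strings for a given cluster size. The helper name `executor_options` is hypothetical, and the `-D<name>=<value>` form is an assumption based on these options being entered as Java options; check the exact syntax your HBase Master configuration expects before copying the output.

```python
from typing import List, Optional


def executor_options(region_servers: int, max_executors: Optional[int] = None) -> List[str]:
    """Build minExecutors/maxExecutors settings for a cluster.

    region_servers: number of Region Servers in the cluster.
    max_executors: optional upper bound; defaults to region_servers.
    """
    if region_servers < 1:
        raise ValueError("region_servers must be at least 1")
    if max_executors is None:
        max_executors = region_servers
    if max_executors < region_servers:
        # maxExecutors must be equal to or greater than minExecutors
        raise ValueError("max_executors must be >= region_servers")

    # Assumed JVM system-property syntax (-Dname=value)
    return [
        f"-Dsplice.spark.dynamicAllocation.minExecutors={region_servers}",
        f"-Dsplice.spark.dynamicAllocation.maxExecutors={max_executors}",
    ]


# The 20 Region Server case mentioned above.
print("\n".join(executor_options(20)))
```

Passing a larger second argument, for example `executor_options(20, 30)`, keeps the minimum at the Region Server count while leaving headroom for additional executors.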