This procedure splits a table or index file that you want to bulk import into HFiles, using the split keys that you specify. The split keys are specified in a CSV file that is encoded in HBase format.
Unless you already have your split keys accessible in HBase format, Splice Machine recommends using the SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX system procedure instead of this one. Using SYSCS_UTIL.COMPUTE_SPLIT_KEY followed by SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX_AT_POINTS is exactly equivalent to using SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX.
SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX_AT_POINTS ( schemaName, tableName, indexName, splitPoints );
schemaName
The name of the schema of the table or index that you are splitting.
tableName
The name of the table that you are splitting.
indexName
The name of the index that you are splitting. If this is null, the specified table is split; if this is non-null, the index is split instead.
splitPoints
A list of split points for the table or index, supplied in HBase format in a CSV file. This list can be created by a previous call to the SYSCS_UTIL.COMPUTE_SPLIT_KEY procedure, or you can prepare it manually, in which case it must follow the criteria specified in the next section, Split Points CSV File Format.
Split Points CSV File Format
If you are manually preparing the splitPoints CSV file, you must create a version of the file you are importing that contains only the region boundary rows. Each row in the file:
- contains only the primary key column value, if you’re importing a table.
- contains only the index column values, if you’re importing an index.
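The file layout described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not part of Splice Machine: the function name write_split_points is hypothetical, the input rows are assumed to be already sorted by key, and choosing evenly spaced rows as region boundaries is only one possible policy. The sketch writes plain key values and performs no HBase encoding.

```python
import csv
import io


def write_split_points(rows, key_columns, num_regions, out):
    """Write one line per region boundary, holding only the key column values.

    rows:        data rows (lists of strings), assumed already sorted by key
    key_columns: positions of the primary key (or index) columns in each row
    num_regions: desired number of regions; yields at most num_regions - 1 boundaries
    out:         a writable text file object
    """
    if num_regions < 2 or not rows:
        return []
    boundaries = []
    for i in range(1, num_regions):
        # Assumption: boundaries are spaced evenly through the sorted data.
        row = rows[i * len(rows) // num_regions]
        key = [row[c] for c in key_columns]
        # Skip repeated keys so each boundary appears only once.
        if not boundaries or boundaries[-1] != key:
            boundaries.append(key)
    csv.writer(out, lineterminator="\n").writerows(boundaries)
    return boundaries


# Hypothetical usage: ten rows whose first column is the primary key.
sample_rows = [[str(i), "payload"] for i in range(10)]
buffer = io.StringIO()
split_keys = write_split_points(sample_rows, key_columns=[0], num_regions=4, out=buffer)
```

For an index, key_columns would instead list the positions of the index columns, so that each output row contains only those values.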
You can use the SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX_AT_POINTS procedure to pre-split a data file that you’re importing with the SYSCS_UTIL.BULK_IMPORT_HFILE procedure. When you pre-split your data, make sure that you set the skipSampling parameter to true when calling SYSCS_UTIL.BULK_IMPORT_HFILE; that tells the bulk import procedure that you have already split your data.
The Best Practices: Bulk Importing Flat Files section of our Importing Data Tutorial describes the different methods for using our bulk HFile import functionality.