This procedure splits a table or index file that you want to bulk import into HFiles, using the split keys that you specify. The split keys are specified in a CSV file that you can create in one of two ways:
- If you know how your data can best be split into evenly sized HFiles, you can manually create a CSV file, as described in our Importing Data: Bulk HFile Examples topic.
- You can call the SYSCS_UTIL.COMPUTE_SPLIT_KEY procedure to compute the split keys for the data and save them in a CSV file.
For more information about splitting your tables and indexes into HFiles, see the Using Bulk HFile Import section of our Importing Data tutorial.
Splice Machine recommends using the SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX system procedure instead of this one unless you’re an expert user. The combination of using SYSCS_UTIL.COMPUTE_SPLIT_KEY with this procedure is exactly equivalent to using SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX.
Syntax
SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX_AT_POINTS ( schemaName, tableName, indexName, splitPoints );
- schemaName: The name of the schema of the table or index that you are splitting.
- tableName: The name of the table you are splitting.
- indexName: The name of the index that you are splitting. If this is null, the specified table is split; if this is non-null, the index is split instead.
- splitPoints: A list of split points for the table or index, supplied in a CSV file. This list can be created by a previous call to the SYSCS_UTIL.COMPUTE_SPLIT_KEY procedure, or you can prepare it manually, in which case it must follow the criteria specified in the next section, Split Points CSV File Format.
Split Points CSV File Format
If you are manually preparing the splitPoints CSV file, you must create a version of the file you are importing that contains only rows that are region boundary rows. Each row in the file:
- contains only the primary key column value, if you’re importing a table.
- contains only the index column values, if you’re importing an index.
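The file format described above can be illustrated with a short Python sketch that reduces a data file to its boundary rows and keeps only the key columns. This is one possible approach under stated assumptions, not the method Splice Machine uses: the function name `write_split_points`, the pipe delimiter, the sample data, and the rule of starting a new region every `rows_per_region` rows are all hypothetical, and the input is assumed to be sorted by key already.

```python
import csv
import io


def write_split_points(rows, key_columns, rows_per_region, out):
    """Write the key values of each region boundary row to `out` as CSV.

    rows: iterable of parsed rows, assumed to be sorted by key already.
    key_columns: zero-based positions of the primary key (or index) columns.
    rows_per_region: hypothetical rule; every N-th row starts a new region.
    Returns the number of split points written.
    """
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for i, row in enumerate(rows):
        # The first row needs no split point; each later multiple of
        # rows_per_region is treated as the first row of a new region.
        if i > 0 and i % rows_per_region == 0:
            writer.writerow([row[c] for c in key_columns])
            count += 1
    return count


# Hypothetical sample: a pipe-delimited file whose first column is the key.
sample = "1|apple\n2|pear\n3|plum\n4|kiwi\n5|fig\n6|lime\n"
reader = csv.reader(io.StringIO(sample), delimiter="|")
buffer = io.StringIO()
written = write_split_points(reader, key_columns=[0], rows_per_region=2, out=buffer)
split_points_csv = buffer.getvalue()
```

For an index, pass the positions of the index columns as `key_columns` instead of the primary key position. How the boundary rows are chosen is up to you; picking evenly spaced rows is only a reasonable starting point when the keys are spread fairly uniformly.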
You can use the SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX_AT_POINTS procedure to pre-split a data file that you’re importing with the SYSCS_UTIL.BULK_IMPORT_HFILE procedure.
When you pre-split your data, make sure that you set the skipSampling parameter to true when calling SYSCS_UTIL.BULK_IMPORT_HFILE; that tells the bulk import procedure that you have already split your data.
The Importing Data: Using Bulk HFile Import section of our Importing Data Tutorial describes the different methods for using our bulk HFile import functionality.