SYSCS_UTIL.COMPUTE_SPLIT_KEY

Use the SYSCS_UTIL.COMPUTE_SPLIT_KEY system procedure to compute the split keys for a table or index before calling the SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX_AT_POINTS procedure to split the data into HFiles. Once you've done that, call the SYSCS_UTIL.BULK_IMPORT_HFILE system procedure to import your data in HFile format.

Syntax

call SYSCS_UTIL.COMPUTE_SPLIT_KEY (
        schemaName,
        tableName,
        indexName,
        columnList | null,
        fileName,
        columnDelimiter | null,
        characterDelimiter | null,
        timestampFormat | null,
        dateFormat | null,
        timeFormat | null,
        maxBadRecords,
        badRecordDirectory | null,
        oneLineRecords | null,
        charset | null,
        outputDirectory
);
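The call below sketches what an invocation might look like; the schema, table, file paths, and formats are illustrative values borrowed from the parameter table in this topic, not a prescribed configuration. Passing null for a format parameter accepts its default.

```sql
-- Illustrative example: compute split keys for the SPLICE.playerTeams table
-- (indexName is null, so the table itself is split) by sampling a CSV on HDFS.
call SYSCS_UTIL.COMPUTE_SPLIT_KEY(
    'SPLICE',                          -- schemaName
    'playerTeams',                     -- tableName
    null,                              -- indexName: null to split the table, not an index
    'ID, TEAM',                        -- columnList
    'hdfs:///data/mydata/mytable.csv', -- fileName (hypothetical path)
    null,                              -- columnDelimiter: null = comma
    null,                              -- characterDelimiter: null = double quote
    null,                              -- timestampFormat: default yyyy-MM-dd HH:mm:ss
    null,                              -- dateFormat: default yyyy-MM-dd
    null,                              -- timeFormat: default HH:mm:ss
    0,                                 -- maxBadRecords: tolerate no bad records
    'hdfs:///tmp/importErrsDir',       -- badRecordDirectory (hypothetical path)
    true,                              -- oneLineRecords: each record is one line
    null,                              -- charset: default UTF-8
    'hdfs:///tmp/test_hfile_import/'   -- outputDirectory for the computed split keys
);
```

Remember that these values should match the ones you later pass to SYSCS_UTIL.BULK_IMPORT_HFILE when you perform the import.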

Parameters

The following table summarizes the parameters used by SYSCS_UTIL.COMPUTE_SPLIT_KEY and other Splice Machine data import procedures. Each parameter name links to a more detailed description in our Importing Data Tutorial.

The parameter values that you pass into this procedure should match the values that you use when you subsequently call the SYSCS_UTIL.BULK_IMPORT_HFILE procedure to perform the import.

Table Info

  schemaName
    The name of the schema of the table into which to import.
    Example: SPLICE

  tableName
    The name of the table into which to import.
    Example: playerTeams

Data Location

  insertColumnList
    The names, in single quotes, of the columns to import. If this is null, all columns are imported.
    Example: 'ID, TEAM'

  fileOrDirectoryName
    Either a single file or a directory. If this is a single file, that file is imported; if this is a directory, all of the files in that directory are imported. You can import compressed or uncompressed files.
    The SYSCS_UTIL.MERGE_DATA_FROM_FILE procedure only works with single files; you cannot specify a directory name when calling SYSCS_UTIL.MERGE_DATA_FROM_FILE.
    On a cluster, the files to be imported MUST be on S3, HDFS, or MapR-FS. If you're using our Database Service product, files can only be imported from S3.
    Examples: /data/mydata/mytable.csv, 's3a://splice-benchmark-data/flat/TPCH/100/region'

Data Formats

  oneLineRecords
    A Boolean value that specifies whether each record in the import file is contained in one input line (true) or can span multiple lines (false).
    Example: true

  charset
    The character encoding of the import file. The default value is UTF-8.
    Example: null

  columnDelimiter
    The character used to separate columns. Specify null if you are using the comma (,) character as your delimiter.
    Examples: '|', '\t'

  characterDelimiter
    The character used to delimit strings in the imported data.
    Examples: '"', ''''

  timestampFormat
    The format of timestamps stored in the file. You can set this to null if there are no timestamp columns in the file, or if the format of any timestamps in the file matches the java.sql.Timestamp default format, which is "yyyy-MM-dd HH:mm:ss".
    All of the timestamps in the file you are importing must use the same format.
    Example: 'yyyy-MM-dd HH:mm:ss.SSZ'

  dateFormat
    The format of dates stored in the file. You can set this to null if there are no date columns in the file, or if the format of any dates in the file matches the pattern "yyyy-MM-dd".
    Example: yyyy-MM-dd

  timeFormat
    The format of time values stored in the file. You can set this to null if there are no time columns in the file, or if the format of any times in the file matches the pattern "HH:mm:ss".
    Example: HH:mm:ss

Problem Logging

  badRecordsAllowed
    The number of rejected (bad) records that are tolerated before the import fails. If this count of rejected records is reached, the import fails, and any successful record imports are rolled back. Specify 0 to indicate that no bad records are tolerated, and specify -1 to indicate that all bad records should be logged and allowed.
    Example: 25

  badRecordDirectory
    The directory in which bad record information is logged. Splice Machine logs information to the <import_file_name>.bad file in this directory; for example, bad records in an input file named foo.csv would be logged to a file named badRecordDirectory/foo.csv.bad.
    On a cluster, this directory MUST be on S3, HDFS, or MapR-FS. If you're using our Database Service product, files can only be imported from S3.
    Example: 'importErrsDir'

Bulk HFile Import

  bulkImportDirectory (outputDirectory)
    For SYSCS_UTIL.BULK_IMPORT_HFILE, this is the name of the directory into which the generated HFiles are written prior to being imported into your database.
    For the SYSCS_UTIL.COMPUTE_SPLIT_KEY procedure, where it is named outputDirectory, this parameter specifies the directory into which the split keys are written.
    Example: hdfs:///tmp/test_hfile_import/

  skipSampling
    A Boolean value that specifies how the split keys used for the bulk HFile import are computed. Set this to false to have SYSCS_UTIL.BULK_IMPORT_HFILE sample the data and determine the splits for you; set it to true if you have already split the table or index yourself.
    This parameter is only used with the SYSCS_UTIL.BULK_IMPORT_HFILE system procedure.
    Example: false

Usage

The SYSCS_UTIL.BULK_IMPORT_HFILE procedure requires that the data you're importing be split into multiple HFiles before it actually imports the data into your database. You can achieve these splits in three ways:

  • You can call SYSCS_UTIL.BULK_IMPORT_HFILE with the skipSampling parameter set to false. SYSCS_UTIL.BULK_IMPORT_HFILE samples the data to determine the splits, splits the data into multiple HFiles, and then imports the data.

  • You can split the data into HFiles with the SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX procedure, which both computes the keys and performs the splits. You then call SYSCS_UTIL.BULK_IMPORT_HFILE with the skipSampling parameter set to true to import your data.

  • You can split the data into HFiles by first calling this procedure, SYSCS_UTIL.COMPUTE_SPLIT_KEY, and then calling the SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX_AT_POINTS procedure to split the table or index. You then call SYSCS_UTIL.BULK_IMPORT_HFILE with the skipSampling parameter set to true to import your data.

In all cases, SYSCS_UTIL.BULK_IMPORT_HFILE automatically deletes the HFiles after the import process has completed.
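The third approach above can be sketched as the following sequence of calls. The paths and values are illustrative, and the parameter lists for SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX_AT_POINTS and SYSCS_UTIL.BULK_IMPORT_HFILE are elided here (shown as ...); see each procedure's own topic for its full signature.

```sql
-- Step 1 (hypothetical values): compute split keys and write them
-- to the output directory.
call SYSCS_UTIL.COMPUTE_SPLIT_KEY(
    'SPLICE', 'playerTeams', null, 'ID, TEAM',
    'hdfs:///data/mydata/mytable.csv',
    null, null, null, null, null,
    0, 'hdfs:///tmp/importErrsDir', true, null,
    'hdfs:///tmp/test_hfile_import/');

-- Step 2: pre-split the table (or index) at the computed points;
-- see the SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX_AT_POINTS topic
-- for its parameters.
call SYSCS_UTIL.SYSCS_SPLIT_TABLE_OR_INDEX_AT_POINTS(...);

-- Step 3: import, with skipSampling set to true because the splits
-- already exist; see the SYSCS_UTIL.BULK_IMPORT_HFILE topic for its
-- parameters.
call SYSCS_UTIL.BULK_IMPORT_HFILE(...);
```

The key point is that the parameter values describing your input file in step 1 should match the ones you pass again in step 3.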

The Bulk HFile Import Examples section of our Importing Data Tutorial describes how these methods differ and provides examples of using them to import data.

Examples

The Importing Data: Bulk HFile Examples topic walks you through several examples of importing data with bulk HFiles.

See Also