Importing Data Into Your Splice Machine Database

This tutorial guides you through importing (loading) data into your Splice Machine database. It contains these topics:

Tutorial Topic Description
1: Tutorial Overview This topic. Introduces the import options that are available to you and helps you determine which option best meets your needs.
2: Parameter Usage Provides detailed specifications of the parameter values you must supply to the import procedures.
3: Input Data Handling Provides detailed information and tips about input data handling during ingestion.
4: Error Handling Helps you to understand and use logging to discover and repair any input data problems that occur during an ingestion process.
5: Usage Examples Walks you through examples of importing data with the   SYSCS_UTIL.IMPORT_DATA, SYSCS_UTIL.UPSERT_DATA_FROM_FILE, and SYSCS_UTIL.MERGE_DATA_FROM_FILE system procedures.
6: Bulk HFile Examples Walks you through examples of using the SYSCS_UTIL.BULK_IMPORT_HFILE system procedure.
7: Importing TPCH Data Walks you through importing TPCH sample data into your database.

Overview of Importing Data Into Your Database

The remainder of this topic introduces you to importing (loading) data into your Splice Machine database. It summarizes the different import procedures we provide, presents a quick look at the procedure declarations, and helps you to decide which one matches your conditions. It contains the following sections:

Data Import Procedures

Splice Machine includes four different procedures for importing data into your database, three of which use identical syntax; the fourth provides a more behind-the-scenes method that is quicker when loading large data sets, but requires more work and care on your part. The table below summarizes these import procedures:

System Procedure Description
SYSCS_UTIL.IMPORT_DATA Imports data into your database, creating a new record in your table for each record in the imported data. SYSCS_UTIL.IMPORT_DATA inserts the default value of each column that is not specified in the input.
SYSCS_UTIL.UPSERT_DATA_FROM_FILE Imports data into your database, creating new records and *updating existing records* in the table. Identical to SYSCS_UTIL.IMPORT_DATA except that will update matching records. SYSCS_UTIL.UPSERT_DATA_FROM_FILE also inserts or updates the value in the table of each column that is not specified in the input; inserting the default value (or NULL if there is no default) for that column.
SYSCS_UTIL.MERGE_DATA_FROM_FILE Imports data into your database, creating new records and *updating existing records* in the table. Identical to SYSCS_UTIL.UPSERT_DATA_FROM_FILE except that it does not replace values in the table for unspecified columns when updating an existing record in the table.
SYSCS_UTIL.BULK_IMPORT_HFILE Takes advantage of HBase bulk loading to import table data into your database by temporarily converting the table file that you’re importing into HFiles, importing those directly into your database, and then removing the temporary HFiles. This procedure uses syntax very similar to the other import procedures and has improved performance for large tables; however, the bulk HFile import requires extra work on your part and lacks constraint checking.

Which Procedure Should I Use to Import My Data?

The following diagram helps you decide which of our data importation procedures best fits your needs:

Notes

  • The IMPORT_DATA procedure imports new records into a database. The UPSERT_DATA_FROM_FILE and MERGE_DATA_FROM_FILE procedures import new records and update existing records. Importing all new records is faster because the database doesn’t need to check if the record already exists in the database.
  • If your table contains auto-generated column values and you don’t want those values overwritten when a record gets updated, use the MERGE_DATA_FROM_FILE procedure (UPSERT_DATA_FROM_FILE will overwrite).
  • The BULK_IMPORT_HFILE procedure is great when you’re importing a very large dataset and need extra performance. However, it does not perform constraint checking.

Import Procedures Syntax

Here are the declarations of our four data import procedures; as you can see, three of our four import procedures use identical parameters, and the fourth (SYSCS_UTIL.BULK_IMPORT_HFILE) adds a couple extra parameters at the end:

call SYSCS_UTIL.IMPORT_DATA (
        schemaName,
        tableName,
        insertColumnList | null,
        fileOrDirectoryName,
        columnDelimiter | null,
        characterDelimiter | null,
        timestampFormat | null,
        dateFormat | null,
        timeFormat | null,
        badRecordsAllowed,
        badRecordDirectory | null,
        oneLineRecords | null,
        charset | null
        );
call SYSCS_UTIL.UPSERT_DATA_FROM_FILE (
       schemaName,
       tableName,
       insertColumnList | null,
       fileOrDirectoryName,
       columnDelimiter | null,
       characterDelimiter | null,
       timestampFormat | null,
       dateFormat | null,
       timeFormat | null,
       badRecordsAllowed,
       badRecordDirectory | null,
       oneLineRecords | null,
       charset | null
);
call SYSCS_UTIL.MERGE_DATA_FROM_FILE (
       schemaName,
       tableName,
       insertColumnList | null,
       fileOrDirectoryName,
       columnDelimiter | null,
       characterDelimiter | null,
       timestampFormat | null,
       dateFormat | null,
       timeFormat | null,
       badRecordsAllowed,
       badRecordDirectory | null,
       oneLineRecords | null,
       charset | null
);
call SYSCS_UTIL.BULK_IMPORT_HFILE (
    schemaName,
    tableName,
    insertColumnList | null,
    fileName,
    columnDelimiter | null,
    characterDelimiter | null,
    timestampFormat | null,
    dateFormat | null,
    timeFormat | null,
    maxBadRecords,
    badRecordDirectory | null,
    oneLineRecords | null,
    charset | null,
    bulkImportDirectory,
    skipSampling
);

You’ll find descriptions and detailed reference information for all of these parameters in the Import Parameters topic of this tutorial.

And you’ll find detailed reference descriptions of all four procedures in our SQL Reference Manual.

See Also