The SYSCS_UTIL.IMPORT_DATA system procedure imports data to a new record in a table. You can choose to import all or a subset of the columns from the input data into your database using the insertColumnList parameter.
After a successful import completes, a simple report displays, showing how many files were imported, and how many record imports succeeded or failed.
This procedure is one of several built-in system procedures provided by Splice Machine for importing data into your database. See our Best Practices: Ingestion for help with selecting the right process for your situation.
```
call SYSCS_UTIL.IMPORT_DATA (
    schemaName,
    tableName,
    insertColumnList | null,
    fileOrDirectoryName,
    columnDelimiter | null,
    characterDelimiter | null,
    timestampFormat | null,
    dateFormat | null,
    timeFormat | null,
    badRecordsAllowed,
    badRecordDirectory | null,
    oneLineRecords | null,
    charset | null
);
```
This table includes a brief description of each parameter; additional information is available in the Ingestion Parameter Values topic of our Importing Data tutorial.
| Parameter | Description | Example |
|-----------|-------------|---------|
| schemaName | The name of the schema of the table into which to import. | SPLICE |
| tableName | The name of the table into which to import. | playerTeams |
| insertColumnList | The names, in single quotes, of the columns to import. If this is null, all columns are imported. The individual column names in the list are separated by commas. | 'ID, TEAM' |
| fileOrDirectoryName | Either a single file or a directory. If this is a single file, that file is imported; if this is a directory, all of the files in that directory are imported. You can import compressed or uncompressed files. On a cluster, the files to be imported MUST be on S3, HDFS (or MapR-FS). See the Configuring an S3 Bucket for Splice Machine Access topic for information about accessing data on S3. | /data/mydata/mytable.csv |
| columnDelimiter | The character used to separate columns. Specify null if using the comma (,) character as your delimiter. | '\|', '\t' |
| characterDelimiter | The character used to delimit strings in the imported data. | '"', '''' |
| timestampFormat | The format of timestamps stored in the file. You can set this to null if there are no timestamps in the file, or if the format of any timestamps in the file matches the pattern: "yyyy-MM-dd HH:mm:ss". All of the timestamps in the file you are importing must use the same format. | yyyy-MM-dd HH:mm:ss.SSZ |
| dateFormat | The format of datestamps stored in the file. You can set this to null if there are no date columns in the file, or if the format of any dates in the file matches the pattern: "yyyy-MM-dd". | yyyy-MM-dd |
| timeFormat | The format of time values stored in the file. You can set this to null if there are no time columns in the file, or if the format of any times in the file matches the pattern: "HH:mm:ss". | HH:mm:ss |
| badRecordsAllowed | The number of rejected (bad) records that are tolerated before the import fails. If this count of rejected records is reached, the import fails, and any successful record imports are rolled back. Specify 0 to indicate that no bad records are tolerated, and specify -1 to indicate that all bad records should be logged and allowed. | 25 |
| badRecordDirectory | The directory in which bad record information is logged. Splice Machine logs information to the .bad file for each import file in this directory. On a cluster, this directory MUST be on S3, HDFS (or MapR-FS). If you're using our Database Service product, files can only be imported from S3. | 'importErrsDir' |
| oneLineRecords | A Boolean value that specifies whether (true) each record in the import file is contained in one input line, or (false) records can span multiple lines. | true |
| charset | The character encoding of the import file. The default value is UTF-8. | null |
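To see how the delimiter and format parameters fit together, here is a small Python sketch (illustration only, not part of the Splice Machine API; the row data is invented) that writes a file matching columnDelimiter ',', characterDelimiter '"', and a "yyyy-MM-dd HH:mm:ss" timestampFormat:

```python
import csv
import io
from datetime import datetime

# Hypothetical rows to export; the column layout is an assumption for illustration.
rows = [
    (1, "Mike", datetime(2013, 4, 21, 9, 21, 24)),
    (2, 'Ann "Slugger" Lee', datetime(2013, 3, 23, 9, 45, 0)),
]

buf = io.StringIO()
# delimiter corresponds to columnDelimiter; quotechar to characterDelimiter.
writer = csv.writer(buf, delimiter=",", quotechar='"',
                    quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
for rec_id, name, ts in rows:
    # Render timestamps to match the timestampFormat parameter ("yyyy-MM-dd HH:mm:ss").
    writer.writerow([rec_id, name, ts.strftime("%Y-%m-%d %H:%M:%S")])

print(buf.getvalue())
```

Note that the second row's name contains the character delimiter itself, so the writer quotes that field and doubles the embedded quotes; the import's characterDelimiter setting is what lets such values round-trip.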
SYSCS_UTIL.IMPORT_DATA displays a summary of the import process results that looks like this:

```
rowsImported |failedRows |files |dataSize |failedLog
-------------------------------------------------------------
94           |0          |1     |4720     |NONE
```
This procedure also logs rejected record activity into .bad files in the badRecordDirectory directory; one file for each imported file.
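The badRecordsAllowed threshold described in the parameter table behaves roughly as follows (a behavioral sketch based on the description above, not Splice Machine code; the function name is hypothetical):

```python
def import_outcome(failed_rows: int, bad_records_allowed: int) -> str:
    """Model the badRecordsAllowed semantics: -1 tolerates (and logs) every
    bad record; otherwise the import fails once the threshold is reached,
    and any successfully imported records are rolled back."""
    if bad_records_allowed == -1:
        return "success"   # all bad records logged and allowed
    if failed_rows > 0 and failed_rows >= bad_records_allowed:
        return "failed"    # threshold reached: import fails and rolls back
    return "success"

print(import_outcome(0, 0))     # clean file with zero tolerance
print(import_outcome(1, 0))     # one bad record with zero tolerance
print(import_outcome(1000, -1)) # every bad record allowed and logged
```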
Importing and Updating Records
SYSCS_UTIL.SYSCS_MERGE_DATA_FROM_FILE imports new records into your database in the same way as does the SYSCS_UTIL.IMPORT_DATA procedure. SYSCS_UTIL.SYSCS_MERGE_DATA_FROM_FILE can also update existing records in your database; for this to work, the table you're importing into must have a primary key. Because this procedure has to determine if a record already exists and how to update it, MERGE_DATA is slightly slower than IMPORT_DATA; if you know that you're ingesting all new records, you'll get better performance with IMPORT_DATA.
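For instance, a merge call might look like this (a sketch that assumes SYSCS_UTIL.SYSCS_MERGE_DATA_FROM_FILE takes the same parameter list as SYSCS_UTIL.IMPORT_DATA; consult that procedure's own reference page for the authoritative signature):

```
splice> CALL SYSCS_UTIL.SYSCS_MERGE_DATA_FROM_FILE('SPLICEBBALL', 'Players',
    'ID, Team, Name, Position, DisplayName, BirthDate',
    '/Data/DocExamplesDb/Players.csv',
    null, null, null, null, null,
    0, null, true, null);
```

Rows in the file whose primary key matches an existing row update that row; all other rows are inserted as new records.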
Record Import Failure Reasons
Typical reasons for a row (record) import to fail include:
- Improperly formatted data for a column's expected type.
- An improper number of columns of data in a row.
- A primary key violation: SYSCS_UTIL.IMPORT_DATA will only work correctly if the table into which you are inserting/updating has primary keys.
A few important notes:
Splice Machine advises you to run a full compaction (with the SYSCS_UTIL.SYSCS_PERFORM_MAJOR_COMPACTION_ON_TABLE system procedure) after importing large amounts of data into your database.
On a cluster, the files to be imported MUST be on S3, HDFS (or MapR-FS), as must the badRecordDirectory directory. If you're using our Database Service product, files can only be imported from S3.
In addition, the files must be readable by the hbase user, and the badRecordDirectory directory must be writable by the hbase user, either by setting the user explicitly, or by opening up the permissions; for example:
sudo -su hdfs hadoop fs -chmod 777 /badRecordDirectory
This section presents a couple of simple examples.
The Importing Flat Files topic contains a more extensive set of examples.
Example 1: Importing our doc examples player data
This example shows the
IMPORT_DATA call used to import the Players
table into our documentation examples database:
```
splice> CALL SYSCS_UTIL.IMPORT_DATA('SPLICEBBALL', 'Players',
    'ID, Team, Name, Position, DisplayName, BirthDate',
    '/Data/DocExamplesDb/Players.csv',
    null, null, null, null, null,
    0, null, true, null);
rowsImported |failedRows |files |dataSize |failedLog
--------------------------------------------------------------------------------------
94           |0          |1     |4720     |NONE
1 row selected
```
Example 2: Specifying a timestamp format for an entire table
Use a single timestamp format for the entire table by explicitly specifying a single timestampFormat. For example, consider a file containing records like these:

```
Mike,2013-04-21 09:21:24.98-05
Mike,2013-04-21 09:15:32.78-04
Mike,2013-03-23 09:45:00.68-05
```
You can then import the data with the following call:
```
splice> CALL SYSCS_UTIL.IMPORT_DATA('app', 'tabx', 'c1,c2',
    '/path/to/ts3.csv', ',', '''',
    'yyyy-MM-dd HH:mm:ss.SSZ', null, null,
    0, null, true, null);
```
Note that for any import use case shown above, the time value stored in the imported table depends on the timezone setting of the importing server. In other words, given the same CSV file, importing on servers whose clocks are set to different time zones produces different values in the table. Additionally, daylight saving time may account for a one-hour difference when a timezone is specified.
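The timezone effect can be seen with a quick sketch (plain Python, not Splice Machine code): the same offset-tagged timestamp string from the sample file renders as different wall-clock values depending on the zone it is converted to.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# One value from the sample file: "2013-04-21 09:21:24.98-05"
# (the file's "-05" offset is written out as "-0500" for strptime's %z).
ts = datetime.strptime("2013-04-21 09:21:24.98-0500", "%Y-%m-%d %H:%M:%S.%f%z")

# The same instant as seen by servers configured for different zones:
ny = ts.astimezone(ZoneInfo("America/New_York"))
la = ts.astimezone(ZoneInfo("America/Los_Angeles"))

print(ny.strftime("%Y-%m-%d %H:%M:%S"))  # New York wall-clock time
print(la.strftime("%Y-%m-%d %H:%M:%S"))  # Los Angeles wall-clock time
```

Both lines describe the same instant; only the local rendering differs, which is exactly why the same CSV can yield different stored values on differently configured servers.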
See Importing Flat Files for more examples.