# SYSCS_UTIL.MERGE_DATA_FROM_FILE

The SYSCS_UTIL.MERGE_DATA_FROM_FILE system procedure imports data to update an existing record or create a new record in your database. You can choose to import all or a subset of the columns from the input data into your database using the insertColumnList parameter.

After a successful import completes, a simple report displays how many files were imported and how many record imports succeeded or failed.

This procedure is one of several built-in system procedures provided by Splice Machine for importing data into your database. See our Importing Data Tutorial for help with selecting the right process for your situation.

## Syntax

call SYSCS_UTIL.MERGE_DATA_FROM_FILE (
               schemaName,
               tableName,
               insertColumnList | null,
               fileName,
               columnDelimiter | null,
               characterDelimiter | null,
               timestampFormat | null,
               dateFormat | null,
               timeFormat | null,
               badRecordsAllowed,
               badRecordDirectory | null,
               oneLineRecords | null,
               charset | null
);

## Parameters

The table below summarizes the parameters used by `SYSCS_UTIL.MERGE_DATA_FROM_FILE`; our Importing Data Tutorial describes each parameter in more detail.

The `SYSCS_UTIL.MERGE_DATA_FROM_FILE` procedure only imports single files; it does not process directories.
{: .noteNote}

## Results

`SYSCS_UTIL.MERGE_DATA_FROM_FILE` displays a summary of the import process results that looks like this:
rowsImported   |failedRows   |files   |dataSize   |failedLog
-------------------------------------------------------------
94             |0            |1       |4720       |NONE
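If you capture this summary as text (for example, from a scripted `splice>` session), it can be parsed mechanically. The following Python sketch is illustrative only and assumes the pipe-separated header/separator/values layout shown above:

```python
def parse_import_summary(output: str) -> dict:
    """Parse a pipe-separated import summary into a dict keyed by
    column name. Assumes line 1 is the header, line 2 the dashed
    separator, and line 3 the values, as in the sample above."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header, values = lines[0], lines[2]  # skip the dashed separator
    keys = [h.strip() for h in header.split("|")]
    vals = [v.strip() for v in values.split("|")]
    return dict(zip(keys, vals))

summary = parse_import_summary("""
rowsImported   |failedRows   |files   |dataSize   |failedLog
-------------------------------------------------------------
94             |0            |1       |4720       |NONE
""")
# summary["rowsImported"] == "94"; summary["failedLog"] == "NONE"
```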
This procedure also logs rejected record activity into `.bad` files in the `badRecordDirectory` directory, one file for each imported file.
{: .spaceAbove}

## Importing and Updating Records

What distinguishes `SYSCS_UTIL.MERGE_DATA_FROM_FILE` from the similar [`SYSCS_UTIL.UPSERT_DATA_FROM_FILE`](sqlref_sysprocs_upsertdata.html) and [`SYSCS_UTIL.IMPORT_DATA`](sqlref_sysprocs_importdata.html) procedures is how each handles these specific conditions:

* You are importing only a subset of the input data into your table, either because the table contains fewer columns than the input file, or because you've specified a subset of the columns in your `insertColumnList` parameter.
* Inserting and updating data in a column with generated values.
* Inserting and updating data in a column with default values.
* Handling of missing values.

The [Importing Data Tutorial: Input Handling](tutorials_ingest_importinput.html) topic describes how each of these conditions is handled by the different system procedures.

## Record Import Failure Reasons

When merging data from a file, the input file you generate must contain:

* the columns to be changed
* all `NON_NULL` columns

Typical reasons for a row (record) import to fail include:

* Improper data expected for a column.
* Improper number of columns of data.
* A primary key violation: [`SYSCS_UTIL.MERGE_DATA_FROM_FILE`](#) works correctly only if the table into which you are inserting/updating has primary keys.

A few important notes:

* Splice Machine advises you to run a full compaction (with the [`SYSCS_UTIL.SYSCS_PERFORM_MAJOR_COMPACTION_ON_TABLE`](sqlref_sysprocs_compacttable.html) system procedure) after importing large amounts of data into your database.
* On a cluster, the files to be imported **MUST be on S3, HDFS (or MapR-FS)**, as must the `badRecordDirectory` directory. If you're using our Database Service product, files can only be imported from S3.
In addition, the files must be readable by the `hbase` user, and the `badRecordDirectory` directory must be writable by the `hbase` user, either by setting the user explicitly or by opening up the permissions; for example:
    sudo -su hdfs hadoop fs -chmod 777 /badRecordDirectory
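Once an import finishes, you can review what was rejected by scanning the `badRecordDirectory` for `.bad` files. This Python sketch (the directory path and file names are hypothetical) totals rejected records per imported file:

```python
import os

def summarize_bad_records(bad_record_dir: str) -> dict:
    """Return {file_name: rejected_record_count} for every .bad file
    in the badRecordDirectory. Splice Machine writes one .bad file
    per imported input file."""
    counts = {}
    for name in os.listdir(bad_record_dir):
        if name.endswith(".bad"):
            path = os.path.join(bad_record_dir, name)
            with open(path) as f:
                counts[name] = sum(1 for line in f if line.strip())
    return counts
```

Note that on a cluster the `.bad` files live on S3 or HDFS rather than the local filesystem, so you would list them with the corresponding filesystem client instead of `os.listdir`.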
## Examples {#Examples}

This section presents a couple of simple examples. The [Importing Data Usage Examples](tutorials_ingest_importexamples1.html) topic contains a more extensive set of examples.

### Example 1: Updating our doc examples player data

This example shows the `SYSCS_UTIL.MERGE_DATA_FROM_FILE` call used to update the Players table in our documentation examples database:
{: .body}
splice> CALL SYSCS_UTIL.MERGE_DATA_FROM_FILE('SPLICEBBALL', 'Players',
    'ID, Team, Name, Position, DisplayName, BirthDate',
    '/Data/DocExamplesDb/Players.csv',
    null, null, null, null, null, 0, null, true, null);
rowsImported        |failedRows          |files      |dataSize            |failedLog
--------------------------------------------------------------------------------------
94                  |0                   |1          |4720                |NONE
1 row selected
### Example 2: Using single quotes to delimit strings

This example uses single quotes instead of double quotes as the character delimiter in the input:
1,This field is one line,Able
2,'This field has two lines
This is the second line of the field',Baker
3,This field is also just one line,Charlie
Note that you must escape single quotes in SQL, which means that you actually define the character delimiter parameter with four single quotes, as follows:
splice> CALL SYSCS_UTIL.MERGE_DATA_FROM_FILE('SPLICE','MYTABLE',null,'data.csv','\t','''',null,null,null,0,'BAD',false,null);
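For comparison, here is how a general-purpose CSV parser treats the same single-quote-delimited input: the quoted field spans two physical lines but yields one logical record. This Python illustration is not part of the import procedure; `quotechar="'"` plays the role of the `characterDelimiter` parameter:

```python
import csv, io

data = """1,This field is one line,Able
2,'This field has two lines
This is the second line of the field',Baker
3,This field is also just one line,Charlie
"""

# quotechar="'" corresponds to passing '''' as characterDelimiter
rows = list(csv.reader(io.StringIO(data), quotechar="'"))
# Four physical lines, but only three logical records; the second
# record's middle field contains an embedded newline.
```

This is why `oneLineRecords` must be `false` for input like this: the importer has to keep reading past the line break to find the closing delimiter.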
See [Importing Data Usage Examples](tutorials_ingest_importexamples1.html) for more examples.

## See Also

* [Our Importing Data Tutorial](tutorials_ingest_importoverview.html)
* [Importing Data Usage Examples](tutorials_ingest_importexamples1.html)
* [`SYSCS_UTIL.IMPORT_DATA`](sqlref_sysprocs_importdata.html)
* [`SYSCS_UTIL.UPSERT_DATA_FROM_FILE`](sqlref_sysprocs_upsertdata.html)
| Parameter | Description | Example Value |
| --------- | ----------- | ------------- |
| `schemaName` | The name of the schema of the table into which to import. | `SPLICE` |
| `tableName` | The name of the table into which to import. | `playerTeams` |
| `insertColumnList` | The names, in single quotes, of the columns to import. If this is `null`, all columns are imported. | `'ID, TEAM'` |
| `fileName` | The file to import; you can import compressed or uncompressed files. On a cluster, the file to be imported MUST be on S3, HDFS (or MapR-FS). If you're using our Database Service product, the file can only be imported from S3. | `/data/mydata/mytable.csv` |
| `columnDelimiter` | The character used to separate columns. Specify `null` if using the comma (`,`) character as your delimiter. | `'\|'`, `'\t'` |
| `characterDelimiter` | The character used to delimit strings in the imported data. | `'"'`, `''''` |
| `timestampFormat` | The format of timestamps stored in the file. You can set this to `null` if there are no timestamp columns in the file, or if the timestamps in the file match the `java.sql.Timestamp` default format, which is `yyyy-MM-dd HH:mm:ss`. All of the timestamps in the file you are importing must use the same format. | `'yyyy-MM-dd HH:mm:ss.SSZ'` |
| `dateFormat` | The format of dates stored in the file. You can set this to `null` if there are no date columns in the file, or if the dates in the file match the pattern `yyyy-MM-dd`. | `yyyy-MM-dd` |
| `timeFormat` | The format of time values stored in the file. You can set this to `null` if there are no time columns in the file, or if the times in the file match the pattern `HH:mm:ss`. | `HH:mm:ss` |
| `badRecordsAllowed` | The number of rejected (bad) records tolerated before the import fails. If this count is reached, the import fails and any successful record imports are rolled back. Specify `0` to tolerate no bad records, or `-1` to log and allow all bad records. | `25` |
| `badRecordDirectory` | The directory in which bad record information is logged. Splice Machine logs information to the `<import_file_name>.bad` file in this directory; for example, bad records in an input file named `foo.csv` would be logged to a file named `badRecordDirectory/foo.csv.bad`. On a cluster, this directory MUST be on S3, HDFS (or MapR-FS). If you're using our Database Service product, files can only be imported from S3. | `'importErrsDir'` |
| `oneLineRecords` | A Boolean value that specifies whether each record in the import file is contained on one input line (`true`) or can span multiple lines (`false`). | `true` |
| `charset` | The character encoding of the import file. The default value is `UTF-8`. | `null` |
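The format parameters above use Java date/time patterns. If you want to sanity-check a file's values before importing, you can approximate the default patterns with Python's `strptime`; the Java-to-Python mapping shown here is an assumption for illustration (e.g. `yyyy-MM-dd HH:mm:ss` corresponds to `%Y-%m-%d %H:%M:%S`):

```python
from datetime import datetime

# Approximate Python equivalents of the default Java patterns.
PATTERNS = {
    "timestampFormat": "%Y-%m-%d %H:%M:%S",  # yyyy-MM-dd HH:mm:ss
    "dateFormat": "%Y-%m-%d",                # yyyy-MM-dd
    "timeFormat": "%H:%M:%S",                # HH:mm:ss
}

def matches_default(value: str, kind: str) -> bool:
    """Return True if value parses with the default pattern for kind."""
    try:
        datetime.strptime(value, PATTERNS[kind])
        return True
    except ValueError:
        return False
```

A value that fails such a pre-check (for example `01/15/2020` against the default date pattern) would need an explicit `dateFormat` argument in the procedure call rather than `null`.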