Importing Data: Logging and Error Handling
This topic describes the logging and error handling features of Splice Machine data imports.
Each of the import procedures includes a logging facility: errors are logged to a file in the directory that you specify in the badRecordDirectory parameter when you call the procedure.
The badRecordDirectory parameter is a string that specifies the directory in which bad record information is logged. The default value is the directory in which the import files are found.
Splice Machine logs information to the <import_file_name>.bad file in this directory; for example, bad records in an input file named foo.csv would be logged to a file named foo.csv.bad in the badRecordDirectory directory.
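The naming rule can be sketched as a small shell helper. This is only an illustration of the <import_file_name>.bad convention described above; the function name bad_record_path and the example paths are hypothetical and are not part of Splice Machine.

```shell
# Hypothetical helper: derive the expected bad-record log path for an input file.
bad_record_path() {
  bad_dir="$1"      # value passed as badRecordDirectory
  input_file="$2"   # path of the file being imported
  # Drop a trailing slash from the directory and keep only the file name.
  printf '%s/%s.bad\n' "${bad_dir%/}" "$(basename "$input_file")"
}

# Example with made-up paths.
bad_record_path /BAD /data/mytable/foo.csv
```

Under these assumptions, the log for /data/mytable/foo.csv would be expected at /BAD/foo.csv.bad.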
The badRecordDirectory directory must be writable by the hbase user, either by setting the user explicitly or by opening up the permissions; for example:
sudo -su hdfs hadoop fs -chmod 777 /badRecordDirectory
On a cluster, the badRecordDirectory directory MUST be on S3, HDFS (or MapR-FS). If you’re using our Database Service product, this directory must be on S3.
Stopping the Import Due to Too Many Errors
All of the import procedures also take a maxBadRecords parameter, which determines how many bad input records are tolerated before the import is stopped. If this count of rejected records is reached, the import fails, and any records that were successfully imported are rolled back.
Two values of this parameter have special meaning:
- If you specify -1, all record import failures are tolerated and logged.
- If you specify 0, the import will fail as soon as one bad record is detected.
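The rule for the two special values can be sketched in shell as follows. This is not Splice Machine's implementation, and the function name import_should_stop is hypothetical. The sketch assumes that, for other values, the import stops once the number of bad records exceeds the limit; that boundary is an assumption chosen to be consistent with 0 stopping on the first bad record.

```shell
# Sketch of the bad-record limit rule; not Splice Machine's implementation.
# Returns 0 (true) when the import should stop, 1 otherwise.
import_should_stop() {
  bad_count="$1"   # bad records seen so far
  allowed="$2"     # value passed for the bad-record limit
  if [ "$allowed" -eq -1 ]; then
    return 1       # -1: tolerate and log every failure
  fi
  # Assumption: stop once the count exceeds the limit,
  # so a limit of 0 stops on the first bad record.
  [ "$bad_count" -gt "$allowed" ]
}

if import_should_stop 1 0; then
  echo "limit 0: stop on the first bad record"
fi
if ! import_should_stop 1000 -1; then
  echo "limit -1: keep going and log every failure"
fi
```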
Managing Logging When Importing Multiple Files
When you import a large amount of data and have divided the import files into groups, it’s a good idea to use a different bad record directory for each group; this makes debugging bad records much easier.
You can change the value of the badRecordDirectory parameter to include your group name; for example, we typically use a strategy like the following:
|Group Files Location||badRecordDirectory Parameter Value|
You’ll then be able to more easily discover where the problem record is located.
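One way to apply this strategy is to derive each group's bad record directory from the group's own location. The sketch below only illustrates the idea: the function name bad_dir_for_group, the /BAD/mytable root, and the /data/mytable/groupN paths are all hypothetical.

```shell
# Hypothetical layout: /data/mytable/<group> holds the input files and
# /BAD/mytable/<group> receives the matching bad-record logs.
bad_dir_for_group() {
  bad_root="$1"    # hypothetical root for all bad-record directories
  group_dir="$2"   # location of one group of import files
  printf '%s/%s\n' "${bad_root%/}" "$(basename "$group_dir")"
}

# Print the badRecordDirectory value to use for each group.
for group_dir in /data/mytable/group1 /data/mytable/group2 /data/mytable/group3; do
  printf '%s -> %s\n' "$group_dir" "$(bad_dir_for_group /BAD/mytable "$group_dir")"
done
```

Each printed right-hand value is what you would pass as badRecordDirectory when importing the corresponding group, so a .bad file found under /BAD/mytable/group2 points directly at the files in /data/mytable/group2.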
See Also
- Importing Data: Tutorial Overview
- Importing Data: Input Parameters
- Importing Data: Input Data Handling
- Importing Data: Using Bulk HFile Import
- Importing Data: Usage Examples
- Importing Data: Bulk HFile Examples
- Importing Data: Importing TPCH Data