Backing Up and Restoring Your Database
ENTERPRISE ONLY: This feature is available only for the Splice Machine Enterprise version of our On-Premise Database product; contact Splice Machine Sales for information.
Splice Machine provides built-in system procedures that make it easy to back up and restore your entire database. You can:
- create full and incremental backups to run immediately
- restore your database from a backup
- validate backups
- manage your backups
- access logs of your backups
The rest of this topic covers working with your backups.
Backup and Restore Types and Compatibility
To back up and restore your entire database, use these system procedures:
If you only want to back up or restore the tables and indexes belonging to a specific schema, you can use these procedures:
And if you only want to back up or restore a specific table, you can use these procedures:
Note that you can only use specific restore procedures with specific types of backups. For example, you can use the
RESTORE_TABLE procedure to restore from a backup created by
BACKUP_TABLE, but you cannot use
RESTORE_TABLE to restore from a backup created by BACKUP_DATABASE.
The following table summarizes backup-restore compatibility:
|If you backed up with this procedure:|You can restore with these procedures:|
|---|---|
Backup Resource Allocation
Splice Machine backup jobs use a MapReduce job to copy HFiles; this process may hang if the resources required for the MapReduce job are not available from YARN. See the Troubleshooting Backups section of our Best Practices Guide for specific information about allocation of resources.
About Splice Machine Backups
Splice Machine supports both full and incremental backups:
- A full backup backs up all of the files/blocks that constitute your database.
- An incremental backup only stores database files/blocks that have changed since a previous backup. To use incremental backups, you must make a few HBase configuration changes and be aware of one significant restriction, as described below, in the Incremental Backup Configuration and Limitations section.
Because backups can consume a lot of disk space, most customers define a backup strategy that balances their needs for security, recoverability, and space restrictions. Since incremental backups require far less space than full backups and allow for faster recovery of data, many customers choose to schedule frequent incremental backups.
Splice Machine automatically detects when it is the first run of an incremental backup and performs a one-time full backup; subsequent runs will only back up changed files/blocks.
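A common way to implement such a strategy is to drive the backup calls from the operating system's scheduler. The crontab sketch below is illustrative only: the `sqlshell.sh` path, the `-f` flag, and the two `.sql` script files (each containing the appropriate SYSCS_UTIL.SYSCS_BACKUP_DATABASE call) are assumptions you would adapt to your installation:

```
# Illustrative schedule: full backup Sunday at 01:00,
# incremental backups nightly at 01:00 Monday-Saturday.
# Paths and client invocation are assumptions; adjust for your cluster.
0 1 * * 0   /opt/splice/bin/sqlshell.sh -f /opt/splice/scripts/full-backup.sql
0 1 * * 1-6 /opt/splice/bin/sqlshell.sh -f /opt/splice/scripts/incremental-backup.sql
```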
Backup IDs, Backup Jobs, and Backup Tables
Splice Machine uses backup IDs to identify a specific full or incremental backup that is stored on a file system, and backup job IDs to identify each scheduled backup job. Information about backups and backup jobs is stored in these system tables:
|System Table|Contains Information About|
|---|---|
|SYSBACKUP|Each database backup; you can query this table to find the ID of and details about a backup that was run at a specific time.|
|SYSBACKUPITEMS|Each item (table) in a backup.|
Temporary Tables and Backups
There’s a subtle issue with performing a backup when you’re using a temporary table in your session: although the temporary table is (correctly) not backed up, the temporary table’s entry in the system tables will be backed up. When the backup is restored, the table entries will be restored, but the temporary table will be missing.
There’s a simple workaround:
- Exit your current session, which will automatically delete the temporary table and its system table entries.
- Start a new session (reconnect to your database).
- Start your backup job.
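From the splice> command line, the workaround might look like the following sketch; the JDBC URL and backup arguments are illustrative values to substitute with your own:

```
-- 1. Exit the current session; the temporary table and its
--    system table entries are automatically dropped.
splice> disconnect;
-- 2. Reconnect to the database (example URL; substitute your host and port).
splice> connect 'jdbc:splice://localhost:1527/splicedb';
-- 3. Start the backup.
splice> call SYSCS_UTIL.SYSCS_BACKUP_DATABASE('/home/backup', 'full');
```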
Incremental Backup Configuration and Limitations
If you’re going to perform incremental backups, you must follow these steps:
- Add the following option to your HBase configuration:
hbase.master.hfilecleaner.plugins = com.splicemachine.hbase.SpliceHFileCleaner, org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner
- Run a full backup.
- Run your first incremental backup.
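In hbase-site.xml form, the cleaner setting from step 1 looks like the fragment below; on managed platforms such as CDH or HDP, set the same property through your cluster manager's configuration UI instead of editing the file directly:

```xml
<!-- Enables the Splice Machine HFile cleaner required for incremental backups -->
<property>
  <name>hbase.master.hfilecleaner.plugins</name>
  <value>com.splicemachine.hbase.SpliceHFileCleaner,org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner</value>
</property>
```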
Using the Backup Operations
This section summarizes and provides examples of using the Splice Machine backup operations:
- Running an Immediate Backup
- Restoring Your Database From a Previous Backup
- Validating Backups
- Reviewing Backup Information
- Canceling a Backup That’s In Progress
- Deleting a Backup
- Deleting Outdated Backups
You must make sure that the directory to which you are backing up, or from which data is being restored, is accessible to the HBase user initiating the operation; verify that the permissions on the backup directory are set correctly.
Note that you can store your backups in a cloud-based storage service such as AWS; for more information, see the Backing Up to Cloud Storage section below.
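For example, once S3 credentials are configured as described in that section, a backup to a cloud bucket uses the same procedure with an S3 path; the bucket name below is a hypothetical placeholder:

```
call SYSCS_UTIL.SYSCS_BACKUP_DATABASE('s3n://my-backup-bucket/splice-backups', 'full');
```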
Running an Immediate Backup
Use the SYSCS_UTIL.SYSCS_BACKUP_DATABASE system procedure to immediately run a full or incremental backup.
SYSCS_UTIL.SYSCS_BACKUP_DATABASE( backupDir, backupType );
- backupDir: A VARCHAR value that specifies the path to the directory in which you want the backup stored. Note that this directory can be cloud-based, as described in the Backing Up to Cloud Storage section below.
- backupType: A VARCHAR(30) value that specifies the type of backup that you want performed; use one of the following values: 'full' or 'incremental'.
Example 1: Execute a full backup now
To execute a backup right now:
call SYSCS_UTIL.SYSCS_BACKUP_DATABASE('/home/backup', 'full');
Example 2: Execute an incremental backup now
This call will run an incremental backup right now. Splice Machine
checks the SYSBACKUP system table
to determine if there already is a backup for the system; if not, Splice
Machine will perform a full backup, and subsequent backups will be
incremental. The backup data is stored in the specified directory.
call SYSCS_UTIL.SYSCS_BACKUP_DATABASE('/home/backup', 'incremental');
Restoring Your Database From a Previous Backup
To restore your database from a previous backup, use the SYSCS_UTIL.SYSCS_RESTORE_DATABASE system procedure:
SYSCS_UTIL.SYSCS_RESTORE_DATABASE(backupDir, backupId, validate);
- backupDir: A VARCHAR value that specifies the path to the directory in which the backup is stored.
- backupId: A BIGINT value that specifies which backup you want to use to restore your database.
- validate: A BOOLEAN value that specifies whether you want the backup validated before restoring the database from it. If this is true and the validation finds inconsistencies, the database is not restored. If this is false, validation is not performed.
There are several important things to know about restoring your database from a previous backup:
- Restoring a database wipes out your database and replaces it with what had been previously backed up.
- You cannot use your cluster while restoring your database.
- You must restart your database after the restore is complete; see Starting Your Database.
Example: Restore the database from a local, full backup
This example restores your database from the backup stored in the
/home/backup directory that has
backupId=1273, after first validating the backup:
call SYSCS_UTIL.SYSCS_RESTORE_DATABASE('/home/backup', 1273, true);
Validating Backups
You can validate a database backup with the SYSCS_UTIL.VALIDATE_BACKUP system procedure:
SYSCS_UTIL.VALIDATE_BACKUP( backupDir, backupId );
- backupDir: A VARCHAR value that specifies the path to the directory in which the backup is stored.
- backupId: A BIGINT value that specifies which backup you want to validate.
Example: Validating a backup
This example validates the backup stored in the
/home/backup directory that has
backupId=1273:
call SYSCS_UTIL.VALIDATE_BACKUP('/home/backup', 1273);
Reviewing Backup Information
Splice Machine stores information about your backups and scheduled backup jobs in system tables that you can query, and stores a backup log file in the directory to which a backup is written when it runs.
Information about each backup of your database is stored in the
SYSBACKUP system table, including
the ID assigned to the backup and its location. You can query this table
to find the ID of a specific backup, which you need if you want to
restore your database from it, or to delete it:
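For example, to find the most recent successful backup (assuming, per the sample output below, that a STATUS value of 'S' indicates success):

```
select BACKUP_ID, BEGIN_TIMESTAMP, FILESYSTEM
  from SYS.SYSBACKUP
 where STATUS = 'S'
 order by BEGIN_TIMESTAMP desc;
```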
splice> select * from SYS.SYSBACKUP;
BACKUP_ID |BEGIN_TIMESTAMP |END_TIMESTAMP |STATUS |FILESYSTEM |SCOPE |INCR&|INCREMENTAL_PARENT_&|BACKUP_ITEM
------------------------------------------------------------------------------------------------------------------------------------------
22275 |2015-04-03 18:40:56.877 |2015-04-03 18:43:42.631 |S |/data/backup/0101 |D |false|-1 |15
21428 |2015-04-03 18:30:55.964 |2015-04-03 18:33:49.494 |S |/data/backup/0101 |D |false|-1 |15
20793 |2015-04-03 18:23:53.574 |2015-04-03 18:27:07.07 |S |/data/backup/0101 |D |false|-1 |87
Backup Log Files
When you run a backup, a log file is created or updated in the directory
in which the backup is stored. This log file is named
backupStatus.log, and is stored in plain text, human-readable format.
Here’s a sample snippet from a log file:
Expected time for backup ~12 hours, expected finish at 15:30 on April 8, 2015
5 objects of 833 objects backed up
6 objects of 833 objects backed up
Finished with Success. Total time taken for backup was 11 hours 32 minutes.
Canceling a Backup That’s In Progress
You can call the SYSCS_UTIL.SYSCS_CANCEL_BACKUP
procedure to cancel a backup that is currently running:
SYSCS_UTIL.SYSCS_CANCEL_BACKUP();
Example: Cancel a running backup
This example cancels the Splice Machine backup job that is currently running:
call SYSCS_UTIL.SYSCS_CANCEL_BACKUP();
Deleting a Backup
Use the SYSCS_UTIL.SYSCS_DELETE_BACKUP procedure to delete a single backup:
SYSCS_UTIL.SYSCS_DELETE_BACKUP( backupId );
- backupId: A BIGINT value that specifies which backup you want to delete.
You can find the backupId you want to delete by querying the
SYSBACKUP system table.
Example: Delete a backup
This example deletes the backup that has backupId=1273:
call SYSCS_UTIL.SYSCS_DELETE_BACKUP(1273);
Deleting Outdated Backups
Use the SYSCS_UTIL.SYSCS_DELETE_OLD_BACKUPS procedure to delete all backups that are older than a certain number of days:
SYSCS_UTIL.SYSCS_DELETE_OLD_BACKUPS( backupWindow );
- backupWindow: An INT value that specifies the number of days of backups that you want retained. Any backups created more than backupWindow days ago are deleted.
Example: Delete all backups more than a week old
This example deletes all backups that are more than a week old:
call SYSCS_UTIL.SYSCS_DELETE_OLD_BACKUPS(7);
Backing Up to Cloud Storage - AWS
You can specify cloud-based directories as destinations for your backups. This section describes how to set up credentials to allow Splice Machine to create and manage backups on AWS.
You need to enable backups by storing your AWS Access Key ID and Secret Access Key values in your cluster's HDFS core-site.xml file. How you set up your credentials depends on the Hadoop platform you are using; see the section below for your platform:
IMPORTANT: You must have access to the S3 bucket to which you are backing up your database. The instructions below give general guidelines; however, S3 access differs in every deployment. For more information, see these sites:
Enabling Splice Machine Backups on CDH
You can use Cloudera Manager to configure properties to enable Splice Machine backups; follow these steps:
- Navigate to the Cloudera Manager home screen.
Stop both HBase and HDFS:
- Click the HBase Actions drop-down arrow associated with (to the right of) HBase in the cluster summary section of the home screen, and then click Stop to stop HBase.
- Click the HDFS Actions drop-down arrow associated with (to the right of) HDFS, and then click Stop to stop HDFS.
- Click HDFS in the Cloudera Manager home screen, then click the Configuration tab, and in the category list, click Advanced. Then set these property values in the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml field:
fs.s3.awsAccessKeyId = <Your AWS Access Key>
fs.s3.awsSecretAccessKey = <Your AWS Secret Access Key>
Restart both services:
- Click the HDFS Actions drop-down arrow associated with (to the right of) HDFS in the cluster summary section of the Cloudera Manager home screen, and then click Start to restart HDFS.
- Navigate to the HBase Status tab in Cloudera Manager. Then, using the Actions drop-down in the upper-right corner, click Start to restart HBase.
Enabling Splice Machine Backups on HDP
You can use the Ambari dashboard to configure these properties. Follow these steps:
- Navigate to the HDFS Configs screen.
Select the Services tab at the top of the Ambari dashboard screen, then stop both HBase and HDFS:
- Click HBase in the left pane of the screen, then click Service Actions->Stop in the upper-right portion of the Ambari screen.
- Click HDFS in the left pane of the screen, then click Service Actions->Stop.
- Select Custom core-site and add these properties:
fs.s3.awsAccessKeyId = <Your AWS Access Key>
fs.s3.awsSecretAccessKey = <Your AWS Secret Access Key>
Restart both services:
- Click HDFS in the left pane of the screen, then click Service Actions->Restart All.
- Click HBase in the left pane of the screen, then click Service Actions->Restart All.
Enabling Splice Machine Backups on MapR
To enable Amazon S3 access on a MapR cluster, you must stop services, change the configuration files on each node, and then restart services. Follow these steps:
Stop all MapR services by stopping the warden service on each host:
sudo service mapr-warden stop
You need to edit two files on each MapR-FS fileserver and HBase RegionServer in your cluster to allow hosts access to Amazon S3. You need to provide the fs.s3 access key ID and secret in each of these files:
- /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop/core-site.xml for Hadoop/MapReduce/YARN 2.x site configuration
- /opt/mapr/hadoop/hadoop-0.x.x/conf/core-site.xml for Hadoop/MapReduce 0.x/1.x site configuration
If both MapReduce v1 and YARN/MapReduce 2 are installed on the MapR compute hosts, the newer hadoop-2.x.x version of the file will be canonical, and the older hadoop-0.x.x file symbolically linked to it. You can check this using the following commands:
$ ls -al /opt/mapr/hadoop/hadoop-0*/conf/core-site.xml /opt/mapr/hadoop/hadoop-2*/etc/hadoop/core-site.xml
lrwxrwxrwx 1 mapr root  54 Apr 24 11:01 /opt/mapr/hadoop/hadoop-0.20.2/conf/core-site.xml -> /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/core-site.xml
-rw-r--r-- 1 mapr root 775 Apr 24 12:50 /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/core-site.xml
$ file /opt/mapr/hadoop/hadoop-0*/conf/core-site.xml /opt/mapr/hadoop/hadoop-2*/etc/hadoop/core-site.xml
/opt/mapr/hadoop/hadoop-0.20.2/conf/core-site.xml: symbolic link to `/opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/core-site.xml'
/opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/core-site.xml: XML document text
Add your access key ID and secret key in each file by adding the following properties between the <configuration> and </configuration> tags:
<!-- AWS s3://bucket/... block-based access -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>_AWS_ACCESS_KEY_ID_</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>_AWS_SECRET_ACCESS_KEY_</value>
</property>
<!-- AWS s3n://bucket/... filesystem-like access -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>_AWS_ACCESS_KEY_ID_</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>_AWS_SECRET_ACCESS_KEY_</value>
</property>
You can use the hadoop command to view your configuration changes:
$ hadoop conf | grep fs\\.s3 | grep -i access | sort -u
<property><name>fs.s3.awsAccessKeyId</name><value>_AWS_ACCESS_KEY_ID_</value><source>core-site.xml</source></property>
<property><name>fs.s3.awsSecretAccessKey</name><value>_AWS_SECRET_ACCESS_KEY_</value><source>core-site.xml</source></property>
<property><name>fs.s3n.awsAccessKeyId</name><value>_AWS_ACCESS_KEY_ID_</value><source>core-site.xml</source></property>
<property><name>fs.s3n.awsSecretAccessKey</name><value>_AWS_SECRET_ACCESS_KEY_</value><source>core-site.xml</source></property>
You can also verify that access is correctly configured by using the
hadoop command to list the contents of an existing bucket. For example:
sudo -iu mapr hadoop fs -ls s3n://yourbucketname/
Finally, restart MapR services on each node via MapR's warden:
sudo service mapr-warden start