CREATE EXTERNAL TABLE

A CREATE EXTERNAL TABLE statement creates a table in Splice Machine that you can use to query data that is stored externally in a flat file, such as a file in Parquet, ORC, or plain text format. External tables are largely used as a convenient means of moving data into and out of your database.

You can query external tables just as you would a regular Splice Machine table; however, you cannot perform any DML operations on an external table, once it has been created. That also means that you cannot create an index on an external table.

If the schema of the external file that you are querying is modified outside of Splice, you need to manually refresh the Splice Machine table by calling the REFRESH EXTERNAL TABLE built-in system procedure.

If a qualified table name is specified, the schema name cannot begin with SYS.

Syntax

CREATE EXTERNAL TABLE table-Name
  {
    ( column-definition* )
    [ COMPRESSED WITH compression-format ]
    [ PARTITIONED BY (column-name ) ]}
    [ ROW FORMAT DELIMITED 
         [ FIELDS TERMINATED BY char [ESCAPED BY char] ]
         [ LINES TERMINATED BY char ]
    ]
    STORED AS file-format LOCATION location
    [ MERGE SCHEMA ]
  }

table-Name

The name to assign to the new table.

compression-format

The compression algorithm used to compress the flat file source of this external table. You can specify one of the following values:

  • ZLIB
  • SNAPPY

If you don’t specify a compression format, the default is uncompressed. You cannot specify a compression-format when using the TEXTFILE file-format; doing so generates an error.

column-definition

A column definition.

The maximum number of columns allowed in a table is 100000.

column-name

The name of a column.

char

A single character used as a delimiter or escape character. Enclose this character in single quotes; for example, ‘,’.

To specify a special character that includes the backslash character, you must escape the backslash character itself. For example:

  • \\ to indicate a backslash character
  • \n to indicate a newline character
  • \t to indicate a tab character

file-format

The format of the flat file source of this external table. This is currently one of these values:

  • ORC is a columnar storage format
  • PARQUET is a columnar storage format
  • Avro is a data serialization system
  • TEXTFILE is a plain text file

location

The location at which the file is stored.

MERGE SCHEMA

This optional setting tells Splice Machine to infer the table schema from all data files, rather than the default behavior of inferring the schema from one data file; it is useful when the external data schema evolves. Using this setting can be very expensive performance-wise.

The MERGE SCHEMA option only works with PARQUET data files because Spark does not support this feature for ORC or AVRO data files; this means that Splice Machine can only handle schema evolution for external tables on existing PARQUET data.

Usage Notes

Here are some notes about using external tables:

  • If the data types in the table schema you specify do not match the schema of the external file, an error occurs and the table is not created.
  • You cannot define indexes or constraints on external tables
  • The ROW FORMAT parameter is only applicable to plain text (TextFile) not supported for columnar storage format files (ORC or PARQUET files)

  • If you specify the location of a non-existent file when you create an external table, Splice Machine automatically creates an external file at that location.
  • AVRO external tables do not currently work with compressed files; any compression format you specify will be ignored.
  • Splice Machine isn’t able to know when the schema of the file represented by an external table is updated; when this occurs, you need to update the external table in Splice Machine by calling the  SYSCS_UTIL.SYSCS_REFRESH_EXTERNAL_TABLE built-in system procedure.
  • You cannot specify a compression-format when using the TEXTFILE file-format; doing so generates an error.

Examples

This section presents examples of the CREATE EXTERNAL TABLE statement.

This example creates an external table for the PARQUET data files in the users/myName/myParquetData directory, inferring the table schema from one data file in that directory:

splice> CREATE EXTERNAL TABLE myParquetTable
                    (col1 INT, col2 VARCHAR(24))
                    PARTITIONED BY (col1)
                    STORED AS PARQUET
                    LOCATION '/users/myName/myParquetData';
0 rows inserted/updated/deleted

This example creates an external table for a PARQUET file:

This example creates an external table for the PARQUET data files in the users/myName/myParquetData directory, inferring the table schema from all of the data files in that directory:

splice> CREATE EXTERNAL TABLE myParquetTable
                    (col1 INT, col2 VARCHAR(24))
                    PARTITIONED BY (col1)
                    STORED AS PARQUET
                    LOCATION '/users/myName/myParquetData'
                    MERGE SCHEMA;
0 rows inserted/updated/deleted

This example creates an external table for an AVRO file:

splice> CREATE EXTERNAL TABLE myAvroTable
                    (col1 INT, col2 VARCHAR(24))
                    PARTITIONED BY (col1)
                    STORED AS AVRO
                    LOCATION '/users/myName/myAvroFile';
0 rows inserted/updated/deleted

This example creates an external table for an ORC file stored in an AWS S3 bucket and inserts data into it:

splice> CREATE EXTERNAL TABLE myOrcTable
                    (col1 INT, col2 VARCHAR(24))
                    PARTITIONED BY (col1)
                    STORED AS ORC
                    LOCATION 's3a://myOrcData/myName/myOrcFile';
0 rows inserted/updated/deleted
splice> INSERT INTO myOrcTable VALUES (1, 'One'), (2, 'Two'), (3, 'Three');
3 rows inserted/updated/deleted
splice> SELECT * FROM myOrcTable;
COL1        |COL2
------------------------------------
3           |Three
2           |Two
1           |One

This example creates an external table for a plain text file stored on HDFS:

splice> CREATE EXTERNAL TABLE myTextTable
                    (col1 INT, col2 VARCHAR(24))
                    PARTITIONED BY (col1)
                    ROW FORMAT DELIMITED FIELDS
                    TERMINATED BY ','
                    ESCAPED BY '\\' 
                    LINES TERMINATED BY '\\n'
                    STORED AS TEXTFILE
                    LOCATION 'hdfs:///tmp/myTables/myTextFile';
0 rows inserted/updated/deleted

This example creates an external table for a PARQUET file that was compressed with Snappy compression:

splice> CREATE EXTERNAL TABLE mySnappyParquetTable
                    (col1 INT, col2 VARCHAR(24))
                    COMPRESSED WITH SNAPPY
                    PARTITIONED BY (col1)
                    STORED AS PARQUET
                    LOCATION '/users/myName/mySnappyParquetFile';
0 rows inserted/updated/deleted

See Also

</section>