Using the Native Spark DataSource in Zeppelin Notebooks

This topic shows you how to use the Native Spark DataSource in an Apache Zeppelin notebook. We use the %spark and %splicemachine Zeppelin interpreters to create a simple Splice Machine database table, and then to access and modify that table, in the steps that follow.

We have posted a blog article on our website that walks you through this Zeppelin notebook example in greater detail.

1. Set Up the Native Spark DataSource

Before you can use the Native Spark DataSource, you need to create a SplicemachineContext object, specifying the URL to use to connect to your database. For example:

%spark
import com.splicemachine.spark.splicemachine._
import com.splicemachine.derby.utils._

val JDBC_URL = "jdbc:splice://XXXX:1527/splicedb;user=YourUserId;password=YourPassword"
val SpliceContext = new SplicemachineContext(JDBC_URL)</pre>

The Native Spark DataSource has a few special (optional) requirements related to database permissions, which you can configure in your JDBC connection URL; for details, see the Accessing Database Objects section of the Using the Native Spark DataSource topic in this chapter.

2. Create a Table in Your Database

Now let’s create a simple table in our Splice Machine database:

%splicemachine
drop table if exists carsTbl;
create table carsTbl ( number int primary key, make varchar(20), model varchar(20) );
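
If you'd like to confirm that the (still empty) table was created, you can query it right away:

%splicemachine
select * from carsTbl;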

3. Use a DataFrame to Populate the Table

Next we’ll create and populate a Spark DataFrame with some data:

%spark
val carsDF = Seq(
 (1, "Toyota", "Camry"),
 (2, "Honda", "Accord"),
 (3, "Subaru", "Impreza"),
 (4, "Chevy", "Volt")
).toDF("NUMBER", "MAKE", "MODEL")

Then we use the Splice Machine Native Spark DataSource to insert that data into our database table:

%spark
SpliceContext.insert(carsDF, "SPLICE.CARSTBL")
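
The insert appends the DataFrame's rows to the table. To verify the write, you can count the rows now in the table, using the same df method demonstrated in the next step:

%spark
// Should report 4, one row per car inserted above
SpliceContext.df("SELECT COUNT(*) FROM SPLICE.CARSTBL").show()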

4. Perform Table Operations

Now we can use the Native Spark DataSource to interact with our database table directly from Spark. The examples below walk through several basic table operations; for a complete list of operations available from the DataSource, see the Native Spark DataSource API topic in this chapter.

Select Data from the Table

You can use Spark with the Adapter to select data from your table just as you would with the splice> command line interface:

%spark
SpliceContext.df("SELECT * FROM SPLICE.CARSTBL").show()

Update Data in the Table

You can use Spark with the Adapter to update data in your table just as you would with the splice> command line interface:

%spark
val updateCarsDF = Seq(
   (1, "Toyota", "Rav 4 XLE"),
   (4, "Honda", "Accord Hybrid")
).toDF("NUMBER", "MAKE", "MODEL")
SpliceContext.update(updateCarsDF, "SPLICE.CARSTBL")
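
The update locates the rows to change by the table's primary key (NUMBER in this example), so the DataFrame you pass must include that column. A quick select confirms the changes:

%spark
// Rows 1 and 4 should now show the updated MAKE and MODEL values
SpliceContext.df("SELECT * FROM SPLICE.CARSTBL ORDER BY NUMBER").show()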

Delete Data From the Table

You can also use Spark with the Adapter to delete data from your table just as you would with the splice> command line interface:

%spark
val deleteCarsDF = Seq(
   (1, "Toyota", "Rav 4 XLE"),
   (4, "Honda", "Accord Hybrid")
).toDF("NUMBER", "MAKE", "MODEL")
SpliceContext.delete(deleteCarsDF, "SPLICE.CARSTBL")
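
As with update, the rows to delete are identified by the primary key values in the DataFrame. After this delete, only rows 2 and 3 should remain:

%spark
// Only the Honda and Subaru rows (2 and 3) should remain
SpliceContext.df("SELECT * FROM SPLICE.CARSTBL ORDER BY NUMBER").show()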

Drop the Table

And you can use Spark with the Adapter to drop your table just as you would with the splice> command line interface:

%spark
if (SpliceContext.tableExists("SPLICE.CARSTBL")) {
   SpliceContext.dropTable("SPLICE.CARSTBL")
}
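
Once the table is dropped, tableExists returns false, which makes a handy final check:

%spark
// Confirms that the table no longer exists
println(SpliceContext.tableExists("SPLICE.CARSTBL"))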

See Also