Splice Machine Database Console Guide

This topic introduces the Splice Machine Database Console, a browser-based tool that you can use to monitor database queries on your cluster in real time. The Console UI shows the Spark queries currently running in Splice Machine on your cluster; you can drill down into each job to see how a query is progressing and to identify potential bottlenecks. If you see something amiss, you can also terminate a query.

The Splice Machine Database Console leverages the Spark cluster manager Web UI, which is described here: http://spark.apache.org/docs/latest/monitoring.html.

This section is organized into the following topics:

About the Splice Machine Database Console

The Splice Machine Database Console is a browser-based tool that you can use to watch your active Spark queries execute and to review the execution of completed queries. You can use the console to:

  • View any completed jobs
  • Monitor active jobs as they execute
  • View a timeline chart of the events in a job and its stages
  • View a Directed Acyclic Graph (DAG) visualization of a job’s stages and the tasks within each stage
  • Monitor persisted and cached storage in real time

How you access the Splice Machine Database Console depends on which Splice Machine product you’re using:

Database-as-Service
  • To monitor the Splice Machine jobs running on your cluster, click the DB Console button at the top right of your Management screen, or click the DB Console link in the cluster-creation email that you received from Splice Machine.
  • To monitor any non-Splice Machine Spark jobs running on your cluster, use a different Spark console, which you can access by clicking the External Spark Console link displayed in the bottom left corner of your cluster's dashboard page.

On-Premise Database
  • Open http://localhost:4040 in your browser.

The Database Console URL becomes active only after you've run at least one query on the Spark engine; before you use the Spark engine, your browser will report an error such as Connection Refused.

Here are some of the terms you’ll encounter while using the Database Console:

Accumulators
  Variables that programmers can declare in Spark applications and that Spark supports efficiently in parallel operations; accumulators are typically used to implement counters and sums. (A short sketch follows this glossary.)

Additional Metrics
  To display additional metrics for a stage or job, click the Show Additional Metrics arrow and then select the metrics you want shown.

DAG Visualization
  A visual depiction of the execution Directed Acyclic Graph (DAG) for a job or job stage, which shows the details and flow of data. Click the DAG Visualization arrow to switch to this view.

Enable Zooming
  In the event timeline view (click the Event Timeline arrow to display it), you can enable zooming to expand the detail shown for a portion of the timeline.

Event Timeline
  A view that graphically displays the sequence of all jobs, of a specific job, or of a stage within a job.

Executor
  A process that runs tasks on a cluster node.

GC Time
  The amount of time spent performing garbage collection in a stage.

Job
  The basic unit of execution in the Spark engine, consisting of a set of stages. With some exceptions, each query submitted to the Spark engine is a single job. Each job is assigned a unique Job Id and belongs to a unique Job Group.

Locality Level
  To minimize data transfers, Spark tries to execute tasks as close to the data as possible. The Locality Level value indicates whether a task was able to run on the local node.

Scheduling Mode
  The scheduling mode used for a job. In FIFO scheduling, the first job gets priority on all available resources while its stages have tasks to launch; then the second job gets priority, and so on. In FAIR scheduling, Spark assigns tasks between jobs in a round-robin manner, so all jobs get a roughly equal share of the available cluster resources; short jobs can therefore gain fair access to resources immediately, without waiting for longer jobs to complete. (A short sketch of enabling FAIR scheduling follows this glossary.)

Scheduling Pool
  The FAIR scheduler groups jobs into pools, each of which can have a different priority weighting value, which lets you submit jobs with higher or lower priorities.

ScrollInsensitive row
  A row in a result set that is scrollable and is not sensitive to changes committed by other transactions or by other statements in the same transaction.

Shuffling
  The reallocation of data between multiple stages in a Spark job. Shuffle Write is the amount of data that is serialized and written at the end of a stage for transmission to the next stage; Shuffle Read is the amount of serialized data that is read at the beginning of a stage. (A short sketch that triggers a shuffle follows this glossary.)

Stage
  The Splice Machine Spark scheduler splits the execution of a job into stages, based on the RDD transformations required to complete the job. Each stage contains a group of tasks that perform a computation in parallel.

Task
  A computational command sent from the application driver to an executor as part of a stage.
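
The following is a minimal, self-contained sketch of the accumulator concept described above. It is generic Spark code rather than anything Splice Machine-specific, and the application name, input path, and accumulator name (badRows) are hypothetical; a named accumulator appears in the console's stage and task detail pages.

    import org.apache.spark.sql.SparkSession

    object AccumulatorSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("accumulator-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Declare a named long accumulator; named accumulators show up in the console.
        val badRows = sc.longAccumulator("badRows")

        // Hypothetical input path; each executor adds to the accumulator in parallel.
        sc.textFile("/tmp/input.txt").foreach { line =>
          if (line.isEmpty) badRows.add(1)
        }

        // The merged value is read back on the driver.
        println(s"Empty lines seen: ${badRows.value}")
        spark.stop()
      }
    }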
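
This next sketch shows, assuming a plain Spark application, how the FAIR scheduling mode and a named scheduling pool are typically enabled; the pool name (shortQueries) is hypothetical, and pool weights would normally be defined in a fair scheduler allocation file.

    import org.apache.spark.sql.SparkSession

    object FairPoolSketch {
      def main(args: Array[String]): Unit = {
        // Use the FAIR scheduler instead of the default FIFO mode.
        val spark = SparkSession.builder
          .appName("fair-pool-sketch")
          .config("spark.scheduler.mode", "FAIR")
          .getOrCreate()

        // Jobs submitted from this thread now run in the named pool.
        spark.sparkContext.setLocalProperty("spark.scheduler.pool", "shortQueries")
        spark.range(1000000L).selectExpr("sum(id)").show()

        // Clear the assignment so later jobs fall back to the default pool.
        spark.sparkContext.setLocalProperty("spark.scheduler.pool", null)
        spark.stop()
      }
    }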
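
Finally, here is a sketch of a job that persists a DataFrame and then performs a grouping, which forces a shuffle; it is generic Spark code with made-up sizes and column names. The persisted data appears on the console's Storage tab, and the shuffle splits the job into stages whose Shuffle Write and Shuffle Read metrics you can inspect.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object ShuffleStageSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("shuffle-stage-sketch").getOrCreate()

        val df = spark.range(10000000L).selectExpr("id % 100 as key", "id as value")

        // Cached/persisted data is listed on the console's Storage tab.
        df.persist(StorageLevel.MEMORY_AND_DISK)

        // groupBy forces a shuffle boundary: the first stage reports Shuffle Write,
        // and the following stage reports Shuffle Read.
        df.groupBy("key").count().show()

        df.unpersist()
        spark.stop()
      }
    }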

See Also