Window and Aggregate Functions

This section contains the Aggregate and Window (analytic) functions built into Splice Machine SQL.

  • Aggregate functions, which are sometimes referred to as set functions, provide a means of evaluating an expression over a set of rows. Each aggregate function outputs one value for the set of rows on which it operates. All aggregate functions can also be used as window functions.
  • Window functions, which are sometimes referred to as analytic functions, perform calculations across a set of table rows that are related to the current row. Each window function outputs one value for each row on which it operates. Some of the window functions cannot be used as aggregate functions.

    A subset of the window functions are sometimes referred to as ranking functions, as noted below.

Function Name Window or Aggregate Description Permitted Data Types
AVG Both Returns the average computed over a subset (partition) of a table. The numeric built-in data types.
COUNT Both Returns the number of rows in a partition. All types.
DENSE_RANK Window A ranking function that returns the ranking of a row within a partition. The numeric built-in data types.
FIRST_VALUE Window Returns the first value within a partition.. All types.
LAG Window Returns the value of an expression evaluated at a specified offset number of rows before the current row in a partition. All types.
LAST_VALUE Window Returns the last value within a partition.. All types.
LEAD Window Returns the value of an expression evaluated at a specified offset number of rows after the current row in a partition. All types.
MAX Both Returns the maximum value computed over a partition. The numeric built-in data types.
MIN Both Returns the minimum value computed over a partition. The numeric built-in data types.
RANK Window A ranking function that returns the ranking of a row within a subset of a table. The numeric built-in data types.
ROW_NUMBER Window

A ranking function that returns the row number of a row within a partition.

The numeric built-in data types.
STDDEV_POP Both Returns the sum of a value calculated over a partition. The numeric built-in data types.
STDDEV_SAMP Both Returns the sum of a value calculated over a partition. The numeric built-in data types.
SUM Both Returns the sum of a value calculated over a partition. The numeric built-in data types.

The Numeric Built-in Data Types

The numeric built-in data types are: SMALLINT.

Using Aggregate Functions

Aggregate functions (also described as set functions in ANSI SQL and as column functions in some database literature). They provide a means of evaluating an expression over a set of rows.

Whereas the other built-in functions operate on a single expression, aggregates operate on a set of values and reduce them to a single scalar value. Built-in aggregates can calculate the minimum, maximum, sum, count, and average of an expression over a set of values as well as count rows. The following table shows the data types on which each built-in aggregate function can operate.

Aggregates are permitted only in the following:

All expressions in SelectItems in the SelectExpression must be either aggregates or grouped columns (see  GROUP BY clause).(The same is true if there is a HAVING clause without a GROUP BY clause.)

This is because the ResultSet of a SelectExpression must be either a scalar (single value) or a vector (multiple values), but not a mixture of both. (Aggregates evaluate to a scalar value, and the referenceto a column can evaluate to a vector.) For example, the following query mixes scalar and vector values and thus is not valid:

   -- not valid
SELECT MIN(flying_time), flight_id
  FROM Flights;

Aggregates are not allowed on outer references (correlations). This means that if a subquery contains an aggregate, that aggregate cannot evaluate an expression that includes a reference to a column in the outer query block. For example, the following query is not valid because SUM operates on a column from the outer query:

SELECT c1
  FROM t1
  GROUP BY c1
  HAVING c2 >
  SELECT t2.x
  FROM t2
  WHERE t2.y = SUM(t1.c3));

Using Window Functions

Window functions, sometimes referred to as analytic functions, perform calculations across a set of table rows that are related to the current row. They are similar to aggregate functions, with several significant differences; a window function:

  • Always includes an  OVER clause
  • Outputs one row for each input value it operates upon.
  • Groups rows with window partitioning and frame clauses, rather than using  GROUP BY clauses

Most developers who are new to window functions find that the easiest way to understand them is to view examples, such as the ones in the Window Functions topic.

Window functions can be used to handle complex analysis and reporting. For example, it’s easy to output running totals, moving averages for a specific time frame, and similar functions. There are a few simple rules about window function usage you need to know:

  • Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING and WHERE clauses. These restrictions are due to the fact that window functions execute after the processing of those clauses.
  • You can include an aggregate function call in the arguments to a window function; however, you cannot include a window function in the arguments to a regular aggregate function. This is again due to the fact that window functions execute after regular aggregate functions have executed.
  • When the function runs, a window of rows is computed in relation to the current row; as the current row advances, the window moves along with it.

The rows considered by a window function (its input rows) are those of the virtual table produced by the query’s FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses, if any. This means, for example, that a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways by means of different OVER clauses, but they all act on the same collection of rows defined by this virtual table.

For examples and further explanation of window functions, please see the Window Functions topic.

See Also