SQL Syntax
query_statement:
query_expr
query_expr:
[ WITH with_query_name AS ( query_expr ) [, ...] ]
{ select | ( query_expr ) | query_expr set_op query_expr }
[ ORDER BY expression [{ ASC | DESC }] [, ...] ]
[ LIMIT count [ OFFSET skip_rows ] ]
select:
SELECT [ AS { typename | STRUCT | VALUE } ] [{ ALL | DISTINCT }]
{ [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
[ REPLACE ( expression [ AS ] column_name [, ...] ) ]
| expression [ [ AS ] alias ] } [, ...]
[ FROM from_item [, ...] ]
[ WHERE bool_expression ]
[ GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) } ]
[ HAVING bool_expression ]
[ WINDOW window_name AS ( window_definition ) [, ...] ]
set_op:
UNION { ALL | DISTINCT } | INTERSECT DISTINCT | EXCEPT DISTINCT
from_item: {
table_name [ [ AS ] alias ] [ FOR SYSTEM_TIME AS OF timestamp_expression ] |
join |
( query_expr ) [ [ AS ] alias ] |
field_path |
{ UNNEST( array_expression ) | UNNEST( array_path ) | array_path }
[ [ AS ] alias ] [ WITH OFFSET [ [ AS ] alias ] ] |
with_query_name [ [ AS ] alias ]
}
join:
from_item [ join_type ] JOIN from_item
[ { ON bool_expression | USING ( join_column [, ...] ) } ]
join_type:
{ INNER | CROSS | FULL [OUTER] | LEFT [OUTER] | RIGHT [OUTER] }
Notations
- Square brackets “[ ]” indicate optional clauses.
- Parentheses “( )” indicate literal parentheses.
- The vertical bar “|” indicates a logical OR.
- Curly braces “{ }” enclose a set of options.
- A comma followed by an ellipsis within square brackets “[, … ]” indicates that the preceding item can repeat in a comma-separated list.
SELECT List
Syntax:
SELECT [ AS { typename | STRUCT | VALUE } ] [{ ALL | DISTINCT }]
{ [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
[ REPLACE ( expression [ AS ] column_name [, ...] ) ]
| expression [ [ AS ] alias ] } [, ...]
The SELECT list defines the columns that the query will return. Expressions in the SELECT list can refer to columns in any of the from_items in its corresponding FROM clause.
Each item in the SELECT list is one of:
- *
- expression
- expression.*
SELECT *
SELECT *, often referred to as select star, produces one output column for each column that is visible after executing the full query.
SELECT * FROM (SELECT "apple" AS fruit, "carrot" AS vegetable);
+-------+-----------+
| fruit | vegetable |
+-------+-----------+
| apple | carrot |
+-------+-----------+
SELECT expression
Items in a SELECT list can be expressions. These expressions evaluate to a single value and produce one output column, with an optional explicit alias.
If the expression does not have an explicit alias, it receives an implicit alias according to the rules for implicit aliases, if possible. Otherwise, the column is anonymous and you cannot refer to it by name elsewhere in the query.
SELECT expression.*
An item in a SELECT list can also take the form of expression.*. This produces one output column for each column or top-level field of expression. The expression must either be a table alias or evaluate to a single value of a data type with fields, such as a STRUCT.
The following query produces one output column for each column in the table groceries, aliased as g.
WITH groceries AS
(SELECT "milk" AS dairy,
"eggs" AS protein,
"bread" AS grain)
SELECT g.*
FROM groceries AS g;
+-------+---------+-------+
| dairy | protein | grain |
+-------+---------+-------+
| milk | eggs | bread |
+-------+---------+-------+
More Examples —
WITH locations AS
(SELECT STRUCT("Seattle" AS city, "Washington" AS state) AS location
UNION ALL
SELECT STRUCT("Phoenix" AS city, "Arizona" AS state) AS location)
SELECT l.location.*
FROM locations l;
+---------+------------+
| city | state |
+---------+------------+
| Seattle | Washington |
| Phoenix | Arizona |
+---------+------------+
WITH locations AS
(SELECT ARRAY<STRUCT<city STRING, state STRING>>[("Seattle", "Washington"),
("Phoenix", "Arizona")] AS location)
SELECT l.location[OFFSET(0)].*
FROM locations l;
+---------+------------+
| city | state |
+---------+------------+
| Seattle | Washington |
+---------+------------+
SELECT modifiers
You can modify the results returned from a SELECT query, as follows.
SELECT DISTINCT —
A SELECT DISTINCT statement discards duplicate rows and returns only the remaining rows. SELECT DISTINCT cannot return columns of the following types:
- STRUCT
- ARRAY
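As a minimal sketch of the behavior described above, the following query discards a duplicate row. The fruits data is hypothetical and is defined inline so that the query is self-contained.

```sql
-- Hypothetical inline data: "apple" appears twice.
WITH fruits AS
  (SELECT "apple" AS fruit UNION ALL
   SELECT "apple" UNION ALL
   SELECT "pear")
SELECT DISTINCT fruit
FROM fruits;
-- Returns one row for "apple" and one row for "pear".
-- Row order is not defined without an ORDER BY clause.
```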
SELECT * EXCEPT —
A SELECT * EXCEPT statement specifies the names of one or more columns to exclude from the result. All matching column names are omitted from the output.
WITH orders AS
(SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity)
SELECT * EXCEPT (order_id)
FROM orders;
+-----------+----------+
| item_name | quantity |
+-----------+----------+
| sprocket | 200 |
+-----------+----------+
NOTE: SELECT * EXCEPT does not exclude columns that do not have names.
SELECT * REPLACE —
A SELECT * REPLACE statement specifies one or more expression AS identifier clauses. Each identifier must match a column name from the SELECT * statement. In the output column list, the column that matches the identifier in a REPLACE clause is replaced by the expression in that REPLACE clause.
A SELECT * REPLACE statement does not change the names or order of columns. However, it can change the value and the value type.
WITH orders AS
(SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity)
SELECT * REPLACE ("widget" AS item_name)
FROM orders;
+----------+-----------+----------+
| order_id | item_name | quantity |
+----------+-----------+----------+
| 5 | widget | 200 |
+----------+-----------+----------+
WITH orders AS
(SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity)
SELECT * REPLACE (quantity/2 AS quantity)
FROM orders;
+----------+-----------+----------+
| order_id | item_name | quantity |
+----------+-----------+----------+
| 5 | sprocket | 100 |
+----------+-----------+----------+
NOTE: SELECT * REPLACE does not replace columns that do not have names.
SELECT ALL —
A SELECT ALL statement returns all rows, including duplicate rows. SELECT ALL is the default behavior of SELECT.
Value Tables
In Kochava Query, a value table is a table where the row type is a single value. In a regular table, each row is made up of columns, each of which has a name and a type. In a value table, the row type is just a single value, and there are no column names.
NOTE: In Kochava Query, a query can only return a value table with a type of STRUCT.
In contexts where a query with exactly one column is expected, a value table query can be used instead. For example, scalar subqueries and array subqueries (see Subqueries) normally require a single-column query, but in Kochava Query, they also allow using a value table query.
A query will produce a value table if it uses SELECT AS, using one of the syntaxes below:
SELECT AS STRUCT —
Syntax:
SELECT AS STRUCT expr1 [struct_field_name1] [,... ]
This produces a value table with a STRUCT row type, where the STRUCT field names and types match the column names and types produced in the SELECT list. Anonymous columns and duplicate columns are allowed.
Example:
SELECT AS STRUCT 1 x, 2, 3 x
The query above produces STRUCT values of type STRUCT<int64 x, int64, int64 x>. The first and third fields have the same name x, and the second field is anonymous.
The example above produces the same result as this query using a struct constructor:
SELECT AS VALUE STRUCT(1 AS x, 2, 3 AS x)
Example:
SELECT
ARRAY(SELECT AS STRUCT t.f1, t.f2 WHERE t.f3=true)
FROM
Table t
SELECT AS STRUCT can be used in a scalar or array subquery to produce a single STRUCT type grouping multiple values together. Scalar and array subqueries (see Subqueries) are normally not allowed to return multiple columns.
SELECT AS VALUE —
SELECT AS VALUE produces a value table from any SELECT list that produces exactly one column. Instead of producing an output table with one column, possibly with a name, the output will be a value table where the row type is just the value type that was produced in the one SELECT column. Any alias the column had will be discarded in the value table.
Example:
SELECT AS VALUE STRUCT(1 a, 2 b) xyz FROM Table;
The query above produces a table with row type STRUCT<a int64, b int64>.
Aliases
See Aliases for information on syntax and visibility for SELECT list aliases.
Analytic Functions
Clauses related to analytic functions are documented elsewhere.
- OVER Clause and PARTITION BY: See Analytic Functions.
- WINDOW Clause and Window Functions: See WINDOW Clause.
FROM Clause
The FROM clause indicates the table or tables from which to retrieve rows, and specifies how to join those rows together to produce a single stream of rows for processing in the rest of the query.
Syntax
from_item: {
table_name [ [ AS ] alias ] [ FOR SYSTEM_TIME AS OF timestamp_expression ] |
join |
( query_expr ) [ [ AS ] alias ] |
field_path |
{ UNNEST( array_expression ) | UNNEST( array_path ) | array_path }
[ [ AS ] alias ] [ WITH OFFSET [ [ AS ] alias ] ] |
with_query_name [ [ AS ] alias ]
}
table_name
The name (optionally qualified) of an existing table.
SELECT * FROM Roster;
SELECT * FROM dataset.Roster;
SELECT * FROM project.dataset.Roster;
FOR SYSTEM_TIME AS OF
FOR SYSTEM_TIME AS OF references the historical versions of the table definition and rows that were current at timestamp_expression.
Limitations:
The source table in the FROM clause containing FOR SYSTEM_TIME AS OF must not be any of the following:
- An ARRAY scan, including a flattened array or the output of the UNNEST operator.
- A common table expression defined by a WITH clause.
timestamp_expression must be a constant expression. It cannot contain the following:
- Subqueries.
- Correlated references (references to columns of a table that appear at a higher level of the query statement, such as in the SELECT list).
- User-defined functions (UDFs).
The value of timestamp_expression cannot fall into the following ranges:
- After the current timestamp (in the future).
- More than seven (7) days before the current timestamp.
A single query statement cannot reference a single table at more than one point in time, including the current time. That is, a query can reference a table multiple times at the same timestamp, but not the current version and a historical version, or two different historical versions.
Examples:
The following query returns a historical version of the table from one hour ago.
SELECT *
FROM t
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
The following query returns a historical version of the table at an absolute point in time.
SELECT *
FROM t
FOR SYSTEM_TIME AS OF '2017-01-01 10:00:00-07:00';
The following query returns an error because the timestamp_expression contains a correlated reference to a column in the containing query.
SELECT *
FROM t1
WHERE t1.a IN (SELECT t2.a
FROM t2 FOR SYSTEM_TIME AS OF t1.timestamp_column);
join
See JOIN Types below.
select
( select ) [ [ AS ] alias ] is a table subquery.
field_path
In the FROM clause, field_path is any path that resolves to a field within a data type. field_path can go arbitrarily deep into a nested data structure.
Some examples of valid field_path values include:
SELECT * FROM T1 t1, t1.array_column;
SELECT * FROM T1 t1, t1.struct_column.array_field;
SELECT (SELECT ARRAY_AGG(c) FROM t1.array_column c) FROM T1 t1;
SELECT a.struct_field1 FROM T1 t1, t1.array_of_structs a;
SELECT (SELECT STRING_AGG(a.struct_field1) FROM t1.array_of_structs a) FROM T1 t1;
Field paths in the FROM clause must end in an array field. In addition, field paths cannot contain arrays before the end of the path. For example, the path array_column.some_array.some_array_field is invalid because it contains an array before the end of the path.
NOTE: If a path has only one name, it is interpreted as a table. To work around this, wrap the path using UNNEST, or use the fully-qualified path.
UNNEST
The UNNEST operator takes an ARRAY and returns a table, with one row for each element in the ARRAY. You can also use UNNEST outside of the FROM clause with the IN operator.
For input ARRAYs of most element types, the output of UNNEST generally has one column. This single column has an optional alias, which you can use to refer to the column elsewhere in the query. ARRAYs with these element types return multiple columns:
- STRUCT
UNNEST destroys the order of elements in the input ARRAY. Use the optional WITH OFFSET clause to return a second column with the array element indexes (see below).
For an input ARRAY of STRUCTs, UNNEST returns a row for each STRUCT, with a separate column for each field in the STRUCT. The alias for each column is the name of the corresponding STRUCT field.
Example:
SELECT *
FROM UNNEST(ARRAY<STRUCT<x INT64, y STRING>>[(1, 'foo'), (3, 'bar')]);
+---+-----+
| x | y |
+---+-----+
| 3 | bar |
| 1 | foo |
+---+-----+
Because the UNNEST operator returns a value table, you can alias UNNEST to define a range variable that you can reference elsewhere in the query. If you reference the range variable in the SELECT list, the query returns a STRUCT containing all of the fields of the original STRUCT in the input table.
Example:
SELECT *, struct_value
FROM UNNEST(ARRAY<STRUCT<x INT64, y STRING>>[(1, 'foo'), (3, 'bar')])
AS struct_value;
+---+-----+--------------+
| x | y | struct_value |
+---+-----+--------------+
| 3 | bar | {3, bar} |
| 1 | foo | {1, foo} |
+---+-----+--------------+
ARRAY unnesting can be either explicit or implicit. In explicit unnesting, array_expression must return an ARRAY value but does not need to resolve to an ARRAY, and the UNNEST keyword is required.
Example:
SELECT * FROM UNNEST ([1, 2, 3]);
In implicit unnesting, array_path must resolve to an ARRAY and the UNNEST keyword is optional.
Example:
SELECT x
FROM mytable AS t,
t.struct_typed_column.array_typed_field1 AS x;
In this scenario, array_path can go arbitrarily deep into a data structure, but the last field must be ARRAY-typed. No previous field in the expression can be ARRAY-typed because it is not possible to extract a named field from an ARRAY.
UNNEST treats NULLs as follows:
- NULL and empty ARRAYs produce zero rows.
- An ARRAY containing NULLs produces rows containing NULL values.
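The following queries are a small sketch of the NULL handling described above. The array literals are hypothetical, and the CAST and typed empty-array forms assume that Kochava Query accepts the same ARRAY<INT64> type syntax used elsewhere on this page.

```sql
-- A NULL ARRAY produces zero rows, so the count is expected to be 0.
SELECT COUNT(*) AS row_count
FROM UNNEST(CAST(NULL AS ARRAY<INT64>)) AS x;

-- An empty ARRAY also produces zero rows.
SELECT COUNT(*) AS row_count
FROM UNNEST(ARRAY<INT64>[]) AS x;

-- An ARRAY containing a NULL element produces a row whose value is NULL.
SELECT x, x IS NULL AS is_null
FROM UNNEST([1, NULL, 3]) AS x;
```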
The optional WITH OFFSET clause returns a separate column containing the “offset” value (i.e. counting starts at zero) for each row produced by the UNNEST operation. This column has an optional alias; the default alias is offset.
Example:
SELECT * FROM UNNEST ( array_expression ) WITH OFFSET AS num;
See the Arrays topic for more ways to use UNNEST, including construction, flattening, and filtering.
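As a concrete sketch of WITH OFFSET, the query below unnests a hypothetical three-element array literal and uses the offset column to restore the original element order, since UNNEST itself does not preserve it. The aliases element and num are illustrative.

```sql
-- element holds the array value; num holds its zero-based position.
SELECT element, num
FROM UNNEST(["a", "b", "c"]) AS element WITH OFFSET AS num
ORDER BY num;
-- Expected rows, in order: ("a", 0), ("b", 1), ("c", 2).
```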
with_query_name
The query names in a WITH clause (see WITH Clause) act like names of temporary tables that you can reference anywhere in the FROM clause. In the example below, subQ1 and subQ2 are with_query_names.
Example:
WITH
subQ1 AS (SELECT * FROM Roster WHERE SchoolID = 52),
subQ2 AS (SELECT SchoolID FROM subQ1)
SELECT DISTINCT * FROM subQ2;
The WITH clause hides any permanent tables with the same name for the duration of the query, unless you qualify the table name, e.g., dataset.Roster or project.dataset.Roster.
Subqueries
A subquery is a query that appears inside another statement, and is written inside parentheses. These are also referred to as “sub-SELECTs” or “nested SELECTs”. The full SELECT syntax is valid in subqueries.
There are two types of subquery:
- Expression Subqueries, which you can use in a query wherever expressions are valid. Expression subqueries return a single value.
- Table subqueries, which you can use only in a FROM clause. The outer query treats the result of the subquery as a table.
NOTE: There must be parentheses around both types of subqueries.
Example:
SELECT AVG ( PointsScored )
FROM
( SELECT PointsScored
FROM Stats
WHERE SchoolID = 77 )
Optionally, a table subquery can have an alias.
Example:
SELECT r.LastName
FROM
( SELECT * FROM Roster) AS r;
Aliases
See Aliases for information on syntax and visibility for FROM clause aliases.
JOIN types
Syntax
join:
from_item [ join_type ] JOIN from_item
[ ON bool_expression | USING ( join_column [, ...] ) ]
join_type:
{ INNER | CROSS | FULL [OUTER] | LEFT [OUTER] | RIGHT [OUTER] }
The JOIN clause merges two from_items so that the SELECT clause can query them as one source. The join_type and ON or USING clause (a “join condition”) specify how to combine and discard rows from the two from_items to form a single source.
The join_type is optional: a JOIN clause without a join_type is an INNER JOIN.
A JOIN clause requires a join condition unless one of the following conditions is true:
- join_type is CROSS.
- One or both of the from_items is not a table, e.g., an array_path or field_path.
[INNER] JOIN
An INNER JOIN, or simply JOIN, effectively calculates the Cartesian product of the two from_items and discards all rows that do not meet the join condition. “Effectively” means that it is possible to implement an INNER JOIN without actually calculating the Cartesian product.
CROSS JOIN
CROSS JOIN returns the Cartesian product of the two from_items. In other words, it retains all rows from both from_items and combines each row from the first from_item with each row from the second from_item.
Comma Cross Joins —
CROSS JOINs can be written explicitly (see directly above) or implicitly using a comma to separate the from_items.
Example of an implicit “comma cross join”:
SELECT * FROM Roster, TeamMascot;
Here is the explicit cross join equivalent:
SELECT * FROM Roster CROSS JOIN TeamMascot;
You cannot write comma cross joins inside parentheses.
Invalid – comma cross join inside parentheses:
SELECT * FROM t CROSS JOIN (Roster, TeamMascot); -- INVALID.
See Sequence of JOINs for details on how a comma cross join behaves in a sequence of JOINs.
FULL [OUTER] JOIN
A FULL OUTER JOIN (or simply FULL JOIN) returns all fields for all rows in both from_items that meet the join condition.
FULL indicates that all rows from both from_items are returned, even if they do not meet the join condition.
OUTER indicates that if a given row from one from_item does not join to any row in the other from_item, the row will return with NULLs for all columns from the other from_item.
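A minimal sketch of FULL OUTER JOIN follows. The inline Roster and TeamMascot rows, and the Mascot column, are hypothetical and chosen so that each side has one row without a match.

```sql
WITH Roster AS
  (SELECT "Adams" AS LastName, 50 AS SchoolID UNION ALL
   SELECT "Buchanan", 52),
TeamMascot AS
  (SELECT 50 AS SchoolID, "Jaguars" AS Mascot UNION ALL
   SELECT 53, "Mustangs")
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster FULL OUTER JOIN TeamMascot
  ON Roster.SchoolID = TeamMascot.SchoolID;
-- Expected rows (in no particular order):
--   ("Adams", "Jaguars")   matched on SchoolID 50
--   ("Buchanan", NULL)     no mascot for SchoolID 52
--   (NULL, "Mustangs")     no player for SchoolID 53
```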
LEFT [OUTER] JOIN
The result of a LEFT OUTER JOIN (or simply LEFT JOIN) for two from_items always retains all rows of the left from_item in the JOIN clause, even if no rows in the right from_item satisfy the join predicate.
LEFT indicates that all rows from the left from_item are returned; if a given row from the left from_item does not join to any row in the right from_item, the row will return with NULLs for all columns from the right from_item. Rows from the right from_item that do not join to any row in the left from_item are discarded.
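The sketch below illustrates LEFT JOIN with hypothetical inline data (the rows and the Mascot column are assumptions made for the example). The unmatched left row is kept, and the unmatched right row is dropped.

```sql
WITH Roster AS
  (SELECT "Adams" AS LastName, 50 AS SchoolID UNION ALL
   SELECT "Buchanan", 52),
TeamMascot AS
  (SELECT 50 AS SchoolID, "Jaguars" AS Mascot UNION ALL
   SELECT 53, "Mustangs")
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster LEFT JOIN TeamMascot
  ON Roster.SchoolID = TeamMascot.SchoolID;
-- Expected rows (in no particular order):
--   ("Adams", "Jaguars")
--   ("Buchanan", NULL)
-- The "Mustangs" row is discarded because no Roster row has SchoolID 53.
```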
RIGHT [OUTER] JOIN
The result of a RIGHT OUTER JOIN (or simply RIGHT JOIN) is similar and symmetric to that of LEFT OUTER JOIN.
ON Clause
The ON clause contains a bool_expression. A combined row (the result of joining two rows) meets the join condition if bool_expression returns TRUE.
Example:
SELECT * FROM Roster INNER JOIN PlayerStats
ON Roster.LastName = PlayerStats.LastName;
USING Clause
The USING clause requires a column_list of one or more columns which occur in both input tables. It performs an equality comparison on each listed column, and the rows meet the join condition if every equality comparison returns TRUE.
In most cases, a statement with the USING keyword is equivalent to using the ON keyword. For example, the statement
SELECT FirstName
FROM Roster INNER JOIN PlayerStats
USING (LastName);
is equivalent to:
SELECT FirstName
FROM Roster INNER JOIN PlayerStats
ON Roster.LastName = PlayerStats.LastName;
The results from queries with USING do differ from queries that use ON when you use SELECT *. To illustrate this, consider the query:
SELECT * FROM Roster INNER JOIN PlayerStats
USING (LastName);
This statement returns the rows from Roster and PlayerStats where Roster.LastName is the same as PlayerStats.LastName. The results include a single LastName column.
By contrast, consider the following query:
SELECT * FROM Roster INNER JOIN PlayerStats
ON Roster.LastName = PlayerStats.LastName;
This statement returns the rows from Roster and PlayerStats where Roster.LastName is the same as PlayerStats.LastName. The results include two LastName columns: one from Roster and one from PlayerStats.
Sequence of JOINs
The FROM clause can contain multiple JOIN clauses in sequence.
Example:
SELECT * FROM a LEFT JOIN b ON TRUE LEFT JOIN c ON TRUE;
where a, b, and c are any from_items. JOINs are bound from left to right, but you can insert parentheses to group them in a different order.
Consider the following queries: A (without parentheses) and B (with parentheses) are equivalent to each other but not to C. In A and B, the FULL JOIN between Roster and TeamMascot binds first; in C, the FULL JOIN between TeamMascot and PlayerStats binds first.
A.
SELECT * FROM Roster FULL JOIN TeamMascot USING (SchoolID)
FULL JOIN PlayerStats USING (LastName);
B.
SELECT * FROM ( (Roster FULL JOIN TeamMascot USING (SchoolID))
FULL JOIN PlayerStats USING (LastName));
C.
SELECT * FROM (Roster FULL JOIN (TeamMascot FULL JOIN PlayerStats USING
(LastName)) USING (SchoolID)) ;
When comma cross joins are present in a query with a sequence of JOINs, they group from left to right like other JOIN types.
Example:
SELECT * FROM a JOIN b ON TRUE, b JOIN c ON TRUE;
The query above is equivalent to
SELECT * FROM (((a JOIN b ON TRUE) CROSS JOIN b) JOIN c ON TRUE);
There cannot be a RIGHT JOIN or FULL JOIN after a comma join.
Invalid – RIGHT JOIN after a comma cross join:
SELECT * FROM Roster, TeamMascot RIGHT JOIN PlayerStats ON TRUE; -- INVALID.
WHERE Clause
Syntax
WHERE bool_expression
The WHERE clause filters out rows by evaluating each row against bool_expression, and discards all rows that do not return TRUE (that is, rows that return FALSE or NULL).
Example:
SELECT * FROM Roster
WHERE SchoolID = 52;
The bool_expression can contain multiple sub-conditions.
Example:
SELECT * FROM Roster
WHERE STARTS_WITH(LastName, "Mc") OR STARTS_WITH(LastName, "Mac");
You cannot reference column aliases from the SELECT list in the WHERE clause.
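The restriction on SELECT list aliases can be sketched as follows. The alias doubled and the threshold 10 are hypothetical; PlayerStats and PointsScored are the names used in other examples on this page.

```sql
-- Invalid: the alias "doubled" is not visible in the WHERE clause.
-- SELECT PointsScored * 2 AS doubled FROM PlayerStats WHERE doubled > 10;

-- Valid: repeat the expression in the WHERE clause.
SELECT PointsScored * 2 AS doubled
FROM PlayerStats
WHERE PointsScored * 2 > 10;

-- Valid: define the alias in a table subquery and filter in the outer query.
SELECT doubled
FROM (SELECT PointsScored * 2 AS doubled FROM PlayerStats)
WHERE doubled > 10;
```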
Expressions in an INNER JOIN have an equivalent expression in the WHERE clause. For example, a query using INNER JOIN and ON has an equivalent expression using CROSS JOIN and WHERE.
Example – this query:
SELECT * FROM Roster INNER JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
is equivalent to:
SELECT * FROM Roster CROSS JOIN TeamMascot
WHERE Roster.SchoolID = TeamMascot.SchoolID;
GROUP BY Clause
Syntax
GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) }
The GROUP BY clause groups together rows in a table with non-distinct values for the expression in the GROUP BY clause. For multiple rows in the source table with non-distinct values for expression, the GROUP BY clause produces a single combined row. GROUP BY is commonly used when aggregate functions are present in the SELECT list, or to eliminate redundancy in the output. The data type of expression must be groupable.
Example:
SELECT SUM(PointsScored), LastName
FROM PlayerStats
GROUP BY LastName;
The GROUP BY clause can refer to expression names in the SELECT list. The GROUP BY clause also allows ordinal references to expressions in the SELECT list using integer values. 1 refers to the first expression in the SELECT list, 2 the second, and so forth. The expression list can combine ordinals and expression names.
Example:
SELECT SUM(PointsScored), LastName, FirstName
FROM PlayerStats
GROUP BY LastName, FirstName;
The query above is equivalent to:
SELECT SUM(PointsScored), LastName, FirstName
FROM PlayerStats
GROUP BY 2, FirstName;
GROUP BY clauses may also refer to aliases. If a query contains aliases in the SELECT clause, those aliases override names in the corresponding FROM clause.
Example:
SELECT SUM(PointsScored), LastName as last_name
FROM PlayerStats
GROUP BY last_name;
GROUP BY ROLLUP returns the results of GROUP BY for prefixes of the expressions in the ROLLUP list, each of which is known as a grouping set. For the ROLLUP list (a, b, c), the grouping sets are (a, b, c), (a, b), (a), (). When evaluating the results of GROUP BY for a particular grouping set, GROUP BY ROLLUP treats expressions that are not in the grouping set as having a NULL value. A SELECT statement like this one:
SELECT a, b, SUM(c) FROM Input GROUP BY ROLLUP(a, b);
uses the rollup list (a, b). The result will include the results of GROUP BY for the grouping sets (a, b), (a), and (), which includes all rows. This returns the same rows as:
SELECT NULL, NULL, SUM(c) FROM Input UNION ALL
SELECT a, NULL, SUM(c) FROM Input GROUP BY a UNION ALL
SELECT a, b, SUM(c) FROM Input GROUP BY a, b;
This allows the computation of aggregates for the grouping sets defined by the expressions in the ROLLUP list and the prefixes of that list.
Example:
WITH Sales AS (
SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
SELECT 123, 1, 8.99 UNION ALL
SELECT 456, 1, 4.56 UNION ALL
SELECT 123, 2, 9.99 UNION ALL
SELECT 789, 3, 1.00 UNION ALL
SELECT 456, 3, 4.25 UNION ALL
SELECT 789, 3, 0.99
)
SELECT
day,
SUM(price) AS total
FROM Sales
GROUP BY ROLLUP(day);
The query above outputs a row for each day in addition to the rolled up total across all days, as indicated by a NULL day:
+------+-------+
| day | total |
+------+-------+
| NULL | 39.77 |
| 1 | 23.54 |
| 2 | 9.99 |
| 3 | 6.24 |
+------+-------+
Example:
WITH Sales AS (
SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
SELECT 123, 1, 8.99 UNION ALL
SELECT 456, 1, 4.56 UNION ALL
SELECT 123, 2, 9.99 UNION ALL
SELECT 789, 3, 1.00 UNION ALL
SELECT 456, 3, 4.25 UNION ALL
SELECT 789, 3, 0.99
)
SELECT
sku,
day,
SUM(price) AS total
FROM Sales
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;
The query above returns rows grouped by the following grouping sets:
- sku and day
- sku (day is NULL)
- The empty grouping set (day and sku are NULL)
The sums for these grouping sets correspond to the total for each distinct sku-day combination, the total for each sku across all days, and the grand total:
+------+------+-------+
| sku | day | total |
+------+------+-------+
| NULL | NULL | 39.77 |
| 123 | NULL | 28.97 |
| 123 | 1 | 18.98 |
| 123 | 2 | 9.99 |
| 456 | NULL | 8.81 |
| 456 | 1 | 4.56 |
| 456 | 3 | 4.25 |
| 789 | NULL | 1.99 |
| 789 | 3 | 1.99 |
+------+------+-------+
HAVING Clause
Syntax
HAVING bool_expression
The HAVING clause is similar to the WHERE clause: it filters out rows that do not return TRUE when they are evaluated against the bool_expression.
As with the WHERE clause, the bool_expression can be any expression that returns a boolean, and can contain multiple sub-conditions.
The HAVING clause differs from the WHERE clause in that:
- The HAVING clause requires GROUP BY or aggregation to be present in the query.
- The HAVING clause occurs after GROUP BY and aggregation, and before ORDER BY. This means that the HAVING clause is evaluated once for every aggregated row in the result set. This differs from the WHERE clause, which is evaluated before GROUP BY and aggregation.
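The evaluation order described above can be sketched in a single query that uses both clauses. The filter value 51 is hypothetical; PlayerStats, OpponentID, LastName, and PointsScored are the names used in other examples on this page.

```sql
SELECT LastName, SUM(PointsScored) AS total
FROM PlayerStats
WHERE OpponentID = 51            -- applied to individual rows, before GROUP BY
GROUP BY LastName
HAVING SUM(PointsScored) > 15    -- applied to aggregated rows, after GROUP BY
ORDER BY total DESC;             -- applied last, to the rows that HAVING kept
```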
The HAVING clause can reference columns available via the FROM clause, as well as SELECT list aliases. Expressions referenced in the HAVING clause must either appear in the GROUP BY clause or they must be the result of an aggregate function:
SELECT LastName
FROM Roster
GROUP BY LastName
HAVING SUM(PointsScored) > 15;
If a query contains aliases in the SELECT clause, those aliases override names in a FROM clause.
SELECT LastName, SUM(PointsScored) AS ps
FROM Roster
GROUP BY LastName
HAVING ps > 0;
Mandatory Aggregation
Aggregation does not have to be present in the HAVING clause itself, but aggregation must be present in at least one of the following forms:
Aggregation function in the SELECT list
SELECT LastName, SUM(PointsScored) AS total
FROM PlayerStats
GROUP BY LastName
HAVING total > 15;
Aggregation function in the HAVING clause
SELECT LastName
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 15;
Aggregation in both the SELECT list and HAVING clause
When aggregation functions are present in both the SELECT list and HAVING clause, the aggregation functions and the columns they reference do not need to be the same. In the example below, the two aggregation functions, COUNT() and SUM(), are different and also use different columns.
SELECT LastName, COUNT(*)
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 15;
ORDER BY Clause
Syntax
ORDER BY expression [{ ASC | DESC }] [, ...]
The ORDER BY clause specifies a column or expression as the sort criterion for the result set. If an ORDER BY clause is not present, the order of the results of a query is not defined. The default sort direction is ASC, which sorts the results in ascending order of expression values. DESC sorts the results in descending order. Column aliases from a FROM clause or SELECT list are allowed. If a query contains aliases in the SELECT clause, those aliases override names in the corresponding FROM clause.
It is possible to order by multiple columns. In the example below, the result set is ordered first by SchoolID and then by LastName:
SELECT LastName, PointsScored, OpponentID
FROM PlayerStats
ORDER BY SchoolID, LastName;
The following rules apply when ordering values:
- NULLs: In the context of the ORDER BY clause, NULLs are the minimum possible value; that is, NULLs appear first in ASC sorts and last in DESC sorts.
- Floating point data types: see Floating Point Semantics on ordering and grouping.
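The NULL ordering rule can be sketched with a small hypothetical data set defined inline; the table name t and its values are illustrative only.

```sql
-- Hypothetical inline data with one NULL value.
WITH t AS
  (SELECT 2 AS x UNION ALL
   SELECT NULL UNION ALL
   SELECT 1)
SELECT x
FROM t
ORDER BY x ASC;
-- Per the rule above, the expected order is NULL, 1, 2.
-- With ORDER BY x DESC instead, the expected order is 2, 1, NULL.
```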
When used in conjunction with set operators, the ORDER BY clause applies to the result set of the entire query; it does not apply only to the closest SELECT statement. For this reason, it can be helpful (though it is not required) to use parentheses to show the scope of the ORDER BY.
This query without parentheses:
SELECT * FROM Roster
UNION ALL
SELECT * FROM TeamMascot
ORDER BY SchoolID;
is equivalent to this query with parentheses:
( SELECT * FROM Roster
UNION ALL
SELECT * FROM TeamMascot )
ORDER BY SchoolID;
but is not equivalent to this query, where the ORDER BY clause applies only to the second SELECT statement:
SELECT * FROM Roster
UNION ALL
( SELECT * FROM TeamMascot
ORDER BY SchoolID );
You can also use integer literals as column references in ORDER BY clauses. An integer literal becomes an ordinal (that is, counting starts at 1) into the SELECT list.
Example – the following two queries are equivalent:
SELECT SUM(PointsScored), LastName
FROM PlayerStats
GROUP BY LastName
ORDER BY LastName;
SELECT SUM(PointsScored), LastName
FROM PlayerStats
GROUP BY LastName
ORDER BY 2;
Set Operators
Syntax
UNION { ALL | DISTINCT } | INTERSECT DISTINCT | EXCEPT DISTINCT
Set operators combine results from two or more input queries into a single result set. You must specify ALL or DISTINCT; if you specify ALL, then all rows are retained. If DISTINCT is specified, duplicate rows are discarded.
If a given row R appears exactly m times in the first input query and n times in the second input query (m >= 0, n >= 0):
- For UNION ALL, R appears exactly m + n times in the result.
- For UNION DISTINCT, the DISTINCT is computed after the UNION is computed, so R appears exactly one time.
- For INTERSECT DISTINCT, R appears exactly once in the result if m > 0 and n > 0.
- For EXCEPT DISTINCT, row R appears once in the output if m > 0 and n = 0.
- If there are more than two input queries, the above operations generalize and the output is the same as if the inputs were combined incrementally from left to right.
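As a sketch of these multiplicity rules, the query below counts the rows produced by each operator over two small hypothetical inputs, a = {1, 1, 2} and b = {1, 3}. The names a, b, and x are illustrative.

```sql
WITH a AS (SELECT 1 AS x UNION ALL SELECT 1 UNION ALL SELECT 2),
     b AS (SELECT 1 AS x UNION ALL SELECT 3)
SELECT
  -- 3 rows from a plus 2 rows from b: expected 5.
  (SELECT COUNT(*) FROM
    (SELECT x FROM a UNION ALL SELECT x FROM b)) AS union_all_rows,
  -- Distinct values 1, 2, 3: expected 3.
  (SELECT COUNT(*) FROM
    (SELECT x FROM a UNION DISTINCT SELECT x FROM b)) AS union_distinct_rows,
  -- Only the value 1 appears in both inputs: expected 1.
  (SELECT COUNT(*) FROM
    (SELECT x FROM a INTERSECT DISTINCT SELECT x FROM b)) AS intersect_rows,
  -- Only the value 2 appears in a but not in b: expected 1.
  (SELECT COUNT(*) FROM
    (SELECT x FROM a EXCEPT DISTINCT SELECT x FROM b)) AS except_rows;
```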
The following rules apply:
- For set operations other than UNION ALL, all column types must support equality comparison.
- The input queries on each side of the operator must return the same number of columns.
- The operators pair the columns returned by each input query according to the columns’ positions in their respective SELECT lists. That is, the first column in the first input query is paired with the first column in the second input query.
- The result set always uses the column names from the first input query.
- The result set always uses the supertypes of input types in corresponding columns, so paired columns must also have either the same data type or a common supertype.
- You must use parentheses to separate different set operations; for this purpose, set operations such as UNION ALL and UNION DISTINCT are different. If the statement only repeats the same set operation, parentheses are not necessary.
Example:
query1 UNION ALL (query2 UNION DISTINCT query3)
query1 UNION ALL query2 UNION ALL query3
Invalid:
query1 UNION ALL query2 UNION DISTINCT query3
query1 UNION ALL query2 INTERSECT ALL query3; // INVALID.
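A concrete version of the parenthesized form, sketched with the Roster and PlayerStats sample tables from later in this document. The parentheses make the EXCEPT DISTINCT a single operand, so its result is computed first and then appended by UNION ALL.

```sql
-- Two different set operations in one statement:
-- the parentheses are required to separate them.
SELECT LastName FROM Roster
UNION ALL
(SELECT LastName FROM PlayerStats
 EXCEPT DISTINCT
 SELECT LastName FROM Roster);
```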
UNION
The UNION operator combines the result sets of two or more input queries by pairing columns from the result set of each query and vertically concatenating them.
INTERSECT
The INTERSECT operator returns rows that are found in the result sets of both the left and right input queries. Unlike EXCEPT, the positioning of the input queries (to the left vs. right of the INTERSECT operator) does not matter.
EXCEPT
The EXCEPT operator returns rows from the left input query that are not present in the right input query.
LIMIT Clause and OFFSET Clause
Syntax
LIMIT count [ OFFSET skip_rows ]
LIMIT specifies a non-negative count of type INT64, and no more than count rows will be returned. LIMIT 0 returns 0 rows. If there is a set operation, LIMIT is applied after the set operation is evaluated.
OFFSET specifies a non-negative skip_rows of type INT64; the first skip_rows rows are skipped, and only the rows from that offset onward are considered.
These clauses accept only literal or parameter values.
The rows that are returned by LIMIT and OFFSET are unspecified unless these operators are used after ORDER BY.
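A minimal sketch of the two clauses together, using the Roster sample table from later in this document. The result in the comment follows from sorting that sample data by LastName.

```sql
-- Sort first so the rows chosen by LIMIT and OFFSET are deterministic.
-- Skips the first row (Adams) and returns the next two: Buchanan, Coolidge.
SELECT LastName
FROM Roster
ORDER BY LastName
LIMIT 2 OFFSET 1;
```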
WITH Clause
The WITH clause contains one or more named subqueries which execute every time a subsequent SELECT statement references them. Any clause or subquery can reference subqueries you define in the WITH clause. This includes any SELECT statements on either side of a set operator, such as UNION.
The WITH clause is useful primarily for readability, because Kochava Query does not materialize the result of the queries inside the WITH clause. If a query appears in more than one WITH clause, it executes in each clause.
Example:
WITH subQ1 AS (SELECT SchoolID FROM Roster),
subQ2 AS (SELECT OpponentID FROM PlayerStats)
SELECT * FROM subQ1
UNION ALL
SELECT * FROM subQ2;
Another useful role of the WITH clause is to break up more complex queries into a WITH SELECT statement and WITH clauses, where the less desirable alternative is writing nested table subqueries. If a WITH clause contains multiple subqueries, the subquery names cannot repeat.
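As a sketch of this use, the query below names an aggregation step instead of nesting it as a table subquery in the FROM clause. It uses the Roster and PlayerStats sample tables from later in this document; the names totals and TotalPoints are illustrative and not part of the sample schema.

```sql
-- Step 1: total points per player, given a name with WITH.
WITH totals AS (
  SELECT LastName, SUM(PointsScored) AS TotalPoints
  FROM PlayerStats
  GROUP BY LastName
)
-- Step 2: join the named subquery to Roster like any other table.
SELECT Roster.SchoolID, totals.LastName, totals.TotalPoints
FROM Roster JOIN totals
ON Roster.LastName = totals.LastName;
```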
Kochava Query supports WITH clauses in subqueries, such as table subqueries, expression subqueries, and so on.
WITH q1 AS (my_query)
SELECT *
FROM
(WITH q2 AS (SELECT * FROM q1) SELECT * FROM q2)
The following are scoping rules for WITH clauses:
- Aliases are scoped so that the aliases introduced in a WITH clause are visible only in the later subqueries in the same WITH clause, and in the query under the WITH clause.
- Aliases introduced in the same WITH clause must be unique, but the same alias can be used in multiple WITH clauses in the same query. The local alias overrides any outer aliases anywhere that the local alias is visible.
- Aliased subqueries in a WITH clause can never be correlated. No columns from outside the query are visible. The only names from outside that are visible are other WITH aliases that were introduced earlier in the same WITH clause.
Here’s an example of a statement that uses aliases in WITH subqueries:
WITH q1 AS (my_query)
SELECT *
FROM
(WITH q2 AS (SELECT * FROM q1), # q1 resolves to my_query
q3 AS (SELECT * FROM q1), # q1 resolves to my_query
q1 AS (SELECT * FROM q1), # q1 (in the query) resolves to my_query
q4 AS (SELECT * FROM q1) # q1 resolves to the WITH subquery
# on the previous line.
SELECT * FROM q1) # q1 resolves to the third inner WITH subquery.
Kochava Query does not support WITH RECURSIVE.
Aliases
An alias is a temporary name given to a table, column, or expression present in a query. You can introduce explicit aliases in the SELECT list or FROM clause, or Kochava Query will infer an implicit alias for some expressions. Expressions with neither an explicit nor implicit alias are anonymous and the query cannot reference them by name.
Explicit Alias Syntax
You can introduce explicit aliases in either the FROM clause or the SELECT list.
In a FROM clause, you can introduce explicit aliases for any item, including tables, arrays, subqueries, and UNNEST clauses, using [AS] alias. The AS keyword is optional.
Example:
SELECT s.FirstName, s2.SongName
FROM Singers AS s, (SELECT * FROM Songs) AS s2;
You can introduce explicit aliases for any expression in the SELECT list using [AS] alias. The AS keyword is optional.
Example:
SELECT s.FirstName AS name, LOWER(s.FirstName) AS lname
FROM Singers s;
Explicit Alias Visibility
After you introduce an explicit alias in a query, there are restrictions on where else in the query you can reference that alias. These restrictions on alias visibility are the result of Kochava Query’s name scoping rules.
FROM Clause Aliases
Kochava Query processes aliases in a FROM clause from left to right, and aliases are visible only to subsequent path expressions in a FROM clause.
Example:
Assume the Singers table had a Concerts column of ARRAY type.
SELECT FirstName
FROM Singers AS s, s.Concerts;
Invalid:
SELECT FirstName
FROM s.Concerts, Singers AS s; // INVALID.
FROM clause aliases are not visible to subqueries in the same FROM clause. Subqueries in a FROM clause cannot contain correlated references to other tables in the same FROM clause.
Invalid:
SELECT FirstName
FROM Singers AS s, (SELECT (2020 - ReleaseDate) FROM s) // INVALID.
You can use any column name from a table in the FROM clause as an alias anywhere in the query, with or without qualification with the table name.
Example:
SELECT FirstName, s.ReleaseDate
FROM Singers s WHERE ReleaseDate = 1975;
If the FROM clause contains an explicit alias, you must use the explicit alias instead of the implicit alias for the remainder of the query (see Implicit Aliases). A table alias is useful for brevity or to eliminate ambiguity in cases such as self-joins, where the same table is scanned multiple times during query processing.
Example:
SELECT * FROM Singers as s, Songs as s2
ORDER BY s.LastName
Invalid — ORDER BY does not use the table alias:
SELECT * FROM Singers as s, Songs as s2
ORDER BY Singers.LastName; // INVALID.
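The self-join case mentioned above can be sketched with the Roster sample table from later in this document. The aliases a and b, and the output names Player and Teammate, are illustrative. With that sample data the only pair sharing a SchoolID is Buchanan and Coolidge.

```sql
-- Pairs of players who attend the same school.
-- The aliases a and b distinguish the two scans of Roster;
-- without them, Roster.LastName would be ambiguous.
SELECT a.LastName AS Player, b.LastName AS Teammate
FROM Roster AS a JOIN Roster AS b
ON a.SchoolID = b.SchoolID
WHERE a.LastName < b.LastName;  -- lists each pair once, no self-pairs
```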
SELECT list Aliases
Aliases in the SELECT list are visible only to the following clauses:
- GROUP BY clause
- ORDER BY clause
- HAVING clause
Example:
SELECT LastName AS last, SingerID
FROM Singers
ORDER BY last;
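Because WHERE is not in the list above, a SELECT list alias cannot be used there. The sketch below contrasts the two cases using the PlayerStats sample table from later in this document; the alias Doubled is illustrative.

```sql
-- Invalid under the visibility rule: Doubled is not visible in WHERE.
SELECT PointsScored * 2 AS Doubled
FROM PlayerStats
WHERE Doubled > 5;  -- INVALID.

-- Valid: WHERE repeats the expression, ORDER BY may use the alias.
SELECT PointsScored * 2 AS Doubled
FROM PlayerStats
WHERE PointsScored * 2 > 5
ORDER BY Doubled;
```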
Explicit Aliases in GROUP BY, ORDER BY, and HAVING Clauses
These three clauses, GROUP BY, ORDER BY, and HAVING, can refer to only the following values:
- Tables in the FROM clause and any of their columns.
- Aliases from the SELECT list.
GROUP BY and ORDER BY can also refer to a third group:
- Integer literals, which refer to items in the SELECT list. The integer 1 refers to the first item in the SELECT list, 2 refers to the second item, etc.
Example:
SELECT SingerID AS sid, COUNT(Songid) AS s2id
FROM Songs
GROUP BY 1
ORDER BY 2 DESC;
The query above is equivalent to:
SELECT SingerID AS sid, COUNT(Songid) AS s2id
FROM Songs
GROUP BY sid
ORDER BY s2id DESC;
Ambiguous Aliases
Kochava Query returns an error if a name is ambiguous, meaning it can resolve to more than one unique object.
Examples:
This query contains column names that conflict between tables, since both Singers and Songs have a column named SingerID:
SELECT SingerID
FROM Singers, Songs;
This query contains aliases that are ambiguous in the GROUP BY clause because they are duplicated in the SELECT list:
SELECT FirstName AS name, LastName AS name
FROM Singers
GROUP BY name;
Ambiguity between a FROM clause column name and a SELECT list alias in GROUP BY:
SELECT UPPER(LastName) AS LastName
FROM Singers
GROUP BY LastName;
The query above is ambiguous and will produce an error because LastName in the GROUP BY clause could refer to the original column LastName in Singers, or it could refer to the alias AS LastName, whose value is UPPER(LastName).
The same rules for ambiguity apply to path expressions. Consider the following query where table has columns x and y, and column z is of type STRUCT and has fields v, w, and x.
Example:
SELECT x, z AS T
FROM table T
GROUP BY T.x;
The alias T is ambiguous and will produce an error because T.x in the GROUP BY clause could refer to either table.x or table.z.x.
A name is not ambiguous in GROUP BY, ORDER BY or HAVING if it is both a column name and a SELECT list alias, as long as the name resolves to the same underlying object.
Example:
SELECT LastName, BirthYear AS BirthYear
FROM Singers
GROUP BY LastName, BirthYear;
The alias BirthYear is not ambiguous because it resolves to the same underlying column, Singers.BirthYear.
Implicit Aliases
In the SELECT list, if there is an expression that does not have an explicit alias, Kochava Query assigns an implicit alias according to the following rules. There can be multiple columns with the same alias in the SELECT list.
- For identifiers, the alias is the identifier. For example, SELECT abc implies AS abc.
- For path expressions, the alias is the last identifier in the path. For example, SELECT abc.def.ghi implies AS ghi.
- For field access using the “dot” member field access operator, the alias is the field name. For example, SELECT (struct_function()).fname implies AS fname.
In all other cases, there is no implicit alias, so the column is anonymous and cannot be referenced by name. The data from that column will still be returned and the displayed query results may have a generated label for that column, but the label cannot be used like an alias.
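For instance, an aggregate such as SUM(PointsScored) is not an identifier, path expression, or field access, so by the rules above it has no implicit alias. The sketch below, which uses the PlayerStats sample table from later in this document and an illustrative alias TotalPoints, shows two ways to sort by such a column.

```sql
-- Option 1: give the anonymous expression an explicit alias.
SELECT LastName, SUM(PointsScored) AS TotalPoints
FROM PlayerStats
GROUP BY LastName
ORDER BY TotalPoints DESC;

-- Option 2: leave it anonymous and refer to it by ordinal position.
SELECT LastName, SUM(PointsScored)
FROM PlayerStats
GROUP BY LastName
ORDER BY 2 DESC;
```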
In a FROM clause, from_items are not required to have an alias. The following rules apply:
- If there is an expression that does not have an explicit alias, Kochava Query assigns an implicit alias in these cases:
  - For identifiers, the alias is the identifier. For example, FROM abc implies AS abc.
  - For path expressions, the alias is the last identifier in the path. For example, FROM abc.def.ghi implies AS ghi.
- The column produced using WITH OFFSET has the implicit alias offset.
- Table subqueries do not have implicit aliases.
- FROM UNNEST(x) does not have an implicit alias.
Examples with Sample Data
Sample Tables
The following three tables contain sample data about athletes, their schools, and the points they score during the season. These tables will be used to illustrate the behavior of different query clauses.
Table Roster
LastName | SchoolID |
---|---|
Adams | 50 |
Buchanan | 52 |
Coolidge | 52 |
Davis | 51 |
Eisenhower | 77 |
The Roster table includes a list of player names (LastName) and the unique ID assigned to their school (SchoolID).
Table PlayerStats
LastName | OpponentID | PointsScored |
---|---|---|
Adams | 51 | 3 |
Buchanan | 77 | 0 |
Coolidge | 77 | 1 |
Adams | 52 | 4 |
Buchanan | 50 | 13 |
The PlayerStats table includes a list of player names (LastName) and the unique ID assigned to the opponent they played in a given game (OpponentID) and the number of points scored by the athlete in that game (PointsScored).
Table TeamMascot
SchoolID | Mascot |
---|---|
50 | Jaguars |
51 | Knights |
52 | Lakers |
53 | Mustangs |
The TeamMascot table includes a list of unique school IDs (SchoolID) and the mascot for that school (Mascot).
JOIN Types
[INNER] JOIN
Example:
SELECT * FROM Roster JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
Results:
LastName | Roster.SchoolId | TeamMascot.SchoolId | Mascot |
---|---|---|---|
Adams | 50 | 50 | Jaguars |
Buchanan | 52 | 52 | Lakers |
Coolidge | 52 | 52 | Lakers |
Davis | 51 | 51 | Knights |
CROSS JOIN
Example:
SELECT * FROM Roster CROSS JOIN TeamMascot;
Results:
LastName | Roster.SchoolId | TeamMascot.SchoolId | Mascot |
---|---|---|---|
Adams | 50 | 50 | Jaguars |
Adams | 50 | 51 | Knights |
Adams | 50 | 52 | Lakers |
Adams | 50 | 53 | Mustangs |
Buchanan | 52 | 50 | Jaguars |
Buchanan | 52 | 51 | Knights |
Buchanan | 52 | 52 | Lakers |
Buchanan | 52 | 53 | Mustangs |
Coolidge | 52 | 50 | Jaguars |
Coolidge | 52 | 51 | Knights |
Coolidge | 52 | 52 | Lakers |
Coolidge | 52 | 53 | Mustangs |
Davis | 51 | 50 | Jaguars |
Davis | 51 | 51 | Knights |
Davis | 51 | 52 | Lakers |
Davis | 51 | 53 | Mustangs |
Eisenhower | 77 | 50 | Jaguars |
Eisenhower | 77 | 51 | Knights |
Eisenhower | 77 | 52 | Lakers |
Eisenhower | 77 | 53 | Mustangs |
FULL [OUTER] JOIN
Example:
SELECT * FROM Roster FULL JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
Results:
LastName | Roster.SchoolId | TeamMascot.SchoolId | Mascot |
---|---|---|---|
Adams | 50 | 50 | Jaguars |
Buchanan | 52 | 52 | Lakers |
Coolidge | 52 | 52 | Lakers |
Davis | 51 | 51 | Knights |
Eisenhower | 77 | NULL | NULL |
NULL | NULL | 53 | Mustangs |
LEFT [OUTER] JOIN
Example:
SELECT * FROM Roster LEFT JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
Results:
LastName | Roster.SchoolId | TeamMascot.SchoolId | Mascot |
---|---|---|---|
Adams | 50 | 50 | Jaguars |
Buchanan | 52 | 52 | Lakers |
Coolidge | 52 | 52 | Lakers |
Davis | 51 | 51 | Knights |
Eisenhower | 77 | NULL | NULL |
RIGHT [OUTER] JOIN
Example:
SELECT * FROM Roster RIGHT JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
Results:
LastName | Roster.SchoolId | TeamMascot.SchoolId | Mascot |
---|---|---|---|
Adams | 50 | 50 | Jaguars |
Davis | 51 | 51 | Knights |
Coolidge | 52 | 52 | Lakers |
Buchanan | 52 | 52 | Lakers |
NULL | NULL | 53 | Mustangs |
GROUP BY Clause
Example:
SELECT LastName, SUM(PointsScored)
FROM PlayerStats
GROUP BY LastName;
Results:
LastName | SUM |
---|---|
Adams | 7 |
Buchanan | 13 |
Coolidge | 1 |
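A possible extension of the example above adds a HAVING filter on the grouped totals. The alias TotalPoints is illustrative; with the sample PlayerStats data, the totals shown above mean only Buchanan (13) and Adams (7) exceed 5.

```sql
-- Keep only players whose season total is above 5 points.
SELECT LastName, SUM(PointsScored) AS TotalPoints
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 5
ORDER BY TotalPoints DESC;
```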
Set Operators
UNION
The UNION operator combines the result sets of two or more SELECT statements by pairing columns from the result set of each SELECT statement and vertically concatenating them.
Example:
SELECT Mascot AS X, SchoolID AS Y
FROM TeamMascot
UNION ALL
SELECT LastName, PointsScored
FROM PlayerStats;
Results:
X | Y |
---|---|
Jaguars | 50 |
Knights | 51 |
Lakers | 52 |
Mustangs | 53 |
Adams | 3 |
Buchanan | 0 |
Coolidge | 1 |
Adams | 4 |
Buchanan | 13 |
INTERSECT
This query returns the last names that are present in both Roster and PlayerStats.
SELECT LastName
FROM Roster
INTERSECT DISTINCT
SELECT LastName
FROM PlayerStats;
Results:
LastName |
---|
Adams |
Coolidge |
Buchanan |
EXCEPT
The query below returns last names in Roster that are not present in PlayerStats.
SELECT LastName
FROM Roster
EXCEPT DISTINCT
SELECT LastName
FROM PlayerStats;
Results:
LastName |
---|
Eisenhower |
Davis |
Reversing the order of the SELECT statements will return last names in PlayerStats that are not present in Roster:
SELECT LastName
FROM PlayerStats
EXCEPT DISTINCT
SELECT LastName
FROM Roster;
Results:
(empty)