Support Home > Advanced Tools > Custom Query > Standard SQL Data Types

Standard SQL Data Types

Standard SQL is compliant with the SQL 2011 standard and has extensions that support querying nested and repeated data. If you are using legacy SQL, refer to our SQL Functions and Operators.


Standard SQL Data Types

Kochava Query supports simple data types such as integers, as well as more complex types such as ARRAY and STRUCT. This page provides an overview of each data type, including allowed values. For information on data type literals and constructors, seeLexical Structure and Syntax.

 

Data Types Properties:

When storing and querying data, it is helpful to keep the following data type properties in mind:

PropertyDescriptionApplies To
NullableNULL is a valid value.All data types, with the following exceptions:

  • ARRAYs cannot be NULL.
  • NULL ARRAY elements cannot persist to a table.
OrderableCan be used in an ORDER BY clause.All data types except for:

  • ARRAY
  • STRUCT
  • GEOGRAPHY
GroupableCan generally appear in an expression following GROUP BY, DISTINCT, or PARTITION BY.
However, PARTITION BY expressions cannot include the floating point types FLOAT and DOUBLE.
All data types except for:

  • ARRAY
  • STRUCT
  • GEOGRAPHY
ComparableValues of the same type can be compared to each other.All data types, with the following exceptions: ARRAY comparisons are not supported.

Equality comparisons for STRUCTs are supported field by field, in field order. Field names are ignored. Less than and greater than comparisons are not supported.

GEOGRAPHY comparisons are not supported. To compare GEOGRAPHY values, useST_Equals.

All types that support comparisons can be used in a JOIN condition. SeeJOIN Types for an explanation of join conditions.


Numeric Types:

Numeric types include integer types, floating point types and the NUMERIC data type.

 

Integer Type —

Integers are numeric values that do not have fractional components.

NameStorage SizeRange
INT648 bytes-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

 

Floating Point Type —

The NUMERIC data type is an exact numeric value with 38 digits of precision and 9 decimal digits of scale. Precision is the number of digits that the number contains. Scale is how many of these digits appear after the decimal point.

This type can represent decimal fractions exactly, and is suitable for financial calculations.

NameStorage SizeDescriptionRange
NUMERIC16 bytesDecimal values with 38 decimal digits of precision and 9 decimal digits of scale.
-99999999999999999999999999999.999999999
to
99999999999999999999999999999.999999999

 

Floating Point Semantics —

Floating point values are approximate numeric values with fractional components.

NameStorage SizeDescription
FLOAT648 bytesDouble precision (approximate) decimal values.

 

Floating Point Semantics —

When working with floating point numbers, there are special non-numeric values that need to be considered: NaN and +/-inf

Arithmetic operators provide standard IEEE-754 behavior for all finite input values that produce finite output and for all operations for which at least one input is non-finite.

Function calls and operators return an overflow error if the input is finite but the output would be non-finite. If the input contains non-finite values, the output can be non-finite. In general functions do not introduce NaNs or +/-inf. However, specific functions like IEEE_DIVIDE can return non-finite values on finite input. All such cases are noted explicitly inMathematical functions.

 

Mathematical Function Examples —

Left TermOperatorRight TermReturns
Any value+NaNNaN
1.0++inf=inf
1.0+-inf-inf
-inf++infNaN
Maximum FLOAT64 value+Maximum FLOAT64 valueOverflow error
Maximum FLOAT64 value/2.00.0
1.0/0.0“Divide by zero” error

Comparison operators provide standard IEEE-754 behavior for floating point input.

Left TermOperatorRIght TermReturns
NaN=Any valueFALSE
NaN<Any valueFALSE
Any value<NaNFALSE
-0.0=0.0TRUE
-0.0<0.0FALSE

Floating point values are sorted in this order, from least to greatest:

  1. NULL
  2. NaN — All NaN values are considered equal when sorting.
  3. -inf
  4. Negative numbers
  5. 0 or -0 — All zero values are considered equal when sorting.
  6. Positive numbers
  7. +inf

Special floating point values are grouped this way, including both grouping done by a GROUP BY clause and grouping done by the DISTINCT keyword:

  • NULL
  • NaN — All NaN values are considered equal when grouping.
  • -inf
  • 0 or -0 — All zero values are considered equal when grouping.
  • +inf

Boolean Type:

NameDescription
BOOLBoolean values are represented by the keywords TRUE and FALSE (case insensitive).

String Type:

NameDescription
STRINGVariable-length character (Unicode) data.

Input STRING values must be UTF-8 encoded and output STRING values will be UTF-8 encoded. Alternate encodings like CESU-8 and Modified UTF-8 are not treated as valid UTF-8.

All functions and operators that act on STRING values operate on Unicode characters rather than bytes. For example, functions like SUBSTR and LENGTH applied to STRING input count Unicode characters, not bytes. Comparisons are defined on Unicode characters. Comparisons for less than and ORDER BY compare character by character, and lower unicode code points are considered lower characters.

Most functions on STRING are also defined on BYTES. The BYTES version operates on raw bytes rather than Unicode characters. STRING and BYTES are separate types that cannot be used interchangeably. There is no implicit casting in either direction. Explicit casting between STRING and BYTES does UTF-8 encoding and decoding. Casting BYTES to STRING returns an error if the bytes are not valid UTF-8.


Bytes Type:

NameDescription
BYTESVariable-length binary data.

STRING and BYTES are separate types that cannot be used interchangeably. Most functions on STRING are also defined on BYTES. The BYTES version operates on raw bytes rather than Unicode characters. Casts between STRING and BYTES enforce that the bytes are encoded using UTF-8.


Date Type:

NameDescriptionRange
DATERepresents a logical calendar date.0001-01-01 to 9999-12-31.

The DATE type represents a logical calendar date, independent of time zone. A DATE value does not represent a specific 24-hour time period. Rather, a given DATE value represents a different 24-hour period when interpreted in different time zones, and may represent a shorter or longer day during Daylight Savings Time transitions. To represent an absolute point in time, use a timestamp.

 

Canonical Format —

‘YYYY-[M]M-[D]D’

  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day

Datetime Type:

NameDescriptionRange
DATETIMERepresents a year, month, day, hour, minute, second, and subsecond.0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999.

A DATETIME represents a point in time. Each DATETIME contains the following:

  • year
  • month
  • day
  • hour
  • minute
  • second
  • subsecond

UnlikeTimestamps, a DATETIME object does not refer to an absolute instance in time. Instead, it is the civil time, or the time that a user would see on a watch or calendar.

 

Canonical Format —

YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]]

  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day
  • ( |T): A space or a T separator
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits (i.e. up to microsecond precision)

Geography Type:

NameDescription
GEOGRAPHYA collection of points, lines, and polygons, which is represented as a point set, or a subset of the surface of the Earth.

The GEOGRAPHY type is based on theOGC Simple Features specification (SFS), and is a collection of points, lines, or polygons that form a “simple” arrangement on theWGS84 reference ellipsoid. A simple arrangement is one where no point on the WGS84 surface is contained by multiple elements of the collection.

A GEOGRAPHY is the result of, or an argument to, aGeography Function.


Time Type:

NameDescriptionRange
TIMERepresents a time, independent of a specific date.00:00:00 to 23:59:59.999999.

A TIME data type represents a time, independent of a specific date.

 

Canonical Format —

[H]H:[M]M:[S]S[.DDDDDD]
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits (i.e. up to microsecond precision)

Timestamp Type:

NameDescriptionRange
TIMERepresents an absolute point in time, with microsecond precision.0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999 UTC.

A timestamp represents an absolute point in time, independent of any time zone or convention such as Daylight Savings Time.

TIMESTAMP provides microsecond precision.

 

Canonical Format —

YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]][time zone]

  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day
  • ( |T): A space or a T separator
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits (i.e. up to microsecond precision)
  • [time zone]: String representing the time zone. See thetime zones section for details.

 

Time zones are used when parsing timestamps or formatting timestamps for display. The timestamp value itself does not store a specific time zone. A string-formatted timestamp may include a time zone. When a time zone is not explicitly specified, the default time zone, UTC, is used.

 

Time Zones —

Time zones are represented by strings in one of these two canonical formats:

  • Offset from Coordinated Universal Time (UTC), or the letter Z for UTC
  • Time zone name from thetz database
  •  

    Offset from Coordinated Universal Time (UTC) —

    Offset Format
    (+|-)H[H][:M[M]] Z

    Examples
    -08:00
    -8:15
    +3:00
    +07:30
    -7
    Z

    When using this format, no space is allowed between the time zone and the rest of the timestamp.

    2014-09-27 12:30:00.45-8:00
    2014-09-27T12:30:00.45Z

 

Time Zone Name —

Time zone names are from thetz database. For a less comprehensive but simpler reference, see theList of tz database time zones on Wikipedia.

Format
continent/[region/]city

Examples
America/Los_Angeles
America/Argentina/Buenos_Aires

 

When using a time zone name, a space is required between the name and the rest of the timestamp:

2014-09-27 12:30:00.45 America/Los_Angeles
Note that not all time zone names are interchangeable even if they do happen to report the same time during a given part of the year. For example, America/Los_Angeles reports the same time as UTC-7:00 during Daylight Savings Time, but reports the same time as UTC-8:00 outside of Daylight Savings Time.

If a time zone is not specified, the default time zone value is used.

 

Leap Seconds —

A timestamp is simply an offset from 1970-01-01 00:00:00 UTC, assuming there are exactly 60 seconds per minute. Leap seconds are not represented as part of a stored timestamp.

If your input contains values that use “:60” in the seconds field to represent a leap second, that leap second is not preserved when converting to a timestamp value. Instead that value is interpreted as a timestamp with “:00” in the seconds field of the following minute.

Leap seconds do not affect timestamp computations. All timestamp computations are done using Unix-style timestamps, which do not reflect leap seconds. Leap seconds are only observable through functions that measure real-world time. In these functions, it is possible for a timestamp second to be skipped or repeated when there is a leap second.


Array Type:

NameDescription
ARRAYOrdered list of zero or more elements of any non-ARRAY type.

An ARRAY is an ordered list of zero or more elements of non-ARRAY values. ARRAYs of ARRAYs are not allowed. Queries that would produce an ARRAY of ARRAYs will return an error. Instead a STRUCT must be inserted between the ARRAYs using the SELECT AS STRUCT construct.

Currently, Kochava Query has two following limitations with respect to NULLs and ARRAYs:

  • Kochava Query raises an error if query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
  • Kochava Query translates

 

Declaring an Array Type —

ARRAY types are declared using the angle brackets (< and >). The type of the elements of an ARRAY can be arbitrarily complex with the exception that an ARRAY cannot directly contain another ARRAY.

Format
ARRAY

Examples

Type DeclarationMeaning
ARRAYSimple ARRAY of 64-bit integers.
ARRAY>An ARRAY of STRUCTs, each of which contains two 64-bit integers.
ARRAY>
(not supported)
This is an invalid type declaration which is included here just in case you came looking for how to create a multi-level ARRAY. ARRAYs cannot contain ARRAYs directly. Instead see the next example.
ARRAY>>An ARRAY of ARRAYS of 64-bit integers. Notice that there is a STRUCT between the two ARRAYs because ARRAYs cannot hold other ARRAYs directly.

Struct Type:

NameDescription
STRUCTContainer of ordered fields each with a type (required) and field name (optional).

 

Declaring a STRUCT Type —

STRUCT types are declared using the angle brackets (< and >). The type of the elements of a STRUCT can be arbitrarily complex.

Format
STRUCT

Examples

Type DeclarationMeaning
STRUCTSimple STRUCT with a single unnamed 64-bit integer field.
STRUCT>A STRUCT with a nested STRUCT named x inside it. The STRUCT x has two fields, y and z, both of which are 64-bit integers.
STRUCT>A STRUCT containing an ARRAY named inner_array that holds 64-bit integer elements.

 

Constructing a STRUCT —

STRUCT types are declared using the angle brackets (< and >). The type of the elements of a STRUCT can be arbitrarily complex.

Format
STRUCT

Examples

Type DeclarationMeaning
STRUCTSimple STRUCT with a single unnamed 64-bit integer field.
STRUCT>A STRUCT with a nested STRUCT named x<>/strong> inside it. The STRUCT x has two fields, y and z, both of which are 64-bit integers.
STRUCT>A STRUCT containing an ARRAY named inner_array that holds 64-bit integer elements.

 

Constructing a STRUCT —

Tuple Syntax

Format
(expr1, expr2 [, … ])

The output type is an anonymous STRUCT type with anonymous fields with types matching the types of the input expressions. There must be at least two expressions specified. Otherwise this syntax is indistinguishable from an expression wrapped with parentheses.

Examples

SyntaxOutput TypeNotes
(x, x+y)STRUCTIf column names are used (unquoted strings), the STRUCT field data type is derived from the column data type. x and y are columns, so the data types of the STRUCT fields are derived from the column types and the output type of the addition operator.

This syntax can also be used with STRUCT comparison for comparison expressions using multi-part keys, e.g. in a WHERE clause:

WHERE (Key1,Key2) IN ( (12,34), (56,78) )

Typeless Struct Syntax

Format

STRUCT( expr1 [AS field_name] [, … ])

Duplicate field names are allowed. Fields without names are considered anonymous fields and cannot be referenced by name. STRUCT values can be NULL, or can have NULL field values.

Examples

SyntaxOutput Type
STRUCT(1,2,3)STRUCT
STRUCT()STRUCT<>
STRUCT(‘abc’)STRUCT
STRUCT(1, t.str_col)STRUCT
STRUCT(1 AS a, ‘abc’ AS b)STRUCT
STRUCT(str_col AS abc)STRUCT

Limited Comparisons for STRUCT:

STRUCTs can be directly compared using equality operators:

  • Equal (=)
  • Not Equal (!= or <>)
  • [NOT] IN

Notice, though, that these direct equality comparisons compare the fields of the STRUCT pairwise in ordinal order ignoring any field names. If instead you want to compare identically named fields of a STRUCT, you can compare the individual fields directly.

 
 

Last Modified: Dec 28, 2023 at 11:46 am