Functions used in GROUP BY queries and OVER clauses. SQE inherits ~40 aggregates from DataFusion plus 12 Trino UDAFs from sqe-trino-functions.
Function Origin Notes Trino Snowflake Spark SQL DuckDB
count(expr)datafusion-builtinCounts non-NULL rows. countcountcountcount
count(*)datafusion-builtinCounts all rows. countcountcountcount
count(distinct expr)datafusion-builtinDistinct non-NULL count. count(distinct ...)count(distinct ...)count(distinct ...)count(distinct ...)
sum(expr)datafusion-builtinSum. NULL-skipping. sumsumsumsum
sum(distinct expr)datafusion-builtinDistinct sum. sum(distinct ...)sum(distinct ...)sum(distinct ...)sum(distinct ...)
avg(expr) / mean(expr)datafusion-builtinArithmetic mean. NULL-skipping. avgavgavg / meanavg / mean
min(expr)datafusion-builtinMinimum. minminminmin
max(expr)datafusion-builtinMaximum. maxmaxmaxmax
median(expr)datafusion-builtinExact median. Slower than approx_median on big inputs. - medianmedianmedian
Function Origin Notes Trino Snowflake Spark SQL DuckDB
variance(x) / var_samp(x)datafusion-builtinSample variance. variance / var_sampvariance_sampvariance / var_sampvariance / var_samp
var_pop(x)datafusion-builtinPopulation variance. var_popvariance_popvar_popvar_pop
stddev(x) / stddev_samp(x)datafusion-builtinSample stddev. stddev / stddev_sampstddev_sampstddev / stddev_sampstddev
stddev_pop(x)datafusion-builtinPopulation stddev. stddev_popstddev_popstddev_popstddev_pop
corr(y, x)datafusion-builtinPearson correlation. corrcorrcorrcorr
covar_samp(y, x) / covar_pop(y, x)datafusion-builtinSample / population covariance. covar_samp / covar_popcovar_samp / covar_popcovar_samp / covar_popcovar_samp / covar_pop
regr_slope(y, x)datafusion-builtinLinear regression slope. regr_sloperegr_slope- regr_slope
regr_intercept(y, x)datafusion-builtiny-intercept. regr_interceptregr_intercept- regr_intercept
regr_r2(y, x)datafusion-builtinR-squared. regr_r2regr_r2- regr_r2
regr_count, regr_sxx, regr_syy, regr_sxy, regr_avgx, regr_avgydatafusion-builtinRegression sums and counts. yes yes - yes
Function Origin Notes Trino Snowflake Spark SQL DuckDB
approx_distinct(expr)datafusion-builtinHyperLogLog distinct count. ~1% error. approx_distinctapprox_count_distinctapprox_count_distinctapprox_count_distinct
approx_median(expr)datafusion-builtinMedian estimate via t-digest. - - approx_percentile(0.5)approx_quantile(0.5)
approx_percentile_cont(expr, p)datafusion-builtinPercentile estimate via t-digest. p in [0, 1]. approx_percentileapprox_percentileapprox_percentileapprox_quantile
approx_percentile(expr, p)sqe-trino-functionsTrino-named alias of approx_percentile_cont. trino_functions.rs:164 approx_percentile- - -
Function Origin Notes Trino Snowflake Spark SQL DuckDB
bool_and(x)datafusion-builtinTrue only if every row is true. bool_andbooland_aggbool_and / everybool_and
bool_or(x)datafusion-builtinTrue if any row is true. bool_orboolor_aggbool_or / anybool_or
every(x)sqe-trino-functionsTrino-named alias of bool_and. trino_functions.rs:170 every- every-
bit_and(x)datafusion-builtinBitwise AND of all values. bitwise_and_aggbitand_aggbit_and-
bit_or(x)datafusion-builtinBitwise OR. bitwise_or_aggbitor_aggbit_or-
bit_xor(x)datafusion-builtinBitwise XOR. bitwise_xor_aggbitxor_aggbit_xor-
bitwise_and_agg(x)sqe-trino-functionsTrino name for bit_and. trino_functions.rs:144 bitwise_and_agg- - -
bitwise_or_agg(x)sqe-trino-functionsTrino name for bit_or. trino_functions.rs:150 bitwise_or_agg- - -
bitwise_xor_agg(x)sqe-trino-functionsTrino name for bit_xor. trino_functions.rs:158 bitwise_xor_agg- - -
Function Origin Notes Trino Snowflake Spark SQL DuckDB
first_value(expr [order by ...])datafusion-builtinFirst row’s value. Most useful with OVER ordering. first_valuefirst_valuefirst_valuefirst_value
last_value(expr [order by ...])datafusion-builtinLast row’s value. last_valuelast_valuelast_valuelast_value
nth_value(expr, n [order by ...])datafusion-builtinNth row’s value. nth_valuenth_valuenth_valuenth_value
max_by(value, key)sqe-trino-functionsvalue from the row with the max key. trino_functions.rs:177max_bymax_by- arg_max
min_by(value, key)sqe-trino-functionsvalue from the row with the min key. trino_functions.rs:178min_bymin_by- arg_min
arbitrary(expr)sqe-trino-functions (ext)Any one non-NULL value. Trino-named alias of any_value. trino_functions_ext.rs:68 arbitraryany_valueany_valueany_value
Function Origin Notes Trino Snowflake Spark SQL DuckDB
array_agg(expr)datafusion-builtinCollect into an array. NULLs included. array_aggarray_aggcollect_listarray_agg / list
array_agg(distinct expr)datafusion-builtinDistinct array. array_agg(distinct)array_agg(distinct)collect_setlist_distinct
string_agg(expr, sep)datafusion-builtinConcatenate with separator. SQL standard. array_join/listagglistagg- string_agg
listagg(expr, sep)sqe-trino-functionsSame as string_agg; Snowflake / Trino name. trino_functions.rs:138 listagglistagg- -
histogram(expr)sqe-trino-functionsMap of value -> count. trino_functions.rs:188 histogram- - histogram
map_agg(key, value)sqe-trino-functionsBuild a map by aggregating key-value pairs. Last write wins. trino_functions.rs:189 map_aggobject_aggmap_from_arraysmap
multimap_agg(key, value)sqe-trino-functionsMap where each value is an array (collects duplicates). trino_functions.rs:190 multimap_agg- - -
map_union(map_col)sqe-trino-functionsAggregate already-built maps into one. trino_functions.rs:191 map_union- - -
Modifier Notes
agg_func(expr) FILTER (WHERE pred)Filter rows before aggregation. Cleaner than agg_func(CASE WHEN pred THEN expr END).
agg_func(distinct expr)Distinct values only.
agg_func(expr) OVER (...)Window form. Uses PARTITION BY, ORDER BY, frame clauses. See Window functions .
agg_func(expr) WITHIN GROUP (ORDER BY ...)Ordered aggregate (e.g. listagg).
Example using FILTER:
SELECT
region,
count(*) AS total_orders,
count(*) FILTER (WHERE status = 'cancelled') AS cancelled,
sum(amount) FILTER (WHERE status = 'shipped') AS shipped_revenue
FROM orders
GROUP BY region;
Construct Origin Notes Trino Snowflake Spark SQL DuckDB
GROUP BY GROUPING SETS ((a, b), (a), ())datafusion-builtinMultiple grouping levels in one query. yes yes yes yes
GROUP BY CUBE (a, b, c)datafusion-builtinAll 2^N grouping combinations. yes yes yes yes
GROUP BY ROLLUP (a, b, c)datafusion-builtinHierarchical: (), (a), (a, b), (a, b, c). yes yes yes yes
GROUPING(col)datafusion-builtinReturns 1 if col was rolled up in this row, else 0. yes yes yes yes
SELECT
region,
product,
sum(amount) AS revenue,
GROUPING(region) AS region_rolled_up,
GROUPING(product) AS product_rolled_up
FROM orders
GROUP BY ROLLUP (region, product)
ORDER BY region, product;
count(distinct) exact . sub-second on millions of rows; avoid above ~1B distinct values.
approx_distinct HyperLogLog . order of magnitude faster on huge inputs. ~1% relative error.
median exact . sorts the entire group; expensive on big partitions.
approx_median / approx_percentile_cont t-digest . sub-percent error, much cheaper memory profile.
For dashboards over multi-billion-row tables, default to approximations. For audit queries that need exact counts, default to exact.