Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Array, map, struct

DataFusion’s datafusion-functions-nested crate ships ~40 array and map helpers. SQE adds Trino-named aggregate constructors (map_agg, histogram, multimap_agg).

Array construction

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
[1, 2, 3] (literal)datafusion-builtinElement type from common supertype.yesyesyesyes
make_array(a, b, ...)datafusion-builtinFunction form of literal.-yesyesyes
array(...)datafusion-builtinAlias for make_array.yesyesyesyes
range(start, stop)datafusion-builtinHalf-open integer array.---yes
range(start, stop, step)datafusion-builtinWith step.---yes
array_repeat(elem, n)datafusion-builtinArray of n copies.yes--yes

Array inspection

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
array_length(a) / cardinality(a)datafusion-builtinNumber of elements.yesyesyesyes
array_dims(a)datafusion-builtinArray of per-dimension sizes (for nested arrays).---yes
array_ndims(a)datafusion-builtinNesting depth.---yes
array_position(a, elem)datafusion-builtin1-based offset of first match; 0 if missing.yesyesyesyes
array_positions(a, elem)datafusion-builtinArray of all matching offsets.---yes
array_contains(a, elem) / array_has(a, elem)datafusion-builtinBoolean membership.yesyesyesyes
array_has_all(a, sub)datafusion-builtinAll of sub are in a.---yes
array_has_any(a, sub)datafusion-builtinAny of sub is in a.yes--yes

Array transformation

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
array_append(a, elem)datafusion-builtinAdd to end.yesyesyesyes
array_prepend(elem, a)datafusion-builtinAdd to start.yes-yesyes
array_concat(a1, a2, ...)datafusion-builtinVariadic concat.yesyesyesyes
array_remove(a, elem)datafusion-builtinRemove first occurrence.---yes
array_remove_all(a, elem)datafusion-builtinRemove all occurrences.yes--yes
array_replace(a, from, to)datafusion-builtinReplace first match.yes-yesyes
array_replace_all(a, from, to)datafusion-builtinReplace all matches.---yes
array_reverse(a)datafusion-builtinReverse order.yesyesyesyes
array_sort(a)datafusion-builtinAscending sort. NULLs last.yesyesyesyes
array_distinct(a)datafusion-builtinDeduplicate. Preserves first occurrence.yesyesyesyes
array_slice(a, start, end)datafusion-builtin1-based, inclusive. Negative indexes count from end.yesyesyesyes
array_pop_front(a) / array_pop_back(a)datafusion-builtinRemove first / last.yes--yes
array_resize(a, n [, fill])datafusion-builtinTruncate or pad to length n.---yes
array_flatten(a) / flatten(a)datafusion-builtinOne level of flattening.yesyesyesyes

Array set operations

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
array_intersect(a, b)datafusion-builtinCommon elements (set-style).yesyesyesyes
array_union(a, b)datafusion-builtinDistinct combination.yesyesyesyes
array_except(a, b)datafusion-builtinIn a but not in b.yes-yesyes

Array reductions

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
array_min(a)datafusion-builtinMinimum element.yesyesyesyes
array_max(a)datafusion-builtinMaximum.yesyesyesyes
array_sum(a)datafusion-builtinSum of numeric elements.yes--yes
array_mean(a)datafusion-builtinAverage.----
array_any_value(a)datafusion-builtinFirst non-NULL element.----

Array unnesting (lateral)

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
unnest(a)datafusion-builtinOne row per element. Used in FROM.yesyesyes (explode)yes
unnest(a) WITH ORDINALITYdatafusion-builtinAdds 1-based offset column.yes---
-- One row per (order, item) pair
SELECT order_id, item
FROM orders, UNNEST(items) AS t(item);

-- Numbered
SELECT order_id, item, idx
FROM orders, UNNEST(items) WITH ORDINALITY AS t(item, idx);

Map functions

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
map(keys_array, values_array)datafusion-builtinBuild a map from two parallel arrays.yes-yes (map_from_arrays)yes
map_keys(m)datafusion-builtinArray of keys.yesyesyesyes
map_values(m)datafusion-builtinArray of values.yesyesyesyes
map_extract(m, key)datafusion-builtinLookup. NULL if missing. Also accessible via m[key].yes (element_at)yes (get)yes (element_at)yes (element_at)
cardinality(m)datafusion-builtinNumber of keys.yesyesyesyes
m['key']datafusion-builtinSubscript syntax for map lookup.yesyesyesyes

Aggregates that build maps / arrays

See Aggregate functions for array_agg, map_agg, histogram, multimap_agg, map_union. The names differ slightly across engines:

SQETrinoSnowflakeSpark SQLDuckDB
array_agg(x)array_aggarray_aggcollect_listarray_agg / list
map_agg(k, v)map_aggobject_aggmap_from_arraysmap
histogram(x)histogram--histogram
multimap_agg(k, v)multimap_agg---

Struct / row

ConstructOriginNotesTrinoSnowflakeSpark SQLDuckDB
struct(a, b, ...)datafusion-builtinAnonymous record.-yes (object_construct)yesyes
named_struct('a', x, 'b', y)datafusion-builtinNamed-field record.yes (row(...))yes (object_construct)yesyes
s.fielddatafusion-builtinField access.yesyesyesyes
(a, b, ...) (row literal)datafusion-builtinAnonymous tuple.yes-yesyes
SELECT named_struct('host', host, 'port', port) AS endpoint
FROM servers;

SELECT endpoint.host, endpoint.port FROM ...;

Examples

Tag-set membership

-- Find products with both 'sale' and 'new' tags
SELECT * FROM products
WHERE array_has_all(tags, ARRAY['sale', 'new']);

-- Find products with any of the listed tags
SELECT * FROM products
WHERE array_has_any(tags, ARRAY['sale', 'clearance']);

Top-K frequencies via histogram

SELECT k, v
FROM events, UNNEST(map_keys(histogram(event_type)), map_values(histogram(event_type))) AS t(k, v)
ORDER BY v DESC
LIMIT 10;

Build a map from joined tables

SELECT
    user_id,
    map_agg(setting_key, setting_value) AS preferences
FROM user_settings
GROUP BY user_id;

map_agg errors on duplicate keys. For multimap-style behaviour use multimap_agg.

Lateral pattern: filter then unnest

SELECT order_id, tag
FROM orders, UNNEST(tags) AS t(tag)
WHERE order_id > 100 AND tag LIKE 'priority_%';

Lambda functions

DataFusion’s parser does not support lambda syntax (x -> x + 1). Trino, Spark, DuckDB do. The audit rows in features.md note this. Workarounds:

  • Pre-compute via a CTE plus unnest.
  • Use map_filter / transform from datafusion-functions-nested once parser support lands upstream.

What is NOT registered

  • zip(a, b) (parallel-iterate two arrays). Use unnest against an indexed pair instead.
  • reduce(a, init, lambda, finish). Aggregate within a CTE instead.
  • Snowflake flatten table function (with PATH and OUTER options). Use UNNEST directly.

These are tracked but blocked on lambda support in DataFusion.