DataFusion contributes ~35 string functions plus a unicode submodule and a regex submodule. SQE adds Trino-name aliases and a few extras.
Function Origin Notes Trino Snowflake Spark SQL DuckDB
concat(a, b, ...)datafusion-builtinVariadic. NULL inputs become empty string. yes yes yes yes
concat_ws(sep, a, b, ...)datafusion-builtinConcat with separator; skips NULL args. yes yes yes yes
a || bdatafusion-builtinSQL standard concat. NULL propagates. yes yes yes yes
Function Origin Notes Trino Snowflake Spark SQL DuckDB
length(s) / char_length(s) / character_length(s)datafusion-builtinNumber of characters. UTF-8-aware. yes yes yes yes
octet_length(s)datafusion-builtinNumber of bytes. yes yes yes yes
bit_length(s)datafusion-builtinoctet_length * 8.yes yes yes yes
position(needle in haystack) / strpos(haystack, needle)datafusion-builtin1-based offset of needle. 0 if not found. yes yes yes yes
find_in_set(needle, comma_list)datafusion-builtin1-based offset in a comma-separated list. - - yes -
codepoint(s)sqe-trino-functionsUnicode codepoint of the first character. trino_functions.rs:101 yes - - -
chr(n) / char(n)datafusion-builtinCodepoint -> character. yes yes yes yes
ascii(s)datafusion-builtinCodepoint of the first character. yes yes yes yes
Function Origin Notes Trino Snowflake Spark SQL DuckDB
substring(s from start [for length])datafusion-builtinSQL standard. 1-based. yes yes yes yes
substr(s, start [, length])datafusion-builtinFunction form. yes yes yes yes
left(s, n)datafusion-builtinLeftmost n chars. yes yes yes yes
right(s, n)datafusion-builtinRightmost n chars. yes yes yes yes
split_part(s, delim, n)datafusion-builtinNth part after splitting by delim. yes yes yes yes
split(s, delim)sqe-trino-functionsReturns array of parts. trino_functions.rs:118 yes yes yes yes
split_part(s, delim, n)datafusion-builtinSingle part by index. yes yes yes yes
Function Origin Notes Trino Snowflake Spark SQL DuckDB
trim(s) / trim(both ' ' from s)datafusion-builtinDefault trims whitespace. yes yes yes yes
ltrim(s [, chars])datafusion-builtinTrim from left. yes yes yes yes
rtrim(s [, chars])datafusion-builtinTrim from right. yes yes yes yes
btrim(s [, chars])datafusion-builtinTrim both sides; explicit alias of trim. - yes yes yes
Function Origin Notes Trino Snowflake Spark SQL DuckDB
lower(s) / upper(s)datafusion-builtinCase conversion. yes yes yes yes
initcap(s)datafusion-builtinTitle-case each word. yes yes yes yes
lpad(s, n [, fill])datafusion-builtinPad on left to length n. yes yes yes yes
rpad(s, n [, fill])datafusion-builtinPad on right. yes yes yes yes
repeat(s, n)datafusion-builtinRepeat n times. yes yes yes yes
reverse(s)datafusion-builtinReverse the string. yes yes yes yes
replace(s, from, to)datafusion-builtinReplace all occurrences. yes yes yes yes
translate(s, from, to)datafusion-builtinPer-character substitution. yes yes yes yes
overlay(s placing rep from start [for length])datafusion-builtinSplice substring. yes yes yes yes
format(pattern, args...)sqe-trino-functions (ext)C-style printf. trino_functions_ext.rs:79 yes - yes -
Function Origin Notes Trino Snowflake Spark SQL DuckDB
starts_with(s, prefix)datafusion-builtinTrue if s begins with prefix. yes yes yes yes
ends_with(s, suffix)datafusion-builtinTrue if s ends with suffix. yes yes yes yes
contains(s, sub)datafusion-builtinTrue if s contains sub. - yes yes yes
s LIKE patterndatafusion-builtinSQL standard pattern. _ and %. yes yes yes yes
s ILIKE patterndatafusion-builtinCase-insensitive LIKE. yes yes partial yes
s SIMILAR TO patterndatafusion-builtinSQL/POSIX-light regex. yes yes - yes
Function Origin Notes Trino Snowflake Spark SQL DuckDB
regexp_like(s, pattern)datafusion-builtinTrue if pattern matches anywhere. yes yes yes yes
regexp_match(s, pattern)datafusion-builtinReturns array of capture groups. yes yes yes yes
regexp_replace(s, pattern, repl [, flags])datafusion-builtinReplace matches. Flags g (global), i (insensitive). yes yes yes yes
regexp_count(s, pattern)datafusion-builtinCount of non-overlapping matches. yes yes yes yes
regexp_extract(s, pattern [, group])sqe-trino-functions (ext)Extract first match (or capture group N). trino_functions_ext.rs:46 yes yes yes yes
regexp_extract_all(s, pattern [, group])sqe-trino-functions (ext)All matches as array. trino_functions_ext.rs:47 yes - partial -
regexp_split(s, pattern)sqe-trino-functions (ext)Split by regex. trino_functions_ext.rs:48 yes - - -
Function Origin Notes Trino Snowflake Spark SQL DuckDB
md5(s)datafusion-builtin128-bit hex. yes yes yes yes
sha224(s)datafusion-builtinSHA-224 hex. yes - yes -
sha256(s)datafusion-builtin + sqe-policySHA-256 hex. SQE registers an alias used by column masks. yes yes yes yes
sha384(s)datafusion-builtinSHA-384 hex. yes - yes -
sha512(s)datafusion-builtinSHA-512 hex. yes yes yes yes
digest(s, algorithm)datafusion-builtinGeneric digest. Algos: md5, sha224, sha256, sha384, sha512, blake2s, blake2b, blake3. - - - -
Function Origin Notes Trino Snowflake Spark SQL DuckDB
levenshtein(a, b)datafusion-builtinEdit distance. yes yes yes yes
hamming_distance(a, b)sqe-trino-functions (ext)Hamming distance for equal-length strings. trino_functions_ext.rs:35 yes - - -
soundex(s)sqe-trino-functions (ext)Soundex code. trino_functions_ext.rs:34 yes yes yes -
word_stem(s [, lang])sqe-trino-functions (ext)Stemmer. Default English. trino_functions_ext.rs:61 yes - - -
See Encoding, hashing, URL for from_base64, to_base64, from_hex, to_hex, encode, decode.
Function Origin Notes Trino Snowflake Spark SQL DuckDB
normalize(s [, form])sqe-trino-functions (ext)Unicode normalization. Forms: NFC (default), NFD, NFKC, NFKD. trino_functions_ext.rs:49 yes - - -
from_utf8(bytes)sqe-trino-functionsConvert binary to UTF-8 string. trino_functions.rs:85 yes - - -
to_utf8(s)sqe-trino-functionsConvert string to UTF-8 binary. trino_functions.rs:86 yes - - -
SELECT
initcap(trim(lower(name))) AS name_clean,
regexp_replace(email, '\\s+', '') AS email_clean,
coalesce(nullif(trim(comment), ''), 'no comment') AS comment_clean
FROM users;
SELECT
url,
regexp_extract(url, 'https?://([^/]+)', 1) AS host_via_regex,
url_extract_host(url) AS host_via_helper
FROM access_logs;
url_extract_host is dramatically faster than the regex version. Prefer it whenever the input is well-formed URLs.
SELECT id, token
FROM articles, UNNEST(split(body, ' ')) AS t(token)
WHERE length(token) > 3;
SELECT
user_id,
sha256(email) AS email_hash,
concat('***', right(phone, 4)) AS phone_masked
FROM users;
(For declarative masking via grants, see GRANT and REVOKE .)
SOUNDEX_DIFFERENCE (T-SQL). use levenshtein(soundex(a), soundex(b)).
PARSE_URL family with named parts . Use the url_extract_* family in Encoding, hashing, URL .
STRING_SPLIT_TO_ARRAY . use split(s, delim).