Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

String functions

DataFusion contributes ~35 string functions plus a unicode submodule and a regex submodule. SQE adds Trino-name aliases and a few extras.

Concatenation

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
concat(a, b, ...)datafusion-builtinVariadic. NULL inputs become empty string.yesyesyesyes
concat_ws(sep, a, b, ...)datafusion-builtinConcat with separator; skips NULL args.yesyesyesyes
a || bdatafusion-builtinSQL standard concat. NULL propagates.yesyesyesyes

Length and offsets

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
length(s) / char_length(s) / character_length(s)datafusion-builtinNumber of characters. UTF-8-aware.yesyesyesyes
octet_length(s)datafusion-builtinNumber of bytes.yesyesyesyes
bit_length(s)datafusion-builtinoctet_length * 8.yesyesyesyes
position(needle in haystack) / strpos(haystack, needle)datafusion-builtin1-based offset of needle. 0 if not found.yesyesyesyes
find_in_set(needle, comma_list)datafusion-builtin1-based offset in a comma-separated list.--yes-
codepoint(s)sqe-trino-functionsUnicode codepoint of the first character. trino_functions.rs:101yes---
chr(n) / char(n)datafusion-builtinCodepoint -> character.yesyesyesyes
ascii(s)datafusion-builtinCodepoint of the first character.yesyesyesyes

Slicing

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
substring(s from start [for length])datafusion-builtinSQL standard. 1-based.yesyesyesyes
substr(s, start [, length])datafusion-builtinFunction form.yesyesyesyes
left(s, n)datafusion-builtinLeftmost n chars.yesyesyesyes
right(s, n)datafusion-builtinRightmost n chars.yesyesyesyes
split_part(s, delim, n)datafusion-builtinNth part after splitting by delim.yesyesyesyes
split(s, delim)sqe-trino-functionsReturns array of parts. trino_functions.rs:118yesyesyesyes
split_part(s, delim, n)datafusion-builtinSingle part by index.yesyesyesyes

Trimming

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
trim(s) / trim(both ' ' from s)datafusion-builtinDefault trims whitespace.yesyesyesyes
ltrim(s [, chars])datafusion-builtinTrim from left.yesyesyesyes
rtrim(s [, chars])datafusion-builtinTrim from right.yesyesyesyes
btrim(s [, chars])datafusion-builtinTrim both sides; explicit alias of trim.-yesyesyes

Case and padding

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
lower(s) / upper(s)datafusion-builtinCase conversion.yesyesyesyes
initcap(s)datafusion-builtinTitle-case each word.yesyesyesyes
lpad(s, n [, fill])datafusion-builtinPad on left to length n.yesyesyesyes
rpad(s, n [, fill])datafusion-builtinPad on right.yesyesyesyes
repeat(s, n)datafusion-builtinRepeat n times.yesyesyesyes
reverse(s)datafusion-builtinReverse the string.yesyesyesyes
replace(s, from, to)datafusion-builtinReplace all occurrences.yesyesyesyes
translate(s, from, to)datafusion-builtinPer-character substitution.yesyesyesyes
overlay(s placing rep from start [for length])datafusion-builtinSplice substring.yesyesyesyes
format(pattern, args...)sqe-trino-functions (ext)C-style printf. trino_functions_ext.rs:79yes-yes-

Predicates

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
starts_with(s, prefix)datafusion-builtinTrue if s begins with prefix.yesyesyesyes
ends_with(s, suffix)datafusion-builtinTrue if s ends with suffix.yesyesyesyes
contains(s, sub)datafusion-builtinTrue if s contains sub.-yesyesyes
s LIKE patterndatafusion-builtinSQL standard pattern. _ and %.yesyesyesyes
s ILIKE patterndatafusion-builtinCase-insensitive LIKE.yesyespartialyes
s SIMILAR TO patterndatafusion-builtinSQL/POSIX-light regex.yesyes-yes

Regular expressions

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
regexp_like(s, pattern)datafusion-builtinTrue if pattern matches anywhere.yesyesyesyes
regexp_match(s, pattern)datafusion-builtinReturns array of capture groups.yesyesyesyes
regexp_replace(s, pattern, repl [, flags])datafusion-builtinReplace matches. Flags g (global), i (insensitive).yesyesyesyes
regexp_count(s, pattern)datafusion-builtinCount of non-overlapping matches.yesyesyesyes
regexp_extract(s, pattern [, group])sqe-trino-functions (ext)Extract first match (or capture group N). trino_functions_ext.rs:46yesyesyesyes
regexp_extract_all(s, pattern [, group])sqe-trino-functions (ext)All matches as array. trino_functions_ext.rs:47yes-partial-
regexp_split(s, pattern)sqe-trino-functions (ext)Split by regex. trino_functions_ext.rs:48yes---

Hashing

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
md5(s)datafusion-builtin128-bit hex.yesyesyesyes
sha224(s)datafusion-builtinSHA-224 hex.yes-yes-
sha256(s)datafusion-builtin + sqe-policySHA-256 hex. SQE registers an alias used by column masks.yesyesyesyes
sha384(s)datafusion-builtinSHA-384 hex.yes-yes-
sha512(s)datafusion-builtinSHA-512 hex.yesyesyesyes
digest(s, algorithm)datafusion-builtinGeneric digest. Algos: md5, sha224, sha256, sha384, sha512, blake2s, blake2b, blake3.----

Distance / phonetic

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
levenshtein(a, b)datafusion-builtinEdit distance.yesyesyesyes
hamming_distance(a, b)sqe-trino-functions (ext)Hamming distance for equal-length strings. trino_functions_ext.rs:35yes---
soundex(s)sqe-trino-functions (ext)Soundex code. trino_functions_ext.rs:34yesyesyes-
word_stem(s [, lang])sqe-trino-functions (ext)Stemmer. Default English. trino_functions_ext.rs:61yes---

Encoding

See Encoding, hashing, URL for from_base64, to_base64, from_hex, to_hex, encode, decode.

Unicode normalization

FunctionOriginNotesTrinoSnowflakeSpark SQLDuckDB
normalize(s [, form])sqe-trino-functions (ext)Unicode normalization. Forms: NFC (default), NFD, NFKC, NFKD. trino_functions_ext.rs:49yes---
from_utf8(bytes)sqe-trino-functionsConvert binary to UTF-8 string. trino_functions.rs:85yes---
to_utf8(s)sqe-trino-functionsConvert string to UTF-8 binary. trino_functions.rs:86yes---

Examples

Cleanse user input

SELECT
    initcap(trim(lower(name))) AS name_clean,
    regexp_replace(email, '\\s+', '') AS email_clean,
    coalesce(nullif(trim(comment), ''), 'no comment') AS comment_clean
FROM users;

Extract domain from URL

SELECT
    url,
    regexp_extract(url, 'https?://([^/]+)', 1) AS host_via_regex,
    url_extract_host(url) AS host_via_helper
FROM access_logs;

url_extract_host is dramatically faster than the regex version. Prefer it whenever the input is well-formed URLs.

Tokenise a sentence

SELECT id, token
FROM articles, UNNEST(split(body, ' ')) AS t(token)
WHERE length(token) > 3;

Mask sensitive data

SELECT
    user_id,
    sha256(email) AS email_hash,
    concat('***', right(phone, 4)) AS phone_masked
FROM users;

(For declarative masking via grants, see GRANT and REVOKE.)

What is NOT supported

  • SOUNDEX_DIFFERENCE (T-SQL). use levenshtein(soundex(a), soundex(b)).
  • PARSE_URL family with named parts. Use the url_extract_* family in Encoding, hashing, URL.
  • STRING_SPLIT_TO_ARRAY. use split(s, delim).