All of the regular expression functions use the Java pattern syntax, with a few notable exceptions:
(?m)
flag), only \n
is recognized as a line terminator. Additionally, the (?d)
flag is
not supported and must not be used.(?i)
flag) is always
performed in a Unicode-aware manner. However, context-sensitive and
local-sensitive matching is not supported. Additionally, the (?u)
flag is not supported and must not be used.\uD800\uDC00
is
not treated as U+10000
and must be specified as \x{10000}
.\b
) are incorrectly handled for a non-spacing mark
without a base character.\Q
and \E
are not supported in character classes (such as
[A-Z123]
) and are instead treated as literals.\p{prop}
) are supported with the
following differences:
OldItalic
instead of Old_Italic
.Is
, script=
or sc=
prefixes. Example: \p{Hiragana}
In
prefix. The block=
and
blk=
prefixes are not supported. Example: \p{Mongolian}
Is
,
general_category=
or gc=
prefixes. Example: \p{L}
Is
.
Example: \p{NoncharacterCodePoint}
regexp_count(string, pattern)
→ bigint
Returns the number of occurrence of pattern
in string
:
regexp_extract_all(string, pattern)
→ array(varchar)
Returns the substring(s) matched by the regular expression pattern
in
string
:
regexp_extract_all(string, pattern, group)
→ array(varchar)
Finds all occurrences of the regular expression pattern
in string
and returns the capturing group number group
:
regexp_extract(string, pattern)
→ varchar
Returns the f substring matched by the regular expression pattern
in string
:
regexp_extract(string, pattern, group)
→ varchar
Finds the f occurrence of the regular expression pattern
in
string
and returns the capturing group number group
:
regexp_like(string, pattern)
→ boolean
Evaluates the regular expression pattern
and determines if it is
contained within string
.
The pattern
only needs to be contained within string
, rather than
needing to match all of string
. In other words, this performs a
contains operation rather than a match operation. You can match the
entire string by anchoring the pattern using ^
and $
:
regexp_position(string, pattern)
→ integer
Returns the index of the f occurrence (counting from 1) of pattern
in string
. Returns -1 if not found:
regexp_position(string, pattern, start)
→ integer
Returns the index of the f occurrence of pattern
in string
,
starting from start
(include start
). Returns -1 if not found:
regexp_position(string, pattern, start, occurrence)
→ integer
Returns the index of the nth occurrence
of pattern
in string
,
starting from start
(include start
). Returns -1 if not found:
regexp_replace(string, pattern)
→ varchar
Removes every instance of the substring matched by the regular
expression pattern
from string
:
SELECT regexp_replace(‘1a 2b 14m’, ‘\d+[ab] ’); — ‘14m’
regexp_replace(string, pattern, replacement)
→ varchar
Replaces every instance of the substring matched by the regular
expression pattern
in string
with replacement
. Capturing
groups can be referenced in replacement
using $g
for a numbered
group or ${name}
for a named group. A dollar sign ($
) may be
included in the replacement by escaping it with a backslash (\$
):
regexp_replace(string, pattern, function)
→ varchar
Replaces every instance of the substring matched by the regular
expression pattern
in string
using function
. The
lambda expression <lambda>
function is
invoked for each match with the capturing groups passed as an array.
Capturing group numbers start at one; there is no group for the entire
match (if you need this, surround the entire expression with
parenthesis). :
SELECT regexp_replace(‘new york’, ’(\w)(\w*)’, x -> upper(x[1]) || lower(x[2])); —‘New York’
regexp_split(string, pattern)
→ array(varchar)
Splits string
using the regular expression pattern
and returns an
array. Trailing empty strings are preserved:
All of the regular expression functions use the Java pattern syntax, with a few notable exceptions:
(?m)
flag), only \n
is recognized as a line terminator. Additionally, the (?d)
flag is
not supported and must not be used.(?i)
flag) is always
performed in a Unicode-aware manner. However, context-sensitive and
local-sensitive matching is not supported. Additionally, the (?u)
flag is not supported and must not be used.\uD800\uDC00
is
not treated as U+10000
and must be specified as \x{10000}
.\b
) are incorrectly handled for a non-spacing mark
without a base character.\Q
and \E
are not supported in character classes (such as
[A-Z123]
) and are instead treated as literals.\p{prop}
) are supported with the
following differences:
OldItalic
instead of Old_Italic
.Is
, script=
or sc=
prefixes. Example: \p{Hiragana}
In
prefix. The block=
and
blk=
prefixes are not supported. Example: \p{Mongolian}
Is
,
general_category=
or gc=
prefixes. Example: \p{L}
Is
.
Example: \p{NoncharacterCodePoint}
regexp_count(string, pattern)
→ bigint
Returns the number of occurrence of pattern
in string
:
regexp_extract_all(string, pattern)
→ array(varchar)
Returns the substring(s) matched by the regular expression pattern
in
string
:
regexp_extract_all(string, pattern, group)
→ array(varchar)
Finds all occurrences of the regular expression pattern
in string
and returns the capturing group number group
:
regexp_extract(string, pattern)
→ varchar
Returns the f substring matched by the regular expression pattern
in string
:
regexp_extract(string, pattern, group)
→ varchar
Finds the f occurrence of the regular expression pattern
in
string
and returns the capturing group number group
:
regexp_like(string, pattern)
→ boolean
Evaluates the regular expression pattern
and determines if it is
contained within string
.
The pattern
only needs to be contained within string
, rather than
needing to match all of string
. In other words, this performs a
contains operation rather than a match operation. You can match the
entire string by anchoring the pattern using ^
and $
:
regexp_position(string, pattern)
→ integer
Returns the index of the f occurrence (counting from 1) of pattern
in string
. Returns -1 if not found:
regexp_position(string, pattern, start)
→ integer
Returns the index of the f occurrence of pattern
in string
,
starting from start
(include start
). Returns -1 if not found:
regexp_position(string, pattern, start, occurrence)
→ integer
Returns the index of the nth occurrence
of pattern
in string
,
starting from start
(include start
). Returns -1 if not found:
regexp_replace(string, pattern)
→ varchar
Removes every instance of the substring matched by the regular
expression pattern
from string
:
SELECT regexp_replace(‘1a 2b 14m’, ‘\d+[ab] ’); — ‘14m’
regexp_replace(string, pattern, replacement)
→ varchar
Replaces every instance of the substring matched by the regular
expression pattern
in string
with replacement
. Capturing
groups can be referenced in replacement
using $g
for a numbered
group or ${name}
for a named group. A dollar sign ($
) may be
included in the replacement by escaping it with a backslash (\$
):
regexp_replace(string, pattern, function)
→ varchar
Replaces every instance of the substring matched by the regular
expression pattern
in string
using function
. The
lambda expression <lambda>
function is
invoked for each match with the capturing groups passed as an array.
Capturing group numbers start at one; there is no group for the entire
match (if you need this, surround the entire expression with
parenthesis). :
SELECT regexp_replace(‘new york’, ’(\w)(\w*)’, x -> upper(x[1]) || lower(x[2])); —‘New York’
regexp_split(string, pattern)
→ array(varchar)
Splits string
using the regular expression pattern
and returns an
array. Trailing empty strings are preserved: