String Functions and Operators¶
This page documents string operators and functions available in Polars SQL.
String Operators¶
The || operator performs concatenation. NULL operands are treated as
empty strings.
String Functions¶
Note
These functions assume that the input strings contain valid UTF-8 encoded Unicode code points. There are no explicit checks for valid UTF-8 and the functions may return incorrect results on invalid UTF-8.
Additionally, the functions operate on Unicode code points and not user visible characters (or grapheme clusters). Some languages combine multiple code points into a single user-perceived character, the basic unit of a writing system for a language, but the functions will treat each code point as a separate unit.
The lower() and upper() functions do not perform
locale-sensitive, context-sensitive, or one-to-many mappings required for
some languages. Specifically, this will return incorrect results for
Lithuanian, Turkish and Azeri.
- concat(string1, ..., stringN) -> varchar()¶
Returns the concatenation of
string1,string2,...,stringN. This function provides the same functionality as the SQL-standard concatenation operator (||).NULLarguments are treated as empty strings.
- length(string) -> bigint()¶
Returns the length of
stringin Unicode code points.
- char_length(string) -> bigint()¶
Alias for
length().
- character_length(string) -> bigint()¶
Alias for
length().
- lower(string) -> varchar()¶
Converts
stringto lowercase.
- ltrim(string) -> varchar()¶
Removes leading SQL whitespace from
string.
- replace(string, search, replace) -> varchar()¶
Replaces all instances of
searchwithreplaceinstring.If
searchis an empty string, insertsreplacein front of every character and at the end of thestring.
- rtrim(string) -> varchar()¶
Removes trailing SQL whitespace from
string.
- substr(string, start) -> varchar()¶
Returns the rest of
stringfrom the starting positionstart. Positions start with1. Supported coverage is for positive positions.
- substr(string, start, length) -> varchar()¶
Returns a substring from
stringof lengthlengthfrom the starting positionstart. Positions start with1. Supported coverage is for positive positions.
- mid(string, start, length) -> varchar()¶
Alias for
substr().
- substring(string, start) -> varchar()¶
Alias for
substr().
- substring(string, start, length) -> varchar()¶
Alias for
substr().
- char(codepoint) -> varchar()¶
Returns a one-character string for a valid Unicode code point.
- beginwith(string, prefix) -> boolean()¶
Returns whether
stringstarts withprefix.
- startwith(string, prefix) -> boolean()¶
Alias for
beginwith().
- endwith(string, suffix) -> boolean()¶
Returns whether
stringends withsuffix.
- find(substring, string) -> bigint()¶
Returns the one-based position of
substringinstring, or0if the substring is not found.
- find(substring, string, start) -> bigint()
Returns the one-based position of
substringinstringstarting from one-based positionstart, or0if the substring is not found.
- trim(string) -> varchar()¶
Removes leading and trailing SQL whitespace from
string.Recognized whitespace characters:
Code
Description
Code
Description
9
TAB (horizontal tab)
U+1680
Ogham Space Mark
10
LF (NL line feed, new line)
U+2000
En Quad
11
VT (vertical tab)
U+2001
Em Quad
12
FF (NP form feed, new page)
U+2002
En Space
13
CR (carriage return)
U+2003
Em Space
28
FS (file separator)
U+2004
Three-Per-Em Space
29
GS (group separator)
U+2005
Four-Per-Em Space
30
RS (record separator)
U+2006
Four-Per-Em Space
31
US (unit separator)
U+2008
Punctuation Space
32
Space
U+2009
Thin Space
_
_
U+200a
Hair Space
_
_
U+200a
Hair Space
_
_
U+2028
Line Separator
_
_
U+2029
Paragraph Separator
_
_
U+205f
Medium Mathematical Space
_
_
U+3000
Ideographic Space
- upper(string) -> varchar()¶
Converts
stringto uppercase.