String Functions and Operators

This page documents string operators and functions available in Polars SQL.

String Operators

The || operator performs concatenation. NULL operands are treated as empty strings.

String Functions

Note

These functions assume that the input strings contain valid UTF-8 encoded Unicode code points. There are no explicit checks for valid UTF-8 and the functions may return incorrect results on invalid UTF-8.

Additionally, the functions operate on Unicode code points and not user visible characters (or grapheme clusters). Some languages combine multiple code points into a single user-perceived character, the basic unit of a writing system for a language, but the functions will treat each code point as a separate unit.

The lower() and upper() functions do not perform locale-sensitive, context-sensitive, or one-to-many mappings required for some languages. Specifically, this will return incorrect results for Lithuanian, Turkish and Azeri.

concat(string1, ..., stringN) -> varchar()

Returns the concatenation of string1, string2, ..., stringN. This function provides the same functionality as the SQL-standard concatenation operator (||). NULL arguments are treated as empty strings.

length(string) -> bigint()

Returns the length of string in Unicode code points.

char_length(string) -> bigint()

Alias for length().

character_length(string) -> bigint()

Alias for length().

lower(string) -> varchar()

Converts string to lowercase.

ltrim(string) -> varchar()

Removes leading SQL whitespace from string.

replace(string, search, replace) -> varchar()

Replaces all instances of search with replace in string.

If search is an empty string, inserts replace in front of every character and at the end of the string.

rtrim(string) -> varchar()

Removes trailing SQL whitespace from string.

substr(string, start) -> varchar()

Returns the rest of string from the starting position start. Positions start with 1. Supported coverage is for positive positions.

substr(string, start, length) -> varchar()

Returns a substring from string of length length from the starting position start. Positions start with 1. Supported coverage is for positive positions.

mid(string, start, length) -> varchar()

Alias for substr().

substring(string, start) -> varchar()

Alias for substr().

substring(string, start, length) -> varchar()

Alias for substr().

char(codepoint) -> varchar()

Returns a one-character string for a valid Unicode code point.

beginwith(string, prefix) -> boolean()

Returns whether string starts with prefix.

startwith(string, prefix) -> boolean()

Alias for beginwith().

endwith(string, suffix) -> boolean()

Returns whether string ends with suffix.

find(substring, string) -> bigint()

Returns the one-based position of substring in string, or 0 if the substring is not found.

find(substring, string, start) -> bigint()

Returns the one-based position of substring in string starting from one-based position start, or 0 if the substring is not found.

trim(string) -> varchar()

Removes leading and trailing SQL whitespace from string.

Recognized whitespace characters:

Code

Description

Code

Description

9

TAB (horizontal tab)

U+1680

Ogham Space Mark

10

LF (NL line feed, new line)

U+2000

En Quad

11

VT (vertical tab)

U+2001

Em Quad

12

FF (NP form feed, new page)

U+2002

En Space

13

CR (carriage return)

U+2003

Em Space

28

FS (file separator)

U+2004

Three-Per-Em Space

29

GS (group separator)

U+2005

Four-Per-Em Space

30

RS (record separator)

U+2006

Four-Per-Em Space

31

US (unit separator)

U+2008

Punctuation Space

32

Space

U+2009

Thin Space

_

_

U+200a

Hair Space

_

_

U+200a

Hair Space

_

_

U+2028

Line Separator

_

_

U+2029

Paragraph Separator

_

_

U+205f

Medium Mathematical Space

_

_

U+3000

Ideographic Space

upper(string) -> varchar()

Converts string to uppercase.