Splits the string EXPR into a list of strings and returns the list in list context, or the size of the list in scalar context.
If only PATTERN is given, EXPR defaults to $_.
Anything in EXPR that matches PATTERN is taken to be a separator that separates the EXPR into substrings (called "fields") that do not include the separator. Note that a separator may be longer than one character or even have no characters at all (the empty string, which is a zero-width match).
The PATTERN need not be constant; an expression may be used to specify a pattern that varies at runtime.
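A minimal sketch of a pattern that is only known at runtime. The variable names and the sample data are illustrative assumptions; quotemeta is used here so that the separator is matched literally rather than interpreted as a regular expression.

```perl
use strict;
use warnings;

# Hypothetical separator that is only known at runtime.
my $separator = '|';

# quotemeta escapes regex metacharacters, so '|' is matched literally
# instead of being treated as alternation.
my $pattern = quotemeta($separator);
my @fields  = split(/$pattern/, 'a|b|c');

print join(':', @fields), "\n";    # a:b:c
```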
If PATTERN matches the empty string, the EXPR is split at the match position (between characters). As an example, joining the fields returned by split('b', 'abc') with ':' uses the 'b' in 'abc' as a separator to produce the output 'a:c'. However, doing the same with split('', 'abc') uses empty string matches as separators to produce the output 'a:b:c'; thus, the empty string may be used to split EXPR into a list of its component characters.
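The two cases can be sketched as a runnable snippet. The array names are illustrative; the inputs and outputs are the ones described above.

```perl
use strict;
use warnings;

# A literal separator: the 'b' is consumed and appears in no field.
my @around_b = split(/b/, 'abc');

# An empty pattern matches between characters, giving one field per character.
my @chars = split(//, 'abc');

print join(':', @around_b), "\n";    # a:c
print join(':', @chars), "\n";       # a:b:c
```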
If PATTERN is /^/, then it is treated as if it used the multiline modifier (/^/m), since it isn't much use otherwise.
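As a rough illustration, a /^/ pattern can be used to break a multi-line string into lines. The sample text is an assumption; note that each field keeps its trailing newline, because the zero-width match consumes no characters.

```perl
use strict;
use warnings;

# Hypothetical multi-line input.
my $text = "first\nsecond\nthird\n";

# /^/ is treated as /^/m, so the string is split at the start of each line.
my @lines = split(/^/, $text);

print scalar(@lines), "\n";    # 3
print $lines[1];               # second (with its newline)
```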
As another special case, split emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a literal string composed of a single space character (such as ' ' or "\x20", but not e.g. / /). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator. However, this special treatment can be avoided by specifying the pattern / / instead of the string ' ', thereby allowing only a single space character to be a separator.
If omitted, PATTERN defaults to a single space, ' ', triggering the previously described awk emulation.
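The difference between the awk emulation and a literal single-space pattern can be sketched as follows. The sample line and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical input with leading and repeated spaces.
my $line = "  alpha   beta gamma";

# String ' ': leading whitespace is dropped and runs of whitespace separate fields.
my @awk = split(' ', $line);

# Pattern / /: every single space is a separator, so empty fields appear.
my @single = split(/ /, $line);

# Omitted PATTERN and EXPR: splits $_ using the same awk emulation.
my @default;
for ($line) { @default = split; }

print join('|', @awk), "\n";       # alpha|beta|gamma
print join('|', @single), "\n";    # ||alpha|||beta|gamma
print join('|', @default), "\n";   # alpha|beta|gamma
```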
If LIMIT is specified and positive, it represents the maximum number of fields into which the EXPR may be split; in other words, LIMIT is one greater than the maximum number of times EXPR may be split. Thus, the LIMIT value 1 means that EXPR may be split a maximum of zero times, producing a maximum of one field (namely, the entire value of EXPR). For instance, joining the fields returned by split(//, 'abc', 1) with ':' produces the output 'abc', the same expression with a LIMIT of 2 produces the output 'a:bc', and with a LIMIT of 3 it produces the output 'a:b:c'.
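A short sketch of a positive LIMIT, using the 'abc' input from the text plus a hypothetical colon-separated record. The point it shows is that the last field receives whatever is left of the string, unsplit.

```perl
use strict;
use warnings;

my @one   = split(//, 'abc', 1);    # no split at all
my @two   = split(//, 'abc', 2);    # at most one split
my @three = split(//, 'abc', 3);    # at most two splits

# Hypothetical record: keep the key, leave the rest of the line intact.
my ($key, $rest) = split(/:/, 'path:/usr/bin:/bin', 2);

print join(':', @one), "\n";      # abc
print join(':', @two), "\n";      # a:bc
print join(':', @three), "\n";    # a:b:c
print "$key => $rest\n";          # path => /usr/bin:/bin
```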
If LIMIT is negative, it is treated as if it were instead arbitrarily large; as many fields as possible are produced.
If LIMIT is omitted (or, equivalently, zero), then it is usually treated as if it were instead negative, with the exception that trailing empty fields are stripped (empty leading fields are always preserved); if all fields are empty, then all fields are considered to be trailing (and are thus stripped in this case). Thus, joining the fields returned by split(/,/, 'a,b,c,,,') with ':' produces the output 'a:b:c', but doing the same with split(/,/, 'a,b,c,,,', -1) produces the output 'a:b:c:::'.
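The stripping of trailing empty fields can be checked with a small sketch. The variable names are illustrative, and the all-empty input ',,,' is an added assumption used to show the "all fields are trailing" case.

```perl
use strict;
use warnings;

my $csv = 'a,b,c,,,';

my @stripped = split(/,/, $csv);        # trailing empty fields removed
my @kept     = split(/,/, $csv, -1);    # trailing empty fields preserved

# When every field is empty, all of them count as trailing.
my @all_empty      = split(/,/, ',,,');
my @all_empty_kept = split(/,/, ',,,', -1);

print scalar(@stripped), "\n";          # 3
print scalar(@kept), "\n";              # 6
print scalar(@all_empty), "\n";         # 0
print scalar(@all_empty_kept), "\n";    # 4
```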
In time-critical applications, it is worthwhile to avoid splitting into more fields than necessary. Thus, when assigning to a list, if LIMIT is omitted (or zero), then LIMIT is treated as though it were one larger than the number of variables in the list; for the following, LIMIT is implicitly 4:
- ($login, $passwd, $remainder) = split(/:/);
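One consequence worth sketching: with the implicit LIMIT of 4, the fourth field holds the unsplit rest of the string and is discarded, so the last variable receives only the third field. The record and the variable names below are assumptions; an explicit LIMIT of 3 is one way to capture the remainder instead.

```perl
use strict;
use warnings;

# Hypothetical passwd-style record.
my $entry = 'alice:x:1000:1000:Alice:/home/alice';

# Implicit LIMIT of 4: the fourth field (the rest) is thrown away.
my ($login, $passwd, $third) = split(/:/, $entry);

# Explicit LIMIT of 3: the last variable receives everything that is left.
my ($login2, $passwd2, $remainder) = split(/:/, $entry, 3);

print "$third\n";        # 1000
print "$remainder\n";    # 1000:1000:Alice:/home/alice
```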
Note that splitting an EXPR that evaluates to the empty string always produces zero fields, regardless of the LIMIT specified.
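A brief sketch of the empty-string case, alongside the scalar-context field count mentioned at the start of this section. The variable names are illustrative.

```perl
use strict;
use warnings;

# An empty EXPR yields no fields, whatever LIMIT is given.
my @no_limit  = split(/,/, '');
my @neg_limit = split(/,/, '', -1);
my @pos_limit = split(/,/, '', 5);

# In scalar context, split returns the number of fields.
my $count = split(/,/, 'a,b,c');

print scalar(@no_limit), scalar(@neg_limit), scalar(@pos_limit), "\n";    # 000
print "$count\n";                                                         # 3
```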
An empty leading field is produced when there is a positive-width match at the beginning of EXPR. For instance, joining the fields returned by split(/ /, ' abc') with ':' produces the output ':abc'. However, a zero-width match at the beginning of EXPR never produces an empty field, so that doing the same with split(//, ' abc') produces the output ' :a:b:c' (rather than ': :a:b:c').
An empty trailing field, on the other hand, is produced when there is a match at the end of EXPR, regardless of the length of the match (of course, unless a non-zero LIMIT is given explicitly, such fields are removed, as in the last example). Thus, joining the fields returned by split(//, ' abc', -1) with ':' produces the output ' :a:b:c:'.
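The leading and trailing cases can be compared side by side in a small sketch. The array names are illustrative; the input ' abc' is the one used in the text.

```perl
use strict;
use warnings;

# Positive-width match at the start: an empty leading field is produced.
my @leading = split(/ /, ' abc');

# Zero-width match at the start: no empty leading field.
my @zero_width = split(//, ' abc');

# Negative LIMIT: the match at the end yields an empty trailing field.
my @trailing = split(//, ' abc', -1);

print join(':', @leading), "\n";       # :abc
print join(':', @zero_width), "\n";    #  :a:b:c
print join(':', @trailing), "\n";      #  :a:b:c:
```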
If the PATTERN contains capturing groups, then for each separator, an additional field is produced for each substring captured by a group (in the order in which the groups are specified, as per backreferences); if any group does not match, then it captures the undef value instead of a substring. Also, note that any such additional field is produced whenever there is a separator (that is, whenever a split occurs), and such an additional field does not count towards the LIMIT. Consider the following expressions evaluated in list context (each returned list is provided in the associated comment):
- split(/-|,/, "1-10,20", 3)
- # ('1', '10', '20')
- split(/(-|,)/, "1-10,20", 3)
- # ('1', '-', '10', ',', '20')
- split(/-|(,)/, "1-10,20", 3)
- # ('1', undef, '10', ',', '20')
- split(/(-)|,/, "1-10,20", 3)
- # ('1', '-', '10', undef, '20')
- split(/(-)|(,)/, "1-10,20", 3)
- # ('1', '-', undef, '10', undef, ',', '20')
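A runnable sketch of capturing groups in practice. The tokenizing input '1+2-3' and the helper `show` are hypothetical additions for illustration; the other two calls reuse the "1-10,20" input from the list above.

```perl
use strict;
use warnings;

# Hypothetical helper: render a list with undef made visible.
sub show { return join(' ', map { defined $_ ? "'$_'" : 'undef' } @_) }

# Keep the separators as fields, e.g. for a simple tokenizer.
my @tokens = split(/([-+])/, '1+2-3');

# Captured separators do not count towards LIMIT.
my @limited = split(/(-|,)/, '1-10,20', 2);

# A group that does not take part in a match contributes undef.
my @with_undef = split(/-|(,)/, '1-10,20');

print show(@tokens), "\n";        # '1' '+' '2' '-' '3'
print show(@limited), "\n";       # '1' '-' '10,20'
print show(@with_undef), "\n";    # '1' undef '10' ',' '20'
```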