Perl 5 version 18.1 documentation

split

  • split /PATTERN/,EXPR,LIMIT
  • split /PATTERN/,EXPR
  • split /PATTERN/
  • split

    Splits the string EXPR into a list of strings and returns the list in list context, or the size of the list in scalar context.
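    The difference between the two contexts can be seen with a small sketch. The sample record is an illustrative assumption; in list context split returns the fields, and in scalar context it returns their count.

```perl
use strict;
use warnings;

my $record = 'alpha,beta,gamma';

# List context: split returns the fields themselves.
my @fields = split /,/, $record;

# Scalar context: split returns the number of fields instead.
my $count = split /,/, $record;

print "@fields\n";   # alpha beta gamma
print "$count\n";    # 3
```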

    If only PATTERN is given, EXPR defaults to $_.

    Anything in EXPR that matches PATTERN is taken to be a separator that separates the EXPR into substrings (called "fields") that do not include the separator. Note that a separator may be longer than one character or even have no characters at all (the empty string, which is a zero-width match).
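    As a rough illustration, assuming made-up input strings, a separator can span several characters, and a regex separator can absorb optional whitespace; in both cases the matched text is dropped from the fields.

```perl
use strict;
use warnings;

# The two-character separator '::' is removed from the fields.
my @parts = split /::/, 'Foo::Bar::Baz';
print join('|', @parts), "\n";    # Foo|Bar|Baz

# The separator may also be a pattern that swallows spaces around a comma.
my @items = split /\s*,\s*/, 'red , green,blue';
print join('|', @items), "\n";    # red|green|blue
```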

    The PATTERN need not be constant; an expression may be used to specify a pattern that varies at runtime.
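    A minimal sketch of a pattern chosen at runtime follows. The helper name `split_on` is hypothetical; it compiles the delimiter with `qr//` and uses `\Q...\E` so that characters such as `|` or `.` are taken literally rather than as regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper: the delimiter is only known at runtime.
sub split_on {
    my ($delim, $text) = @_;
    # \Q...\E escapes regex metacharacters in the delimiter.
    my $re = qr/\Q$delim\E/;
    return split $re, $text;
}

print join(',', split_on('|', 'a|b|c')), "\n";     # a,b,c
print join(',', split_on('.', '10.0.0.1')), "\n";  # 10,0,0,1
```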

    If PATTERN matches the empty string, the EXPR is split at the match position (between characters). As an example, the following:

        print join(':', split('b', 'abc')), "\n";

    uses the 'b' in 'abc' as a separator to produce the output 'a:c'. However, this:

        print join(':', split('', 'abc')), "\n";

    uses empty string matches as separators to produce the output 'a:b:c'; thus, the empty string may be used to split EXPR into a list of its component characters.

    As a special case for split, the empty pattern given in match operator syntax (//) specifically matches the empty string, which is contrary to its usual interpretation as the last successful match.
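    The following sketch, using arbitrary sample strings, performs a successful match first and then calls split with `//`. In `m//` or `s///` the empty pattern would reuse that earlier `/b/`, but split still treats `//` as the empty string and breaks the input into characters.

```perl
use strict;
use warnings;

# A successful match that an empty m// would otherwise reuse.
my $matched = ('abc' =~ /b/);

# In split, // still means "match the empty string", so 'xyz' is split
# into its characters rather than on 'b'.
my @chars = split //, 'xyz';
print join(':', @chars), "\n";    # x:y:z
```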

    If PATTERN is /^/, then it is treated as if it used the multiline modifier (/^/m), since it isn't much use otherwise.
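    A short sketch of the effect, assuming a small multi-line string: because /^/ behaves like /^/m, the string is split at the start of every line, and each field keeps its own trailing newline.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";

# /^/ acts as /^/m: split at the start of each line.
my @lines = split /^/, $text;

print scalar(@lines), "\n";    # 3
print $lines[1];               # second (including its "\n")
```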

    As another special case, split emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a literal string composed of a single space character (such as ' ' or "\x20", but not e.g. / /). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator. However, this special treatment can be avoided by specifying the pattern / / instead of the string " ", thereby allowing only a single space character to be a separator. In earlier Perls this special case was restricted to the use of a plain " " as the pattern argument to split; in Perl 5.18.0 and later, this special case is triggered by any expression that evaluates to the simple string " ".
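    To contrast the awk-style string ' ' with the regex / /, here is a sketch on an illustrative line with leading and repeated spaces. The last part relies on the Perl 5.18.0 behavior described above, where a variable holding " " also triggers the special case.

```perl
use strict;
use warnings;

my $line = "  alpha   beta gamma";

# awk-style: leading whitespace dropped, whitespace runs separate fields.
my @awk = split ' ', $line;
print join('|', @awk), "\n";       # alpha|beta|gamma

# Regex / /: each single space is a separator, so empty fields appear.
my @literal = split / /, $line;
print join('|', @literal), "\n";   # ||alpha|||beta|gamma

# On Perl 5.18.0 and later, an expression evaluating to " " is also awk-style.
my $sep = ' ';
my @from_var = split $sep, $line;
print join('|', @from_var), "\n";
```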

    If omitted, PATTERN defaults to a single space, " ", triggering the previously described awk emulation.
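    With no arguments at all, split therefore splits $_ awk-style. In this sketch, which uses an arbitrary sample string, a `for` loop aliases $_ to the string, and tabs and the trailing newline count as whitespace.

```perl
use strict;
use warnings;

my @words;
for ("  one two\tthree\n") {
    # Bare split: PATTERN is ' ' (awk mode) and EXPR is $_.
    @words = split;
}
print scalar(@words), "\n";   # 3
```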

    If LIMIT is specified and positive, it represents the maximum number of fields into which the EXPR may be split; in other words, LIMIT is one greater than the maximum number of times EXPR may be split. Thus, the LIMIT value 1 means that EXPR may be split a maximum of zero times, producing a maximum of one field (namely, the entire value of EXPR). For instance:

        print join(':', split(//, 'abc', 1)), "\n";

    produces the output 'abc', and this:

        print join(':', split(//, 'abc', 2)), "\n";

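    produces the output 'a:bc', and this:

        print join(':', split(//, 'abc', 3)), "\n";

    produces the output 'a:b:c'. Raising LIMIT further allows the zero-width match at the end of EXPR to produce an empty trailing field, which is not stripped when a LIMIT is given explicitly, so this:

        print join(':', split(//, 'abc', 4)), "\n";

    produces the output 'a:b:c:'.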
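    A common practical use of a positive LIMIT is splitting a key/value pair only at the first separator. The sketch below assumes an illustrative `key=value` string whose value itself contains `=` characters; with a LIMIT of 2, split splits at most once, so those characters survive in the value.

```perl
use strict;
use warnings;

# LIMIT 2: at most one split, so '=' inside the value is preserved.
my ($key, $value) = split /=/, 'url=http://x.test/?a=1&b=2', 2;
print "$key\n";    # url
print "$value\n";  # http://x.test/?a=1&b=2
```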

    If LIMIT is negative, it is treated as if it were instead arbitrarily large; as many fields as possible are produced.

    If LIMIT is omitted (or, equivalently, zero), then it is usually treated as if it were negative, except that trailing empty fields are stripped (empty leading fields are always preserved); if all fields are empty, then all fields are considered to be trailing (and are thus stripped in this case). Thus, the following:

        print join(':', split(',', 'a,b,c,,,')), "\n";

    produces the output 'a:b:c', but the following:

        print join(':', split(',', 'a,b,c,,,', -1)), "\n";

    produces the output 'a:b:c:::'.

    In time-critical applications, it is worthwhile to avoid splitting into more fields than necessary. To help with this, when assigning to a list, if LIMIT is omitted (or zero), then LIMIT is treated as though it were one larger than the number of variables in the list; for the following, LIMIT is implicitly 3:

        ($login, $passwd) = split(/:/);
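    The following sketch, using a made-up passwd-style record, shows what the implicit LIMIT means in practice: the first two fields go into the variables, and the unsplit remainder that would have formed the third field is discarded. The explicit LIMIT of 3 in the second call is only there to reveal that remainder.

```perl
use strict;
use warnings;

$_ = 'alice:x:1000:1000:Alice:/home/alice:/bin/sh';

# Two variables on the left, so LIMIT is implicitly 3.
my ($login, $passwd) = split /:/;
print "$login $passwd\n";    # alice x

# Making LIMIT 3 explicit shows the remainder that was thrown away above.
my @three = split /:/, $_, 3;
print "$three[2]\n";         # 1000:1000:Alice:/home/alice:/bin/sh
```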

    Note that splitting an EXPR that evaluates to the empty string always produces zero fields, regardless of the LIMIT specified.
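    To make the empty-string rule concrete, this sketch compares an empty EXPR with a string consisting of a single separator. The inputs are illustrative; the empty string yields no fields even with a negative LIMIT, whereas a lone comma with LIMIT -1 yields an empty leading field and an empty trailing field.

```perl
use strict;
use warnings;

# Empty EXPR: zero fields, whatever the LIMIT.
my @none     = split /,/, '';
my @none_neg = split /,/, '', -1;
print scalar(@none), ' ', scalar(@none_neg), "\n";   # 0 0

# A lone separator with LIMIT -1: two empty fields.
my @two = split /,/, ',', -1;
print scalar(@two), "\n";                            # 2
```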

    An empty leading field is produced when there is a positive-width match at the beginning of EXPR. For instance:

        print join(':', split(/ /, ' abc')), "\n";

    produces the output ':abc'. However, a zero-width match at the beginning of EXPR never produces an empty field, so that:

        print join(':', split(//, ' abc')), "\n";

    produces the output ' :a:b:c' (rather than ': :a:b:c').

    An empty trailing field, on the other hand, is produced when there is a match at the end of EXPR, regardless of the length of the match (of course, unless a non-zero LIMIT is given explicitly, such fields are removed, as in the last example). Thus:

        print join(':', split(//, ' abc', -1)), "\n";

    produces the output ' :a:b:c:'.

    If the PATTERN contains capturing groups, then for each separator, an additional field is produced for each substring captured by a group (in the order in which the groups are specified, as per backreferences); if any group does not match, then it captures the undef value instead of a substring. Also, note that any such additional field is produced whenever there is a separator (that is, whenever a split occurs), and such an additional field does not count towards the LIMIT. Consider the following expressions evaluated in list context (each returned list is provided in the associated comment):

        split(/-|,/, "1-10,20", 3)
        # ('1', '10', '20')
        split(/(-|,)/, "1-10,20", 3)
        # ('1', '-', '10', ',', '20')
        split(/-|(,)/, "1-10,20", 3)
        # ('1', undef, '10', ',', '20')
        split(/(-)|,/, "1-10,20", 3)
        # ('1', '-', '10', undef, '20')
        split(/(-)|(,)/, "1-10,20", 3)
        # ('1', '-', undef, '10', undef, ',', '20')
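    Capturing groups make split usable as a simple tokenizer, because each captured separator lands in the result between the surrounding fields. The arithmetic-expression input and the operator class below are illustrative assumptions, not a general-purpose parser.

```perl
use strict;
use warnings;

# The captured operator appears in the output between its operands;
# the surrounding \s* is matched but not captured, so spaces are dropped.
my @tokens = split /\s*([+*\/-])\s*/, '12 + 3*4 - 5';
print join(' | ', @tokens), "\n";   # 12 | + | 3 | * | 4 | - | 5
```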