ECMAScript® 2024 Language Specification

Draft ECMA-262 / February 15, 2024

22.2 RegExp (Regular Expression) Objects

A RegExp object contains a regular expression and the associated flags.

Note

The form and functionality of regular expressions are modelled after the regular expression facility in the Perl 5 programming language.

22.2.1 Patterns

The RegExp constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of Pattern.

Syntax

Pattern[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
  Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
Disjunction[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
  Alternative[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
  Alternative[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] | Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
Alternative[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
  [empty]
  Alternative[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] Term[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
Term[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
  Assertion[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
  Atom[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
  Atom[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] Quantifier
Assertion[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
  ^
  $
  \b
  \B
  (?= Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )
  (?! Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )
  (?<= Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )
  (?<! Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )
Quantifier ::
  QuantifierPrefix
  QuantifierPrefix ?
QuantifierPrefix ::
  *
  +
  ?
  { DecimalDigits[~Sep] }
  { DecimalDigits[~Sep] ,}
  { DecimalDigits[~Sep] , DecimalDigits[~Sep] }
Atom[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
  PatternCharacter
  .
  \ AtomEscape[?UnicodeMode, ?NamedCaptureGroups]
  CharacterClass[?UnicodeMode, ?UnicodeSetsMode]
  ( GroupSpecifier[?UnicodeMode]opt Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )
  (?: Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )
SyntaxCharacter :: one of
  ^ $ \ . * + ? ( ) [ ] { } |
PatternCharacter ::
  SourceCharacter but not SyntaxCharacter
AtomEscape[UnicodeMode, NamedCaptureGroups] ::
  DecimalEscape
  CharacterClassEscape[?UnicodeMode]
  CharacterEscape[?UnicodeMode]
  [+NamedCaptureGroups] k GroupName[?UnicodeMode]
CharacterEscape[UnicodeMode] ::
  ControlEscape
  c AsciiLetter
  0 [lookahead ∉ DecimalDigit]
  HexEscapeSequence
  RegExpUnicodeEscapeSequence[?UnicodeMode]
  IdentityEscape[?UnicodeMode]
ControlEscape :: one of
  f n r t v
GroupSpecifier[UnicodeMode] ::
  ? GroupName[?UnicodeMode]
GroupName[UnicodeMode] ::
  < RegExpIdentifierName[?UnicodeMode] >
RegExpIdentifierName[UnicodeMode] ::
  RegExpIdentifierStart[?UnicodeMode]
  RegExpIdentifierName[?UnicodeMode] RegExpIdentifierPart[?UnicodeMode]
RegExpIdentifierStart[UnicodeMode] ::
  IdentifierStartChar
  \ RegExpUnicodeEscapeSequence[+UnicodeMode]
  [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate
RegExpIdentifierPart[UnicodeMode] ::
  IdentifierPartChar
  \ RegExpUnicodeEscapeSequence[+UnicodeMode]
  [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate
RegExpUnicodeEscapeSequence[UnicodeMode] ::
  [+UnicodeMode] u HexLeadSurrogate \u HexTrailSurrogate
  [+UnicodeMode] u HexLeadSurrogate
  [+UnicodeMode] u HexTrailSurrogate
  [+UnicodeMode] u HexNonSurrogate
  [~UnicodeMode] u Hex4Digits
  [+UnicodeMode] u{ CodePoint }
UnicodeLeadSurrogate ::
  any Unicode code point in the inclusive interval from U+D800 to U+DBFF
UnicodeTrailSurrogate ::
  any Unicode code point in the inclusive interval from U+DC00 to U+DFFF

Each \u HexTrailSurrogate for which the choice of associated u HexLeadSurrogate is ambiguous shall be associated with the nearest possible u HexLeadSurrogate that would otherwise have no corresponding \u HexTrailSurrogate.

HexLeadSurrogate ::
  Hex4Digits but only if the MV of Hex4Digits is in the inclusive interval from 0xD800 to 0xDBFF
HexTrailSurrogate ::
  Hex4Digits but only if the MV of Hex4Digits is in the inclusive interval from 0xDC00 to 0xDFFF
HexNonSurrogate ::
  Hex4Digits but only if the MV of Hex4Digits is not in the inclusive interval from 0xD800 to 0xDFFF
IdentityEscape[UnicodeMode] ::
  [+UnicodeMode] SyntaxCharacter
  [+UnicodeMode] /
  [~UnicodeMode] SourceCharacter but not UnicodeIDContinue
DecimalEscape ::
  NonZeroDigit DecimalDigits[~Sep]opt [lookahead ∉ DecimalDigit]
CharacterClassEscape[UnicodeMode] ::
  d
  D
  s
  S
  w
  W
  [+UnicodeMode] p{ UnicodePropertyValueExpression }
  [+UnicodeMode] P{ UnicodePropertyValueExpression }
UnicodePropertyValueExpression ::
  UnicodePropertyName = UnicodePropertyValue
  LoneUnicodePropertyNameOrValue
UnicodePropertyName ::
  UnicodePropertyNameCharacters
UnicodePropertyNameCharacters ::
  UnicodePropertyNameCharacter UnicodePropertyNameCharactersopt
UnicodePropertyValue ::
  UnicodePropertyValueCharacters
LoneUnicodePropertyNameOrValue ::
  UnicodePropertyValueCharacters
UnicodePropertyValueCharacters ::
  UnicodePropertyValueCharacter UnicodePropertyValueCharactersopt
UnicodePropertyValueCharacter ::
  UnicodePropertyNameCharacter
  DecimalDigit
UnicodePropertyNameCharacter ::
  AsciiLetter
  _
CharacterClass[UnicodeMode, UnicodeSetsMode] ::
  [ [lookahead ≠ ^] ClassContents[?UnicodeMode, ?UnicodeSetsMode] ]
  [^ ClassContents[?UnicodeMode, ?UnicodeSetsMode] ]
ClassContents[UnicodeMode, UnicodeSetsMode] ::
  [empty]
  [~UnicodeSetsMode] NonemptyClassRanges[?UnicodeMode]
  [+UnicodeSetsMode] ClassSetExpression
NonemptyClassRanges[UnicodeMode] ::
  ClassAtom[?UnicodeMode]
  ClassAtom[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode]
  ClassAtom[?UnicodeMode] - ClassAtom[?UnicodeMode] ClassContents[?UnicodeMode, ~UnicodeSetsMode]
NonemptyClassRangesNoDash[UnicodeMode] ::
  ClassAtom[?UnicodeMode]
  ClassAtomNoDash[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode]
  ClassAtomNoDash[?UnicodeMode] - ClassAtom[?UnicodeMode] ClassContents[?UnicodeMode, ~UnicodeSetsMode]
ClassAtom[UnicodeMode] ::
  -
  ClassAtomNoDash[?UnicodeMode]
ClassAtomNoDash[UnicodeMode] ::
  SourceCharacter but not one of \ or ] or -
  \ ClassEscape[?UnicodeMode]
ClassEscape[UnicodeMode] ::
  b
  [+UnicodeMode] -
  CharacterClassEscape[?UnicodeMode]
  CharacterEscape[?UnicodeMode]
ClassSetExpression ::
  ClassUnion
  ClassIntersection
  ClassSubtraction
ClassUnion ::
  ClassSetRange ClassUnionopt
  ClassSetOperand ClassUnionopt
ClassIntersection ::
  ClassSetOperand && [lookahead ≠ &] ClassSetOperand
  ClassIntersection && [lookahead ≠ &] ClassSetOperand
ClassSubtraction ::
  ClassSetOperand -- ClassSetOperand
  ClassSubtraction -- ClassSetOperand
ClassSetRange ::
  ClassSetCharacter - ClassSetCharacter
ClassSetOperand ::
  NestedClass
  ClassStringDisjunction
  ClassSetCharacter
NestedClass ::
  [ [lookahead ≠ ^] ClassContents[+UnicodeMode, +UnicodeSetsMode] ]
  [^ ClassContents[+UnicodeMode, +UnicodeSetsMode] ]
  \ CharacterClassEscape[+UnicodeMode]

Note 1

The first two lines here are equivalent to CharacterClass.

ClassStringDisjunction ::
  \q{ ClassStringDisjunctionContents }
ClassStringDisjunctionContents ::
  ClassString
  ClassString | ClassStringDisjunctionContents
ClassString ::
  [empty]
  NonEmptyClassString
NonEmptyClassString ::
  ClassSetCharacter NonEmptyClassStringopt
ClassSetCharacter ::
  [lookahead ∉ ClassSetReservedDoublePunctuator] SourceCharacter but not ClassSetSyntaxCharacter
  \ CharacterEscape[+UnicodeMode]
  \ ClassSetReservedPunctuator
  \b
ClassSetReservedDoublePunctuator :: one of
  && !! ## $$ %% ** ++ ,, .. :: ;; << == >> ?? @@ ^^ `` ~~
ClassSetSyntaxCharacter :: one of
  ( ) [ ] { } / - \ |
ClassSetReservedPunctuator :: one of
  & - ! # % , : ; < = > @ ` ~

Note 2

A number of productions in this section are given alternative definitions in section B.1.2.

22.2.1.1 Static Semantics: Early Errors

Note

This section is amended in B.1.2.1.

Pattern :: Disjunction
QuantifierPrefix :: { DecimalDigits , DecimalDigits }
AtomEscape :: k GroupName
AtomEscape :: DecimalEscape
NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents
NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents
RegExpIdentifierStart :: \ RegExpUnicodeEscapeSequence
RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
RegExpIdentifierPart :: \ RegExpUnicodeEscapeSequence
RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue
UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue
CharacterClassEscape :: P{ UnicodePropertyValueExpression }
CharacterClass :: [^ ClassContents ]
NestedClass :: [^ ClassContents ]
ClassSetRange :: ClassSetCharacter - ClassSetCharacter

22.2.1.2 Static Semantics: CountLeftCapturingParensWithin ( node )

The abstract operation CountLeftCapturingParensWithin takes argument node (a Parse Node) and returns a non-negative integer. It returns the number of left-capturing parentheses in node. A left-capturing parenthesis is any ( pattern character that is matched by the ( terminal of the Atom :: ( GroupSpecifieropt Disjunction ) production.

Note

This section is amended in B.1.2.2.

It performs the following steps when called:

  1. Assert: node is an instance of a production in the RegExp Pattern grammar.
  2. Return the number of Atom :: ( GroupSpecifieropt Disjunction ) Parse Nodes contained within node.

22.2.1.3 Static Semantics: CountLeftCapturingParensBefore ( node )

The abstract operation CountLeftCapturingParensBefore takes argument node (a Parse Node) and returns a non-negative integer. It returns the number of left-capturing parentheses within the enclosing pattern that occur to the left of node.

Note

This section is amended in B.1.2.2.

It performs the following steps when called:

  1. Assert: node is an instance of a production in the RegExp Pattern grammar.
  2. Let pattern be the Pattern containing node.
  3. Return the number of Atom :: ( GroupSpecifieropt Disjunction ) Parse Nodes contained within pattern that either occur before node or contain node.
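As an illustrative check (not part of the specification text), the numbering implied by these two operations can be observed from script. Groups are numbered by the position of their opening parenthesis, and (?: ) groups are not counted. The pattern and input below are arbitrary examples chosen for this sketch.

```javascript
// Capturing groups are numbered by the position of their "(";
// the non-capturing group (?:c) does not receive a number.
const groupMatch = /(a(b))(?:c)(d)/.exec("abcd");

// groupMatch[0] is the whole match, so length - 1 is the number of
// left-capturing parentheses in the pattern.
const groupCount = groupMatch.length - 1;

console.log(groupCount);      // 3
console.log(groupMatch[1]);   // "ab" (outer group, one "(" at or before it)
console.log(groupMatch[2]);   // "b"  (nested group, two "(" at or before it)
console.log(groupMatch[3]);   // "d"
```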

22.2.1.4 Static Semantics: CapturingGroupNumber

The syntax-directed operation CapturingGroupNumber takes no arguments and returns a positive integer.

Note

This section is amended in B.1.2.1.

It is defined piecewise over the following productions:

DecimalEscape :: NonZeroDigit
  1. Return the MV of NonZeroDigit.
DecimalEscape :: NonZeroDigit DecimalDigits
  1. Let n be the number of code points in DecimalDigits.
  2. Return (the MV of NonZeroDigit × 10ⁿ plus the MV of DecimalDigits).

The definitions of “the MV of NonZeroDigit” and “the MV of DecimalDigits” are in 12.9.3.
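The arithmetic above can be sketched as follows. The helper name capturingGroupNumber and its string argument are assumptions made for illustration; the specification operates on Parse Nodes, not strings.

```javascript
// Hypothetical helper: computes the group number of a DecimalEscape
// from its digit text, following the two cases above.
function capturingGroupNumber(digits) {
  const first = Number(digits[0]);   // MV of NonZeroDigit
  const rest = digits.slice(1);      // DecimalDigits, possibly absent
  if (rest.length === 0) return first;
  const n = rest.length;             // number of code points in DecimalDigits
  return first * 10 ** n + Number(rest);
}

console.log(capturingGroupNumber("7"));    // 7
console.log(capturingGroupNumber("12"));   // 1 * 10^1 + 2 = 12
console.log(capturingGroupNumber("105"));  // 1 * 10^2 + 5 = 105

// With twelve capturing groups present, \12 refers back to group 12.
const twelveGroups = /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)\12/;
console.log(twelveGroups.test("abcdefghijkll")); // true
```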

22.2.1.5 Static Semantics: IsCharacterClass

The syntax-directed operation IsCharacterClass takes no arguments and returns a Boolean.

Note

This section is amended in B.1.2.3.

It is defined piecewise over the following productions:

ClassAtom :: -
ClassAtomNoDash :: SourceCharacter but not one of \ or ] or -
ClassEscape :: b
ClassEscape :: -
ClassEscape :: CharacterEscape
  1. Return false.
ClassEscape :: CharacterClassEscape
  1. Return true.
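One place where this distinction becomes observable is in class ranges: in Unicode mode, a range whose endpoint is a CharacterClassEscape such as \d is rejected when the pattern is compiled. The sketch below checks that behaviour; compiles is a hypothetical helper, not an API of the language.

```javascript
// Hypothetical helper: reports whether a pattern source compiles
// with the given flags, treating SyntaxError as "does not compile".
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// \d is a character class, so it cannot be a range endpoint in Unicode mode.
console.log(compiles("[\\d-a]", "u")); // false
// Both endpoints are single characters, so the range is accepted.
console.log(compiles("[a-d]", "u"));   // true
```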

22.2.1.6 Static Semantics: CharacterValue

The syntax-directed operation CharacterValue takes no arguments and returns a non-negative integer.

Note 1

This section is amended in B.1.2.4.

It is defined piecewise over the following productions:

ClassAtom :: -
  1. Return the numeric value of U+002D (HYPHEN-MINUS).
ClassAtomNoDash :: SourceCharacter but not one of \ or ] or -
  1. Let ch be the code point matched by SourceCharacter.
  2. Return the numeric value of ch.
ClassEscape :: b
  1. Return the numeric value of U+0008 (BACKSPACE).
ClassEscape :: -
  1. Return the numeric value of U+002D (HYPHEN-MINUS).
CharacterEscape :: ControlEscape
  1. Return the numeric value according to Table 65.
Table 65: ControlEscape Code Point Values
ControlEscape  Numeric Value  Code Point  Unicode Name          Symbol
t              9              U+0009      CHARACTER TABULATION  <HT>
n              10             U+000A      LINE FEED (LF)        <LF>
v              11             U+000B      LINE TABULATION       <VT>
f              12             U+000C      FORM FEED (FF)        <FF>
r              13             U+000D      CARRIAGE RETURN (CR)  <CR>
CharacterEscape :: c AsciiLetter
  1. Let ch be the code point matched by AsciiLetter.
  2. Let i be the numeric value of ch.
  3. Return the remainder of dividing i by 32.
CharacterEscape :: 0 [lookahead ∉ DecimalDigit]
  1. Return the numeric value of U+0000 (NULL).
Note 2

\0 represents the <NUL> character and cannot be followed by a decimal digit.

CharacterEscape :: HexEscapeSequence
  1. Return the MV of HexEscapeSequence.
RegExpUnicodeEscapeSequence :: u HexLeadSurrogate \u HexTrailSurrogate
  1. Let lead be the CharacterValue of HexLeadSurrogate.
  2. Let trail be the CharacterValue of HexTrailSurrogate.
  3. Let cp be UTF16SurrogatePairToCodePoint(lead, trail).
  4. Return the numeric value of cp.
RegExpUnicodeEscapeSequence :: u Hex4Digits
  1. Return the MV of Hex4Digits.
RegExpUnicodeEscapeSequence :: u{ CodePoint }
  1. Return the MV of CodePoint.
HexLeadSurrogate :: Hex4Digits
HexTrailSurrogate :: Hex4Digits
HexNonSurrogate :: Hex4Digits
  1. Return the MV of Hex4Digits.
CharacterEscape :: IdentityEscape
  1. Let ch be the code point matched by IdentityEscape.
  2. Return the numeric value of ch.
ClassSetCharacter :: SourceCharacter but not ClassSetSyntaxCharacter
  1. Let ch be the code point matched by SourceCharacter.
  2. Return the numeric value of ch.
ClassSetCharacter :: \ ClassSetReservedPunctuator
  1. Let ch be the code point matched by ClassSetReservedPunctuator.
  2. Return the numeric value of ch.
ClassSetCharacter :: \b
  1. Return the numeric value of U+0008 (BACKSPACE).
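A few of the cases above can be checked from script. This is a minimal sketch using arbitrary inputs; it illustrates the c AsciiLetter remainder rule, two rows of Table 65, and the \b case inside a class.

```javascript
// \cJ: "J" is U+004A (74), and 74 % 32 is 10, i.e. U+000A LINE FEED.
const controlValue = "J".codePointAt(0) % 32;
console.log(controlValue);                 // 10
console.log(/^\cJ$/.test("\n"));           // true

// ControlEscape values from Table 65.
console.log(/^\t$/.test("\u0009"));        // true
console.log(/^\v$/.test("\u000B"));        // true

// Inside a class, \b denotes U+0008 (BACKSPACE), not a word boundary.
console.log(/[\b]/.test("\u0008"));        // true
```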

22.2.1.7 Static Semantics: MayContainStrings

The syntax-directed operation MayContainStrings takes no arguments and returns a Boolean. It is defined piecewise over the following productions:

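CharacterClassEscape :: d
CharacterClassEscape :: D
CharacterClassEscape :: s
CharacterClassEscape :: S
CharacterClassEscape :: w
CharacterClassEscape :: W
CharacterClassEscape :: P{ UnicodePropertyValueExpression }
UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue
NestedClass :: [^ ClassContents ]
ClassContents :: [empty]
ClassContents :: NonemptyClassRanges
ClassSetOperand :: ClassSetCharacter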
  1. Return false.
UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue
  1. If the source text matched by LoneUnicodePropertyNameOrValue is a binary property of strings listed in the “Property name” column of Table 69, return true.
  2. Return false.
ClassUnion :: ClassSetRange ClassUnionopt
  1. If the ClassUnion is present, return MayContainStrings of the ClassUnion.
  2. Return false.
ClassUnion :: ClassSetOperand ClassUnionopt
  1. If MayContainStrings of the ClassSetOperand is true, return true.
  2. If ClassUnion is present, return MayContainStrings of the ClassUnion.
  3. Return false.
ClassIntersection :: ClassSetOperand && ClassSetOperand
  1. If MayContainStrings of the first ClassSetOperand is false, return false.
  2. If MayContainStrings of the second ClassSetOperand is false, return false.
  3. Return true.
ClassIntersection :: ClassIntersection && ClassSetOperand
  1. If MayContainStrings of the ClassIntersection is false, return false.
  2. If MayContainStrings of the ClassSetOperand is false, return false.
  3. Return true.
ClassSubtraction :: ClassSetOperand -- ClassSetOperand
  1. Return MayContainStrings of the first ClassSetOperand.
ClassSubtraction :: ClassSubtraction -- ClassSetOperand
  1. Return MayContainStrings of the ClassSubtraction.
ClassStringDisjunctionContents :: ClassString | ClassStringDisjunctionContents
  1. If MayContainStrings of the ClassString is true, return true.
  2. Return MayContainStrings of the ClassStringDisjunctionContents.
ClassString :: [empty]
  1. Return true.
ClassString :: NonEmptyClassString
  1. Return MayContainStrings of the NonEmptyClassString.
NonEmptyClassString :: ClassSetCharacter NonEmptyClassStringopt
  1. If NonEmptyClassString is present, return true.
  2. Return false.
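The ClassString cases above can be modelled in a few lines. This is a simplified sketch, not the specification's algorithm: it assumes the alternatives of a \q{...} disjunction have already been split into an array of strings, and both function names are hypothetical.

```javascript
// Hypothetical model: the alternatives of \q{abc|d|} are ["abc", "d", ""].
function classStringMayContainStrings(classString) {
  // Count code points rather than UTF-16 code units.
  const length = [...classString].length;
  // [empty] -> true; exactly one character -> false; more than one -> true.
  return length !== 1;
}

// A disjunction may contain strings if any of its alternatives may.
function disjunctionMayContainStrings(alternatives) {
  return alternatives.some(classStringMayContainStrings);
}

console.log(disjunctionMayContainStrings(["a", "b"]));      // false
console.log(disjunctionMayContainStrings(["abc", "d"]));    // true
console.log(disjunctionMayContainStrings([""]));            // true
console.log(disjunctionMayContainStrings(["\u{1D11E}"]));   // false
```

The last call shows why code points are counted: a single non-BMP character occupies two code units but is still one character in Unicode mode.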

22.2.1.8 Static Semantics: GroupSpecifiersThatMatch ( thisGroupName )

The abstract operation GroupSpecifiersThatMatch takes argument thisGroupName (a GroupName Parse Node) and returns a List of GroupSpecifier Parse Nodes. It performs the following steps when called:

  1. Let name be the CapturingGroupName of thisGroupName.
  2. Let pattern be the Pattern containing thisGroupName.
  3. Let result be a new empty List.
  4. For each GroupSpecifier gs that pattern contains, do
    1. If the CapturingGroupName of gs is name, then
      1. Append gs to result.
  5. Return result.

22.2.1.9 Static Semantics: CapturingGroupName

The syntax-directed operation CapturingGroupName takes no arguments and returns a String. It is defined piecewise over the following productions:

GroupName :: < RegExpIdentifierName >
  1. Let idTextUnescaped be RegExpIdentifierCodePoints of RegExpIdentifierName.
  2. Return CodePointsToString(idTextUnescaped).
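From script, the String produced by CapturingGroupName is visible as a property name on the groups object of a match result, and as the target of a \k reference. A brief sketch with an arbitrary date-like pattern:

```javascript
// The group names "year" and "month" become keys of match.groups.
const dateMatch = /(?<year>\d{4})-(?<month>\d{2})/.exec("2024-02");
console.log(dateMatch.groups.year);   // "2024"
console.log(dateMatch.groups.month);  // "02"

// \k<year> refers back to the group whose name is "year".
const repeated = /(?<year>\d{4})=\k<year>/;
console.log(repeated.test("2024=2024")); // true
console.log(repeated.test("2024=2025")); // false
```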

22.2.1.10 Static Semantics: RegExpIdentifierCodePoints

The syntax-directed operation RegExpIdentifierCodePoints takes no arguments and returns a List of code points. It is defined piecewise over the following productions:

RegExpIdentifierName :: RegExpIdentifierStart
  1. Let cp be RegExpIdentifierCodePoint of RegExpIdentifierStart.
  2. Return « cp ».
RegExpIdentifierName :: RegExpIdentifierName RegExpIdentifierPart
  1. Let cps be RegExpIdentifierCodePoints of the derived RegExpIdentifierName.
  2. Let cp be RegExpIdentifierCodePoint of RegExpIdentifierPart.
  3. Return the list-concatenation of cps and « cp ».

22.2.1.11 Static Semantics: RegExpIdentifierCodePoint

The syntax-directed operation RegExpIdentifierCodePoint takes no arguments and returns a code point. It is defined piecewise over the following productions:

RegExpIdentifierStart :: IdentifierStartChar
  1. Return the code point matched by IdentifierStartChar.
RegExpIdentifierPart :: IdentifierPartChar
  1. Return the code point matched by IdentifierPartChar.
RegExpIdentifierStart :: \ RegExpUnicodeEscapeSequence
RegExpIdentifierPart :: \ RegExpUnicodeEscapeSequence
  1. Return the code point whose numeric value is the CharacterValue of RegExpUnicodeEscapeSequence.
RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
  1. Let lead be the code unit whose numeric value is the numeric value of the code point matched by UnicodeLeadSurrogate.
  2. Let trail be the code unit whose numeric value is the numeric value of the code point matched by UnicodeTrailSurrogate.
  3. Return UTF16SurrogatePairToCodePoint(lead, trail).
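UTF16SurrogatePairToCodePoint is defined elsewhere in the specification. The sketch below assumes the standard UTF-16 decoding formula; the function name surrogatePairToCodePoint is a stand-in chosen for this example.

```javascript
// Standard UTF-16 decoding of a lead/trail surrogate pair.
function surrogatePairToCodePoint(lead, trail) {
  return (lead - 0xD800) * 0x400 + (trail - 0xDC00) + 0x10000;
}

const clefCodePoint = surrogatePairToCodePoint(0xD834, 0xDD1E);
console.log(clefCodePoint.toString(16));                           // "1d11e"
console.log(String.fromCodePoint(clefCodePoint) === "\uD834\uDD1E"); // true

// With the u flag, the escape pair \uD834\uDD1E denotes that one code point.
console.log(/^\uD834\uDD1E$/u.test("\u{1D11E}"));                  // true
```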

22.2.2 Pattern Semantics

A regular expression pattern is converted into an Abstract Closure using the process described below. An implementation is encouraged to use more efficient algorithms than the ones listed below, as long as the results are the same. The Abstract Closure is used as the value of a RegExp object's [[RegExpMatcher]] internal slot.

A Pattern is a BMP pattern if its associated flags contain neither a u nor a v. Otherwise, it is a Unicode pattern. A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern “character” means a UTF-16 encoded code point (6.1.4). In either context, “character value” means the numeric value of the corresponding non-encoded code point.

The syntax and semantics of Pattern are defined as if the source text for the Pattern were a List of SourceCharacter values where each SourceCharacter corresponds to a Unicode code point. If a BMP pattern contains a non-BMP SourceCharacter, the entire pattern is encoded using UTF-16 and the individual code units of that encoding are used as the elements of the List.

Note

For example, consider a pattern expressed in source text as the single non-BMP character U+1D11E (MUSICAL SYMBOL G CLEF). Interpreted as a Unicode pattern, it would be a single element (character) List consisting of the single code point U+1D11E. However, interpreted as a BMP pattern, it is first UTF-16 encoded to produce a two element List consisting of the code units 0xD834 and 0xDD1E.

Patterns are passed to the RegExp constructor as ECMAScript String values in which non-BMP characters are UTF-16 encoded. For example, the single character MUSICAL SYMBOL G CLEF pattern, expressed as a String value, is a String of length 2 whose elements are the code units 0xD834 and 0xDD1E. So no further translation of the string would be necessary to process it as a BMP pattern consisting of two pattern characters. However, to process it as a Unicode pattern, UTF16SurrogatePairToCodePoint must be used in producing a List whose sole element is a single pattern character, the code point U+1D11E.

An implementation may not actually perform such translations to or from UTF-16, but the semantics of this specification requires that the result of pattern matching be as if such translations were performed.
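The difference between a BMP pattern and a Unicode pattern can be observed with the G CLEF example from the Note. The following is an illustrative sketch, using the . atom as an arbitrary way to match exactly one character:

```javascript
const clef = "\u{1D11E}"; // MUSICAL SYMBOL G CLEF

// As a String value it consists of two code units.
console.log(clef.length);            // 2

// BMP pattern: each code unit is a character, so one "." is not enough.
console.log(/^.$/.test(clef));       // false
console.log(/^..$/.test(clef));      // true

// Unicode pattern: the surrogate pair is a single character.
console.log(/^.$/u.test(clef));      // true
```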

22.2.2.1 Notation

The descriptions below use the following internal data structures:

  • A CharSetElement is one of the two following entities:
    • If rer.[[UnicodeSets]] is false, then a CharSetElement is a character in the sense of the Pattern Semantics above.
    • If rer.[[UnicodeSets]] is true, then a CharSetElement is a sequence whose elements are characters in the sense of the Pattern Semantics above. This includes the empty sequence, sequences of one character, and sequences of more than one character. For convenience, when working with CharSetElements of this kind, an individual character is treated interchangeably with a sequence of one character.
  • A CharSet is a mathematical set of CharSetElements.
  • A CaptureRange is a Record { [[StartIndex]], [[EndIndex]] } that represents the range of characters included in a capture, where [[StartIndex]] is an integer representing the start index (inclusive) of the range within Input, and [[EndIndex]] is an integer representing the end index (exclusive) of the range within Input. For any CaptureRange, these indices must satisfy the invariant that [[StartIndex]] ≤ [[EndIndex]].
  • A MatchState is a Record { [[Input]], [[EndIndex]], [[Captures]] } where [[Input]] is a List of characters representing the String being matched, [[EndIndex]] is an integer, and [[Captures]] is a List of values, one for each left-capturing parenthesis in the pattern. States are used to represent partial match states in the regular expression matching algorithms. The [[EndIndex]] is one plus the index of the last input character matched so far by the pattern, while [[Captures]] holds the results of capturing parentheses. The nth element of [[Captures]] is either a CaptureRange representing the range of characters captured by the nth set of capturing parentheses, or undefined if the nth set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
  • A MatchResult is either a MatchState or the special token failure that indicates that the match failed.
  • A MatcherContinuation is an Abstract Closure that takes one MatchState argument and returns a MatchResult result. The MatcherContinuation attempts to match the remaining portion (specified by the closure's captured values) of the pattern against Input, starting at the intermediate state given by its MatchState argument. If the match succeeds, the MatcherContinuation returns the final MatchState that it reached; if the match fails, the MatcherContinuation returns failure.
  • A Matcher is an Abstract Closure that takes two arguments—a MatchState and a MatcherContinuation—and returns a MatchResult result. A Matcher attempts to match a middle subpattern (specified by the closure's captured values) of the pattern against the MatchState's [[Input]], starting at the intermediate state given by its MatchState argument. The MatcherContinuation argument should be a closure that matches the rest of the pattern. After matching the subpattern of a pattern to obtain a new MatchState, the Matcher then calls MatcherContinuation on that new MatchState to test if the rest of the pattern can match as well. If it can, the Matcher returns the MatchState returned by MatcherContinuation; if not, the Matcher may try different choices at its choice points, repeatedly calling MatcherContinuation until it either succeeds or all possibilities have been exhausted.
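The Matcher and MatcherContinuation roles described above can be sketched with ordinary closures. Everything here is an assumption made for illustration: the FAILURE token, the plain-object MatchState, and the helper names charMatcher, sequence and alternatives are not taken from the specification, and only forward matching of literal characters is modelled.

```javascript
const FAILURE = Symbol("failure");

// Matcher for a single literal character.
function charMatcher(ch) {
  return (x, c) => {
    if (x.endIndex >= x.input.length || x.input[x.endIndex] !== ch) return FAILURE;
    // Advance past the character and let the continuation match the rest.
    return c({ ...x, endIndex: x.endIndex + 1 });
  };
}

// Matcher that runs m1, then m2, then the continuation.
function sequence(m1, m2) {
  return (x, c) => m1(x, (y) => m2(y, c));
}

// Matcher with a choice point: try m1 with the rest of the pattern,
// and fall back to m2 only if that whole attempt fails.
function alternatives(m1, m2) {
  return (x, c) => {
    const r = m1(x, c);
    return r !== FAILURE ? r : m2(x, c);
  };
}

const done = (y) => y; // final continuation: accept the state as it is
const start = { input: [..."abc"], endIndex: 0, captures: [] };

// Models (?:a|ab)c: "a" matches first, "c" then fails at "b",
// so the matcher backtracks into the second alternative "ab".
const aOrAb = alternatives(charMatcher("a"), sequence(charMatcher("a"), charMatcher("b")));
const result = sequence(aOrAb, charMatcher("c"))(start, done);
console.log(result.endIndex); // 3

// Without a sequel, the first alternative wins.
console.log(aOrAb(start, done).endIndex); // 1
```

Passing the continuation into the first alternative is what lets a failure in the sequel reach back to the choice point.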

22.2.2.1.1 RegExp Records

A RegExp Record is a Record value used to store information about a RegExp that is needed during compilation and possibly during matching.

It has the following fields:

Table 66: RegExp Record Fields
Field Name                Value                   Meaning
[[IgnoreCase]]            a Boolean               indicates whether "i" appears in the RegExp's flags
[[Multiline]]             a Boolean               indicates whether "m" appears in the RegExp's flags
[[DotAll]]                a Boolean               indicates whether "s" appears in the RegExp's flags
[[Unicode]]               a Boolean               indicates whether "u" appears in the RegExp's flags
[[UnicodeSets]]           a Boolean               indicates whether "v" appears in the RegExp's flags
[[CapturingGroupsCount]]  a non-negative integer  the number of left-capturing parentheses in the RegExp's pattern

22.2.2.2 Runtime Semantics: CompilePattern

The syntax-directed operation CompilePattern takes argument rer (a RegExp Record) and returns an Abstract Closure that takes a List of characters and a non-negative integer and returns a MatchResult. It is defined piecewise over the following productions:

Pattern :: Disjunction
  1. Let m be CompileSubpattern of Disjunction with arguments rer and forward.
  2. Return a new Abstract Closure with parameters (Input, index) that captures rer and m and performs the following steps when called:
    1. Assert: Input is a List of characters.
    2. Assert: 0 ≤ index ≤ the number of elements in Input.
    3. Let c be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
      1. Assert: y is a MatchState.
      2. Return y.
    4. Let cap be a List of rer.[[CapturingGroupsCount]] undefined values, indexed 1 through rer.[[CapturingGroupsCount]].
    5. Let x be the MatchState { [[Input]]: Input, [[EndIndex]]: index, [[Captures]]: cap }.
    6. Return m(x, c).
Note

A Pattern compiles to an Abstract Closure value. RegExpBuiltinExec can then apply this procedure to a List of characters and an offset within that List to determine whether the pattern would match starting at exactly that offset within the List, and, if it does match, what the values of the capturing parentheses would be. The algorithms in 22.2.2 are designed so that compiling a pattern may throw a SyntaxError exception; on the other hand, once the pattern is successfully compiled, applying the resulting Abstract Closure to find a match in a List of characters cannot throw an exception (except for any implementation-defined exceptions that can occur anywhere such as out-of-memory).
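The compiled closure itself is not exposed to script. As a rough analogue, assumed here for illustration only, the sticky flag together with lastIndex attempts a match at exactly one offset, and a malformed pattern shows that errors arise at compile time.

```javascript
// With the y flag, exec tries to match only at lastIndex.
const sticky = /b/y;

sticky.lastIndex = 1;
const atOne = sticky.exec("abc");    // "b" is at index 1
sticky.lastIndex = 0;
const atZero = sticky.exec("abc");   // "b" is not at index 0

console.log(atOne.index);   // 1
console.log(atZero);        // null

// Compiling a malformed pattern throws a SyntaxError before any matching.
let compileError = null;
try {
  new RegExp("(");
} catch (e) {
  compileError = e;
}
console.log(compileError instanceof SyntaxError); // true
```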

22.2.2.3 Runtime Semantics: CompileSubpattern

The syntax-directed operation CompileSubpattern takes arguments rer (a RegExp Record) and direction (forward or backward) and returns a Matcher.

Note 1

This section is amended in B.1.2.5.

It is defined piecewise over the following productions:

Disjunction :: Alternative | Disjunction
  1. Let m1 be CompileSubpattern of Alternative with arguments rer and direction.
  2. Let m2 be CompileSubpattern of Disjunction with arguments rer and direction.
  3. Return MatchTwoAlternatives(m1, m2).
Note 2

The | regular expression operator separates two alternatives. The pattern first tries to match the left Alternative (followed by the sequel of the regular expression); if it fails, it tries to match the right Disjunction (followed by the sequel of the regular expression). If the left Alternative, the right Disjunction, and the sequel all have choice points, all choices in the sequel are tried before moving on to the next choice in the left Alternative. If choices in the left Alternative are exhausted, the right Disjunction is tried instead of the left Alternative. Any capturing parentheses inside a portion of the pattern skipped by | produce undefined values instead of Strings. Thus, for example,

/a|ab/.exec("abc")

returns the result "a" and not "ab". Moreover,

/((a)|(ab))((c)|(bc))/.exec("abc")

returns the array

["abc", "a", "a", undefined, "bc", undefined, "bc"]

and not

["abc", "ab", undefined, "ab", "c", "c", undefined]

The order in which the two alternatives are tried is independent of the value of direction.

Alternative :: [empty]
  1. Return EmptyMatcher().
Alternative :: Alternative Term
  1. Let m1 be CompileSubpattern of Alternative with arguments rer and direction.
  2. Let m2 be CompileSubpattern of Term with arguments rer and direction.
  3. Return MatchSequence(m1, m2, direction).
Note 3

Consecutive Terms try to simultaneously match consecutive portions of Input. When direction is forward, if the left Alternative, the right Term, and the sequel of the regular expression all have choice points, all choices in the sequel are tried before moving on to the next choice in the right Term, and all choices in the right Term are tried before moving on to the next choice in the left Alternative. When direction is backward, the evaluation order of Alternative and Term is reversed.
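The effect of direction can be seen by placing the same two greedy groups inside a lookbehind, which is matched backward. The pattern and input are arbitrary examples for this sketch:

```javascript
// Forward: the left group is matched first and takes as much as it can.
const forwardMatch = /^(\d+)(\d+)/.exec("1053");
console.log(forwardMatch[1], forwardMatch[2]);    // "105" "3"

// Backward (inside a lookbehind): the right group is matched first,
// so it is the one that takes as much as it can.
const backwardMatch = /(?<=(\d+)(\d+))$/.exec("1053");
console.log(backwardMatch[1], backwardMatch[2]);  // "1" "053"
```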

Term :: Assertion
  1. Return CompileAssertion of Assertion with argument rer.
Note 4

The resulting Matcher is independent of direction.

Term :: Atom
  1. Return CompileAtom of Atom with arguments rer and direction.
Term :: Atom Quantifier
  1. Let m be CompileAtom of Atom with arguments rer and direction.
  2. Let q be CompileQuantifier of Quantifier.
  3. Assert: q.[[Min]] ≤ q.[[Max]].
  4. Let parenIndex be CountLeftCapturingParensBefore(Term).
  5. Let parenCount be CountLeftCapturingParensWithin(Atom).
  6. Return a new Matcher with parameters (x, c) that captures m, q, parenIndex, and parenCount and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Return RepeatMatcher(m, q.[[Min]], q.[[Max]], q.[[Greedy]], x, c, parenIndex, parenCount).

22.2.2.3.1 RepeatMatcher ( m, min, max, greedy, x, c, parenIndex, parenCount )

The abstract operation RepeatMatcher takes arguments m (a Matcher), min (a non-negative integer), max (a non-negative integer or +∞), greedy (a Boolean), x (a MatchState), c (a MatcherContinuation), parenIndex (a non-negative integer), and parenCount (a non-negative integer) and returns a MatchResult. It performs the following steps when called:

  1. If max = 0, return c(x).
  2. Let d be a new MatcherContinuation with parameters (y) that captures m, min, max, greedy, x, c, parenIndex, and parenCount and performs the following steps when called:
    1. Assert: y is a MatchState.
    2. If min = 0 and y.[[EndIndex]] = x.[[EndIndex]], return failure.
    3. If min = 0, let min2 be 0; otherwise let min2 be min - 1.
    4. If max = +∞, let max2 be +∞; otherwise let max2 be max - 1.
    5. Return RepeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount).
  3. Let cap be a copy of x.[[Captures]].
  4. For each integer k in the inclusive interval from parenIndex + 1 to parenIndex + parenCount, set cap[k] to undefined.
  5. Let Input be x.[[Input]].
  6. Let e be x.[[EndIndex]].
  7. Let xr be the MatchState { [[Input]]: Input, [[EndIndex]]: e, [[Captures]]: cap }.
  8. If min ≠ 0, return m(xr, d).
  9. If greedy is false, then
    1. Let z be c(x).
    2. If z is not failure, return z.
    3. Return m(xr, d).
  10. Let z be m(xr, d).
  11. If z is not failure, return z.
  12. Return c(x).
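The steps above translate almost line for line into a recursive function. The following is a sketch under stated assumptions, not an implementation used by any engine: MatchStates are plain objects, failure is a FAILURE symbol, +∞ is Infinity, and lower is a hypothetical Matcher standing in for the atom [a-z].

```javascript
const FAILURE = Symbol("failure");

function repeatMatcher(m, min, max, greedy, x, c, parenIndex, parenCount) {
  if (max === 0) return c(x);                                    // step 1
  const d = (y) => {                                             // step 2
    // Reject an iteration that consumed nothing once min is satisfied.
    if (min === 0 && y.endIndex === x.endIndex) return FAILURE;
    const min2 = min === 0 ? 0 : min - 1;
    const max2 = max === Infinity ? Infinity : max - 1;
    return repeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount);
  };
  const cap = x.captures.slice();                                // step 3
  for (let k = parenIndex + 1; k <= parenIndex + parenCount; k++) {
    cap[k] = undefined;                                          // step 4
  }
  const xr = { input: x.input, endIndex: x.endIndex, captures: cap }; // steps 5-7
  if (min !== 0) return m(xr, d);                                // step 8
  if (!greedy) {                                                 // step 9
    const z = c(x);
    if (z !== FAILURE) return z;
    return m(xr, d);
  }
  const z = m(xr, d);                                            // step 10
  if (z !== FAILURE) return z;                                   // step 11
  return c(x);                                                   // step 12
}

// Hypothetical Matcher for one character in the range a-z.
const lower = (x, c) => {
  const ch = x.input[x.endIndex];
  if (ch === undefined || ch < "a" || ch > "z") return FAILURE;
  return c({ ...x, endIndex: x.endIndex + 1 });
};

const done = (y) => y;
const start = { input: [..."bcdefghi"], endIndex: 0, captures: [] };

// Models [a-z]{2,4} and [a-z]{2,4}? with nothing following the quantifier.
const greedyEnd = repeatMatcher(lower, 2, 4, true, start, done, 0, 0).endIndex;
const lazyEnd = repeatMatcher(lower, 2, 4, false, start, done, 0, 0).endIndex;
console.log(greedyEnd, lazyEnd); // 4 2
```

The greedy call consumes four characters and the non-greedy call stops at the minimum of two, because in the non-greedy case the continuation is tried before another repetition.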
Note 1

An Atom followed by a Quantifier is repeated the number of times specified by the Quantifier. A Quantifier can be non-greedy, in which case the Atom pattern is repeated as few times as possible while still matching the sequel, or it can be greedy, in which case the Atom pattern is repeated as many times as possible while still matching the sequel. The Atom pattern is repeated rather than the input character sequence that it matches, so different repetitions of the Atom can match different input substrings.

Note 2

If the Atom and the sequel of the regular expression all have choice points, the Atom is first matched as many (or as few, if non-greedy) times as possible. All choices in the sequel are tried before moving on to the next choice in the last repetition of Atom. All choices in the last (nth) repetition of Atom are tried before moving on to the next choice in the next-to-last (n - 1)st repetition of Atom; at which point it may turn out that more or fewer repetitions of Atom are now possible; these are exhausted (again, starting with either as few or as many as possible) before moving on to the next choice in the (n - 1)st repetition of Atom and so on.

Compare

/a[a-z]{2,4}/.exec("abcdefghi")

which returns "abcde" with

/a[a-z]{2,4}?/.exec("abcdefghi")

which returns "abc".

Consider also

/(aa|aabaac|ba|b|c)*/.exec("aabaac")

which, by the choice point ordering above, returns the array

["aaba", "ba"]

and not any of:

["aabaac", "aabaac"]
["aabaac", "c"]

The above ordering of choice points can be used to write a regular expression that calculates the greatest common divisor of two numbers (represented in unary notation). The following example calculates the gcd of 10 and 15:

"aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/, "$1")

which returns the gcd in unary notation "aaaaa".
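
The examples in this note can be checked directly. The sketch below runs them as written; unaryGcd is a hypothetical helper that wraps the gcd expression and is only meaningful for positive integers.

```javascript
// Greedy and non-greedy forms of the same quantifier.
const greedy = /a[a-z]{2,4}/.exec("abcdefghi")[0];
const lazy = /a[a-z]{2,4}?/.exec("abcdefghi")[0];

// Choice point ordering: the first alternative that leads to a match wins.
const ordered = Array.from(/(aa|aabaac|ba|b|c)*/.exec("aabaac"));

// Hypothetical helper: gcd of two positive integers via unary notation.
function unaryGcd(m, n) {
  const unary = "a".repeat(m) + "," + "a".repeat(n);
  return unary.replace(/^(a+)\1*,\1+$/, "$1").length;
}
const gcd = unaryGcd(10, 15);
```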

Note 3

Step 4 of the RepeatMatcher clears Atom's captures each time Atom is repeated. We can see its behaviour in the regular expression

/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac")

which returns the array

["zaacbbbcac", "z", "ac", "a", undefined, "c"]

and not

["zaacbbbcac", "z", "ac", "a", "bbb", "c"]

because each iteration of the outermost * clears all captured Strings contained in the quantified Atom, which in this case includes capture Strings numbered 2, 3, 4, and 5.
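
The clearing of captures can be observed directly. The second expression below is an additional, smaller illustration that is not part of the original note.

```javascript
// The example from this note: capture 4 is cleared by the final iteration.
const nested = /(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac");

// Smaller illustration: (a) matched in the first iteration of +,
// but the second iteration (which matches b) clears it.
const simple = /(?:(a)|b)+/.exec("ab");
```

Here nested[4] and simple[1] are both undefined, even though each group matched in an earlier iteration.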

Note 4

Step 2.b of the RepeatMatcher (the second sub-step of step 2, which compares y.[[EndIndex]] with x.[[EndIndex]]) states that once the minimum number of repetitions has been satisfied, any further expansions of Atom that match the empty character sequence are not considered for further repetitions. This prevents the regular expression engine from falling into an infinite loop on patterns such as:

/(a*)*/.exec("b")

or the slightly more complicated:

/(a*)b\1+/.exec("baaaac")

which returns the array

["b", ""]
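
Both expressions in this note terminate and can be evaluated as a check:

```javascript
// The inner a* matches the empty sequence, so that repetition is rejected
// and the group is left without a capture.
const emptyLoop = /(a*)*/.exec("b");

// \1 refers to an empty capture; \1+ accepts one empty repetition (the minimum)
// and then stops instead of looping.
const backref = /(a*)b\1+/.exec("baaaac");
```

The first call yields the empty match with capture 1 undefined; the second yields the array shown above.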

22.2.2.3.2 EmptyMatcher ( )

The abstract operation EmptyMatcher takes no arguments and returns a Matcher. It performs the following steps when called:

  1. Return a new Matcher with parameters (x, c) that captures nothing and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Return c(x).

22.2.2.3.3 MatchTwoAlternatives ( m1, m2 )

The abstract operation MatchTwoAlternatives takes arguments m1 (a Matcher) and m2 (a Matcher) and returns a Matcher. It performs the following steps when called:

  1. Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Let r be m1(x, c).
    4. If r is not failure, return r.
    5. Return m2(x, c).

22.2.2.3.4 MatchSequence ( m1, m2, direction )

The abstract operation MatchSequence takes arguments m1 (a Matcher), m2 (a Matcher), and direction (forward or backward) and returns a Matcher. It performs the following steps when called:

  1. If direction is forward, then
    1. Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
      1. Assert: x is a MatchState.
      2. Assert: c is a MatcherContinuation.
      3. Let d be a new MatcherContinuation with parameters (y) that captures c and m2 and performs the following steps when called:
        1. Assert: y is a MatchState.
        2. Return m2(y, c).
      4. Return m1(x, d).
  2. Else,
    1. Assert: direction is backward.
    2. Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
      1. Assert: x is a MatchState.
      2. Assert: c is a MatcherContinuation.
      3. Let d be a new MatcherContinuation with parameters (y) that captures c and m1 and performs the following steps when called:
        1. Assert: y is a MatchState.
        2. Return m1(y, c).
      4. Return m2(x, d).
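
EmptyMatcher, MatchTwoAlternatives, and MatchSequence can be sketched as small higher-order functions. This is an illustrative model under the same assumptions as a typical continuation-passing matcher: a MatchState is { input, endIndex, captures }, failure is null, and the helper char is a hypothetical single-character Matcher.

```javascript
const FAILURE = null;

const emptyMatcher = () => (x, c) => c(x);

const matchTwoAlternatives = (m1, m2) => (x, c) => {
  const r = m1(x, c);
  return r !== FAILURE ? r : m2(x, c);
};

// Forward: m1 runs first and continues into m2. Backward: the order is swapped.
const matchSequence = (m1, m2, direction) =>
  direction === "forward"
    ? (x, c) => m1(x, (y) => m2(y, c))
    : (x, c) => m2(x, (y) => m1(y, c));

// Hypothetical Matcher for one literal character in the given direction.
const char = (ch, direction) => (x, c) => {
  const f = direction === "forward" ? x.endIndex + 1 : x.endIndex - 1;
  if (f < 0 || f > x.input.length) return FAILURE;
  if (x.input[Math.min(x.endIndex, f)] !== ch) return FAILURE;
  return c({ input: x.input, endIndex: f, captures: x.captures });
};

const accept = (y) => y;

// Models (?:ab|a)b on "ab": the first alternative consumes "ab", the sequel
// fails, and the continuation structure falls back to the second alternative.
const abOrA = matchTwoAlternatives(
  matchSequence(char("a", "forward"), char("b", "forward"), "forward"),
  char("a", "forward")
);
const pattern = matchSequence(abOrA, char("b", "forward"), "forward");
const forwardEnd = pattern({ input: "ab", endIndex: 0, captures: [] }, accept).endIndex;

// Models "ab" matched backward, starting from the end of the input.
const abBackward = matchSequence(char("a", "backward"), char("b", "backward"), "backward");
const backwardEnd = abBackward({ input: "ab", endIndex: 2, captures: [] }, accept).endIndex;

// An empty Matcher in a sequence consumes nothing.
const viaEmpty = matchSequence(emptyMatcher(), char("a", "forward"), "forward");
const emptyEnd = viaEmpty({ input: "ab", endIndex: 0, captures: [] }, accept).endIndex;
```

In this sketch forwardEnd is 2, backwardEnd is 0, and emptyEnd is 1. Backtracking needs no explicit stack: a failed sequel simply returns failure to the caller that still holds the untried alternative.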

22.2.2.4 Runtime Semantics: CompileAssertion

The syntax-directed operation CompileAssertion takes argument rer (a RegExp Record) and returns a Matcher.

Note 1

This section is amended in B.1.2.6.

It is defined piecewise over the following productions:

Assertion :: ^
  1. Return a new Matcher with parameters (x, c) that captures rer and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Let Input be x.[[Input]].
    4. Let e be x.[[EndIndex]].
    5. If e = 0, or if rer.[[Multiline]] is true and the character Input[e - 1] is matched by LineTerminator, then
      1. Return c(x).
    6. Return failure.
Note 2

Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if rer.[[Multiline]] is true) at the beginning of a line.

Assertion :: $
  1. Return a new Matcher with parameters (x, c) that captures rer and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Let Input be x.[[Input]].
    4. Let e be x.[[EndIndex]].
    5. Let InputLength be the number of elements in Input.
    6. If e = InputLength, or if rer.[[Multiline]] is true and the character Input[e] is matched by LineTerminator, then
      1. Return c(x).
    7. Return failure.
Assertion :: \b
  1. Return a new Matcher with parameters (x, c) that captures rer and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Let Input be x.[[Input]].
    4. Let e be x.[[EndIndex]].
    5. Let a be IsWordChar(rer, Input, e - 1).
    6. Let b be IsWordChar(rer, Input, e).
    7. If a is true and b is false, or if a is false and b is true, return c(x).
    8. Return failure.
Assertion :: \B
  1. Return a new Matcher with parameters (x, c) that captures rer and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Let Input be x.[[Input]].
    4. Let e be x.[[EndIndex]].
    5. Let a be IsWordChar(rer, Input, e - 1).
    6. Let b be IsWordChar(rer, Input, e).
    7. If a is true and b is true, or if a is false and b is false, return c(x).
    8. Return failure.
Assertion :: (?= Disjunction )
  1. Let m be CompileSubpattern of Disjunction with arguments rer and forward.
  2. Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
      1. Assert: y is a MatchState.
      2. Return y.
    4. Let r be m(x, d).
    5. If r is failure, return failure.
    6. Assert: r is a MatchState.
    7. Let cap be r.[[Captures]].
    8. Let Input be x.[[Input]].
    9. Let xe be x.[[EndIndex]].
    10. Let z be the MatchState { [[Input]]: Input, [[EndIndex]]: xe, [[Captures]]: cap }.
    11. Return c(z).
Note 3

The form (?= Disjunction ) specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside Disjunction must match at the current position, but the current position is not advanced before matching the sequel. If Disjunction can match at the current position in several ways, only the first one is tried. Unlike other regular expression operators, there is no backtracking into a (?= form (this unusual behaviour is inherited from Perl). This only matters when the Disjunction contains capturing parentheses and the sequel of the pattern contains backreferences to those captures.

For example,

/(?=(a+))/.exec("baaabac")

matches the empty String immediately after the first b and therefore returns the array:

["", "aaa"]

To illustrate the lack of backtracking into the lookahead, consider:

/(?=(a+))a*b\1/.exec("baaabac")

This expression returns

["aba", "a"]

and not:

["aaaba", "a"]
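
The two lookahead examples can be evaluated as written. The third expression below is an added contrast, not part of the original note: it replaces the lookahead with an ordinary capturing group, into which backtracking is permitted.

```javascript
// Zero-width match at index 1, capturing the run of a's that follows.
const first = /(?=(a+))/.exec("baaabac");

// No backtracking into the lookahead: the capture stays as first matched
// at each start position, so the match only succeeds at index 3.
const noBacktrack = /(?=(a+))a*b\1/.exec("baaabac");

// Added contrast: an ordinary group can give back characters, so the
// match starts earlier.
const plainGroup = /(a+)a*b\1/.exec("baaabac");
```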
Assertion :: (?! Disjunction )
  1. Let m be CompileSubpattern of Disjunction with arguments rer and forward.
  2. Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
      1. Assert: y is a MatchState.
      2. Return y.
    4. Let r be m(x, d).
    5. If r is not failure, return failure.
    6. Return c(x).
Note 4

The form (?! Disjunction ) specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside Disjunction must fail to match at the current position. The current position is not advanced before matching the sequel. Disjunction can contain capturing parentheses, but backreferences to them only make sense from within Disjunction itself. Backreferences to these capturing parentheses from elsewhere in the pattern always return undefined because the negative lookahead must fail for the pattern to succeed. For example,

/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")

looks for an a not immediately followed by some positive number n of a's, a b, another n a's (specified by the first \2) and a c. The second \2 is outside the negative lookahead, so it matches against undefined and therefore always succeeds. The whole expression returns the array:

["baaabaac", "ba", undefined, "abaac"]
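
The example can be evaluated as written. The second expression is an added, simpler illustration of negative lookahead with inputs chosen for this sketch.

```javascript
// Capture 2 is undefined outside the negative lookahead, so the trailing \2
// matches the empty sequence.
const result = /(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac");

// Added illustration: "foo" not immediately followed by "bar".
const notBar = [/foo(?!bar)/.test("foobar"), /foo(?!bar)/.test("foobaz")];
```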
Assertion :: (?<= Disjunction )
  1. Let m be CompileSubpattern of Disjunction with arguments rer and backward.
  2. Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
      1. Assert: y is a MatchState.
      2. Return y.
    4. Let r be m(x, d).
    5. If r is failure, return failure.
    6. Assert: r is a MatchState.
    7. Let cap be r.[[Captures]].
    8. Let Input be x.[[Input]].
    9. Let xe be x.[[EndIndex]].
    10. Let z be the MatchState { [[Input]]: Input, [[EndIndex]]: xe, [[Captures]]: cap }.
    11. Return c(z).
Assertion :: (?<! Disjunction )
  1. Let m be CompileSubpattern of Disjunction with arguments rer and backward.
  2. Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
      1. Assert: y is a MatchState.
      2. Return y.
    4. Let r be m(x, d).
    5. If r is not failure, return failure.
    6. Return c(x).
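
This clause gives no examples for the lookbehind forms, so the following are illustrative only, with inputs chosen for this sketch. The last expression shows the effect of compiling the Disjunction with direction backward: the rightmost group is matched first and, being greedy, takes as many digits as it can.

```javascript
// Positive lookbehind: digits immediately preceded by "$".
const after = /(?<=\$)\d+/.exec("cost: $42")[0];

// Negative lookbehind: a run of digits not immediately preceded by "$".
const notAfter = /(?<!\$)\b\d+/.exec("$42 and 17")[0];

// Backward matching inside the lookbehind: the second group is matched first.
const backward = Array.from(/(?<=(\d+)(\d+))$/.exec("1053"));
```

Here after is "42", notAfter is "17", and backward is ["", "1", "053"].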

22.2.2.4.1 IsWordChar ( rer, Input, e )

The abstract operation IsWordChar takes arguments rer (a RegExp Record), Input (a List of characters), and e (an integer) and returns a Boolean. It performs the following steps when called:

  1. Let InputLength be the number of elements in Input.
  2. If e = -1 or e = InputLength, return false.
  3. Let c be the character Input[e].
  4. If WordCharacters(rer) contains c, return true.
  5. Return false.
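
The \b and \B assertions reduce to comparing IsWordChar on either side of the current position. The sketch below assumes the basic word characters only (a to z, A to Z, 0 to 9, and _), ignoring any extra characters that WordCharacters(rer) may add in case-insensitive Unicode modes; the function names are hypothetical.

```javascript
// Models IsWordChar(rer, Input, e) for the basic word characters.
const isWordChar = (input, e) =>
  e !== -1 && e !== input.length && /^[A-Za-z0-9_]$/.test(input[e]);

// \b succeeds at e when exactly one side is a word character.
const isBoundary = (input, e) => isWordChar(input, e - 1) !== isWordChar(input, e);

function boundaries(input) {
  const out = [];
  for (let e = 0; e <= input.length; e++) if (isBoundary(input, e)) out.push(e);
  return out;
}

// The same positions, found with the built-in assertion and the y flag.
function builtinBoundaries(input) {
  const re = /\b/y;
  const out = [];
  for (let e = 0; e <= input.length; e++) {
    re.lastIndex = e;
    if (re.test(input)) out.push(e);
  }
  return out;
}

const modelled = boundaries("hi, you");
const builtin = builtinBoundaries("hi, you");
```

For the input "hi, you" both arrays are [0, 2, 4, 7].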

22.2.2.5 Runtime Semantics: CompileQuantifier

The syntax-directed operation CompileQuantifier takes no arguments and returns a Record with fields [[Min]] (a non-negative integer), [[Max]] (a non-negative integer or +∞), and [[Greedy]] (a Boolean). It is defined piecewise over the following productions:

Quantifier :: QuantifierPrefix
  1. Let qp be CompileQuantifierPrefix of QuantifierPrefix.
  2. Return the Record { [[Min]]: qp.[[Min]], [[Max]]: qp.[[Max]], [[Greedy]]: true }.
Quantifier :: QuantifierPrefix ?
  1. Let qp be CompileQuantifierPrefix of QuantifierPrefix.
  2. Return the Record { [[Min]]: qp.[[Min]], [[Max]]: qp.[[Max]], [[Greedy]]: false }.

22.2.2.6 Runtime Semantics: CompileQuantifierPrefix

The syntax-directed operation CompileQuantifierPrefix takes no arguments and returns a Record with fields [[Min]] (a non-negative integer) and [[Max]] (a non-negative integer or +∞). It is defined piecewise over the following productions:

QuantifierPrefix :: *
  1. Return the Record { [[Min]]: 0, [[Max]]: +∞ }.
QuantifierPrefix :: +
  1. Return the Record { [[Min]]: 1, [[Max]]: +∞ }.
QuantifierPrefix :: ?
  1. Return the Record { [[Min]]: 0, [[Max]]: 1 }.
QuantifierPrefix :: { DecimalDigits }
  1. Let i be the MV of DecimalDigits (see 12.9.3).
  2. Return the Record { [[Min]]: i, [[Max]]: i }.
QuantifierPrefix :: { DecimalDigits ,}
  1. Let i be the MV of DecimalDigits.
  2. Return the Record { [[Min]]: i, [[Max]]: +∞ }.
QuantifierPrefix :: { DecimalDigits , DecimalDigits }
  1. Let i be the MV of the first DecimalDigits.
  2. Let j be the MV of the second DecimalDigits.
  3. Return the Record { [[Min]]: i, [[Max]]: j }.
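
CompileQuantifier and CompileQuantifierPrefix together map the quantifier source text to a Record of minimum, maximum, and greediness. The following sketch does the same for a quantifier given as a string; compileQuantifier and its string-based parsing are assumptions made for illustration, since the specification operates on parse nodes rather than text.

```javascript
function compileQuantifier(source) {
  // A trailing "?" after a complete prefix makes the quantifier non-greedy.
  const greedy = !(source.length > 1 && source.endsWith("?"));
  const prefix = greedy ? source : source.slice(0, -1);
  if (prefix === "*") return { min: 0, max: Infinity, greedy };
  if (prefix === "+") return { min: 1, max: Infinity, greedy };
  if (prefix === "?") return { min: 0, max: 1, greedy };
  // {i}, {i,} or {i,j}
  const m = /^\{(\d+)(,(\d*))?\}$/.exec(prefix);
  if (m === null) throw new SyntaxError("not a quantifier: " + source);
  const min = Number(m[1]);
  const max = m[2] === undefined ? min : m[3] === "" ? Infinity : Number(m[3]);
  return { min, max, greedy };
}

const star = compileQuantifier("*");
const lazyOptional = compileQuantifier("??");
const exact = compileQuantifier("{3}");
const open = compileQuantifier("{3,}");
const lazyRange = compileQuantifier("{2,4}?");
```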

22.2.2.7 Runtime Semantics: CompileAtom

The syntax-directed operation CompileAtom takes arguments rer (a RegExp Record) and direction (forward or backward) and returns a Matcher.

Note 1

This section is amended in B.1.2.7.

It is defined piecewise over the following productions:

Atom :: PatternCharacter
  1. Let ch be the character matched by PatternCharacter.
  2. Let A be a one-element CharSet containing the character ch.
  3. Return CharacterSetMatcher(rer, A, false, direction).
Atom :: .
  1. Let A be AllCharacters(rer).
  2. If rer.[[DotAll]] is not true, then
    1. Remove from A all characters corresponding to a code point on the right-hand side of the LineTerminator production.
  3. Return CharacterSetMatcher(rer, A, false, direction).
Atom :: CharacterClass
  1. Let cc be CompileCharacterClass of CharacterClass with argument rer.
  2. Let cs be cc.[[CharSet]].
  3. If rer.[[UnicodeSets]] is false, or if every CharSetElement of cs consists of a single character (including if cs is empty), return CharacterSetMatcher(rer, cs, cc.[[Invert]], direction).
  4. Assert: cc.[[Invert]] is false.
  5. Let lm be an empty List of Matchers.
  6. For each CharSetElement s in cs containing more than 1 character, iterating in descending order of length, do
    1. Let cs2 be a one-element CharSet containing the last code point of s.
    2. Let m2 be CharacterSetMatcher(rer, cs2, false, direction).
    3. For each code point c1 in s, iterating backwards from its second-to-last code point, do
      1. Let cs1 be a one-element CharSet containing c1.
      2. Let m1 be CharacterSetMatcher(rer, cs1, false, direction).
      3. Set m2 to MatchSequence(m1, m2, direction).
    4. Append m2 to lm.
  7. Let singles be the CharSet containing every CharSetElement of cs that consists of a single character.
  8. Append CharacterSetMatcher(rer, singles, false, direction) to lm.
  9. If cs contains the empty sequence of characters, append EmptyMatcher() to lm.
  10. Let m2 be the last Matcher in lm.
  11. For each Matcher m1 of lm, iterating backwards from its second-to-last element, do
    1. Set m2 to MatchTwoAlternatives(m1, m2).
  12. Return m2.
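
The ordering built by these steps (multi-character strings from longest to shortest, then all single characters, then the empty sequence) can be sketched as follows. This is a simplified model: firstAlternativeAt is a hypothetical helper that takes the CharSetElements as an array of strings and reports only the first alternative that matches at a position, without modelling the sequel or backtracking. It corresponds roughly to a class such as [\q{abc|ab|a}] under the v flag.

```javascript
function firstAlternativeAt(elements, input, pos) {
  const len = (s) => [...s].length; // length in code points
  // Multi-character strings, longest first.
  const multi = elements.filter((s) => len(s) > 1).sort((a, b) => len(b) - len(a));
  for (const s of multi) if (input.startsWith(s, pos)) return s;
  // Then the set of single characters.
  for (const s of elements) if (len(s) === 1 && input.startsWith(s, pos)) return s;
  // Finally the empty sequence, if the set contains it.
  if (elements.includes("")) return "";
  return null; // models failure
}

const elements = ["a", "ab", "abc", ""];
const longest = firstAlternativeAt(elements, "abcd", 0);
const shorter = firstAlternativeAt(elements, "abd", 0);
const single = firstAlternativeAt(elements, "axx", 0);
const empty = firstAlternativeAt(elements, "xyz", 0);
const none = firstAlternativeAt(["a", "ab"], "xyz", 0);
```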
Atom :: ( GroupSpecifieropt Disjunction )
  1. Let m be CompileSubpattern of Disjunction with arguments rer and direction.
  2. Let parenIndex be CountLeftCapturingParensBefore(Atom).
  3. Return a new Matcher with parameters (x, c) that captures direction, m, and parenIndex and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Let d be a new MatcherContinuation with parameters (y) that captures x, c, direction, and parenIndex and performs the following steps when called:
      1. Assert: y is a MatchState.
      2. Let cap be a copy of y.[[Captures]].
      3. Let Input be x.[[Input]].
      4. Let xe be x.[[EndIndex]].
      5. Let ye be y.[[EndIndex]].
      6. If direction is forward, then
        1. Assert: xe ≤ ye.
        2. Let r be the CaptureRange { [[StartIndex]]: xe, [[EndIndex]]: ye }.
      7. Else,
        1. Assert: direction is backward.
        2. Assert: ye ≤ xe.
        3. Let r be the CaptureRange { [[StartIndex]]: ye, [[EndIndex]]: xe }.
      8. Set cap[parenIndex + 1] to r.
      9. Let z be the MatchState { [[Input]]: Input, [[EndIndex]]: ye, [[Captures]]: cap }.
      10. Return c(z).
    4. Return m(x, d).
Note 2

Parentheses of the form ( Disjunction ) serve both to group the components of the Disjunction pattern together and to save the result of the match. The result can be used either in a backreference (\ followed by a non-zero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching Abstract Closure. To inhibit the capturing behaviour of parentheses, use the form (?: Disjunction ) instead.

Atom :: (?: Disjunction )
  1. Return CompileSubpattern of Disjunction with arguments rer and direction.
AtomEscape :: DecimalEscape
  1. Let n be the CapturingGroupNumber of DecimalEscape.
  2. Assert: n ≤ rer.[[CapturingGroupsCount]].
  3. Return BackreferenceMatcher(rer, n, direction).
Note 3

An escape sequence of the form \ followed by a non-zero decimal number n matches the result of the nth set of capturing parentheses (22.2.2.1). It is an error if the regular expression has fewer than n capturing parentheses. If the regular expression has n or more capturing parentheses but the nth one is undefined because it has not captured anything, then the backreference always succeeds.
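
The behaviour described in this note can be observed as follows. The inputs are chosen for illustration. The u flag is used for the error case because, without it, the web-compatibility grammar of Annex B may interpret such an escape differently.

```javascript
// Capture 1 did not participate in the match, so \1 succeeds without
// consuming anything.
const skipped = /(?:(a)|b)\1/.exec("b");

// A backreference that precedes its group is still undefined when reached.
const forwardRef = /\1(a)/.exec("a");

// Fewer than n capturing parentheses is an error in Unicode mode.
let error = null;
try {
  new RegExp("(a)\\2", "u");
} catch (e) {
  error = e;
}
```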

AtomEscape :: CharacterEscape
  1. Let cv be the CharacterValue of CharacterEscape.
  2. Let ch be the character whose character value is cv.
  3. Let A be a one-element CharSet containing the character ch.
  4. Return CharacterSetMatcher(rer, A, false, direction).
AtomEscape :: CharacterClassEscape
  1. Let cs be CompileToCharSet of CharacterClassEscape with argument rer.
  2. If rer.[[UnicodeSets]] is false, or if every CharSetElement of cs consists of a single character (including if cs is empty), return CharacterSetMatcher(rer, cs, false, direction).
  3. Let lm be an empty List of Matchers.
  4. For each CharSetElement s in cs containing more than 1 character, iterating in descending order of length, do
    1. Let cs2 be a one-element CharSet containing the last code point of s.
    2. Let m2 be CharacterSetMatcher(rer, cs2, false, direction).
    3. For each code point c1 in s, iterating backwards from its second-to-last code point, do
      1. Let cs1 be a one-element CharSet containing c1.
      2. Let m1 be CharacterSetMatcher(rer, cs1, false, direction).
      3. Set m2 to MatchSequence(m1, m2, direction).
    4. Append m2 to lm.
  5. Let singles be the CharSet containing every CharSetElement of cs that consists of a single character.
  6. Append CharacterSetMatcher(rer, singles, false, direction) to lm.
  7. If cs contains the empty sequence of characters, append EmptyMatcher() to lm.
  8. Let m2 be the last Matcher in lm.
  9. For each Matcher m1 of lm, iterating backwards from its second-to-last element, do
    1. Set m2 to MatchTwoAlternatives(m1, m2).
  10. Return m2.
AtomEscape :: k GroupName
  1. Let matchingGroupSpecifiers be GroupSpecifiersThatMatch(GroupName).
  2. Assert: matchingGroupSpecifiers contains a single GroupSpecifier.
  3. Let groupSpecifier be the sole element of matchingGroupSpecifiers.
  4. Let parenIndex be CountLeftCapturingParensBefore(groupSpecifier).
  5. Return BackreferenceMatcher(rer, parenIndex, direction).

22.2.2.7.1 CharacterSetMatcher ( rer, A, invert, direction )

The abstract operation CharacterSetMatcher takes arguments rer (a RegExp Record), A (a CharSet), invert (a Boolean), and direction (forward or backward) and returns a Matcher. It performs the following steps when called:

  1. If rer.[[UnicodeSets]] is true, then
    1. Assert: invert is false.
    2. Assert: Every CharSetElement of A consists of a single character.
  2. Return a new Matcher with parameters (x, c) that captures rer, A, invert, and direction and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Let Input be x.[[Input]].
    4. Let e be x.[[EndIndex]].
    5. If direction is forward, let f be e + 1.
    6. Else, let f be e - 1.
    7. Let InputLength be the number of elements in Input.
    8. If f < 0 or f > InputLength, return failure.
    9. Let index be min(e, f).
    10. Let ch be the character Input[index].
    11. Let cc be Canonicalize(rer, ch).
    12. If there exists a CharSetElement in A containing exactly one character a such that Canonicalize(rer, a) is cc, let found be true. Otherwise, let found be false.
    13. If invert is false and found is false, return failure.
    14. If invert is true and found is true, return failure.
    15. Let cap be x.[[Captures]].
    16. Let y be the MatchState { [[Input]]: Input, [[EndIndex]]: f, [[Captures]]: cap }.
    17. Return c(y).
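
A minimal sketch of CharacterSetMatcher follows. It assumes a MatchState of the form { input, endIndex, captures }, null for failure, a Set of single-character strings for A, and an optional canonicalize function standing in for Canonicalize(rer, ch); all of these names are hypothetical.

```javascript
const FAILURE = null;

function characterSetMatcher(A, invert, direction, canonicalize = (ch) => ch) {
  return (x, c) => {
    const e = x.endIndex;
    const f = direction === "forward" ? e + 1 : e - 1;
    if (f < 0 || f > x.input.length) return FAILURE;
    // The character examined lies between e and f.
    const cc = canonicalize(x.input[Math.min(e, f)]);
    const found = [...A].some((a) => canonicalize(a) === cc);
    // Fail when (not inverted and not found) or (inverted and found).
    if (found === invert) return FAILURE;
    return c({ input: x.input, endIndex: f, captures: x.captures });
  };
}

const accept = (y) => y;
const digits = new Set("0123456789");
const state = (input, endIndex) => ({ input, endIndex, captures: [] });

const fwd = characterSetMatcher(digits, false, "forward")(state("7x", 0), accept);
const bwd = characterSetMatcher(digits, false, "backward")(state("x7", 2), accept);
const inverted = characterSetMatcher(digits, true, "forward")(state("7x", 1), accept);
const atEnd = characterSetMatcher(digits, false, "forward")(state("7", 1), accept);
const folded = characterSetMatcher(new Set("a"), false, "forward", (ch) => ch.toUpperCase())(
  state("A", 0),
  accept
);
```

In this sketch fwd, bwd, and inverted end at indices 1, 1, and 2 respectively, atEnd is failure, and folded succeeds because both sides canonicalize to "A".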

22.2.2.7.2 BackreferenceMatcher ( rer, n, direction )

The abstract operation BackreferenceMatcher takes arguments rer (a RegExp Record), n (a positive integer), and direction (forward or backward) and returns a Matcher. It performs the following steps when called:

  1. Assert: n ≥ 1.
  2. Return a new Matcher with parameters (x, c) that captures rer, n, and direction and performs the following steps when called:
    1. Assert: x is a MatchState.
    2. Assert: c is a MatcherContinuation.
    3. Let Input be x.[[Input]].
    4. Let cap be x.[[Captures]].
    5. Let r be cap[n].
    6. If r is undefined, return c(x).
    7. Let e be x.[[EndIndex]].
    8. Let rs be r.[[StartIndex]].
    9. Let re be r.[[EndIndex]].
    10. Let len be re - rs.
    11. If direction is forward, let f be e + len.
    12. Else, let f be e - len.
    13. Let InputLength be the number of elements in Input.
    14. If f < 0 or f > InputLength, return failure.
    15. Let g be min(e, f).
    16. If there exists an integer i in the interval from 0 (inclusive) to len (exclusive) such that Canonicalize(rer, Input[rs + i]) is not Canonicalize(rer, Input[g + i]), return failure.
    17. Let y be the MatchState { [[Input]]: Input, [[EndIndex]]: f, [[Captures]]: cap }.
    18. Return c(y).

22.2.2.7.3 Canonicalize ( rer, ch )

The abstract operation Canonicalize takes arguments rer (a RegExp Record) and ch (a character) and returns a character. It performs the following steps when called:

  1. If HasEitherUnicodeFlag(rer) is true and rer.[[IgnoreCase]] is true, then
    1. If the file CaseFolding.txt of the Unicode Character Database provides a simple or common case folding mapping for ch, return the result of applying that mapping to ch.
    2. Return ch.
  2. If rer.[[IgnoreCase]] is false, return ch.
  3. Assert: ch is a UTF-16 code unit.
  4. Let cp be the code point whose numeric value is the numeric value of ch.
  5. Let u be the result of toUppercase(« cp »), according to the Unicode Default Case Conversion algorithm.
  6. Let uStr be CodePointsToString(u).
  7. If the length of uStr ≠ 1, return ch.
  8. Let cu be uStr's single code unit element.
  9. If the numeric value of ch ≥ 128 and the numeric value of cu < 128, return ch.
  10. Return cu.
Note

In case-insignificant matches when HasEitherUnicodeFlag(rer) is true, all characters are implicitly case-folded using the simple mapping provided by the Unicode Standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, ß (U+00DF LATIN SMALL LETTER SHARP S) to ss or SS. It may however map code points outside the Basic Latin block to code points within it; for example, ſ (U+017F LATIN SMALL LETTER LONG S) case-folds to s (U+0073 LATIN SMALL LETTER S) and K (U+212A KELVIN SIGN) case-folds to k (U+006B LATIN SMALL LETTER K). Strings containing those code points are matched by regular expressions such as /[a-z]/ui.

In case-insignificant matches when HasEitherUnicodeFlag(rer) is false, the mapping is based on Unicode Default Case Conversion algorithm toUppercase rather than toCasefold, which results in some subtle differences. For example, Ω (U+2126 OHM SIGN) is mapped by toUppercase to itself but by toCasefold to ω (U+03C9 GREEK SMALL LETTER OMEGA) along with Ω (U+03A9 GREEK CAPITAL LETTER OMEGA), so "\u2126" is matched by /[ω]/ui and /[\u03A9]/ui but not by /[ω]/i or /[\u03A9]/i. Also, no code point outside the Basic Latin block is mapped to a code point within it, so strings such as "\u017F ſ" and "\u212A K" are not matched by /[a-z]/i.
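
The differences described in this note can be observed with the built-in RegExp. The function canonicalizeLegacy is a hypothetical approximation of steps 3 to 10 of Canonicalize for a single UTF-16 code unit; it assumes that String.prototype.toUpperCase gives the same result as toUppercase for the characters shown.

```javascript
// With the u flag, simple case folding applies.
const longSUnicode = /[a-z]/ui.test("\u017F");
const kelvinUnicode = /[a-z]/ui.test("\u212A");
const ohmUnicode = /[ω]/ui.test("\u2126");

// Without it, the toUppercase-based mapping applies.
const longSLegacy = /[a-z]/i.test("\u017F");
const kelvinLegacy = /[a-z]/i.test("\u212A");
const ohmLegacy = /[ω]/i.test("\u2126");

// Approximation of the non-Unicode branch (ignoreCase assumed true).
function canonicalizeLegacy(ch) {
  const u = ch.toUpperCase();
  if (u.length !== 1) return ch; // e.g. ß uppercases to "SS"
  if (ch.charCodeAt(0) >= 128 && u.charCodeAt(0) < 128) return ch; // e.g. ſ
  return u;
}
```

The three Unicode-mode tests are true and the three legacy tests are false. In the approximation, canonicalizeLegacy("ß") and canonicalizeLegacy("\u017F") return their argument unchanged, while canonicalizeLegacy("a") returns "A".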

22.2.2.8 Runtime Semantics: CompileCharacterClass

The syntax-directed operation CompileCharacterClass takes argument rer (a RegExp Record) and returns a Record with fields [[CharSet]] (a CharSet) and [[Invert]] (a Boolean). It is defined piecewise over the following productions:

CharacterClass :: [ ClassContents ]
  1. Let A be CompileToCharSet of ClassContents with argument rer.
  2. Return the Record { [[CharSet]]: A, [[Invert]]: false }.
CharacterClass :: [^ ClassContents ]
  1. Let A be CompileToCharSet of ClassContents with argument rer.
  2. If rer.[[UnicodeSets]] is true, then
    1. Return the Record { [[CharSet]]: CharacterComplement(rer, A), [[Invert]]: false }.
  3. Return the Record { [[CharSet]]: A, [[Invert]]: true }.

22.2.2.9 Runtime Semantics: CompileToCharSet

The syntax-directed operation CompileToCharSet takes argument rer (a RegExp Record) and returns a CharSet.

Note 1

This section is amended in B.1.2.8.

It is defined piecewise over the following productions:

ClassContents :: [empty]
  1. Return the empty CharSet.
NonemptyClassRanges :: ClassAtom NonemptyClassRangesNoDash
  1. Let A be CompileToCharSet of ClassAtom with argument rer.
  2. Let B be CompileToCharSet of NonemptyClassRangesNoDash with argument rer.
  3. Return the union of CharSets A and B.
NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents
  1. Let A be CompileToCharSet of the first ClassAtom with argument rer.
  2. Let B be CompileToCharSet of the second ClassAtom with argument rer.
  3. Let C be CompileToCharSet of ClassContents with argument rer.
  4. Let D be CharacterRange(A, B).
  5. Return the union of D and C.
NonemptyClassRangesNoDash :: ClassAtomNoDash NonemptyClassRangesNoDash
  1. Let A be CompileToCharSet of ClassAtomNoDash with argument rer.
  2. Let B be CompileToCharSet of NonemptyClassRangesNoDash with argument rer.
  3. Return the union of CharSets A and B.
NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents
  1. Let A be CompileToCharSet of ClassAtomNoDash with argument rer.
  2. Let B be CompileToCharSet of ClassAtom with argument rer.
  3. Let C be CompileToCharSet of ClassContents with argument rer.
  4. Let D be CharacterRange(A, B).
  5. Return the union of D and C.
Note 2

ClassContents can expand into a single ClassAtom and/or ranges of two ClassAtom separated by dashes. In the latter case the ClassContents includes all characters between the first ClassAtom and the second ClassAtom, inclusive; an error occurs if either ClassAtom does not represent a single character (for example, if one is \w) or if the first ClassAtom's character value is strictly greater than the second ClassAtom's character value.

Note 3

Even if the pattern ignores case, the case of the two ends of a range is significant in determining which characters belong to the range. Thus, for example, the pattern /[E-F]/i matches only the letters E, F, e, and f, while the pattern /[E-f]/i matches all uppercase and lowercase letters in the Unicode Basic Latin block as well as the symbols [, \, ], ^, _, and `.

Note 4

A - character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of ClassContents, the beginning or end limit of a range specification, or immediately follows a range specification.
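
The range behaviour described in Notes 2 to 4 can be checked with a few expressions. The specific inputs are chosen for illustration.

```javascript
// Note 3: the case of the range ends is significant even with the i flag.
const narrow = ["e", "F", "g"].map((s) => /[E-F]/i.test(s));
const wide = ["z", "A", "_", "1"].map((s) => /[E-f]/i.test(s));

// Note 4: a dash is literal at the start or end of the class, and
// immediately after a range.
const leading = /[-a]/.test("-");
const trailing = /[a-]/.test("-");
const afterRange = ["-", "e", "d"].map((s) => /[a-c-e]/.test(s));

// Note 2: a range whose first end is greater than its second is an error.
let rangeError = null;
try {
  new RegExp("[z-a]");
} catch (e) {
  rangeError = e;
}
```

Here narrow is [true, true, false], wide is [true, true, true, false], afterRange is [true, true, false], and rangeError is a SyntaxError.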

ClassAtom :: -
  1. Return the CharSet containing the single character - U+002D (HYPHEN-MINUS).
ClassAtomNoDash :: SourceCharacter but not one of \ or ] or -
  1. Return the CharSet containing the character matched by SourceCharacter.
ClassEscape :: b
ClassEscape :: -
ClassEscape :: CharacterEscape
  1. Let cv be the CharacterValue of this ClassEscape.
  2. Let c be the character whose character value is cv.
  3. Return the CharSet containing the single character c.
Note 5

A ClassAtom can use any of the escape sequences that are allowed in the rest of the regular expression except for \b, \B, and backreferences. Inside a CharacterClass, \b means the backspace character, while \B and backreferences raise errors.

CharacterClassEscape :: d
  1. Return the ten-element CharSet containing the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.
CharacterClassEscape :: D
  1. Let S be the CharSet returned by CharacterClassEscape :: d .
  2. Return CharacterComplement(rer, S).
CharacterClassEscape :: s
  1. Return the CharSet containing all characters corresponding to a code point on the right-hand side of the WhiteSpace or LineTerminator productions.
CharacterClassEscape :: S
  1. Let S be the CharSet returned by CharacterClassEscape :: s .
  2. Return CharacterComplement(rer, S).
CharacterClassEscape :: w
  1. Return MaybeSimpleCaseFolding(rer, WordCharacters(rer)).
CharacterClassEscape :: W
  1. Let S be the CharSet returned by CharacterClassEscape :: w .
  2. Return CharacterComplement(rer, S).
CharacterClassEscape :: p{ UnicodePropertyValueExpression }
  1. Return CompileToCharSet of UnicodePropertyValueExpression with argument rer.
CharacterClassEscape :: P{ UnicodePropertyValueExpression }
  1. Let S be CompileToCharSet of UnicodePropertyValueExpression with argument rer.
  2. Assert: S contains only single code points.
  3. Return CharacterComplement(rer, S).
UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue
  1. Let ps be the source text matched by UnicodePropertyName.
  2. Let p be UnicodeMatchProperty(rer, ps).
  3. Assert: p is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 67.
  4. Let vs be the source text matched by UnicodePropertyValue.
  5. Let v be UnicodeMatchPropertyValue(p, vs).
  6. Let A be the CharSet containing all Unicode code points whose character database definition includes the property p with value v.
  7. Return MaybeSimpleCaseFolding(rer, A).
UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue
  1. Let s be the source text matched by LoneUnicodePropertyNameOrValue.
  2. If UnicodeMatchPropertyValue(General_Category, s) is a Unicode property value or property value alias for the General_Category (gc) property listed in PropertyValueAliases.txt, then
    1. Return the CharSet containing all Unicode code points whose character database definition includes the property “General_Category” with value s.
  3. Let p be UnicodeMatchProperty(rer, s).
  4. Assert: p is a binary Unicode property or binary property alias listed in the “Property name and aliases” column of Table 68, or a binary Unicode property of strings listed in the “Property name” column of Table 69.
  5. Let A be the CharSet containing all CharSetElements whose character database definition includes the property p with value “True”.
  6. Return MaybeSimpleCaseFolding(rer, A).
ClassUnion :: ClassSetRange ClassUnionopt
  1. Let A be CompileToCharSet of ClassSetRange with argument rer.
  2. If ClassUnion is present, then
    1. Let B be CompileToCharSet of ClassUnion with argument rer.
    2. Return the union of CharSets A and B.
  3. Return A.
ClassUnion :: ClassSetOperand ClassUnionopt
  1. Let A be CompileToCharSet of ClassSetOperand with argument rer.
  2. If ClassUnion is present, then
    1. Let B be CompileToCharSet of ClassUnion with argument rer.
    2. Return the union of CharSets A and B.
  3. Return A.
ClassIntersection :: ClassSetOperand && ClassSetOperand
  1. Let A be CompileToCharSet of the first ClassSetOperand with argument rer.
  2. Let B be CompileToCharSet of the second ClassSetOperand with argument rer.
  3. Return the intersection of CharSets A and B.
ClassIntersection :: ClassIntersection && ClassSetOperand
  1. Let A be CompileToCharSet of the ClassIntersection with argument rer.
  2. Let B be CompileToCharSet of the ClassSetOperand with argument rer.
  3. Return the intersection of CharSets A and B.
ClassSubtraction :: ClassSetOperand -- ClassSetOperand
  1. Let A be CompileToCharSet of the first ClassSetOperand with argument rer.
  2. Let B be CompileToCharSet of the second ClassSetOperand with argument rer.
  3. Return the CharSet containing the CharSetElements of A which are not also CharSetElements of B.
ClassSubtraction :: ClassSubtraction -- ClassSetOperand
  1. Let A be CompileToCharSet of the ClassSubtraction with argument rer.
  2. Let B be CompileToCharSet of the ClassSetOperand with argument rer.
  3. Return the CharSet containing the CharSetElements of A which are not also CharSetElements of B.
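
The union, intersection, and subtraction steps above can be pictured with ordinary sets. The following is a minimal sketch that models a CharSet as a JavaScript Set of strings; that modelling, and the helper names union, intersection, and subtraction, are assumptions of this example and not part of the specification.

```javascript
// CharSets are modelled as Sets of strings; single characters and
// multi-character strings are both elements (CharSetElements).
const union = (a, b) => new Set([...a, ...b]);
const intersection = (a, b) => new Set([...a].filter((e) => b.has(e)));
const subtraction = (a, b) => new Set([...a].filter((e) => !b.has(e)));

const A = new Set(["a", "b", "c", "ab"]);
const B = new Set(["b", "c", "d"]);
const C = new Set(["ab"]);

const unionAB = [...union(A, B)].sort();          // ClassUnion
const intersectAB = [...intersection(A, B)].sort(); // ClassIntersection, A && B
const subtractAB = [...subtraction(A, B)].sort();   // ClassSubtraction, A -- B

// The left-recursive productions fold from the left: A -- B -- C is (A -- B) -- C.
const chained = [...subtraction(subtraction(A, B), C)].sort();
```
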
ClassSetRange :: ClassSetCharacter - ClassSetCharacter
  1. Let A be CompileToCharSet of the first ClassSetCharacter with argument rer.
  2. Let B be CompileToCharSet of the second ClassSetCharacter with argument rer.
  3. Return MaybeSimpleCaseFolding(rer, CharacterRange(A, B)).
Note 6

The result will often consist of two or more ranges. When UnicodeSets is true and IgnoreCase is true, then MaybeSimpleCaseFolding(rer, [Ā-č]) will include only the odd-numbered code points of that range.

ClassSetOperand :: ClassSetCharacter
  1. Let A be CompileToCharSet of ClassSetCharacter with argument rer.
  2. Return MaybeSimpleCaseFolding(rer, A).
ClassSetOperand :: ClassStringDisjunction
  1. Let A be CompileToCharSet of ClassStringDisjunction with argument rer.
  2. Return MaybeSimpleCaseFolding(rer, A).
ClassSetOperand :: NestedClass
  1. Return CompileToCharSet of NestedClass with argument rer.
NestedClass :: [ ClassContents ]
  1. Return CompileToCharSet of ClassContents with argument rer.
NestedClass :: [^ ClassContents ]
  1. Let A be CompileToCharSet of ClassContents with argument rer.
  2. Return CharacterComplement(rer, A).
NestedClass :: \ CharacterClassEscape
  1. Return CompileToCharSet of CharacterClassEscape with argument rer.
ClassStringDisjunction :: \q{ ClassStringDisjunctionContents }
  1. Return CompileToCharSet of ClassStringDisjunctionContents with argument rer.
ClassStringDisjunctionContents :: ClassString
  1. Let s be CompileClassSetString of ClassString with argument rer.
  2. Return the CharSet containing the one string s.
ClassStringDisjunctionContents :: ClassString | ClassStringDisjunctionContents
  1. Let s be CompileClassSetString of ClassString with argument rer.
  2. Let A be the CharSet containing the one string s.
  3. Let B be CompileToCharSet of ClassStringDisjunctionContents with argument rer.
  4. Return the union of CharSets A and B.
ClassSetCharacter :: SourceCharacter but not ClassSetSyntaxCharacter
ClassSetCharacter :: \ CharacterEscape
ClassSetCharacter :: \ ClassSetReservedPunctuator
  1. Let cv be the CharacterValue of this ClassSetCharacter.
  2. Let c be the character whose character value is cv.
  3. Return the CharSet containing the single character c.
ClassSetCharacter :: \b
  1. Return the CharSet containing the single character U+0008 (BACKSPACE).

22.2.2.9.1 CharacterRange ( A, B )

The abstract operation CharacterRange takes arguments A (a CharSet) and B (a CharSet) and returns a CharSet. It performs the following steps when called:

  1. Assert: A and B each contain exactly one character.
  2. Let a be the one character in CharSet A.
  3. Let b be the one character in CharSet B.
  4. Let i be the character value of character a.
  5. Let j be the character value of character b.
  6. Assert: i ≤ j.
  7. Return the CharSet containing all characters with a character value in the inclusive interval from i to j.
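
A minimal sketch of these steps follows. It assumes a CharSet can be modelled as a Set of single-character strings and uses code points as character values (the Unicode-mode interpretation); the function name characterRange is hypothetical.

```javascript
function characterRange(A, B) {
  // Step 1: each argument holds exactly one character.
  if (A.size !== 1 || B.size !== 1) throw new Error("expected one character each");
  const [a] = A;
  const [b] = B;
  const i = a.codePointAt(0); // character value of a
  const j = b.codePointAt(0); // character value of b
  // Step 6: the range must not be reversed.
  if (i > j) throw new Error("assertion failed: i <= j");
  const result = new Set();
  for (let cv = i; cv <= j; cv++) result.add(String.fromCodePoint(cv));
  return result;
}

const members = [...characterRange(new Set(["a"]), new Set(["e"]))].join("");
```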

22.2.2.9.2 HasEitherUnicodeFlag ( rer )

The abstract operation HasEitherUnicodeFlag takes argument rer (a RegExp Record) and returns a Boolean. It performs the following steps when called:

  1. If rer.[[Unicode]] is true or rer.[[UnicodeSets]] is true, then
    1. Return true.
  2. Return false.

22.2.2.9.3 WordCharacters ( rer )

The abstract operation WordCharacters takes argument rer (a RegExp Record) and returns a CharSet. Returns a CharSet containing the characters considered "word characters" for the purposes of \b, \B, \w, and \W. It performs the following steps when called:

  1. Let basicWordChars be the CharSet containing every character in the ASCII word characters.
  2. Let extraWordChars be the CharSet containing all characters c such that c is not in basicWordChars but Canonicalize(rer, c) is in basicWordChars.
  3. Assert: extraWordChars is empty unless HasEitherUnicodeFlag(rer) is true and rer.[[IgnoreCase]] is true.
  4. Return the union of basicWordChars and extraWordChars.
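
The effect of extraWordChars can be observed from script. The sketch below probes \w with two characters whose case folding lands in the ASCII letters; the choice of U+017F and U+212A as probes is an assumption of this example.

```javascript
const longS = "\u017F";  // LATIN SMALL LETTER LONG S, folds to "s"
const kelvin = "\u212A"; // KELVIN SIGN, folds to "k"

const wordChar = {
  plain: /\w/.test(longS),
  ignoreCaseOnly: /\w/i.test(longS),
  unicodeOnly: /\w/u.test(longS),
  // Only the combination of a Unicode flag and IgnoreCase adds extra characters.
  unicodeIgnoreCase: /\w/iu.test(longS),
  kelvinUnicodeIgnoreCase: /\w/iu.test(kelvin),
};
```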

22.2.2.9.4 AllCharacters ( rer )

The abstract operation AllCharacters takes argument rer (a RegExp Record) and returns a CharSet. Returns the set of “all characters” according to the regular expression flags. It performs the following steps when called:

  1. If rer.[[UnicodeSets]] is true and rer.[[IgnoreCase]] is true, then
    1. Return the CharSet containing all Unicode code points c that do not have a Simple Case Folding mapping (that is, scf(c)=c).
  2. Else if HasEitherUnicodeFlag(rer) is true, then
    1. Return the CharSet containing all code point values.
  3. Else,
    1. Return the CharSet containing all code unit values.

22.2.2.9.5 MaybeSimpleCaseFolding ( rer, A )

The abstract operation MaybeSimpleCaseFolding takes arguments rer (a RegExp Record) and A (a CharSet) and returns a CharSet. If rer.[[UnicodeSets]] is false or rer.[[IgnoreCase]] is false, it returns A. Otherwise, it uses the Simple Case Folding (scf(cp)) definitions in the file CaseFolding.txt of the Unicode Character Database (each of which maps a single code point to another single code point) to map each CharSetElement of A character-by-character into a canonical form and returns the resulting CharSet. It performs the following steps when called:

  1. If rer.[[UnicodeSets]] is false or rer.[[IgnoreCase]] is false, return A.
  2. Let B be a new empty CharSet.
  3. For each CharSetElement s of A, do
    1. Let t be an empty sequence of characters.
    2. For each single code point cp in s, do
      1. Append scf(cp) to t.
    3. Add t to B.
  4. Return B.
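
The loop above can be sketched as follows. JavaScript exposes no scf() function, so this example substitutes a deliberately tiny, hypothetical mapping table (SCF_SAMPLE) for CaseFolding.txt; the record shape { unicodeSets, ignoreCase } and the Set-of-strings model of a CharSet are likewise assumptions.

```javascript
// Hypothetical stand-in for the Simple Case Folding data; NOT the full table.
const SCF_SAMPLE = new Map([
  ["A", "a"],
  ["B", "b"],
  ["\u212A", "k"], // KELVIN SIGN
  ["\u017F", "s"], // LATIN SMALL LETTER LONG S
]);
const scf = (cp) => (SCF_SAMPLE.has(cp) ? SCF_SAMPLE.get(cp) : cp);

function maybeSimpleCaseFolding(rer, A) {
  if (!rer.unicodeSets || !rer.ignoreCase) return A; // step 1
  const B = new Set();
  for (const s of A) {
    let t = "";
    for (const cp of s) t += scf(cp); // for...of walks a string by code point
    B.add(t);
  }
  return B;
}

const input = new Set(["A", "a", "AB", "\u212A"]);
const folded = [...maybeSimpleCaseFolding({ unicodeSets: true, ignoreCase: true }, input)];
const untouched = maybeSimpleCaseFolding({ unicodeSets: true, ignoreCase: false }, input);
```

Because "A" and "a" fold to the same element, the folded set is smaller than the input.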

22.2.2.9.6 CharacterComplement ( rer, S )

The abstract operation CharacterComplement takes arguments rer (a RegExp Record) and S (a CharSet) and returns a CharSet. It performs the following steps when called:

  1. Let A be AllCharacters(rer).
  2. Return the CharSet containing the CharSetElements of A which are not also CharSetElements of S.
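
Because the complement is taken relative to AllCharacters(rer), a negated class ranges over code units or code points depending on the flags. A small sketch, using a supplementary-plane character chosen for this example:

```javascript
const astral = "\u{1F4A9}"; // one code point, two UTF-16 code units

// Without u or v the complement is over code units, so one [^x] cannot
// consume the whole surrogate pair.
const codeUnitMode = /^[^x]$/.test(astral);

// With u the complement is over code points.
const codePointMode = /^[^x]$/u.test(astral);

// In code-unit mode two complemented classes are needed to span the pair.
const twoUnits = /^[^x][^x]$/.test(astral);
```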

22.2.2.9.7 UnicodeMatchProperty ( rer, p )

The abstract operation UnicodeMatchProperty takes arguments rer (a RegExp Record) and p (ECMAScript source text) and returns a Unicode property name. It performs the following steps when called:

  1. If rer.[[UnicodeSets]] is true and p is a Unicode property name listed in the “Property name” column of Table 69, then
    1. Return the List of Unicode code points p.
  2. Assert: p is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 67 or Table 68.
  3. Let c be the canonical property name of p as given in the “Canonical property name” column of the corresponding row.
  4. Return the List of Unicode code points c.

Implementations must support the Unicode property names and aliases listed in Table 67, Table 68, and Table 69. To ensure interoperability, implementations must not support any other property names or aliases.

Note 1

For example, Script_Extensions (property name) and scx (property alias) are valid, but script_extensions or Scx aren't.

Note 2

The listed properties form a superset of what UTS18 RL1.2 requires.

Note 3

The spellings of entries in these tables (including casing) match the spellings used in the file PropertyAliases.txt in the Unicode Character Database. The precise spellings in that file are guaranteed to be stable.
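
The exact-spelling requirement for property names can be checked by attempting to compile patterns. In this sketch the helper compiles is hypothetical; it reports whether the RegExp constructor accepts a pattern.

```javascript
// Returns true when the pattern compiles, false when it is a SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const nameChecks = {
  canonicalName: compiles("\\p{Script_Extensions=Greek}", "u"),
  alias: compiles("\\p{scx=Greek}", "u"),
  wrongCaseName: compiles("\\p{script_extensions=Greek}", "u"),
  wrongCaseAlias: compiles("\\p{Scx=Greek}", "u"),
  binaryAlias: compiles("\\p{Alpha}", "u"),
};
```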

Table 67: Non-binary Unicode property aliases and their canonical property names
Property name and aliases | Canonical property name
General_Category, gc | General_Category
Script, sc | Script
Script_Extensions, scx | Script_Extensions

Table 68: Binary Unicode property aliases and their canonical property names
Property name and aliases | Canonical property name
ASCII | ASCII
ASCII_Hex_Digit, AHex | ASCII_Hex_Digit
Alphabetic, Alpha | Alphabetic
Any | Any
Assigned | Assigned
Bidi_Control, Bidi_C | Bidi_Control
Bidi_Mirrored, Bidi_M | Bidi_Mirrored
Case_Ignorable, CI | Case_Ignorable
Cased | Cased
Changes_When_Casefolded, CWCF | Changes_When_Casefolded
Changes_When_Casemapped, CWCM | Changes_When_Casemapped
Changes_When_Lowercased, CWL | Changes_When_Lowercased
Changes_When_NFKC_Casefolded, CWKCF | Changes_When_NFKC_Casefolded
Changes_When_Titlecased, CWT | Changes_When_Titlecased
Changes_When_Uppercased, CWU | Changes_When_Uppercased
Dash | Dash
Default_Ignorable_Code_Point, DI | Default_Ignorable_Code_Point
Deprecated, Dep | Deprecated
Diacritic, Dia | Diacritic
Emoji | Emoji
Emoji_Component, EComp | Emoji_Component
Emoji_Modifier, EMod | Emoji_Modifier
Emoji_Modifier_Base, EBase | Emoji_Modifier_Base
Emoji_Presentation, EPres | Emoji_Presentation
Extended_Pictographic, ExtPict | Extended_Pictographic
Extender, Ext | Extender
Grapheme_Base, Gr_Base | Grapheme_Base
Grapheme_Extend, Gr_Ext | Grapheme_Extend
Hex_Digit, Hex | Hex_Digit
IDS_Binary_Operator, IDSB | IDS_Binary_Operator
IDS_Trinary_Operator, IDST | IDS_Trinary_Operator
ID_Continue, IDC | ID_Continue
ID_Start, IDS | ID_Start
Ideographic, Ideo | Ideographic
Join_Control, Join_C | Join_Control
Logical_Order_Exception, LOE | Logical_Order_Exception
Lowercase, Lower | Lowercase
Math | Math
Noncharacter_Code_Point, NChar | Noncharacter_Code_Point
Pattern_Syntax, Pat_Syn | Pattern_Syntax
Pattern_White_Space, Pat_WS | Pattern_White_Space
Quotation_Mark, QMark | Quotation_Mark
Radical | Radical
Regional_Indicator, RI | Regional_Indicator
Sentence_Terminal, STerm | Sentence_Terminal
Soft_Dotted, SD | Soft_Dotted
Terminal_Punctuation, Term | Terminal_Punctuation
Unified_Ideograph, UIdeo | Unified_Ideograph
Uppercase, Upper | Uppercase
Variation_Selector, VS | Variation_Selector
White_Space, space | White_Space
XID_Continue, XIDC | XID_Continue
XID_Start, XIDS | XID_Start

Table 69: Binary Unicode properties of strings
Property name
Basic_Emoji
Emoji_Keycap_Sequence
RGI_Emoji_Modifier_Sequence
RGI_Emoji_Flag_Sequence
RGI_Emoji_Tag_Sequence
RGI_Emoji_ZWJ_Sequence
RGI_Emoji

22.2.2.9.8 UnicodeMatchPropertyValue ( p, v )

The abstract operation UnicodeMatchPropertyValue takes arguments p (ECMAScript source text) and v (ECMAScript source text) and returns a Unicode property value. It performs the following steps when called:

  1. Assert: p is a canonical, unaliased Unicode property name listed in the “Canonical property name” column of Table 67.
  2. Assert: v is a property value or property value alias for the Unicode property p listed in PropertyValueAliases.txt.
  3. Let value be the canonical property value of v as given in the “Canonical property value” column of the corresponding row.
  4. Return the List of Unicode code points value.

Implementations must support the Unicode property values and property value aliases listed in PropertyValueAliases.txt for the properties listed in Table 67. To ensure interoperability, implementations must not support any other property values or property value aliases.

Note 1

For example, Xpeo and Old_Persian are valid Script_Extensions values, but xpeo and Old Persian aren't.

Note 2

This algorithm differs from the matching rules for symbolic values listed in UAX44: case, white space, U+002D (HYPHEN-MINUS), and U+005F (LOW LINE) are not ignored, and the Is prefix is not supported.
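
The strictness described in these notes can be observed by compiling patterns with different value spellings. As a sketch (the helper compilesUnicode is hypothetical):

```javascript
// Returns true when the pattern compiles under the u flag.
function compilesUnicode(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const valueChecks = {
  alias: compilesUnicode("\\p{Script_Extensions=Xpeo}"),
  canonical: compilesUnicode("\\p{Script_Extensions=Old_Persian}"),
  wrongCase: compilesUnicode("\\p{Script_Extensions=xpeo}"),        // case is not ignored
  withSpace: compilesUnicode("\\p{Script_Extensions=Old Persian}"), // LOW LINE is required
  isPrefix: compilesUnicode("\\p{IsGreek}"),                        // no Is prefix
};
```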

22.2.2.10 Runtime Semantics: CompileClassSetString

The syntax-directed operation CompileClassSetString takes argument rer (a RegExp Record) and returns a sequence of characters. It is defined piecewise over the following productions:

ClassString :: [empty]
  1. Return an empty sequence of characters.
ClassString :: NonEmptyClassString
  1. Return CompileClassSetString of NonEmptyClassString with argument rer.
NonEmptyClassString :: ClassSetCharacter NonEmptyClassStringopt
  1. Let cs be CompileToCharSet of ClassSetCharacter with argument rer.
  2. Let s1 be the sequence of characters that is the single CharSetElement of cs.
  3. If NonEmptyClassString is present, then
    1. Let s2 be CompileClassSetString of NonEmptyClassString with argument rer.
    2. Return the concatenation of s1 and s2.
  4. Return s1.

22.2.3 Abstract Operations for RegExp Creation

22.2.3.1 RegExpCreate ( P, F )

The abstract operation RegExpCreate takes arguments P (an ECMAScript language value) and F (a String or undefined) and returns either a normal completion containing an Object or a throw completion. It performs the following steps when called:

  1. Let obj be ! RegExpAlloc(%RegExp%).
  2. Return ? RegExpInitialize(obj, P, F).

22.2.3.2 RegExpAlloc ( newTarget )

The abstract operation RegExpAlloc takes argument newTarget (a constructor) and returns either a normal completion containing an Object or a throw completion. It performs the following steps when called:

  1. Let obj be ? OrdinaryCreateFromConstructor(newTarget, "%RegExp.prototype%", « [[OriginalSource]], [[OriginalFlags]], [[RegExpRecord]], [[RegExpMatcher]] »).
  2. Perform ! DefinePropertyOrThrow(obj, "lastIndex", PropertyDescriptor { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false }).
  3. Return obj.
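
The attributes given to "lastIndex" in step 2 are visible on any constructed instance. A short sketch; the value 0 seen here is assigned later, during initialization, not by RegExpAlloc itself.

```javascript
const instance = new RegExp("a");
const desc = Object.getOwnPropertyDescriptor(instance, "lastIndex");

// lastIndex is an own data property of each instance, not a prototype accessor.
const onPrototype = Object.prototype.hasOwnProperty.call(RegExp.prototype, "lastIndex");
```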

22.2.3.3 RegExpInitialize ( obj, pattern, flags )

The abstract operation RegExpInitialize takes arguments obj (an Object), pattern (an ECMAScript language value), and flags (an ECMAScript language value) and returns either a normal completion containing an Object or a throw completion. It performs the following steps when called:

  1. If pattern is undefined, let P be the empty String.
  2. Else, let P be ? ToString(pattern).
  3. If flags is undefined, let F be the empty String.
  4. Else, let F be ? ToString(flags).
  5. If F contains any code unit other than "d", "g", "i", "m", "s", "u", "v", or "y", or if F contains any code unit more than once, throw a SyntaxError exception.
  6. If F contains "i", let i be true; else let i be false.
  7. If F contains "m", let m be true; else let m be false.
  8. If F contains "s", let s be true; else let s be false.
  9. If F contains "u", let u be true; else let u be false.
  10. If F contains "v", let v be true; else let v be false.
  11. If u is true or v is true, then
    1. Let patternText be StringToCodePoints(P).
  12. Else,
    1. Let patternText be the result of interpreting each of P's 16-bit elements as a Unicode BMP code point. UTF-16 decoding is not applied to the elements.
  13. Let parseResult be ParsePattern(patternText, u, v).
  14. If parseResult is a non-empty List of SyntaxError objects, throw a SyntaxError exception.
  15. Assert: parseResult is a Pattern Parse Node.
  16. Set obj.[[OriginalSource]] to P.
  17. Set obj.[[OriginalFlags]] to F.
  18. Let capturingGroupsCount be CountLeftCapturingParensWithin(parseResult).
  19. Let rer be the RegExp Record { [[IgnoreCase]]: i, [[Multiline]]: m, [[DotAll]]: s, [[Unicode]]: u, [[UnicodeSets]]: v, [[CapturingGroupsCount]]: capturingGroupsCount }.
  20. Set obj.[[RegExpRecord]] to rer.
  21. Set obj.[[RegExpMatcher]] to CompilePattern of parseResult with argument rer.
  22. Perform ? Set(obj, "lastIndex", +0𝔽, true).
  23. Return obj.
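
Several of these steps have directly observable consequences. The sketch below is illustrative only; the helper tryCreate and the sample patterns are assumptions of this example.

```javascript
// Returns the new RegExp, or the error name if construction throws.
function tryCreate(pattern, flags) {
  try {
    return new RegExp(pattern, flags);
  } catch (e) {
    return e.name;
  }
}

const duplicateFlag = tryCreate("a", "gg"); // step 5: repeated code unit
const unknownFlag = tryCreate("a", "x");    // step 5: unsupported code unit
const badPattern = tryCreate("(", "");      // step 14: parse failure

// Steps 1-4: undefined becomes the empty String; other values go through ToString.
const fromUndefined = new RegExp(undefined, undefined);
const fromNumber = new RegExp(123);

// Steps 11-12: without u or v each 16-bit element of the pattern is its own
// character, so {2} applies to the trailing surrogate only.
const astral = "\u{1F4A9}";
const perCodeUnit = new RegExp("^" + astral + "{2}$");
const perCodePoint = new RegExp("^" + astral + "{2}$", "u");
const doubled = astral + astral;
```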

22.2.3.4 Static Semantics: ParsePattern ( patternText, u, v )

The abstract operation ParsePattern takes arguments patternText (a sequence of Unicode code points), u (a Boolean), and v (a Boolean) and returns a Parse Node or a non-empty List of SyntaxError objects.

Note

This section is amended in B.1.2.9.

It performs the following steps when called:

  1. If v is true and u is true, then
    1. Let parseResult be a List containing one or more SyntaxError objects.
  2. Else if v is true, then
    1. Let parseResult be ParseText(patternText, Pattern[+UnicodeMode, +UnicodeSetsMode, +NamedCaptureGroups]).
  3. Else if u is true, then
    1. Let parseResult be ParseText(patternText, Pattern[+UnicodeMode, ~UnicodeSetsMode, +NamedCaptureGroups]).
  4. Else,
    1. Let parseResult be ParseText(patternText, Pattern[~UnicodeMode, ~UnicodeSetsMode, +NamedCaptureGroups]).
  5. Return parseResult.
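
Two consequences of these steps can be seen from script, as sketched below with a hypothetical helper errorName. Note that an engine without support for the v flag would also reject "uv", but at flag validation rather than in ParsePattern.

```javascript
// Returns the error name if construction throws, otherwise null.
function errorName(source, flags) {
  try {
    new RegExp(source, flags);
    return null;
  } catch (e) {
    return e.name;
  }
}

// Step 1: u and v are mutually exclusive.
const bothFlags = errorName("a", "uv");

// The +UnicodeMode grammar is stricter: outside a character class, \- is
// accepted as an identity escape only when u is absent.
const legacyEscape = errorName("\\-", "");
const unicodeEscape = errorName("\\-", "u");
```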

22.2.4 The RegExp Constructor

The RegExp constructor:

  • is %RegExp%.
  • is the initial value of the "RegExp" property of the global object.
  • creates and initializes a new RegExp object when called as a constructor.
  • when called as a function rather than as a constructor, returns either a new RegExp object, or the argument itself if the only argument is a RegExp object.
  • may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified RegExp behaviour must include a super call to the RegExp constructor to create and initialize subclass instances with the necessary internal slots.

22.2.4.1 RegExp ( pattern, flags )

This function performs the following steps when called:

  1. Let patternIsRegExp be ? IsRegExp(pattern).
  2. If NewTarget is undefined, then
    1. Let newTarget be the active function object.
    2. If patternIsRegExp is true and flags is undefined, then
      1. Let patternConstructor be ? Get(pattern, "constructor").
      2. If SameValue(newTarget, patternConstructor) is true, return pattern.
  3. Else,
    1. Let newTarget be NewTarget.
  4. If pattern is an Object and pattern has a [[RegExpMatcher]] internal slot, then
    1. Let P be pattern.[[OriginalSource]].
    2. If flags is undefined, let F be pattern.[[OriginalFlags]].
    3. Else, let F be flags.
  5. Else if patternIsRegExp is true, then
    1. Let P be ? Get(pattern, "source").
    2. If flags is undefined, then
      1. Let F be ? Get(pattern, "flags").
    3. Else,
      1. Let F be flags.
  6. Else,
    1. Let P be pattern.
    2. Let F be flags.
  7. Let O be ? RegExpAlloc(newTarget).
  8. Return ? RegExpInitialize(O, P, F).
Note

If pattern is supplied using a StringLiteral, the usual escape sequence substitutions are performed before the String is processed by this function. If pattern must contain an escape sequence to be recognized by this function, any U+005C (REVERSE SOLIDUS) code points must be escaped within the StringLiteral to prevent them being removed when the contents of the StringLiteral are formed.
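
The branches of this function, and the escaping point made in the Note, can be sketched as follows (variable names are illustrative):

```javascript
const original = /a+/i;

// Called as a function with a RegExp and no flags: the argument itself is returned.
const sameObject = RegExp(original) === original;

// Called as a constructor: always a new object, copying source and flags.
const copy = new RegExp(original);

// Explicit flags replace the original flags; they are not merged.
const reflagged = new RegExp(original, "g");

// StringLiteral escapes are processed first: "\d" is just "d" by the time
// RegExp sees it, so the backslash must be doubled.
const lostEscape = new RegExp("\d+");
const keptEscape = new RegExp("\\d+");
```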

22.2.5 Properties of the RegExp Constructor

The RegExp constructor:

  • has a [[Prototype]] internal slot whose value is %Function.prototype%.
  • has the following properties:

22.2.5.1 RegExp.prototype

The initial value of RegExp.prototype is the RegExp prototype object.

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.

22.2.5.2 get RegExp [ @@species ]

RegExp[@@species] is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

  1. Return the this value.

The value of the "name" property of this function is "get [Symbol.species]".

Note

RegExp prototype methods normally use their this value's constructor to create a derived object. However, a subclass constructor may override that default behaviour by redefining its @@species property.
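
In script, @@species is reached as Symbol.species. A minimal sketch with two hypothetical subclasses, one inheriting the default getter and one redefining it:

```javascript
// Inherits the getter from RegExp, which returns the this value.
class DefaultSpecies extends RegExp {}

// Redefines @@species so derived objects are plain RegExp instances.
class PlainSpecies extends RegExp {
  static get [Symbol.species]() {
    return RegExp;
  }
}

const inherited = DefaultSpecies[Symbol.species] === DefaultSpecies;
const redefined = PlainSpecies[Symbol.species] === RegExp;
const accessor = Object.getOwnPropertyDescriptor(RegExp, Symbol.species);
```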

22.2.6 Properties of the RegExp Prototype Object

The RegExp prototype object:

  • is %RegExp.prototype%.
  • is an ordinary object.
  • is not a RegExp instance and does not have a [[RegExpMatcher]] internal slot or any of the other internal slots of RegExp instance objects.
  • has a [[Prototype]] internal slot whose value is %Object.prototype%.
Note

The RegExp prototype object does not have a "valueOf" property of its own; however, it inherits the "valueOf" property from the Object prototype object.

22.2.6.1 RegExp.prototype.constructor

The initial value of RegExp.prototype.constructor is %RegExp%.

22.2.6.2 RegExp.prototype.exec ( string )

This method searches string for an occurrence of the regular expression pattern and returns an Array containing the results of the match, or null if string did not match.

It performs the following steps when called:

  1. Let R be the this value.
  2. Perform ? RequireInternalSlot(R, [[RegExpMatcher]]).
  3. Let S be ? ToString(string).
  4. Return ? RegExpBuiltinExec(R, S).
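
A brief usage sketch of the steps above; the input strings are chosen for this example.

```javascript
const datePattern = /(\d{4})-(\d{2})/;

const hit = datePattern.exec("Draft of 2024-02"); // Array with captures and index
const miss = datePattern.exec("no date here");    // null

// Step 3: the argument is converted with ToString.
const coerced = /^12/.exec(123);

// Step 2: a receiver without [[RegExpMatcher]] is rejected.
let receiverError = null;
try {
  RegExp.prototype.exec.call({}, "x");
} catch (e) {
  receiverError = e.name;
}
```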

22.2.6.3 get RegExp.prototype.dotAll

RegExp.prototype.dotAll is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0073 (LATIN SMALL LETTER S).
  3. Return ? RegExpHasFlag(R, cu).

22.2.6.4 get RegExp.prototype.flags

RegExp.prototype.flags is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

  1. Let R be the this value.
  2. If R is not an Object, throw a TypeError exception.
  3. Let codeUnits be a new empty List.
  4. Let hasIndices be ToBoolean(? Get(R, "hasIndices")).
  5. If hasIndices is true, append the code unit 0x0064 (LATIN SMALL LETTER D) to codeUnits.
  6. Let global be ToBoolean(? Get(R, "global")).
  7. If global is true, append the code unit 0x0067 (LATIN SMALL LETTER G) to codeUnits.
  8. Let ignoreCase be ToBoolean(? Get(R, "ignoreCase")).
  9. If ignoreCase is true, append the code unit 0x0069 (LATIN SMALL LETTER I) to codeUnits.
  10. Let multiline be ToBoolean(? Get(R, "multiline")).
  11. If multiline is true, append the code unit 0x006D (LATIN SMALL LETTER M) to codeUnits.
  12. Let dotAll be ToBoolean(? Get(R, "dotAll")).
  13. If dotAll is true, append the code unit 0x0073 (LATIN SMALL LETTER S) to codeUnits.
  14. Let unicode be ToBoolean(? Get(R, "unicode")).
  15. If unicode is true, append the code unit 0x0075 (LATIN SMALL LETTER U) to codeUnits.
  16. Let unicodeSets be ToBoolean(? Get(R, "unicodeSets")).
  17. If unicodeSets is true, append the code unit 0x0076 (LATIN SMALL LETTER V) to codeUnits.
  18. Let sticky be ToBoolean(? Get(R, "sticky")).
  19. If sticky is true, append the code unit 0x0079 (LATIN SMALL LETTER Y) to codeUnits.
  20. Return the String value whose code units are the elements of the List codeUnits. If codeUnits has no elements, the empty String is returned.
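
Because the getter reads ordinary properties in a fixed order, its result is always in the same order and it also works on objects that are not RegExp instances. A small sketch:

```javascript
// Flags supplied in any order are reported in the fixed order of the steps above.
const canonical = new RegExp("a", "yigm").flags;

// The getter is generic: it only performs Get and ToBoolean on the receiver.
const flagsGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "flags").get;
const fromPlainObject = flagsGetter.call({ global: true, sticky: 1, unicode: "" });

// Step 2: a non-Object receiver is rejected.
let primitiveError = null;
try {
  flagsGetter.call("gi");
} catch (e) {
  primitiveError = e.name;
}
```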

22.2.6.4.1 RegExpHasFlag ( R, codeUnit )

The abstract operation RegExpHasFlag takes arguments R (an ECMAScript language value) and codeUnit (a code unit) and returns either a normal completion containing either a Boolean or undefined, or a throw completion. It performs the following steps when called:

  1. If R is not an Object, throw a TypeError exception.
  2. If R does not have an [[OriginalFlags]] internal slot, then
    1. If SameValue(R, %RegExp.prototype%) is true, return undefined.
    2. Otherwise, throw a TypeError exception.
  3. Let flags be R.[[OriginalFlags]].
  4. If flags contains codeUnit, return true.
  5. Return false.
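
The three outcomes of RegExpHasFlag (a Boolean, undefined, or a TypeError) can be seen through any of the flag getters. A sketch using the global getter:

```javascript
const globalGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "global").get;

const withFlag = globalGetter.call(/a/g);
const withoutFlag = globalGetter.call(/a/);

// Step 2.1: the prototype object itself yields undefined rather than throwing.
const onPrototype = globalGetter.call(RegExp.prototype);

// Step 2.2: any other object without [[OriginalFlags]] is rejected.
let onPlainObject = null;
try {
  globalGetter.call({});
} catch (e) {
  onPlainObject = e.name;
}
```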

22.2.6.5 get RegExp.prototype.global

RegExp.prototype.global is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0067 (LATIN SMALL LETTER G).
  3. Return ? RegExpHasFlag(R, cu).

22.2.6.6 get RegExp.prototype.hasIndices

RegExp.prototype.hasIndices is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0064 (LATIN SMALL LETTER D).
  3. Return ? RegExpHasFlag(R, cu).

22.2.6.7 get RegExp.prototype.ignoreCase

RegExp.prototype.ignoreCase is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0069 (LATIN SMALL LETTER I).
  3. Return ? RegExpHasFlag(R, cu).

22.2.6.8 RegExp.prototype [ @@match ] ( string )

This method performs the following steps when called:

  1. Let rx be the this value.
  2. If rx is not an Object, throw a TypeError exception.
  3. Let S be ? ToString(string).
  4. Let flags be ? ToString(? Get(rx, "flags")).
  5. If flags does not contain "g", then
    1. Return ? RegExpExec(rx, S).
  6. Else,
    1. If flags contains "u" or flags contains "v", let fullUnicode be true. Otherwise, let fullUnicode be false.
    2. Perform ? Set(rx, "lastIndex", +0𝔽, true).
    3. Let A be ! ArrayCreate(0).
    4. Let n be 0.
    5. Repeat,
      1. Let result be ? RegExpExec(rx, S).
      2. If result is null, then
        1. If n = 0, return null.
        2. Return A.
      3. Else,
        1. Let matchStr be ? ToString(? Get(result, "0")).
        2. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(n)), matchStr).
        3. If matchStr is the empty String, then
          1. Let thisIndex be ℝ(? ToLength(? Get(rx, "lastIndex"))).
          2. Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
          3. Perform ? Set(rx, "lastIndex", 𝔽(nextIndex), true).
        4. Set n to n + 1.

The value of the "name" property of this method is "[Symbol.match]".

Note

The @@match property is used by the IsRegExp abstract operation to identify objects that have the basic behaviour of regular expressions. The absence of a @@match property or the existence of such a property whose value does not Boolean coerce to true indicates that the object is not intended to be used as a regular expression object.
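
The two branches of this method, and the advance past empty matches, can be observed through String.prototype.match, which delegates to it. An illustrative sketch:

```javascript
const withoutG = "a1b22".match(/\d+/); // same result as exec: first match only
const withG = "a1b22".match(/\d+/g);   // every match, as strings
const none = "abc".match(/\d/g);       // null when n = 0, not an empty Array

// Empty matches advance lastIndex, so the loop terminates.
const emptyMatches = "abc".match(/x*/g);

// fullUnicode decides whether the advance is by code unit or by code point.
const astral = "\u{1F4A9}";
const byCodeUnit = astral.match(/(?:)/g).length;
const byCodePoint = astral.match(/(?:)/gu).length;
```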

22.2.6.9 RegExp.prototype [ @@matchAll ] ( string )

This method performs the following steps when called:

  1. Let R be the this value.
  2. If R is not an Object, throw a TypeError exception.
  3. Let S be ? ToString(string).
  4. Let C be ? SpeciesConstructor(R, %RegExp%).
  5. Let flags be ? ToString(? Get(R, "flags")).
  6. Let matcher be ? Construct(C, « R, flags »).
  7. Let lastIndex be ? ToLength(? Get(R, "lastIndex")).
  8. Perform ? Set(matcher, "lastIndex", lastIndex, true).
  9. If flags contains "g", let global be true.
  10. Else, let global be false.
  11. If flags contains "u" or flags contains "v", let fullUnicode be true.
  12. Else, let fullUnicode be false.
  13. Return CreateRegExpStringIterator(matcher, S, global, fullUnicode).

The value of the "name" property of this method is "[Symbol.matchAll]".
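
Since the method works on a copy (matcher) that inherits the receiver's lastIndex, iteration starts from that index and leaves the receiver untouched. A short sketch:

```javascript
const pattern = /a/g;
pattern.lastIndex = 1;

// Iteration starts at the copied lastIndex, so the match at index 0 is skipped.
const indices = Array.from("aaa".matchAll(pattern), (m) => m.index);

// The receiver's own lastIndex is not modified by the iteration.
const lastIndexAfter = pattern.lastIndex;

// Called directly without the g flag, the iterator yields a single match.
const single = Array.from(/a/[Symbol.matchAll]("aaa"), (m) => m.index);
```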

22.2.6.10 get RegExp.prototype.multiline

RegExp.prototype.multiline is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

  1. Let R be the this value.
  2. Let cu be the code unit 0x006D (LATIN SMALL LETTER M).
  3. Return ? RegExpHasFlag(R, cu).

22.2.6.11 RegExp.prototype [ @@replace ] ( string, replaceValue )

This method performs the following steps when called:

  1. Let rx be the this value.
  2. If rx is not an Object, throw a TypeError exception.
  3. Let S be ? ToString(string).
  4. Let lengthS be the length of S.
  5. Let functionalReplace be IsCallable(replaceValue).
  6. If functionalReplace is false, then
    1. Set replaceValue to ? ToString(replaceValue).
  7. Let flags be ? ToString(? Get(rx, "flags")).
  8. If flags contains "g", let global be true. Otherwise, let global be false.
  9. If global is true, then
    1. Perform ? Set(rx, "lastIndex", +0𝔽, true).
  10. Let results be a new empty List.
  11. Let done be false.
  12. Repeat, while done is false,
    1. Let result be ? RegExpExec(rx, S).
    2. If result is null, then
      1. Set done to true.
    3. Else,
      1. Append result to results.
      2. If global is false, then
        1. Set done to true.
      3. Else,
        1. Let matchStr be ? ToString(? Get(result, "0")).
        2. If matchStr is the empty String, then
          1. Let thisIndex be ℝ(? ToLength(? Get(rx, "lastIndex"))).
          2. If flags contains "u" or flags contains "v", let fullUnicode be true. Otherwise, let fullUnicode be false.
          3. Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
          4. Perform ? Set(rx, "lastIndex", 𝔽(nextIndex), true).
  13. Let accumulatedResult be the empty String.
  14. Let nextSourcePosition be 0.
  15. For each element result of results, do
    1. Let resultLength be ? LengthOfArrayLike(result).
    2. Let nCaptures be max(resultLength - 1, 0).
    3. Let matched be ? ToString(? Get(result, "0")).
    4. Let matchLength be the length of matched.
    5. Let position be ? ToIntegerOrInfinity(? Get(result, "index")).
    6. Set position to the result of clamping position between 0 and lengthS.
    7. Let captures be a new empty List.
    8. Let n be 1.
    9. Repeat, while n ≤ nCaptures,
      1. Let capN be ? Get(result, ! ToString(𝔽(n))).
      2. If capN is not undefined, then
        1. Set capN to ? ToString(capN).
      3. Append capN to captures.
      4. NOTE: When n = 1, the preceding step puts the first element into captures (at index 0). More generally, the nth capture (the characters captured by the nth set of capturing parentheses) is at captures[n - 1].
      5. Set n to n + 1.
    10. Let namedCaptures be ? Get(result, "groups").
    11. If functionalReplace is true, then
      1. Let replacerArgs be the list-concatenation of « matched », captures, and « 𝔽(position), S ».
      2. If namedCaptures is not undefined, then
        1. Append namedCaptures to replacerArgs.
      3. Let replValue be ? Call(replaceValue, undefined, replacerArgs).
      4. Let replacement be ? ToString(replValue).
    12. Else,
      1. If namedCaptures is not undefined, then
        1. Set namedCaptures to ? ToObject(namedCaptures).
      2. Let replacement be ? GetSubstitution(matched, S, position, captures, namedCaptures, replaceValue).
    13. If position ≥ nextSourcePosition, then
      1. NOTE: position should not normally move backwards. If it does, it is an indication of an ill-behaving RegExp subclass or use of an access triggered side-effect to change the global flag or other characteristics of rx. In such cases, the corresponding substitution is ignored.
      2. Set accumulatedResult to the string-concatenation of accumulatedResult, the substring of S from nextSourcePosition to position, and replacement.
      3. Set nextSourcePosition to position + matchLength.
  16. If nextSourcePosition ≥ lengthS, return accumulatedResult.
  17. Return the string-concatenation of accumulatedResult and the substring of S from nextSourcePosition.

The value of the "name" property of this method is "[Symbol.replace]".
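
The argument list built for a functional replaceValue, and the accumulation of results, can be observed through String.prototype.replace, which delegates to this method. An illustrative sketch with sample data chosen for this example:

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const seen = [];

// replacerArgs: matched, each capture, position, the whole string, then groups.
const viaFunction = "on 2024-02-15".replace(
  datePattern,
  (matched, y, m, d, position, whole, groups) => {
    seen.push({ matched, position, whole, year: groups.year });
    return `${d}/${m}/${y}`;
  }
);

// A non-callable replaceValue goes through GetSubstitution instead.
const viaTemplate = "on 2024-02-15".replace(datePattern, "$<day>/$<month>/$<year>");

// With g, empty matches are found at every index, including the end of S.
const emptyGlobal = "abc".replace(/x*/g, "-");
```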

22.2.6.12 RegExp.prototype [ @@search ] ( string )

This method performs the following steps when called:

  1. Let rx be the this value.
  2. If rx is not an Object, throw a TypeError exception.
  3. Let S be ? ToString(string).
  4. Let previousLastIndex be ? Get(rx, "lastIndex").
  5. If SameValue(previousLastIndex, +0𝔽) is false, then
    1. Perform ? Set(rx, "lastIndex", +0𝔽, true).
  6. Let result be ? RegExpExec(rx, S).
  7. Let currentLastIndex be ? Get(rx, "lastIndex").
  8. If SameValue(currentLastIndex, previousLastIndex) is false, then
    1. Perform ? Set(rx, "lastIndex", previousLastIndex, true).
  9. If result is null, return -1𝔽.
  10. Return ? Get(result, "index").

The value of the "name" property of this method is "[Symbol.search]".

Note

The "lastIndex" and "global" properties of this RegExp object are ignored when performing the search. The "lastIndex" property is left unchanged.
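
The save-and-restore of "lastIndex" described above can be seen through String.prototype.search, which delegates to this method. A short sketch:

```javascript
const pattern = /b/g;
pattern.lastIndex = 3;

// The search starts from index 0 regardless of lastIndex and the g flag.
const found = "abcb".search(pattern);

// The previous lastIndex is restored afterwards.
const lastIndexAfter = pattern.lastIndex;

const notFound = "abcb".search(/z/);
```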

22.2.6.13 get RegExp.prototype.source

RegExp.prototype.source is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

  1. Let R be the this value.
  2. If R is not an Object, throw a TypeError exception.
  3. If R does not have an [[OriginalSource]] internal slot, then
    1. If SameValue(R, %RegExp.prototype%) is true, return "(?:)".
    2. Otherwise, throw a TypeError exception.
  4. Assert: R has an [[OriginalFlags]] internal slot.
  5. Let src be R.[[OriginalSource]].
  6. Let flags be R.[[OriginalFlags]].
  7. Return EscapeRegExpPattern(src, flags).

22.2.6.13.1 EscapeRegExpPattern ( P, F )

The abstract operation EscapeRegExpPattern takes arguments P (a String) and F (a String) and returns a String. It performs the following steps when called:

  1. If F contains "v", then
    1. Let patternSymbol be Pattern[+UnicodeMode, +UnicodeSetsMode].
  2. Else if F contains "u", then
    1. Let patternSymbol be Pattern[+UnicodeMode, ~UnicodeSetsMode].
  3. Else,
    1. Let patternSymbol be Pattern[~UnicodeMode, ~UnicodeSetsMode].
  4. Let S be a String in the form of a patternSymbol equivalent to P interpreted as UTF-16 encoded Unicode code points (6.1.4), in which certain code points are escaped as described below. S may or may not differ from P; however, the Abstract Closure that would result from evaluating S as a patternSymbol must behave identically to the Abstract Closure given by the constructed object's [[RegExpMatcher]] internal slot. Multiple calls to this abstract operation using the same values for P and F must produce identical results.
  5. The code points / or any LineTerminator occurring in the pattern shall be escaped in S as necessary to ensure that the string-concatenation of "/", S, "/", and F can be parsed (in an appropriate lexical context) as a RegularExpressionLiteral that behaves identically to the constructed regular expression. For example, if P is "/", then S could be "\/" or "\u002F", among other possibilities, but not "/", because /// followed by F would be parsed as a SingleLineComment rather than a RegularExpressionLiteral. If P is the empty String, this specification can be met by letting S be "(?:)".
  6. Return S.
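
As an illustration of the constraints above, the following sketch checks only what the algorithm guarantees. The exact escaped form is implementation-defined, so the example compares behaviour rather than specific strings wherever the specification leaves a choice. All variable names are illustrative.

```javascript
// Step 3.1 of the source getter: the prototype object itself reports "(?:)".
const protoSource = RegExp.prototype.source;

// A pattern consisting of "/" has to be escaped somehow, because "///"
// would start a SingleLineComment. Which escape is used is up to the engine.
const slash = new RegExp("/");
const slashSource = slash.source;

// Whatever form was chosen, a RegExp rebuilt from it must behave identically.
const rebuilt = new RegExp(slashSource, slash.flags);
const sameBehaviour = rebuilt.test("a/b") && !rebuilt.test("ab");

// The empty pattern cannot be reported as "", since "//" starts a comment.
const emptySource = new RegExp("").source;
```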

22.2.6.14 RegExp.prototype [ @@split ] ( string, limit )

Note 1

This method returns an Array into which substrings of the result of converting string to a String have been stored. The substrings are determined by searching from left to right for matches of the this value regular expression; these occurrences are not part of any String in the returned array, but serve to divide up the String value.

The this value may be an empty regular expression or a regular expression that can match an empty String. In this case, the regular expression does not match the empty substring at the beginning or end of the input String, nor does it match the empty substring at the end of the previous separator match. (For example, if the regular expression matches the empty String, the String is split up into individual code unit elements; the length of the result array equals the length of the String, and each substring contains one code unit.) Only the first match at a given index of the String is considered, even if backtracking could yield a non-empty substring match at that index. (For example, /a*?/[Symbol.split]("ab") evaluates to the array ["a", "b"], while /a*/[Symbol.split]("ab") evaluates to the array ["", "b"].)

If string is (or converts to) the empty String, the result depends on whether the regular expression can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.

If the regular expression contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array. For example,

/<(\/)?([^<>]+)>/[Symbol.split]("A<B>bold</B>and<CODE>coded</CODE>")

evaluates to the array

["A", undefined, "B", "bold", "/", "B", "and", undefined, "CODE", "coded", "/", "CODE", ""]

If limit is not undefined, then the output array is truncated so that it contains no more than limit elements.
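
The cases described in this Note can be tried directly. The sketch below uses illustrative variable names; the expected values in the comments follow from the algorithm steps of this method.

```javascript
// A separator that matches the empty String splits into code units.
const units = /(?:)/[Symbol.split]("abc");        // ["a", "b", "c"]

// Only the first match at each index is considered.
const lazy = /a*?/[Symbol.split]("ab");           // ["a", "b"]
const greedy = /a*/[Symbol.split]("ab");          // ["", "b"]

// Empty input: the result depends on whether the separator can match "".
const emptyCanMatch = /a*/[Symbol.split]("");     // []
const emptyNoMatch = /a/[Symbol.split]("");       // [""]

// Captures are spliced into the output, and limit truncates the result.
const limited = /(-)/[Symbol.split]("a-b-c", 3);  // ["a", "-", "b"]
```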

This method performs the following steps when called:

  1. Let rx be the this value.
  2. If rx is not an Object, throw a TypeError exception.
  3. Let S be ? ToString(string).
  4. Let C be ? SpeciesConstructor(rx, %RegExp%).
  5. Let flags be ? ToString(? Get(rx, "flags")).
  6. If flags contains "u" or flags contains "v", let unicodeMatching be true.
  7. Else, let unicodeMatching be false.
  8. If flags contains "y", let newFlags be flags.
  9. Else, let newFlags be the string-concatenation of flags and "y".
  10. Let splitter be ? Construct(C, « rx, newFlags »).
  11. Let A be ! ArrayCreate(0).
  12. Let lengthA be 0.
  13. If limit is undefined, let lim be 2³² - 1; else let lim be ℝ(? ToUint32(limit)).
  14. If lim = 0, return A.
  15. If S is the empty String, then
    1. Let z be ? RegExpExec(splitter, S).
    2. If z is not null, return A.
    3. Perform ! CreateDataPropertyOrThrow(A, "0", S).
    4. Return A.
  16. Let size be the length of S.
  17. Let p be 0.
  18. Let q be p.
  19. Repeat, while q < size,
    1. Perform ? Set(splitter, "lastIndex", 𝔽(q), true).
    2. Let z be ? RegExpExec(splitter, S).
    3. If z is null, then
      1. Set q to AdvanceStringIndex(S, q, unicodeMatching).
    4. Else,
      1. Let e be (? ToLength(? Get(splitter, "lastIndex"))).
      2. Set e to min(e, size).
      3. If e = p, then
        1. Set q to AdvanceStringIndex(S, q, unicodeMatching).
      4. Else,
        1. Let T be the substring of S from p to q.
        2. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), T).
        3. Set lengthA to lengthA + 1.
        4. If lengthA = lim, return A.
        5. Set p to e.
        6. Let numberOfCaptures be ? LengthOfArrayLike(z).
        7. Set numberOfCaptures to max(numberOfCaptures - 1, 0).
        8. Let i be 1.
        9. Repeat, while i ≤ numberOfCaptures,
          1. Let nextCapture be ? Get(z, ! ToString(𝔽(i))).
          2. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), nextCapture).
          3. Set i to i + 1.
          4. Set lengthA to lengthA + 1.
          5. If lengthA = lim, return A.
        10. Set q to p.
  20. Let T be the substring of S from p to size.
  21. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), T).
  22. Return A.

The value of the "name" property of this method is "[Symbol.split]".

Note 2

This method ignores the value of the "global" and "sticky" properties of this RegExp object.
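
A short sketch of what Note 2 means in practice follows. Because matching is performed on a freshly constructed splitter (step 10) that always carries the "y" flag, the flags and "lastIndex" of the original object do not influence the result. Variable names are illustrative.

```javascript
const text = "a,b,c";

// The same separator with and without the g and y flags.
const plain = /,/[Symbol.split](text);
const withGlobal = /,/g[Symbol.split](text);
const withSticky = /,/y[Symbol.split](text);

// lastIndex on the original object is neither consulted nor modified,
// because matching happens on the separately constructed splitter.
const rx = /,/g;
rx.lastIndex = 4;
const withLastIndex = rx[Symbol.split](text);
const lastIndexAfter = rx.lastIndex;
```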

22.2.6.15 get RegExp.prototype.sticky

RegExp.prototype.sticky is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0079 (LATIN SMALL LETTER Y).
  3. Return ? RegExpHasFlag(R, cu).
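
The shape of this accessor can be inspected from script. The following sketch uses illustrative variable names only.

```javascript
// "sticky" lives on the prototype as an accessor with no setter.
const desc = Object.getOwnPropertyDescriptor(RegExp.prototype, "sticky");
const hasGetter = typeof desc.get === "function";
const setterIsUndefined = desc.set === undefined;

// The getter reports whether "y" is among the flags of the instance.
const onSticky = /a/y.sticky;   // true
const onPlain = /a/.sticky;     // false

// Instances do not carry an own "sticky" property.
const isOwn = Object.prototype.hasOwnProperty.call(/a/y, "sticky");
```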

22.2.6.16 RegExp.prototype.test ( S )

This method performs the following steps when called:

  1. Let R be the this value.
  2. If R is not an Object, throw a TypeError exception.
  3. Let string be ? ToString(S).
  4. Let match be ? RegExpExec(R, string).
  5. If match is not null, return true; else return false.
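
Because this method delegates to RegExpExec, it has the same effect on "lastIndex" as exec does. The sketch below, with illustrative variable names, shows this for a global regular expression.

```javascript
const rx = /a/g;

const first = rx.test("aba");    // true: "a" at index 0
const afterFirst = rx.lastIndex; // 1

const second = rx.test("aba");   // true: "a" at index 2
const afterSecond = rx.lastIndex; // 3

const third = rx.test("aba");    // false: no further match
const afterThird = rx.lastIndex; // reset to 0

// The argument is converted with ToString before matching (step 3).
const coerced = /^123$/.test(123);
```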

22.2.6.17 RegExp.prototype.toString ( )

  1. Let R be the this value.
  2. If R is not an Object, throw a TypeError exception.
  3. Let pattern be ? ToString(? Get(R, "source")).
  4. Let flags be ? ToString(? Get(R, "flags")).
  5. Let result be the string-concatenation of "/", pattern, "/", and flags.
  6. Return result.

Note

The returned String has the form of a RegularExpressionLiteral that evaluates to another RegExp object with the same behaviour as this object.
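
The steps above read only the "source" and "flags" properties, so the method also works on objects that are not RegExp instances. A minimal sketch with illustrative names:

```javascript
// For a RegExp instance the result has the form of a literal.
const literalForm = /a\/b/gi.toString();

// The method is generic: any object with "source" and "flags" will do.
const generic = RegExp.prototype.toString.call({ source: "x", flags: "y" });

// For an empty pattern, "source" is never "", so the result is never "//".
const emptyForm = new RegExp("").toString();
```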

22.2.6.18 get RegExp.prototype.unicode

RegExp.prototype.unicode is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0075 (LATIN SMALL LETTER U).
  3. Return ? RegExpHasFlag(R, cu).

22.2.6.19 get RegExp.prototype.unicodeSets

RegExp.prototype.unicodeSets is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0076 (LATIN SMALL LETTER V).
  3. Return ? RegExpHasFlag(R, cu).
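
The unicode and unicodeSets getters only report flags, but the flags they report change how input is divided into characters (see RegExpBuiltinExec). The following sketch uses illustrative names and guards construction with the "v" flag, on the assumption that some hosts may predate it.

```javascript
// One code point that occupies two code units.
const astral = "\u{1F600}";

// "." consumes a whole code point only in Unicode mode.
const withU = /^.$/u.test(astral);    // true
const withoutU = /^.$/.test(astral);  // false

const unicodeFlag = /a/u.unicode;     // true
const plainFlag = /a/.unicode;        // false

// Assumption: the host may not recognise "v", in which case the
// constructor throws and the value is left undefined.
let unicodeSetsFlag;
try {
  unicodeSetsFlag = new RegExp("a", "v").unicodeSets;
} catch (e) {
  unicodeSetsFlag = undefined;
}
```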

22.2.7 Abstract Operations for RegExp Matching

22.2.7.1 RegExpExec ( R, S )

The abstract operation RegExpExec takes arguments R (an Object) and S (a String) and returns either a normal completion containing either an Object or null, or a throw completion. It performs the following steps when called:

  1. Let exec be ? Get(R, "exec").
  2. If IsCallable(exec) is true, then
    1. Let result be ? Call(exec, R, « S »).
    2. If result is not an Object and result is not null, throw a TypeError exception.
    3. Return result.
  3. Perform ? RequireInternalSlot(R, [[RegExpMatcher]]).
  4. Return ? RegExpBuiltinExec(R, S).

Note

If a callable "exec" property is not found, this algorithm falls back to attempting to use the built-in RegExp matching algorithm. This provides compatible behaviour for code written for prior editions where most built-in algorithms that use regular expressions did not perform a dynamic property lookup of "exec".
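
The three outcomes of RegExpExec can be observed through any method that uses it, such as test. A minimal sketch with illustrative names:

```javascript
const rx = /a/;

// A callable "exec" is used in place of the built-in matcher.
rx.exec = () => null;
const overridden = rx.test("a");   // false, because exec returned null

// A result that is neither an Object nor null is rejected (step 2.2).
rx.exec = () => 42;
let threwTypeError = false;
try {
  rx.test("a");
} catch (e) {
  threwTypeError = e instanceof TypeError;
}

// A non-callable "exec" falls back to the built-in algorithm.
rx.exec = undefined;
const fallback = rx.test("a");     // true
```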

22.2.7.2 RegExpBuiltinExec ( R, S )

The abstract operation RegExpBuiltinExec takes arguments R (an initialized RegExp instance) and S (a String) and returns either a normal completion containing either an Array exotic object or null, or a throw completion. It performs the following steps when called:

  1. Let length be the length of S.
  2. Let lastIndex be (? ToLength(? Get(R, "lastIndex"))).
  3. Let flags be R.[[OriginalFlags]].
  4. If flags contains "g", let global be true; else let global be false.
  5. If flags contains "y", let sticky be true; else let sticky be false.
  6. If flags contains "d", let hasIndices be true; else let hasIndices be false.
  7. If global is false and sticky is false, set lastIndex to 0.
  8. Let matcher be R.[[RegExpMatcher]].
  9. If flags contains "u" or flags contains "v", let fullUnicode be true; else let fullUnicode be false.
  10. Let matchSucceeded be false.
  11. If fullUnicode is true, let input be StringToCodePoints(S). Otherwise, let input be a List whose elements are the code units that are the elements of S.
  12. NOTE: Each element of input is considered to be a character.
  13. Repeat, while matchSucceeded is false,
    1. If lastIndex > length, then
      1. If global is true or sticky is true, then
        1. Perform ? Set(R, "lastIndex", +0𝔽, true).
      2. Return null.
    2. Let inputIndex be the index into input of the character that was obtained from element lastIndex of S.
    3. Let r be matcher(input, inputIndex).
    4. If r is failure, then
      1. If sticky is true, then
        1. Perform ? Set(R, "lastIndex", +0𝔽, true).
        2. Return null.
      2. Set lastIndex to AdvanceStringIndex(S, lastIndex, fullUnicode).
    5. Else,
      1. Assert: r is a MatchState.
      2. Set matchSucceeded to true.
  14. Let e be r.[[EndIndex]].
  15. If fullUnicode is true, set e to GetStringIndex(S, e).
  16. If global is true or sticky is true, then
    1. Perform ? Set(R, "lastIndex", 𝔽(e), true).
  17. Let n be the number of elements in r.[[Captures]].
  18. Assert: n = R.[[RegExpRecord]].[[CapturingGroupsCount]].
  19. Assert: n < 2³² - 1.
  20. Let A be ! ArrayCreate(n + 1).
  21. Assert: The mathematical value of A's "length" property is n + 1.
  22. Perform ! CreateDataPropertyOrThrow(A, "index", 𝔽(lastIndex)).
  23. Perform ! CreateDataPropertyOrThrow(A, "input", S).
  24. Let match be the Match Record { [[StartIndex]]: lastIndex, [[EndIndex]]: e }.
  25. Let indices be a new empty List.
  26. Let groupNames be a new empty List.
  27. Append match to indices.
  28. Let matchedSubstr be GetMatchString(S, match).
  29. Perform ! CreateDataPropertyOrThrow(A, "0", matchedSubstr).
  30. If R contains any GroupName, then
    1. Let groups be OrdinaryObjectCreate(null).
    2. Let hasGroups be true.
  31. Else,
    1. Let groups be undefined.
    2. Let hasGroups be false.
  32. Perform ! CreateDataPropertyOrThrow(A, "groups", groups).
  33. For each integer i such that 1 ≤ i ≤ n, in ascending order, do
    1. Let captureI be the ith element of r.[[Captures]].
    2. If captureI is undefined, then
      1. Let capturedValue be undefined.
      2. Append undefined to indices.
    3. Else,
      1. Let captureStart be captureI.[[StartIndex]].
      2. Let captureEnd be captureI.[[EndIndex]].
      3. If fullUnicode is true, then
        1. Set captureStart to GetStringIndex(S, captureStart).
        2. Set captureEnd to GetStringIndex(S, captureEnd).
      4. Let capture be the Match Record { [[StartIndex]]: captureStart, [[EndIndex]]: captureEnd }.
      5. Let capturedValue be GetMatchString(S, capture).
      6. Append capture to indices.
    4. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)), capturedValue).
    5. If the ith capture of R was defined with a GroupName, then
      1. Let s be the CapturingGroupName of that GroupName.
      2. Perform ! CreateDataPropertyOrThrow(groups, s, capturedValue).
      3. Append s to groupNames.
    6. Else,
      1. Append undefined to groupNames.
  34. If hasIndices is true, then
    1. Let indicesArray be MakeMatchIndicesIndexPairArray(S, indices, groupNames, hasGroups).
    2. Perform ! CreateDataPropertyOrThrow(A, "indices", indicesArray).
  35. Return A.
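
The handling of "lastIndex", named groups, and unmatched captures in the steps above can be seen through RegExp.prototype.exec. This is a sketch with illustrative names, not an implementation of the algorithm.

```javascript
// Without g or y, lastIndex is ignored (step 7) and never written.
const plain = /b/;
plain.lastIndex = 5;
const plainMatch = plain.exec("abc");   // found at index 1
const plainAfter = plain.lastIndex;     // still 5

// With g, the search scans forward and lastIndex records the end index.
const g = /b/g;
const g1 = g.exec("abcb");              // index 1, lastIndex becomes 2
const g2 = g.exec("abcb");              // index 3, lastIndex becomes 4
const g3 = g.exec("abcb");              // null, lastIndex reset to 0

// With y, the match must start exactly at lastIndex; failure resets it.
const y = /b/y;
const yMiss = y.exec("abc");            // null
y.lastIndex = 1;
const yHit = y.exec("abc");             // index 1, lastIndex becomes 2

// "groups" is a null-prototype object only when the pattern has a GroupName.
const named = /(?<year>\d{4})-(?<month>\d{2})/.exec("on 2024-02-15");
const unnamed = /(a)|(b)/.exec("b");    // ["b", undefined, "b"]
```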

22.2.7.3 AdvanceStringIndex ( S, index, unicode )

The abstract operation AdvanceStringIndex takes arguments S (a String), index (a non-negative integer), and unicode (a Boolean) and returns an integer. It performs the following steps when called:

  1. Assert: index ≤ 2⁵³ - 1.
  2. If unicode is false, return index + 1.
  3. Let length be the length of S.
  4. If index + 1 ≥ length, return index + 1.
  5. Let cp be CodePointAt(S, index).
  6. Return index + cp.[[CodeUnitCount]].
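
The steps above can be sketched in script as follows. The function name advanceStringIndex is hypothetical, and String.prototype.codePointAt stands in for the CodePointAt abstract operation: it yields a value above 0xFFFF exactly when a well-formed surrogate pair starts at the given index, which corresponds to a [[CodeUnitCount]] of 2.

```javascript
// Hypothetical script-level rendering of AdvanceStringIndex.
function advanceStringIndex(s, index, unicode) {
  // Step 2: outside Unicode mode, always advance by one code unit.
  if (!unicode) return index + 1;
  // Step 4: at or beyond the last code unit there is no pair to skip.
  if (index + 1 >= s.length) return index + 1;
  // Steps 5-6: skip two code units for a surrogate pair, otherwise one.
  const cp = s.codePointAt(index);
  return index + (cp > 0xFFFF ? 2 : 1);
}

// "a", then U+1F600 (two code units), then "b".
const sample = "a\u{1F600}b";
const overPair = advanceStringIndex(sample, 1, true);      // 3
const codeUnitOnly = advanceStringIndex(sample, 1, false); // 2
const fromTrail = advanceStringIndex(sample, 2, true);     // 3
const pastEnd = advanceStringIndex("ab", 5, true);         // 6
```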

22.2.7.4 GetStringIndex ( S, codePointIndex )

The abstract operation GetStringIndex takes arguments S (a String) and codePointIndex (a non-negative integer) and returns a non-negative integer. It interprets S as a sequence of UTF-16 encoded code points, as described in 6.1.4, and returns the code unit index corresponding to code point index codePointIndex when such an index exists. Otherwise, it returns the length of S. It performs the following steps when called:

  1. If S is the empty String, return 0.
  2. Let len be the length of S.
  3. Let codeUnitCount be 0.
  4. Let codePointCount be 0.
  5. Repeat, while codeUnitCount < len,
    1. If codePointCount = codePointIndex, return codeUnitCount.
    2. Let cp be CodePointAt(S, codeUnitCount).
    3. Set codeUnitCount to codeUnitCount + cp.[[CodeUnitCount]].
    4. Set codePointCount to codePointCount + 1.
  6. Return len.
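
A script-level sketch of these steps follows. The function name getStringIndex is hypothetical, and String.prototype.codePointAt is used in place of the CodePointAt abstract operation, under the assumption that a result above 0xFFFF corresponds to a [[CodeUnitCount]] of 2.

```javascript
// Hypothetical script-level rendering of GetStringIndex.
function getStringIndex(s, codePointIndex) {
  if (s === "") return 0;
  const len = s.length;
  let codeUnitCount = 0;
  let codePointCount = 0;
  while (codeUnitCount < len) {
    // The requested code point starts at the current code unit index.
    if (codePointCount === codePointIndex) return codeUnitCount;
    const cp = s.codePointAt(codeUnitCount);
    codeUnitCount += cp > 0xFFFF ? 2 : 1;
    codePointCount += 1;
  }
  // No such code point: the length of the String is returned.
  return len;
}

// "a", then U+1F600 (two code units), then "b".
const sample = "a\u{1F600}b";
const indices = [0, 1, 2, 3, 10].map((i) => getStringIndex(sample, i));
// indices is [0, 1, 3, 4, 4]
```

The third entry agrees with what a Unicode-mode match reports: /b/u.exec(sample).index is 3.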

22.2.7.5 Match Records

A Match Record is a Record value used to encapsulate the start and end indices of a regular expression match or capture.

Match Records have the fields listed in Table 70.

Table 70: Match Record Fields
Field Name | Value | Meaning
[[StartIndex]] | a non-negative integer | The number of code units from the start of a string at which the match begins (inclusive).
[[EndIndex]] | an integer ≥ [[StartIndex]] | The number of code units from the start of a string at which the match ends (exclusive).

22.2.7.6 GetMatchString ( S, match )

The abstract operation GetMatchString takes arguments S (a String) and match (a Match Record) and returns a String. It performs the following steps when called:

  1. Assert: match.[[StartIndex]] ≤ match.[[EndIndex]] ≤ the length of S.
  2. Return the substring of S from match.[[StartIndex]] to match.[[EndIndex]].

22.2.7.7 GetMatchIndexPair ( S, match )

The abstract operation GetMatchIndexPair takes arguments S (a String) and match (a Match Record) and returns an Array. It performs the following steps when called:

  1. Assert: match.[[StartIndex]] ≤ match.[[EndIndex]] ≤ the length of S.
  2. Return CreateArrayFromList(« 𝔽(match.[[StartIndex]]), 𝔽(match.[[EndIndex]]) »).

22.2.7.8 MakeMatchIndicesIndexPairArray ( S, indices, groupNames, hasGroups )

The abstract operation MakeMatchIndicesIndexPairArray takes arguments S (a String), indices (a List of either Match Records or undefined), groupNames (a List of either Strings or undefined), and hasGroups (a Boolean) and returns an Array. It performs the following steps when called:

  1. Let n be the number of elements in indices.
  2. Assert: n < 2³² - 1.
  3. Assert: groupNames has n - 1 elements.
  4. NOTE: The groupNames List contains elements aligned with the indices List starting at indices[1].
  5. Let A be ! ArrayCreate(n).
  6. If hasGroups is true, then
    1. Let groups be OrdinaryObjectCreate(null).
  7. Else,
    1. Let groups be undefined.
  8. Perform ! CreateDataPropertyOrThrow(A, "groups", groups).
  9. For each integer i such that 0 ≤ i < n, in ascending order, do
    1. Let matchIndices be indices[i].
    2. If matchIndices is not undefined, then
      1. Let matchIndexPair be GetMatchIndexPair(S, matchIndices).
    3. Else,
      1. Let matchIndexPair be undefined.
    4. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)), matchIndexPair).
    5. If i > 0 and groupNames[i - 1] is not undefined, then
      1. Assert: groups is not undefined.
      2. Perform ! CreateDataPropertyOrThrow(groups, groupNames[i - 1], matchIndexPair).
  10. Return A.
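
The array built by these steps is exposed as the "indices" property of a match result when the "d" flag is present. The sketch below uses illustrative names and assumes a host that supports the "d" flag.

```javascript
// One named group that participates and one unnamed group that does not.
const m = /(?<word>b+)(c)?/d.exec("abbd");

const whole = m.indices[0];               // [1, 3]
const firstGroup = m.indices[1];          // [1, 3]
const secondGroup = m.indices[2];         // undefined (group did not match)
const byName = m.indices.groups.word;     // same pair as m.indices[1]

// Each pair is [start, end) in code units, so it recovers the matched text.
const recovered = "abbd".slice(whole[0], whole[1]);   // "bb"

// Without any GroupName in the pattern, indices.groups is undefined.
const noNames = /(b)/d.exec("abbd").indices.groups;
```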

22.2.8 Properties of RegExp Instances

RegExp instances are ordinary objects that inherit properties from the RegExp prototype object. RegExp instances have internal slots [[OriginalSource]], [[OriginalFlags]], [[RegExpRecord]], and [[RegExpMatcher]]. The value of the [[RegExpMatcher]] internal slot is an Abstract Closure representation of the Pattern of the RegExp object.

Note

Prior to ECMAScript 2015, RegExp instances were specified as having the own data properties "source", "global", "ignoreCase", and "multiline". Those properties are now specified as accessor properties of RegExp.prototype.

RegExp instances also have the following property:

22.2.8.1 lastIndex

The value of the "lastIndex" property specifies the String index at which to start the next match. It is coerced to an integral Number when used (see 22.2.7.2). This property shall have the attributes { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false }.
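
The attributes and the deferred coercion described above can be checked from script. A minimal sketch with illustrative names:

```javascript
const rx = /a/g;

// lastIndex is an own data property with the attributes given above.
const desc = Object.getOwnPropertyDescriptor(rx, "lastIndex");

// Assignment stores the value as given; coercion happens only on use.
rx.lastIndex = "2";
const stored = rx.lastIndex;        // the String "2"

// RegExpBuiltinExec applies ToLength, so matching starts at index 2.
const match = rx.exec("aaaa");
const afterExec = rx.lastIndex;     // the Number 3
```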

22.2.9 RegExp String Iterator Objects

A RegExp String Iterator is an object that represents a specific iteration over some specific String instance object, matching against some specific RegExp instance object. There is no named constructor for RegExp String Iterator objects. Instead, RegExp String Iterator objects are created by calling certain methods of RegExp instance objects.

22.2.9.1 CreateRegExpStringIterator ( R, S, global, fullUnicode )

The abstract operation CreateRegExpStringIterator takes arguments R (an Object), S (a String), global (a Boolean), and fullUnicode (a Boolean) and returns a Generator. It performs the following steps when called:

  1. Let closure be a new Abstract Closure with no parameters that captures R, S, global, and fullUnicode and performs the following steps when called:
    1. Repeat,
      1. Let match be ? RegExpExec(R, S).
      2. If match is null, return undefined.
      3. If global is false, then
        1. Perform ? GeneratorYield(CreateIterResultObject(match, false)).
        2. Return undefined.
      4. Let matchStr be ? ToString(? Get(match, "0")).
      5. If matchStr is the empty String, then
        1. Let thisIndex be (? ToLength(? Get(R, "lastIndex"))).
        2. Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
        3. Perform ? Set(R, "lastIndex", 𝔽(nextIndex), true).
      6. Perform ? GeneratorYield(CreateIterResultObject(match, false)).
  2. Return CreateIteratorFromClosure(closure, "%RegExpStringIteratorPrototype%", %RegExpStringIteratorPrototype%).
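
The effect of the closure above is visible through the matchAll methods, which create these iterators. The sketch below uses illustrative names and shows the empty-match advance (step 5) and the single-result path for a non-global regular expression (step 3).

```javascript
// Empty matches advance lastIndex so that iteration terminates.
const emptyMatches = [..."abc".matchAll(/x*/g)].map((m) => m.index);
// [0, 1, 2, 3]

// In Unicode mode the advance skips a whole surrogate pair.
const astral = "\u{1F600}";
const byCodePoint = [...astral.matchAll(/(?:)/gu)].map((m) => m.index);
// [0, 2]
const byCodeUnit = [...astral.matchAll(/(?:)/g)].map((m) => m.index);
// [0, 1, 2]

// Without the g flag, calling the @@matchAll method directly yields
// exactly one match and then finishes.
const single = [.../a/[Symbol.matchAll]("aaa")].map((m) => m.index);
// [0]
```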

22.2.9.2 The %RegExpStringIteratorPrototype% Object

The %RegExpStringIteratorPrototype% object:

  • has properties that are inherited by all RegExp String Iterator Objects.
  • is an ordinary object.
  • has a [[Prototype]] internal slot whose value is %IteratorPrototype%.
  • has the following properties:

22.2.9.2.1 %RegExpStringIteratorPrototype%.next ( )

  1. Return ? GeneratorResume(this value, empty, "%RegExpStringIteratorPrototype%").

22.2.9.2.2 %RegExpStringIteratorPrototype% [ @@toStringTag ]

The initial value of the @@toStringTag property is the String value "RegExp String Iterator".

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: true }.
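
The properties of this prototype object can be inspected on any iterator obtained from matchAll. A minimal sketch with illustrative names:

```javascript
const iterator = "ab".matchAll(/./g);

// The @@toStringTag value shows up in the default string description.
const tag = Object.prototype.toString.call(iterator);

// The property itself lives on %RegExpStringIteratorPrototype%.
const proto = Object.getPrototypeOf(iterator);
const tagDesc = Object.getOwnPropertyDescriptor(proto, Symbol.toStringTag);

// next() resumes the underlying generator one match at a time.
const first = iterator.next();   // value is the match for "a", done is false
const second = iterator.next();  // value is the match for "b", done is false
const third = iterator.next();   // value is undefined, done is true
```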