22.2 RegExp (Regular Expression) Objects
A RegExp object contains a regular expression and the associated flags.
The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.
22.2.1 Patterns
The RegExp
Syntax
Each \u
u
u
\u
The first two lines here are equivalent to CharacterClass.
A number of productions in this section are given alternative definitions in section
22.2.1.1 Static Semantics: Early Errors
This section is amended in
- It is a Syntax Error if CountLeftCapturingParensWithin(Pattern) ≥ 2³² - 1.
- It is a Syntax Error if Pattern contains two or more GroupSpecifiers for which CapturingGroupName of GroupSpecifier is the same.
- It is a Syntax Error if the MV of the first DecimalDigits is strictly greater than the MV of the second DecimalDigits.
- It is a Syntax Error if GroupSpecifiersThatMatch(GroupName) is empty.
- It is a Syntax Error if the CapturingGroupNumber of DecimalEscape is strictly greater than CountLeftCapturingParensWithin(the Pattern containing AtomEscape).
- It is a Syntax Error if IsCharacterClass of the first ClassAtom is true or IsCharacterClass of the second ClassAtom is true.
- It is a Syntax Error if IsCharacterClass of the first ClassAtom is false, IsCharacterClass of the second ClassAtom is false, and the CharacterValue of the first ClassAtom is strictly greater than the CharacterValue of the second ClassAtom.
- It is a Syntax Error if IsCharacterClass of ClassAtomNoDash is true or IsCharacterClass of ClassAtom is true.
- It is a Syntax Error if IsCharacterClass of ClassAtomNoDash is false, IsCharacterClass of ClassAtom is false, and the CharacterValue of ClassAtomNoDash is strictly greater than the CharacterValue of ClassAtom.
- It is a Syntax Error if the CharacterValue of RegExpUnicodeEscapeSequence is not the numeric value of some code point matched by the IdentifierStartChar lexical grammar production.
- It is a Syntax Error if RegExpIdentifierCodePoint of RegExpIdentifierStart is not matched by the UnicodeIDStart lexical grammar production.
- It is a Syntax Error if the CharacterValue of RegExpUnicodeEscapeSequence is not the numeric value of some code point matched by the IdentifierPartChar lexical grammar production.
- It is a Syntax Error if RegExpIdentifierCodePoint of RegExpIdentifierPart is not matched by the UnicodeIDContinue lexical grammar production.
- It is a Syntax Error if the source text matched by UnicodePropertyName is not a Unicode property name or property alias listed in the “Property name and aliases” column of Table 67.
- It is a Syntax Error if the source text matched by UnicodePropertyValue is not a property value or property value alias for the Unicode property or property alias given by the source text matched by UnicodePropertyName listed in PropertyValueAliases.txt.
- It is a Syntax Error if the source text matched by LoneUnicodePropertyNameOrValue is not a Unicode property value or property value alias for the General_Category (gc) property listed in PropertyValueAliases.txt, nor a binary property or binary property alias listed in the “Property name and aliases” column of Table 68, nor a binary property of strings listed in the “Property name” column of Table 69.
- It is a Syntax Error if the enclosing Pattern does not have a [UnicodeSetsMode] parameter and the source text matched by LoneUnicodePropertyNameOrValue is a binary property of strings listed in the “Property name” column of Table 69.
- It is a Syntax Error if MayContainStrings of the UnicodePropertyValueExpression is true.
- It is a Syntax Error if MayContainStrings of the ClassContents is true.
- It is a Syntax Error if MayContainStrings of the ClassContents is true.
- It is a Syntax Error if the CharacterValue of the first ClassSetCharacter is strictly greater than the CharacterValue of the second ClassSetCharacter.
22.2.1.2 Static Semantics: CountLeftCapturingParensWithin ( node )
The abstract operation CountLeftCapturingParensWithin takes argument node (a (
pattern character that is matched by the (
terminal of the
This section is amended in
It performs the following steps when called:
- Assert: node is an instance of a production in the RegExp Pattern grammar.
- Return the number of Atom :: ( GroupSpecifier opt Disjunction ) Parse Nodes contained within node.
22.2.1.3 Static Semantics: CountLeftCapturingParensBefore ( node )
The abstract operation CountLeftCapturingParensBefore takes argument node (a
This section is amended in
It performs the following steps when called:
- Assert: node is an instance of a production in the RegExp Pattern grammar.
- Let pattern be the Pattern containing node.
- Return the number of Atom :: ( GroupSpecifier opt Disjunction ) Parse Nodes contained within pattern that either occur before node or contain node.
22.2.1.4 Static Semantics: CapturingGroupNumber
The
This section is amended in
It is defined piecewise over the following productions:
- Return the MV of
NonZeroDigit .
- Let n be the number of code points in DecimalDigits.
- Return (the MV of NonZeroDigit × 10ⁿ plus the MV of DecimalDigits).
The definitions of “the MV of
22.2.1.5 Static Semantics: IsCharacterClass
The
This section is amended in
It is defined piecewise over the following productions:
- Return
false .
- Return
true .
22.2.1.6 Static Semantics: CharacterValue
The
This section is amended in
It is defined piecewise over the following productions:
- Return the numeric value of U+002D (HYPHEN-MINUS).
- Let ch be the code point matched by
SourceCharacter . - Return the numeric value of ch.
- Return the numeric value of U+0008 (BACKSPACE).
- Return the numeric value of U+002D (HYPHEN-MINUS).
- Return the numeric value according to Table 65.
ControlEscape | Numeric Value | Code Point | Unicode Name | Symbol |
---|---|---|---|---|
t | 9 | U+0009 | CHARACTER TABULATION | <HT> |
n | 10 | U+000A | LINE FEED (LF) | <LF> |
v | 11 | U+000B | LINE TABULATION | <VT> |
f | 12 | U+000C | FORM FEED (FF) | <FF> |
r | 13 | U+000D | CARRIAGE RETURN (CR) | <CR> |
- Let ch be the code point matched by AsciiLetter.
- Let i be the numeric value of ch.
- Return the remainder of dividing i by 32.
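The control-letter steps above can be sketched as follows. This is only an illustration: the helper name `controlEscapeValue` is hypothetical and not part of this specification. It takes the letter's code point modulo 32 and compares the result with how an engine treats `\cJ`.

```javascript
// Hypothetical helper mirroring the steps above: take the numeric value of
// the ASCII letter and return the remainder of dividing it by 32.
function controlEscapeValue(letter) {
  const i = letter.codePointAt(0);
  return i % 32;
}

console.log(controlEscapeValue("J")); // 10, the numeric value of LINE FEED (LF)
console.log(controlEscapeValue("j")); // 10, upper and lower case give the same value
console.log(/\cJ/.test("\n"));        // true
```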
- Return the numeric value of U+0000 (NULL).
\0
represents the <NUL> character and cannot be followed by a decimal digit.
- Return the MV of
HexEscapeSequence .
- Let lead be the CharacterValue of HexLeadSurrogate.
- Let trail be the CharacterValue of HexTrailSurrogate.
- Let cp be UTF16SurrogatePairToCodePoint(lead, trail).
- Return the numeric value of cp.
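As a rough illustration of the surrogate-pair steps above, the sketch below combines a lead and a trail surrogate with the standard UTF-16 arithmetic. The function name `surrogatePairToCodePoint` is a hypothetical stand-in for UTF16SurrogatePairToCodePoint, which is defined elsewhere in the specification.

```javascript
// Hypothetical stand-in: combine a lead surrogate (0xD800..0xDBFF) and a
// trail surrogate (0xDC00..0xDFFF) into a supplementary code point.
function surrogatePairToCodePoint(lead, trail) {
  return (lead - 0xd800) * 0x400 + (trail - 0xdc00) + 0x10000;
}

const cp = surrogatePairToCodePoint(0xd834, 0xdd1e);
console.log(cp.toString(16)); // "1d11e", MUSICAL SYMBOL G CLEF

// In a Unicode pattern, the escaped pair is treated as a single character.
console.log(/^\ud834\udd1e$/u.test("\u{1D11E}")); // true
```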
- Return the MV of
Hex4Digits .
- Return the MV of
CodePoint .
- Return the MV of
Hex4Digits .
- Let ch be the code point matched by IdentityEscape.
- Return the numeric value of ch.

- Let ch be the code point matched by SourceCharacter.
- Return the numeric value of ch.

- Let ch be the code point matched by ClassSetReservedPunctuator.
- Return the numeric value of ch.
- Return the numeric value of U+0008 (BACKSPACE).
22.2.1.7 Static Semantics: MayContainStrings
The
- Return false.

- If the source text matched by LoneUnicodePropertyNameOrValue is a binary property of strings listed in the “Property name” column of Table 69, return true.
- Return false.

- If the ClassUnion is present, return MayContainStrings of the ClassUnion.
- Return false.

- If MayContainStrings of the ClassSetOperand is true, return true.
- If ClassUnion is present, return MayContainStrings of the ClassUnion.
- Return false.

- If MayContainStrings of the first ClassSetOperand is false, return false.
- If MayContainStrings of the second ClassSetOperand is false, return false.
- Return true.

- If MayContainStrings of the ClassIntersection is false, return false.
- If MayContainStrings of the ClassSetOperand is false, return false.
- Return true.

- Return MayContainStrings of the first ClassSetOperand.

- Return MayContainStrings of the ClassSubtraction.

- If MayContainStrings of the ClassString is true, return true.
- Return MayContainStrings of the ClassStringDisjunctionContents.

- Return true.

- Return MayContainStrings of the NonEmptyClassString.

- If NonEmptyClassString is present, return true.
- Return false.
22.2.1.8 Static Semantics: GroupSpecifiersThatMatch ( thisGroupName )
The abstract operation GroupSpecifiersThatMatch takes argument thisGroupName (a
- Let name be the CapturingGroupName of thisGroupName.
- Let pattern be the Pattern containing thisGroupName.
- Let result be a new empty List.
- For each GroupSpecifier gs that pattern contains, do
  - If the CapturingGroupName of gs is name, then
    - Append gs to result.
- Return result.
22.2.1.9 Static Semantics: CapturingGroupName
The
- Let idTextUnescaped be RegExpIdentifierCodePoints of RegExpIdentifierName.
- Return CodePointsToString(idTextUnescaped).
22.2.1.10 Static Semantics: RegExpIdentifierCodePoints
The
- Let cp be RegExpIdentifierCodePoint of RegExpIdentifierStart.
- Return « cp ».

- Let cps be RegExpIdentifierCodePoints of the derived RegExpIdentifierName.
- Let cp be RegExpIdentifierCodePoint of RegExpIdentifierPart.
- Return the list-concatenation of cps and « cp ».
22.2.1.11 Static Semantics: RegExpIdentifierCodePoint
The
- Return the code point matched by IdentifierStartChar.

- Return the code point matched by IdentifierPartChar.

- Return the code point whose numeric value is the CharacterValue of RegExpUnicodeEscapeSequence.

- Let lead be the code unit whose numeric value is the numeric value of the code point matched by UnicodeLeadSurrogate.
- Let trail be the code unit whose numeric value is the numeric value of the code point matched by UnicodeTrailSurrogate.
- Return UTF16SurrogatePairToCodePoint(lead, trail).
22.2.2 Pattern Semantics
A regular expression pattern is converted into an
A Pattern is a BMP pattern if its associated flags contain neither a u nor a v. Otherwise, it is a Unicode pattern. A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern, “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern, “character” means a UTF-16 encoded code point.
The syntax and semantics of
For example, consider a pattern expressed in source text as the single non-BMP character U+1D11E (MUSICAL SYMBOL G CLEF). Interpreted as a Unicode pattern, it would be a single element (character)
Patterns are passed to the RegExp
An implementation may not actually perform such translations to or from UTF-16, but the semantics of this specification requires that the result of pattern matching be as if such translations were performed.
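The difference between the two interpretations can be observed from script code. The following is a small illustrative sketch, not normative text. It matches the G clef character, which occupies two UTF-16 code units, with and without the u flag.

```javascript
// U+1D11E MUSICAL SYMBOL G CLEF is stored as two 16-bit code units.
const clef = "\u{1D11E}";
console.log(clef.length);        // 2

// BMP pattern: each code unit counts as one character, so a single "." is not enough.
console.log(/^.$/.test(clef));   // false
console.log(/^..$/.test(clef));  // true

// Unicode pattern: the surrogate pair counts as one character.
console.log(/^.$/u.test(clef));  // true
```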
22.2.2.1 Notation
The descriptions below use the following internal data structures:
- A CharSetElement is one of the two following entities:
  - If rer.[[UnicodeSets]] is false, then a CharSetElement is a character in the sense of the Pattern Semantics above.
  - If rer.[[UnicodeSets]] is true, then a CharSetElement is a sequence whose elements are characters in the sense of the Pattern Semantics above. This includes the empty sequence, sequences of one character, and sequences of more than one character. For convenience, when working with CharSetElements of this kind, an individual character is treated interchangeably with a sequence of one character.
- A CharSet is a mathematical set of CharSetElements.
- A CaptureRange is a Record { [[StartIndex]], [[EndIndex]] } that represents the range of characters included in a capture, where [[StartIndex]] is an integer representing the start index (inclusive) of the range within Input, and [[EndIndex]] is an integer representing the end index (exclusive) of the range within Input. For any CaptureRange, these indices must satisfy the invariant that [[StartIndex]] ≤ [[EndIndex]].
- A MatchState is a Record { [[Input]], [[EndIndex]], [[Captures]] } where [[Input]] is a List of characters representing the String being matched, [[EndIndex]] is an integer, and [[Captures]] is a List of values, one for each left-capturing parenthesis in the pattern. States are used to represent partial match states in the regular expression matching algorithms. The [[EndIndex]] is one plus the index of the last input character matched so far by the pattern, while [[Captures]] holds the results of capturing parentheses. The nth element of [[Captures]] is either a CaptureRange representing the range of characters captured by the nth set of capturing parentheses, or undefined if the nth set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
- A MatchResult is either a MatchState or the special token failure that indicates that the match failed.
- A MatcherContinuation is an Abstract Closure that takes one MatchState argument and returns a MatchResult result. The MatcherContinuation attempts to match the remaining portion (specified by the closure's captured values) of the pattern against Input, starting at the intermediate state given by its MatchState argument. If the match succeeds, the MatcherContinuation returns the final MatchState that it reached; if the match fails, the MatcherContinuation returns failure.
- A Matcher is an Abstract Closure that takes two arguments (a MatchState and a MatcherContinuation) and returns a MatchResult result. A Matcher attempts to match a middle subpattern (specified by the closure's captured values) of the pattern against the MatchState's [[Input]], starting at the intermediate state given by its MatchState argument. The MatcherContinuation argument should be a closure that matches the rest of the pattern. After matching the subpattern of a pattern to obtain a new MatchState, the Matcher then calls MatcherContinuation on that new MatchState to test if the rest of the pattern can match as well. If it can, the Matcher returns the MatchState returned by MatcherContinuation; if not, the Matcher may try different choices at its choice points, repeatedly calling MatcherContinuation until it either succeeds or all possibilities have been exhausted.
22.2.2.1.1 RegExp Records
A RegExp Record is a
It has the following fields:
Field Name | Value | Meaning |
---|---|---|
[[IgnoreCase]] | a Boolean | indicates whether |
[[Multiline]] | a Boolean | indicates whether |
[[DotAll]] | a Boolean | indicates whether |
[[Unicode]] | a Boolean | indicates whether |
[[UnicodeSets]] | a Boolean | indicates whether |
[[CapturingGroupsCount]] | a non-negative |
the number of |
22.2.2.2 Runtime Semantics: CompilePattern
The
- Let m be CompileSubpattern of Disjunction with arguments rer and forward.
- Return a new Abstract Closure with parameters (Input, index) that captures rer and m and performs the following steps when called:
  - Assert: Input is a List of characters.
  - Assert: 0 ≤ index ≤ the number of elements in Input.
  - Let c be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a MatchState.
    - Return y.
  - Let cap be a List of rer.[[CapturingGroupsCount]] undefined values, indexed 1 through rer.[[CapturingGroupsCount]].
  - Let x be the MatchState { [[Input]]: Input, [[EndIndex]]: index, [[Captures]]: cap }.
  - Return m(x, c).
A Pattern compiles to an
22.2.2.3 Runtime Semantics: CompileSubpattern
The
This section is amended in
It is defined piecewise over the following productions:
- Let m1 be CompileSubpattern of Alternative with arguments rer and direction.
- Let m2 be CompileSubpattern of Disjunction with arguments rer and direction.
- Return MatchTwoAlternatives(m1, m2).
The |
regular expression operator separates two alternatives. The pattern first tries to match the left |
produce
/a|ab/.exec("abc")
returns the result
/((a)|(ab))((c)|(bc))/.exec("abc")
returns the array
["abc", "a", "a", undefined, "bc", undefined, "bc"]
and not
["abc", "ab", undefined, "ab", "c", "c", undefined]
The order in which the two alternatives are tried is independent of the value of direction.
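The left-to-right preference between alternatives described above can be checked directly. The snippet below is only an illustration and uses the two patterns from the text.

```javascript
// The left alternative "a" succeeds, so "ab" is never tried.
const first = /a|ab/.exec("abc");
console.log(first[0]); // "a"

// With nested groups, the first group settles on "a" and the second on "bc".
const nested = /((a)|(ab))((c)|(bc))/.exec("abc");
console.log(nested[1], nested[4]); // "a" "bc"
console.log(nested[3] === undefined && nested[5] === undefined); // true
```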
- Return
EmptyMatcher ().
- Let m1 be CompileSubpattern of Alternative with arguments rer and direction.
- Let m2 be CompileSubpattern of Term with arguments rer and direction.
- Return MatchSequence(m1, m2, direction).
Consecutive
- Return
CompileAssertion ofAssertion with argument rer.
The resulting
- Return
CompileAtom ofAtom with arguments rer and direction.
- Let m be CompileAtom of Atom with arguments rer and direction.
- Let q be CompileQuantifier of Quantifier.
- Assert: q.[[Min]] ≤ q.[[Max]].
- Let parenIndex be CountLeftCapturingParensBefore(Term).
- Let parenCount be CountLeftCapturingParensWithin(Atom).
- Return a new Matcher with parameters (x, c) that captures m, q, parenIndex, and parenCount and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Return RepeatMatcher(m, q.[[Min]], q.[[Max]], q.[[Greedy]], x, c, parenIndex, parenCount).
22.2.2.3.1 RepeatMatcher ( m, min, max, greedy, x, c, parenIndex, parenCount )
The abstract operation RepeatMatcher takes arguments m (a
- If max = 0, return c(x).
- Let d be a new MatcherContinuation with parameters (y) that captures m, min, max, greedy, x, c, parenIndex, and parenCount and performs the following steps when called:
  - Assert: y is a MatchState.
  - If min = 0 and y.[[EndIndex]] = x.[[EndIndex]], return failure.
  - If min = 0, let min2 be 0; otherwise let min2 be min - 1.
  - If max = +∞, let max2 be +∞; otherwise let max2 be max - 1.
  - Return RepeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount).
- Let cap be a copy of x.[[Captures]].
- For each integer k in the inclusive interval from parenIndex + 1 to parenIndex + parenCount, set cap[k] to undefined.
- Let Input be x.[[Input]].
- Let e be x.[[EndIndex]].
- Let xr be the MatchState { [[Input]]: Input, [[EndIndex]]: e, [[Captures]]: cap }.
- If min ≠ 0, return m(xr, d).
- If greedy is false, then
  - Let z be c(x).
  - If z is not failure, return z.
  - Return m(xr, d).
- Let z be m(xr, d).
- If z is not failure, return z.
- Return c(x).
An
If the
Compare
/a[a-z]{2,4}/.exec("abcdefghi")
which returns
/a[a-z]{2,4}?/.exec("abcdefghi")
which returns
Consider also
/(aa|aabaac|ba|b|c)*/.exec("aabaac")
which, by the choice point ordering above, returns the array
["aaba", "ba"]
and not any of:
["aabaac", "aabaac"]
["aabaac", "c"]
The above ordering of choice points can be used to write a regular expression that calculates the greatest common divisor of two numbers (represented in unary notation). The following example calculates the gcd of 10 and 15:
"aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/, "$1")
which returns the gcd in unary notation
Step
/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac")
which returns the array
["zaacbbbcac", "z", "ac", "a", undefined, "c"]
and not
["zaacbbbcac", "z", "ac", "a", "bbb", "c"]
because each iteration of the outermost *
clears all captured Strings contained in the quantified
Step
/(a*)*/.exec("b")
or the slightly more complicated:
/(a*)b\1+/.exec("baaaac")
which returns the array
["b", ""]
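The examples in the notes above can be run as written. The following sketch simply collects them and is illustrative only. The variable names are arbitrary.

```javascript
// Greedy versus non-greedy repetition.
const greedy = /a[a-z]{2,4}/.exec("abcdefghi")[0];   // "abcde"
const lazy = /a[a-z]{2,4}?/.exec("abcdefghi")[0];    // "abc"

// Choice point ordering: the array is ["aaba", "ba"].
const choice = /(aa|aabaac|ba|b|c)*/.exec("aabaac");

// Greatest common divisor of 10 and 15 in unary notation.
const gcd = "aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/, "$1");

// Each iteration of the outer * clears the captures inside the quantified atom,
// so the (b+) group ends up undefined.
const cleared = /(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac");

// Empty iterations are rejected, so these calls terminate.
const emptyLoop = /(a*)*/.exec("b");
const backref = /(a*)b\1+/.exec("baaaac");

console.log(greedy, lazy);      // "abcde" "abc"
console.log(gcd.length);        // 5
console.log(cleared[4]);        // undefined
console.log(emptyLoop[0] === "", backref[0], backref[1] === ""); // true "b" true
```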
22.2.2.3.2 EmptyMatcher ( )
The abstract operation EmptyMatcher takes no arguments and returns a
- Return a new Matcher with parameters (x, c) that captures nothing and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Return c(x).
22.2.2.3.3 MatchTwoAlternatives ( m1, m2 )
The abstract operation MatchTwoAlternatives takes arguments m1 (a
- Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let r be m1(x, c).
  - If r is not failure, return r.
  - Return m2(x, c).
22.2.2.3.4 MatchSequence ( m1, m2, direction )
The abstract operation MatchSequence takes arguments m1 (a
- If direction is forward, then
  - Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
    - Assert: x is a MatchState.
    - Assert: c is a MatcherContinuation.
    - Let d be a new MatcherContinuation with parameters (y) that captures c and m2 and performs the following steps when called:
      - Assert: y is a MatchState.
      - Return m2(y, c).
    - Return m1(x, d).
- Else,
  - Assert: direction is backward.
  - Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
    - Assert: x is a MatchState.
    - Assert: c is a MatcherContinuation.
    - Let d be a new MatcherContinuation with parameters (y) that captures c and m1 and performs the following steps when called:
      - Assert: y is a MatchState.
      - Return m1(y, c).
    - Return m2(x, d).
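The continuation-passing style used by EmptyMatcher, MatchTwoAlternatives, and MatchSequence can be sketched in a few lines. This is a simplified model under stated assumptions, not the specification's algorithm: captures are omitted, only the forward direction is handled, and `charMatcher`, `literal`, and `run` are hypothetical helpers introduced for illustration.

```javascript
// A MatchState is modelled as { input, endIndex }; failure is modelled as null.
const FAILURE = null;

// EmptyMatcher: consume nothing and hand the state to the continuation.
const emptyMatcher = () => (x, c) => c(x);

// Hypothetical single-character matcher (a much simplified CharacterSetMatcher:
// forward direction only, no canonicalization).
const charMatcher = (ch) => (x, c) => {
  if (x.endIndex >= x.input.length || x.input[x.endIndex] !== ch) return FAILURE;
  return c({ input: x.input, endIndex: x.endIndex + 1 });
};

// MatchTwoAlternatives: try m1 with the whole continuation, then fall back to m2.
const matchTwoAlternatives = (m1, m2) => (x, c) => {
  const r = m1(x, c);
  return r !== FAILURE ? r : m2(x, c);
};

// MatchSequence (forward): m1 runs first, and its continuation runs m2.
const matchSequence = (m1, m2) => (x, c) => m1(x, (y) => m2(y, c));

// Hypothetical helper: a matcher for a literal string, built by sequencing.
const literal = (s) =>
  s.split("").reduce((m, ch) => matchSequence(m, charMatcher(ch)), emptyMatcher());

// Hypothetical helper: start at index with the identity continuation,
// in the way CompilePattern starts a match.
const run = (m, str, index = 0) =>
  m({ input: str.split(""), endIndex: index }, (y) => y);

// Roughly /a|ab/: the left alternative wins, so the match ends at index 1.
const alt = matchTwoAlternatives(literal("a"), literal("ab"));
console.log(run(alt, "abc").endIndex); // 1

// Roughly /(?:a|ab)c/: "a" then "c" fails, so the matcher backtracks into "ab".
const pattern = matchSequence(alt, literal("c"));
console.log(run(pattern, "abc").endIndex); // 3
```

Because the continuation is passed into each alternative, a failure in the rest of the pattern sends control back to the next untried choice, which is how backtracking arises without any explicit stack.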
22.2.2.4 Runtime Semantics: CompileAssertion
The
This section is amended in
It is defined piecewise over the following productions:
- Return a new Matcher with parameters (x, c) that captures rer and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let Input be x.[[Input]].
  - Let e be x.[[EndIndex]].
  - If e = 0, or if rer.[[Multiline]] is true and the character Input[e - 1] is matched by LineTerminator, then
    - Return c(x).
  - Return failure.
Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if rer.[[Multiline]] is true) at the beginning of a line.
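The interaction between ^ and the y flag can be seen in a short experiment. The snippet below is illustrative only. It sets lastIndex by hand so that matching starts in the middle of the input.

```javascript
// Sticky matching starts at lastIndex, but ^ still requires index 0.
const sticky = /^foo/y;
sticky.lastIndex = 2;
const stickyResult = sticky.test("..foo");
console.log(stickyResult); // false

// With the m flag, ^ also matches right after a line terminator.
const stickyMultiline = /^foo/my;
stickyMultiline.lastIndex = 3;
const multilineResult = stickyMultiline.test("..\nfoo");
console.log(multilineResult); // true
```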
- Return a new Matcher with parameters (x, c) that captures rer and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let Input be x.[[Input]].
  - Let e be x.[[EndIndex]].
  - Let InputLength be the number of elements in Input.
  - If e = InputLength, or if rer.[[Multiline]] is true and the character Input[e] is matched by LineTerminator, then
    - Return c(x).
  - Return failure.

- Return a new Matcher with parameters (x, c) that captures rer and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let Input be x.[[Input]].
  - Let e be x.[[EndIndex]].
  - Let a be IsWordChar(rer, Input, e - 1).
  - Let b be IsWordChar(rer, Input, e).
  - If a is true and b is false, or if a is false and b is true, return c(x).
  - Return failure.

- Return a new Matcher with parameters (x, c) that captures rer and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let Input be x.[[Input]].
  - Let e be x.[[EndIndex]].
  - Let a be IsWordChar(rer, Input, e - 1).
  - Let b be IsWordChar(rer, Input, e).
  - If a is true and b is true, or if a is false and b is false, return c(x).
  - Return failure.
- Let m be CompileSubpattern of Disjunction with arguments rer and forward.
- Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a MatchState.
    - Return y.
  - Let r be m(x, d).
  - If r is failure, return failure.
  - Assert: r is a MatchState.
  - Let cap be r.[[Captures]].
  - Let Input be x.[[Input]].
  - Let xe be x.[[EndIndex]].
  - Let z be the MatchState { [[Input]]: Input, [[EndIndex]]: xe, [[Captures]]: cap }.
  - Return c(z).
The form (?=
)
specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside (?=
form (this unusual behaviour is inherited from Perl). This only matters when the
For example,
/(?=(a+))/.exec("baaabac")
matches the empty String immediately after the first b
and therefore returns the array:
["", "aaa"]
To illustrate the lack of backtracking into the lookahead, consider:
/(?=(a+))a*b\1/.exec("baaabac")
This expression returns
["aba", "a"]
and not:
["aaaba", "a"]
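Both lookahead examples above can be reproduced directly. The following snippet is illustrative only. It also prints the match index to show that the lookahead itself consumes no input.

```javascript
// The lookahead succeeds right after the first "b" and captures "aaa",
// while the overall match is the empty String.
const found = /(?=(a+))/.exec("baaabac");
console.log(found.index, found[0] === "", found[1]); // 1 true "aaa"

// No backtracking into the lookahead: the result is ["aba", "a"].
const noBacktrack = /(?=(a+))a*b\1/.exec("baaabac");
console.log(noBacktrack[0], noBacktrack[1]); // "aba" "a"
```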
- Let m be CompileSubpattern of Disjunction with arguments rer and forward.
- Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a MatchState.
    - Return y.
  - Let r be m(x, d).
  - If r is not failure, return failure.
  - Return c(x).
The form (?!
)
specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside
/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")
looks for an a
not immediately followed by some positive number n of a
's, a b
, another n a
's (specified by the first \2
) and a c
. The second \2
is outside the negative lookahead, so it matches against
["baaabaac", "ba", undefined, "abaac"]
- Let m be CompileSubpattern of Disjunction with arguments rer and backward.
- Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a MatchState.
    - Return y.
  - Let r be m(x, d).
  - If r is failure, return failure.
  - Assert: r is a MatchState.
  - Let cap be r.[[Captures]].
  - Let Input be x.[[Input]].
  - Let xe be x.[[EndIndex]].
  - Let z be the MatchState { [[Input]]: Input, [[EndIndex]]: xe, [[Captures]]: cap }.
  - Return c(z).

- Let m be CompileSubpattern of Disjunction with arguments rer and backward.
- Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a MatchState.
    - Return y.
  - Let r be m(x, d).
  - If r is not failure, return failure.
  - Return c(x).
22.2.2.4.1 IsWordChar ( rer, Input, e )
The abstract operation IsWordChar takes arguments rer (a
- Let InputLength be the number of elements in Input.
- If e = -1 or e = InputLength, return false.
- Let c be the character Input[e].
- If WordCharacters(rer) contains c, return true.
- Return false.
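The IsWordChar steps and their use by the word-boundary assertion can be sketched as below. This is a simplified illustration that assumes the basic word characters (ASCII letters, digits, and underscore) and ignores any additions made by WordCharacters under other flags. `isWordChar` and `isWordBoundary` are hypothetical helper names.

```javascript
// Hypothetical helper: positions just outside the input are never word characters.
// Assumption: only the basic word characters A-Z, a-z, 0-9 and _ are considered.
function isWordChar(input, e) {
  if (e === -1 || e === input.length) return false;
  return /^[A-Za-z0-9_]$/.test(input[e]);
}

// A word boundary exists at e when exactly one neighbour is a word character.
function isWordBoundary(input, e) {
  return isWordChar(input, e - 1) !== isWordChar(input, e);
}

const input = "hi there".split("");
const boundaries = [];
for (let e = 0; e <= input.length; e++) {
  if (isWordBoundary(input, e)) boundaries.push(e);
}
console.log(boundaries.join(","));           // "0,2,3,8"
console.log("hi there".replace(/\b/g, "|")); // "|hi| |there|"
```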
22.2.2.5 Runtime Semantics: CompileQuantifier
The
- Let qp be CompileQuantifierPrefix of QuantifierPrefix.
- Return the Record { [[Min]]: qp.[[Min]], [[Max]]: qp.[[Max]], [[Greedy]]: true }.

- Let qp be CompileQuantifierPrefix of QuantifierPrefix.
- Return the Record { [[Min]]: qp.[[Min]], [[Max]]: qp.[[Max]], [[Greedy]]: false }.
22.2.2.6 Runtime Semantics: CompileQuantifierPrefix
The
- Return the Record { [[Min]]: 0, [[Max]]: +∞ }.

- Return the Record { [[Min]]: 1, [[Max]]: +∞ }.

- Return the Record { [[Min]]: 0, [[Max]]: 1 }.

- Let i be the MV of DecimalDigits (see 12.9.3).
- Return the Record { [[Min]]: i, [[Max]]: i }.

- Let i be the MV of DecimalDigits.
- Return the Record { [[Min]]: i, [[Max]]: +∞ }.

- Let i be the MV of the first DecimalDigits.
- Let j be the MV of the second DecimalDigits.
- Return the Record { [[Min]]: i, [[Max]]: j }.
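The quantifier cases above can be summarised in one small function. This is a sketch under stated assumptions: `compileQuantifier` is a hypothetical helper that works on the quantifier's source text rather than on Parse Nodes, and it models +∞ as `Infinity`.

```javascript
// Hypothetical helper: map quantifier source text to { min, max, greedy }.
function compileQuantifier(text) {
  // A trailing "?" after a complete prefix makes the quantifier non-greedy.
  const lazy = text.length > 1 && text.endsWith("?");
  const prefix = lazy ? text.slice(0, -1) : text;
  let min;
  let max;
  if (prefix === "*") {
    [min, max] = [0, Infinity];
  } else if (prefix === "+") {
    [min, max] = [1, Infinity];
  } else if (prefix === "?") {
    [min, max] = [0, 1];
  } else {
    const m = /^\{(\d+)(,(\d*))?\}$/.exec(prefix);
    if (m === null) throw new SyntaxError("not a quantifier: " + text);
    min = Number(m[1]);
    // {i} gives max = i, {i,} gives max = +∞, {i,j} gives max = j.
    max = m[2] === undefined ? min : m[3] === "" ? Infinity : Number(m[3]);
    // Mirrors the early error for a first number greater than the second.
    if (min > max) throw new SyntaxError("numbers out of order in quantifier");
  }
  return { min, max, greedy: !lazy };
}

console.log(compileQuantifier("{2,4}?")); // min 2, max 4, greedy false
console.log(compileQuantifier("+"));      // min 1, max Infinity, greedy true
```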
22.2.2.7 Runtime Semantics: CompileAtom
The
This section is amended in
It is defined piecewise over the following productions:
- Let ch be the character matched by PatternCharacter.
- Let A be a one-element CharSet containing the character ch.
- Return CharacterSetMatcher(rer, A, false, direction).

- Let A be AllCharacters(rer).
- If rer.[[DotAll]] is not true, then
  - Remove from A all characters corresponding to a code point on the right-hand side of the LineTerminator production.
- Return CharacterSetMatcher(rer, A, false, direction).
- Let cc be CompileCharacterClass of CharacterClass with argument rer.
- Let cs be cc.[[CharSet]].
- If rer.[[UnicodeSets]] is false, or if every CharSetElement of cs consists of a single character (including if cs is empty), return CharacterSetMatcher(rer, cs, cc.[[Invert]], direction).
- Assert: cc.[[Invert]] is false.
- Let lm be an empty List of Matchers.
- For each CharSetElement s in cs containing more than 1 character, iterating in descending order of length, do
  - Let cs2 be a one-element CharSet containing the last code point of s.
  - Let m2 be CharacterSetMatcher(rer, cs2, false, direction).
  - For each code point c1 in s, iterating backwards from its second-to-last code point, do
    - Let cs1 be a one-element CharSet containing c1.
    - Let m1 be CharacterSetMatcher(rer, cs1, false, direction).
    - Set m2 to MatchSequence(m1, m2, direction).
  - Append m2 to lm.
- Let singles be the CharSet containing every CharSetElement of cs that consists of a single character.
- Append CharacterSetMatcher(rer, singles, false, direction) to lm.
- If cs contains the empty sequence of characters, append EmptyMatcher() to lm.
- Let m2 be the last Matcher in lm.
- For each Matcher m1 of lm, iterating backwards from its second-to-last element, do
  - Set m2 to MatchTwoAlternatives(m1, m2).
- Return m2.
- Let m be CompileSubpattern of Disjunction with arguments rer and direction.
- Let parenIndex be CountLeftCapturingParensBefore(Atom).
- Return a new Matcher with parameters (x, c) that captures direction, m, and parenIndex and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let d be a new MatcherContinuation with parameters (y) that captures x, c, direction, and parenIndex and performs the following steps when called:
    - Assert: y is a MatchState.
    - Let cap be a copy of y.[[Captures]].
    - Let Input be x.[[Input]].
    - Let xe be x.[[EndIndex]].
    - Let ye be y.[[EndIndex]].
    - If direction is forward, then
      - Assert: xe ≤ ye.
      - Let r be the CaptureRange { [[StartIndex]]: xe, [[EndIndex]]: ye }.
    - Else,
      - Assert: direction is backward.
      - Assert: ye ≤ xe.
      - Let r be the CaptureRange { [[StartIndex]]: ye, [[EndIndex]]: xe }.
    - Set cap[parenIndex + 1] to r.
    - Let z be the MatchState { [[Input]]: Input, [[EndIndex]]: ye, [[Captures]]: cap }.
    - Return c(z).
  - Return m(x, d).
Parentheses of the form (
)
serve both to group the components of the \
followed by a non-zero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching (?:
)
instead.
- Return CompileSubpattern of Disjunction with arguments rer and direction.

- Let n be the CapturingGroupNumber of DecimalEscape.
- Assert: n ≤ rer.[[CapturingGroupsCount]].
- Return BackreferenceMatcher(rer, n, direction).
An escape sequence of the form \
followed by a non-zero decimal number n matches the result of the nth set of capturing parentheses (
- Let cv be the CharacterValue of CharacterEscape.
- Let ch be the character whose character value is cv.
- Let A be a one-element CharSet containing the character ch.
- Return CharacterSetMatcher(rer, A, false, direction).
- Let cs be CompileToCharSet of CharacterClassEscape with argument rer.
- If rer.[[UnicodeSets]] is false, or if every CharSetElement of cs consists of a single character (including if cs is empty), return CharacterSetMatcher(rer, cs, false, direction).
- Let lm be an empty List of Matchers.
- For each CharSetElement s in cs containing more than 1 character, iterating in descending order of length, do
  - Let cs2 be a one-element CharSet containing the last code point of s.
  - Let m2 be CharacterSetMatcher(rer, cs2, false, direction).
  - For each code point c1 in s, iterating backwards from its second-to-last code point, do
    - Let cs1 be a one-element CharSet containing c1.
    - Let m1 be CharacterSetMatcher(rer, cs1, false, direction).
    - Set m2 to MatchSequence(m1, m2, direction).
  - Append m2 to lm.
- Let singles be the CharSet containing every CharSetElement of cs that consists of a single character.
- Append CharacterSetMatcher(rer, singles, false, direction) to lm.
- If cs contains the empty sequence of characters, append EmptyMatcher() to lm.
- Let m2 be the last Matcher in lm.
- For each Matcher m1 of lm, iterating backwards from its second-to-last element, do
  - Set m2 to MatchTwoAlternatives(m1, m2).
- Return m2.
- Let matchingGroupSpecifiers be GroupSpecifiersThatMatch(GroupName).
- Assert: matchingGroupSpecifiers contains a single GroupSpecifier.
- Let groupSpecifier be the sole element of matchingGroupSpecifiers.
- Let parenIndex be CountLeftCapturingParensBefore(groupSpecifier).
- Return BackreferenceMatcher(rer, parenIndex, direction).
22.2.2.7.1 CharacterSetMatcher ( rer, A, invert, direction )
The abstract operation CharacterSetMatcher takes arguments rer (a
- If rer.[[UnicodeSets]] is true, then
  - Assert: invert is false.
  - Assert: Every CharSetElement of A consists of a single character.
- Return a new Matcher with parameters (x, c) that captures rer, A, invert, and direction and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let Input be x.[[Input]].
  - Let e be x.[[EndIndex]].
  - If direction is forward, let f be e + 1.
  - Else, let f be e - 1.
  - Let InputLength be the number of elements in Input.
  - If f < 0 or f > InputLength, return failure.
  - Let index be min(e, f).
  - Let ch be the character Input[index].
  - Let cc be Canonicalize(rer, ch).
  - If there exists a CharSetElement in A containing exactly one character a such that Canonicalize(rer, a) is cc, let found be true. Otherwise, let found be false.
  - If invert is false and found is false, return failure.
  - If invert is true and found is true, return failure.
  - Let cap be x.[[Captures]].
  - Let y be the MatchState { [[Input]]: Input, [[EndIndex]]: f, [[Captures]]: cap }.
  - Return c(y).
22.2.2.7.2 BackreferenceMatcher ( rer, n, direction )
The abstract operation BackreferenceMatcher takes arguments rer (a
- Assert: n ≥ 1.
- Return a new Matcher with parameters (x, c) that captures rer, n, and direction and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let Input be x.[[Input]].
  - Let cap be x.[[Captures]].
  - Let r be cap[n].
  - If r is undefined, return c(x).
  - Let e be x.[[EndIndex]].
  - Let rs be r.[[StartIndex]].
  - Let re be r.[[EndIndex]].
  - Let len be re - rs.
  - If direction is forward, let f be e + len.
  - Else, let f be e - len.
  - Let InputLength be the number of elements in Input.
  - If f < 0 or f > InputLength, return failure.
  - Let g be min(e, f).
  - If there exists an integer i in the interval from 0 (inclusive) to len (exclusive) such that Canonicalize(rer, Input[rs + i]) is not Canonicalize(rer, Input[g + i]), return failure.
  - Let y be the MatchState { [[Input]]: Input, [[EndIndex]]: f, [[Captures]]: cap }.
  - Return c(y).
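The observable effect of the BackreferenceMatcher steps can be illustrated with a few literals. This is only a sketch of behaviour seen from script, not of the abstract operation itself; the variable names are illustrative.

```javascript
// Group 1 never participates, so cap[1] is undefined and \1 succeeds
// without consuming any input.
const undefinedCapture = /(a)?\1b/.exec("b");

// Without the i flag, the backreference must repeat the captured text exactly.
const repeated = /(ab)\1/.test("abab");
const mismatched = /(ab)\1/.test("abAB");

// With the i flag, both sides are compared after Canonicalize.
const caseInsensitive = /(ab)\1/i.test("abAB");

console.log(undefinedCapture[0], undefinedCapture[1]);
console.log(repeated, mismatched, caseInsensitive);
```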
22.2.2.7.3 Canonicalize ( rer, ch )
The abstract operation Canonicalize takes arguments rer (a
- If HasEitherUnicodeFlag(rer) is true and rer.[[IgnoreCase]] is true, then
  - If the file CaseFolding.txt of the Unicode Character Database provides a simple or common case folding mapping for ch, return the result of applying that mapping to ch.
  - Return ch.
- If rer.[[IgnoreCase]] is false, return ch.
- Assert: ch is a UTF-16 code unit.
- Let cp be the code point whose numeric value is the numeric value of ch.
- Let u be the result of toUppercase(« cp »), according to the Unicode Default Case Conversion algorithm.
- Let uStr be CodePointsToString(u).
- If the length of uStr ≠ 1, return ch.
- Let cu be uStr's single code unit element.
- If the numeric value of ch ≥ 128 and the numeric value of cu < 128, return ch.
- Return cu.
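The non-Unicode branch of the steps above can be approximated in script. The following is a minimal sketch, assuming the i flag is set and neither u nor v is; `canonicalizeLegacy` is a hypothetical helper name, and it relies on `String.prototype.toUpperCase` for the uppercase mapping.

```javascript
// Hypothetical helper: mirrors the non-Unicode, ignore-case branch of
// Canonicalize for a single UTF-16 code unit `ch`.
function canonicalizeLegacy(ch) {
  const upper = ch.toUpperCase();
  // If uppercasing yields more than one code unit, keep ch (e.g. "ß" -> "SS").
  if (upper.length !== 1) return ch;
  // Never map a non-ASCII code unit onto an ASCII one (e.g. "ſ" -> "S").
  if (ch.charCodeAt(0) >= 128 && upper.charCodeAt(0) < 128) return ch;
  return upper;
}

console.log(canonicalizeLegacy("a"));       // "A"
console.log(canonicalizeLegacy("\u00DF"));  // unchanged: "ß"
console.log(canonicalizeLegacy("\u017F"));  // unchanged: "ſ"
```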
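In case-insignificant matches when HasEitherUnicodeFlag(rer) is true, all characters are implicitly case-folded using the simple mapping provided by the Unicode Standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, ß (U+00DF LATIN SMALL LETTER SHARP S) to ss or SS. It may however map code points outside the Basic Latin block to code points within it; for example, ſ (U+017F LATIN SMALL LETTER LONG S) case-folds to s (U+0073 LATIN SMALL LETTER S) and K (U+212A KELVIN SIGN) case-folds to k (U+006B LATIN SMALL LETTER K). Strings containing those code points are matched by regular expressions such as /[a-z]/ui.
In case-insignificant matches when HasEitherUnicodeFlag(rer) is false, the mapping is based on the Unicode Default Case Conversion algorithm toUppercase rather than toCasefold, which results in some subtle differences. For example, Ω (U+2126 OHM SIGN) is mapped by toUppercase to itself but by toCasefold to ω (U+03C9 GREEK SMALL LETTER OMEGA) along with Ω (U+03A9 GREEK CAPITAL LETTER OMEGA), so "\u2126" is matched by /[ω]/ui and /[\u03A9]/ui but not by /[ω]/i or /[\u03A9]/i. Also, no code point outside the Basic Latin block is mapped to a code point within it, so strings such as "\u017F" (ſ) and "\u212A" (K) are not matched by /[a-z]/i.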
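The difference between the two canonicalization modes can be observed directly. The literals below are a small sketch using the code points named in the notes above; the property names of `folding` are illustrative.

```javascript
const folding = {
  // With u, simple case folding maps U+017F and U+212A into Basic Latin.
  longSUnicode: /[a-z]/ui.test("\u017F"),
  kelvinUnicode: /[a-z]/ui.test("\u212A"),
  // Without u, toUppercase-based canonicalization never does that.
  longSLegacy: /[a-z]/i.test("\u017F"),
  kelvinLegacy: /[a-z]/i.test("\u212A"),
  // U+2126 OHM SIGN folds to U+03C9 only in Unicode mode.
  ohmUnicode: /[\u03C9]/ui.test("\u2126"),
  ohmLegacy: /[\u03C9]/i.test("\u2126"),
  // Simple folding is one-to-one, so ß never matches "ss".
  sharpS: /\u00DF/ui.test("ss"),
};

console.log(folding);
```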
22.2.2.8 Runtime Semantics: CompileCharacterClass
The
- Let A be CompileToCharSet of ClassContents with argument rer.
- Return the Record { [[CharSet]]: A, [[Invert]]: false }.

- Let A be CompileToCharSet of ClassContents with argument rer.
- If rer.[[UnicodeSets]] is true, then
  - Return the Record { [[CharSet]]: CharacterComplement(rer, A), [[Invert]]: false }.
- Return the Record { [[CharSet]]: A, [[Invert]]: true }.
22.2.2.9 Runtime Semantics: CompileToCharSet
The
This section is amended in
It is defined piecewise over the following productions:
- Return the empty CharSet.

- Let A be CompileToCharSet of ClassAtom with argument rer.
- Let B be CompileToCharSet of NonemptyClassRangesNoDash with argument rer.
- Return the union of CharSets A and B.

- Let A be CompileToCharSet of the first ClassAtom with argument rer.
- Let B be CompileToCharSet of the second ClassAtom with argument rer.
- Let C be CompileToCharSet of ClassContents with argument rer.
- Let D be CharacterRange(A, B).
- Return the union of D and C.

- Let A be CompileToCharSet of ClassAtomNoDash with argument rer.
- Let B be CompileToCharSet of NonemptyClassRangesNoDash with argument rer.
- Return the union of CharSets A and B.

- Let A be CompileToCharSet of ClassAtomNoDash with argument rer.
- Let B be CompileToCharSet of ClassAtom with argument rer.
- Let C be CompileToCharSet of ClassContents with argument rer.
- Let D be CharacterRange(A, B).
- Return the union of D and C.
Even if the pattern ignores case, the case of the two ends of a range is significant in determining which characters belong to the range. Thus, for example, the pattern /[E-F]/i matches only the letters E, F, e, and f, while the pattern /[E-f]/i matches all uppercase and lowercase letters in the Unicode Basic Latin block as well as the symbols [, \, ], ^, _, and `.
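The range example from the note can be checked with anchored literals; the names `narrow` and `wide` are illustrative only.

```javascript
const narrow = /^[E-F]$/i; // range U+0045..U+0046
const wide = /^[E-f]$/i;   // range U+0045..U+0066, which spans [ \ ] ^ _ `

console.log(narrow.test("e"), narrow.test("g"), narrow.test("^")); // true false false
console.log(wide.test("z"), wide.test("a"), wide.test("^"));       // true true true
console.log(wide.test("@"));                                       // false (U+0040 lies below the range)
```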
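A - character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of ClassContents, the beginning or end limit of a range specification, or immediately follows a range specification.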
- Return the CharSet containing the single character - U+002D (HYPHEN-MINUS).

- Return the CharSet containing the character matched by SourceCharacter.

- Let cv be the CharacterValue of this ClassEscape.
- Let c be the character whose character value is cv.
- Return the CharSet containing the single character c.
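A ClassAtom can use any of the escape sequences that are allowed in the rest of the regular expression except for \b, \B, and backreferences. Inside a CharacterClass, \b means the backspace character, while \B and backreferences raise errors. Using a backreference inside a ClassAtom causes an error.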
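A short sketch of these rules follows. It uses the u flag for the error cases, on the assumption that the main-body grammar (rather than any web-compatibility extension) is what should be observed; `isSyntaxError` is a hypothetical helper.

```javascript
// Hypothetical helper: true when constructing the pattern throws a SyntaxError.
function isSyntaxError(source, flags) {
  try { new RegExp(source, flags); return false; }
  catch (e) { return e instanceof SyntaxError; }
}

// Inside a class, \b denotes U+0008 (BACKSPACE).
const backspaceInClass = /[\b]/.test("\u0008");
// Outside a class, \b is a word-boundary assertion and consumes nothing.
const boundaryOutside = /^\b$/.test("\u0008");

// \B and backreferences are rejected inside a class in Unicode mode.
const upperBRejected = isSyntaxError("[\\B]", "u");
const backrefRejected = isSyntaxError("(a)[\\1]", "u");

console.log(backspaceInClass, boundaryOutside, upperBRejected, backrefRejected);
```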
- Return the ten-element CharSet containing the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.

- Let S be the CharSet returned by CharacterClassEscape :: d.
- Return CharacterComplement(rer, S).

- Return the CharSet containing all characters corresponding to a code point on the right-hand side of the WhiteSpace or LineTerminator productions.

- Let S be the CharSet returned by CharacterClassEscape :: s.
- Return CharacterComplement(rer, S).

- Return MaybeSimpleCaseFolding(rer, WordCharacters(rer)).

- Let S be the CharSet returned by CharacterClassEscape :: w.
- Return CharacterComplement(rer, S).

- Return CompileToCharSet of UnicodePropertyValueExpression with argument rer.

- Let S be CompileToCharSet of UnicodePropertyValueExpression with argument rer.
- Assert: S contains only single code points.
- Return CharacterComplement(rer, S).
- Let ps be the source text matched by UnicodePropertyName.
- Let p be UnicodeMatchProperty(rer, ps).
- Assert: p is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 67.
- Let vs be the source text matched by UnicodePropertyValue.
- Let v be UnicodeMatchPropertyValue(p, vs).
- Let A be the CharSet containing all Unicode code points whose character database definition includes the property p with value v.
- Return MaybeSimpleCaseFolding(rer, A).

- Let s be the source text matched by LoneUnicodePropertyNameOrValue.
- If UnicodeMatchPropertyValue(General_Category, s) is a Unicode property value or property value alias for the General_Category (gc) property listed in PropertyValueAliases.txt, then
  - Return the CharSet containing all Unicode code points whose character database definition includes the property “General_Category” with value s.
- Let p be UnicodeMatchProperty(rer, s).
- Assert: p is a binary Unicode property or binary property alias listed in the “Property name and aliases” column of Table 68, or a binary Unicode property of strings listed in the “Property name” column of Table 69.
- Let A be the CharSet containing all CharSetElements whose character database definition includes the property p with value “True”.
- Return MaybeSimpleCaseFolding(rer, A).
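The two forms of property escape handled above (name=value, and a lone name or value) can be exercised as follows. This is a sketch with illustrative variable names; it assumes an engine whose Unicode data includes the listed properties.

```javascript
const greek = /^\p{Script=Greek}$/u;          // UnicodePropertyName = UnicodePropertyValue
const upper = /^\p{Lu}$/u;                    // lone General_Category value
const notUpper = /^\P{Lu}$/u;                 // complement of the same set
const hexDigit = /^\p{ASCII_Hex_Digit}$/u;    // lone binary property

console.log(greek.test("\u03B1"), greek.test("a"));   // true false
console.log(upper.test("A"), notUpper.test("A"));     // true false
console.log(hexDigit.test("f"), hexDigit.test("g"));  // true false
```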
- Let A be CompileToCharSet of ClassSetRange with argument rer.
- If ClassUnion is present, then
  - Let B be CompileToCharSet of ClassUnion with argument rer.
  - Return the union of CharSets A and B.
- Return A.

- Let A be CompileToCharSet of ClassSetOperand with argument rer.
- If ClassUnion is present, then
  - Let B be CompileToCharSet of ClassUnion with argument rer.
  - Return the union of CharSets A and B.
- Return A.

- Let A be CompileToCharSet of the first ClassSetOperand with argument rer.
- Let B be CompileToCharSet of the second ClassSetOperand with argument rer.
- Return the intersection of CharSets A and B.

- Let A be CompileToCharSet of the ClassIntersection with argument rer.
- Let B be CompileToCharSet of the ClassSetOperand with argument rer.
- Return the intersection of CharSets A and B.

- Let A be CompileToCharSet of the first ClassSetOperand with argument rer.
- Let B be CompileToCharSet of the second ClassSetOperand with argument rer.
- Return the CharSet containing the CharSetElements of A which are not also CharSetElements of B.

- Let A be CompileToCharSet of the ClassSubtraction with argument rer.
- Let B be CompileToCharSet of the ClassSetOperand with argument rer.
- Return the CharSet containing the CharSetElements of A which are not also CharSetElements of B.

- Let A be CompileToCharSet of the first ClassSetCharacter with argument rer.
- Let B be CompileToCharSet of the second ClassSetCharacter with argument rer.
- Return MaybeSimpleCaseFolding(rer, CharacterRange(A, B)).
The result will often consist of two or more ranges. When UnicodeSets is
- Let A be CompileToCharSet of ClassSetCharacter with argument rer.
- Return MaybeSimpleCaseFolding(rer, A).

- Let A be CompileToCharSet of ClassStringDisjunction with argument rer.
- Return MaybeSimpleCaseFolding(rer, A).

- Return CompileToCharSet of NestedClass with argument rer.

- Return CompileToCharSet of ClassContents with argument rer.

- Let A be CompileToCharSet of ClassContents with argument rer.
- Return CharacterComplement(rer, A).

- Return CompileToCharSet of CharacterClassEscape with argument rer.

- Return CompileToCharSet of ClassStringDisjunctionContents with argument rer.

- Let s be CompileClassSetString of ClassString with argument rer.
- Return the CharSet containing the one string s.

- Let s be CompileClassSetString of ClassString with argument rer.
- Let A be the CharSet containing the one string s.
- Let B be CompileToCharSet of ClassStringDisjunctionContents with argument rer.
- Return the union of CharSets A and B.

- Let cv be the CharacterValue of this ClassSetCharacter.
- Let c be the character whose character value is cv.
- Return the CharSet containing the single character c.

- Return the CharSet containing the single character U+0008 (BACKSPACE).
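The set operations compiled above (subtraction, intersection, and string disjunctions) are only reachable with the v flag. The sketch below builds the patterns through the constructor so that an engine without v support yields null instead of failing to parse; `tryRegExp` is a hypothetical helper.

```javascript
// Hypothetical helper: returns null when the engine rejects the pattern or flag.
function tryRegExp(source, flags) {
  try { return new RegExp(source, flags); } catch { return null; }
}

const consonants = tryRegExp("^[[a-z]--[aeiou]]$", "v");        // ClassSubtraction
const asciiDigits = tryRegExp("^[\\p{Nd}&&\\p{ASCII}]$", "v");  // ClassIntersection
const strings = tryRegExp("^[\\q{abc|d}]$", "v");               // ClassStringDisjunction

if (consonants && asciiDigits && strings) {
  console.log(consonants.test("b"), consonants.test("a"));            // true false
  console.log(asciiDigits.test("5"), asciiDigits.test("\u0663"));     // true false
  console.log(strings.test("abc"), strings.test("d"), strings.test("ab")); // true true false
}
```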
22.2.2.9.1 CharacterRange ( A, B )
The abstract operation CharacterRange takes arguments A (a
- Assert: A and B each contain exactly one character.
- Let a be the one character in CharSet A.
- Let b be the one character in CharSet B.
- Let i be the character value of character a.
- Let j be the character value of character b.
- Assert: i ≤ j.
- Return the CharSet containing all characters with a character value in the inclusive interval from i to j.
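The steps above can be sketched as a small function over single-character strings. `characterRange` and its use of a `Set` of strings are assumptions made for illustration; the specification's CharSet is a mathematical set, and the ordering check is an assertion there rather than a thrown error.

```javascript
// Hypothetical helper: builds the inclusive range between two single characters.
function characterRange(a, b) {
  const i = a.codePointAt(0);
  const j = b.codePointAt(0);
  if (i > j) throw new RangeError("range out of order");
  const chars = new Set();
  for (let cv = i; cv <= j; cv++) chars.add(String.fromCodePoint(cv));
  return chars;
}

const eToLowerF = characterRange("E", "f");
console.log(eToLowerF.size);      // 34 characters, U+0045 through U+0066
console.log(eToLowerF.has("^"));  // true
console.log(eToLowerF.has("z"));  // false: "z" only matches /[E-f]/i via Canonicalize
```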
22.2.2.9.2 HasEitherUnicodeFlag ( rer )
The abstract operation HasEitherUnicodeFlag takes argument rer (a
- If rer.[[Unicode]] is true or rer.[[UnicodeSets]] is true, then
  - Return true.
- Return false.
22.2.2.9.3 WordCharacters ( rer )
The abstract operation WordCharacters takes argument rer (a RegExp Record) and returns a CharSet. Returns a CharSet containing the characters considered "word characters" for the purposes of \b, \B, \w, and \W. It performs the following steps when called:
- Let basicWordChars be the CharSet containing every character in the ASCII word characters.
- Let extraWordChars be the CharSet containing all characters c such that c is not in basicWordChars but Canonicalize(rer, c) is in basicWordChars.
- Assert: extraWordChars is empty unless HasEitherUnicodeFlag(rer) is true and rer.[[IgnoreCase]] is true.
- Return the union of basicWordChars and extraWordChars.
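The extraWordChars step is observable only when both a Unicode flag and the i flag are present. A brief sketch, using the two code points that simple case folding maps into the ASCII word characters (variable names are illustrative):

```javascript
// U+017F (LATIN SMALL LETTER LONG S) and U+212A (KELVIN SIGN) fold to "s" and "k".
const longSWithIU = /\w/iu.test("\u017F");
const kelvinWithIU = /\w/iu.test("\u212A");

// Without i, or without u, extraWordChars is empty.
const longSWithU = /\w/u.test("\u017F");
const longSWithI = /\w/i.test("\u017F");

console.log(longSWithIU, kelvinWithIU, longSWithU, longSWithI); // true true false false
```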
22.2.2.9.4 AllCharacters ( rer )
The abstract operation AllCharacters takes argument rer (a
- If rer.[[UnicodeSets]] is true and rer.[[IgnoreCase]] is true, then
  - Return the CharSet containing all Unicode code points c that do not have a Simple Case Folding mapping (that is, scf(c)=c).
- Else if HasEitherUnicodeFlag(rer) is true, then
  - Return the CharSet containing all code point values.
- Else,
  - Return the CharSet containing all code unit values.
22.2.2.9.5 MaybeSimpleCaseFolding ( rer, A )
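The abstract operation MaybeSimpleCaseFolding takes arguments rer (a RegExp Record) and A (a CharSet) and returns a CharSet. If rer.[[UnicodeSets]] is false or rer.[[IgnoreCase]] is false, it returns A. Otherwise, it uses the Simple Case Folding (scf(cp)) definitions in the file CaseFolding.txt of the Unicode Character Database (each of which maps a single code point to another single code point) to map each CharSetElement of A character-by-character to a canonical form and returns the resulting CharSet. It performs the following steps when called:
- If rer.[[UnicodeSets]] is false or rer.[[IgnoreCase]] is false, return A.
- Let B be a new empty CharSet.
- For each CharSetElement s of A, do
  - Let t be an empty sequence of characters.
  - For each single code point cp in s, do
    - Append scf(cp) to t.
  - Add t to B.
- Return B.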
22.2.2.9.6 CharacterComplement ( rer, S )
The abstract operation CharacterComplement takes arguments rer (a
- Let A be AllCharacters(rer).
- Return the CharSet containing the CharSetElements of A which are not also CharSetElements of S.
22.2.2.9.7 UnicodeMatchProperty ( rer, p )
The abstract operation UnicodeMatchProperty takes arguments rer (a
- If rer.[[UnicodeSets]] is true and p is a Unicode property name listed in the “Property name” column of Table 69, then
  - Return the List of Unicode code points p.
- Assert: p is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 67 or Table 68.
- Let c be the canonical property name of p as given in the “Canonical property name” column of the corresponding row.
- Return the List of Unicode code points c.
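Implementations must support the Unicode property names and aliases listed in Table 67, Table 68, and Table 69. To ensure interoperability, implementations must not support any other property names or aliases.
For example, Script_Extensions (property name) and scx (property alias) are valid, but script_extensions or Scx aren't.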
The listed properties form a superset of what UTS18 RL1.2 requires.
The spellings of entries in these tables (including casing) match the spellings used in the file PropertyAliases.txt in the Unicode Character Database. The precise spellings in that file are guaranteed to be stable.
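The case-sensitivity of property names can be checked by attempting to compile each spelling. `compilesWithU` is a hypothetical helper introduced for this sketch.

```javascript
// Hypothetical helper: true when the pattern compiles in Unicode mode.
function compilesWithU(source) {
  try { new RegExp(source, "u"); return true; }
  catch (e) { if (e instanceof SyntaxError) return false; throw e; }
}

console.log(compilesWithU("\\p{Script_Extensions=Greek}")); // true  (property name)
console.log(compilesWithU("\\p{scx=Greek}"));               // true  (property alias)
console.log(compilesWithU("\\p{script_extensions=Greek}")); // false (wrong casing)
console.log(compilesWithU("\\p{Scx=Greek}"));               // false (wrong casing)
```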
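Table 67: Non-binary Unicode property aliases and their canonical property names
Property name and aliases | Canonical property name |
---|---|
General_Category, gc | General_Category |
Script, sc | Script |
Script_Extensions, scx | Script_Extensions |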
Table 68: Binary Unicode property aliases and their canonical property names
Property name and aliases | Canonical property name |
---|---|
ASCII | ASCII |
ASCII_Hex_Digit, AHex | ASCII_Hex_Digit |
Alphabetic, Alpha | Alphabetic |
Any | Any |
Assigned | Assigned |
Bidi_Control, Bidi_C | Bidi_Control |
Bidi_Mirrored, Bidi_M | Bidi_Mirrored |
Case_Ignorable, CI | Case_Ignorable |
Cased | Cased |
Changes_When_Casefolded, CWCF | Changes_When_Casefolded |
Changes_When_Casemapped, CWCM | Changes_When_Casemapped |
Changes_When_Lowercased, CWL | Changes_When_Lowercased |
Changes_When_NFKC_Casefolded, CWKCF | Changes_When_NFKC_Casefolded |
Changes_When_Titlecased, CWT | Changes_When_Titlecased |
Changes_When_Uppercased, CWU | Changes_When_Uppercased |
Dash | Dash |
Default_Ignorable_Code_Point, DI | Default_Ignorable_Code_Point |
Deprecated, Dep | Deprecated |
Diacritic, Dia | Diacritic |
Emoji | Emoji |
Emoji_Component, EComp | Emoji_Component |
Emoji_Modifier, EMod | Emoji_Modifier |
Emoji_Modifier_Base, EBase | Emoji_Modifier_Base |
Emoji_Presentation, EPres | Emoji_Presentation |
Extended_Pictographic, ExtPict | Extended_Pictographic |
Extender, Ext | Extender |
Grapheme_Base, Gr_Base | Grapheme_Base |
Grapheme_Extend, Gr_Ext | Grapheme_Extend |
Hex_Digit, Hex | Hex_Digit |
IDS_Binary_Operator, IDSB | IDS_Binary_Operator |
IDS_Trinary_Operator, IDST | IDS_Trinary_Operator |
ID_Continue, IDC | ID_Continue |
ID_Start, IDS | ID_Start |
Ideographic, Ideo | Ideographic |
Join_Control, Join_C | Join_Control |
Logical_Order_Exception, LOE | Logical_Order_Exception |
Lowercase, Lower | Lowercase |
Math | Math |
Noncharacter_Code_Point, NChar | Noncharacter_Code_Point |
Pattern_Syntax, Pat_Syn | Pattern_Syntax |
Pattern_White_Space, Pat_WS | Pattern_White_Space |
Quotation_Mark, QMark | Quotation_Mark |
Radical | Radical |
Regional_Indicator, RI | Regional_Indicator |
Sentence_Terminal, STerm | Sentence_Terminal |
Soft_Dotted, SD | Soft_Dotted |
Terminal_Punctuation, Term | Terminal_Punctuation |
Unified_Ideograph, UIdeo | Unified_Ideograph |
Uppercase, Upper | Uppercase |
Variation_Selector, VS | Variation_Selector |
White_Space, space | White_Space |
XID_Continue, XIDC | XID_Continue |
XID_Start, XIDS | XID_Start |
Table 69: Binary Unicode properties of strings
Property name |
---|
Basic_Emoji |
Emoji_Keycap_Sequence |
RGI_Emoji_Modifier_Sequence |
RGI_Emoji_Flag_Sequence |
RGI_Emoji_Tag_Sequence |
RGI_Emoji_ZWJ_Sequence |
RGI_Emoji |
22.2.2.9.8 UnicodeMatchPropertyValue ( p, v )
The abstract operation UnicodeMatchPropertyValue takes arguments p (
- Assert: p is a canonical, unaliased Unicode property name listed in the “Canonical property name” column of Table 67.
- Assert: v is a property value or property value alias for the Unicode property p listed in PropertyValueAliases.txt.
- Let value be the canonical property value of v as given in the “Canonical property value” column of the corresponding row.
- Return the List of Unicode code points value.
Implementations must support the Unicode property values and property value aliases listed in PropertyValueAliases.txt for the properties listed in Table 67. To ensure interoperability, implementations must not support any other property values or property value aliases.
For example, Xpeo and Old_Persian are valid Script_Extensions values, but xpeo and Old Persian aren't.
This algorithm differs from the matching rules for symbolic values listed in UAX44: case, white space, U+002D (HYPHEN-MINUS), and U+005F (LOW LINE) are not ignored, and the Is prefix is not supported.
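The strict matching of property values described above can be probed in the same way as property names. `isValidPropertyEscape` is a hypothetical helper, and the results assume an engine whose Unicode data includes the Old_Persian script.

```javascript
// Hypothetical helper: true when \p{...} with the given body compiles in Unicode mode.
function isValidPropertyEscape(body) {
  try { new RegExp("\\p{" + body + "}", "u"); return true; }
  catch (e) { if (e instanceof SyntaxError) return false; throw e; }
}

console.log(isValidPropertyEscape("scx=Xpeo"));        // true
console.log(isValidPropertyEscape("scx=Old_Persian")); // true
console.log(isValidPropertyEscape("scx=xpeo"));        // false: case is significant
console.log(isValidPropertyEscape("scx=Old Persian")); // false: white space is not ignored
console.log(isValidPropertyEscape("Script=IsGreek"));  // false: no Is prefix
```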
22.2.2.10 Runtime Semantics: CompileClassSetString
The
- Return an empty sequence of characters.

- Return CompileClassSetString of NonEmptyClassString with argument rer.

- Let cs be CompileToCharSet of ClassSetCharacter with argument rer.
- Let s1 be the sequence of characters that is the single CharSetElement of cs.
- If NonEmptyClassString is present, then
  - Let s2 be CompileClassSetString of NonEmptyClassString with argument rer.
  - Return the concatenation of s1 and s2.
- Return s1.
22.2.3 Abstract Operations for RegExp Creation
22.2.3.1 RegExpCreate ( P, F )
The abstract operation RegExpCreate takes arguments P (an
- Let obj be ! RegExpAlloc(%RegExp%).
- Return ? RegExpInitialize(obj, P, F).
22.2.3.2 RegExpAlloc ( newTarget )
The abstract operation RegExpAlloc takes argument newTarget (a
- Let obj be ? OrdinaryCreateFromConstructor(newTarget, "%RegExp.prototype%", « [[OriginalSource]], [[OriginalFlags]], [[RegExpRecord]], [[RegExpMatcher]] »).
- Perform ! DefinePropertyOrThrow(obj, "lastIndex", PropertyDescriptor { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false }).
- Return obj.
22.2.3.3 RegExpInitialize ( obj, pattern, flags )
The abstract operation RegExpInitialize takes arguments obj (an Object), pattern (an
- If pattern is undefined, let P be the empty String.
- Else, let P be ? ToString(pattern).
- If flags is undefined, let F be the empty String.
- Else, let F be ? ToString(flags).
- If F contains any code unit other than "d", "g", "i", "m", "s", "u", "v", or "y", or if F contains any code unit more than once, throw a SyntaxError exception.
- If F contains "i", let i be true; else let i be false.
- If F contains "m", let m be true; else let m be false.
- If F contains "s", let s be true; else let s be false.
- If F contains "u", let u be true; else let u be false.
- If F contains "v", let v be true; else let v be false.
- If u is true or v is true, then
  - Let patternText be StringToCodePoints(P).
- Else,
  - Let patternText be the result of interpreting each of P's 16-bit elements as a Unicode BMP code point. UTF-16 decoding is not applied to the elements.
- Let parseResult be ParsePattern(patternText, u, v).
- If parseResult is a non-empty List of SyntaxError objects, throw a SyntaxError exception.
- Assert: parseResult is a Pattern Parse Node.
- Set obj.[[OriginalSource]] to P.
- Set obj.[[OriginalFlags]] to F.
- Let capturingGroupsCount be CountLeftCapturingParensWithin(parseResult).
- Let rer be the RegExp Record { [[IgnoreCase]]: i, [[Multiline]]: m, [[DotAll]]: s, [[Unicode]]: u, [[UnicodeSets]]: v, [[CapturingGroupsCount]]: capturingGroupsCount }.
- Set obj.[[RegExpRecord]] to rer.
- Set obj.[[RegExpMatcher]] to CompilePattern of parseResult with argument rer.
- Perform ? Set(obj, "lastIndex", +0𝔽, true).
- Return obj.
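Several of the initialization steps above are visible through the constructor. The sketch below exercises flag validation, the undefined defaults, and the ToString conversion; `throwsSyntaxError` is a hypothetical helper.

```javascript
// Hypothetical helper: reports whether a thunk throws a SyntaxError.
function throwsSyntaxError(thunk) {
  try { thunk(); return false; }
  catch (e) { return e instanceof SyntaxError; }
}

const duplicateFlag = throwsSyntaxError(() => new RegExp("a", "gg"));
const unknownFlag = throwsSyntaxError(() => new RegExp("a", "x"));
const unicodeAndSets = throwsSyntaxError(() => new RegExp("a", "uv")); // u and v cannot be combined
const badPattern = throwsSyntaxError(() => new RegExp("("));

// undefined pattern and flags become empty Strings; lastIndex starts at 0.
const empty = new RegExp(undefined, undefined);
// Any other pattern value goes through ToString, so null becomes "null".
const fromNull = new RegExp(null);

console.log(duplicateFlag, unknownFlag, unicodeAndSets, badPattern); // true true true true
console.log(empty.flags === "", empty.lastIndex, fromNull.test("null")); // true 0 true
```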
22.2.3.4 Static Semantics: ParsePattern ( patternText, u, v )
The abstract operation ParsePattern takes arguments patternText (a sequence of Unicode code points), u (a Boolean), and v (a Boolean) and returns a
This section is amended in
It performs the following steps when called:
- If v is true and u is true, then
  - Let parseResult be a List containing one or more SyntaxError objects.
- Else if v is true, then
- Else if u is true, then
- Else,
- Return parseResult.
22.2.4 The RegExp Constructor
The RegExp constructor:
- is %RegExp%.
- is the initial value of the "RegExp" property of the global object.
- creates and initializes a new RegExp object when called as a constructor.
- when called as a function rather than as a constructor, returns either a new RegExp object, or the argument itself if the only argument is a RegExp object.
- may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified RegExp behaviour must include a super call to the RegExp constructor to create and initialize subclass instances with the necessary internal slots.
22.2.4.1 RegExp ( pattern, flags )
This function performs the following steps when called:
- Let patternIsRegExp be ? IsRegExp(pattern).
- If NewTarget is undefined, then
  - Let newTarget be the active function object.
  - If patternIsRegExp is true and flags is undefined, then
- Else,
  - Let newTarget be NewTarget.
- If pattern is an Object and pattern has a [[RegExpMatcher]] internal slot, then
  - Let P be pattern.[[OriginalSource]].
  - If flags is undefined, let F be pattern.[[OriginalFlags]].
  - Else, let F be flags.
- Else if patternIsRegExp is true, then
- Else,
  - Let P be pattern.
  - Let F be flags.
- Let O be ? RegExpAlloc(newTarget).
- Return ? RegExpInitialize(O, P, F).
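The difference between calling RegExp as a function and as a constructor can be seen with an existing RegExp object as the pattern. A short sketch, with illustrative variable names:

```javascript
const original = /a+/i;

// Called as a function with no flags: the argument itself is returned.
const sameObject = RegExp(original) === original;

// Called as a constructor: a new object copying the source and flags.
const copy = new RegExp(original);

// Explicit flags replace the original flags.
const reflagged = new RegExp(original, "g");

console.log(sameObject, copy === original);            // true false
console.log(copy.source, copy.flags, reflagged.flags); // a+ i g
```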
If pattern is supplied using a
22.2.5 Properties of the RegExp Constructor
The RegExp constructor:
- has a [[Prototype]] internal slot whose value is %Function.prototype%.
- has the following properties:
22.2.5.1 RegExp.prototype
The initial value of RegExp.prototype
is the
This property has the attributes { [[Writable]]:
22.2.5.2 get RegExp [ @@species ]
RegExp[@@species]
is an
- Return the this value.
The value of the
RegExp prototype methods normally use their
22.2.6 Properties of the RegExp Prototype Object
The RegExp prototype object:
- is %RegExp.prototype%.
- is an ordinary object.
- is not a RegExp instance and does not have a [[RegExpMatcher]] internal slot or any of the other internal slots of RegExp instance objects.
- has a [[Prototype]] internal slot whose value is %Object.prototype%.
The RegExp prototype object does not have a
22.2.6.1 RegExp.prototype.constructor
The initial value of RegExp.prototype.constructor
is
22.2.6.2 RegExp.prototype.exec ( string )
This method searches string for an occurrence of the regular expression pattern and returns an Array containing the results of the match, or
It performs the following steps when called:
- Let R be the this value.
- Perform ? RequireInternalSlot(R, [[RegExpMatcher]]).
- Let S be ? ToString(string).
- Return ? RegExpBuiltinExec(R, S).
22.2.6.3 get RegExp.prototype.dotAll
RegExp.prototype.dotAll
is an
- Let R be the this value.
- Let cu be the code unit 0x0073 (LATIN SMALL LETTER S).
- Return ? RegExpHasFlag(R, cu).
22.2.6.4 get RegExp.prototype.flags
RegExp.prototype.flags
is an
- Let R be the this value.
- If R is not an Object, throw a TypeError exception.
- Let codeUnits be a new empty List.
- Let hasIndices be ToBoolean(? Get(R, "hasIndices")).
- If hasIndices is true, append the code unit 0x0064 (LATIN SMALL LETTER D) to codeUnits.
- Let global be ToBoolean(? Get(R, "global")).
- If global is true, append the code unit 0x0067 (LATIN SMALL LETTER G) to codeUnits.
- Let ignoreCase be ToBoolean(? Get(R, "ignoreCase")).
- If ignoreCase is true, append the code unit 0x0069 (LATIN SMALL LETTER I) to codeUnits.
- Let multiline be ToBoolean(? Get(R, "multiline")).
- If multiline is true, append the code unit 0x006D (LATIN SMALL LETTER M) to codeUnits.
- Let dotAll be ToBoolean(? Get(R, "dotAll")).
- If dotAll is true, append the code unit 0x0073 (LATIN SMALL LETTER S) to codeUnits.
- Let unicode be ToBoolean(? Get(R, "unicode")).
- If unicode is true, append the code unit 0x0075 (LATIN SMALL LETTER U) to codeUnits.
- Let unicodeSets be ToBoolean(? Get(R, "unicodeSets")).
- If unicodeSets is true, append the code unit 0x0076 (LATIN SMALL LETTER V) to codeUnits.
- Let sticky be ToBoolean(? Get(R, "sticky")).
- If sticky is true, append the code unit 0x0079 (LATIN SMALL LETTER Y) to codeUnits.
- Return the String value whose code units are the elements of the List codeUnits. If codeUnits has no elements, the empty String is returned.
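Because the getter reads ordinary properties in a fixed order, its result does not depend on the order in which flags were supplied, and it also works on non-RegExp objects. A sketch (the name `flagsGetter` is illustrative):

```javascript
const flagsGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "flags").get;

// Flags are reported in the fixed order d, g, i, m, s, u, v, y.
const reordered = new RegExp("a", "yig").flags;

// Any object works; each property is converted with ToBoolean.
const fromPlainObject = flagsGetter.call({ global: true, sticky: 1, unicode: 0 });

// A non-object this value is rejected.
let rejectsPrimitive = false;
try { flagsGetter.call(1); } catch (e) { rejectsPrimitive = e instanceof TypeError; }

console.log(reordered, fromPlainObject, rejectsPrimitive); // giy gy true
```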
22.2.6.4.1 RegExpHasFlag ( R, codeUnit )
The abstract operation RegExpHasFlag takes arguments R (an
- If R is not an Object, throw a TypeError exception.
- If R does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(R, %RegExp.prototype%) is true, return undefined.
  - Otherwise, throw a TypeError exception.
- Let flags be R.[[OriginalFlags]].
- If flags contains codeUnit, return true.
- Return false.
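The three outcomes of RegExpHasFlag (a Boolean, undefined for the prototype object, or a TypeError) can be observed through any of the per-flag accessors. The sketch uses the global accessor; `globalGetter` is an illustrative name.

```javascript
const globalGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "global").get;

const onInstance = globalGetter.call(/a/g);            // has [[OriginalFlags]] containing "g"
const onOtherInstance = globalGetter.call(/a/i);       // no "g"
const onPrototype = globalGetter.call(RegExp.prototype); // special-cased: undefined

let onPlainObject = "no error";
try { globalGetter.call({}); } catch (e) { onPlainObject = e instanceof TypeError ? "TypeError" : "other"; }

console.log(onInstance, onOtherInstance, onPrototype, onPlainObject); // true false undefined TypeError
```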
22.2.6.5 get RegExp.prototype.global
RegExp.prototype.global
is an
- Let R be the this value.
- Let cu be the code unit 0x0067 (LATIN SMALL LETTER G).
- Return ? RegExpHasFlag(R, cu).
22.2.6.6 get RegExp.prototype.hasIndices
RegExp.prototype.hasIndices
is an
- Let R be the this value.
- Let cu be the code unit 0x0064 (LATIN SMALL LETTER D).
- Return ? RegExpHasFlag(R, cu).
22.2.6.7 get RegExp.prototype.ignoreCase
RegExp.prototype.ignoreCase
is an
- Let R be the this value.
- Let cu be the code unit 0x0069 (LATIN SMALL LETTER I).
- Return ? RegExpHasFlag(R, cu).
22.2.6.8 RegExp.prototype [ @@match ] ( string )
This method performs the following steps when called:
- Let rx be the this value.
- If rx is not an Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let flags be ? ToString(? Get(rx, "flags")).
- If flags does not contain "g", then
  - Return ? RegExpExec(rx, S).
- Else,
  - If flags contains "u" or flags contains "v", let fullUnicode be true. Otherwise, let fullUnicode be false.
  - Perform ? Set(rx, "lastIndex", +0𝔽, true).
  - Let A be ! ArrayCreate(0).
  - Let n be 0.
  - Repeat,
    - Let result be ? RegExpExec(rx, S).
    - If result is null, then
      - If n = 0, return null.
      - Return A.
    - Else,
      - Let matchStr be ? ToString(? Get(result, "0")).
      - Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(n)), matchStr).
      - If matchStr is the empty String, then
      - Set n to n + 1.
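The global and non-global branches of this method behave quite differently, which a few calls through String.prototype.match make visible. A sketch with illustrative variable names:

```javascript
// Non-global: same result as a single exec call, including index and captures.
const single = "abc".match(/(b)(c)/);

// Global: an array of whole-match strings, or null when nothing matches.
const all = "abcabc".match(/b/g);
const none = "abc".match(/x/g);

// A stale lastIndex is reset to 0 before a global match begins.
const stale = /b/g;
stale.lastIndex = 5;
const afterReset = "abcabc".match(stale);

// Empty matches still advance, so the loop terminates.
const empties = "aaa".match(/(?:)/g);

console.log(single.index, single[1], single[2]); // 1 b c
console.log(all, none, afterReset.length, empties.length);
```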
The value of the
The
22.2.6.9 RegExp.prototype [ @@matchAll ] ( string )
This method performs the following steps when called:
- Let R be the this value.
- If R is not an Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let C be ? SpeciesConstructor(R, %RegExp%).
- Let flags be ? ToString(? Get(R, "flags")).
- Let matcher be ? Construct(C, « R, flags »).
- Let lastIndex be ? ToLength(? Get(R, "lastIndex")).
- Perform ? Set(matcher, "lastIndex", lastIndex, true).
- If flags contains "g", let global be true.
- Else, let global be false.
- If flags contains "u" or flags contains "v", let fullUnicode be true.
- Else, let fullUnicode be false.
- Return CreateRegExpStringIterator(matcher, S, global, fullUnicode).
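Since the iterator works on a freshly constructed matcher, the original object's lastIndex is copied but never modified. A brief sketch through String.prototype.matchAll, with illustrative variable names:

```javascript
const pattern = /a/g;
pattern.lastIndex = 2;

// The clone starts at the copied lastIndex, so only indices 2 and 3 are found.
const indices = Array.from("aaaa".matchAll(pattern), (m) => m.index);

console.log(indices);            // [ 2, 3 ]
console.log(pattern.lastIndex);  // 2, the original is left untouched
```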
The value of the
22.2.6.10 get RegExp.prototype.multiline
RegExp.prototype.multiline
is an
- Let R be the this value.
- Let cu be the code unit 0x006D (LATIN SMALL LETTER M).
- Return ? RegExpHasFlag(R, cu).
22.2.6.11 RegExp.prototype [ @@replace ] ( string, replaceValue )
This method performs the following steps when called:
- Let rx be the
this value. - If rx
is not an Object , throw aTypeError exception. - Let S be ?
ToString (string). - Let lengthS be the length of S.
- Let functionalReplace be
IsCallable (replaceValue). - If functionalReplace is
false , then- Set replaceValue to ?
ToString (replaceValue).
- Set replaceValue to ?
- Let flags be ?
ToString (?Get (rx,"flags" )). - If flags contains
"g" , let global betrue . Otherwise, let global befalse . - If global is
true , then- Perform ?
Set (rx,"lastIndex" ,+0 𝔽,true ).
- Perform ?
- Let results be a new empty
List . - Let done be
false . - Repeat, while done is
false ,- Let result be ?
RegExpExec (rx, S). - If result is
null , then- Set done to
true .
- Set done to
- Else,
- Let result be ?
- Let accumulatedResult be the empty String.
- Let nextSourcePosition be 0.
- For each element result of results, do
  - Let resultLength be ? LengthOfArrayLike(result).
  - Let nCaptures be max(resultLength - 1, 0).
  - Let matched be ? ToString(? Get(result, "0")).
  - Let matchLength be the length of matched.
  - Let position be ? ToIntegerOrInfinity(? Get(result, "index")).
  - Set position to the result of clamping position between 0 and lengthS.
  - Let captures be a new empty List.
  - Let n be 1.
  - Repeat, while n ≤ nCaptures,
    - Let capN be ? Get(result, ! ToString(𝔽(n))).
    - If capN is not undefined, then
      - Set capN to ? ToString(capN).
    - Append capN to captures.
    - NOTE: When n = 1, the preceding step puts the first element into captures (at index 0). More generally, the nth capture (the characters captured by the nth set of capturing parentheses) is at captures[n - 1].
    - Set n to n + 1.
  - Let namedCaptures be ? Get(result, "groups").
  - If functionalReplace is true, then
    - Let replacerArgs be the list-concatenation of « matched », captures, and « 𝔽(position), S ».
    - If namedCaptures is not undefined, then
      - Append namedCaptures to replacerArgs.
    - Let replValue be ? Call(replaceValue, undefined, replacerArgs).
    - Let replacement be ? ToString(replValue).
  - Else,
    - If namedCaptures is not undefined, then
      - Set namedCaptures to ? ToObject(namedCaptures).
    - Let replacement be ? GetSubstitution(matched, S, position, captures, namedCaptures, replaceValue).
  - If position ≥ nextSourcePosition, then
    - NOTE: position should not normally move backwards. If it does, it is an indication of an ill-behaving RegExp subclass or use of an access triggered side-effect to change the global flag or other characteristics of rx. In such cases, the corresponding substitution is ignored.
    - Set accumulatedResult to the string-concatenation of accumulatedResult, the substring of S from nextSourcePosition to position, and replacement.
    - Set nextSourcePosition to position + matchLength.
- If nextSourcePosition ≥ lengthS, return accumulatedResult.
- Return the string-concatenation of accumulatedResult and the substring of S from nextSourcePosition.
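As an informal illustration of the steps above (not part of the normative text), the following sketch records the arguments passed to a functional replacer and then uses a named-group substitution in a replacement string. The variable names (`seen`, `viaFunction`, `viaTemplate`) and the sample input are assumptions made for the example.

```javascript
// Illustrative sketch only: observe the replacerArgs built for each match.
const seen = [];
const viaFunction = /(?<key>\w)=(\d)/g[Symbol.replace](
  "x=1, y=2",
  // Arguments arrive as: matched, each capture, position, the input string,
  // and (because the pattern has named groups) the groups object.
  (matched, key, value, position, input, groups) => {
    seen.push({ matched, position, named: groups.key });
    return `${key}:${value}`;
  }
);

// With a String replaceValue, GetSubstitution expands $<name> from namedCaptures.
const viaTemplate = /(?<key>\w)=(\d)/g[Symbol.replace]("x=1, y=2", "$<key>");

console.log(viaFunction, viaTemplate, seen);
```

The text between matches (", " here) is copied through unchanged, which corresponds to the substring of S from nextSourcePosition to position in the accumulation step.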
The value of the
22.2.6.12 RegExp.prototype [ @@search ] ( string )
This method performs the following steps when called:
- Let rx be the this value.
- If rx is not an Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let previousLastIndex be ? Get(rx, "lastIndex").
- If SameValue(previousLastIndex, +0𝔽) is false, then
  - Perform ? Set(rx, "lastIndex", +0𝔽, true).
- Let result be ? RegExpExec(rx, S).
- Let currentLastIndex be ? Get(rx, "lastIndex").
- If SameValue(currentLastIndex, previousLastIndex) is false, then
  - Perform ? Set(rx, "lastIndex", previousLastIndex, true).
- If result is null, return -1𝔽.
- Return ? Get(result, "index").
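The save-and-restore handling of "lastIndex" in these steps can be observed from script. The following is a small informal sketch; the variable names and the sample string are assumptions chosen for the example.

```javascript
// Illustrative sketch only: @@search always searches from index 0 and
// leaves lastIndex as it found it, even for a global regular expression.
const rx = /b/g;
rx.lastIndex = 5;

const found = rx[Symbol.search]("abcabc");     // index of the first "b"
const lastIndexAfter = rx.lastIndex;           // restored to the previous value
const missing = /z/[Symbol.search]("abcabc");  // no match

console.log(found, lastIndexAfter, missing);
```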
The value of the
The
22.2.6.13 get RegExp.prototype.source
RegExp.prototype.source
is an
- Let R be the this value.
- If R is not an Object, throw a TypeError exception.
- If R does not have an [[OriginalSource]] internal slot, then
  - If SameValue(R, %RegExp.prototype%) is true, return "(?:)".
  - Otherwise, throw a TypeError exception.
- Assert: R has an [[OriginalFlags]] internal slot.
- Let src be R.[[OriginalSource]].
- Let flags be R.[[OriginalFlags]].
- Return EscapeRegExpPattern(src, flags).
22.2.6.13.1 EscapeRegExpPattern ( P, F )
The abstract operation EscapeRegExpPattern takes arguments P (a String) and F (a String) and returns a String. It performs the following steps when called:
- If F contains "v", then
  - Let patternSymbol be Pattern[+UnicodeMode, +UnicodeSetsMode].
- Else if F contains "u", then
  - Let patternSymbol be Pattern[+UnicodeMode, ~UnicodeSetsMode].
- Else,
  - Let patternSymbol be Pattern[~UnicodeMode, ~UnicodeSetsMode].
- Let S be a String in the form of a patternSymbol equivalent to P interpreted as UTF-16 encoded Unicode code points (6.1.4), in which certain code points are escaped as described below. S may or may not differ from P; however, the Abstract Closure that would result from evaluating S as a patternSymbol must behave identically to the Abstract Closure given by the constructed object's [[RegExpMatcher]] internal slot. Multiple calls to this abstract operation using the same values for P and F must produce identical results.
- The code points / or any LineTerminator occurring in the pattern shall be escaped in S as necessary to ensure that the string-concatenation of "/", S, "/", and F can be parsed (in an appropriate lexical context) as a RegularExpressionLiteral that behaves identically to the constructed regular expression. For example, if P is "/", then S could be "\/" or "\u002F", among other possibilities, but not "/", because /// followed by F would be parsed as a SingleLineComment rather than a RegularExpressionLiteral. If P is the empty String, this specification can be met by letting S be "(?:)".
- Return S.
22.2.6.14 RegExp.prototype [ @@split ] ( string, limit )
This method returns an Array into which substrings of the result of converting string to a String have been stored. The substrings are determined by searching from left to right for matches of the
(For example, /a*?/[Symbol.split]("ab") evaluates to the array ["a", "b"], while /a*/[Symbol.split]("ab") evaluates to the array ["","b"].)
If string is (or converts to) the empty String, the result depends on whether the regular expression can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.
If the regular expression contains capturing parentheses, then each time separator is matched the results (including any
/<(\/)?([^<>]+)>/[Symbol.split]("A<B>bold</B>and<CODE>coded</CODE>")
evaluates to the array
["A", undefined, "B", "bold", "/", "B", "and", undefined, "CODE", "coded", "/", "CODE", ""]
If limit is not
This method performs the following steps when called:
- Let rx be the this value.
- If rx is not an Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let C be ? SpeciesConstructor(rx, %RegExp%).
- Let flags be ? ToString(? Get(rx, "flags")).
- If flags contains "u" or flags contains "v", let unicodeMatching be true.
- Else, let unicodeMatching be false.
- If flags contains "y", let newFlags be flags.
- Else, let newFlags be the string-concatenation of flags and "y".
- Let splitter be ? Construct(C, « rx, newFlags »).
- Let A be ! ArrayCreate(0).
- Let lengthA be 0.
- If limit is undefined, let lim be 2^32 - 1; else let lim be ℝ(? ToUint32(limit)).
- If lim = 0, return A.
- If S is the empty String, then
  - Let z be ? RegExpExec(splitter, S).
  - If z is not null, return A.
  - Perform ! CreateDataPropertyOrThrow(A, "0", S).
  - Return A.
- Let size be the length of S.
- Let p be 0.
- Let q be p.
- Repeat, while q < size,
  - Perform ? Set(splitter, "lastIndex", 𝔽(q), true).
  - Let z be ? RegExpExec(splitter, S).
  - If z is null, then
    - Set q to AdvanceStringIndex(S, q, unicodeMatching).
  - Else,
    - Let e be ℝ(? ToLength(? Get(splitter, "lastIndex"))).
    - Set e to min(e, size).
    - If e = p, then
      - Set q to AdvanceStringIndex(S, q, unicodeMatching).
    - Else,
      - Let T be the substring of S from p to q.
      - Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), T).
      - Set lengthA to lengthA + 1.
      - If lengthA = lim, return A.
      - Set p to e.
      - Let numberOfCaptures be ? LengthOfArrayLike(z).
      - Set numberOfCaptures to max(numberOfCaptures - 1, 0).
      - Let i be 1.
      - Repeat, while i ≤ numberOfCaptures,
        - Let nextCapture be ? Get(z, ! ToString(𝔽(i))).
        - Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), nextCapture).
        - Set i to i + 1.
        - Set lengthA to lengthA + 1.
        - If lengthA = lim, return A.
      - Set q to p.
- Let T be the substring of S from p to size.
- Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), T).
- Return A.
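The behaviours described for this method can be reproduced informally from script. The sketch below reuses the expressions quoted in the prose of this subclause and adds two assumed examples (`limited` and the empty-input cases) to show the limit and empty-String branches.

```javascript
// Illustrative sketch only: each constant exercises one branch of the steps.
const lazy = /a*?/[Symbol.split]("ab");   // only the first match at an index counts
const greedy = /a*/[Symbol.split]("ab");

// Empty input: the result depends on whether the pattern can match "".
const emptyMatchable = /(?:)/[Symbol.split]("");
const emptyUnmatchable = /x/[Symbol.split]("");

// Captures (including undefined ones) are spliced into the output.
const withCaptures = /<(\/)?([^<>]+)>/[Symbol.split](
  "A<B>bold</B>and<CODE>coded</CODE>"
);

// The limit truncates the result once lengthA reaches lim.
const limited = /,/[Symbol.split]("a,b,c", 2);

console.log(lazy, greedy, emptyMatchable, emptyUnmatchable, withCaptures, limited);
```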
The value of the
This method ignores the value of the
22.2.6.15 get RegExp.prototype.sticky
RegExp.prototype.sticky
is an
- Let R be the this value.
- Let cu be the code unit 0x0079 (LATIN SMALL LETTER Y).
- Return ? RegExpHasFlag(R, cu).
22.2.6.16 RegExp.prototype.test ( S )
This method performs the following steps when called:
- Let R be the this value.
- If R is not an Object, throw a TypeError exception.
- Let string be ? ToString(S).
- Let match be ? RegExpExec(R, string).
- If match is not null, return true; else return false.
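Because these steps delegate to RegExpExec, a global regular expression carries "lastIndex" state between calls to test. The following informal sketch shows that effect; the names and sample string are assumptions for the example.

```javascript
// Illustrative sketch only: repeated test() calls on a global regex
// advance lastIndex, and a failed match resets it to 0.
const re = /a/g;
const outcomes = [];
for (let call = 0; call < 3; call++) {
  const matched = re.test("aa");
  outcomes.push([matched, re.lastIndex]);
}

console.log(outcomes);
```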
22.2.6.17 RegExp.prototype.toString ( )
- Let R be the this value.
- If R is not an Object, throw a TypeError exception.
- Let pattern be ? ToString(? Get(R, "source")).
- Let flags be ? ToString(? Get(R, "flags")).
- Let result be the string-concatenation of "/", pattern, "/", and flags.
- Return result.
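Since the steps only read the "source" and "flags" properties, the method works on any object, not just RegExp instances. The sketch below is informal; the plain object passed as the this value is an assumption made for the example.

```javascript
// Illustrative sketch only: toString is generic over objects that expose
// "source" and "flags", and rejects non-object this values.
const fromLiteral = /a+b/gi.toString();
const fromPlainObject = RegExp.prototype.toString.call({ source: "x", flags: "y" });

let rejectedPrimitive = false;
try {
  RegExp.prototype.toString.call(1);
} catch (err) {
  rejectedPrimitive = err instanceof TypeError;
}

console.log(fromLiteral, fromPlainObject, rejectedPrimitive);
```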
The returned String has the form of a
22.2.6.18 get RegExp.prototype.unicode
RegExp.prototype.unicode
is an
- Let R be the this value.
- Let cu be the code unit 0x0075 (LATIN SMALL LETTER U).
- Return ? RegExpHasFlag(R, cu).
22.2.6.19 get RegExp.prototype.unicodeSets
RegExp.prototype.unicodeSets
is an
- Let R be the this value.
- Let cu be the code unit 0x0076 (LATIN SMALL LETTER V).
- Return ? RegExpHasFlag(R, cu).
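The flag accessors in the preceding subclauses each test for a single code unit in the flags of the regular expression. An informal sketch follows; the variable names are assumptions, and the example avoids the "v" flag so that it also runs on hosts that predate it.

```javascript
// Illustrative sketch only: each accessor reports whether its flag
// character was supplied when the regular expression was created.
const re = new RegExp("a", "uy");
const flagsSeen = { sticky: re.sticky, unicode: re.unicode, global: re.global };

// The accessors have no setter.
const stickySetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "sticky").set;

console.log(flagsSeen, stickySetter);
```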
22.2.7 Abstract Operations for RegExp Matching
22.2.7.1 RegExpExec ( R, S )
The abstract operation RegExpExec takes arguments R (an Object) and S (a String) and returns either a
- Let exec be ? Get(R, "exec").
- If IsCallable(exec) is true, then
  - Let result be ? Call(exec, R, « S »).
  - If result is not an Object and result is not null, throw a TypeError exception.
  - Return result.
- Perform ? RequireInternalSlot(R, [[RegExpMatcher]]).
- Return ? RegExpBuiltinExec(R, S).
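The lookup of a callable "exec" property can be observed through any method that calls RegExpExec, such as test. The following sketch is informal; the overriding functions are assumptions made for the example.

```javascript
// Illustrative sketch only: an own callable "exec" takes precedence over
// the built-in matcher, and its result must be an Object or null.
const re = /x/;

re.exec = () => null;                 // shadows RegExp.prototype.exec
const overridden = re.test("x");      // the built-in matcher is never consulted

re.exec = () => 42;                   // neither an Object nor null
let rejected = false;
try {
  re.test("x");
} catch (err) {
  rejected = err instanceof TypeError;
}

console.log(overridden, rejected);
```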
If a callable
22.2.7.2 RegExpBuiltinExec ( R, S )
The abstract operation RegExpBuiltinExec takes arguments R (an initialized RegExp instance) and S (a String) and returns either a
- Let length be the length of S.
- Let lastIndex be ℝ(? ToLength(? Get(R, "lastIndex"))).
- Let flags be R.[[OriginalFlags]].
- If flags contains "g", let global be true; else let global be false.
- If flags contains "y", let sticky be true; else let sticky be false.
- If flags contains "d", let hasIndices be true; else let hasIndices be false.
- If global is false and sticky is false, set lastIndex to 0.
- Let matcher be R.[[RegExpMatcher]].
- If flags contains "u" or flags contains "v", let fullUnicode be true; else let fullUnicode be false.
- Let matchSucceeded be false.
- If fullUnicode is true, let input be StringToCodePoints(S). Otherwise, let input be a List whose elements are the code units that are the elements of S.
- NOTE: Each element of input is considered to be a character.
- Repeat, while matchSucceeded is false,
  - If lastIndex > length, then
    - If global is true or sticky is true, then
      - Perform ? Set(R, "lastIndex", +0𝔽, true).
    - Return null.
  - Let inputIndex be the index into input of the character that was obtained from element lastIndex of S.
  - Let r be matcher(input, inputIndex).
  - If r is failure, then
    - If sticky is true, then
      - Perform ? Set(R, "lastIndex", +0𝔽, true).
      - Return null.
    - Set lastIndex to AdvanceStringIndex(S, lastIndex, fullUnicode).
  - Else,
    - Assert: r is a MatchState.
    - Set matchSucceeded to true.
- Let e be r.[[EndIndex]].
- If fullUnicode is true, set e to GetStringIndex(S, e).
- If global is true or sticky is true, then
  - Perform ? Set(R, "lastIndex", 𝔽(e), true).
- Let n be the number of elements in r.[[Captures]].
- Assert: n = R.[[RegExpRecord]].[[CapturingGroupsCount]].
- Assert: n < 2^32 - 1.
- Let A be ! ArrayCreate(n + 1).
- Assert: The mathematical value of A's "length" property is n + 1.
- Perform ! CreateDataPropertyOrThrow(A, "index", 𝔽(lastIndex)).
- Perform ! CreateDataPropertyOrThrow(A, "input", S).
- Let match be the Match Record { [[StartIndex]]: lastIndex, [[EndIndex]]: e }.
- Let indices be a new empty List.
- Let groupNames be a new empty List.
- Append match to indices.
- Let matchedSubstr be GetMatchString(S, match).
- Perform ! CreateDataPropertyOrThrow(A, "0", matchedSubstr).
- If R contains any GroupName, then
  - Let groups be OrdinaryObjectCreate(null).
  - Let hasGroups be true.
- Else,
  - Let groups be undefined.
  - Let hasGroups be false.
- Perform ! CreateDataPropertyOrThrow(A, "groups", groups).
- For each integer i such that 1 ≤ i ≤ n, in ascending order, do
  - Let captureI be the ith element of r.[[Captures]].
  - If captureI is undefined, then
    - Let capturedValue be undefined.
    - Append undefined to indices.
  - Else,
    - Let captureStart be captureI.[[StartIndex]].
    - Let captureEnd be captureI.[[EndIndex]].
    - If fullUnicode is true, then
      - Set captureStart to GetStringIndex(S, captureStart).
      - Set captureEnd to GetStringIndex(S, captureEnd).
    - Let capture be the Match Record { [[StartIndex]]: captureStart, [[EndIndex]]: captureEnd }.
    - Let capturedValue be GetMatchString(S, capture).
    - Append capture to indices.
  - Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)), capturedValue).
  - If the ith capture of R was defined with a GroupName, then
    - Let s be the CapturingGroupName of that GroupName.
    - Perform ! CreateDataPropertyOrThrow(groups, s, capturedValue).
    - Append s to groupNames.
  - Else,
    - Append undefined to groupNames.
- If hasIndices is true, then
  - Let indicesArray be MakeMatchIndicesIndexPairArray(S, indices, groupNames, hasGroups).
  - Perform ! CreateDataPropertyOrThrow(A, "indices", indicesArray).
- Return A.
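The shape of the array built by these steps, and the "lastIndex" bookkeeping for the "g" and "y" flags, can be inspected from script. The sketch below is informal and assumes a host that supports the "d" flag; the pattern, input, and variable names are assumptions for the example.

```javascript
// Illustrative sketch only: inspect the result array and lastIndex updates.
const re = /(?<year>\d{4})-(?<month>\d{2})/dg;
const input = "on 2024-05-17";

const first = re.exec(input);          // match starting at code unit 3
const lastIndexAfterFirst = re.lastIndex;
const second = re.exec(input);         // nothing left to match
const lastIndexAfterSecond = re.lastIndex;

// A sticky regex fails immediately if the match cannot start at lastIndex.
const sticky = /b/y;
const stickyMiss = sticky.exec("ab");  // "b" is not at index 0
sticky.lastIndex = 1;
const stickyHit = sticky.exec("ab");   // "b" is at index 1

console.log(first.index, first[0], first.groups.year, first.indices[0]);
console.log(lastIndexAfterFirst, second, lastIndexAfterSecond);
console.log(stickyMiss, stickyHit.index, sticky.lastIndex);
```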
22.2.7.3 AdvanceStringIndex ( S, index, unicode )
The abstract operation AdvanceStringIndex takes arguments S (a String), index (a non-negative
- Assert: index ≤ 2^53 - 1.
- If unicode is false, return index + 1.
- Let length be the length of S.
- If index + 1 ≥ length, return index + 1.
- Let cp be CodePointAt(S, index).
- Return index + cp.[[CodeUnitCount]].
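The steps above can be approximated in script as follows. This is an informal sketch, not the normative definition: `advanceStringIndex` is a hypothetical helper name, and `String.prototype.codePointAt` stands in for the CodePointAt abstract operation (a result above 0xFFFF corresponds to a [[CodeUnitCount]] of 2).

```javascript
// Illustrative sketch only: a script-level analogue of AdvanceStringIndex.
function advanceStringIndex(s, index, unicode) {
  if (!unicode) return index + 1;              // code-unit mode: always one step
  if (index + 1 >= s.length) return index + 1; // no room for a surrogate pair
  const cp = s.codePointAt(index);
  // A well-formed surrogate pair occupies two code units; anything else, one.
  return index + (cp > 0xffff ? 2 : 1);
}

const sample = "a\u{1F600}b"; // "a", a surrogate pair, "b"
console.log(advanceStringIndex(sample, 1, true));  // skips the whole pair
console.log(advanceStringIndex(sample, 1, false)); // lands inside the pair
```

One visible consequence is that splitting on an empty pattern keeps a surrogate pair intact only when Unicode matching is in effect.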
22.2.7.4 GetStringIndex ( S, codePointIndex )
The abstract operation GetStringIndex takes arguments S (a String) and codePointIndex (a non-negative
- If S is the empty String, return 0.
- Let len be the length of S.
- Let codeUnitCount be 0.
- Let codePointCount be 0.
- Repeat, while codeUnitCount < len,
  - If codePointCount = codePointIndex, return codeUnitCount.
  - Let cp be CodePointAt(S, codeUnitCount).
  - Set codeUnitCount to codeUnitCount + cp.[[CodeUnitCount]].
  - Set codePointCount to codePointCount + 1.
- Return len.
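A script-level analogue of these steps is sketched below. It is informal: `getStringIndex` is a hypothetical helper name, and `String.prototype.codePointAt` is used in place of the CodePointAt abstract operation.

```javascript
// Illustrative sketch only: convert a code point index to a code unit index.
function getStringIndex(s, codePointIndex) {
  if (s === "") return 0;
  let codeUnitCount = 0;
  let codePointCount = 0;
  while (codeUnitCount < s.length) {
    if (codePointCount === codePointIndex) return codeUnitCount;
    const cp = s.codePointAt(codeUnitCount);
    codeUnitCount += cp > 0xffff ? 2 : 1; // surrogate pairs span two code units
    codePointCount += 1;
  }
  return s.length; // indices past the end clamp to the length of the string
}

const text = "\u{1F600}a\u{1F600}"; // three code points, five code units
console.log([0, 1, 2, 3, 10].map((i) => getStringIndex(text, i)));
```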
22.2.7.5 Match Records
A Match Record is a
Match Records have the fields listed in
Field Name | Value | Meaning |
---|---|---|
[[StartIndex]] | a non-negative integer | The number of code units from the start of a string at which the match begins (inclusive). |
[[EndIndex]] | an integer ≥ [[StartIndex]] | The number of code units from the start of a string at which the match ends (exclusive). |
22.2.7.6 GetMatchString ( S, match )
The abstract operation GetMatchString takes arguments S (a String) and match (a
22.2.7.7 GetMatchIndexPair ( S, match )
The abstract operation GetMatchIndexPair takes arguments S (a String) and match (a
- Assert: match.[[StartIndex]] ≤ match.[[EndIndex]] ≤ the length of S.
- Return CreateArrayFromList(« 𝔽(match.[[StartIndex]]), 𝔽(match.[[EndIndex]]) »).
22.2.7.8 MakeMatchIndicesIndexPairArray ( S, indices, groupNames, hasGroups )
The abstract operation MakeMatchIndicesIndexPairArray takes arguments S (a String), indices (a
- Let n be the number of elements in indices.
- Assert: n < 2^32 - 1.
- Assert: groupNames has n - 1 elements.
- NOTE: The groupNames List contains elements aligned with the indices List starting at indices[1].
- Let A be ! ArrayCreate(n).
- If hasGroups is true, then
  - Let groups be OrdinaryObjectCreate(null).
- Else,
  - Let groups be undefined.
- Perform ! CreateDataPropertyOrThrow(A, "groups", groups).
- For each integer i such that 0 ≤ i < n, in ascending order, do
  - Let matchIndices be indices[i].
  - If matchIndices is not undefined, then
    - Let matchIndexPair be GetMatchIndexPair(S, matchIndices).
  - Else,
    - Let matchIndexPair be undefined.
  - Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)), matchIndexPair).
  - If i > 0 and groupNames[i - 1] is not undefined, then
    - Assert: groups is not undefined.
    - Perform ! CreateDataPropertyOrThrow(groups, groupNames[i - 1], matchIndexPair).
- Return A.
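The array produced by these steps is exposed as the "indices" property of a match result when the "d" flag is present. The informal sketch below assumes a host that supports that flag; the patterns and names are assumptions for the example.

```javascript
// Illustrative sketch only: unmatched captures yield undefined entries,
// and named groups are mirrored on indices.groups.
const withGroups = /(a)|(?<b>b)/d.exec("b");
const pairs = withGroups.indices;

// Without any GroupName in the pattern, hasGroups is false and groups is undefined.
const withoutGroups = /a/d.exec("a").indices;

console.log(pairs[0], pairs[1], pairs[2], pairs.groups.b, withoutGroups.groups);
```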
22.2.8 Properties of RegExp Instances
RegExp instances are
Prior to ECMAScript 2015, RegExp instances were specified as having the own RegExp.prototype
.
RegExp instances also have the following property:
22.2.8.1 lastIndex
The value of the
22.2.9 RegExp String Iterator Objects
A RegExp String Iterator is an object, that represents a specific iteration over some specific String instance object, matching against some specific RegExp instance object. There is not a named
22.2.9.1 CreateRegExpStringIterator ( R, S, global, fullUnicode )
The abstract operation CreateRegExpStringIterator takes arguments R (an Object), S (a String), global (a Boolean), and fullUnicode (a Boolean) and returns a Generator. It performs the following steps when called:
- Let closure be a new Abstract Closure with no parameters that captures R, S, global, and fullUnicode and performs the following steps when called:
  - Repeat,
    - Let match be ? RegExpExec(R, S).
    - If match is null, return undefined.
    - If global is false, then
      - Perform ? GeneratorYield(CreateIterResultObject(match, false)).
      - Return undefined.
    - Let matchStr be ? ToString(? Get(match, "0")).
    - If matchStr is the empty String, then
      - Let thisIndex be ℝ(? ToLength(? Get(R, "lastIndex"))).
      - Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
      - Perform ? Set(R, "lastIndex", 𝔽(nextIndex), true).
    - Perform ? GeneratorYield(CreateIterResultObject(match, false)).
- Return CreateIteratorFromClosure(closure, "%RegExpStringIteratorPrototype%", %RegExpStringIteratorPrototype%).
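The handling of empty matches in the closure can be observed through String.prototype.matchAll, which obtains an iterator of this kind. The sketch below is informal; the inputs and variable names are assumptions for the example.

```javascript
// Illustrative sketch only: after an empty match the iterator advances
// lastIndex itself, so iteration always terminates.
const plain = [..."ab".matchAll(/(?:)/g)].map((m) => m.index);

// With the "u" flag the advance steps over a whole surrogate pair.
const unicode = [..."\u{1F600}".matchAll(/(?:)/gu)].map((m) => m.index);
const codeUnits = [..."\u{1F600}".matchAll(/(?:)/g)].map((m) => m.index);

// The iterator object is tagged by %RegExpStringIteratorPrototype%.
const tag = Object.prototype.toString.call("a".matchAll(/a/g));

console.log(plain, unicode, codeUnits, tag);
```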
22.2.9.2 The %RegExpStringIteratorPrototype% Object
The %RegExpStringIteratorPrototype% object:
- has properties that are inherited by all RegExp String Iterator Objects.
- is an ordinary object.
- has a [[Prototype]] internal slot whose value is %IteratorPrototype%.
- has the following properties:
22.2.9.2.1 %RegExpStringIteratorPrototype%.next ( )
- Return ? GeneratorResume(this value, empty, "%RegExpStringIteratorPrototype%").
22.2.9.2.2 %RegExpStringIteratorPrototype% [ @@toStringTag ]
The initial value of the
This property has the attributes { [[Writable]]: