The following tables list common elements of lexemes.
| Concept | Rule | Representation | Description |
|---|---|---|---|
| Decimal digit | DG | [0-9] | One character from '0'..'9'. |
| Octal digit | OC | [0-7] | One character from '0'..'7'. |
| Hexadecimal digit | HX | [0-9a-fA-F] | Any of the characters '0'..'9' and any of the letters 'A'..'F' and 'a'..'f'. |
| Single letter | LT | [A-Za-z_$] | Any of the characters 'A'..'Z', 'a'..'z', and the underscore (_) and dollar sign ($) characters. |
| Single letter from the International Character Set | LT18N | [A-Za-z_$\200-\377] | Any of the characters 'A'..'Z', 'a'..'z', the underscore (_) and dollar sign ($) characters, and any character in the top half of the 8-bit character set. |
| Shell 'word' | WD | [^ \t;\n'"] | Any character except space, tab, semicolon (;), linefeed, less than (<), greater than (>), and quotes (' or "). |
| File name | FL | [^ \t\n\}\;\>\<] | Any character except space, tab, semicolon (;), linefeed, right brace (}), less than (<), greater than (>), and tick (`). |
| Optional exponent | Exponent | [eE][+-]?{DG}+ | Numbers often allow an optional exponent. It is represented as an 'e' or 'E' followed by an optional plus (+) or minus (-), and then one or more decimal digits. |
| Whitespace | Whitespace | [ \t]+ | Whitespace is often used to separate two lexemes that would otherwise be misconstrued as a single lexeme. For example, stop in is two keywords, but stopin is an identifier. Apart from this separating property, Whitespace is usually ignored. Whitespace is a sequence of one or more tabs or spaces. |
| String literal | stringChar | ([^"\\\n]\|([\\]({simpleEscape}\|{octalEscape}\|{hexEscape}))) | Any character except the terminating quote character ("), or a newline (\n). If the character is a backslash (\), it is followed by an escape sequence. |
| Character literal | charChar | ([^'\\\n]\|([\\]({simpleEscape}\|{octalEscape}\|{hexEscape}))) | Any character except the terminating quote (') character, or a newline (\n). If the character is a backslash (\), it is followed by an escape sequence. |
| Environment variable identifier | EID | [^ \t\n;='"&\|] | Any character except space, tab, linefeed, less-than (<), greater-than (>), semicolon (;), equal sign (=), quotes (' or "), ampersand (&), backslash (\), and bar (\|). |
| Universal character name | UCN | \\u{HX}{4}\|\\U{HX}{8} | A universal character name is a backslash (\) followed by either a lowercase 'u' and 4 hexadecimal digits, or an uppercase 'U' and 8 hexadecimal digits. |
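
As a rough illustration, a few of the rules in the table can be transcribed into Python's `re` syntax. This is only a sketch and not part of the lexer itself: the names `DG`, `HX`, `EXPONENT`, `WHITESPACE`, and `UCN` mirror the Rule column, the helper `matches` is hypothetical, and Python's regular expression dialect is assumed to agree with the lex-style notation for these particular rules.

```python
import re

# Character classes copied from the Representation column.
DG = r"[0-9]"
HX = r"[0-9a-fA-F]"

# Composite rules, built by substituting the classes the same way
# {DG} and {HX} are substituted in the table.
EXPONENT = rf"[eE][+-]?{DG}+"
WHITESPACE = r"[ \t]+"
UCN = rf"\\u{HX}{{4}}|\\U{HX}{{8}}"


def matches(rule: str, text: str) -> bool:
    """Hypothetical helper: True if the whole of `text` matches `rule`."""
    return re.fullmatch(rule, text) is not None


if __name__ == "__main__":
    print(matches(EXPONENT, "e+10"))      # True: 'e', a sign, then digits
    print(matches(EXPONENT, "e"))         # False: at least one digit is required
    print(matches(UCN, r"\u00e9"))        # True: lowercase 'u' and 4 hex digits
    print(matches(UCN, r"\u12"))          # False: too few hex digits
    # Whitespace only separates lexemes; it is not a lexeme itself.
    print(re.split(WHITESPACE, "stop in"))  # ['stop', 'in']
    print(re.split(WHITESPACE, "stopin"))   # ['stopin']
```

Splitting on `WHITESPACE` shows the separating property described in the table: `stop in` yields two candidate lexemes, while `stopin` stays a single one.
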
The escape sequence can take one of the following three forms: