A bit of concept about Regex – part 1

Tram Ho

I: Introduction

Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects: when you create a regular expression, you get a corresponding RegExp object. These patterns are used with the exec and test methods of RegExp, and with the match, replace, search, and split methods of String. This article takes a closer look at regular expressions in JavaScript.

II: Create a regular expression

You can create a regular expression in one of two ways:

A regular expression literal is used as follows:

var re = /ab+c/;

A regular expression literal is compiled when the script is loaded. If the regular expression stays constant, it does not have to be recompiled each time it is used, which gives better performance.

Call the constructor of the RegExp object, for example new RegExp("ab+c").

With the constructor, the regular expression is compiled at run time, so it does not get the performance benefit of a literal. The advantage is flexibility: use the constructor when the pattern will change, or when you do not know the pattern in advance and get it from another source, such as user input.
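
The two ways of creating a regular expression can be compared side by side. This is only a minimal sketch: the variable names and the userWord value are assumptions made for illustration, not part of the original article.

```javascript
// Literal form: compiled when the script is loaded.
const literalRe = /ab+c/;

// Constructor form: compiled at run time from a string.
const constructedRe = new RegExp("ab+c");

// Hypothetical value standing in for text typed by a user.
const userWord = "abbc";
const dynamicRe = new RegExp(userWord);

console.log(literalRe.test("xxabbbcxx"));               // true
console.log(constructedRe.test("xxabbbcxx"));           // true
console.log(literalRe.source === constructedRe.source); // true
console.log(dynamicRe.test("an abbc here"));            // true
```

Both forms produce equivalent RegExp objects here; the constructor only becomes necessary once the pattern text is not known when the script is written.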

III: How to write a regex pattern

A regex pattern is composed of simple characters, such as /abc/, or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/. The last example contains parentheses, which are used as a memory device: the part of the string matched by the pattern inside the parentheses is remembered and can be recalled later. You can see more details in the section Using parentheses.

Using simple patterns

Simple patterns are built from characters that you want to find literally. For example, the pattern /abc/ matches character combinations in strings only where the characters 'abc' occur together and in exactly that order. This pattern matches in "Hi, do you know your abc's?" and in "The latest airplane designs evolved from slabcraft.", since both strings contain the substring 'abc'. It does not match in the string "Grab crab", because that string contains 'ab c' but never the exact sequence 'abc'.
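
The three strings above can be checked with test. A small sketch, with the names simpleRe, samples, and results chosen here only for illustration:

```javascript
const simpleRe = /abc/;

const samples = [
  "Hi, do you know your abc's?",
  "The latest airplane designs evolved from slabcraft.",
  "Grab crab",
];

// test returns true when the pattern is found anywhere in the string.
const results = samples.map((text) => simpleRe.test(text));

console.log(results); // [ true, true, false ]
```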

Using special characters

When the search requires more than a direct match, such as finding one or more b's or finding white space, the pattern includes special characters. For example, the pattern /ab*c/ matches any character combination in which a single 'a' is followed by zero or more 'b' characters and then immediately by a 'c'. In the string "cbbabbbbcdebc", the pattern matches the substring 'abbbbc'.
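
The /ab*c/ example can be tried out with exec. A minimal sketch; the names starRe and found are assumptions used only here:

```javascript
const starRe = /ab*c/;

// exec returns an array describing the first match, or null.
const found = starRe.exec("cbbabbbbcdebc");

console.log(found[0]);          // "abbbbc"
console.log(found.index);       // 3
console.log(starRe.test("ac")); // true, because 'b' may appear zero times
```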

The table below describes the full range of special characters that can be used with regular expressions.

Table 4.1 Special characters in regular expressions

Character | Meaning
\ | Matches according to the following rules. A backslash before an ordinary character indicates that the next character is special and is not to be interpreted literally. For example, a 'b' without a preceding backslash matches a lowercase 'b' wherever it occurs, but '\b' does not match any character; it denotes the word boundary (see the \b entry). A backslash before a special character indicates that the next character is not special and is to be interpreted literally. For example, the pattern /a*/ relies on the special character '*' to match zero or more 'a' characters, whereas in /a\*/ the '*' is an ordinary character, so the pattern matches the substring 'a*'. Do not forget that \ is itself a special character: to use a literal backslash, escape it with another backslash (\\). The same applies when a pattern is written as a string for the RegExp constructor, because \ is also the escape character in string literals.
^ | Matches the beginning of the input. If the multiline flag is set, it also matches immediately after a line break character. For example, /^A/ does not match the 'A' in "an A", because that 'A' is not at the start of the string, but it does match the 'A' in "An E". The '^' has a different meaning when it appears as the first character in a character set; see the [^xyz] entry.
$ | Matches the end of the input. If the multiline flag is set, it also matches immediately before a line break character. For example, /t$/ does not match the 't' in "eater", but does match it in "eat".
* | Matches the preceding expression 0 or more times. Equivalent to {0,}. For example, /bo*/ matches 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", because the 'o' may appear zero times.
+ | Matches the preceding expression 1 or more times. Equivalent to {1,}. For example, /a+/ matches the 'a' in "candy" and all the consecutive a's in "caaaaaaandy".
? | Matches the preceding expression 0 or 1 time. Equivalent to {0,1}. For example, /e?le?/ matches 'el' in "angel", 'le' in "angle", and 'l' in "oslo". If used immediately after any of the quantifiers *, +, ?, or {}, it makes the quantifier non-greedy (matching as few characters as possible), as opposed to the default greedy behavior (matching as many characters as possible). For example, applying /\d+/ to "123abc" matches "123", but applying /\d+?/ to the same string matches only "1". The ? character is also used in lookahead assertions, described in the x(?=y) and x(?!y) entries of this table.
. | (The period) matches any single character except a line terminator such as the newline. For example, /.n/ matches 'an' and 'on' in "no, an apple is on the tree", but not 'no'.
(x) | Matches 'x' and remembers the match. These are called capturing parentheses. For example, in /(foo) (bar) \1 \2/ the groups (foo) and (bar) match and remember the first two words of the string "foo bar foo bar", and \1 and \2 match the last two words. Note that \1, \2, ..., \n are used inside the pattern to refer back to an earlier group, so /(foo) (bar) \1 \2/ matches the same text as /(foo) (bar) foo bar/. In the replacement string of replace, the syntax $1, $2, ..., $n is used instead. For example, 'bar foo'.replace(/(...) (...)/, '$2 $1') swaps the two words and returns 'foo bar'.
(?:x) | Matches 'x' but does not remember the match. These are called non-capturing parentheses, and they let you define subexpressions for quantifiers and other operators to work on. Consider the expression /(?:foo){1,2}/. If it were written as /foo{1,2}/, the {1,2} would apply only to the last 'o' in 'foo'. With the non-capturing parentheses, {1,2} applies to the whole word 'foo'.
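x(?=y) | Matches 'x' only if 'x' is followed by 'y'. This is called a lookahead. For example, /Jack(?=Sprat)/ matches 'Jack' only if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ matches 'Jack' only if it is followed by 'Sprat' or 'Frost'. In both cases, the text matched by the lookahead is not part of the result.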
x(?!y) | Matches 'x' only if 'x' is not followed by 'y'. This is called a negated lookahead. For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. The expression /\d+(?!\.)/.exec("3.141") returns '141' and not '3.141'.
x|y | Matches either 'x' or 'y'. For example, /green|red/ matches 'green' in "green apple" and 'red' in "red apple".
{n} | Matches exactly n occurrences of the preceding expression, where n is a positive integer. For example, /a{2}/ does not match the 'a' in "candy", but it matches both a's in "caandy" and the first two a's in "caaandy".
{n,m} | Matches at least n and at most m occurrences of the preceding expression, where n and m are positive integers and n <= m. If m is omitted, it is treated as infinity. For example, /a{1,3}/ matches nothing in "cndy", the 'a' in "candy", the two a's in "caandy", and the first three a's in "caaaaaaandy". Note that in "caaaaaaandy" the match is 'aaa', even though the string contains seven a's.
[xyz] | Character set. Matches any one of the characters in the brackets, including escape sequences. Inside a character set, special characters such as the period (.) and the asterisk (*) are not special, so they do not need to be escaped. You can specify a range of characters with a hyphen (-). For example, [a-d] is the same as [abcd], and matches the 'b' in "brisket" and the 'c' in "city". The patterns /[a-z.]+/ and /[\w.]+/ both match the entire string "test.i.ng".
[^xyz] | Negated (complemented) character set. When ^ is the first character inside the brackets, the set matches any character that is not listed. For example, [^abc] is the same as [^a-c], and matches 'r' in "brisket" and 'h' in "chop", in each case the first character that is not in the range a through c.
[\b] | Matches a backspace (U+0008). You must use square brackets to match a literal backspace character. (Not to be confused with \b.)
\b | Matches a word boundary: a position where a word character is not followed or not preceded by another word character, such as the start or end of the string next to a word character, or the position between a word character and a non-word character. The boundary itself has zero length and is not included in the match. (Not to be confused with [\b].) Examples: /oo\b/ does not match the 'oo' in "moon", because 'oo' is followed by 'n', which is a word character. /oon\b/ matches the 'oon' in "moon", because 'oon' is at the end of the string and so is not followed by a word character. /\w\b\w/ never matches anything, because a word character cannot be followed by both a boundary and another word character. Note: the JavaScript regular expression engine defines a specific set of characters as word characters. Any character outside that set is treated as a word break. The set is fairly limited: upper- and lower-case Latin letters, decimal digits, and the underscore. Accented characters such as "é" or "ü" are, unfortunately, treated as word breaks.
\B | Matches a non-word boundary: a position where the previous and next characters are of the same type, either both word characters or both non-word characters. The beginning and end of a string count as non-word characters. For example, /\B../ matches 'oo' in "noonday", and /y\B./ matches 'ye' in "possibly yesterday."
\cX | Where X is a letter from A to Z. Matches a control character in a string. For example, /\cM/ matches control-M (U+000D) in a string.
\d | Matches a digit character. Equivalent to [0-9]. For example, /\d/ or /[0-9]/ matches '2' in "B2 is the suite number."
\D | Matches any character that is not a digit. Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number."
\f | Matches a form feed (U+000C).
\n | Matches a line feed (U+000A).
\r | Matches a carriage return (U+000D).
\s | Matches a single white space character, including space, tab, form feed, and line feed. Equivalent to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. For example, /\s\w*/ matches ' bar' in "foo bar."
\S | Matches a single character other than white space. Equivalent to [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. For example, /\S\w*/ matches 'foo' in "foo bar."
\t | Matches a tab (U+0009).
\v | Matches a vertical tab (U+000B).
\w | Matches any alphanumeric character, including the underscore. Equivalent to [A-Za-z0-9_]. For example, /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."
\W | Matches any non-word character. Equivalent to [^A-Za-z0-9_]. For example, /\W/ or /[^A-Za-z0-9_]/ matches '%' in "50%."
\n | Where n is a positive integer, a back reference to the substring matched by the nth parenthesized group in the regular expression (counting left parentheses, starting with 1). For example, /apple(,)\sorange\1/ behaves like /apple(,)\sorange,/ and matches 'apple, orange,' in "apple, orange, cherry, peach."
\0 | Matches a NUL character (U+0000). Do not follow it with another digit, because \0 followed by digits is an octal escape sequence.
\xhh | Matches the character with the code hh (two hexadecimal digits).
\uhhhh | Matches the character with the code hhhh (four hexadecimal digits).
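
Several entries of the table can be checked directly in code. The following is only a sketch that replays examples from the table; all variable names are assumptions chosen for illustration.

```javascript
// Anchors: ^ matches at the start of the input, $ at the end.
const startsWithA = /^A/.test("An E");     // true
const endsWithT = /t$/.test("eater");      // false

// Greedy vs. non-greedy: ? after a quantifier makes it match as little as possible.
const greedy = "123abc".match(/\d+/)[0];   // "123"
const lazy = "123abc".match(/\d+?/)[0];    // "1"

// Lookahead and negated lookahead: the looked-ahead text is not part of the match.
const jack = /Jack(?=Sprat)/.exec("JackSprat")[0];  // "Jack"
const fraction = /\d+(?!\.)/.exec("3.141")[0];      // "141"

// Backreference: \1 repeats whatever group 1 captured (here the comma).
const fruit = /apple(,)\sorange\1/.exec("apple, orange, cherry, peach.")[0];
// fruit is "apple, orange,"

// Word boundary: 'oo' in "moon" is followed by a word character, 'oon' is not.
const ooBoundary = /oo\b/.test("moon");    // false
const oonBoundary = /oon\b/.test("moon");  // true

// Character set: the period is an ordinary character inside [...].
const dotted = "test.i.ng".match(/[a-z.]+/)[0];     // "test.i.ng"

console.log({ startsWithA, endsWithT, greedy, lazy, jack, fraction,
              fruit, ooBoundary, oonBoundary, dotted });
```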

Text entered by the user can be escaped, so that it is treated as a literal string inside a regular expression, with a simple replace call that itself uses a regular expression.
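
One possible shape for such an escaping helper is sketched below. The function name escapeRegExp, the exact set of escaped characters, and the sample input are assumptions for illustration; the article's own snippet is not reproduced here.

```javascript
// Hypothetical helper: put a backslash in front of every character
// that has a special meaning in a regular expression pattern.
function escapeRegExp(text) {
  // In the replacement string, "$&" stands for the whole matched substring.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical user input that contains special characters.
const userInput = "1+1=2 (approx.)";
const safeRe = new RegExp(escapeRegExp(userInput));

console.log(escapeRegExp("a*b"));                         // a\*b
console.log(safeRe.test("so 1+1=2 (approx.) holds"));     // true
console.log(new RegExp(escapeRegExp("a.c")).test("abc")); // false
```

Without the escaping step, the period in "a.c" would match any character and the last test would return true.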

Using parentheses

Parentheses around any part of a regular expression pattern cause that part of the matched substring to be remembered. Once remembered, the substring can be recalled for other use, as described in Using Parenthesized Substring Matches.

For example, the pattern /Chapter (\d+)\.\d*/ matches the characters 'Chapter ' followed by one or more numeric characters, then a decimal point (the backslash makes the period literal), then zero or more numeric characters. In addition, the parentheses are used to remember the first group of matched numeric characters.

This pattern is found in the string "Open Chapter 4.3, paragraph 6", and '4' is remembered. It is not found in the string "Chapter 3 and 4", because that string has no period after the '3'.

To match a substring without remembering the matched part, put ?: at the start of the parentheses. For example, (?:\d+) matches one or more numeric characters but does not remember the match.
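
The Chapter example can be run as follows. This is a minimal sketch; the names chapterRe, hit, miss, nonCapturing, and swapped are assumptions used only here.

```javascript
const chapterRe = /Chapter (\d+)\.\d*/;

// The remembered substring is available at index 1 of the result array.
const hit = chapterRe.exec("Open Chapter 4.3, paragraph 6");
console.log(hit[0]); // "Chapter 4.3"
console.log(hit[1]); // "4"

// No period follows the '3', so there is no match.
const miss = chapterRe.exec("Chapter 3 and 4");
console.log(miss); // null

// With (?:...) nothing is remembered, so the array holds only the full match.
const nonCapturing = /Chapter (?:\d+)\.\d*/.exec("Open Chapter 4.3, paragraph 6");
console.log(nonCapturing.length); // 1

// Remembered substrings can be recalled in a replacement with $1, $2, ...
const swapped = "bar foo".replace(/(...) (...)/, "$2 $1");
console.log(swapped); // "foo bar"
```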

Working with regular expressions

Regular expressions are used with the test and exec methods of RegExp, and with the match, replace, search, and split methods of String. These methods are explained in detail in the JavaScript Reference.

Table 4.2 Methods that use regular expressions

Method | Description
exec | A RegExp method that searches a string for a match. It returns an array of information about the match, or null if nothing is found.
test | A RegExp method that checks whether the pattern matches a string. It returns true or false.
match | A String method that searches a string for a match. It returns an array of information about the match, or null if nothing is found.
search | A String method that searches a string for a match. It returns the index of the match, or -1 if nothing is found.
replace | A String method that searches a string for a match and replaces the matched substring with a replacement string.
split | A String method that uses a regular expression or a fixed string to break a string into an array of substrings.

When you only want to know whether a pattern is found in a string, use the test or search method. To get more information about the match (at the cost of slower execution), use the exec or match method.
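
The difference between these methods shows up in what they return. A short sketch, with the sample strings and variable names chosen here as assumptions:

```javascript
const text = "cdbbdbsbz";
const pattern = /d(b+)d/;

// Quick yes/no and position checks.
const isFound = pattern.test(text);     // true
const position = text.search(pattern);  // 1

// More detail: the matched text plus the remembered substring.
const viaExec = pattern.exec(text);     // ["dbbd", "bb"]
const viaMatch = text.match(pattern);   // same content as viaExec

// Rewriting and splitting.
const replaced = text.replace(pattern, "[$1]");  // "c[bb]bsbz"
const parts = "a1b22c".split(/\d+/);             // ["a", "b", "c"]

console.log(isFound, position, viaExec[0], viaMatch[1], replaced, parts);
```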

The following examples use the exec method to find a match in a string.
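
A first form keeps the regular expression in a variable so that its properties stay accessible after the search. This is a sketch that reuses the pattern and string of the shorter form in this section; the name myRe is an assumption.

```javascript
// Keep the regular expression in a variable so it can be inspected later.
const myRe = /d(b+)d/g;
const myArray = myRe.exec("cdbbdbsbz");

console.log(myArray[0]);     // "dbbd"
// With the g flag, lastIndex records where the next search would start.
console.log(myRe.lastIndex); // 5
console.log(myRe.source);    // "d(b+)d"
```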

If you do not need to access the properties of the regular expression object, you can call exec directly on a literal:

var myArray = /d(b+)d/g.exec("cdbbdbsbz");

If you want to construct the regular expression from a string:
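
A sketch of that form, assuming the same pattern and string as before; the names patternText, fromString, fromStringResult, and digitsRe are hypothetical.

```javascript
// The pattern arrives as a string, for example from user input.
const patternText = "d(b+)d";
const fromString = new RegExp(patternText, "g");
const fromStringResult = fromString.exec("cdbbdbsbz");

console.log(fromStringResult[0]); // "dbbd"

// Inside a string literal, each backslash of the pattern must be doubled.
const digitsRe = new RegExp("\\d+");
console.log(digitsRe.exec("Chapter 42")[0]); // "42"
```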

With these scripts, the match succeeds and exec returns an array of results with the properties listed in the table below.
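
As a rough illustration of what that result array holds, the sketch below prints its most commonly used parts. The names execRe and execResult are assumptions.

```javascript
const execRe = /d(b+)d/g;
const execResult = execRe.exec("cdbbdbsbz");

console.log(execResult[0]);    // "dbbd", the full matched text
console.log(execResult[1]);    // "bb", the substring remembered by (b+)
console.log(execResult.index); // 1, where the match starts in the input
console.log(execResult.input); // "cdbbdbsbz", the original string
```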

Conclusion: That covers the basic Regex knowledge I wanted to share in this part. Comments and feedback are welcome.

Thanks

Source : Viblo