Learn Regex through examples

Tram Ho

Regex is short for regular expression. Regular expressions are extremely useful for extracting information from text or for checking that input follows a required format, and you can use them in most programming languages (JavaScript, Java, VB, C#, C/C++, Python, Perl, Ruby, Delphi, R, Tcl, …).

In this article we will learn regex through examples. Try it on: https://regex101.com

Basic knowledge

Anchors – ^ and $

  • ^The : matches any string that starts with ‘The’
  • end$ : matches any string that ends with ‘end’
  • ^The end$ : matches only the exact string ‘The end’
  • roar : matches any string that contains ‘roar’
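
The four anchor patterns above can be tried outside regex101 as well. Here is a minimal sketch in Python, assuming the standard re module; the sample strings are made up for illustration.

```python
import re

# re.search scans the whole string, so the anchors decide where a match may sit.
starts_with_the = re.search(r"^The", "The end") is not None     # starts with 'The'
ends_with_end = re.search(r"end$", "The end") is not None       # ends with 'end'
exact_match = re.search(r"^The end$", "The end!") is not None   # the extra '!' breaks the exact match
contains_roar = re.search(r"roar", "uproarious") is not None    # 'roar' may sit anywhere

print(starts_with_the, ends_with_end, exact_match, contains_roar)
# True True False True
```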

Quantifiers – * + ? and {}

  • abc* : matches ‘ab’ followed by zero or more letters c (e.g. ‘ab’, ‘abc’, ‘abccc’)
  • abc+ : same as above, but with at least one letter c (e.g. ‘abc’, ‘abccc’)
  • abc? : matches ‘ab’ followed by zero or one letter c (e.g. ‘ab’, ‘abc’)
  • abc{2} : matches ‘ab’ followed by exactly 2 letters c (e.g. ‘abcc’)
  • abc{2,} : same as above, but followed by two or more letters c (e.g. ‘abcc’, ‘abccccc’)
  • abc{2,5} : same as above, but followed by 2 to 5 letters c
  • a(bc)* : matches ‘a’ followed by zero or more copies of the sequence ‘bc’ (e.g. ‘a’, ‘abc’, ‘abcbcbc’)
  • a(bc){2,5} : combines the two forms above: ‘a’ followed by 2 to 5 copies of the sequence ‘bc’
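
To see how the quantifiers differ, the sketch below runs several of them against the same sample strings. It assumes Python's re module and uses re.fullmatch so that the whole string has to match; the helper name full is only for this illustration.

```python
import re

def full(pattern, text):
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ["ab", "abc", "abcc", "abccc"]

zero_or_more = [s for s in samples if full(r"abc*", s)]
one_or_more = [s for s in samples if full(r"abc+", s)]
zero_or_one = [s for s in samples if full(r"abc?", s)]
exactly_two = [s for s in samples if full(r"abc{2}", s)]
two_or_more = [s for s in samples if full(r"abc{2,}", s)]

# The quantifier after a group repeats the whole group, not just the last letter.
repeated_group = [s for s in ["a", "abc", "abcbc", "abcbcbc"]
                  if full(r"a(bc){2,5}", s)]

print(zero_or_more)    # ['ab', 'abc', 'abcc', 'abccc']
print(one_or_more)     # ['abc', 'abcc', 'abccc']
print(zero_or_one)     # ['ab', 'abc']
print(exactly_two)     # ['abcc']
print(two_or_more)     # ['abcc', 'abccc']
print(repeated_group)  # ['abcbc', 'abcbcbc']
```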

OR operator – | or []

  • a(b|c) : matches ‘a’ followed by b or c (e.g. ‘ab’, ‘ac’, or the ‘ab’ inside ‘abd’), and captures the b or c
  • a[bc] : same as above, but without capturing the b or c
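
The two OR forms match the same text; the difference is whether the chosen letter is captured. A small sketch, assuming Python's re module:

```python
import re

with_group = re.fullmatch(r"a(b|c)", "ac")
with_class = re.fullmatch(r"a[bc]", "ac")

captured = with_group.group(1)      # the alternative that matched
class_groups = with_class.groups()  # a bracket expression captures nothing

# Without anchors, the pattern also matches inside a longer string.
inside_longer = re.search(r"a(b|c)", "abd").group(0)

print(captured, class_groups, inside_longer)
# c () ab
```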

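Character classes – \d \w \s and .

  • \d => a single digit (e.g. the ‘2’ in ‘a2b’, or the ‘4’ and the ‘2’ in ‘a42c’)
  • \w => a single word character (uppercase or lowercase letter, digit, or underscore)
  • \s => a single whitespace character, including tabs and newlines
  • . => any single character (in most flavors, every character except a newline by default)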

However, be careful with . because it matches almost anything; a character class or its negation is usually more precise. The negations of \d , \w , \s are \D , \W , \S

For example, \D matches the inverse of \d , that is, any single non-digit character.

In addition, to be matched literally, the special characters ^.[$()|*+?{\ must be escaped with a backslash \ in front of them. For example, to match the character $ followed by 1 digit, we write \$\d .
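
As a quick check of the character classes and their negations, here is a minimal sketch assuming Python's re module; re.findall returns every single-character match, which makes the classes easy to compare. The sample strings are invented for the example.

```python
import re

digits = re.findall(r"\d", "a42c")         # digits only
non_digits = re.findall(r"\D", "a42c")     # everything except digits
word_chars = re.findall(r"\w", "a_1 !")    # letters, digits, underscore
whitespace = re.findall(r"\s", "a b\tc\n") # space, tab, newline
any_char = re.findall(r".", "a\nb")        # '.' skips the newline by default

print(digits)      # ['4', '2']
print(non_digits)  # ['a', 'c']
print(word_chars)  # ['a', '_', '1']
print(whitespace)  # [' ', '\t', '\n']
print(any_char)    # ['a', 'b']
```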
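
The effect of escaping is easy to see with the $ example. The sketch below assumes Python's re module; re.escape is a standard helper that escapes a literal string for you, and the sample texts are made up.

```python
import re

# Escaped: '$' is a literal dollar sign followed by one digit.
escaped = re.search(r"\$\d", "cost: $5")
price = escaped.group(0)

# Unescaped: '$' is the end-of-string anchor, so a digit can never follow it.
unescaped = re.search(r"$\d", "cost: $5")

# re.escape builds a pattern that matches a literal string exactly.
literal = "1+1=2?"
literal_matches = re.fullmatch(re.escape(literal), literal) is not None

print(price, unescaped, literal_matches)
# $5 None True
```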

Advanced knowledge

Groups – ()

  • a(bc) => the form already used above: the parentheses combine the characters b and c into one capturing group, so a must be followed by ‘bc’, and that ‘bc’ is captured
  • a(?:bc)* => similar to the one above, but ?: makes the group non-capturing (the * is what makes it optional and repeatable)
  • a(?<foo>bc) => ?<foo> gives the group a name; when one regex contains many groups, names make them easier to tell apart

Groups are useful when you need to extract information from a string: the captured values are stored as an array and can be retrieved by index or, if you used ?<foo> , by group name.
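
Here is a short sketch of the three group forms, assuming Python's re module. One caveat: Python writes a named group as (?P<foo>...) rather than the (?<foo>...) form used by JavaScript, PCRE and others, so the pattern below is adapted accordingly.

```python
import re

# Capturing group: retrieved by index (group 0 is the whole match).
capturing = re.search(r"a(bc)", "xabcx")
by_index = capturing.group(1)

# Non-capturing group: it still groups 'bc' for the '*', but stores nothing.
non_capturing = re.search(r"a(?:bc)*", "abcbc")
whole_match = non_capturing.group(0)
stored_groups = non_capturing.groups()

# Named group (Python spelling): retrieved by name.
named = re.search(r"a(?P<foo>bc)", "abc")
by_name = named.group("foo")

print(by_index, whole_match, stored_groups, by_name)
# bc abcbc () bc
```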

Bracket expressions – []

  • [abc] => a single character that is a, b, or c, like a|b|c
  • [a-c] => same as above, written as the range from a to c
  • [a-fA-F0-9] => a single hexadecimal digit, in either case
  • [0-9]% => a digit from 0 to 9 followed by %
  • [^a-zA-Z] => a single character that is not a letter from a to z or A to Z; here ^ negates the expression

Note that inside [ ] most special characters (such as . * + ? and $) lose their special meaning; in most flavors only ] , \ , ^ and - still need care there.
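
The bracket expressions above can be checked with a few calls. This is a sketch assuming Python's re module; the + after [a-fA-F0-9] is an addition for the example so that a whole hexadecimal string can be tested, and the sample strings are invented.

```python
import re

# One or more hexadecimal digits, and nothing else.
hex_ok = re.fullmatch(r"[a-fA-F0-9]+", "1aF9") is not None
hex_bad = re.fullmatch(r"[a-fA-F0-9]+", "1aG9") is not None

# A single digit followed by '%': only the last digit of '20' is included.
percents = re.findall(r"[0-9]%", "5% off, 20% tax")

# Negated bracket expression: everything that is not an ASCII letter.
non_letters = re.findall(r"[^a-zA-Z]", "a1-b")

# Inside brackets, '.' and '+' are plain characters.
literals = re.findall(r"[.+]", "a.b+c")

print(hex_ok, hex_bad)  # True False
print(percents)         # ['5%', '0%']
print(non_letters)      # ['1', '-']
print(literals)         # ['.', '+']
```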

Boundary – \b and \B

\babc\b => performs a whole-word search, which means it matches only the standalone word abc

\b is an anchor like ^ or $ : it matches a position where one side is a word character and the other side is not. It comes with a negation, \B , for example:

\Babc\B matches ‘abc’ only when it is surrounded by word characters on both sides (e.g. ‘gabcy’)
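
To compare \b and \B side by side, here is a minimal sketch assuming Python's re module (raw strings matter here, because "\b" in a normal Python string is a backspace character). The sample sentences are made up.

```python
import re

# Whole-word search: 'abcd' and 'gabcy' do not count, 'abc.' does.
whole_words = re.findall(r"\babc\b", "abc abcd gabcy abc.")

# Opposite: 'abc' must have word characters on both sides.
inner_only = re.findall(r"\Babc\B", "abc abcd gabcy")

print(len(whole_words))  # 2
print(inner_only)        # ['abc']  (the one inside 'gabcy')
```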

Lookahead and lookbehind – (?=) and (?<=)

  • d(?=r) => matches the character d only when it is immediately followed by r, but the r is not part of the match (e.g. the d in ‘drone’)
  • (?<=r)d => matches the character d only when it is immediately preceded by r, and again the r is not part of the match (e.g. the d in ‘third’)

Similarly, we have their negated forms (?!) and (?<!)

  • d(?!r) => matches d only when it is not immediately followed by r; whatever follows is not part of the match
  • (?<!r)d => matches d only when it is not immediately preceded by r
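
The four lookaround forms can be sketched as follows, assuming Python's re module. In every case the match itself is only the letter d; the neighbouring r is checked but not consumed. The sample words beyond ‘drone’ and ‘third’ are invented.

```python
import re

ahead = re.search(r"d(?=r)", "drone")    # d followed by r
behind = re.search(r"(?<=r)d", "third")  # d preceded by r

ahead_text, ahead_span = ahead.group(0), ahead.span()
behind_text, behind_span = behind.group(0), behind.span()

# Negated forms: the d of 'drone' and the d of 'third' are skipped.
not_ahead = re.findall(r"d(?!r)", "drone dad")
not_behind = re.findall(r"(?<!r)d", "third dog")

print(ahead_text, ahead_span)    # d (0, 1)
print(behind_text, behind_span)  # d (4, 5)
print(not_ahead)                 # ['d', 'd']  (both from 'dad')
print(not_behind)                # ['d']       (from 'dog')
```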

Summary

As you can see, regex can be used in many places. Here are a few basic ones:

  • Data validation: the most familiar case is checking whether an input string is a correctly formatted email address at login or registration
  • Data scraping: especially web scraping, such as finding all pages that contain a certain set of words, which can be used for SEO keywords
  • Data wrangling: converting data from a “raw” format to another format
  • String parsing: for example, capturing all URL GET parameters
  • String replacement
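
Two of these uses, string parsing and string replacement, fit in a few lines. The sketch below assumes Python's re module; the URL is a made-up example, and the pattern is a simplification that ignores URL encoding and repeated keys (for real code, urllib.parse is the safer choice).

```python
import re

# String parsing: capture key=value pairs after '?' or '&'.
url = "https://example.com/search?q=regex&page=2"  # hypothetical URL
params = dict(re.findall(r"[?&]([^=&#]+)=([^&#]*)", url))

# String replacement: collapse every run of whitespace into one space.
cleaned = re.sub(r"\s+", " ", "too   many\t spaces")

print(params)   # {'q': 'regex', 'page': '2'}
print(cleaned)  # too many spaces
```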

This article is fairly simple, but hopefully it helps you understand regex and use it with ease. Source: https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285

Source : Viblo