Regex basic cheatsheet

Tram Ho

Basic topics

Anchors – ^ and $

  • ^The matches any string that begins with The
  • end$ matches any string that ends with end
  • ^The end$ matches only the exact string The end (it both begins and ends with The end)
  • roar matches any string that contains the text roar
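The four anchor cases above can be tried out with a minimal sketch in Python's re module. The sample strings are made up for illustration; re.search is used because it scans the whole string and leaves the positioning to the anchors.

```python
import re

# re.search looks anywhere in the string; ^ and $ pin the match to the edges.
starts_with_the = re.search(r"^The", "The quick fox") is not None
ends_with_end = re.search(r"end$", "this is the end") is not None
exact_the_end = re.search(r"^The end$", "The end") is not None
extra_char = re.search(r"^The end$", "The end.") is not None
contains_roar = re.search(r"roar", "uproarious") is not None

print(starts_with_the, ends_with_end, exact_the_end, extra_char, contains_roar)
# prints: True True True False True
```

Note that the pattern without anchors matches roar inside a longer word, since nothing ties it to a position.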

Quantifiers – * , + , ? and {}

  • abc* matches a string that has ab and after it can have 0 or more c
  • abc+ matches a string that has ab and after it has 1 or more c
  • abc? matches a string that has ab and after it can have 0 or 1 c
  • abc{2} matches a string that has ab and after it has exactly 2 c
  • abc{2,} matches a string that has ab and after it has 2 or more c
  • abc{2,5} matches a string that has ab and after it has 2 to 5 c
  • a(bc)* matches a string that has a and after it can have 0 or more pairs of bc
  • a(bc){2,5} matches a string that has a and after it has 2 to 5 pairs of bc
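As a rough check of the quantifier rules above, the sketch below tests a few hand-picked strings in Python. The helper name whole_match is hypothetical; it uses re.fullmatch so that the entire string has to fit the pattern, which makes the counting visible.

```python
import re

def whole_match(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

zero_c_ok = whole_match(r"abc*", "ab")               # * allows zero c
one_c_needed = whole_match(r"abc+", "ab")            # + needs at least one c
two_c_too_many = whole_match(r"abc?", "abcc")        # ? allows at most one c
exactly_two = whole_match(r"abc{2}", "abcc")         # exactly two c
six_c_too_many = whole_match(r"abc{2,5}", "abcccccc")  # six c is above the limit
two_pairs = whole_match(r"a(bc)*", "abcbc")          # the group repeats as a unit
one_pair_too_few = whole_match(r"a(bc){2,5}", "abc")   # needs at least two pairs

print(zero_c_ok, one_c_needed, two_c_too_many, exactly_two)
# prints: True False False True
print(six_c_too_many, two_pairs, one_pair_too_few)
# prints: False True False
```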

OR operator – | or []

  • a(b|c) matches a string that has a and after it has b or c
  • a[bc] same as the previous one: matches a string that has a and after it has b or c
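A small Python sketch can confirm that the two spellings accept the same strings. The sample list is an assumption for illustration. One practical difference is that the parenthesized form also captures the letter it matched, while the bracket form does not.

```python
import re

alternation = re.compile(r"a(b|c)")
bracket = re.compile(r"a[bc]")

samples = ["ab", "ac", "ad"]
# Each entry: (sample, accepted by a(b|c), accepted by a[bc])
results = [
    (s, alternation.fullmatch(s) is not None, bracket.fullmatch(s) is not None)
    for s in samples
]
print(results)
# prints: [('ab', True, True), ('ac', True, True), ('ad', False, False)]

# Only the parenthesized form records which letter was matched.
captured = alternation.fullmatch("ac").group(1)
print(captured)  # prints: c
```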

Character classes – \d , \w , \s and .

  • \d matches a single digit
  • \w matches a single word character (letters, digits, and the _ character)
  • \s matches a whitespace character (including tab and newline)
  • . matches any character (except a newline, by default)

\d , \w and \s each have a negated form: \D , \W and \S . For example, \D matches a character that is not a digit.
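In Python's re module these classes are written with a leading backslash ( \d , \w , \s and the negated \D , \W , \S ). The sketch below runs them against a made-up sample string to show what each one picks up.

```python
import re

text = "Room_42 opens\tat 9"

digits = re.findall(r"\d", text)        # every single digit
words = re.findall(r"\w+", text)        # runs of letters, digits and _
spaces = re.findall(r"\s", text)        # space and tab both count
only_digits = re.sub(r"\D", "", text)   # \D removes everything that is not a digit
dot_chars = re.findall(r".", "a\nb")    # . skips the newline by default

print(digits)       # prints: ['4', '2', '9']
print(words)        # prints: ['Room_42', 'opens', 'at', '9']
print(spaces)       # prints: [' ', '\t', ' ']
print(only_digits)  # prints: 429
print(dot_chars)    # prints: ['a', 'b']
```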

Flags

A regex is usually written in the form /abc/ , where the search pattern is enclosed between two slashes / . After the closing slash we can set one or more flags; the most commonly used are:

  • g (global): does not stop after the first match, but keeps searching for all matches
  • m (multi-line): ^ and $ match the start and end of each line instead of the whole string
  • i (insensitive): makes the match case-insensitive
  • x (extended): ignores whitespace inside the pattern (supported by engines such as PCRE, but not by JavaScript)
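Python has no /pattern/flags literal, so the sketch below is only an approximate translation of the flags above. It assumes that g corresponds to choosing re.findall over re.search, and it passes the other three as re.MULTILINE, re.IGNORECASE and re.VERBOSE. The sample text is made up.

```python
import re

text = "cat Cat\ncat"

first_only = re.search(r"cat", text).group()                    # stops at the first match
all_matches = re.findall(r"cat", text)                          # keeps searching, like g
ignore_case = re.findall(r"cat", text, re.IGNORECASE)           # like i
whole_string_start = re.findall(r"^cat", text)                  # ^ is the string start only
each_line_start = re.findall(r"^cat", text, re.MULTILINE)       # like m
spaced_pattern = re.findall(r"c a t", text, re.VERBOSE)         # like x: spaces ignored

print(first_only)          # prints: cat
print(all_matches)         # prints: ['cat', 'cat']
print(ignore_case)         # prints: ['cat', 'Cat', 'cat']
print(whole_string_start)  # prints: ['cat']
print(each_line_start)     # prints: ['cat', 'cat']
print(spaced_pattern)      # prints: ['cat', 'cat']
```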

Intermediate topics

Grouping and capturing – ()

  • a(bc) : the parentheses create a capturing group with the value bc
  • a(?:bc)* : ?: makes the group non-capturing; for example, this regex matches a , abc , abcbc
  • a(?<foo>bc) : ?<foo> gives the group the name foo

Group naming is very useful when you need to extract the results from the regex, especially with complex regexes.
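The three kinds of group can be compared with a short Python sketch. One caveat: Python's re module spells a named group as (?P<foo>bc) rather than (?<foo>bc), so the last pattern is adapted accordingly. The sample strings are assumptions for illustration.

```python
import re

# Capturing group: group(0) is the whole match, group(1) is the part in ().
capturing = re.search(r"a(bc)", "xabcx")
whole, first_group = capturing.group(0), capturing.group(1)

# Non-capturing group: it still repeats as a unit, but nothing is stored.
non_capturing = re.fullmatch(r"a(?:bc)*", "abcbc")
stored_groups = non_capturing.groups()

# Named group (Python syntax): the result is read back by name.
named = re.search(r"a(?P<foo>bc)", "xabcx")
by_name = named.group("foo")
as_dict = named.groupdict()

print(whole, first_group)  # prints: abc bc
print(stored_groups)       # prints: ()
print(by_name, as_dict)    # prints: bc {'foo': 'bc'}
```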

Bracket expressions – []

  • [abc] matches a string that has a, b, or c (the same as a|b|c )
  • [a-c] same as the previous one: matches one of the characters a through c (a, b, c)
  • [a-fA-F0-9] matches a single hexadecimal digit, regardless of case
  • [0-9]% matches a digit from 0 to 9 followed by a % sign
  • [^a-zA-Z] matches a character that is not a letter from a to z or from A to Z; here ^ negates the expression
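A minimal Python sketch of the bracket expressions above follows, using made-up sample strings. In Python the range is written [a-c] and the negated letter class [^a-zA-Z]. The hexadecimal class is repeated with + here so that a whole string can be checked.

```python
import re

listed = re.findall(r"[abc]", "cabbage")     # any of a, b, c
ranged = re.findall(r"[a-c]", "cabbage")     # the same set written as a range
is_hex = re.fullmatch(r"[a-fA-F0-9]+", "1aF9") is not None
not_hex = re.fullmatch(r"[a-fA-F0-9]+", "1aG9") is not None
percents = re.findall(r"[0-9]%", "5% of 20% is 1")   # one digit, then %
non_letters = re.findall(r"[^a-zA-Z]", "a1 b2")      # ^ inside [] negates

print(listed == ranged, listed)  # prints: True ['c', 'a', 'b', 'b', 'a']
print(is_hex, not_hex)           # prints: True False
print(percents)                  # prints: ['5%', '0%']
print(non_letters)               # prints: ['1', ' ', '2']
```

Because [0-9]% takes only one digit, the match in 20% is 0% rather than the full number.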

Greedy and Lazy match

The quantifiers * , + and {} are greedy operators: they extend the match as far as possible through the text.

For example, <.+> matches <div>simple div</div> in This is a <div>simple div</div> test .

To match only the div tag, add ? after the quantifier to make it lazy:

  • <.+?> matches any character one or more times within < and > , expanding only as far as needed

A stricter solution avoids . altogether by combining a bracket expression, as above, with the negation ^ :

  • <[^<>]+> matches any character other than < or > one or more times within <>
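The difference between the three patterns is easiest to see side by side. The sketch below runs them in Python against the sample sentence from this section.

```python
import re

text = "This is a <div>simple div</div> test"

greedy = re.findall(r"<.+>", text)        # runs to the last > it can reach
lazy = re.findall(r"<.+?>", text)         # stops at the first > it can reach
negated = re.findall(r"<[^<>]+>", text)   # can never cross a < or >

print(greedy)   # prints: ['<div>simple div</div>']
print(lazy)     # prints: ['<div>', '</div>']
print(negated)  # prints: ['<div>', '</div>']
```

The lazy and negated forms give the same result here; the negated class simply makes the intent explicit by excluding the delimiters from the match.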

Summary

As you can see, regex has many fields of application, and you are sure to recognize at least one of them in your work:

  • data validation
  • data scraping (especially web scraping)

Properly applying regex to your work will make it simpler and more efficient. Reference: https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285


Source : Viblo