Regular Expression

Tram Ho

I. What is regex?

1.1 Definition

  • Regex stands for Regular Expression: a pattern, written according to specific rules and syntax, that describes a set of strings.
  • Regex is commonly used in search and text-processing utilities to find or transform text that matches a given pattern.
  • Some languages, such as JavaScript and Ruby, support regular expressions directly when handling strings; others, such as Java, Python, C++, and .NET, provide them through standard libraries.
  • In most languages, regex support comes from a library.

Regex is often used to validate string input, such as URLs, usernames, passwords, phone numbers, and email addresses. The result of applying a regex is simply whether, and where, the input matches the expression. You can write a regex and try it against test cases at http://rubular.com/ or https://regexr.com/.
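As a small illustration of input validation, here is a minimal Python sketch. The username rule (3 to 16 lowercase letters, digits, underscores, or hyphens) and the names USERNAME_RE and is_valid_username are assumptions made for this example, not something the article prescribes.

```python
import re

# Hypothetical rule: 3-16 characters drawn from a-z, 0-9, "_" and "-".
USERNAME_RE = re.compile(r"[a-z0-9_-]{3,16}")

def is_valid_username(value: str) -> bool:
    """Return True only if the whole string matches the pattern."""
    return USERNAME_RE.fullmatch(value) is not None

print(is_valid_username("tram_ho"))   # True
print(is_valid_username("a"))         # False (too short)
print(is_valid_username("bad name"))  # False (space is not allowed)
```

Using fullmatch rather than search means the entire input has to fit the pattern, which is usually what validation needs.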

1.2 Advantages

  • Flexible and available in many languages.
  • Handles strings concisely and conveniently.
  • Saves time and effort when programming, because a single pattern can validate an input.

1.3 Disadvantages

Regex is easy to get wrong, and mistakes can introduce subtle bugs. It is also hard to learn, because it has many rules and demands careful logical thinking.

II. Explain the meaning of characters in regex

  • ^: Start of the string; the pattern after ^ must match at the beginning of the string.
  • a*: The character a appears zero or more times.
  • a+: The character a appears one or more times.
  • a?: The character a is optional (it appears zero times or once).
  • .: Matches any single character except a line break.
  • x|y: Matches either x or y.
  • a{n}: The character a appears exactly n times; n must be a non-negative integer.
  • a{n,m}: The character a appears at least n times and at most m times; n and m must be non-negative integers with n <= m. When m is omitted (a{n,}), there is no upper limit.
  • \d: A digit character, [0-9].
  • \D: A non-digit character, [^0-9].
  • \s: A whitespace character.
  • \S: A non-whitespace character.
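The rules listed above can be checked one by one. The following is a minimal Python sketch; the helper name full and the sample strings are chosen only for this illustration, and the shorthand classes are written with their leading backslash (\d, \D, \s, \S) as real patterns require.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Each entry pairs a rule with a check that should hold for it.
results = {
    "^":      re.search(r"^ab", "abc") is not None and re.search(r"^ab", "cab") is None,
    "a*":     full(r"ba*", "b") and full(r"ba*", "baaa"),
    "a+":     full(r"ba+", "ba") and not full(r"ba+", "b"),
    "a?":     full(r"colou?r", "color") and full(r"colou?r", "colour"),
    ".":      full(r"c.t", "cat") and not full(r"c.t", "c\nt"),
    "x|y":    full(r"cat|dog", "dog") and not full(r"cat|dog", "bird"),
    "a{n}":   full(r"a{3}", "aaa") and not full(r"a{3}", "aa"),
    "a{n,m}": full(r"a{2,4}", "aaa") and not full(r"a{2,4}", "aaaaa"),
    "a{n,}":  full(r"a{2,}", "aaaaaa") and not full(r"a{2,}", "a"),
    r"\d \D": full(r"\d\D", "1a") and not full(r"\d\D", "11"),
    r"\s \S": full(r"\s\S", " x") and not full(r"\s\S", "xx"),
}

for rule, ok in results.items():
    print(f"{rule:8} {'ok' if ok else 'unexpected'}")
```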

III. Some sample regex snippets.

– Password strength ^(?=.*[A-Z].*[A-Z])(?=.*[!@#$&*])(?=.*[0-9].*[0-9])(?=.*[a-z].*[a-z].*[a-z]).{8}$
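A minimal Python sketch of this check follows. It assumes the pattern is meant to read as a series of lookaheads: two uppercase letters, one character from !@#$&*, two digits, three lowercase letters, and exactly eight characters overall. The names PASSWORD_RE and is_strong_password are hypothetical.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[A-Z].*[A-Z])"         # at least two uppercase letters
    r"(?=.*[!@#$&*])"              # at least one special character
    r"(?=.*[0-9].*[0-9])"          # at least two digits
    r"(?=.*[a-z].*[a-z].*[a-z])"   # at least three lowercase letters
    r".{8}$"                       # exactly eight characters in total
)

def is_strong_password(value: str) -> bool:
    """Return True if the whole string satisfies every lookahead."""
    return PASSWORD_RE.fullmatch(value) is not None

print(is_strong_password("ABcde12!"))   # True
print(is_strong_password("abcdefgh"))   # False (no uppercase, digit or special)
print(is_strong_password("ABcde12!x"))  # False (nine characters)
```

Each lookahead tests one requirement without consuming input, so the requirements can appear in the password in any order.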

– Hex color codes #([a-fA-F]|[0-9]){3,6}
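The hex color pattern can be tried out as follows. This is only a sketch, assuming the quantifier is written without a space as {3,6}; the function name is_hex_color is hypothetical.

```python
import re

# "#" followed by three to six hexadecimal digits.
HEX_COLOR_RE = re.compile(r"#([a-fA-F]|[0-9]){3,6}")

def is_hex_color(value: str) -> bool:
    """Return True if the whole string looks like a hex color code."""
    return HEX_COLOR_RE.fullmatch(value) is not None

print(is_hex_color("#fff"))     # True
print(is_hex_color("#1a2B3c"))  # True
print(is_hex_color("#ggg"))     # False (g is not a hex digit)
print(is_hex_color("#abcd"))    # True (the pattern also allows 4 or 5 digits)
```

Because the quantifier is a range, the pattern as written accepts four- and five-digit values as well as the usual three and six.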

– Verify email address /[A-Z0-9._%+-]+@[A-Z0-9-]+.+\.[A-Z]{2,4}/igm

– Thousands separator /\d{1,3}(?=(\d{3})+(?!\d))/g
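To see how the lookahead places separators, here is a minimal Python sketch. It assumes the digit shorthand is written with a backslash (\d) and that the input is a plain run of digits; the function name add_thousands_separator is hypothetical.

```python
import re

# Match 1-3 digits that are followed by one or more complete groups
# of three digits and then by a non-digit (or the end of the string).
THOUSANDS_RE = re.compile(r"\d{1,3}(?=(\d{3})+(?!\d))")

def add_thousands_separator(digits: str, sep: str = ",") -> str:
    """Insert sep after every match of THOUSANDS_RE."""
    return THOUSANDS_RE.sub(lambda m: m.group(0) + sep, digits)

print(add_thousands_separator("1234567"))  # 1,234,567
print(add_thousands_separator("1000"))     # 1,000
print(add_thousands_separator("999"))      # 999
```

The lookahead only inspects the digits that follow, so they remain available for the next match.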

– Get the domain name from URL /https?:\/\/(?:[-\w]+\.)?([-\w]+)\.\w+(?:\.\w+)?\/?.*/i
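A minimal Python sketch of the domain extraction is shown below. It assumes the pattern uses \w for word characters and escaped dots, with the domain name captured by the first group; the function name domain_name and the sample URLs are illustrative only.

```python
import re
from typing import Optional

# Group 1 captures the name between an optional subdomain and the suffix.
DOMAIN_RE = re.compile(
    r"https?://(?:[-\w]+\.)?([-\w]+)\.\w+(?:\.\w+)?/?.*",
    re.IGNORECASE,
)

def domain_name(url: str) -> Optional[str]:
    """Return the captured domain name, or None if the URL does not match."""
    match = DOMAIN_RE.match(url)
    return match.group(1) if match else None

print(domain_name("https://www.example.com/path"))  # example
print(domain_name("http://example.com"))            # example
print(domain_name("not a url"))                     # None
```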

– Sort the keywords by counting the number of words

– Verify the phone number ^\+?\d{1,3}?[- .]?\(?(?:\d{2,3})\)?[- .]?\d\d\d[- .]?\d\d\d\d$
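The phone number pattern can be exercised with a short Python sketch. It assumes the pattern is meant to use escaped characters (\+, \d, \( and \)); the function name is_phone_number and the sample numbers are made up for this example.

```python
import re

PHONE_RE = re.compile(
    r"^\+?\d{1,3}?[- .]?"       # optional "+", country code, optional separator
    r"\(?(?:\d{2,3})\)?[- .]?"  # area code, optionally in parentheses
    r"\d\d\d[- .]?\d\d\d\d$"    # three digits, optional separator, four digits
)

def is_phone_number(value: str) -> bool:
    """Return True if the whole string matches the phone pattern."""
    return PHONE_RE.fullmatch(value) is not None

print(is_phone_number("+1 (555) 123-4567"))  # True
print(is_phone_number("555-123-4567"))       # True
print(is_phone_number("12345"))              # False
```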

– Verify date in DD/MM/YYYY format ^(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2])\2))(

– Email header analysis /\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}\b/i
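As a rough sketch of header analysis in Python, the pattern can be used to pull every address out of a block of header text. This assumes the pattern is meant to read with \b word boundaries, a literal @, and case-insensitive matching; the function name extract_addresses and the sample header are invented for illustration.

```python
import re

ADDRESS_RE = re.compile(
    r"\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}\b",
    re.IGNORECASE,
)

def extract_addresses(header_text: str) -> list:
    """Return every email address found in the header text, in order."""
    return ADDRESS_RE.findall(header_text)

sample = (
    "From: Alice <alice@example.com>\n"
    "To: bob.smith@mail.example.org, carol@example.net"
)
print(extract_addresses(sample))
# ['alice@example.com', 'bob.smith@mail.example.org', 'carol@example.net']
```

The only group in the pattern is non-capturing, so findall returns the full matched addresses rather than fragments.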

IV. References

  1. https://freetuts.net/cac-quy-tac-regular-expression-can-ban-65.html
  2. https://ruby-doc.org/core-2.6.4/Regexp.html
  3. https://techblog.vn/su-dung-regex-trong-ruby

Source : Viblo