
Thursday, November 28, 2019

List of Javascript Regular Expressions! Create your own custom Regular Expressions


What are Regular Expressions in Javascript?

Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec and test methods of RegExp, and with the match, matchAll, replace, search, and split methods of String. This chapter describes JavaScript regular expressions.
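As a rough sketch of how those methods behave, here is one small example of each. The sample text and the pattern are made up for illustration and are not part of the list below.

```javascript
// Illustrative sample text and pattern (assumptions, not from the list below).
const text = 'cat bat rat';
const pattern = /[bc]at/;

const found = pattern.test(text);                 // true: "cat" is present
const first = pattern.exec(text);                 // match array, first[0] is "cat"
const all = text.match(/[bc]at/g);                // ["cat", "bat"]
const position = text.search(/rat/);              // 8, the index where "rat" starts
const replaced = text.replace(/[bc]at/g, 'hat');  // "hat hat rat"
const parts = text.split(/\s/);                   // ["cat", "bat", "rat"]

console.log(found, first[0], all, position, replaced, parts);
```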


Regular Expressions to validate a Username


/^[a-z0-9_-]{3,16}$/

We begin by telling the parser to find the beginning of the string (^), followed by any lowercase letter (a-z), number (0-9), an underscore, or a hyphen. Next, {3,16} makes sure that there are at least 3 of those characters, but no more than 16. Finally, we want the end of the string ($).
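A quick way to try the username pattern is with test(). The sample usernames here are invented for the sketch.

```javascript
const usernameRe = /^[a-z0-9_-]{3,16}$/;

// Sample inputs are hypothetical.
const okName = usernameRe.test('my-us3r_n4m3');  // true: 12 allowed characters
const tooShort = usernameRe.test('ab');          // false: fewer than 3 characters
const hasUpper = usernameRe.test('MyUser');      // false: uppercase is not in the set

console.log(okName, tooShort, hasUpper);
```

Note that the pattern is case-sensitive as written; adding the i flag would be one way to accept uppercase letters too.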


Regular Expressions to validate a Password


/^[a-z0-9_-]{6,18}$/

Matching a password is very similar to matching a username. The only difference is that instead of 3 to 16 letters, numbers, underscores, or hyphens, we want 6 to 18 of them ({6,18}).


Regular Expressions to validate Indian Mobile numbers


/^[789]\d{9}$/


Regular Expressions to validate any Mobile numbers


/^(\+\d{1,3}[- ]?)?\d{10}$/
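Reading the two mobile patterns above: the first wants a 7, 8, or 9 followed by exactly nine more digits, and the second allows an optional "+" country code of one to three digits (optionally followed by a hyphen or space) before ten digits. A small sketch with made-up numbers:

```javascript
const indianMobileRe = /^[789]\d{9}$/;
const anyMobileRe = /^(\+\d{1,3}[- ]?)?\d{10}$/;

// All phone numbers below are invented sample data.
const indianValid = indianMobileRe.test('9876543210');    // true
const indianBadStart = indianMobileRe.test('5876543210'); // false: starts with 5
const indianTooShort = indianMobileRe.test('98765');      // false: only 5 digits

const withCode = anyMobileRe.test('+91-9876543210');      // true
const withSpace = anyMobileRe.test('+1 2025550123');      // true
const noCode = anyMobileRe.test('9876543210');            // true
const tooShort = anyMobileRe.test('12345');               // false

console.log(indianValid, indianBadStart, indianTooShort, withCode, withSpace, noCode, tooShort);
```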


Regular Expressions to validate a Hex Value


/^#?([a-f0-9]{6}|[a-f0-9]{3})$/

We begin by telling the parser to find the beginning of the string (^). Next, a number sign is optional because it is followed by a question mark. The question mark tells the parser that the preceding character (in this case, a number sign) is optional, but to be "greedy" and capture it if it's there. Next, inside the first group (the first set of parentheses), we can have two different situations. The first is any lowercase letter between a and f, or a number, six times. The vertical bar tells us that we can also have three lowercase letters between a and f, or numbers, instead. Finally, we want the end of the string ($).

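The reason that I put the six-character alternative first is so that the parser tries a full hex value like #ffffff before the short form. In a pattern without the end-of-string anchor, reversing the order so that the three characters came first would make the parser pick up only #fff and not the other three f's. With the ^ and $ anchors in place, the parser backtracks and tries the other alternative, so either order ends up matching.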
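The effect of the alternative order is easiest to see without the anchors. This sketch compares both orders; the unanchored and reversed patterns are variations written for illustration only.

```javascript
const hexRe = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;

const sixDigits = hexRe.test('#a3c113');  // true
const threeDigits = hexRe.test('#fff');   // true
const badLetter = hexRe.test('#4d82h4');  // false: "h" is outside a-f

// Unanchored variations (illustrative only): the first alternative that fits wins.
const shortFirst = '#ffffff'.match(/#?([a-f0-9]{3}|[a-f0-9]{6})/)[0]; // "#fff"
const longFirst = '#ffffff'.match(/#?([a-f0-9]{6}|[a-f0-9]{3})/)[0];  // "#ffffff"

// With ^ and $ the parser backtracks, so the reversed order still matches.
const reversedAnchored = /^#?([a-f0-9]{3}|[a-f0-9]{6})$/.test('#ffffff'); // true

console.log(sixDigits, threeDigits, badLetter, shortFirst, longFirst, reversedAnchored);
```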


Regular Expressions to validate a Slug


/^[a-z0-9-]+$/

You will be using this regex if you ever have to work with mod_rewrite and pretty URLs. We begin by telling the parser to find the beginning of the string (^), followed by one or more (the plus sign) lowercase letters, numbers, or hyphens. Finally, we want the end of the string ($).


Regular Expressions to validate an Email


/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/  OR  /^([a-z0-9_\.\+-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

We begin by telling the parser to find the beginning of the string (^). Inside the first group, we match one or more lowercase letters, numbers, underscores, dots, or hyphens (the second variant also allows a plus sign). I have escaped the dot because a non-escaped dot means any character. Directly after that, there must be an at sign. Next is the domain name, which must be one or more lowercase letters, numbers, dots, or hyphens. Then another (escaped) dot, with the extension being two to six letters or dots. I have 2 to 6 because of the country-specific TLDs (.ny.us or .co.uk). Finally, we want the end of the string ($).
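Here is a small sketch of both email variants in use. The addresses are invented, and the variable names are just for this example.

```javascript
const emailRe = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;
const emailPlusRe = /^([a-z0-9_\.\+-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Sample addresses are hypothetical.
const simple = emailRe.test('john@doe.com');         // true
const longTld = emailRe.test('john@doe.something');  // false: extension longer than 6
const plusStrict = emailRe.test('john+tag@doe.com'); // false: "+" not allowed here
const plusLoose = emailPlusRe.test('john+tag@doe.com'); // true

// The three groups capture the local part, the domain, and the extension.
const emailParts = emailRe.exec('john@doe.com');
console.log(emailParts[1], emailParts[2], emailParts[3]); // john doe com
```

Like the other patterns in this list, this is a simple format check and will reject some addresses that are valid in practice (uppercase letters, for instance).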


Regular Expressions to validate a URL


/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

This regex is almost like taking the ending part of the above regex, slapping it between "http://" and some file structure at the end. It sounds a lot simpler than it really is. To start off, we search for the beginning of the line with the caret.

The first capturing group is entirely optional. It allows the URL to begin with "http://", "https://", or neither of them. I have a question mark after the s to allow URLs that have http or https. In order to make this entire group optional, I just added a question mark to the end of it.

Next is the domain name: one or more numbers, letters, dots, or hyphens followed by another dot, then two to six letters or dots. The following section is the optional files and directories. Inside the group, we want to match any number of forward slashes, letters, numbers, underscores, spaces, dots, or hyphens. Then we say that this group can be matched as many times as we want. Pretty much this allows multiple directories to be matched along with a file at the end. I have used the star instead of the question mark because the star says zero or more, not zero or one. If a question mark were used there, only one file/directory would be able to be matched.

Then a trailing slash is matched, but it can be optional. Finally, we end with the end of the line.
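To see the URL pattern at work, here is a short sketch. The URLs are sample inputs chosen for the example.

```javascript
const urlRe = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Sample URLs are illustrative.
const withPath = urlRe.test('http://net.tutsplus.com/about');  // true
const noProtocol = urlRe.test('net.tutsplus.com');             // true
const badChar = urlRe.test('http://google.com/a!b');           // false: "!" is not allowed

console.log(withPath, noProtocol, badChar);
```

One thing to keep in mind: the path group is a starred set inside a starred group, and that kind of nesting can make the parser backtrack a great deal on long inputs that do not match.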


Regular Expressions to validate an IP Address


/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/

Now, I'm not going to lie, I didn't write this regex; I got it from here. Now, that doesn't mean that I can't rip it apart character for character.

The first capture group really isn't a captured group because ?: was placed inside, which tells the parser to not capture this group (more on this in the last regex). We also want this non-captured group to be repeated three times: the {3} at the end of the group. This group contains another group, a subgroup, and a literal dot. The parser looks for a match in the subgroup then a dot to move on.

The subgroup is also another non-capture group. It's just a bunch of character sets (things inside brackets): the string "25" followed by a number between 0 and 5; or the string "2" and a number between 0 and 4 and any number; or an optional zero or one followed by two numbers, with the second being optional.

After we match three of those, it's onto the next non-capturing group. This one wants: the string "25" followed by a number between 0 and 5; or the string "2" with a number between 0 and 4 and another number at the end; or an optional zero or one followed by two numbers, with the second being optional.

We end this confusing regex with the end of the string.
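A minimal check of the IP pattern, with made-up addresses:

```javascript
const ipRe = /^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/;

// Sample addresses are illustrative.
const validIp = ipRe.test('73.60.124.136');       // true
const outOfRange = ipRe.test('256.60.124.136');   // false: 256 is above 255
const tooFewParts = ipRe.test('1.2.3');           // false: only three parts

console.log(validIp, outOfRange, tooFewParts);
```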



Regular Expressions to validate an HTML Tag


/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/

One of the more useful regexes on the list. It matches any HTML tag with the content inside. As usual, we begin with the start of the line.

First comes the tag's name. It must be one or more letters long. This is the first capture group; it comes in handy when we have to grab the closing tag. Next come the tag's attributes. This is any character but a less-than sign (<). Since this is optional, but I want to match more than one character, the star is used. The plus sign makes up the attribute and value, and the star says as many attributes as you want.

Next comes the third group, which is a non-capture group. Inside, it will contain either a greater than sign, some content, and a closing tag; or some spaces, a forward slash, and a greater than sign. The first option looks for a greater than sign followed by any number of characters, and the closing tag. \1 is used, which represents the content that was captured in the first capturing group. In this case it was the tag's name. Now, if that couldn't be matched we want to look for a self-closing tag (like an img, br, or hr tag). This needs to have one or more spaces followed by "/>".

The regex is ended with the end of the line.
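A short sketch of the HTML tag pattern, including the \1 backreference rejecting a mismatched closing tag. The tags are sample inputs.

```javascript
const tagRe = /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/;

// Sample markup is illustrative.
const link = tagRe.exec('<a href="http://net.tutsplus.com/">Nettuts+</a>');
const tagName = link[1];   // "a"
const content = link[3];   // "Nettuts+"

const selfClosing = tagRe.test('<br />');       // true
const mismatched = tagRe.test('<a>text</b>');   // false: closing tag must repeat "a"

console.log(tagName, content, selfClosing, mismatched);
```

This works on a single tag at a time; it is not a general HTML parser.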

How to create a custom regular expression?

There are two ways to create a regular expression, which are as follows:

(1) Using a regular expression literal, which consists of a pattern enclosed between slashes

var regex = /a+c/;

Regular expression literals are compiled when the script is loaded.

(2) Calling the constructor function of the RegExp object 

var re = new RegExp('a+c');

The constructor function compiles the regular expression at run time. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.
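A minimal sketch of the two approaches side by side. The word variable stands in for a value that is only known at run time and is an assumption of this example.

```javascript
// Both forms produce the same pattern.
const literalRe = /a+c/;
const constructedRe = new RegExp('a+c');
const samePattern = literalRe.source === constructedRe.source; // true

// Hypothetical value that is only known at run time (for example, user input).
const word = 'cat';
// '\\b' in the string literal becomes \b (word boundary) in the pattern.
const dynamicRe = new RegExp('\\b' + word + '\\b', 'i');

const wholeWord = dynamicRe.test('The Cat sat');   // true
const insideWord = dynamicRe.test('concatenate');  // false: "cat" is not a whole word

console.log(samePattern, wholeWord, insideWord);
```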


Compose Regular expression

A regular expression pattern is composed of simple characters, special characters, or a combination of both.

Simple patterns are constructed of characters for which you want to find a direct match. For example, the pattern /xyz/ matches character combinations in strings only when exactly the characters 'xyz' occur together and in that order. Such a match would succeed in the strings "Hi, do you know your xyz's?" and "The latest airplane designs evolved from slxyzraft." In both cases the match is with the substring 'xyz'. There is no match in the string 'Foxy zebra' because while it contains the substring 'xy z', it does not contain the exact substring 'xyz'.

Using special characters
When the search for a match requires something more than a direct match, such as finding one or more y's or finding white space, you can include special characters in the pattern.

The list below provides a description of the special characters that can be used in regular expressions.

Assertions
Indicate in some way that a match is possible. Assertions include look-ahead, look-behind, and conditional expressions.

Boundaries
Indicate the beginnings and endings of lines and words.

Character Classes
Distinguishes kinds of characters such as, for example, distinguishing between letters and digits.

Groups and Ranges
Indicates groups and ranges of expression characters.

Quantifiers
Indicates numbers of characters or expressions to match.


(1)'\'

Matches according to the following rules:

A backslash that precedes a non-special character indicates that the next character is special and is not to be interpreted literally. For example, a 'b' without a preceding '\' generally matches lowercase 'b's wherever they occur — the character will be interpreted literally. But a sequence of '\b' doesn't match any character; it denotes a word boundary.

A backslash that precedes a special character indicates that the next character is not special and should be interpreted literally. See "Escaping" below for details.


If you're using the RegExp constructor with a string, don't forget that backslash is an escape character in string literals, and so to put a backslash in the pattern, you need to escape it in the string literal. /[a-z]\s/i and new RegExp("[a-z]\\s", "i") create the same regular expression: an expression that searches for any letter in the range A-Z followed by a whitespace character (\s, see below). To include a literal backslash in an expression created via new RegExp with a string literal, you need to escape it at both the string literal level and the regular expression level: /[a-z]:\\/i and new RegExp("[a-z]:\\\\","i") create the same expression, which would match a string like "C:\".
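The double escaping is easy to get wrong, so here is a small sketch comparing the literal and constructor forms, plus what happens when the extra backslash is forgotten.

```javascript
// Literal and constructor forms of the same pattern.
const literalWs = /[a-z]\s/i;
const constructedWs = new RegExp('[a-z]\\s', 'i');
const sameWs = literalWs.source === constructedWs.source;  // true

// Matching a literal backslash needs four backslashes in the string literal.
const literalPath = /[a-z]:\\/i;
const constructedPath = new RegExp('[a-z]:\\\\', 'i');
const matchesDrive = constructedPath.test('C:\\');         // true: the string is C:\

// Forgetting to double the backslash: '\s' in a string literal is just 's'.
const forgotten = new RegExp('[a-z]\s', 'i');
const forgottenSource = forgotten.source;                  // "[a-z]s"

console.log(sameWs, matchesDrive, forgottenSource);
```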

(2)'^'

Matches beginning of input. If the multiline flag is set to true, also matches immediately after a line break character.

For example, /^A/ does not match the 'A' in "an A", but does match the 'A' in "An E".


The '^' has a different meaning when it appears as the first character in a character set pattern. See complemented character sets for details and an example.

(3)'$'

Matches end of input. If the multiline flag is set to true, also matches immediately before a line break character.

For example, /t$/ does not match the 't' in "eater", but does match it in "eat".

(4)'*'

Matches the preceding expression 0 or more times. Equivalent to {0,}. For example, /bo*/ matches 'boooo' in "A ghost booooed" and 'b' in "A bird warbled" but nothing in "A goat grunted".

(5)'+'

Matches the preceding expression 1 or more times. Equivalent to {1,}.

For example, /a+/ matches the 'a' in "candy" and all the a's in "caaaaaaandy", but nothing in "cndy".

(6)'?'

Matches the preceding expression 0 or 1 time. Equivalent to {0,1}.

For example, /e?le?/ matches the 'el' in "angel" and the 'le' in "angle" and also the 'l' in "oslo".

If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy (matching the fewest possible characters), as opposed to the default, which is greedy (matching as many characters as possible). For example, applying /\d+/ to "123abc" matches "123". But applying /\d+?/ to that same string matches only the "1".

Also used in lookahead assertions, as described in the x(?=y) and x(?!y) entries of this list.
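The greedy versus non-greedy difference can be sketched like this. The markup string is an invented sample.

```javascript
const greedyDigits = '123abc'.match(/\d+/)[0];   // "123"
const lazyDigits = '123abc'.match(/\d+?/)[0];    // "1"

// Sample markup (illustrative): greedy runs to the last </b>, lazy stops at the first.
const html = '<b>one</b><b>two</b>';
const greedyTag = html.match(/<b>.*<\/b>/)[0];   // "<b>one</b><b>two</b>"
const lazyTag = html.match(/<b>.*?<\/b>/)[0];    // "<b>one</b>"

console.log(greedyDigits, lazyDigits, greedyTag, lazyTag);
```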

(7)'.'

(The decimal point) matches any single character except the newline character, by default.

For example, /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.

If the s ("dotAll") flag is set to true, it also matches newline characters.


(8)'(x)'

Matches 'x' and remembers the match, as the following example shows. The parentheses are called capturing parentheses.

The '(foo)' and '(bar)' in the pattern /(foo) (bar) \1 \2/ match and remember the first two words in the string "foo bar foo bar". The \1 and \2 denote the first and second parenthesized substring matches - foo and bar, matching the string's last two words. Note that \1, \2, ..., \n are used in the matching part of the regex, for more information, see \n below. In the replacement part of a regex the syntax $1, $2, ..., $n must be used, e.g.: 'bar foo'.replace(/(...) (...)/, '$2 $1'). $& means the whole matched string.
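The examples from this entry, written out as a runnable sketch:

```javascript
// \1 and \2 refer back to the captured groups inside the pattern.
const groups = /(foo) (bar) \1 \2/.exec('foo bar foo bar');
const firstGroup = groups[1];   // "foo"
const secondGroup = groups[2];  // "bar"

// $1 and $2 refer to the groups in the replacement string.
const swapped = 'bar foo'.replace(/(...) (...)/, '$2 $1');  // "foo bar"

// $& inserts the whole matched string.
const bracketed = 'abc'.replace(/b/, '[$&]');               // "a[b]c"

console.log(firstGroup, secondGroup, swapped, bracketed);
```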


(9)(?:x)

Matches 'x' but does not remember the match. The parentheses are called non-capturing parentheses, and let you define subexpressions for regular expression operators to work with. Consider the sample expression /(?:foo){1,2}/. If the expression was /foo{1,2}/, the {1,2} characters would apply only to the last 'o' in 'foo'. With the non-capturing parentheses, the {1,2} applies to the entire word 'foo'. For more information, see Using parentheses below.

(10)x(?=y)

Matches 'x' only if 'x' is followed by 'y'. This is called a lookahead.

For example, /Jack(?=Sprat)/ matches 'Jack' only if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ matches 'Jack' only if it is followed by 'Sprat' or 'Frost'. However, neither 'Sprat' nor 'Frost' is part of the match results.

(11)x(?!y)
Matches 'x' only if 'x' is not followed by 'y'. This is called a negated lookahead.

For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. The regular expression /\d+(?!\.)/.exec("3.141") matches '141' but not '3.141'.
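A small sketch of both kinds of lookahead. Note that the text inside the lookahead never ends up in the match itself.

```javascript
const followed = /Jack(?=Sprat)/.exec('JackSprat');
const followedMatch = followed[0];                       // "Jack" (not "JackSprat")
const notFollowed = /Jack(?=Sprat)/.exec('JackFrost');   // null

const eitherName = /Jack(?=Sprat|Frost)/.test('JackFrost');  // true

const notBeforeDot = /\d+(?!\.)/.exec('3.141')[0];       // "141"

console.log(followedMatch, notFollowed, eitherName, notBeforeDot);
```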


(12)(?<=y)x

Matches x only if x is preceded by y. This is called a lookbehind.

For example, /(?<=Jack)Sprat/ matches "Sprat" only if it is preceded by "Jack".
/(?<=Jack|Tom)Sprat/ matches "Sprat" only if it is preceded by "Jack" or "Tom".
However, neither "Jack" nor "Tom" is part of the match results.

(Added in ES2018, not yet supported in Firefox)


(13)(?<!y)x

Matches x only if x is not preceded by y. This is called a negated lookbehind.

For example, /(?<!-)\d+/ matches a number only if it is not preceded by a minus sign.
/(?<!-)\d+/.exec('3') matches "3".
/(?<!-)\d+/.exec('-3') match is not found because the number is preceded by the minus sign.

(Added in ES2018, not yet supported in Firefox)
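The lookbehind examples as a runnable sketch, assuming an engine with ES2018 lookbehind support. The last line shows a subtlety that is easy to miss: with more than one digit, the negated lookbehind can still match a later digit.

```javascript
const afterJack = /(?<=Jack)Sprat/.exec('JackSprat')[0];   // "Sprat"
const afterTom = /(?<=Jack|Tom)Sprat/.test('TomSprat');    // true

const positive = /(?<!-)\d+/.exec('3')[0];                 // "3"
const negative = /(?<!-)\d+/.exec('-3');                   // null

// Subtlety: in "-35" the 5 is preceded by 3, not by a minus sign, so it matches.
const partial = /(?<!-)\d+/.exec('-35')[0];                // "5"

console.log(afterJack, afterTom, positive, negative, partial);
```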


(14)x|y

Matches 'x', or 'y' (if there is no match for 'x').

For example, /green|red/ matches 'green' in "green apple" and 'red' in "red apple." The order of 'x' and 'y' matters. For example a*|b matches the empty string in "b", but b|a* matches "b" in the same string.
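The order effect from this entry, as a quick sketch:

```javascript
const color = 'red apple'.match(/green|red/)[0];  // "red"

// a* can match zero characters, so it succeeds with an empty match before b is tried.
const emptyFirst = 'b'.match(/a*|b/)[0];          // ""
const letterFirst = 'b'.match(/b|a*/)[0];         // "b"

console.log(color, JSON.stringify(emptyFirst), letterFirst);
```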


(15){n}

Matches exactly n occurrences of the preceding expression. n must be a positive integer.

For example, /a{2}/ doesn't match the 'a' in "candy," but it does match all of the a's in "caandy," and the first two a's in "caaandy."

(16){n,}

Matches at least n occurrences of the preceding expression. n must be a positive integer.

For example, /a{2,}/ will match "aa", "aaaa" and "aaaaa" but not "a"


(17){n,m}

Where n and m are positive integers and n <= m. Matches at least n and at most m occurrences of the preceding expression. When m is omitted, it's treated as ∞.

For example, /a{1,3}/ matches nothing in "cndy", the 'a' in "candy," the two a's in "caandy," and the first three a's in "caaaaaaandy". Notice that when matching "caaaaaaandy", the match is "aaa", even though the original string had more a's in it.
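A sketch of the three counted quantifiers together. The test word is built with repeat() so the number of a's is explicit.

```javascript
// Build "caaaaandy" with exactly five a's so the counts are easy to follow.
const word = 'c' + 'a'.repeat(5) + 'ndy';

const exactlyTwo = word.match(/a{2}/)[0];      // "aa"
const atLeastTwo = word.match(/a{2,}/)[0];     // "aaaaa"
const oneToThree = word.match(/a{1,3}/)[0];    // "aaa"

const single = 'candy'.match(/a{2}/);          // null: only one a
const none = 'cndy'.match(/a{1,3}/);           // null: no a at all

console.log(exactlyTwo, atLeastTwo, oneToThree, single, none);
```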


(18)[xyz]

Character set. This pattern type matches any one of the characters in the brackets, including escape sequences. Special characters like the dot(.) and asterisk (*) are not special inside a character set, so they don't need to be escaped. You can specify a range of characters by using a hyphen, as the following examples illustrate.

The pattern [a-d], which performs the same match as [abcd], matches the 'b' in "brisket" and the 'c' in "city". The patterns /[a-z.]+/ and /[\w.]+/ match the entire string "test.i.ng".


(19)[^xyz]

A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen. Everything that works in the normal character set also works here.

For example, [^abc] is the same as [^a-c]. They initially match 'r' in "brisket" and 'h' in "chop."


(20)[\b]

Matches a backspace (U+0008). You need to use square brackets if you want to match a literal backspace character. (Not to be confused with \b.)


(21)\b

Matches a word boundary. A word boundary matches the position between a word character followed by a non-word character, or between a non-word character followed by a word character, or the beginning of the string, or the end of the string. A word boundary is not a "character" to be matched; like an anchor, a word boundary is not included in the match. In other words, the length of a matched word boundary is zero. (Not to be confused with [\b].)

Examples using the input string "moon":
/\bm/ matches, because the \b is at the beginning of the string;
the \b in /oo\b/ does not match, because the \b is both preceded and followed by word characters;
the \b in /oon\b/ matches, because it appears at the end of the string;
the \b in /\w\b\w/ will never match anything, because it is both preceded and followed by a word character.

Note: JavaScript's regular expression engine defines a specific set of characters to be "word" characters. Any character not in that set is considered a non-word character. This set of characters is fairly limited: it consists solely of the Roman alphabet in both upper- and lower-case, decimal digits, and the underscore character. Accented characters, such as "é" or "ü" are, unfortunately, treated as non-word characters for the purposes of word boundaries, as are ideographic characters in general.
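The "moon" examples and the accented-character caveat can be sketched as follows. The French phrase is a sample chosen for illustration, with é written as \u00e9.

```javascript
const startOfWord = /\bm/.test('moon');     // true
const middle = /oo\b/.test('moon');         // false: "oo" is followed by "n"
const endOfWord = /oon\b/.test('moon');     // true
const impossible = /\w\b\w/.test('moon');   // false: can never match

// "é" is not a word character, so the "word" stops before it.
const words = 'caf\u00e9 au lait'.match(/\b\w+\b/g);  // ["caf", "au", "lait"]

console.log(startOfWord, middle, endOfWord, impossible, words);
```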


(22)\B

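Matches a non-word boundary, that is, a position where the characters on both sides are of the same kind. This matches the following cases:

Before the first character of the string, if that character is not a word character
After the last character of the string, if that character is not a word character
Between two word characters
Between two non-word characters
The empty string

For example, /\B../ matches 'oo' in "noonday", and /y\B./ matches 'ye' in "possibly yesterday."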


(23)\cX

Where X is a character ranging from A to Z. Matches a control character in a string.

For example, /\cM/ matches control-M (U+000D) in a string.


(24)\d

Matches a digit character. Equivalent to [0-9].

For example, /\d/ or /[0-9]/ matches '2' in "B2 is the suite number."


(25)\D

Matches a non-digit character. Equivalent to [^0-9].

For example, /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number."


(26)\f

Matches a form feed (U+000C).


(27)\n

Matches a line feed (U+000A).


(28)\r

Matches a carriage return (U+000D).


(29)\s

Matches a white space character, including space, tab, form feed, line feed. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].

For example, /\s\w*/ matches ' bar' in "foo bar."


(30)\S
Matches a character other than white space. Equivalent to [^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].

For example, /\S*/ matches 'foo' in "foo bar."


(31)\t

Matches a tab (U+0009).


(32)\v

Matches a vertical tab (U+000B).


(33)\w

Matches any alphanumeric character including the underscore. Equivalent to [A-Za-z0-9_].

For example, /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."


(34)\W

Matches any non-word character. Equivalent to [^A-Za-z0-9_].

For example, /\W/ or /[^A-Za-z0-9_]/ matches '%' in "50%."


(35)\n

Where n is a positive integer, a backreference to the last substring matching the n parenthetical in the regular expression (counting left parentheses).

For example, /apple(,)\sorange\1/ matches 'apple, orange,' in "apple, orange, cherry, peach."


(36)\0

Matches a NULL (U+0000) character. Do not follow this with another digit, because \0<digits> is an octal escape sequence. Instead use \x00.


(37)\xhh

Matches the character with the code hh (two hexadecimal digits)


(38)\uhhhh

Matches the character with the code hhhh (four hexadecimal digits).


(39)\u{hhhh}

(only when u flag is set) Matches the character with the Unicode value hhhh (hexadecimal digits).


Escaping in Regular Expressions


If you need to use any of the special characters literally (actually searching for a '*', for instance), you must escape it by putting a backslash in front of it. For instance, to search for 'a' followed by '*' followed by 'b', you'd use /a\*b/—the backslash "escapes" the '*', making it literal instead of special.

Similarly, if you're writing a regular expression literal and need to match a slash ('/'), you need to escape that (otherwise, it terminates the pattern). For instance, to search for the string "/example/" followed by one or more alphabetic characters, you'd use /\/example\/[a-z]+/i—the backslashes before each slash make them literal.

To match a literal backslash, you need to escape the backslash. For instance, to match the string "C:\" where 'C' can be any letter, you'd use /[A-Z]:\\/—the first backslash escapes the one after it, so the expression searches for a single literal backslash.

If using the RegExp constructor with a string literal, remember that the backslash is an escape in string literals, so to use it in the regular expression, you need to escape it at the string literal level. /a\*b/ and new RegExp("a\\*b") create the same expression, which searches for 'a' followed by a literal '*' followed by 'b'.

If escape strings are not already part of your pattern you can add them using String.replace:

function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}
The g after the regular expression is an option or flag that performs a global search, looking in the whole string and returning all matches.
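Here is a sketch of why the helper matters when a pattern is built from outside text. The userInput value is a made-up stand-in for something typed by a user.

```javascript
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}

// Hypothetical input that happens to contain a special character.
const userInput = '1+1=2';

// Unescaped, the + is a quantifier: the pattern means one or more 1s, then "1=2".
const unsafeRe = new RegExp(userInput);
const unsafeResult = unsafeRe.test('1+1=2');   // false

// Escaped, the + is matched literally.
const safeRe = new RegExp(escapeRegExp(userInput));
const safeResult = safeRe.test('1+1=2');       // true

const escaped = escapeRegExp('a.b*c');         // the characters a\.b\*c

console.log(unsafeResult, safeResult, escaped);
```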


Using parentheses in Regular Expressions

Parentheses around any part of the regular expression pattern cause that part of the matched substring to be remembered. Once remembered, the substring can be recalled for other use.

For example, the pattern /Chapter (\d+)\.\d*/ illustrates additional escaped and special characters and indicates that part of the pattern should be remembered. It matches precisely the characters 'Chapter ' followed by one or more numeric characters (\d means any numeric character and + means 1 or more times), followed by a decimal point (which in itself is a special character; preceding the decimal point with \ means the pattern must look for the literal character '.'), followed by any numeric character 0 or more times (\d means numeric character, * means 0 or more times). In addition, parentheses are used to remember the first matched numeric characters.

This pattern is found in "Open Chapter 4.3, paragraph 6" and '4' is remembered. The pattern is not found in "Chapter 3 and 4", because that string does not have a period after the '3'.

To match a substring without causing the matched part to be remembered, within the parentheses preface the pattern with ?:. For example, (?:\d+) matches one or more numeric characters but does not remember the matched characters.
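The Chapter example can be checked with exec(). This sketch also shows that the non-capturing form leaves nothing extra in the result.

```javascript
const chapterRe = /Chapter (\d+)\.\d*/;

const hit = chapterRe.exec('Open Chapter 4.3, paragraph 6');
const wholeMatch = hit[0];    // "Chapter 4.3"
const remembered = hit[1];    // "4"

const miss = chapterRe.exec('Chapter 3 and 4');  // null: no period after the 3

// With ?: the digits are matched but not remembered.
const nonCapturing = /Chapter (?:\d+)\.\d*/.exec('Open Chapter 4.3, paragraph 6');
const resultLength = nonCapturing.length;        // 1: only the whole match

console.log(wholeMatch, remembered, miss, resultLength);
```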
