When the ^ character is the first character inside brackets, [^...], it negates the character or character set in the brackets. To match a string that starts with "c", ends with "t", and has a single character other than "a" in the middle:
c[^a]t
If you want to negate more than one character or a range of characters, they all go inside the brackets with the ^. To negate the letter "p" and the numbers 4 through 7:
[^p4-7]
The ^ must be the first character inside the brackets. Outside of the brackets, PHP treats it as an anchor that matches the start of the string. The ^ negates all of the characters in the set, not just the first one.
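As a rough illustration of the two negated sets above, here is a minimal sketch. The sample words and the sample string are made up for this example, and the outer ^ and $ in the first pattern are anchors added so that the whole word has to match.

```php
<?php
// Hypothetical sample words, chosen only for illustration.
$words = ['cat', 'cot', 'cut', 'ct'];

// c[^a]t: "c", then exactly one character that is not "a", then "t".
// The ^ and $ outside the brackets are anchors, not negation.
$matched = array_values(preg_grep('/^c[^a]t$/', $words));
print_r($matched); // cot and cut ("cat" has an "a", "ct" has no middle character)

// [^p4-7]: any single character except "p" and the digits 4 through 7.
// Replacing those characters with nothing keeps only p, 4, 5, 6 and 7.
$stripped = preg_replace('/[^p4-7]/', '', 'php 5.6 app 123');
echo $stripped, "\n"; // pp56pp
```

Note that [^a] still has to match one character, which is why "ct" is rejected.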
Regex has more metacharacters that represent special characters and ranges of characters. They include:
| \d | Matches any digit. Same as [0-9] |
| \D | Opposite of \d. Same as [^0-9] |
| \r | Carriage return |
| \n | New line |
| \s | Any whitespace character (space, tab, new line, etc.) |
| \S | Opposite of \s. Any non-whitespace character |
| \w | "Word" character. Any alphanumeric character or underscore. Same as [_a-zA-Z0-9] |
| \W | Opposite of \w. Same as [^_a-zA-Z0-9] |
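The shorthand classes can be tried out with a few standard PCRE functions. This is only a sketch: the input string is invented for the example, and the variable names are placeholders.

```php
<?php
// Hypothetical input string, chosen only for illustration.
$text = "Order 66:\tship_to R2D2\n";

// \D matches every non-digit, so removing those leaves only the digits.
$digits = preg_replace('/\D/', '', $text);

// \W+ matches runs of non-word characters; splitting on them yields the "words".
$words = preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY);

// \s matches spaces, tabs and new lines; preg_match_all returns the match count.
$whitespace_count = preg_match_all('/\s/', $text);

echo $digits, "\n";           // 6622
print_r($words);              // Order, 66, ship_to, R2D2
echo $whitespace_count, "\n"; // 4
```

The underscore in "ship_to" stays inside one word because \w includes the underscore.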
If you are matching a pattern that repeats, as in the previous example, the Regex can get long, so there are several shortcuts you can use:
| ? | Matches zero or one of the previous item. Example: cats? will match cat and cats. |
| + | Matches one or more of the previous item. Example: file\d+\.html will match file1.html, file123.html, etc., but not file.html. Note that "." is escaped because an unescaped "." is a metacharacter that matches any character. |
| * | Matches zero or more of the previous item. Example: file\d*\.html will match file.html, file1.html, file123.html, etc. |
| +? | Lazy version of + |
| *? | Lazy version of * |
| {n} | Matches exactly n instances of the previous item. Example: [a-fA-F0-9]{6} is a shorter way to match hex color values. |
| {n,} | At least n instances |
| {0,n} | Up to n instances. Write the 0 explicitly: older PCRE versions treat {,n} as literal text. |
| {n1,n2} | At least n1 instances, no more than n2 instances |
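A short sketch of how these quantifiers behave in PHP. The file names, the color value and the digit strings are hypothetical test data, and each pattern is anchored with ^ and $ so that the whole string has to match.

```php
<?php
// Hypothetical file names, chosen only for illustration.
$files = ['file.html', 'file1.html', 'file123.html'];

// + needs at least one digit, so file.html is rejected.
$plus = array_values(preg_grep('/^file\d+\.html$/', $files));

// * also accepts zero digits, so all three names match.
$star = array_values(preg_grep('/^file\d*\.html$/', $files));

// ? makes the trailing "s" optional.
$singular = preg_match('/^cats?$/', 'cat');
$plural   = preg_match('/^cats?$/', 'cats');

// {6} requires exactly six hex digits after the "#".
$is_hex = preg_match('/^#[a-fA-F0-9]{6}$/', '#1a2B3c');

// {2,4} accepts between two and four digits.
$ranged = array_values(preg_grep('/^\d{2,4}$/', ['1', '12', '1234', '12345']));

print_r($plus);   // file1.html, file123.html
print_r($star);   // file.html, file1.html, file123.html
print_r($ranged); // 12, 1234
```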
By default, the metacharacters + and * are greedy as opposed to lazy. A greedy match takes as much text as it can instead of stopping at the first possible match. Consider a Regex that matches links on a web page.
preg_match_all("@(<a[\s\S]*/a>)@i", $file, $links_list);
That will find any pattern that starts with "<a", is followed by any number of characters (including spaces and new lines), and ends in "/a>". The set [\s\S] is used because inside brackets "." is a literal period, not "any character". The results are assigned to the array $links_list. That would be OK if there were only one link, but if you have more than one link, it will match from the start of the first link to the end of the last link. Consider the following code block:
<ol type="I">
<li><a href="intro.html">Introduction</a>
<li><a href="req.html">Requirements</a>
<li><a href="syntax.html">Basic Syntax</a><br/>
Opening & closing PHP tags and comment tags
<li><a href="syntax2.php">Basic Syntax pt.2</a><br />
Special charactors and your first function.
<li><a href="variables.php">Variables</a>
<li><a href="operators.php">Operators</a>
</ol>
Run against that markup, the greedy pattern
preg_match_all("@(<a[\s\S]*/a>)@i", $file, $links_list);
would only find one match. To match each link individually, use the lazy version, *? instead of *:
preg_match_all("@(<a[\s\S]*?/a>)@i", $file, $links_list);
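To see the difference in practice, the sketch below runs a greedy and a lazy pattern over a shortened copy of the list above. It assumes $file holds the page markup as a string. The variable names $greedy and $lazy are placeholders for this example, and [\s\S] stands in for "any character, including new lines".

```php
<?php
// Shortened stand-in for the list above; $file is assumed to hold the page markup.
$file = <<<HTML
<li><a href="intro.html">Introduction</a>
<li><a href="req.html">Requirements</a>
<li><a href="syntax.html">Basic Syntax</a><br/>
HTML;

// Greedy: [\s\S]* runs to the end of the text, then backs up to the last "/a>".
$greedy_count = preg_match_all('@(<a[\s\S]*/a>)@i', $file, $greedy);

// Lazy: [\s\S]*? stops at the first "/a>" after each "<a".
$lazy_count = preg_match_all('@(<a[\s\S]*?/a>)@i', $file, $lazy);

echo $greedy_count, "\n"; // 1
echo $lazy_count, "\n";   // 3
echo $lazy[1][0], "\n";   // <a href="intro.html">Introduction</a>
```

The greedy call returns the three links as one long match, while the lazy call returns each link as its own entry in $lazy[1].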