Simple Matching
::Example::
STRING1 Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
STRING2 Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)
Search for: m
STRING1: match. Finds the m in compatible.
STRING2: no match. There is no lower case m in this string. Searches are case sensitive unless you take special action.
Search for: a/4
STRING1: match. Found in Mozilla/4.0. Any combination of characters can be used for the match.
STRING2: match. Found in the same place as in STRING1.
Search for: 5 [
STRING1: no match. The search is looking for the pattern '5 [' and this does NOT exist in STRING1. Spaces are valid in searches.
STRING2: match. Found in Mozilla/4.75 [en].
NOTE: [ is also a metacharacter (see [ ] below). Many regular expression engines reject an unpaired [, in which case the search must be written as 5 \[.
Search for: in
STRING1: match. Found in Windows.
STRING2: match. Found in Linux.
Search for: le
STRING1: match. Found in compatible.
STRING2: no match. There is an l and an e in this string but they are not adjacent (contiguous).
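The literal searches above can be reproduced with a short script. This is a minimal sketch, assuming Python's standard `re` module as the regular expression engine; the helper name `found` is made up for illustration. Python is one of the engines that rejects an unpaired `[`, so the sketch passes each literal through `re.escape()` first. The "special action" for case-insensitive matching is assumed here to be the `re.IGNORECASE` flag.

```python
import re

STRING1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"
STRING2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)"

def found(pattern, text, flags=0):
    """Return True if pattern matches anywhere in text (hypothetical helper)."""
    return re.search(pattern, text, flags) is not None

# Literal searches from the examples. re.escape() protects the '[' in
# "5 [", which Python would otherwise read as the start of a bracket list.
for literal in ["m", "a/4", "5 [", "in", "le"]:
    pattern = re.escape(literal)
    print(repr(literal), found(pattern, STRING1), found(pattern, STRING2))

# Case-insensitive search: now the M in Mozilla satisfies the pattern m.
print(found("m", STRING2, re.IGNORECASE))
```

Each printed line shows the search text followed by the result for STRING1 and STRING2, so it can be compared directly with the match / no match entries above.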
[ ]
Matches any ONE of the characters inside the square brackets, at a single character position. For example, [12] matches either 1 or 2, while [0123456789] matches any single digit from 0 to 9.
-
The - (dash) inside square brackets is the 'range separator' and allows us to define a range. The example above, [0123456789], could be rewritten as [0-9].
You can define more than one range inside a list, for example, [0-9A-C] means check for 0 to 9 and A to C (but not a to c).
NOTE: To test for - inside brackets (as a literal) it must come first or last, that is, [-0-9] will test for - and 0 to 9.
^
The ^ (circumflex or caret) as the first character inside square brackets negates the list (we will see an alternate use for the circumflex/caret outside square brackets later). For example, [^Ff] matches any single character except upper or lower case F, and [^a-z] matches any single character except lower case a to z.
Search for: in[du]
STRING1: match. Finds ind in Windows.
STRING2: match. Finds inu in Linux.
Search for: x[0-9A-Z]
STRING1: no match. Again, the tests are case sensitive. To find the xt in DigExt we would need to use [0-9a-z] or [0-9A-Zt] after the x. We can also use this format to test for both cases of a letter, e.g. [Ff] will check for lower and upper case F.
STRING2: match. Finds x2 in Linux2.
Search for: [^A-M]in
STRING1: match. Finds Win in Windows.
STRING2: no match. We have excluded the range A to M in our search so Linux is not found, but linux (if it were present) would be found.
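The bracket examples can be checked the same way. The sketch below again assumes Python's `re` module; `first_match` is a made-up helper that returns the matched text so the result can be compared with the descriptions above.

```python
import re

STRING1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"
STRING2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)"

def first_match(pattern, text):
    """Return the text of the first match, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# A list matches exactly one character position.
print(first_match(r"in[du]", STRING1))     # ind
print(first_match(r"in[du]", STRING2))     # inu

# Ranges are case sensitive: A-Z does not cover the t in DigExt.
print(first_match(r"x[0-9A-Z]", STRING1))  # None
print(first_match(r"x[0-9A-Z]", STRING2))  # x2
print(first_match(r"x[0-9a-z]", STRING1))  # xt

# A leading ^ negates the list: any character except A to M.
print(first_match(r"[^A-M]in", STRING1))   # Win
print(first_match(r"[^A-M]in", STRING2))   # None

# A dash placed first in the list is a literal dash.
print(first_match(r"[-0-9]", "a-b"))       # -
```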
?
The ? (question mark) matches the preceding character 0 or 1 times only, for example, colou?r will find both color (0 times) and colour (1 time).
*
The * (asterisk or star) matches the preceding character 0 or more times, for example, tre* will find tree (e matched 2 times) and tread (1 time) and trough (0 times, so only the tr is matched).
+
The + (plus) matches the preceding character 1 or more times, for example, tre+ will find tree (2 times) and tread (1 time) but not trough (0 times).
{n}
Matches the preceding character, or bracket list, exactly n times. For example, to find a local phone number we could use [0-9]{3}-[0-9]{4}, which would find any number of the form 123-4567.
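The four repetition operators (?, *, + and {n}) can be compared side by side. This is a small sketch assuming Python's `re` module; the helper `matched_text` and the sample sentence around the phone number are made up for illustration. It prints the text each pattern actually matched, which shows how many times the repeated character was consumed.

```python
import re

def matched_text(pattern, text):
    """Return the text of the first match, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# ? : preceding character 0 or 1 times
print(matched_text(r"colou?r", "color"))    # color
print(matched_text(r"colou?r", "colour"))   # colour

# * : preceding character 0 or more times
print(matched_text(r"tre*", "tree"))        # tree
print(matched_text(r"tre*", "tread"))       # tre
print(matched_text(r"tre*", "trough"))      # tr

# + : preceding character 1 or more times
print(matched_text(r"tre+", "tree"))        # tree
print(matched_text(r"tre+", "tread"))       # tre
print(matched_text(r"tre+", "trough"))      # None

# {n} : preceding character or bracket list exactly n times
print(matched_text(r"[0-9]{3}-[0-9]{4}", "call 123-4567 today"))  # 123-4567
```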