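Regular Expressions

Regular expressions is the term used for a codified method of searching 'invented' or 'defined' by the American mathematician Stephen Kleene. The following sections give an overview of the format and syntax of regular expressions, specifically as they are used in Apache, and include:

Some Definitions before we start
Our Example Target Strings
Simple Matching
Brackets, Ranges and Negation
Positioning (or Anchors)
Iteration 'metacharacters'
Additional 'metacharacters'
Browser Identification
More Stuff
Special Range 'metacharacters'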
Humble Pie Time: In our examples we blew the expression ^([M-Z]in). We incorrectly stated that the '^' would negate the test [M-Z]. The '^' only performs this function inside square brackets; here it is outside the square brackets and is an anchor meaning 'start from the first character'. The section has been corrected (many thanks to Mirko Stojanovic for pointing it out, and apologies to one and all).

The syntax (language format) described is compliant with 'extended' regular expressions (EREs) defined in POSIX 1003.2 (Section 2.8). EREs are now commonly supported by Apache, PHP4, JavaScript 1.3, Microsoft's Visual Studio, the GNU family of tools (including grep and sed) as well as many others. EREs will support 'basic' regular expressions (BREs are essentially a subset of EREs). 'grep', by the way, stands for global regular expression print (well, well!) and egrep, guess.

Some Definitions before we start

We are going to be using the terms 'literal', 'metacharacter', 'target string', 'escape sequence' and 'search string' in this overview. Here is a definition of our terms:
Our Example Target Strings

Throughout this tutorial we will use the following as our target strings:

STRING1   Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
STRING2   Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)

These are browser ID strings and appear as the Apache environment variable HTTP_USER_AGENT.

Simple Matching

We are going to try some simple matching against our example target strings:
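As a rough illustration of simple matching, the sketch below runs a few search strings against the two target strings. It uses Python's re module as a stand-in, which is an assumption on our part: Python's syntax is close to, but not the same as, the POSIX EREs Apache uses. The helper name found and the chosen search strings are ours, picked only for illustration.

```python
import re

STRING1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"
STRING2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)"


def found(search, target):
    """Return True if the search string matches anywhere in the target string."""
    return re.search(search, target) is not None


# The first three search strings are plain literals: they match only where
# those exact characters appear next to each other, in order, in the same case.
# The last one escapes '[' because a bare '[' would start a bracket expression.
simple_results = {
    search: (found(search, STRING1), found(search, STRING2))
    for search in ("m", "a/4", "le", r"5 \[")
}

for search, (in_string1, in_string2) in simple_results.items():
    print(f"{search!r:10} STRING1={in_string1} STRING2={in_string2}")
```

The search string m finds the lower case m in 'compatible' in STRING1 but nothing in STRING2, because matching is case sensitive and STRING2 only has an upper case M. Likewise le is found in 'compatible', while STRING2 has an l and an e but never next to each other.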
Brackets, Ranges and Negation

Bracket expressions introduce our first 'metacharacters', the square brackets, which allow us to define a list of things to test for rather than the single characters we have been checking up until now.
NOTE: There are some special range values that are built into most regular expression software (and have to be if it claims POSIX 1003.2 compliance for either BRE or ERE).

So let's try this out with our example target strings..
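The sketch below tries a list, a range and a negated range against the target strings. It is a minimal illustration using Python's re module (an assumption, since Apache uses POSIX EREs, though bracket expressions like these behave the same way in both). The helper name matched is ours.

```python
import re

STRING1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"
STRING2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)"


def matched(search, target):
    """Return the text of the first (leftmost) match, or None if there is none."""
    hit = re.search(search, target)
    return hit.group(0) if hit else None


bracket_results = {
    search: (matched(search, STRING1), matched(search, STRING2))
    for search in (
        "in[du]",     # list: 'in' followed by either d or u
        "x[0-9A-Z]",  # ranges: 'x' followed by a digit or an upper case letter
        "[^A-M]in",   # negation: 'in' preceded by any character NOT in A to M
    )
}

for search, (from_string1, from_string2) in bracket_results.items():
    print(f"{search:12} STRING1={from_string1} STRING2={from_string2}")
```

Here in[du] finds 'ind' in Windows and 'inu' in Linux, x[0-9A-Z] finds only 'x2' in STRING2, and [^A-M]in finds 'Win' in STRING1 but rejects 'Lin' in STRING2 because L falls inside the excluded range.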
Positioning (or Anchors)

We can control where in our target strings the matches are valid. The following is a list of 'metacharacters' that affect the position of the search:
NOTE: You will find that many systems, but not all, support special macros, e.g. \< matches at the beginning of a word, \> matches at the end of a word, \b matches at the beginning OR end of a word, and \B matches anywhere except at the beginning or end of a word.

So let's try this lot out with our example target strings..
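The anchors can be tried out with a short sketch like the one below, again using Python's re module as an assumed stand-in for Apache's regex engine. Python supports \b and \B but not \< or \>, so only \b is shown. The helper name matched is ours.

```python
import re

STRING1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"
STRING2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)"


def matched(search, target):
    """Return the text of the first (leftmost) match, or None if there is none."""
    hit = re.search(search, target)
    return hit.group(0) if hit else None


anchor_results = {
    search: (matched(search, STRING1), matched(search, STRING2))
    for search in (
        r"^Moz",      # '^' outside brackets: must match at the start
        r"[M-Z]in",   # unanchored: may match anywhere
        r"^[M-Z]in",  # anchored: the string itself must start with [M-Z]in
        r"[a-z]\)$",  # '$': lower case letter then ')' at the very end
        r"\bNT\b",    # '\b': NT as a whole word
    )
}

for search, (from_string1, from_string2) in anchor_results.items():
    print(f"{search:12} STRING1={from_string1} STRING2={from_string2}")
```

Comparing [M-Z]in with ^[M-Z]in shows the difference between the two uses of '^': unanchored, the expression finds 'Win' in STRING1, but with the leading '^' nothing matches because both strings start with 'Moz'.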
Iteration 'metacharacters'

The following is a set of 'metacharacters' that can control the number of times a character or string is found in our searches:
So let's try them out with our example target strings..
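A minimal sketch of the iteration 'metacharacters' follows, assuming they are the usual ?, *, + and {n} forms and using Python's re module as a stand-in for Apache's engine. The helper name matched and the search strings are ours, chosen for illustration.

```python
import re

STRING1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"
STRING2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)"


def matched(search, target):
    """Return the text of the first (leftmost) match, or None if there is none."""
    hit = re.search(search, target)
    return hit.group(0) if hit else None


iteration_results = {
    search: (matched(search, STRING1), matched(search, STRING2))
    for search in (
        r"4\.0?",              # '?': the 0 may appear zero or one time
        r"W*in",               # '*': zero or more W before 'in'
        r"Mozilla/4\.[0-9]+",  # '+': one or more digits after '4.'
        r"[xX][0-9a-z]{2}",    # '{2}': exactly two characters from the list
        r"\(.*l",              # '.*': '(' then anything, up to the last 'l'
    )
}

for search, (from_string1, from_string2) in iteration_results.items():
    print(f"{search:20} STRING1={from_string1} STRING2={from_string2}")
```

Note that W*in still matches STRING2 (as plain 'in' inside Linux) because '*' allows zero occurrences, and that \(.*l fails on STRING2 because no lower case l follows its opening parenthesis.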
Additional 'metacharacters'

The following is a set of additional 'metacharacters' that provide additional power to our searches:
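As a small sketch, assuming the set covers the grouping parentheses ( ) and the alternation bar | (both appear in the browser identification expressions in this tutorial), the following shows how they combine. Python's re module is used as a stand-in for Apache's engine and the helper name matched is ours.

```python
import re

STRING1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"
STRING2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)"


def matched(search, target):
    """Return the text of the first (leftmost) match, or None if there is none."""
    hit = re.search(search, target)
    return hit.group(0) if hit else None


additional_results = {
    search: (matched(search, STRING1), matched(search, STRING2))
    for search in (
        r"(W|L)in",                # group limits the alternation to one letter
        r"(4\.[0-3])|(2\.[0-3])",  # either whole group may match
    )
}

for search, (from_string1, from_string2) in additional_results.items():
    print(f"{search:24} STRING1={from_string1} STRING2={from_string2}")
```

In the second expression STRING1 matches on the first alternative ('4.0'), while STRING2 fails it (4.75 has a 7 after the dot) and matches the second alternative on the '2.2' in Linux2.2.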
So let's try these out with our example strings..

Browser Identification

All we ever wanted to do was find out enough about our browsers to decide what code to supply or not for our pop-out menus. We want to know:
Here in their glory are the directives we used (maybe you can understand them now):

BrowserMatchNoCase Mozilla/4 isJS
BrowserMatchNoCase MSIE isIE
BrowserMatchNoCase Gecko isW3C
BrowserMatchNoCase MSIE.(5\.[5-9])|([6-9]) isW3C
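To see what these four expressions do with our target strings, the sketch below applies them as case-insensitive, unanchored searches, which is our assumption about how BrowserMatchNoCase behaves. It is a Python approximation, not Apache itself, and the names RULES and browser_flags are ours.

```python
import re

STRING1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"
STRING2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)"

# (search string, variable name) pairs copied from the directives above.
RULES = [
    (r"Mozilla/4", "isJS"),
    (r"MSIE", "isIE"),
    (r"Gecko", "isW3C"),
    (r"MSIE.(5\.[5-9])|([6-9])", "isW3C"),
]


def browser_flags(user_agent):
    """Return the variable names whose search string matches the user agent."""
    flags = []
    for search, name in RULES:
        if re.search(search, user_agent, re.IGNORECASE) and name not in flags:
            flags.append(name)
    return flags


print(browser_flags(STRING1))
print(browser_flags(STRING2))
```

With these inputs STRING1 gets isJS and isIE, and STRING2 gets isJS and isW3C. STRING2 picks up isW3C only because the '|' in the last expression sits outside any enclosing group, so its second alternative is just [6-9] on its own and matches the 7 in 4.75. If the intent was 'MSIE 5.5 or later', wrapping both alternatives in one group, as in MSIE.((5\.[5-9])|([6-9])), would express that.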
More Stuff

This is not a complete tutorial on regular expressions; for more information go to our links pages under Languages/regex. There are lots of folks who get a real buzz out of making any search a 'one liner' and they are incredibly helpful at telling you how they did it. Welcome to the wonderful (if arcane) world of regular expressions.

Special Range 'metacharacters'

The following is a set of special values that denote certain common ranges. They tend to look very ugly but have the advantage that they also take into account the 'locale', i.e. any variant of the local language/coding system (guess what, the whole world cannot use ASCII 'cos the A stands for American).
These are always used inside square brackets in the form [[:alnum:]] or combined as [[:digit:]a-d].
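Python's re module does not understand the [:name:] forms, so the sketch below rewrites a few of them into plain ASCII ranges before searching. This is only an approximation for experimenting: it loses exactly the locale awareness that makes the real POSIX classes useful, it covers just a subset of the class names, and the names POSIX_CLASSES and expand_posix_classes are ours.

```python
import re

STRING1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"
STRING2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)"

# ASCII-only stand-ins for some POSIX classes (an assumption: the real
# classes depend on the locale, these do not).
POSIX_CLASSES = {
    "[:digit:]": "0-9",
    "[:upper:]": "A-Z",
    "[:lower:]": "a-z",
    "[:alpha:]": "A-Za-z",
    "[:alnum:]": "A-Za-z0-9",
    "[:xdigit:]": "0-9A-Fa-f",
}


def expand_posix_classes(search):
    """Replace each known [:name:] with its ASCII range, leaving the rest alone."""
    for name, ascii_range in POSIX_CLASSES.items():
        search = search.replace(name, ascii_range)
    return search


def matched(search, target):
    """Return the text of the first match of the expanded search string, or None."""
    hit = re.search(expand_posix_classes(search), target)
    return hit.group(0) if hit else None


print(expand_posix_classes("[[:digit:]a-d]"))
print(matched("x[[:digit:]]", STRING1), matched("x[[:digit:]]", STRING2))
print(matched("[[:upper:]]in", STRING1), matched("[[:upper:]]in", STRING2))
```

The combined form [[:digit:]a-d] expands to [0-9a-d], which shows why the class name has to sit inside an outer pair of square brackets: the inner [:digit:] is just one more item in the list.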
Copyright © 1994 - 2001 ZyTrax, Inc.
All rights reserved.
Questions to webmaster@zytrax.com
Last modified: February 11 2002.