Tell me some thing about mod_rewrite and url rewriting?


Top Questions

mod_rewrite
*************
Simply, mod_rewrite is used for rewriting a URL at the server level, giving the user output for that final page. So, for example, a user may ask for http://www.referads.com/example/blue/, but will really be given http://www.referads.com/example.php?colour=blue by the server. Of course, the user will be none the wiser to this little bit of chicanery.

What do I need to get mod_rewrite working?
<?php phpinfo(); ?>

Load that page up in your web browser, and perform a search for “mod_rewrite”. All being well, you’ll find it in the “Apache loaded modules” section of the page. If it isn’t there, you’ll have to contact your hosting company and politely ask them to add it to the Apache configuration.

Assuming the mod_rewrite module is loaded, then you’re good to go!
A simple mod_rewrite example

So, let’s write a simple mod_rewrite example. This isn’t going to be anything fancy; we’re just going to redirect people who ask for alice.html to the page bob.html instead. First, let’s create the Alice and Bob pages. Below is Alice’s webpage - create a similar one for Bob.

<html>
<head>
<title>Alice's webpage</title>
</head>
<body>
<p>
This is Alice's webpage
</p>
</body>
</html>

Upload both of these to your web server, and check that you can view both of them. Now comes the fun - we’re going to add a couple of lines to your .htaccess file. The .htaccess file is a text file which contains Apache directives. Any directives which you place in it will apply to the directory which the .htaccess file sits in, and any below it. To ours, we’re going to add the following:

RewriteEngine on
RewriteRule ^alice.html$ bob.html

Upload this .htaccess file to the same directory as alice.html and bob.html, and reload Alice’s page. You should see Bob’s page being displayed, but Alice’s URL. If you still see Alice’s page being displayed, then check you’ve followed the instructions correctly (you may have to clear your cache). If things still aren’t working for you, then contact your technical support people and ask them to enable mod_rewrite and the FileInfo override in their httpd.conf file for you
The structure of a RewriteRule

RewriteRule Pattern Substitution [OptionalFlags]

The general structure of a RewriteRule is fairly simple if you already understand regular expressions. This article isn’t intended to be a tutorial about regular expressions though - there are already plenty of those available. RewriteRules are broken up as follows:

RewriteRule

This is just the name of the command.
Pattern

A regular expression which will be applied to the “current” URL. If any RewriteRules have already been performed on the requested URL, then that changed URL will be the current URL.
Substitution

Substitution occurs in the same way as it does in Perl, PHP, etc.

You can include backreferences and server variable names (%{VARNAME}) in the substitution. Backreferences to this RewriteRule should be written as $N, whereas backreferences to the previous RewriteCond should be written as %N.

A special substitution is -. This substitution tells Apache to not perform any substitution. I personally find that this is useful when using the F or G flags (see below), but there are other uses as well.
OptionalFlags

This is the only part of the RewriteRule which isn’t mandatory. Any flags which you use should be surrounded in square brackets, and comma separated. The flags which I find to be most useful are:

F - Forbidden. The user will receive a 403 error.
L - Last Rule. No more rules will be proccessed if this one was successful.
R[=code] - Redirect. The user’s web browser will be visibly redirected to the substituted URL. If you use this flag, you must prefix the substitution with http://www.somesite.com/, thus making it into a true URL. If no code is given, then a HTTP reponse of 302 (temporarily moved) is sent.

A full list of flags is given in the Apache mod_rewrite manual.

A slightly more complicated mod_rewrite example

Let’s try a slightly more meaty example now. Suppose you have a web page which takes a parameter. This parameter tells the page how to be displayed, and what content to pull into it. Humans don’t tend to like remembering the additional syntax of query strings for URLs, and neither do search engines. Both sets of people seem to much prefer a straight URL, with no extra bits tacked onto the end.

In our example, you’ve created a main index page with takes a page parameter. So, a link like index.php?page=software would take you to a software page, while a link to index.php?page=interests would take you to an interests page. What we’ll do with mod_rewrite is to silently redirect users from page/software/ to index.php?page=software etc.

The following is what needs to go into your .htaccess file to accomplish that:

RewriteEngine on
RewriteRule ^page/([^/\.]+)/?$ index.php?page=$1 [L]

Let’s walk through that RewriteRule, and work out exactly what’s going on:

^page/

Sees whether the requested page starts with page/. If it doesn’t, this rule will be ignored.
([^/.]+)

Here, the enclosing brackets signify that anything that is matched will be remembered by the RewriteRule. Inside the brackets, it says “I’d like one or more characters that aren’t a forward slash or a period, please”. Whatever is found here will be captured and remembered.
/?$

Makes sure that the only thing that is found after what was just matched is a possible forward slash, and nothing else. If anything else is found, then this RewriteRule will be ignored.
index.php?page=$1

The actual page which will be loaded by Apache. $1 is magically replaced with the text which was captured previously.
[L]

Tells Apache to not process any more RewriteRules if this one was successful.

Let’s write a quick page to test that this is working. The following test script will simply echo the name of the page you asked for to the screen, so that you can check that the RewriteRule is working.

<html>
<head>
<title>Second mod_rewrite example</title>
</head>
<body>
<p>
The requested page was:
<?php echo $_GET['page']; ?>
</p>
</body>
</html>

Again, upload both the index.php page, and the .htaccess file to the same directory. Then, test it! If you put the page in http://www.somesite.com/mime_test/, then try requesting http://www.somesite.com/mime_test/page/software. The URL in your browser window will show the name of the page which you requested, but the content of the page will be created by the index.php script! This technique can obviously be extended to pass multiple query strings to a page - all you’re limited by is your imagination.
Conditional Statements and mod_rewrite

But what happens when you start getting people hotlinking to your images (or other files)? Hot linking is the act of including an image, media file, etc from someone else’s server in one of your own pages as if it were your own. Obviously, as a webmaster, there are plenty of times when you don’t want people doing that. You’ll almost certainly have seen examples where someone has linked to one image on a website, only for a completely different, “nasty” one to be shown instead. So, how is this done?

It’s pretty simple really. All it takes are a couple of RewriteCond statements in your .htaccess file.

RewriteCond statements are as they sound - conditional statements for RewriteRules. The basic format for a RewriteCond is RewriteCond test_string cond_pattern. For our purpose, we will set the test_string to be the HTTP_REFERER. If the test string is neither empty nor our own server, then we will serve an alternative (low bandwidth) image, which tells the person who is hotlinking off for stealing our bandwidth.

Here’s how we do that:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?somesite.com/.*$ [NC]
RewriteRule \.(gif|jpg|png)$ http://www.somesite.com/nasty.gif [R,L]

Here, the RewriteRule will only be performed if all the preceeding RewriteConds are fulfilled. In the second RewriteCond, [NC] simply means “No Case”, so it doesn’t matter whether the domain name was written in upper case, lower case or a mixture of the two. So, any requests for gif, jpg or png files from referers other than somesite.com will result in your “nasty” image being shown instead.

The [R,L] in the RewriteRule simply means “Redirect, Last”. So, the RewriteRule will visibly redirect output to “nasty.gif” and no more RewriteRules will be performed on this URL.

If you simply don’t want the hot linkers to see any image at all when they hot link to your images, then simply change the final line to RewriteRule \.(gif|jpg|png)$ - [F]. The - means “don’t rewrite the requested URL”, and the [F] means “Forbidden”. So, the hot linker will get a “403 Forbidden message”, and you don’t end up wasting your bandwidth.

URL Rewriting
***********
Readable URLs are nice. A well designed website will have a logical file system layout, with smart folder and file names, and as many implementation details left out as possible. In the best designed sites, readers can guess at filenames with a high level of success.
However, there are some cases when the best possible information design can’t stop your site’s URLs from being nigh-on impossible to use. For instance, you may be using a Content Management System that serves out URLs that look something like
http://www.site.com/viewcatalog.asp?category=hats&prodID=53
This is a horrible URL, but it and its brethren are becoming increasingly prevalent in these days of dynamically generated pages. There are a number of problems with an URL of this kind.
It exposes the underlying technology of the website (in this case ASP). This can give potential hackers clues as to what type of data they should send along with the query string to perform a ‘front-door’ attack on the site. Information like this shouldn’t be given away if you can help it.
Even if you’re not overly concerned with the security of your site, the technology you’re using is at best irrelevant — and at worst a source of confusion — to your readers, so it should be hidden from them if possible.
Also, if at some point in the future you decide to change the language that your site is based on (to » PHP, for instance); all your old URLs will stop working. This is a pretty serious problem, as anyone who has tackled a full-on site rewrite will attest.
The URL is littered with awkward punctuation, like the question mark and ampersand. Those & characters, in particular, are problematic because if another webmaster links to this page using that URL, the unescaped ampersands will mess up their XHTML conformance.
Some search engines won’t index pages which they think are generated dynamically. They’ll see that question mark in the URL and just turn their asses around.
Luckily, using rewriting, we can clean up this URL to something far more manageable. For example, we could map it to
http://www.site.com/catalog/hats/53/
Much better. This URL is more logical, readable and memorable, and will be picked up by all search engines. The faux-directories are short and descriptive. Importantly, it looks more permanent.
To use mod_rewrite, you supply it with the link text you want the server to match, and the real URLs that these URLs will be redirected to. The URLs to be matched can be straight file addresses, which will match one file, or they can be regular expressions, which will match many files.
Basic Rewriting
Some servers will not have » mod_rewrite enabled by default. As long as the » module is present in the installation, you can enable it simply by starting a .htaccess file with the command
RewriteEngine on
Put this .htaccess file in your root so that rewriting is enabled throughout your site. You only need to write this line once per .htaccess file.
Basic Redirects
We’ll start off with a straight redirect; as if you had moved a file to a new location and want all links to the old location to be forwarded to the new location. Though you shouldn’t really ever » move a file once it has been placed on the web; at least when you simply have to, you can do your best to stop any old links from breaking.
RewriteEngine on
RewriteRule ^old\.html$ new.html
Though this is the simplest example possible, it may throw a few people off. The structure of the ‘old’ URL is the only difficult part in this RewriteRule. There are three special characters in there.
* The caret, ^, signifies the start of an URL, under the current directory. This directory is whatever directory the .htaccess file is in. You’ll start almost all matches with a caret.
* The dollar sign, $, signifies the end of the string to be matched. You should add this in to stop your rules matching the first part of longer URLs.
* The period or dot before the file extension is a special character in regular expressions, and would mean something special if we didn’t escape it with the backslash, which tells Apache to treat it as a normal character.
So, this rule will make your server transparently redirect from old.html to the new.html page. Your reader will have no idea that it happened, and it’s pretty much instantaneous.
Forcing New Requests
Sometimes you do want your readers to know a redirect has occurred, and can do this by forcing a new HTTP request for the new page. This will make the browser load up the new page as if it was the page originally requested, and the location bar will change to show the URL of the new page. All you need to do is turn on the [R] flag, by appending it to the rule:
RewriteRule ^old\.html$ new.html [R]
Using Regular Expressions
Now we get on to the really useful stuff. The power of mod_rewrite comes at the expense of complexity. If this is your first encounter with regular expressions, you may find them to be a tough nut to crack, but the options they afford you are well worth the slog. I’ll be providing plenty of examples to guide you through the basics here.
Using regular expressions you can have your rules matching a set of URLs at a time, and mass-redirect them to their actual pages. Take this rule;
RewriteRule ^products/([0-9][0-9])/$ productinfo.php?prodID=$1
This will match any URLs that start with ‘products/’, followed by any two digits, followed by a forward slash. For example, this rule will match an URL like products/12/ or products/99/, and redirect it to the PHP page.
The parts in square brackets are called ranges. In this case we’re allowing anything in the range 0-9, which is any digit. Other ranges would be [A-Z], which is any uppercase letter; [a-z], any lowercase letter; and [A-Za-z], any letter in either case.
We have encased the regular expression part of the URL in parentheses, because we want to store whatever value was found here for later use. In this case we’re sending this value to a PHP page as an argument. Once we have a value in parentheses we can use it through what’s called a back-reference. Each of the parts you’ve placed in parentheses are given an index, starting with one. So, the first back-reference is $1, the third is $3 etc.
Thus, once the redirect is done, the page loaded in the readers’ browser will be something like productinfo.php?prodID=12 or something similar. Of course, we’re keeping this true URL secret from the reader, because it likely ain’t the prettiest thing they’ll see all day.
Adding Trailing Slashes
If your site visitor had entered something like products/12, the rule above won’t do a redirect, as the slash at the end is missing. To promote good URL writing, we’ll take care of this by doing a direct redirect to the same URL with the slash appended.
RewriteRule ^products/([0-9][0-9])$ products/$1/ [R]

Multiple redirects in the same .htaccess file can be applied in sequence, which is what we’re doing here. This rule is added before the one we did above, like so:

RewriteRule ^products/([0-9][0-9])$ products/$1/ [R]
RewriteRule ^products/([0-9][0-9])/$ productinfo.php?prodID=$1

Thus, if the user types in the URL products/12, our first rule kicks in, rewriting the URL to include the trailing slash, and doing a new request for products/12/ so the user can see that we likes our trailing slashes around here. Then the second rule has something to match, and transparently redirects this URL to productinfo.php?prodID=12. Slick.
Match Modifiers

You can expand your regular expression patterns by adding some modifier characters, which allow you to match URLs with an indefinite number of characters. In our examples above, we were only allowing two numbers after products. This isn’t the most expandable solution, as if the shop ever grew beyond these initial confines of 99 products and created the URL productinfo.php?prodID=100, our rules would cease to match this URL.

So, instead of hard-coding a set number of digits to look for, we’ll work in some room to grow by allowing any number of characters to be entered. The rule below does just that:

RewriteRule ^products/([0-9]+)$ products/$1/ [R]

Note the plus sign (+) that has snuck in there. This modifier changes whatever comes directly before it, by saying ‘one or more of the preceding character or range.’ In this case it means that the rule will match any URL that starts with products/ and ends with at least one digit. So this’ll match both products/1 and products/1000.

Other match modifiers that can be used in the same way are the asterisk, *, which means ‘zero or more of the preceding character or range’, and the question mark, ?, which means ‘zero or only one of the preceding character or range.’
Adding Guessable URLs

Using these simple commands you can set up a slew of ‘shortcut URLs’ that you think visitors will likely try to enter to get to pages they know exist on your site. For example, I’d imagine a lot of visitors try jumping straight into our stylesheets section by typing the URL http://www.yourhtmlsource.com/css/. We can catch these cases, and hopefully alert the reader to the correct address by updating their location bar once the redirect is done with these lines:

RewriteRule ^css(/)?$ /stylesheets/ [R]

The simple regular expression in this rule allows it to match the css URL with or without a trailing slash. The question mark means ‘zero or one of the preceding character or range’ — in other words either yourhtmlsource.com/css or yourhtmlsource.com/css/ will both be taken care of by this one rule.

This approach means less confusing 404 errors for your readers, and a site that seems to run a whole lot smoother all ’round.

I appreciate all hard work of webmaster of www.w3answers.com

I can truly say that I have never read so much useful information about Tell me some thing about mod_rewrite and url rewriting? | w3Answers.com. I want to express my gratitude to the webmaster of www.w3answers.com.

Post new comment

The content of this field is kept private and will not be shown publicly.
CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
5 + 1 =
Solve this simple math problem and enter the result. E.g. for 1+3, enter 4.

Check It !

Tweet It

W3 Updates

Stay informed on our latest news!

Syndicate content