Handle Bad Page Requests in a Web Application

A common error in web applications is failed navigation. In some cases, the user has mistyped a URL; in other cases, the internal navigation of a web site is flawed. In (hopefully) rare cases, users are attempting some shenanigans.

Documenting Bad Requests

Whatever the cause of a bad request, the Istarel Workshop Application Framework (IWAF) traps those failures and invokes ApplicationDelegate->handleBadRequest(). Since I cannot know ahead of time what bad requests might be out there, I begin by logging them and then periodically reviewing the log file.

Partial Listing: /rsrc/ApplicationDelegate.php

<?php

function handleBadRequest($page_request)
{
    if (IN_PRODUCTION)
    {
        $fm = IWFileManager::defaultFileManager();
        
        $log_file = str_replace('rsrc/', '', APPL_RSRC_DIR) . 'log/bad-requests.log';
        $message  = date('Y-m-d H:i:s') . ' -- ' . $page_request . "\n";
        
        $fm->appendFile($log_file, $message);
        
        header('Location: ' . APPL_ROOT_DIR . $this->startingPoint());
        exit();
    }
}

?>

All this does is let me document any problems. For example, when I started logging the bad requests for Big Nerd Ranch, there were (generally) three different kinds of page request failures.

First, people were trying to reach the Big Nerd Ranch blog and forums by using URLs like http://www.bignerdranch.com/forums. Not an unreasonable guess, but incorrect: The forums and blog are found via subdomains of bignerdranch.com.

Second, people were trying URLs to classes, but not quite using the correct offering names. For example, the popular Cocoa class at Big Nerd Ranch is reached via http://www.bignerdranch.com/classes/cocoa_i, but many people were trying some variation of http://www.bignerdranch.com/classes/cocoa. Again, not a bad try, but completely inscrutable to the web application.

Third, there are obnoxious malcontents who think they can choose a clever URL and gain illegal access to code or data.
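Spotting groupings like these is mostly a matter of counting which requests recur. The following is a minimal sketch of how that periodic review could be done from the shell. The function name is hypothetical and not part of IWAF, and it assumes every log line has the timestamp-and-separator layout written by handleBadRequest() above.

```shell
# Hypothetical helper (not part of IWAF): tally the requests in bad-requests.log.
# Assumes every line is "YYYY-MM-DD HH:MM:SS -- <page request>", so the request
# starts at column 24 (19 timestamp characters plus the 4-character " -- ").
summarize_bad_requests() {
    # Strip the timestamp prefix, group identical requests, most frequent first.
    cut -c24- "$1" | sort | uniq -c | sort -rn
}
```

Calling summarize_bad_requests with the path to the log file prints one line per distinct request, preceded by the number of times it was logged. Piping the result through head keeps the review short when the log has grown large.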

Apache's mod_rewrite

For URLs that follow any kind of pattern, Apache's mod_rewrite is an excellent way to redirect a request before it becomes a bad request. My base mod_rewrite implementation simply ensures that the application framework handles all script-based requests.

RewriteEngine On
RewriteRule !\.(js|ico|gif|jpg|png|css|pdf|xml)$ index.php

The forum and blog requests follow a simple pattern: if the request begins with some variation of "forum" or "blog", then I should redirect the user to the appropriate subdomain. To be a bit more versatile, I will also ignore the case of any request (hence the bracketed NC). Whenever you intend to redirect a user, you want to append [L] to the rewrite rule, which tells Apache: Do not apply any later rules!

RewriteEngine On
RewriteRule ^(phpbb|forum|board) http://forums.bignerdranch.com [NC,L]
RewriteRule ^blog http://weblog.bignerdranch.com [NC,L]
RewriteRule !\.(js|ico|gif|jpg|png|css|pdf|xml)$ index.php
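Before relying on a rule like this, it can help to try its pattern against a few requests taken from the log. The sketch below does that with grep rather than Apache, so it only approximates mod_rewrite; the function name is hypothetical. For a pattern this simple, grep's extended regular expressions and mod_rewrite's syntax agree.

```shell
# Hypothetical check (not part of Apache): would the forum rule catch this request?
# In an .htaccess file the pattern is matched against the path without its leading
# slash, and [NC] makes the match case-insensitive, hence grep's -i option.
matches_forum_rule() {
    printf '%s\n' "$1" | grep -Eiq '^(phpbb|forum|board)'
}
```

The function succeeds for requests such as forums or phpBB/index.php and fails for classes/cocoa, so it can be used in a shell loop over the logged requests to see which ones the new rule would have redirected.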

Specific Redirects

One module I always build into my applications is a redirect facility. This enables me to have modules reachable via SEO (and user) friendly URLs. So, instead of http://www.bignerdranch.com/OfferingView?id=1 (other URLs might have far more complex query string parameters), the Cocoa page at Big Nerd Ranch is reached via http://www.bignerdranch.com/classes/cocoa_i.

I could use mod_rewrite here as well, putting a series of specific rewrite rules in place to handle mistaken URLs, where [PT] tells Apache to pass the result through to the next handler.

RewriteEngine On
RewriteRule cocoa_i$ classes/cocoa_i [NC,PT]
RewriteRule !\.(js|ico|gif|jpg|png|css|pdf|xml)$ index.php

Instead, however, for these specific kinds of redirects, I create a redirect entry in the administrator modules for the application.

Forbidden Requests

Finally, I like to use the forbiddenViews() method of the ApplicationDelegate. It returns an array of page requests that the framework should silently drop, simply returning the user to the designated starting point for the application (typically the home page).

Partial Listing: /rsrc/ApplicationDelegate.php

<?php

function forbiddenViews()
{
    return array('track.php');
}

?>

Though not shown here, a page request can also be modified by the ApplicationDelegate, which allows for more sophisticated handling of inappropriate requests. That modifyPageRequest() method is also how I make use of the data from the redirect module to invoke the appropriate application module.

Deploy an Application to the Remote Server

With Git prepared on both the local workstation and on the remote server, I can now deploy my application.

Defining a Remote Repository

In order to deploy an application from my development workstation to istarelworkshop.com, I need to configure the application repository on my workstation so that it can communicate with the repository on the production server.

Here is the current configuration for the Istarel Workshop application:

markf$ cd ~/Sites/iw
markf$ cat .git/config
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
    ignorecase = true

Now I configure this repository with user and path information for the remote Git repository. (Note that the 1158 in the remote declaration is the port, for cases where ssh is not listening on its standard port 22.)

markf$ git remote add istarelworkshop \
       ssh://git@www.istarelworkshop.com:1158/var/git/iw

If I look at the configuration for the local repository now, I see the definition for "istarelworkshop".

markf$ cat .git/config
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
    ignorecase = true
[remote "istarelworkshop"]
    url = ssh://git@www.istarelworkshop.com:1158/var/git/iw
    fetch = +refs/heads/*:refs/remotes/istarelworkshop/*

Cloning a Bare Repository

With the description of the remote repository done, I can now deploy the application.

markf$ git push istarelworkshop master

I am not done yet, though. The repositories I set up on istarelworkshop.com are bare repositories; that is, they contain all the branches and history but no working copy of the files. I need to clone that bare repository to the appropriate location on istarelworkshop.com. In an earlier article, I showed how to set up Apache2 virtual hosts. The first such host has its base directory at /var/www/istarelworkshop.com. I need to take the bare repository and clone it to that location.

markf$ ssh markf@www.istarelworkshop.com
iwuser$ cd /var/www
iwuser$ sudo git clone /var/git/iw istarelworkshop.com
iwuser$ sudo chown -R www-data:www-data istarelworkshop.com
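The relationship between the three repositories (workstation, bare repository, and working clone) is easier to see in a throwaway simulation on one machine. The sketch below is only an illustration: the function name and the directory names under the given root are hypothetical stand-ins for ~/Sites/iw, /var/git/iw, and /var/www/istarelworkshop.com. It also shows one plausible way to handle later deployments, by pulling inside the working clone, which is an assumption rather than something described above.

```shell
# Local simulation of the push-then-clone workflow, using nothing beyond git.
# Directory names are hypothetical stand-ins:
#   bare.git ~ /var/git/iw, dev ~ ~/Sites/iw, site ~ /var/www/istarelworkshop.com
simulate_deploy() {
    root=$(cd "$1" && pwd) || return 1    # absolute path, so the remote URL is valid

    git init -q --bare "$root/bare.git"
    git init -q "$root/dev"
    git -C "$root/dev" symbolic-ref HEAD refs/heads/master
    git -C "$root/dev" config user.name demo
    git -C "$root/dev" config user.email demo@example.com
    git -C "$root/dev" remote add istarelworkshop "$root/bare.git"

    # First deployment: commit, push to the bare repository, then clone it.
    echo 'v1' > "$root/dev/index.php"
    git -C "$root/dev" add index.php
    git -C "$root/dev" commit -q -m 'First version'
    git -C "$root/dev" push -q istarelworkshop master
    git clone -q -b master "$root/bare.git" "$root/site"

    # Later deployment: push again, then pull inside the working clone.
    # The clone's "origin" remote points at the bare repository it came from.
    echo 'v2' > "$root/dev/index.php"
    git -C "$root/dev" commit -q -a -m 'Second version'
    git -C "$root/dev" push -q istarelworkshop master
    git -C "$root/site" pull -q --ff-only origin master
}
```

Run against an empty scratch directory, the function leaves the second version of index.php in the site directory. On the real server the pull would have to be run by a user with write access to /var/www/istarelworkshop.com, which after the chown above means www-data.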

Configure the Remote Application

Back when I set up the local repository, I made sure that certain key configuration files were not part of the repository. This is important because every server is different, and the key paths and identities used on my local workstation are quite different from those on the Ubuntu VPS I deployed to. So, I need to create those configuration files on istarelworkshop.com.

iwuser$ sudo su - www-data
$ bash
www-data$ cd /var/www/istarelworkshop.com
www-data$ mkdir conf && cd conf
www-data$ vi ApplicationConstants.php

Listing: /var/www/istarelworkshop.com/conf/ApplicationConstants.php

<?php

# Fundamental Application Mode
define('IN_PRODUCTION', 1);             # 0 = Development, 1 = Production

# Application Directories
define('APPL_ROOT_DIR', '/');
define('APPL_RSRC_DIR', '/var/www/istarelworkshop.com/rsrc/');

# Istarel Workshop Frameworks Directory
if (! defined('FRAMEWORK_DIR'))
   define('FRAMEWORK_DIR', '/var/www/istarelworkshop.com/fw/');

# Supplemental Directories Required by Application Frameworks Classes
define('APPL_IMG_DIR', APPL_ROOT_DIR . 'img/');
define('APPL_LIB_DIR', APPL_ROOT_DIR . 'lib/');

# Define the Application Database attributes
define('DB_TYPE',       'pgsql');
define('DB_HOST',       'localhost');
define('DB_NAME',       'iw');
define('DB_USERNAME',   'iwdb');
define('DB_PASSWORD',   'secret');

?>

The application framework uses Apache's mod_rewrite system to power the Front Controller. I need to create an .htaccess file at the root of the repository to ensure that proper routing happens.

Listing: /var/www/istarelworkshop.com/.htaccess

RewriteEngine On
RewriteRule !\.(js|ico|gif|jpg|png|css)$ index.php

Configure Apache with Virtual Hosts

The minimal Ubuntu 8.04 package from A2 Hosting comes with Apache 2 preinstalled, but it does not come with mod_rewrite enabled (which is necessary for all applications built with the Istarel Workshop Application Framework).

iwuser$ sudo a2enmod rewrite
iwuser$ sudo /etc/init.d/apache2 force-reload

Virtual Hosts

I intend to host several domains on this Virtual Private Server, and Apache and Ubuntu make it very straightforward to handle multiple domains through a mechanism called Virtual Hosts. Essentially, you use the IP address of the server as the "host" (using the NameVirtualHost directive) and then provide a configuration file for each hosted domain, which establishes its identity in the context of the main "host".

Partial Listing: /etc/apache2/apache2.conf

NameVirtualHost 74.126.25.201:80
Include /etc/apache2/sites-enabled/

I want istarelworkshop.com to be the foremost hosted domain. I need to create its configuration file in /etc/apache2/sites-available.

Listing: /etc/apache2/sites-available/istarelworkshop.com

<VirtualHost 74.126.25.201:80>
   ServerName istarelworkshop.com
   ServerAlias www.istarelworkshop.com
   ServerAdmin admin@istarelworkshop.com
   
   DocumentRoot /var/www/istarelworkshop.com
   <Directory /var/www/istarelworkshop.com>
      Options Indexes FollowSymLinks MultiViews
      AllowOverride All
      Order allow,deny
      allow from all
   </Directory>
   
   ErrorLog /var/log/apache2/istarelworkshop-error.log
   LogLevel warn
   CustomLog /var/log/apache2/istarelworkshop-access.log combined
   ServerSignature On
</VirtualHost>

AllowOverride is set to All so that Apache will look for (and respect) .htaccess files in the /var/www/istarelworkshop.com directory. That file is where the mod_rewrite rules are given.

To enable this virtual site, I need to create a symbolic link to the sites-available configuration in its sibling sites-enabled directory (which is what is actually referenced in the main Apache configuration file). Once Apache is restarted, any visitor navigating to www.istarelworkshop.com will be served pages from /var/www/istarelworkshop.com.

iwuser$ cd /etc/apache2/sites-enabled
iwuser$ sudo ln -s ../sites-available/istarelworkshop.com 001-istarelworkshop.com
iwuser$ sudo /etc/init.d/apache2 restart