Real World Scenarios Implementation





Standalone mod_perl Enabled Apache Server



Installation in 10 lines

The installation is very simple (this example is for Linux):

  % cd /usr/src
  % lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
  % lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
  % tar zvxf apache_x.x.x.tar.gz
  % tar zvxf mod_perl-x.xx.tar.gz
  % cd mod_perl-x.xx
  % perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
    DO_HTTPD=1 USE_APACI=1 PERL_MARK_WHERE=1 EVERYTHING=1
  % make && make test && make install
  % cd ../apache_x.x.x
  % make install

That's all!

Notes: Replace x.x.x and x.xx with the real version numbers of Apache and mod_perl. GNU tar decompresses the archive as well when given the z flag.



Installation in 10 paragraphs

First download the sources of both packages. You can use the lwp-download utility for this. lwp-download is part of the LWP (libwww-perl) package, which you need to have installed anyway in order for mod_perl's make test to pass. Once LWP is installed, lwp-download is available as well.

  % lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
  % lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz

Extract both sources. I usually unpack all sources in /usr/src/, but your mileage may vary. Move the archives to the directory where you want to keep the sources and change to it. GNU tar can decompress as well when given the z flag. A non-GNU tar cannot, so you have to do it in two steps: first decompress the packages with gzip -d apache_x.x.x.tar.gz and gzip -d mod_perl-x.xx.tar.gz, then extract them with tar xvf apache_x.x.x.tar and tar xvf mod_perl-x.xx.tar.

  % cd /usr/src
  % tar zvxf apache_x.x.x.tar.gz
  % tar zvxf mod_perl-x.xx.tar.gz

Change to the mod_perl source directory:

  % cd mod_perl-x.xx

Now build the Makefile. For a basic, first-time installation the parameters in the example below are the only ones you need. APACHE_SRC tells where the Apache src directory is. If you have followed my suggestion and extracted both sources under the same directory (/usr/src), do:

  % perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
    DO_HTTPD=1 USE_APACI=1 PERL_MARK_WHERE=1 EVERYTHING=1

There are many additional parameters. You can find some of them in the section dedicated to configuration and in other sections. While perl Makefile.PL ... is running, it checks for prerequisites and tells you if something is missing. If you are missing some Perl packages or other software, you will have to install them before you proceed.

Now we build the project (by building the mod_perl extension and calling make in the Apache source directory to build httpd), test it (by running various tests) and install the mod_perl modules.

  % make && make test && make install

Note that if make fails, neither make test nor make install will be executed. If make test fails, make install will not be executed.
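
The short-circuit behaviour of && can be tried without building anything. The following is a minimal sketch in POSIX shell. The step and chain functions are hypothetical stand-ins for the real make targets and are not part of either distribution.

```shell
#!/bin/sh
# step NAME FAILING: announce NAME and fail when NAME equals FAILING.
# Hypothetical stand-in for a real make target.
step() {
    echo "running $1"
    [ "$1" != "$2" ]
}

# chain FAILING: run the three steps joined by &&, as in the command above.
chain() {
    step "make" "$1" && step "make test" "$1" && step "make install" "$1"
}

# Simulate a failing "make test": "make install" is never reached.
chain "make test" || echo "chain stopped early"
```

Calling chain with a name that matches none of the steps runs all three of them.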

Now change to the Apache source directory and run make install to install Apache's headers and default configuration files, to build the Apache directory tree and to put httpd there.

  % cd ../apache_x.x.x
  % make install

When you execute the above command, the Apache installation process will tell you how to start the freshly built webserver (the path to apachectl, more about it later) and where the configuration files are. Remember (or even better, write down) both, since you will need this information very soon. On my machine the two important paths are:

  /usr/local/apache/bin/apachectl
  /usr/local/apache/conf/httpd.conf

Now the build and installation processes are complete. Just configure httpd.conf and start the webserver.



Configuration Process

A basic configuration is simple. First configure Apache as you always do (set Port, User, Group, ErrorLog and the other file paths, etc.), start the server and make sure it works. One way to start and stop the server is to use the apachectl utility:

  % /usr/local/apache/bin/apachectl start
  % /usr/local/apache/bin/apachectl stop

Shut the server down, open httpd.conf in your favorite editor and scroll to the end of the file, where we will add the mod_perl configuration directives (of course you can place them anywhere in the file).

Add the following configuration directives:

  Alias /perl/ /home/httpd/perl/

This assumes that you put all the scripts that should be executed by the mod_perl enabled server under the /home/httpd/perl/ directory.

  PerlModule Apache::Registry
  <Location /perl>
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options ExecCGI
    PerlSendHeader On
    allow from all
  </Location>

Now put a test script into /home/httpd/perl/ directory:

  test.pl
  -------
  #!/usr/bin/perl -w
  use strict;
  print "Content-type: text/html\r\n\r\n";
  print "It worked!!!\n";
  -------

Make it executable and readable by the server. If your server is running as user nobody (hint: look for the User directive in httpd.conf), do the following:

  % chown nobody /home/httpd/perl/test.pl
  % chmod u+rx   /home/httpd/perl/test.pl

Test that the script works from the command line by executing it:

  % /home/httpd/perl/test.pl

You should see:

  Content-type: text/html
  
  It worked!!!

Now it is time to test our mod_perl server. Assuming that your config file includes Port 80, start the server, then go to your favorite browser and fetch the following URL:

  http://localhost/perl/test.pl

Make sure that you have a loopback device configured. If not, use the real server name for this test, for example:

  http://www.nowhere.com/perl/test.pl

You should see:

  It worked!!!

If something went wrong, go through the installation process again and make sure you didn't make a mistake. If that doesn't help, read the INSTALL pod document (perldoc INSTALL) in the mod_perl distribution directory.

Now copy some of your Perl CGI scripts into the /home/httpd/perl/ directory and see them working much faster from the newly configured base URL (/perl/). Some of your scripts will not work out of the box and will need some minor tweaking or a major rewrite to work properly with a mod_perl enabled server. Chances are that if you do not practice sloppy programming techniques, the scripts will work without any modifications at all.

The above setup is very basic. It will help you to get a mod_perl enabled server running and to get a good feeling from watching your previously slow CGIs now flying.

As with Perl, you can start benefiting from mod_perl from the very first moment you try it. When you become more familiar with mod_perl you will want to start writing Apache handlers and deploy more of mod_perl's power.



One Plain and One mod_perl enabled Apache Servers

Since we are going to run two apache servers we will need two different sets of configuration, log and other files. We need a special directory layout. While some of the directories can be shared between the two servers (assuming that both are built from the same source distribution), others should be separated. From now on I will refer to these two servers as httpd_docs (vanilla Apache) and httpd_perl (Apache/mod_perl).

For this illustration, we will use /usr/local as our root directory. The Apache installation directories will be stored under this root (/usr/local/bin, /usr/local/etc, etc.).

First let's prepare the sources. We will assume that all the sources go into the /usr/src directory. It is better to use two separate copies of the Apache sources, since you will probably want to tune each Apache version separately and to make some modifications and recompile as time goes by. Having two independent source trees will prove helpful, unless you use DSO, which is covered later in this section.

Make two subdirectories:

  % mkdir /usr/src/httpd_docs
  % mkdir /usr/src/httpd_perl

Put the Apache sources into the /usr/src/httpd_docs directory:

  % cd /usr/src/httpd_docs
  % gzip -dc /tmp/apache_x.x.x.tar.gz | tar xvf -

If you have GNU tar:

  % tar xvzf /tmp/apache_x.x.x.tar.gz

Replace /tmp with the path to the downloaded file and x.x.x with the version of the server you have.

  % cd /usr/src/httpd_docs
  
  % ls -l
  drwxr-xr-x  8 stas  stas 2048 Apr 29 17:38 apache_x.x.x/

Now we will prepare the httpd_perl server sources:

  % cd /usr/src/httpd_perl
  % gzip -dc /tmp/apache_x.x.x.tar.gz | tar xvf -
  % gzip -dc /tmp/mod_perl-x.xx.tar.gz | tar xvf -
  
  % ls -l
  drwxr-xr-x  8 stas  stas 2048 Apr 29 17:38 apache_x.x.x/
  drwxr-xr-x  8 stas  stas 2048 Apr 29 17:38 mod_perl-x.xx/

Time to decide on the desired directory layout (where the Apache files go):

  ROOT = /usr/local

The two servers can share the following directories (so we will not duplicate data):

  /usr/local/bin/
  /usr/local/lib/
  /usr/local/include/
  /usr/local/man/
  /usr/local/share/

Important: we assume that both servers are built from the same Apache source version.

Each server stores its specific files in either the httpd_docs or the httpd_perl subdirectory:

  /usr/local/etc/httpd_docs/
                 httpd_perl/
  
  /usr/local/sbin/httpd_docs/
                  httpd_perl/
  
  /usr/local/var/httpd_docs/logs/
                            proxy/
                            run/
                 httpd_perl/logs/
                            proxy/
                            run/
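
To preview this layout before touching /usr/local, the per-server directories can be created under a scratch root. This is a minimal sketch in POSIX shell. make_layout is a hypothetical helper and not part of the Apache build, which is assumed to create whatever it needs during make install.

```shell
#!/bin/sh
# make_layout ROOT: create the per-server directories of the layout above.
make_layout() {
    root=$1
    for server in httpd_docs httpd_perl; do
        mkdir -p "$root/etc/$server" \
                 "$root/sbin/$server" \
                 "$root/var/$server/logs" \
                 "$root/var/$server/proxy" \
                 "$root/var/$server/run" || return 1
    done
}

# Try it in a scratch directory rather than in /usr/local.
scratch=$(mktemp -d)
make_layout "$scratch"
find "$scratch" -type d | sort
rm -rf "$scratch"
```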

After compiling and installing both servers, you will need to configure them. To make things clear before we proceed into details: configure /usr/local/etc/httpd_docs/httpd.conf as a plain Apache server, with the Port directive set to 80, for example. Configure /usr/local/etc/httpd_perl/httpd.conf for the mod_perl server, whose Port should of course be different from the one the httpd_docs server listens to (e.g. 8080). The port number issue will be discussed later.

The next step is to configure and compile the sources. Below are the procedures to compile both servers, taking into account the directory layout I have just suggested.



Configuration and Compilation of the Sources

Let's proceed with installation. I will use x.x.x instead of real version numbers so this document will never become obsolete :).



Building the httpd_docs Server

Sources Configuration:

  % cd /usr/src/httpd_docs/apache_x.x.x
  % make clean
  % env CC=gcc \
  ./configure --prefix=/usr/local \
    --sbindir=/usr/local/sbin/httpd_docs \
    --sysconfdir=/usr/local/etc/httpd_docs \
    --localstatedir=/usr/local/var/httpd_docs \
    --runtimedir=/usr/local/var/httpd_docs/run \
    --logfiledir=/usr/local/var/httpd_docs/logs \
    --proxycachedir=/usr/local/var/httpd_docs/proxy

If you need some other modules, like mod_rewrite and mod_include (SSI), add them here as well:

    --enable-module=include --enable-module=rewrite

Note: on AIX, gcc compiles an httpd that is more than 100K smaller than the one cc produces. Remove the line env CC=gcc if you want to use the default compiler. If you want to use gcc and you are a (ba)?sh user you will not need the env command; t?csh users will have to keep it in.

Note: add --layout to see the resulting directory layout without actually running the configuration process.
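
As a small illustration of the (ba)?sh form: a VAR=value prefix puts the variable into the environment of that one command only, which is the same thing env does. A minimal sketch, where the child sh -c is only a stand-in for ./configure:

```shell
#!/bin/sh
# Start from a clean state so the result does not depend on the caller.
unset CC

# Bourne-style prefix assignment: only the child process sees CC.
CC=gcc sh -c 'echo "child sees CC=${CC:-unset}"'

# The same effect through env, the form t?csh users have to keep.
env CC=gcc sh -c 'echo "child sees CC=${CC:-unset}"'

# The invoking shell itself is left untouched.
echo "parent sees CC=${CC:-unset}"
```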

Sources Compilation:

  % make
  % make install

Rename httpd to httpd_docs:

  % mv /usr/local/sbin/httpd_docs/httpd \
  /usr/local/sbin/httpd_docs/httpd_docs

Now update the apachectl utility to point to the renamed httpd, either with your favorite text editor or by using perl:

  % perl -p -i -e 's|httpd_docs/httpd|httpd_docs/httpd_docs|' \
  /usr/local/sbin/httpd_docs/apachectl
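
One caveat with the substitution above: run a second time, it matches the already renamed path again, because httpd_docs/httpd is a prefix of httpd_docs/httpd_docs. The sketch below shows a stricter variant of the same rule as a sed filter. rename_httpd is a hypothetical helper, it assumes a sed that supports -E, and it only filters standard input rather than editing apachectl in place.

```shell
#!/bin/sh
# rename_httpd SERVER: rewrite SERVER/httpd to SERVER/SERVER on stdin, but
# only where "httpd" is the complete file name, that is, not followed by
# a letter, digit, underscore or dot.
rename_httpd() {
    sed -E "s#$1/httpd([^._[:alnum:]]|\$)#$1/$1\\1#g"
}

line='HTTPD=/usr/local/sbin/httpd_docs/httpd'
once=$(echo "$line" | rename_httpd httpd_docs)
twice=$(echo "$once" | rename_httpd httpd_docs)
echo "$once"     # the renamed path
echo "$twice"    # unchanged by the second pass
```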



Building the httpd_perl (mod_perl enabled) Server

Before you start to configure the mod_perl sources, you should be aware that a few Perl modules have to be installed before building mod_perl. You will be alerted if any required modules are missing when you run the perl Makefile.PL command line below. If you discover that some are missing, pick them from your nearest CPAN repository (if you do not know what that is, visit http://www.perl.com/CPAN ) or run the CPAN interactive shell via the command line perl -MCPAN -e shell.

Make sure the sources are clean:

  % cd /usr/src/httpd_perl/apache_x.x.x
  % make clean
  % cd /usr/src/httpd_perl/mod_perl-x.xx
  % make clean

It is important to make clean since some of the versions are not binary compatible (e.g. Apache 1.3.3 vs 1.3.4), so any ``third-party'' C modules need to be recompiled against the latest header files.

Here I did not find a way to compile with gcc: my perl was compiled with cc, so we have to compile mod_perl with the same compiler.

  % cd /usr/src/httpd_perl/mod_perl-x.xx

  % /usr/local/bin/perl Makefile.PL \
  APACHE_PREFIX=/usr/local/ \
  APACHE_SRC=../apache_x.x.x/src \
  DO_HTTPD=1 \
  USE_APACI=1 \
  PERL_MARK_WHERE=1 \
  PERL_STACKED_HANDLERS=1 \
  ALL_HOOKS=1 \
  APACI_ARGS=--sbindir=/usr/local/sbin/httpd_perl, \
         --sysconfdir=/usr/local/etc/httpd_perl, \
         --localstatedir=/usr/local/var/httpd_perl, \
         --runtimedir=/usr/local/var/httpd_perl/run, \
         --logfiledir=/usr/local/var/httpd_perl/logs, \
         --proxycachedir=/usr/local/var/httpd_perl/proxy

Notice that all APACI_ARGS (above) must be passed as one long line if you work with t?csh!!! With (ba)?sh it works correctly the way it is shown above, with the long lines broken by '\'. With t?csh it does not work, since t?csh passes the APACI_ARGS arguments to ./configure with the newlines kept untouched but the original '\' stripped, thus breaking the configuration process.
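
For t?csh users, one way to get the single long line is to let a small Bourne shell script assemble it and then paste the result. A minimal sketch: join_apaci_args is a hypothetical helper, and it assumes Makefile.PL accepts the comma-separated list without the whitespace that follows each comma in the command above.

```shell
#!/bin/sh
# join_apaci_args ARG...: print the arguments as one comma-separated line.
join_apaci_args() {
    joined=
    for arg in "$@"; do
        # Prepend "previous," only when something has been collected already.
        joined=${joined:+$joined,}$arg
    done
    printf '%s\n' "$joined"
}

APACI_ARGS=$(join_apaci_args \
    --sbindir=/usr/local/sbin/httpd_perl \
    --sysconfdir=/usr/local/etc/httpd_perl \
    --localstatedir=/usr/local/var/httpd_perl \
    --runtimedir=/usr/local/var/httpd_perl/run \
    --logfiledir=/usr/local/var/httpd_perl/logs \
    --proxycachedir=/usr/local/var/httpd_perl/proxy)
echo "APACI_ARGS=$APACI_ARGS"
```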

As with httpd_docs you might need other modules like mod_rewrite, so add them here:

         --enable-module=rewrite

Note: PERL_STACKED_HANDLERS=1 is needed for Apache::DBI

Now build, test and install httpd_perl.

  % make && make test && make install

Note: Apache puts a stripped version of httpd at /usr/local/sbin/httpd_perl/httpd. The original version, which includes debugging symbols (in case you need to run a debugger on this executable), is located at /usr/src/httpd_perl/apache_x.x.x/src/httpd.

Note: You may have noticed that we did not run make install in the apache's source directory. When USE_APACI is enabled, APACHE_PREFIX will specify the --prefix option for apache's configure utility, specifying the installation path for apache. When this option is used, mod_perl's make install will also make install on the apache side, installing the httpd binary, support tools, along with the configuration, log and document trees.

If make test fails, look into t/logs and see what is in there. Also see the section make test fails.

While running perl Makefile.PL ..., mod_perl might warn you about a missing libgdbm. Users have reported that this library is actually crucial, and that you must have it in order to complete the mod_perl build successfully.

Now rename the httpd to httpd_perl:

  % mv /usr/local/sbin/httpd_perl/httpd \
  /usr/local/sbin/httpd_perl/httpd_perl

Update the apachectl utility to point to the renamed httpd:

  % perl -p -i -e 's|httpd_perl/httpd|httpd_perl/httpd_perl|' \
  /usr/local/sbin/httpd_perl/apachectl



Configuration of the servers

Now that we have completed the build process, the last stage before running the servers is to configure them.



Basic httpd_docs Server's Configuration

Configuring the httpd_docs server is a very easy task. Open /usr/local/etc/httpd_docs/httpd.conf in your favorite editor (starting from version 1.3.4 of Apache there is only one file to edit) and configure it as you always do. Make sure you configure the log files and other paths according to the directory layout we decided to use.

Start the server with:

  /usr/local/sbin/httpd_docs/apachectl start



Basic httpd_perl Server's Configuration

Here we will make a basic configuration of the httpd_perl server. We edit the /usr/local/etc/httpd_perl/httpd.conf file. As with the httpd_docs server configuration, make sure that ErrorLog and the other file location directives point to the right places, according to the chosen directory layout.

The first thing to do is to set the Port directive. It should be different from 80, since we cannot bind two servers to the same port number on the same machine. Here we will use 8080. Some developers use port 81, but you can bind to it only if you have root permissions. If you are running on a multiuser machine, there is a chance that someone already uses that port, or will start using it in the future, which as you understand might cause a collision. If you are the only user on your machine, you can basically pick any unused port number. Choosing a port number is a controversial topic, since many organizations use firewalls, which may block some ports or allow only well-known ones. In my experience the most commonly used port numbers are 80, 81, 8000 and 8080. Personally, I prefer port 8080. Of course, with the two-server scenario you can hide the nonstandard port number from firewalls and users, by using either mod_proxy's ProxyPass or a proxy server like squid.

For more details see Publishing port numbers different from 80, Running 1 webserver and squid in httpd accelerator mode, Running 2 webservers and squid in httpd accelerator mode and Using mod_proxy.

Now we proceed to the mod_perl specific directives. It is a good idea to add them all at the end of httpd.conf, since you are going to fiddle with them a lot at the beginning.

First, you need to specify the location where all mod_perl scripts will be located.

Add the following configuration directive:

    # mod_perl scripts will be called from
  Alias /perl/ /usr/local/myproject/perl/

From now on, all requests starting with /perl will be executed under mod_perl and will be mapped to the files in /usr/local/myproject/perl/.

Now we should configure the /perl location.

  PerlModule Apache::Registry

  <Location /perl>
    #AllowOverride None
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options ExecCGI
    allow from all
    PerlSendHeader On
  </Location>

This configuration causes all scripts that are called with a /perl path prefix to be executed under the Apache::Registry module and as CGI (hence the ExecCGI option; if you omit it, the script will be sent to the user's browser as plain text or will possibly trigger a 'Save-As' window). The Apache::Registry module lets you run almost unaltered Perl CGI scripts under mod_perl. The PerlModule directive is the equivalent of Perl's require(). We load the Apache::Registry module before we use it in the PerlHandler directive in the Location configuration.

PerlSendHeader On tells the server to send an HTTP header to the browser on every script invocation. You will want to turn this off for nph (non-parsed-headers) scripts.

This is only a very basic configuration. The Server Configuration section covers the rest of the details.

Now start the server with:

  /usr/local/sbin/httpd_perl/apachectl start



Running 2 webservers and squid in httpd accelerator mode

While I have detailed the mod_perl server installation, you are on your own with installing the squid server (see Getting Helped for more details). I run Linux, so I downloaded the rpm package, installed it, configured /etc/squid/squid.conf, fired off the server and was all set. Basically, once you have squid installed, you just need to modify the default squid.conf the way I explain below, and then you are ready to run it.

First, let's understand what we have in hand and what we want from squid. We have the httpd_docs and httpd_perl servers listening on ports 81 and 8080 respectively (we have to move the httpd_docs server to port 81, since port 80 will be taken over by squid). Both reside on the same machine as squid. We want squid to listen on port 80, forward static object requests to the port the httpd_docs server listens to, and forward dynamic requests to httpd_perl's port. Both servers return the data to the proxy server (unless it is already cached in squid), so the user never sees the other ports and never knows that there might be more than one server running. The proxy server makes all the magic behind it transparent to the user. Do not confuse this with mod_rewrite, where a server redirects the request somewhere according to the rules and forgets about it. The described functionality is known as httpd accelerator mode in proxy dialect.

You should understand that squid can be used as a straightforward proxy server, generally used at companies and ISPs to cut down the incoming traffic by caching the most popular requests. However, we want to run it in httpd accelerator mode. Two directives, httpd_accel_host and httpd_accel_port, enable this mode. We will see more details in a few seconds. If you are currently using squid in the regular proxy mode, you can extend its functionality by running both modes concurrently. To accomplish this, extend the existing squid configuration with the directives related to httpd accelerator mode, or just create a new one from scratch.

As stated before, squid now listens on port 80, so we have to move the httpd_docs server to listen on, for example, port 81 (your mileage may vary :). So you have to modify httpd.conf in the httpd_docs configuration directory and restart the httpd_docs server (but not before we get squid running, if you are working on a production server). And as you remember, httpd_perl listens on port 8080.

Let's go through the changes we should make to the default configuration file. Since this file (/etc/squid/squid.conf) is huge (60k+) and we will not use 95% of it, my suggestion is to write a new one including only the modified directives.

We want to enable the redirect feature, to be able to serve requests by more than one server (in our case we have the httpd_docs and httpd_perl servers). So we specify httpd_accel_host as virtual. This assumes that your server has multiple interfaces; squid will bind to all of them.

  httpd_accel_host virtual

Then we define the default port: by default, if not redirected, httpd_docs will serve the pages. We assume that most requests will be for static objects. We have our httpd_docs listening on port 81.

  httpd_accel_port 81

And as described before, squid listens to port 80.

  http_port 80

We do not use ICP (ICP is used for cache sharing between neighbor machines), which is more relevant in the proxy mode.

  icp_port 0

hierarchy_stoplist defines a list of words which, if found in a URL, cause the object to be handled directly by this cache. In other words, use this to avoid querying neighbor caches for certain objects. Note that I have configured the /cgi-bin and /perl aliases for my dynamic documents. If you named them differently, make sure to use the correct aliases here.

  hierarchy_stoplist /cgi-bin /perl

Now we tell squid not to cache dynamic pages.

  acl QUERY urlpath_regex /cgi-bin /perl
  no_cache deny QUERY

Please note that the last two directives are controversial. If you want your scripts to comply better with the HTTP standards, their headers should carry the caching directives defined in the HTTP specs. You will find a complete tutorial on this topic in Tutorial on HTTP Headers for mod_perl users by Andreas J. Koenig (at http://perl.apache.org ). If you set the headers correctly, there is no need to tell the squid accelerator NOT to try to cache something. The headers I am talking about are Last-Modified and Expires. What are they good for? Squid will not bother your mod_perl server a second time if a request is (a) cacheable and (b) still in the cache. Many mod_perl applications produce identical results for identical requests, at least if not much time passes between the requests. So your squid might have a hit ratio of 50%, which means that the mod_perl servers will have half as much work to do as before. This is only possible if you set the headers correctly.

Even if you insert a user ID and date in your page, caching can save resources when you set the expiration time to 1 second. A user might double-click where a single click would do, thus sending two requests in parallel, and squid could serve the second request.

But if you are lazy, or just have too many things to deal with, you can leave the above directives the way I described. Keep in mind, though, that one day you will want to reread this snippet and Andreas' tutorial and squeeze even more power from your servers without spending money on additional memory and better hardware.

While testing, you might want to enable the debugging options and watch the log files in /var/log/squid/. But turn them off on your production server. I list the directive commented out (28 == access control routes).

  # debug_options ALL,1 28,9

We need to provide a way for squid to dispatch the requests to the correct servers: static object requests should be redirected to httpd_docs (unless they are already cached), while dynamic ones should go to the httpd_perl server. The configuration below tells squid to fire off 10 redirect daemons using the program at the specified path, and disables rewriting of any Host: headers in redirected requests (as suggested by squid's documentation). The redirection daemon script is listed below.

  redirect_program /usr/lib/squid/redirect.pl
  redirect_children 10
  redirect_rewrites_host_header off

The maximum allowed request size, in kilobytes. This one is pretty obvious. If you are using POST to upload files, set this to the size of the largest file plus a few extra kilobytes.

  request_size 1000 KB

Then we have the access permissions, which I will not explain. But you might want to read the documentation so as to avoid any security flaws.

  acl all src 0.0.0.0/0.0.0.0
  acl manager proto cache_object
  acl localhost src 127.0.0.1/255.255.255.255
  acl myserver src 127.0.0.1/255.255.255.255
  acl SSL_ports port 443 563
  acl Safe_ports port 80 81 8080 443 563
  acl CONNECT method CONNECT
  
  http_access allow manager localhost
  http_access allow manager myserver
  http_access deny manager
  http_access deny !Safe_ports
  http_access deny CONNECT !SSL_ports
  # http_access allow all

Since squid should run as a non-root user, you need these if you are invoking squid as root.

  cache_effective_user squid
  cache_effective_group squid

Now configure the amount of memory to be used for caching. The squid documentation warns that the actual size of the squid process can grow to three times the value you set.

  cache_mem 20 MB

Keep pools of allocated (but unused) memory available for future use. Read more about it in the squid documents.

  memory_pools on

Now tighten the runtime permissions of the cache manager CGI script (cachemgr.cgi, which comes bundled with squid) on your production server.

  cachemgr_passwd disable shutdown
  #cachemgr_passwd none all

Now the redirection daemon script (you should put it at the location you have specified with the redirect_program parameter in the config file above, and of course make it executable by the user squid runs as):

  #!/usr/local/bin/perl
  
  # disable output buffering, so that each answer reaches squid at once
  $|=1;
  
  while (<>) {
      # redirect to mod_perl server (httpd_perl)
      print($_), next if s|(:81)?/perl/|:8080/perl/|o;
  
      # send it unchanged to plain apache server (httpd_docs)
      print;
  }

In my scenario the proxy and the Apache servers are running on the same machine, which is why I just substitute the port. In the presented squid configuration, requests that pass through squid are converted to point to localhost (which is 127.0.0.1). The above redirector can be more complex of course, but you know Perl, right?
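
To see what the substitution does to individual requests without involving squid, the same rule can be tried from the command line with sed. This is only a sketch for experimenting and not a replacement redirector: rewrite_url is a hypothetical helper, it assumes a sed that supports -E, and the sample lines are made up in the form squid is assumed to hand to a redirector (URL, client address/FQDN, ident, method).

```shell
#!/bin/sh
# rewrite_url: apply the redirector's substitution to each line on stdin.
rewrite_url() {
    sed -E 's#(:81)?/perl/#:8080/perl/#'
}

# Two dynamic requests (with and without the port) and one static request.
printf '%s\n' \
    'http://127.0.0.1:81/perl/test.pl 10.0.0.1/- - GET' \
    'http://127.0.0.1/perl/test.pl 10.0.0.1/- - GET' \
    'http://127.0.0.1:81/index.html 10.0.0.1/- - GET' | rewrite_url
```

The first two lines come out pointing at port 8080, while the static request passes through unchanged.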

A few notes regarding the redirector script:

You must disable buffering. $|=1; does the job. If you do not disable buffering, STDOUT will be flushed only when the buffer becomes full, and its default size is about 4096 characters. So if you have an average URL of 70 chars, the buffer will be flushed only after 59 (4096/70) requests, and only then will the requests finally reach the target server. Your users will just wait until the buffer fills up.

If you think that this is a very inefficient way to redirect, I'll try to prove the opposite. Squid fires up N redirect daemons at startup and keeps them running, so there is no perl interpreter loading overhead per request. Exactly as with mod_perl, perl is loaded all the time and the code is already compiled, so the redirect is very fast (no slower than if the redirector were written in C or the like). Squid keeps an open pipe to each redirect daemon, so there is not even the overhead of expensive system calls.
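
The figure of 59 is simply the buffer size divided by the average line length, rounded up. A minimal sketch of that arithmetic, where requests_to_fill is a hypothetical helper and the sizes are the ones used in the text:

```shell
#!/bin/sh
# requests_to_fill BUFSIZE URLLEN: how many URLLEN-byte lines it takes
# to fill a BUFSIZE-byte buffer (integer division, rounded up).
requests_to_fill() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

requests_to_fill 4096 70    # prints 59
```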

Now it is time to restart the server. On Linux I do it with:

  /etc/rc.d/init.d/squid restart

Now the setup is complete ...

Almost... When you try the new setup, you will be surprised and upset to discover port 81 showing up in the URLs of the static objects (like HTML files). Hey, we did not want the user to see port 81 and use it instead of 80, since then requests will bypass the squid server and the hard work we went through will have been just a waste of time.

The solution is to run both squid and httpd_docs on the same port. This can be accomplished by binding each one to a specific interface. Modify httpd.conf in the httpd_docs configuration directory:

  Port 80
  BindAddress 127.0.0.1
  Listen 127.0.0.1:80

Modify the squid.conf:

  http_port 80
  tcp_incoming_address 123.123.123.3
  tcp_outgoing_address 127.0.0.1
  httpd_accel_host 127.0.0.1
  httpd_accel_port 80

Here 123.123.123.3 should be replaced with the IP of your main server. Now restart squid and httpd_docs, in whichever order you want, and voila, the port number is gone.

You must also have the following entry in /etc/hosts (chances are that it is already there):

  127.0.0.1  localhost.localdomain   localhost

Now if your scripts were generating HTML including fully qualified self references, using 8080 or another port, you should fix them to generate links that point to port 80 (which means not using the port at all). If you do not, users will bypass squid, as if it were not there at all, by making direct requests to the mod_perl server's port.

The only question left is what to do with users who bookmarked your services and still have port 8080 inside the URL. Do not worry about it. The most important thing is for your scripts to return full URLs, so if the user comes from a link with port 8080 inside, let it be. Just make sure that all the subsequent calls to your server will be rewritten correctly. Over time users will change their bookmarks. What you can do is send them an email if you have their address, or leave a note on your pages asking users to update their bookmarks. You could have avoided this problem if you had not published the non-80 port in the first place. See Publishing port numbers different from 80.

<META> Need to write up a section about server logging with squid. One thing I sure would like to know is how requests are logged with this setup. I have, as most everyone I imagine, log rotation, analysis, archiving scripts and they all assume a single log. Does one have different logs that have to be merged (up to 3 for each server + squid) ? Even when squid responds to a request out of its cache I'd still want the thing to be logged. </META>

See Using mod_proxy for information about X-Forwarded-For.

To save you some keystrokes, here is the whole modified squid.conf:

  http_port 80
  tcp_incoming_address 123.123.123.3
  tcp_outgoing_address 127.0.0.1
  httpd_accel_host 127.0.0.1
  httpd_accel_port 80
  
  icp_port 0
  
  hierarchy_stoplist /cgi-bin /perl
  acl QUERY urlpath_regex /cgi-bin /perl
  no_cache deny QUERY
  
  # debug_options ALL,1 28,9
  
  redirect_program /usr/lib/squid/redirect.pl
  redirect_children 10
  redirect_rewrites_host_header off
  
  request_size 1000 KB
  
  acl all src 0.0.0.0/0.0.0.0
  acl manager proto cache_object
  acl localhost src 127.0.0.1/255.255.255.255
  acl myserver src 127.0.0.1/255.255.255.255
  acl SSL_ports port 443 563
  acl Safe_ports port 80 81 8080 443 563
  acl CONNECT method CONNECT
  
  http_access allow manager localhost
  http_access allow manager myserver
  http_access deny manager
  http_access deny !Safe_ports
  http_access deny CONNECT !SSL_ports
  # http_access allow all
  
  cache_effective_user squid
  cache_effective_group squid
  
  cache_mem 20 MB
  
  memory_pools on
  
  cachemgr_passwd disable shutdown

Note that all directives should start at the beginning of the line.

[TOC]


Running 1 webserver and squid in httpd accelerator mode

When I was first told about squid, I thought: ``Hey, now I can drop the httpd_docs server and have only the squid and httpd_perl servers''. Since all my static objects would be cached by squid, I would not need the light httpd_docs server. But that assumption was wrong. Why? Because you still have the overhead of loading the objects into squid the first time, and if your site has many of them, not all of them will be cached (unless you devote a huge chunk of memory to squid), so the heavy mod_perl servers will still carry the overhead of serving static objects. How would one measure that overhead? The difference between the two servers is memory consumption; everything else (e.g. I/O) should be equal. So you have to estimate the time needed to fetch each static object for the first time at a peak period, and from that the number of additional servers you need for serving the static objects. This lets you calculate the additional memory requirements. I can imagine this amount being significant in some installations.

So I decided to accept even more administration overhead and stick with the squid, httpd_docs and httpd_perl scenario, where I can optimize and fine tune everything. Of course this may not be your case. If you feel that the scenario from the previous section is too complicated for you, make it simpler: have only one server with mod_perl built in and let squid do most of the job that the plain light apache used to do. As I explained in the previous paragraph, you should pick this lighter setup only if you can make squid cache most of your static objects. If it cannot, your mod_perl server will do work we do not want it to do.

If you are still with me, install apache with mod_perl and squid. Then use a configuration similar to the one in the previous section, except that httpd_docs is no longer there. We also no longer need the redirector, and we set httpd_accel_host to the name of the server rather than to virtual. There is no need to bind two servers to the same port, because we do not redirect and there are neither Bind nor Listen directives in httpd.conf anymore.

The modified configuration (see the explanations in the previous section):

  httpd_accel_host put.your.hostname.here
  httpd_accel_port 8080
  http_port 80
  icp_port 0
  
  hierarchy_stoplist /cgi-bin /perl
  acl QUERY urlpath_regex /cgi-bin /perl
  no_cache deny QUERY
  
  # debug_options ALL,1 28,9
  
  # redirect_program /usr/lib/squid/redirect.pl
  # redirect_children 10
  # redirect_rewrites_host_header off
  
  request_size 1000 KB
  
  acl all src 0.0.0.0/0.0.0.0
  acl manager proto cache_object
  acl localhost src 127.0.0.1/255.255.255.255
  acl myserver src 127.0.0.1/255.255.255.255
  acl SSL_ports port 443 563
  acl Safe_ports port 80 81 8080 443 563
  acl CONNECT method CONNECT
  
  http_access allow manager localhost
  http_access allow manager myserver
  http_access deny manager
  http_access deny !Safe_ports
  http_access deny CONNECT !SSL_ports
  # http_access allow all
  
  cache_effective_user squid
  cache_effective_group squid
  
  cache_mem 20 MB
  
  memory_pools on
  
  cachemgr_passwd disable shutdown

[TOC]


Building and Using mod_proxy

To build it into apache just add --enable-module=proxy during the apache configure stage.

Now let us look at apache's mod_proxy and how it works.

The server on port 80 answers HTTP requests directly and proxies requests to the mod_perl enabled server in the following way:

  ProxyPass        /modperl/ http://localhost:81/modperl/
  ProxyPassReverse /modperl/ http://localhost:81/modperl/

ProxyPassReverse (PPR) is the saving grace here, and it is what makes apache a win over squid. When the back-end server issues a redirect, PPR rewrites it on its way back so that it points to the original URI.

You can control the buffering feature with the ProxyReceiveBufferSize directive:

  ProxyReceiveBufferSize 1048576

The above setting sets the buffer size to 1MB. If it is not set explicitly, the default buffer size is used, which depends on the OS; for Linux I suspect it is somewhere below 32k. So, to release the mod_perl server immediately instead of leaving it waiting on a slow client, set ProxyReceiveBufferSize to a value greater than the biggest response generated by any mod_perl script.

The ProxyReceiveBufferSize directive specifies an explicit buffer size for outgoing HTTP and FTP connections. It has to be greater than 512 or set to 0 to indicate that the system's default buffer size should be used.

As the name suggests, the buffering applies only to downstream data (coming from the origin server to the proxy) and not to upstream data. That is, it does not buffer the data being uploaded from the client browser to the proxy, so the httpd_perl origin server is still tied up during a large POST such as a file upload.

Apache does caching as well. It's relevant to mod_perl only if you produce proper headers, so that your scripts' output can be cached. See the apache documentation for more details on configuring this capability.

Ask Bjoern Hansen has written a mod_proxy_add_forward module for apache that sets the X-Forwarded-For field when doing a ProxyPass, similar to what squid can do. (Its location is specified in the help section.) Basically, that module adds an extra HTTP header to proxied requests. You can access that header in the mod_perl-enabled server and use it to set the remote IP address. You won't need to compile anything into the back-end server. If you are using Apache::{Registry,PerlRun}, just put something like the following into start-up.pl:
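The text does not say which headers count as ``proper'', so the following is only a sketch under the assumption that Last-Modified and Expires are the ones that matter for a cache. It prints CGI-style headers for a response that may be cached for one hour. The helper names http_date and cacheable_headers are hypothetical, and a real mod_perl script might set the headers through the request object instead:

```perl
use strict;
use warnings;

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an HTTP date (always expressed in GMT).
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec);
}

# Build the header block for a document generated at $now that
# a cache may keep for $ttl seconds.
sub cacheable_headers {
    my ($now, $ttl) = @_;
    return "Content-type: text/html\n"
         . "Last-Modified: " . http_date($now) . "\n"
         . "Expires: " . http_date($now + $ttl) . "\n\n";
}

print cacheable_headers(time, 3600);
print "<html><body>Cacheable for one hour</body></html>\n";
```

Whether a given cache actually stores the response still depends on its own configuration, so it is worth testing against the proxy you run.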

  use Apache::Constants qw(OK);

  sub My::ProxyRemoteAddr ($) {
    my $r = shift;

    # we'll only look at the X-Forwarded-For header if the request
    # comes from our proxy at localhost
    return OK unless ($r->connection->remote_ip eq "127.0.0.1");

    # take the last address listed in the header
    if (my ($ip) = $r->header_in('X-Forwarded-For') =~ /([^,\s]+)$/) {
      $r->connection->remote_ip($ip);
    }

    return OK;
  }

And in httpd.conf:

  PerlPostReadRequestHandler My::ProxyRemoteAddr

Different sites have different needs. If you use the header to set the IP address that apache believes it is dealing with (for logging and so on), you really don't want anyone but your own system to set the header. That's why the above ``recommended code'' checks where the request is really coming from before changing the remote_ip.

Generally you shouldn't trust the X-Forwarded-For header. You only want to rely on X-Forwarded-For headers from proxies you control yourself. If you know how to spoof a cookie, you've probably got the general idea of how to forge HTTP headers and can spoof the X-Forwarded-For header as well. The only address *you* can count on as being a reliable value is the one from $r->connection->remote_ip.

Once the handler has run, the remote IP address is correct. You should be able to access REMOTE_ADDR as usual.

You could do the same thing with other environment variables (though I think several of them are preserved, you will want to run some tests to see which ones).
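To see what the regular expression in the handler extracts, here is a small standalone sketch. The helper name last_forwarded_ip and the sample addresses are made up for illustration. It assumes each proxy appends the address it received the request from to the header, which would make the last entry the one added by your own front-end proxy:

```perl
use strict;
use warnings;

# Hypothetical helper: return the last address listed in an
# X-Forwarded-For value, using the same regex as the handler.
sub last_forwarded_ip {
    my ($header) = @_;
    return undef unless defined $header;
    my ($ip) = $header =~ /([^,\s]+)$/;
    return $ip;
}

# A single address, and a chain of two addresses.
for my $value ('198.51.100.7', '192.0.2.10, 198.51.100.7') {
    print "$value => ", last_forwarded_ip($value), "\n";
}
```

Everything before the last entry was supplied by somebody else, so under that assumption it deserves no more trust than any other client-supplied header.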

[TOC]


mod_perl server as DSO

To build mod_perl as a DSO, add USE_DSO=1 to the rest of the configuration parameters (to build libperl.so instead of libperl.a), like:

  perl Makefile.PL USE_DSO=1 ...

If you run ./configure from the apache source, do not forget to add: --enable-shared=perl

Then just add the LoadModule directive to your httpd.conf.
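For illustration, the directive might look like the fragment below. The libexec/ path is an assumption that depends on the layout your apache was installed with, so adjust it to wherever libperl.so ended up. The AddModule line is typically needed only if your httpd.conf uses ClearModuleList:

```apache
# Path is relative to ServerRoot; adjust to your installation layout
LoadModule perl_module libexec/libperl.so

# Only needed when httpd.conf rebuilds the module list with ClearModuleList
AddModule mod_perl.c
```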

You will find a complete explanation in the INSTALL.apaci pod which can be found in the mod_perl distribution.

Some people have reported that a DSO-compiled mod_perl would not run on specific OS/perl versions. A threads-enabled perl has also been reported to break mod_perl built as a DSO at times. But it may still work for you.

[TOC]


HTTP Authentication with 2 servers + proxy

Assume that you have a setup with one ``front-end'' server that proxies the ``back-end'' (mod_perl) server. If you need to perform authentication in the ``back-end'' server, it should handle all authentication itself. If apache proxies correctly, it should pass through all the authentication information, making the ``front-end'' apache somewhat ``dumb'', as it does nothing but pass the information through.

The only possible caveat in the config file is that your Auth directives need to be in <Directory ...> ... </Directory> tags, because if you use <Location /...> ... </Location>, the proxying server takes the auth info for its own authentication and does not pass it on.

The same goes for mod_ssl: if it is plugged into the front-end server, all SSL requests are encoded and decoded properly by it.
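A minimal sketch of what this might look like in the back-end server's httpd.conf, assuming Basic authentication against a password file. The directory, realm name and file path below are placeholders, not values from this guide:

```apache
# Back-end (mod_perl) server: auth directives inside <Directory>
<Directory /home/httpd/perl/protected>
  AuthType Basic
  AuthName "Protected Area"
  AuthUserFile /home/httpd/conf/.htpasswd
  require valid-user
</Directory>
```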

[TOC]



Written by Stas Bekman.
Last Modified at 09/25/1999
Use of the Camel for Perl is
a trademark of O'Reilly & Associates,
and is used by permission.