CGI to mod_perl Porting. mod_perl Coding guidelines.



The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

[TOC]


Document Coverage

This document is relevant to both writing a new CGI from scratch and migrating an application from plain CGI to mod_perl.

If you are in the porting stage, use it as a reference for possible problems you might encounter when running an existing CGI script in the new mode.

If you are about to write a new CGI script from scratch, it would be a good idea to learn about the possible pitfalls and avoid them in the first place.

It also covers the case where the CGI script being ported does its job, but is too dirty to be easily altered to run as a mod_perl program. (See Apache::PerlRun.)

If your project schedule is tight, I would suggest converting to mod_perl in two steps: initially, run all the scripts under Apache::PerlRun mode; then, as time allows, move them to Apache::Registry mode.
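Assuming your scripts live under /home/httpd/perl, the two stages might be configured like this in httpd.conf (the Alias and Location names here are illustrative):

```apache
# Stage 1: run everything under Apache::PerlRun (tolerant of sloppy code)
Alias /perl-run/ /home/httpd/perl/
<Location /perl-run>
  SetHandler perl-script
  PerlHandler Apache::PerlRun
  Options ExecCGI
  PerlSendHeader On
</Location>

# Stage 2: move cleaned-up scripts to Apache::Registry (full caching)
Alias /perl/ /home/httpd/perl/
<Location /perl>
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
  PerlSendHeader On
</Location>
```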

[TOC]


Before you start to code

It can be a good idea to tighten up some of your Perl programming practices, since Apache::Registry doesn't allow sloppy programming.

You might want to read:

[TOC]


Exposing Apache::Registry secrets

Let's start with some simple code, see what can go wrong with it, detect the bugs and debug them, and discuss possible caveats and how to avoid them.

I will use a simple CGI script that initializes a variable $counter to 0, then increments it five times, printing its value each time.

  counter.pl:
  ----------
  #!/usr/bin/perl -w
  use strict;
  
  print "Content-type: text/html\r\n\r\n";
  
  my $counter = 0;
  
  for (1..5) {
    increment_counter();
  }
  
  sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !<BR>\n";
  }
  ----------

You would expect to see this output:

  Counter is equal to 1 !
  Counter is equal to 2 !
  Counter is equal to 3 !
  Counter is equal to 4 !
  Counter is equal to 5 !

And that's what you see when you execute this script the first time. But let's reload it a few times... See, suddenly after a few reloads the counter doesn't start its count from 1 anymore. We continue to reload and see that it keeps growing, but not steadily: the starting value jumps around between reloads (10, 10, 10, 15, 20...). Weird...

  Counter is equal to 6 !
  Counter is equal to 7 !
  Counter is equal to 8 !
  Counter is equal to 9 !
  Counter is equal to 10 !

We saw two anomalies in this very simple script: the unexpected growth of the counter beyond 5, and its inconsistent growth across reloads. Let's investigate this script.

First let's peek into the error_log file. What we see is:

  Variable "$counter" will not stay shared 
  at /home/httpd/perl/conference/counter.pl line 13.

What kind of error is this? Let's ask Perl for help by enabling a special diagnostics mode. Add at the top of the script:

  use diagnostics;

Reload again; the error_log now shows:

    Variable "$counter" will not stay shared at
        /home/httpd/perl/conference/counter.pl line 15 (#1)
    
    (W) An inner (nested) named subroutine is referencing a lexical
    variable defined in an outer subroutine.
    
    When the inner subroutine is called, it will probably see the value of
    the outer subroutine's variable as it was before and during the
    *first* call to the outer subroutine; in this case, after the first
    call to the outer subroutine is complete, the inner and outer
    subroutines will no longer share a common value for the variable.  In
    other words, the variable will no longer be shared.
    
    Furthermore, if the outer subroutine is anonymous and references a
    lexical variable outside itself, then the outer and inner subroutines
    will never share the given variable.
    
    This problem can usually be solved by making the inner subroutine
    anonymous, using the sub {} syntax.  When inner anonymous subs that
    reference variables in outer subroutines are called or referenced,
    they are automatically rebound to the current values of such
    variables.

Perl has in fact detected a closure, which is sometimes a wanted effect, but not in our case (see perldoc perlsub for more information about closures). While diagnostics.pm is handy for debugging purposes, it drastically slows down your CGI script, so make sure you remove it on your production server.

Do you see a nested named subroutine in my script? I do not!!! What is going on? Should we report a bug? But wait, maybe the Perl interpreter sees the script in a different way; maybe the code goes through some changes before it actually gets executed. The easiest way to find out what's actually happening is to run the script under the debugger. But since we must debug it while it is being executed by the server, a normal debugging session won't help; we have to invoke the debugger from within the webserver. Luckily Doug wrote the Apache::DB module, and we will use it to debug my script. I'll do it non-interactively (although you can debug interactively with Apache::DB). I change my httpd.conf to include:

  PerlSetEnv PERLDB_OPTS "NonStop=1 LineInfo=/tmp/db.out AutoTrace=1 frame=2"
  PerlModule Apache::DB
  <Location /perl>
    PerlFixupHandler Apache::DB
    SetHandler perl-script
    PerlHandler Apache::Registry::handler
    Options ExecCGI
    PerlSendHeader On
  </Location>

Comment out 'use diagnostics;', restart the server and call counter.pl from your browser. On the surface nothing has changed; we still see the same output as before. But two things happened in the background: first, /tmp/db.out was written, with a complete trace of the executed code; second, as a side effect of reporting the warning we saw before (Variable "$counter" will not stay shared at (eval 52) line 15...), the error_log now shows the whole code that is actually being executed:

  package Apache::ROOT::perl::conference::counter_2epl;
  use Apache qw(exit);
  sub handler {
    BEGIN {
      $^W = 1;
    };
    $^W = 1;
    
    use strict;
    
    print "Content-type: text/html\r\n\r\n";
    
    my $counter = 0;
    
    for (1..5) {
      increment_counter();
    }
    
    sub increment_counter{
      $counter++;
      print "Counter is equal to $counter !<BR>\n";
    }
    
  }

What do we learn from this discovery? First, that every CGI script is cached under a package whose name is composed of the Apache::ROOT:: prefix and the relative part of the script's URL (perl::conference::counter_2epl), with every occurrence of / replaced by ::. That's how mod_perl knows which script to fetch from the cache: each script is just a package with a single subroutine named handler. Now you understand why the diagnostics pragma talked about an inner (nested) subroutine: increment_counter really is a nested subroutine. In every script, each subroutine is nested inside the handler subroutine.
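You can reproduce the effect outside mod_perl by wrapping the script in a subroutine yourself, just as Apache::Registry does. Here is a standalone sketch; each call of handler() simulates one request served by the same child process:

```perl
#!/usr/bin/perl -w
use strict;

sub handler {
    my $counter = 0;              # a fresh lexical on every "request"...
    increment_counter() for 1 .. 4;
    return increment_counter();   # ...yet the count survives across calls

    sub increment_counter {       # nested named sub: bound to the $counter
        $counter++;               # instance of the *first* call only
        return $counter;
    }
}

print handler(), "\n";   # first "request": the counter ends at 5
print handler(), "\n";   # second "request": 10, not 5 -- the closure effect
```

Compiling this also emits the familiar ``Variable "$counter" will not stay shared'' warning.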

One workaround is to use globally declared variables, with the vars pragma.

  #!/usr/bin/perl -w
  use strict;
  use vars qw($counter);
  
  print "Content-type: text/html\r\n\r\n";
  
  $counter = 0;
  
  for (1..5) {
    increment_counter();
  }
  
  sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !<BR>\n";
  }

There is no more closure effect, since there is no lexically scoped (my()) variable being used inside the nested subroutine.

Another approach is to use fully qualified variables. This is even better, since less memory will be used, but it does mean extra typing:

  #!/usr/bin/perl -w
  use strict;
  
  print "Content-type: text/html\r\n\r\n";
  
  $main::counter = 0;
  
  for (1..5) {
    increment_counter();
  }
  
  sub increment_counter{
    $main::counter++;
    print "Counter is equal to $main::counter !<BR>\n";
  }

Another working solution is to pass a reference to the variable as an argument. Note that passing the variable itself (by value) would not work here: each call would increment its own copy and the update would be lost. Passing a reference also avoids the time and memory overhead of copying a large variable.

  #!/usr/bin/perl -w
  use strict;
  
  print "Content-type: text/html\r\n\r\n";
  
  my $counter = 0;
  
  for (1..5) {
    increment_counter(\$counter);
  }
  
  sub increment_counter{
    my $counter_ref = shift;
    $$counter_ref++;
    print "Counter is equal to $$counter_ref !<BR>\n";
  }

It's important to understand that the closure effect happens only with code that Apache::Registry wraps with the handler subroutine declaration. If you put your code into a library or module which the main script require()'s or use()'s, there is no such problem. For example, if we move the subroutine increment_counter() into mylib.pl (e.g. save it in the same directory as the main script) and require() it, the closure effect disappears. (Don't forget the 1; at the end of the library, or the require() might fail.)

  mylib.pl:
  ----------
  sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !<BR>\n";
  }
  1;
  ----------

  counter.pl:
  ----------
  #!/usr/bin/perl -w
  use strict;
  use vars qw($counter);   # package global, shared with mylib.pl
  require "./mylib.pl";
  
  print "Content-type: text/html\r\n\r\n";
  
  $counter = 0;            # reset on every request
  
  for (1..5) {
    increment_counter();
  }
  
  ----------

Personally, unless the script is very short, I've got used to writing all the code in external libraries, keeping only a few lines in the main script, generally a call to the main function of my library. Usually I call it init(). I don't worry about the closure effect anymore (unless I create it myself :).

You shouldn't be intimidated by this issue at all, since Perl is your friend. Just keep warnings mode on, and whenever this effect occurs Perl will gladly tell you so:

  Variable "$counter" will not stay shared at ...[snipped]

Just don't forget to check your error_log file before going into production!

BTW, the above example was pretty boring. In my first days of using mod_perl, I wrote a cool user registration program. Here is a very simple representation of this program:

  use CGI;
  $q = new CGI;
  my $name = $q->param('name');
  print_respond();
  
  sub print_respond{
    print "Content-type: text/html\n\n";
    print "Thank you, $name!";
  }

My boss and I checked the program on the development server and it worked fine, so we decided to put it into production. Everything seemed OK, but my boss kept checking it by submitting variations of his profile. Imagine his surprise when, after submitting his name (let's say ``Me Boss'' :), he saw the response ``Thank you, Stas Bekman!''. What happened was that I had tried the production system as well. I was new to mod_perl and so excited with the speed improvement that I didn't notice the closure problem, and it hit me. At first I thought that maybe Apache had started to mix up connections, returning responses meant for other people's requests. I was wrong, of course. Why didn't we notice this when we tried the system on our development server? Keep reading and you will understand what the problem was.

Now let's return to our original example and tackle the second mystery we noticed: why did we see inconsistent results over numerous reloads? That's very simple. Every time the server gets a request to process, it hands it over to one of its children, generally in a round robin fashion. So if you have 10 httpd children alive, the first 10 reloads might seem correct. Since the closure effect kicks in from the second invocation onwards, subsequent reloads return unexpected results. Moreover, requests are not distributed evenly among the children; at any given moment one child may have served the same script more times than any other. That's why we saw such strange behavior.

And now you understand why we didn't notice the problem with the user registration system in the example I've presented. First, we didn't look at the error_log files. (As a matter of fact we did, but there were so many warnings in there that we couldn't tell which ones were important and which weren't.) Second, we didn't test the system under the -X flag (single process mode), and we had too many server children running to notice the problem.

A workaround is to run the server in single process mode. You achieve this by invoking the server with the -X parameter (httpd -X). Since there are no other children running, you will detect the problem on the second reload. But before that, let the error_log help you detect most of the possible errors: most warnings can become errors, so make sure to check every warning Perl reports, and preferably write your code so that no warnings show up in the error_log at all. If your error_log file fills up with hundreds of lines on every script invocation, you will have trouble locating and noticing real problems.

Of course none of the warnings will be reported if the warning mechanism is not turned on. With mod_perl it is possible to turn warnings on globally via the PerlWarn directive; just add to httpd.conf:

    PerlWarn On

You can turn warnings off on a local basis (inside a block) with local $^W=0. If you write $^W=0 without local(), you disable warnings everywhere inside the child; $^W=1 enables them again. So if Perl warns you about something you are sure is not a problem, you can disable the warning locally, e.g.:

  [snip]
  {
    # we want perl to be quiet here -
    # we don't care whether $a was initialized
    local $^W = 0;
    print $a;
  }   # $^W is restored automatically at the end of the block
  [snip]

Of course this is not a way to fix initialization and other problems, but sometimes it helps.
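As an aside: if you run perl 5.6.0 or later, you can silence a single warning category lexically with the warnings pragma instead of toggling $^W. A sketch (the subroutine name here is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# build a greeting even when the name is missing, without the
# "Use of uninitialized value" warning
sub quiet_greet {
    my $name = shift;               # may legitimately be undef
    no warnings 'uninitialized';    # lexical: only this scope is quiet
    return "Hello, $name!";
}

print quiet_greet("Stas"), "\n";    # Hello, Stas!
print quiet_greet(undef),  "\n";    # Hello, ! -- and no warning
```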

Sometimes it's very hard to understand what a warning complains about: you see the source code, but you cannot understand why some specific snippet produces it. The explanation is often that the code can be called from different places, e.g. when it's a subroutine.

I'll show you an example of such code.

  local $^W=1;
  good();
  bad();
  
  sub good{
    print_value("Perl");
  }
  
  sub bad{
    print_value();
  }
  
  sub print_value{
    my $var = shift;
    print "My value is $var\n";
  }

In the code above there is a sub, print_value, that prints the passed value; a sub good, which correctly passes the value; and a sub bad, where we forgot to pass it. When we run the script, we get the warning:

  Use of uninitialized value at ./warning.pl line 15.

We can clearly see that there is an undefined value at the line that attempts to print it:

  print "My value is $var\n";

But how do we know why it was undefined? The solution is quite simple: what we need is a full stack trace of the calls that triggered the warning.

The Carp module comes to the rescue with its cluck() function. Let's modify the script:

  use Carp ();
  local $SIG{__WARN__} = \&Carp::cluck;
  
  local $^W=1;
  good();
  bad();
  
  sub good{
    print_value("Perl");
  }
  
  sub bad{
    print_value();
  }
  
  sub print_value{
    my $var = shift;
    print "My value is $var\n";
  }

Now when we execute it, we would see:

  Use of uninitialized value at /home/httpd/perl/book/warning.pl line 17.
  Apache::ROOT::perl::book::warning_2epl::print_value() 
    called at /home/httpd/perl/book/warning.pl line 12
  Apache::ROOT::perl::book::warning_2epl::bad() 
    called at /home/httpd/perl/book/warning.pl line 5
  Apache::ROOT::perl::book::warning_2epl::handler('Apache=SCALAR(0x84b1154)') 
    called at /usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm line 139
  eval {...} called at 
    /usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm line 139
  Apache::Registry::handler('Apache=SCALAR(0x84b1154)') 
    called at PerlHandler subroutine `Apache::Registry::handler' line 0
  eval {...} called at PerlHandler subroutine `Apache::Registry::handler' line 0

Take a moment to understand the trace. The only part we are interested in starts where the actual script is called, so we can skip the Apache::Registry part of the trace. We are left with:

  Use of uninitialized value at /home/httpd/perl/book/warning.pl line 17.
  Apache::ROOT::perl::book::warning_2epl::print_value() 
    called at /home/httpd/perl/book/warning.pl line 12
  Apache::ROOT::perl::book::warning_2epl::bad() 
    called at /home/httpd/perl/book/warning.pl line 5

which tells us that the code that triggered the warning was:

  Apache::Registry code => bad() => print_value()

We go into bad() and indeed see that we forgot to pass the variable. Of course, when you write a subroutine like print_value it's a good idea to validate the passed arguments before using them, but this example was ``good'' enough to show how to ease the debugging process.

Sure, you might say, I could have found the problem by a simple inspection of the code. You are right, but I promise you that the task becomes quite complicated and time consuming when the code runs to thousands of lines.

Notice the local() keyword before the setting of $SIG{__WARN__}. Since %SIG is global, forgetting local() would enforce this setting for all the scripts running under the same process. If that is the desired behaviour, for example on a development server, you had better do it in the startup file, where you can easily switch the feature on and off across server restarts.

As you have noticed, warnings report the line number where the event happened, which is supposed to help you find the problematic code. The problem is that the line numbers are often incorrect, because certain uses of the eval operator and ``here'' documents are known to throw off Perl's line numbering.

META: move here the code that explains the settings of #line
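Until that section is written, here is a minimal sketch of the #line directive it refers to: the directive tells Perl which line number (and optionally which file name) to report from that point on, which lets you keep sane line numbers inside eval'd strings and here-documents.

```perl
#!/usr/bin/perl -w
use strict;

# catch the warning so we can inspect the reported location
my $message = '';
local $SIG{__WARN__} = sub { $message = shift };

# the line after the #line directive is reported as line 42
# of the (made-up) file "fake-name.pl"
eval qq{#line 42 "fake-name.pl"\nwarn "boom";\n};

print $message;   # boom at fake-name.pl line 42.
```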

While having warnings turned on is a must on a development server, you had better turn them off globally on a production server: if every CGI script generates just one warning per request and your server serves millions of requests per day, your log file will eat up all of your disk space and the machine will die. My production server has the following directive in httpd.conf:

    PerlWarn Off

While we are talking about control flags, another, even more important flag is -T, which turns taint mode on. Since this is a very broad topic I will not discuss it here, but if you aren't forcing all of your scripts to run under taint mode you are asking for trouble (always keep malicious users in mind). To turn it on, add to httpd.conf:

  PerlTaintCheck On

[TOC]


Sometimes it Works Sometimes it Does Not

When you start running your scripts under mod_perl, you might find yourself in a situation where a script seems to work, but sometimes it screws up, and the longer the server runs without a restart, the worse it gets. Often you can resolve this problem very easily: you have to test your script under a server running in single process mode (httpd -X).

Generally the problem is the use of global variables. Since global variables don't change from one script invocation to another unless you change them, you can find your scripts doing ``fancy'' things.

The first example is amazing: Web services. Imagine that you enter some site you have an account on (a free email account, for example), and now you want to see what other users read.

You type in the username you want to peek at and a dummy password, and try to enter the account. On some services this actually works!!!

You say: why in the world does this happen? The answer is simple: global variables. You have entered the account of someone who happened to be served by the same server child as you. Because of sloppy programming, a global variable was not reset at the beginning of the program, and voila, you can easily peek into other people's email! Here is an example of sloppily written code:

  use vars qw($authenticated);
  my $q = new CGI;
  my $username = $q->param('username');
  my $passwd   = $q->param('passwd');
  authenticate($username,$passwd);
    # failed, break out
  die "Wrong passwd" unless $authenticated == 1;
    # user is OK, fetch user's data
  show_user($username);
  
  sub authenticate{
    my ($username,$passwd) = @_;
        # some checking
    $authenticated = 1 if (SOMETHING);
  }

Do you see the catch? With the code above, I can type in any valid username and any dummy password and enter that user's account, provided someone has successfully entered their account before me using the same child process! Since $authenticated is global, once it becomes 1 it stays 1 for the rest of the child's life!!! The solution is trivial: reset $authenticated to 0 at the beginning of the program (among many other possible solutions). Of course this example is trivial, but believe me, it happens!
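The fix, sketched as a standalone program (the credential check below is hypothetical, standing in for the real SOMETHING test):

```perl
#!/usr/bin/perl -w
use strict;
use vars qw($authenticated);

sub authenticate {
    my ($username, $passwd) = @_;
    # hypothetical check standing in for the real one
    $authenticated = 1 if $username eq 'admin' && $passwd eq 'secret';
}

sub handle_request {
    my ($username, $passwd) = @_;
    $authenticated = 0;      # the fix: reset the global on *every* request
    authenticate($username, $passwd);
    return $authenticated;
}

print handle_request('admin', 'secret'), "\n";  # 1 -- correct login
print handle_request('admin', 'guess'),  "\n";  # 0 -- no leftover state
```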

Here is just another little one-liner that can spoil your day, assuming you forgot to reset the $allowed variable. It works perfectly well in plain mod_cgi:

  $allowed = 1 if $username eq 'admin';

But with the line above, you will let any user admin your system (again assuming the same child process earlier served a request by the real admin).

Another good example is the use of the /o regular expression modifier, which compiles a regular expression once, on its first execution, and never recompiles it again. This problem can be difficult to detect: after restarting the server, each request you make will be served by a different child process, and thus the regex pattern for that child will be compiled fresh. Only when you make a request that happens to be served by a child which has already cached the regex will you see the problem. Generally you miss it: you press reload, it works (a fresh child), then it doesn't (a child that already cached the regex and won't recompile because of /o). An example of such a case would be:

  my $pat = $q->param("keyword");
  foreach( @list ) {
    print if /$pat/o;
  }

To make sure you don't miss these bugs, always test your CGI in single process mode. To solve this particular /o problem, refer to Compiled Regular Expressions.
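If you need the speed of a compiled pattern without /o's per-process caching, one alternative (available from perl 5.005) is qr//, which compiles the pattern once per request rather than once per child. A sketch with a hypothetical helper subroutine:

```perl
#!/usr/bin/perl -w
use strict;

# return the elements of @list matching the user-supplied keyword;
# the pattern is compiled once per call, so a new keyword on the
# next request cannot be shadowed by a stale /o-cached regex
sub match_lines {
    my ($keyword, @list) = @_;
    my $pat = qr/$keyword/;
    return grep { /$pat/ } @list;
}

print join(" ", match_lines("foo", qw(foobar baz food))), "\n";  # foobar food
```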

[TOC]


What's different about mod_perl

There are a few things that behave differently under mod_perl. It's good to know what they are.

[TOC]


Script's name space

Scripts under Apache::Registry do not run in package main, they run in a unique name space based on the requested URI. For example, if your URI is /perl/test.pl the package will be called Apache::ROOT::perl::test_2epl.
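The mangling can be sketched in a few lines of Perl (this is an illustration of the naming scheme, not Apache::Registry's actual code): non-alphanumeric characters are escaped as _ followed by their hex code, and path separators become ::.

```perl
#!/usr/bin/perl -w
use strict;

# illustrative reimplementation of the URI -> package name mapping
sub uri_to_package {
    my $uri = shift;                                   # e.g. "/perl/test.pl"
    $uri =~ s{^/+}{};                                  # drop leading slash(es)
    $uri =~ s{([^A-Za-z0-9_/])}{sprintf("_%02x", ord $1)}eg;  # '.' -> '_2e'
    $uri =~ s{/}{::}g;                                 # '/'  -> '::'
    return "Apache::ROOT::$uri";
}

print uri_to_package("/perl/test.pl"), "\n";
# Apache::ROOT::perl::test_2epl
```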

[TOC]


Messing with @INC

When you use(), require() or do() a file, Perl searches the directories listed in the @INC variable for it. If the file you want to load is not located in one of those directories, you have to tell Perl where to find it.

In order to require() a file located at /home/httpd/perl/examples/test.pl you would:
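The example appears to have been lost from the text; the usual idiom is to extend @INC (e.g. with use lib) and then require the file by its bare name. A sketch, demonstrated here with a throwaway library created on the fly so that it can actually run:

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

# build a stand-in for /home/httpd/perl/examples/test.pl
my $dir = tempdir(CLEANUP => 1);
open(FH, "> $dir/test.pl") or die "can't write: $!";
print FH "sub hello { return 'hi' }\n1;\n";
close FH;

# the technique itself: extend @INC, then require by bare file name;
# 'use lib qw(/home/httpd/perl/examples);' does the same at compile time
unshift @INC, $dir;
require "test.pl";

print hello(), "\n";   # hi
```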

[TOC]


Reloading Modules and Required Files

When you develop plain CGI scripts, you can just change the code, and rerun the CGI from your browser. Since the script isn't cached in memory, the next time you call it the server starts up a new perl process, which recompiles it from scratch. The effects of any modifications you've applied are immediately present.

The situation is different with Apache::Registry, since the whole idea is to get maximum performance from the server. By default, the server won't spend the time to check whether any included library modules have been changed. It assumes that they weren't, thus saving a few milliseconds to stat() the source file (multiplied by however many modules/libraries you are use()-ing and/or require()-ing in your script). The only check that is performed is whether your main script has been changed. So if you have only one script that doesn't use() or require() other Perl modules or packages, there is nothing to worry about. If, however, you are developing a script that includes other modules, the files you use() or require() aren't checked for modification.

Acknowledging this, how do we get our modperl-enabled server to recognize changes in any library modules? Well, there are a couple of techniques:

[TOC]


Restarting the server

The simplest approach is to restart the server each time you apply some change to your code. See Server Restarting techniques.

[TOC]


Using Apache::StatINC for Development Process

After restarting the server about 100 times you will get tired of it and look for other solutions. Help comes from the Apache::StatINC module.

Apache::StatINC reloads %INC files when updated on disk. When Perl pulls a file via require, it stores the filename in the global hash %INC. The next time Perl tries to require the same file, it sees the file in %INC and does not reload from disk. This module's handler iterates over %INC and reloads the file if it has changed on disk.

To enable this module, just add two lines to your httpd.conf file:

  PerlModule Apache::StatINC
  PerlInitHandler Apache::StatINC

To make sure it really works, turn on debug mode on your development machine with PerlSetVar StatINCDebug On. You end up with something like:

  PerlModule Apache::StatINC
  <Location /perl>
    SetHandler perl-script
    PerlHandler Apache::Registry::handler
    Options ExecCGI
    PerlSendHeader On
    PerlInitHandler Apache::StatINC
    PerlSetVar StatINCDebug On
  </Location>

Beware that only modules located in @INC are reloaded on change, and that you can change @INC only before the server starts (in the startup file).

Whatever you do in scripts and modules that are require()'d after server startup will have no lasting effect on @INC. When you do:

  use lib qw(foo/bar);

@INC is changed only while the code is being parsed and compiled. Afterwards, @INC is reset to its original value. To verify that your @INC is correct, fetch http://www.nowhere.com/perl-status?inc and look at the bottom of the page. (I assume you have configured the /perl-status location.)

Also notice the following caveat: while ``.'' is in @INC, Perl knows to require() files relative to the script's directory. But once the script has been parsed, the server doesn't remember the path any more! So you end up with a broken entry in %INC, like:

  $INC{'bar.pl'} eq "bar.pl"

If you want Apache::StatINC to reload your library, modify @INC in the server startup file, or use a full path in the require() call.

[TOC]


Configuration Files: Writing, Modifying and Reloading.

Checking all the modules in %INC on every request can add a large overhead to server response times, and you certainly would not want the Apache::StatINC module enabled in your production site's configuration. But sometimes you want a configuration file to be reloaded when it is updated, without restarting the server.

This feature is especially important if some person is allowed to modify part of the tool's configuration, but it is undesirable for that person to telnet to the server to restart it, whether because of that person's lack of professional skills or for security reasons: you don't want to give out the root password unless you have to.

[TOC]


Writing Configuration Files

Since we are talking about configuration files, let me show you some good and bad approaches to writing them.

If your configuration file contains just a few variables, it doesn't really matter how you do it. But generally this is not the case: configuration files tend to grow as the project grows. This is especially true for projects that generate HTML files, since they tend to demand many easily configurable parameters, like headers, footers, colors and so on.

So let's start with the basic approach deployed by most CGI script writers: having many variables defined in a separate configuration file. For example:

  $cgi_dir = "/home/httpd/perl";
  $cgi_url = "/perl";
  $docs_dir = "/home/httpd/docs";
  $docs_url = "/";
  $img_dir = "/home/httpd/docs/images";
  $img_url = "/images";
  ... many more config params here ...
  $color_hint   = "#777777";
  $color_warn   = "#990066";
  $color_normal = "#000000";

Now when we want to use these variables in a mod_perl script, we must declare them all with the help of use vars, because the use strict; pragma demands that all variables used in the script be declared.

So we start the script with:

  use strict;
  use vars qw($cgi_dir $cgi_url $docs_dir $docs_url 
              ... many more config params here ....
              $color_hint  $color_warn $color_normal
             );

Maintaining such a script is a nightmare, especially while features are still being written, since you keep adding names to and removing names from the list. But that's not the big deal yet.

Since we want our code clean, we start the configuration file with use strict; as well, so we have to list the variables there too: a second list to maintain.

The moment you have many scripts, you run into the problem of collisions between configuration files. One of the best solutions is a package declaration, which makes each configuration unique (if you declare unique package names, of course).

The moment you add the package declaration and think you are done, you realize that the nightmare has just begun. Now that the variables belong to a different package, you cannot just require() the file and use them. You have either to modify all your scripts to use a fully qualified notation like $My::Config::cgi_url instead of just $cgi_url, or to import the needed variables into every script that is going to use them.

Since you don't want the extra typing of fully qualified names, you go for the importing approach. But your configuration package has to export the variables first, which means listing them all yet again. Now you have at least three variable lists to keep updated whenever you change the naming of your configuration variables. And that's when only one script uses the configuration file; in the general case you have many of them. So now our example configuration file looks like this:

  package My::Config;
  use strict;
  
  BEGIN {
    use Exporter ();
  
    @My::Config::ISA       = qw(Exporter);
    @My::Config::EXPORT    = qw();
    @My::Config::EXPORT_OK = qw($cgi_dir $cgi_url $docs_dir $docs_url
                                ... many more config params here ....
                                $color_hint $color_warn $color_normal);
  }
  
  use vars qw($cgi_dir $cgi_url $docs_dir $docs_url 
              ... many more config params here ....
              $color_hint  $color_warn $color_normal
             );
  
  $cgi_dir = "/home/httpd/perl";
  $cgi_url = "/perl";
  $docs_dir = "/home/httpd/docs";
  $docs_url = "/";
  $img_dir = "/home/httpd/docs/images";
  $img_url = "/images";
  ... many more config params here ...
  $color_hint   = "#777777";
  $color_warn   = "#990066";
  $color_normal = "#000000";

And in the code:

  use strict;
  use My::Config qw($cgi_dir $cgi_url $docs_dir $docs_url 
                    ... many more config params here ....
                    $color_hint  $color_warn $color_normal
                   );
  use vars       qw($cgi_dir $cgi_url $docs_dir $docs_url 
                    ... many more config params here ....
                    $color_hint  $color_warn $color_normal
                   );

But as we know, this approach is especially bad in the mod_perl context, since exported variables add a memory overhead: the more variables you export, the more memory you use. As usual, we multiply this overhead by the number of server processes we run and get a pretty big number, memory that could be used to run a few more servers instead.

As a matter of fact things aren't so horrible, since you can group your variables and name the groups with special labels called tags, which can later be used as arguments to import() or use(). You are probably familiar with:

  use CGI qw(:standard :html);

We can implement this quite easily with the help of export_tags() and export_ok_tags() from Exporter. For example:

  BEGIN {
    use Exporter ();
    use vars qw( @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);
    @ISA         = qw(Exporter);
    @EXPORT      = qw();
    @EXPORT_OK   = qw();
    
    %EXPORT_TAGS = (
      vars => [qw($fname $lname)],
      subs => [qw(reread_conf untaint_path)],
    );
    Exporter::export_ok_tags('vars');
    Exporter::export_ok_tags('subs');
  }

Yes, you export subroutines exactly like variables, since what's actually being exported is a symbol. The definitions of these subroutines are not shown here.

In your code now you can write:

  use My::Config qw(:subs :vars);

What about groups of groups, like the :all tag from CGI.pm, which is a group tag of all the other groups? It requires a little more magic on your side, but you can always save time and look up the solution inside the code of CGI.pm. It's just a matter of a little code that expands all the groups recursively.
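The guide doesn't show that code, but here is a minimal self-contained sketch of the idea, with a made-up %tags table (a real implementation would walk %EXPORT_TAGS): a tag's list may contain other tags, marked by a leading colon, and we flatten them recursively.

```perl
use strict;
use warnings;

# Hypothetical tag table: the :all group includes the other groups.
my %tags = (
    vars => [qw($fname $lname)],
    subs => [qw(reread_conf untaint_path)],
    all  => [qw(:vars :subs)],
);

# Recursively flatten a tag name into the symbols it covers.
sub expand_tag {
    my ($name, $seen) = @_;
    $seen ||= {};
    return () if $seen->{$name}++;        # guard against tag cycles
    my @symbols;
    for my $item (@{ $tags{$name} || [] }) {
        if ($item =~ /^:(.*)/) {          # a nested tag: recurse
            push @symbols, expand_tag($1, $seen);
        }
        else {
            push @symbols, $item;
        }
    }
    return @symbols;
}

print join(" ", expand_tag('all')), "\n"; # $fname $lname reread_conf untaint_path
```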

After going through the pain of maintaining such variable lists in a big project with a huge configuration file (more than 100 variables) and many files actually using them, I came up with a much simpler solution: a single hash that holds all the variables. Now my configuration file looks like this:

  package My::Config;
  use strict;
  
  BEGIN {
    use Exporter ();
  
    @My::Config::ISA       = qw(Exporter);
    @My::Config::EXPORT    = qw();
    @My::Config::EXPORT_OK = qw(%c);
  }
  
  use vars qw(%c);
  
  %c = 
    (
     dir => {
             cgi  => "/home/httpd/perl",
             docs => "/home/httpd/docs",
             img  => "/home/httpd/docs/images",
            },
     url => {
             cgi  => "/perl",
             docs => "/",
             img  => "/images",
            },
     color => {
               hint   => "#777777",
               warn   => "#990066",
               normal => "#000000",
              },
    );

Good Perl style suggests keeping a comma after the last item of a list. That's because additional items tend to be added at the end of the list, and when the trailing comma is already in place, you never have to remember to add one when you append a new item.

So now the script looks like:

  use strict;
  use My::Config qw(%c);
  use vars       qw(%c);
  print "Content-type: text/plain\n\n";
  print "My url docs root: $c{url}{docs}\n";

Do you see the difference? The whole mess has gone, there is only one variable to worry about.

So far so good, but let's make it even better. I would like to get rid of the Exporter stuff altogether. I remove all the exporting code, so my config file now looks like this:

  package My::Config;
  use strict;
  use vars qw(%c);
  
  %c = 
    (
     dir => {
             cgi  => "/home/httpd/perl",
             docs => "/home/httpd/docs",
             img  => "/home/httpd/docs/images",
            },
     url => {
             cgi  => "/perl",
             docs => "/",
             img  => "/images",
            },
     color => {
               hint   => "#777777",
               warn   => "#990066",
               normal => "#000000",
              },
    );

And the code

  use strict;
  use My::Config ();
  print "Content-type: text/plain\n\n";
  print "My url docs root: $My::Config::c{url}{docs}\n";

We still want to save ourselves a lot of typing, since now we have to use the fully qualified notation $My::Config::c{url}{docs}. Let's use Perl's magical aliasing feature instead. I'll modify the code to be:

  use strict;
  use My::Config ();
  use vars qw(%c);
  *c = \%My::Config::c;
  print "Content-type: text/plain\n\n";
  print "My url docs root: $c{url}{docs}\n";

I have aliased the *c glob with \%My::Config::c, a reference to the hash. From now on, %My::Config::c and %c are the same hash: you can read from or modify either name, and both refer to the same data.

Just one last little note. Sometimes you see a lot of redundancy in the configuration variables, like:

  $cgi_dir  = "/home/httpd/perl";
  $docs_dir = "/home/httpd/docs";
  $img_dir  = "/home/httpd/docs/images";

Now if you want to move the base path "/home/httpd" to a new location, it demands lots of editing. Of course the solution is:

  $base     = "/home/httpd";
  $cgi_dir  = "$base/perl";
  $docs_dir = "$base/docs";
  $img_dir  = "$base/docs/images";

But you cannot do the same trick with a hash: it cannot refer to its own keys while it is being built, so this wouldn't work:

  %c =
    (
     base => "/home/httpd",
     dir => {
             cgi  => "$base/perl",
             docs => "$base/docs",
             img  => "$base/docs/images",
            },
    );

But nothing stops us from adding an auxiliary variable, lexically scoped with my(). The following code is correct:

  my $base = "/home/httpd";
  %c =
    (
     dir => {
             cgi  => "$base/perl",
             docs => "$base/docs",
             img  => "$base/docs/images",
            },
    );

We have just learned how to make configuration files easy to maintain, and how to save memory by avoiding the export of variables into a script's namespace.

[TOC]


Reloading Configuration Files

Now back to the task of dynamically reloading configuration files.

First, let's see a simple case, where we just have to watch a simple configuration file like the one below. Imagine a script that tells you who is the patch pumpkin of the current Perl release.

Sidenote: <jargon> A humorous term for the token -- the object (notional or real) that gives its possessor (the ``pumpking'' or the ``pumpkineer'') exclusive access to something, e.g. applying patches to a master copy of source (for which the pumpkin is called a ``patch pumpkin'').

  use CGI ();
  use strict;
  
  my $fname = "Larry";
  my $lname = "Wall";
  my $q = new CGI;
  
  print $q->header(-type=>'text/html');
  print $q->p("$fname $lname holds the patch pumpkin
               for this perl release.");

The script has a hardcoded value for the name. It's very simple: initialize the CGI object, print the proper HTTP header and tell the world who the current patch pumpkin is.

When the patch pumpkin changes, we don't want to modify the script. Therefore, we put the $fname and $lname variables into a configuration file:

  $fname = "Gurusamy";
  $lname = "Sarathy";
  1;

Please notice that there is no package declaration in the above file, so the code will be evaluated in the caller's package, or in the main:: package if none was declared. This means that the variables $fname and $lname will override (or initialize, if they don't exist yet) the variables with the same names in the caller's namespace. This works for global variables only -- you cannot update variables lexically defined with my() by this technique.

You have started the server and everything works properly. After a while you decide to modify the configuration. How do you let your running server know that the configuration was modified, without restarting it? Remember, we are in production, and a server restart can be quite expensive. One of the simplest solutions is to poll the file's modification time by calling stat() before the script starts its real work; if we see that the file was updated, we force a reconfiguration of the variables located in this file. We will call the function that reloads the configuration reread_conf(); it accepts a single argument, the relative path to the configuration file.

If your CGI script is invoked under the Apache::Registry handler, you can put the configuration file in the same directory as the script (or below it) and pass a relative path to the file, since Apache::Registry calls chdir() into the script's directory before it starts the script's execution. Otherwise you have to make sure the file will be found; do() does search the @INC directories.

  use vars qw(%MODIFIED);
  sub reread_conf{
    my $file = shift;
    return unless $file;
    return unless -e $file and -r _;
    unless ($MODIFIED{$file} and $MODIFIED{$file} == -M _){
      my $return;
      unless ($return = do $file) {
        warn "couldn't parse $file: $@" if $@;
        warn "couldn't do $file: $!"    unless defined $return;
        warn "couldn't run $file"       unless $return;
      }
      $MODIFIED{$file} =  -M _; # Update the MODIFICATION times
    }
  } # end of reread_conf

We use do() rather than require() to reload the code in this file, because do() reloads the file unconditionally, while require() will not load a file that was already loaded in one of the previous requests: there will be an entry in %INC whose key is the name of the file and whose value is the path to it. That's how Perl keeps track of loaded files and saves the overhead of reloading a file it has already compiled. You generally don't notice this with plain Perl scripts, but under mod_perl it matters all the time, since the same script runs repeatedly and all the require()'d files are already loaded after the first request in each process.
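This caching behaviour is easy to see outside the server. The following self-contained sketch (the file name and the $main::setting variable are made up, and it uses today's Perl syntax) shows require() ignoring a changed file while do() picks the change up:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/conf.pl";

# helper: (re)write the pretend configuration file
sub write_conf {
    my $value = shift;
    open my $fh, '>', $file or die "can't write $file: $!";
    print $fh "\$main::setting = '$value';\n1;\n";
    close $fh;
}

write_conf('one');
require $file;          # loads the file and records it in %INC
write_conf('two');
require $file;          # no-op: $file is already a key in %INC
print "after require: $main::setting\n";   # still 'one'
do $file;               # reloads unconditionally
print "after do:      $main::setting\n";   # now 'two'
```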

Nevertheless, do() keeps track of the current filename for error messages, searches the @INC directories, and updates %INC if the file is found.

The warnings the function emits when something goes wrong cover all the ways do() can fail; they are there for completeness. If do() cannot read the file, it returns undef and sets the error in $!. If do() can read the file but cannot compile it, it returns undef and puts an error message in $@. If the file is successfully compiled, do() returns the value of the last expression evaluated.

The configuration file can also be broken if someone modifies it incorrectly. We don't want the service to break because of that, so we just trap the possible failure to do() the file and ignore the changes, updating the stored modification time anyway. It might be a good idea to send an email to the system administrator about the problem.

Notice, however, that since do() updates %INC just as require() does, if you are using Apache::StatINC it will attempt to reload this file before the reread_conf() call, so if the file doesn't compile the request will be aborted. This shouldn't be a problem, since Apache::StatINC shouldn't be used in production anyway (it slows things down by stat()'ing all the files listed in %INC).

Note that we assume the entire purpose of this function is to reload the configuration if it has changed; that's why it has no failure mode -- if something goes wrong, we just return. This approach would be incorrect if you intend to initialize the variables through this function on the first invocation of the script. If you do, you will want to replace each occurrence of return() and warn() with die().

I used the above approach when I had a huge configuration file that was loaded only once at server startup, plus a little configuration file with just a few variables that could be updated by hand or through the web interface; those variables were duplicates of entries in the main config file.

So if the webmaster broke the syntax of this dynamic file while updating it by hand, it wouldn't affect the main configuration file (which was write-protected), and the programs would keep executing properly. Soon we will see a simple web interface that allows you to modify the configuration file without breaking it.

A sample script using the presented subroutine would be:

  use vars qw(%MODIFIED $fname $lname);
  use CGI ();
  use strict;
  
  my $q = new CGI;
  print $q->header(-type=>'text/plain');
  my $config_file = "./config.pl";
  reread_conf($config_file);
  print $q->p("$fname $lname holds the patch pumpkin
               for this perl release.");
  
  sub reread_conf{
    my $file = shift;
    return unless $file;
    return unless -e $file and -r _;
    unless ($MODIFIED{$file} and $MODIFIED{$file} == -M _){
      my $return;
      unless ($return = do $file) {
        warn "couldn't parse $file: $@" if $@;
        warn "couldn't do $file: $!"    unless defined $return;
        warn "couldn't run $file"       unless $return;
      }
      $MODIFIED{$file} =  -M _; # Update the MODIFICATION times
    }
  } # end of reread_conf

Remember that you should be using (stat $file)[9] instead of -M $file if you are modifying the $^T variable. In some of my scripts I reset $^T to the time of the script invocation with $^T = time(), so I can perform -M and similar (-A, -C) file age tests relative to the script invocation time and not to the time the process was started.
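To illustrate the relationship (the file is created on the fly; nothing here is mod_perl specific): -M reports the file's age in days relative to $^T, while (stat $file)[9] reports the absolute modification time, so the two views are always convertible into each other:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
print $fh "data\n";
close $fh;

my $mtime = (stat $file)[9]; # absolute: seconds since the epoch
my $age   = -M $file;        # relative: days before $^T (program start)

# the two views agree: $^T - $age * 86400 == $mtime
printf "mtime=%d  age=%.8f days  reconstructed=%.2f\n",
       $mtime, $age, $^T - $age * 86400;

# resetting $^T makes -M (and -A, -C) relative to "now" instead
$^T = time;
my $age_now = -M $file;
```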

If your configuration file is more sophisticated -- say, it declares a package and exports variables -- the above code will work just as well. You don't even have to re-import() the variables: when do() recompiles the code, the originally imported variables get updated with the values from the reloaded code.

[TOC]


Dynamically updating configuration files

The CGI script below allows a system administrator to dynamically update a configuration file through a web interface. Combined with the code we have just seen for reloading modified files, you get a dynamically reconfigurable system that doesn't require a server restart and can be administered from any machine with a simple browser connected to the Internet.

Let's say you have a configuration file like:

  package MainConfig;
  
  use strict;
  use vars qw(%c);
  
  %c = (
        name     => "Larry Wall",
        release  => "5.000",
        comments => "Adding more ways to do the same thing :)",
  
        other    => "More config values",
  
        hash     => { foo  => "bar",
                      fooo => "barr",
                    },
  
        array    => [qw( a b c)],
  
       );

You want to make the variables name, release and comments dynamically configurable, meaning that you want a web interface with an input form that allows you to modify them. Once modified, you want to update the configuration file and propagate the changes to all the currently running processes. Quite a simple task.

Let's see the main stages of the algorithm: create a form preset with the current values of the variables; let the administrator modify them and submit the changes; validate that the submitted information is correctly formatted (numeric fields should contain numbers, literal fields words, etc.); update the configuration file; update the modified values in the memory of the current process; and present the form again, with the updated fields if any.

The only part that seems complicated to implement is the configuration file update, for a few reasons: if updating the file breaks it, the whole service stops working; and if the file is very big and includes comments and complex data structures, parsing it can be quite a challenge.

So let's simplify the task. We won't touch the original configuration file. If all we want is to update a few variables, why not create a little configuration file holding just the variables that can be modified through the web interface, and overwrite it whenever there is something to change? This way we don't have to parse the file before updating it. And if the main configuration file changes, we don't care -- we no longer depend on it.

Moreover, the dynamically updated variables will be duplicated: they will show up in both places, the main file and the dynamic file. We do this to simplify maintenance. When a new release is installed, the dynamic configuration file shouldn't exist at all; it is created only after the first update. The only change this requires to the main code is a snippet that loads this file if it exists and has changed, as we just saw.

This additional code must be executed after the main configuration file has been loaded, so that the updated variables override the default values from the main file.

META: extend on the comments:

  # remember to run this code under taint mode
  
  use strict;
  use vars qw($q %c $dynamic_config_file %vars_to_change %validation_rules);
  
  use CGI ();
  
  use lib qw(.);
  use MainConfig ();
  *c = \%MainConfig::c;
  
  $dynamic_config_file = "./config.pl";
  
  # load the dynamic conf file if exists, and override the default
  # values from the main config file
  do $dynamic_config_file if -e $dynamic_config_file and -r _;
  
  # fields that can be changed and their titles
  %vars_to_change =
    (
     'name'     => "Patch Pumpkin's Name",
     'release'  => "Current Perl Release",
     'comments' => "Release Comments",
    );
  
  %validation_rules =
    (
     'name'     => sub { $_[0] =~ /^[\w\s\.]+$/;   },
     'release'  => sub { $_[0] =~ /^\d+\.[\d_]+$/; },
     'comments' => sub { 1;                        },
    );
  
  $q = new CGI;
  print $q->header(-type=>'text/html'),
    $q->start_html();
  
  my %updates = ();
  
  # We always rewrite the dynamic config file, so we want all the
  # vars to be passed, but to save time we only validate the vars
  # that were changed; the rest will be retrieved from the
  # 'prev_foo' values
  foreach (keys %vars_to_change) {
    # copy var so we can modify it
    my $new_val = $q->param($_) || '';
  
    # strip a possible ^M char (DOS/WIN)
    $new_val =~ s/\cM//g;
  
    # push to hash if was changed
    $updates{$_} = $new_val
      if defined $q->param("prev_".$_) and $new_val ne $q->param("prev_".$_);
  }
  
  # Notice that we cannot trust the previous values of the variables
  # since they were presented to user as hidden form variables, and
  # of course user can mangle those. In our case we don't care since
  # it cannot make any damage, since as you will see in a minute we
  # verify each variable by the rules we define.
  
  # Process if there is something to process. Will be not called if
  # it's invoked a first time to diplay the form or when the form
  # was submitted but the values weren't modified (we know that by
  # comparing with the previous values of the variables, which are
  # the hidden fields in the form)
  
  # process and update the values if valid
  process_change_config(%updates) if %updates;
  
  # print the update form
  conf_modification_form();
  
  # update the config file but first validate that the values are correct ones
  #########################
  sub process_change_config{
    my %updates = @_; # dereference
  
      # we will list here all the malformatted vars
    my %malformatted = ();
  
    print $q->b("Trying to validate these values<BR>");
    foreach (keys %updates) {
      print "<DT><B>$_</B> => <PRE>$updates{$_}</PRE>";
  
      # now we have to handle each var to be changed very very carefully
      # since this file goes immediately into production!
      $malformatted{$_} = delete $updates{$_}
        unless $validation_rules{$_}->($updates{$_});
  
    } # end of foreach my $var (keys %updates)
  
    # print warnings if there are any invalid changes
    print $q->hr,
      $q->p($q->b(qq{Warning! These variables were attempted to be
                   changed, but found malformed, thus the original
                   value will be preserved.})
         ),
      join(",<BR>",
         map { $q->b($vars_to_change{$_}) . " : $malformatted{$_}\n"
             } keys %malformatted)
        if %malformatted;
  
    # Now complete the vars that weren't changed from the
    # $q->param('prev_var') values
    map { $updates{$_} = $q->param('prev_'.$_) unless exists $updates{$_}
        } keys %vars_to_change;
  
    # Now we have all the data that should be written into dynamic
    # config file
  
      # escape single quotes "'" while creating a file
    my $content = join "\n",
      map { $updates{$_} =~ s/(['\\])/\\$1/g;
          '$c{' . $_ . "}  =  '" . $updates{$_} . "';\n"
        } keys %updates;
  
      # now add '1;' to make require() happy
    $content .= "\n1;";
  
      # check that the generated code is valid Perl; note that this
      # also runs the assignments, which is harmless here since we
      # do() the new file into this process below anyway
    eval $content;
    if ($@) {
      print qq{Warning! Something went wrong with config file
             generation!<P> The error was : <BR><PRE>$@</PRE>};
      return;
    }
  
    print $q->hr;
  
      # overwrite the dynamic config file
    use Symbol ();
    my $fh = Symbol::gensym();
    open $fh, ">$dynamic_config_file.bak"
      or die "Can't open the $dynamic_config_file.bak for writing :$! \n";
    flock $fh, 2;    # exclusive lock
    seek $fh, 0, 0;  # rewind to the start
    truncate $fh, 0; # the file might shrink!
       print $fh $content;
    close $fh;
  
      # OK, now we atomically replace the real file
    rename "$dynamic_config_file.bak",$dynamic_config_file;
  
      # rerun it to update variables in the current process! Note that
      # it wouldn't update the variables in other processes. A special
      # code that watches the timestamps on the config file will do this
      # work for each process. Since the next invocation will update the
      # configuration anyway, why do we need to load it here? The reason
      # is simple, since we are going to fill form's input fields, with
      # the updated data.
    do $dynamic_config_file;
  
  } # end sub process_change_config
  
  ##########################
  sub conf_modification_form{
  
    print $q->center($q->h3("Update Form"));
  
    print $q->hr,
      $q->p(qq{This form allows you to dynamically update the current
             configuration. You don\'t need to restart the server in
             order for the changes to take effect}
           );
  
      # set the previous settings into the form's hidden fields, so we
      # know whether we have to do some changes or not
    map {$q->param("prev_$_",$c{$_}) } keys %vars_to_change;
  
    # rows for the table, go into the form
    my @configs = ();
  
    # prepare the single-line textfield entries
    push @configs,
      map {
        $q->td(
             $q->b("$vars_to_change{$_}:"),
            ),
        $q->td(
             $q->textfield(-name      => $_,
                           -default   => $c{$_},
                           -override  => 1,
                           -size      => 20,
                           -maxlength => 50,
                          )
            ),
          } qw(name release);
  
      # prepare multiline textarea entries
    push @configs,
      map {
        $q->td(
             $q->b("$vars_to_change{$_}:"),
            ),
        $q->td(
             $q->textarea(-name    => $_,
                          -default => $c{$_},
                          -override  => 1,
                          -rows    => 10,
                          -columns => 50,
                          -wrap    => "HARD",
                          )
            ),
          } qw(comments);
  
    print $q->startform('POST',$q->url),"\n",
        $q->center($q->table(map {$q->Tr($_),"\n",} @configs),
                   $q->submit('','Update!'),"\n",
                  ),
        map ({$q->hidden("prev_".$_, $q->param("prev_".$_))."\n" }
             keys %vars_to_change), # hidden previous values
        $q->br,"\n",
        $q->endform,"\n",
        $q->hr,"\n",
        $q->end_html;
  
  } # end sub conf_modification_form

Once updated, the script generates a file like:

  $c{release}  =  '5.6';
  
  $c{name}  =  'Gurusamy Sarathy';
  
  $c{comments}  =  'Perl rules the world!';
  
  1;

[TOC]


Reloading handlers

If you want your PerlHandler to be reloaded on each invocation, the following trick will do it:

  PerlHandler "sub { do 'MyTest.pm'; MyTest::handler(shift) }"

do() will reload MyTest.pm on every request.

[TOC]


Name collisions with Modules and libs

To make things clear before we go into details: each child process has its own %INC hash, which is used to keep track of its compiled modules. The keys of the hash are the names of the modules or the parameters passed to require() (or use()); the values are the full or relative paths to these modules/files. Let's say we have my-lib.pl and MyModule.pm, both located at /home/httpd/perl/my/.

Below, I'm talking about a single child process!

Let's look at three faulty namespace-related scenarios:

Scenario 1

First, you can't have two identical module names running under the same server! Only the first one use()'d or require()'d will be compiled into its package; a request to load the other, identically named module will be skipped, since the server thinks it's already compiled -- it's already in the child's %INC. (See the Watching the server section to find out how you can see what is loaded and from where.)

So if you have two different Foo modules in two different directories and two scripts script1.pl and script2.pl, placed like:

  ./perl/tool1/Foo.pm
  ./perl/tool1/tool1.pl
  ./perl/tool2/Foo.pm
  ./perl/tool2/tool2.pl

Where a sample code could be:

  ./perl/tool1/tool1.pl
  --------------------
  use Foo;
  print "Content-type: text/html\n\n";
  print "I'm Script number One<BR>\n";
  foo();
  --------------------

  ./perl/tool1/Foo.pm
  --------------------
  sub foo{
    print "<B>I'm Tool Number One!</B><BR>\n";
  }
  1;
  --------------------

  ./perl/tool2/tool2.pl
  --------------------
  use Foo;
  print "Content-type: text/html\n\n";
  print "I'm Script number Two<BR>\n";
  foo();
  --------------------

  ./perl/tool2/Foo.pm
  --------------------
  sub foo{
    print "<B>I'm Tool Number Two!</B><BR>\n";
  }
  1;
  --------------------

Both scripts call use Foo; -- but only the first one executed will actually load Foo. When you call the second script, it won't know about Foo at all; it's as if you had forgotten to write use Foo;. Run the server in single server mode to detect this kind of bug immediately.

You will see the following in the error_log file:

  Undefined subroutine
  &Apache::ROOT::perl::tool2::tool2_2epl::foo called at
  /home/httpd/perl/tool2/tool2.pl line 4.

Scenario 2

The above is true for the files you require() as well (assuming that the required files do not declare a package). If you have:

  ./perl/tool1/config.pl
  ./perl/tool1/tool1.pl
  ./perl/tool2/config.pl
  ./perl/tool2/tool2.pl

And both scripts do:

  use lib qw(.);
  require "config.pl";

While the content of the scripts and the config.pl files is exactly as in the previous example, only the first script to run will actually do the require(), for the same reason: %INC already includes the key "config.pl"! The second scenario is no different from the first, since there is no difference between use() and require() if you don't need to import any symbols into the calling script.
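You can watch this happening outside the server by inspecting %INC directly. A self-contained sketch (the tool1/tool2 directory layout is generated on the fly to mirror the example above, and $main::which is a made-up variable):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
for my $tool (qw(tool1 tool2)) {
    mkdir "$dir/$tool" or die "mkdir failed: $!";
    open my $fh, '>', "$dir/$tool/config.pl" or die "open failed: $!";
    print $fh "\$main::which = '$tool';\n1;\n";
    close $fh;
}

# first require: loaded from tool1's directory; %INC gets the key
# "config.pl" with the full path as its value
unshift @INC, "$dir/tool1";
require "config.pl";
print "loaded from: $INC{'config.pl'}\n";
print "which: $main::which\n";            # tool1

# second require: silently skipped -- "config.pl" is already a key in
# %INC, even though tool2's directory is now first on the search path
unshift @INC, "$dir/tool2";
require "config.pl";
print "which: $main::which\n";            # still tool1
```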

Scenario 3

Interestingly, the following scenario won't work either!

  ./perl/tool/config.pl
  ./perl/tool/tool1.pl
  ./perl/tool/tool2.pl

where tool1.pl and tool2.pl both require() the same config.pl.

There are three solutions for this (make sure you read all the way through solution 3):

Solution 1

The first two faulty scenarios can be solved by placing your library modules in a subdirectory structure so that they have different path prefixes. The file system layout will be something like:

  ./perl/tool1/Tool1/Foo.pm
  ./perl/tool1/tool1.pl
  ./perl/tool2/Tool2/Foo.pm
  ./perl/tool2/tool2.pl

And modify the scripts:

  use Tool1::Foo;
  use Tool2::Foo;

For require() (scenario number 2) use the following:

  ./perl/tool1/tool1-lib/config.pl
  ./perl/tool1/tool1.pl
  ./perl/tool2/tool2-lib/config.pl
  ./perl/tool2/tool2.pl

And each script does respectively:

  use lib qw(.);
  require "tool1-lib/config.pl";

  use lib qw(.);
  require "tool2-lib/config.pl";

But this solution isn't good: while it might work for you now, if you later add another script that wants to use the same module or config.pl file, it won't work, as we saw in the third scenario. So let's look at better solutions.

Solution 2

Another option is to use the full path to the file, so that the full path becomes the key in %INC:

  require "/full/path/to/the/config.pl";

This solution solves the first two scenarios. I was surprised but it worked for the third scenario as well!

But with this solution you lose portability! If you move the tool around in the filesystem, you will have to change the base directory.

Solution 3

Declare a package in the required files! (Of course the package name should be unique among all the package names you use.) Since the variables and subroutines now belong to a package of their own, it no longer matters which script loaded the file first -- every script refers to them through the same package name. It's a good idea to use at least two-level package names for your private modules (e.g. MyProject::Carp and not Carp, since that would collide with the existing standard package). Even if no standard module of that name exists at the time you write your code, one might enter the next Perl distribution and your code would break. Foresee problems like this and save yourself future trouble.
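Here is a minimal self-contained sketch of the fix (the file is written on the fly, and MyProject::Config is a made-up name): the required file declares its own package, so every script in the process reaches the same symbols through the package name, regardless of which script loaded the file first.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/config.pl" or die "open failed: $!";
print $fh <<'EOF';
package MyProject::Config;          # the crucial line
use strict;
use vars qw($fname $lname);
$fname = "Larry";
$lname = "Wall";
sub full_name { return "$fname $lname" }
1;
EOF
close $fh;

unshift @INC, $dir;
require "config.pl";

# the symbols live in MyProject::Config, so they are reachable with
# fully qualified names from any package in this process
print MyProject::Config::full_name(), "\n";  # Larry Wall
print "$MyProject::Config::fname\n";         # Larry
```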

What are the implications of package declaration?

When you use()'d or require()'d files without package declarations, it was very convenient, since all the variables and subroutines became part of the main:: package and could be used as if they were part of the main script. With package declarations things get more complicated -- or rather, not complicated but awkward: you have to call a subroutine from package Package as Package::function(), and to access a global variable inside that package you have to write $Package::some_variable, which adds typing overhead. Also, you are unable to access variables lexically defined inside Package (declared with my()) at all.

You can still leave your scripts unchanged if you import the names of the global variables and subroutines into the main:: package's namespace, like:

  use Module qw(:mysubs sub_b $var1 :myvars);

You can export both subroutines and global variables. However, this isn't a good approach, since it consumes more memory in each process. (See perldoc Exporter for information about exporting variables and other symbols.)

This solution completely covers the third scenario. By using different module names in the package declarations, as explained above, you solve the first two as well.

See also perlmodlib and perlmod manpages.

From the above discussion it should be clear that you cannot run development and production versions of the tools on the same Apache server!

You have to run a separate server for each (it still can be the same machine, but the server will use a different port).

[TOC]


More package name related issues

If you have the following:

  PerlHandler Apache::Work::Foo
  PerlHandler Apache::Work::Foo::Bar

If a request pulls in Apache/Work/Foo/Bar.pm first, the Apache::Work::Foo package gets defined as a side effect, so mod_perl does not try to pull in Apache/Work/Foo.pm.
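One way to avoid depending on which handler happens to be invoked first is to preload both modules explicitly at server startup with the PerlModule directive, for example:

  PerlModule Apache::Work::Foo
  PerlModule Apache::Work::Foo::Bar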

[TOC]


__END__ or __DATA__ tokens

Apache::Registry scripts cannot contain __END__ or __DATA__ tokens.

Why? Because Apache::Registry scripts are wrapped in a subroutine called handler. For example, the script at URI /perl/test.pl:

  print "Content-type: text/plain\n\n";
  print "Hi";

When the script is executed under the Apache::Registry handler, it actually becomes:

  package Apache::ROOT::perl::test_2epl;
  use Apache qw(exit);
  sub handler {
    print "Content-type: text/plain\n\n";
    print "Hi";
  }

So if you happen to put an __END__ tag, like:

  print "Content-type: text/plain\n\n";
  print "Hi";
  __END__
  Some text that wouldn't be normally executed

it will be turned into:

  package Apache::ROOT::perl::test_2epl;
  use Apache qw(exit);
  sub handler {
    print "Content-type: text/plain\n\n";
    print "Hi";
    __END__
    Some text that wouldn't be normally executed
  }

and when you try to execute this script, you will receive the following error:

  Missing right bracket at q line 4, at end of line

And that's to be expected: Perl discards everything after the __END__ tag, so the handler subroutine is left without its closing brace. The same applies to the __DATA__ tag.

Also, remember that whatever applies to Apache::Registry scripts applies in most cases to Apache::PerlRun scripts as well.

[TOC]


Output from system calls

Output of system(), exec(), and open(PIPE,"|program") calls will not be sent to the browser unless your Perl was configured with sfio.

[TOC]


Using format() and write()

The Perl tie()'d filehandle interface is not complete; format() and write() are among the missing pieces. If you configure Perl with sfio, write() and format() should work just fine.

[TOC]


Using exit()

Perl's exit() built-in function cannot be used in mod_perl scripts. Calling it causes the server child to exit (which defeats the whole purpose of using mod_perl). The Apache::exit() function should be used instead.

You might start your scripts by overriding the exit sub (if you call Apache::exit() directly, you will have a problem testing the script from the shell, unless you put use Apache (); into your code). I use the following code:

  BEGIN {
      # Auto-detect if we are running under mod_perl or CGI.
    $USE_MOD_PERL = ( (exists $ENV{'GATEWAY_INTERFACE'}
                   and $ENV{'GATEWAY_INTERFACE'} =~ /CGI-Perl/)
                      or exists $ENV{'MOD_PERL'} ) ? 1 : 0;
  }
  use subs (exit);
  
  # Select the correct exit way
  ########
  sub exit{
      # Apache::exit(-2) will cause the server to exit gracefully,
      # once logging happens and protocol, etc  (-2 == Apache::Constants::DONE)
    $USE_MOD_PERL ? Apache::exit(0) : CORE::exit(0);
  }

Now the correct exit() will always be chosen, whether you run the script as a CGI or from the shell.

Note that if you run the script under Apache::Registry, the Apache function exit() overrides the Perl core built-in. While exit() is listed in @EXPORT_OK of the Apache package, Apache::Registry does something you don't see and imports this function for you. This means that if your script runs under the Apache::Registry handler (Apache::PerlRun as well), you don't have to worry about exit().

Note that if you still use CORE::exit() in scripts running under mod_perl, the child will exit, but neither a proper exit nor logging will happen on the way. CORE::exit() cuts off the server's legs... If you need to shut down the child cleanly, use $r->child_terminate (which sets the internal MaxRequestsPerChild counter so the child exits after the current request).

You can accomplish this in two ways. In the Apache::Registry script:

  Apache->request->child_terminate;

in httpd.conf:

  PerlFixupHandler "sub { shift->child_terminate }"

[TOC]


Running from shell

Your scripts will not run from the command line (yet) unless you use CGI::Switch or CGI.pm and perl 5.004+, and do not make any direct calls to Apache->methods.

[TOC]


I/O is different

If you are using Perl 5.004 or later, most CGI scripts can run under mod_perl untouched. If you're using 5.003, Perl's built-in read() and print() functions do not work as they do under CGI. If you're using CGI.pm, use $query->print instead of plain print().

[TOC]


Special Perl Variables

Special Perl variables like $| (buffering), $^T (request start time), $^W (warnings), $/ (input record separator), $\ (output record separator) and many more are all global variables. This means that you cannot localize them with my(); only local() can do that. Since the child server doesn't quit, if one of your scripts modifies a global variable, the change persists for the rest of the process' life and affects all the scripts executed by the same process.

With this in mind, you should never write code like the following. We will use the input record separator as our example: if you undefine this variable, the diamond operator reads the whole file in one gulp.

  $/ = undef; 
  open IN, "file" ....
    # slurp it all inside a variable
  $all_the_file = <IN>;

The proper way is to use the local() keyword when the special variable is changed:

  local $/ = undef; 
  open IN, "file" ....
    # slurp it all inside a variable
  $all_the_file = <IN>;

But there is a little catch: local() propagates the changed value to all the code called below it, and the change stays in effect until the script finishes, unless it is modified somewhere else.

A cleaner approach is to enclose all the code affected by the modified variable in a block:

  {
    local $/ = undef; 
    open IN, "file" ....
      # slurp it all inside a variable
    $all_the_file = <IN>;
  }

That way, when Perl leaves the block, the original value of $/ is restored, and you don't have to worry about this variable anywhere else in your program.
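The restoration can be verified with a small standalone script (plain Perl, no mod_perl needed):

```perl
#!/usr/bin/perl -w
use strict;

# Verify that a local() change to $/ inside a block is undone on exit.
my $before = defined $/ ? $/ : '(undef)';
my $slurp_mode_inside;
{
    local $/ = undef;          # slurp mode, only within this block
    $slurp_mode_inside = defined $/ ? 0 : 1;
}
my $after = defined $/ ? $/ : '(undef)';
print "inside the block: slurp mode\n" if $slurp_mode_inside;
print "after the block: restored\n"    if $before eq $after;
```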

[TOC]


Generating correct HTTP Headers

When writing your own handlers with Perl API the proper way to send the HTTP Header is to set the header first and then to send it. Like:

  $r->content_type('text/html');
  $r->send_http_header;
  return OK if $r->header_only;

If the client issues an HTTP HEAD request rather than the usual GET, to be compliant with the HTTP protocol we should send only the HTTP header, not the document body. When Apache receives a HEAD request, it sets header_only() to true. If we see that this has happened, we return from the handler immediately with an OK status code.

Generally, you don't need the explicit content type setting, since Apache does it for you, by looking up the MIME type of the request by matching the extension of the URI in the MIME tables (from the mime.types file). So if the request URI is /welcome.html, the text/html content-type will be picked. However for CGI scripts or URIs that cannot be mapped by a known extension, you should set the appropriate type by using content_type() method.

The situation is a little different with Apache::Registry and similar handlers. If you take a basic CGI script like:

  print "Content-type: text/plain\n\n";
  print "Hello world";

it won't work, because the HTTP header will not be sent. By default, mod_perl does not send any headers by itself; however, you may wish to change this by adding:

  PerlSendHeader On

in the <Location> section of your configuration. Now the response line and common headers will be sent just as mod_cgi sends them. And, just as with mod_cgi, PerlSendHeader will not send the MIME type and the terminating double newline. Your script must send that itself, e.g.:

  print "Content-type: text/html\r\n\r\n";

Note that the book always uses ``\n\n'' and not ``\r\n\r\n''. The latter is the way to send new lines as defined by the HTTP standard, but at the moment all browsers accept the former format as well. To follow the HTTP protocol strictly you must use the ``\r\n'' format.

The PerlSendHeader On directive tells mod_perl to intercept anything that looks like a header line (such as Content-Type: text/plain) and automatically turn it into a correctly formatted HTTP/1.0 header, the same way it happens with CGI scripts running under mod_cgi. This allows you to keep your CGI scripts unmodified.

There is a $ENV{PERL_SEND_HEADER} variable which tells whether PerlSendHeader is On or Off. You can use it in your module like this:

 if($ENV{PERL_SEND_HEADER}) {
     print "Content-type: text/html\n\n";
 }
 else {
     my $r = Apache->request;
     $r->content_type('text/html');
     $r->send_http_header;
 }

If you use CGI.pm's header() function to generate HTTP headers, you do not need to activate this directive because CGI.pm detects mod_perl and calls send_http_header() for you. However, it does not hurt to use this directive anyway.

There is no free lunch: you get the mod_cgi behavior at the cost of a small but real overhead of parsing the text being sent, and mod_perl makes the assumption that individual headers are not split across print statements.

The Apache::print() routine has to gather up the headers that your script outputs, in order to pass them to $r->send_http_header. This happens in src/modules/perl/Apache.xs (print) and Apache/Apache.pm (send_cgi_header). There is a shortcut in there, namely the assumption that each print statement contains one or more complete headers. If, for example, you generate a Set-Cookie header with multiple print() statements, like:

   print "Content-type: text/html\n";
   print "Set-Cookie: iscookietext\; ";
   print "expires=Wednesday, 09-Nov-1999 00:00:00 GMT\; ";
   print "path=\/\; ";
   print "domain=\.mmyserver.com\; ";
   print "\n\n";
   print "hello";

your generated Set-Cookie header is split over a number of print statements and gets lost. The above example wouldn't work! Try this instead:

   print "Content-type: text/html\n";
   my $cookie = "Set-Cookie: iscookietext\; ";
   $cookie .= "expires=Wednesday, 09-Nov-1999 00:00:00 GMT\; ";
   $cookie .= "path=\/\; ";
   $cookie .= "domain=\.mmyserver.com\; ";
   print $cookie;
   print "\n\n";
   print "hello";

Sometimes when you call a script you see an ugly "Content-Type: text/html" displayed at the top of the page, and of course the HTML rendering is broken. As the above discussion suggests, this generally happens when your code has already sent the header, which is why you see the second one rendered into the browser's page. This might happen when you call the CGI.pm $q->header method or mod_perl's $r->send_http_header after a header has already been sent.

If you have a complicated application where the header might be generated from many different places, depending on the calling logic, you might want to write a special subroutine that sends a header and keeps track of whether the header has already been sent. Of course you can use a global variable to flag that the header has already been sent, but there is a more elegant solution, where the closure effect is a desired feature.

Just copy the code below, including the block's curly braces, and everywhere your code prints a header, use the print_header() subroutine instead. $need_header is the same kind of beast as a static variable in C: it remembers its value from call to call. The first time you call print_header(), the header is printed and $need_header becomes zero; on any subsequent calls the header is not sent again.

  {
    my $need_header = 1;
    sub print_header {
      my $type = shift || "text/html";
      print("Content-type: $type\n\n"), $need_header = 0 if $need_header;
    }
  }

In your code you call the above subroutine as:

  print_header();

or

  print_header("text/plain");

if you want to override the default (text/html) MIME type.

Let's make our smart subroutine aware of the PerlSendHeader directive's setting, so it always does the right thing. This is especially important if you write an application that you are going to distribute, hopefully as open source.

  {
    my $need_header = 1;
    sub print_header {
      my $type = shift || "text/html";
      return unless $need_header;
      $need_header = 0;
      if($ENV{PERL_SEND_HEADER}) {
        print "Content-type: $type\n\n";
      }
      else {
        my $r = Apache->request;
        $r->content_type($type);
        $r->send_http_header;
      }
    }
  }

You can continue to improve this subroutine even further to handle additional headers, such as cookies.

[TOC]


NPH (Non Parsed Headers) scripts

To run a Non Parsed Header CGI script under mod_perl, simply add to your code:

  local $| = 1;

And if you normally set PerlSendHeader On, add this to your server's configuration file:

  <Files */nph-*>
    PerlSendHeader Off
  </Files>

[TOC]


BEGIN blocks

Perl executes BEGIN blocks during the compile time of code as soon as possible. The same is true under mod_perl. However, since mod_perl normally only compiles scripts and modules once -- either in the parent server or once per-child -- BEGIN blocks in that code will only be run once. As perlmod manpage explains, once a BEGIN has run, it is immediately undefined. In the mod_perl environment, this means BEGIN blocks will not be run during each incoming request unless that request happens to be one that is compiling the code.
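The compile-time, run-once behavior is easy to observe with a plain standalone script (this uses the common my $var; BEGIN { ... } idiom; nothing mod_perl specific here):

```perl
#!/usr/bin/perl -w
use strict;

# BEGIN runs as soon as the block has been compiled -- before the rest
# of the file's runtime code runs, and only once per compilation.
my @order;
BEGIN { push @order, 'begin' }   # executed at compile time
push @order, 'runtime';          # executed at run time
print join(',', @order), "\n";   # begin,runtime
```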

BEGIN blocks in modules and files pulled in via require() or use() will be executed:

BEGIN blocks in Apache::Registry scripts will be executed, as above plus:

Make sure you read Evil things might happen when using PerlFreshRestart.

[TOC]


END blocks

As perlmod explains, an END subroutine is executed as late as possible, that is, when the interpreter exits. In the mod_perl environment, the interpreter does not exit until the server shuts down. However, mod_perl does make a special case for Apache::Registry scripts.

Normally, END blocks are executed by Perl during its perl_run() function, which is called once each time the Perl program is executed, e.g. once per (mod_cgi) CGI script. However, mod_perl only calls perl_run() once, during server startup. Any END blocks encountered during main server startup, i.e. those pulled in by PerlRequire or by any PerlModule, are suspended and run at server shutdown, aka child_exit() (requires apache 1.3b3+).

Any END blocks that are encountered during compilation of Apache::Registry scripts are called after the script has completed (not during the cleanup phase though) including subsequent invocations when the script is cached in memory.

All other END blocks encountered during other Perl*Handler call-backs, e.g. PerlChildInitHandler, will be suspended while the process is running and called during child_exit() when the process is shutting down. Module authors might wish to use $r-&gt;register_cleanup() as an alternative to END blocks if this behavior is not desirable. $r-&gt;register_cleanup() is being called at the CleanUp processing phase of each request and thus can be used to emulate plain perl's END{} block behavior.

The last paragraph is very important for the Handling the 'User pressed Stop button' case.

[TOC]


Switches -w, -T

Normally when you run perl from the command line or have the shell invoke it with `#!', you may choose to pass perl switch arguments such as -w or -T. Most command line arguments have an equivalent special variable. For example, the $^W variable corresponds to the -w switch. Consult the perlvar manpage for more details. With mod_perl it is also possible to turn on warnings globally via the PerlWarn directive:

  PerlWarn On

You can turn warnings off with local $^W = 0; in your scripts on a local basis (i.e. inside a block). If you write $^W = 0; you disable the warning mode everywhere in the process; likewise, $^W = 1; enables it everywhere.
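The difference can be seen in a standalone sketch (the global assignment is shown here only so the restoring effect of local() is visible):

```perl
#!/usr/bin/perl
use strict;

$^W = 1;                  # warnings on for the whole process (global!)
{
    local $^W = 0;        # warnings off only within this block
    my $x;
    my $y = $x + 0;       # would normally warn "uninitialized value"
}
print "warnings back on\n" if $^W;   # $^W restored to 1 on block exit
```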

The switch which enables taint checks does not have a special variable, so mod_perl provides the PerlTaintCheck directive to turn on taint checks. In httpd.conf, enable with:

  PerlTaintCheck On

Now, any and all code compiled inside httpd will be taint checked.

The environment variable PERL5OPT can be used to set additional perl startup flags such as -d and -D. See Apache::PerlRun .

If you have a shebang line (#!/bin/perl -Tw) in your script, -w will be honored (meaning the warning mode is turned on for the scope of this script). -T will produce a warning if PerlTaintCheck is not On.

[TOC]


strict pragma

It's _absolutely_ mandatory (at least for development) to start all your scripts with:

  use strict;

If needed, you can always turn off the 'strict' pragma or a part of it inside the block, e.g:

  {
    no strict 'refs';
    ... some code
  }

Having the strict pragma enabled is more important under mod_perl than anywhere else. While it's not required, it is strongly recommended and will save you a lot of time in the long run. And, of course, clean scripts will still run under mod_cgi (plain CGI)!

[TOC]


Turning warnings ON

Put local $^W=1 in the script, or PerlWarn On in the server configuration file. Turning warnings on will save you a lot of trouble while debugging your code. Note that all perl switches in the first magic (shebang) line of the script #!/perl -switches, except -w, are ignored by mod_perl. If you write -T you will be warned to set PerlTaintCheck On in the config file.

If you need to, you can always turn off the warnings with local $^W=0 in a section of code you don't want the perl compiler to warn about. The correct way to do this is:

  {
   local $^W=0;
    # some code
  }

This preserves the previous value of $^W when you leave the block (so if it was set before the block, it will be set again on exit).

In production code, it can be a good idea to turn warnings off. Otherwise, if your code isn't very clean and spits a few lines of warnings here and there, you will end up with a huge error_log file in a short time on a heavily loaded server. Also, enabling runtime warning checking has a small performance impact -- in any script, not just under mod_perl -- so your approach should be to enable warnings during development, and then disable them when your code is production-ready. Controlling the warning mode through httpd.conf is much better, since you can control the behavior of all the scripts from a central place. I have PerlWarn On on my development server and PerlWarn Off on the production machine.

The diagnostics pragma can shed more light on the errors and warnings you see, but again, it's better not to use it in production, since otherwise you incur the large overhead of the diagnostics pragma examining every bit of code mod_perl executes. (You can run your script with -d:DProf to check the overhead. See Devel::DProf for more info.)

[TOC]


diagnostics pragma

This is a Perl compiler pragma which forces verbose warning diagnostics. Put at the start of your scripts:

  use diagnostics;

This pragma turns on the -w mode, but gives you much better diagnostics of the errors and warnings encountered. Generally it explains the reason for warnings/errors you get, shows you an example of code where the same kind of warning is being triggered, and tells you the remedy.

Again, it's a bad idea to keep it in your production code, as it will spit ten or more lines of diagnostic messages into your error_log file for every warning perl reports for the first time (per invocation). Also, it adds a significant overhead to the code's runtime. (I discovered this by using Devel::DProf!)

[TOC]


Passing ENV variables to CGI

To pass an environment variable from a configuration file, add to it:

  PerlSetEnv key val
  PerlPassEnv key

e.g.:

  PerlSetEnv PERLDB_OPTS "NonStop=1 LineInfo=/tmp/db.out AutoTrace=1"

will set $ENV{PERLDB_OPTS}, and it'll be accessible in every child.

%ENV is only set up for CGI emulation. If you are using the API, you should use $r->subprocess_env, $r->notes or $r->pnotes for passing data around between handlers. %ENV is slow because it must update the underlying C environment table; it also exposes the data to other users who can run ps.

In any case, %ENV and the tables used by those methods are all cleared after the request is served. So, no, $ENV{SESSION_ID} will not be swapped or reused by different HTTP requests.

[TOC]


Global Variables

It's always a good idea to stay away from global variables when possible. Some variables must be global so Perl can see them, such as a module's @ISA or $VERSION variables (or their fully qualified forms, like @MyModule::ISA). In common practice, a combination of the strict and vars pragmas keeps modules clean and reduces a bit of noise. However, the vars pragma also creates aliases, as the Exporter does, and these aliases eat up more memory. When possible, try to use fully qualified names instead of use vars. For example:

  package MyPackage;
  use strict;
  @MyPackage::ISA = qw(...);
  $MyPackage::VERSION = "1.00";

vs.

  package MyPackage;
  use strict;
  use vars qw(@ISA $VERSION);
  @ISA = qw(...);
  $VERSION = "1.00";

Also see Using global variables and sharing them

[TOC]


Code has been changed, but it seems the script is running the old code

Files pulled in via use or require statements are not automatically reloaded when changed on disk. See Reloading Modules and Required Files for more info.

[TOC]


Apache and syslog

When native syslog support is enabled, the stderr stream will be redirected to /dev/null!

It has nothing to do with mod_perl (plain Apache does the same). Doug wrote an Apache::LogSTDERR module to work around this.

[TOC]


Memory leakage

Scripts under mod_perl can very easily leak memory! Global variables stay around indefinitely; lexical variables (declared with my()) are destroyed when they go out of scope, provided there are no references to them from outside that scope.

Perl doesn't return the memory it acquired from the kernel; it does reuse it though!

The first example demonstrates reading in a whole file:

  open IN, $file or die $!;
  local $/ = undef; # will read the whole file in
  $content = <IN>;
  close IN;

If your file is 5MB, the child that serves that script will grow by exactly that size. Now if you have 20 children, and all of them serve this CGI, together they will consume an additional 20*5MB = 100MB of RAM! If that's the case, try other approaches to processing the file, if possible. Try to process a line at a time and print it back to the file. If you need to modify the file itself, use a temporary file; when finished, overwrite the source file, and make sure to provide a locking mechanism!
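A line-at-a-time version might look like the following standalone sketch (the file names are made up for the demonstration, and the snippet creates its own input so it can run anywhere; locking is omitted for brevity):

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical file names, used only for this demonstration.
my $src = "/tmp/demo_src_$$.txt";
my $tmp = "$src.tmp";

# Create a small input file so the example is self-contained.
open my $mk, '>', $src or die $!;
print $mk "foo line $_\n" for 1 .. 3;
close $mk;

# Process one line at a time: memory use stays at roughly one line's
# worth, no matter how big the file is.
open my $in,  '<', $src or die $!;
open my $out, '>', $tmp or die $!;
while (my $line = <$in>) {
    $line =~ s/foo/bar/;      # per-line transformation
    print $out $line;
}
close $in;
close $out;
rename $tmp, $src or die $!;  # replace the original with the result

open my $check, '<', $src or die $!;
my $first_line = <$check>;
close $check;
unlink $src;
print $first_line;            # bar line 1
```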

The second example demonstrates copying variables between functions (passing variables by value). Let's use the example above, assuming we have no choice but to read the whole file before any data processing takes place. Now imagine some process() subroutine that processes the data and returns it. What happens if you pass $content by value? You have just copied another 5MB, and the child has grown by another 5MB in size (watch your swap space!). Now multiply it again by a factor of 20 and you have 200MB of wasted RAM, which will eventually be reused, but it's still a waste! Whenever you think a variable can grow bigger than a few KB, pass it by reference!

Once I wrote a script that passed the contents of a little flat-file database to a function that processed it by value. It worked, and it was fast, but over time the database grew, so passing it by value became overkill. I had to decide whether to buy more memory or to rewrite the code. It's obvious that adding more memory is merely a temporary solution. So it's better to plan ahead and pass variables by reference if a variable you are going to pass might be bigger than you expect at coding time. There are a few ways to pass and use variables passed by reference. For example:

  my $content = qq{foobarfoobar};
  process(\$content);
  sub process{
    my $r_var = shift; 
    $$r_var =~ s/foo/bar/gs;
      # nothing returned - the variable $content outside has been
      # already modified
  }
  
  @{$var_lr} -- dereferences an array
  %{$var_hr} -- dereferences a hash

For more info see perldoc perlref.

Another approach is to use the @_ array directly. Because the elements of @_ are aliases to the caller's arguments, using it directly does the job of passing by reference:

  process($content);
  sub process{
    $_[0] =~ s/foo/bar/gs;
      # nothing returned - the variable $content outside has been
      # already modified
  }

From perldoc perlsub:

      The array @_ is a local array, but its elements are aliases for
      the actual scalar parameters.  In particular, if an element
      $_[0] is updated, the corresponding argument is updated (or an
      error occurs if it is not possible to update)...

Be careful when you write this kind of subroutine, since it can confuse a potential user. It's not obvious that a call like process($content); modifies the passed variable. Programmers (the users of your library in this case) are used to subs that either modify variables passed by reference or return the processed variable (e.g. $content = process($content);).
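To contrast the two conventions, here is a small runnable example with two hypothetical helpers, one modifying its argument in place via @_ aliasing and one returning a processed copy:

```perl
#!/usr/bin/perl -w
use strict;

sub process_inplace {
    # $_[0] is an alias to the caller's variable
    $_[0] =~ s/foo/bar/g;
    return;
}

sub process_copy {
    my $s = shift;               # shift makes a copy of the argument
    $s =~ s/foo/bar/g;
    return $s;
}

my $x = "foofoo";
process_inplace($x);             # $x is modified in place

my $orig = "foofoo";
my $copy = process_copy($orig);  # $orig untouched, copy returned

print "$x $orig $copy\n";        # barbar foofoo barbar
```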

The third example deals with databases. When doing database processing, you often need to read lots of records into your program and then print them to the browser after they are formatted. (I won't even mention the horrible case where programmers read in the whole database and then use perl to process it! Use a relational database and let the SQL do the job, so you get only the records you need!)

We will use DBI for this, assuming we are already connected to the database (refer to perldoc DBI for a complete manual of the DBI module):

  $sth->execute;
  while (@row_ary = $sth->fetchrow_array) {
        <do DB accumulation into some variable>
  }
  <print the output using the data returned from the DB>

In the example above the httpd process will grow by the size of the variables allocated for the records that matched the query. (Again, remember to multiply it by the number of children your server runs!)

A better approach is not to accumulate the records, but to print them as they are fetched from the DB. Moreover, we will use the bind_columns() and $sth->fetchrow_arrayref() (aliased to $sth->fetch()) methods to fetch the data in the fastest possible way. The example below prints an HTML TABLE with the matched data; the only memory used is a @cols array holding temporary row values:

  my @select_fields = qw(a b c);
      # create a list of cols values
  my @cols = ();
  @cols[0..$#select_fields] = ();
  $sth = $dbh->prepare($do_sql);
  $sth->execute;
    # Bind perl variables to columns.
  $sth->bind_columns(undef,\(@cols));
  print "<TABLE>";
  while($sth->fetch) {
     print "<TR>",
           map("<TD>$_</TD>", @cols),
           "</TR>";
  }
  print "</TABLE>";

Note: the above method doesn't let you know how many records have been matched. The workaround is to run an identical query beforehand using SELECT count(*) ... instead of SELECT * ... to get the number of matched records. It should be much faster, since you can remove any ORDER BY and similar clauses.

For those who think that $sth->rows will do the job, here is a quote from the DBI manpage:

  rows();

  $rv = $sth->rows;

  Returns the number of rows affected by the last database altering
  command, or -1 if not known or not available.  Generally you can
  only rely on a row count after a do or non-select execute (for some
  specific operations like update and delete) or after fetching all
  the rows of a select statement.

  For select statements it is generally not possible to know how many
  rows will be returned except by fetching them all.  Some drivers
  will return the number of rows the application has fetched so far
  but others may return -1 until all rows have been fetched. So use of
  the rows method with select statements is not recommended.

As a bonus, I wanted to write a single sub that flexibly processes any query, accepting conditions, a callback closure sub, select fields and restrictions.

  # Usage:
  # $o->dump(\%conditions,\&callback_closure,\@select_fields,@restrictions);
  #
  sub dump{
    my $self = shift;
    my %param = %{+shift}; # dereference hash
    my $rsub = shift;
    my @select_fields = @{+shift}; # dereference list
    my @restrict = shift || '';
  
      # create a list of cols values
    my @cols = ();
    @cols[0..$#select_fields] = ();
  
    my $do_sql = '';
    my @where = ();
  
      # make a @where list 
    map { push @where, "$_=\'$param{$_}\'" if $param{$_};} keys %param;
  
      # prepare the sql statement
    $do_sql = "SELECT ";
    $do_sql .= join(" ", @restrict) if @restrict;# append the restriction list
    $do_sql .= " " .join(",", @select_fields) ;      # append the select list 
    $do_sql .= " FROM $DBConfig{TABLE} ";         # from table
  
      # we will not add the WHERE clause if @where is empty
    $do_sql .= " WHERE " . join " AND ", @where if @where;
  
    print "SQL: $do_sql \n" if $debug;
  
    $dbh->{RaiseError} = 1;     # do this, or check every call for errors
    $sth = $dbh->prepare($do_sql);
    $sth->execute;
      # Bind perl variables to columns.
    $sth->bind_columns(undef,\(@cols));
    while($sth->fetch) {
      &$rsub(@cols);
    }
      # print the tail or "no records found" message
      # according to the previous calls
    &$rsub();
  
  } # end of sub dump

Now a callback closure sub can do lots of things. We need a closure to know what stage we are in: header, body or tail. For example, we want a callback closure for formatting the rows to print:

  my $rsub = eval {
      # make a copy of @fields list, since it might go
      # out of scope when this closure will be called
    my @fields = @fields; 
    my @query_fields = qw(user dir tool act); # no date field!!!
    my $header = 0;
    my $tail   = 0;
    my $counter = 0;
    my %cols = (); # columns name=> value hash
  
    # Closure with the following behavior:
    # 1. Header's code will be executed on the first call only and
    #    if @_ was set
    # 2. Row's printing code will be executed on every call with @_ set
    # 3. Tail's code will be executed only if Header's code was
    #    printed and @_ isn't set
    # 4. "No record found" code will be executed if Header's code
    #    wasn't executed
  
    sub {
          # Header
        if (@_ and !$header){
          print "<TABLE>\n";
          print $q->Tr(map{ $q->td($_) } @fields );
          $header = 1; 
        }
        
          # Body
        if (@_) {
          print $q->Tr(map{$q->td($_)} @_ );
          $counter++;
          return; 
        }
        
          # Tail, will be printed only at the end
        if ($header and !($tail or @_)){
          print "</TABLE>\n $counter records found";
          $tail = 1;
          return;
        }
        
          # No record found
        unless ($header){
          print $q->p($q->center($q->b("No record was found!\n")));
        }
  
      }  #  end of sub {}
  };  #  end of my $rsub = eval {

You might also want to check Limiting the size of the processes and Limiting the resources used by httpd children.

[TOC]


Filehandle and lock leakage

When you write a script running under mod_cgi, you can get away with sloppy programming, like opening a file and letting the interpreter close it for you when the script finishes its run:

  open IN, "in.txt" or die "Cannot open in.txt for reading : $!\n";

Under mod_perl you must close() the files you opened! Add:

  close IN;

somewhere before the end of the script. If you forget to close(), you might get file descriptor leakage and unlocking problems (if you flock()ed the file). Even if the close() call is there, the interpreter might be stopped before reaching it for various reasons, such as the user aborting the script (see Handling the 'User pressed Stop button' case), and the leakage remains. In the long run your machine might run out of file descriptors, and even worse, a file might be left locked and unusable by other invocations of the same and other scripts.

What can you do? Use IO::File (and the other IO::* modules), which lets you assign the file handle to a variable that can be my() (lexically) scoped. When this variable goes out of scope, the file (or other file system entity) is properly closed and unlocked (if it was locked). A lexically scoped variable always goes out of scope at the end of the script's run, even if the script was aborted in the middle, or earlier if it was defined inside some internal block. For example:

  {
    my $fh = new IO::File("filename") or die $!;
    # read from $fh
  } # ...$fh is closed automatically at end of block, without leaks.
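The close-on-scope-exit behavior can be checked with a runnable snippet (the file name is made up; the snippet creates and removes its own file):

```perl
#!/usr/bin/perl -w
use strict;
use IO::File;

# Hypothetical file name, used only for this demonstration.
my $path = "/tmp/demo_io_$$.txt";
{
    my $fh = IO::File->new(">$path") or die $!;
    print $fh "hello\n";
}   # $fh goes out of scope: the handle is flushed and closed here

# Had the handle not been closed, the buffered data might not be on
# disk yet; reading it back shows that it is.
open my $in, '<', $path or die $!;
my $line = <$in>;
close $in;
unlink $path;
print $line;   # hello
```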

As I have just mentioned, you don't have to create a special block for this purpose; the file the code is written in is a virtual block as well, so you can simply write:

  my $fh = new IO::File("filename") or die $!;
    # read from $fh
    # ...$fh is closed automatically at end of block, without leaks.

What the first technique (using { BLOCK }) ensures is that the file is closed the moment the block is finished.

An even faster and lighter technique is to use Symbol.pm:

  my $fh = Symbol::gensym();
  open $fh, "filename" or die $!;

Use these approaches to ensure you have no leakage, but don't be too lazy to write close() statements anyway; make it a habit.

[TOC]


The script is too dirty, but it does the job and I can't afford to rewrite it.

You can still benefit from using mod_perl.

One approach is to replace the Apache::Registry handler with Apache::PerlRun and define a new location (the script can reside in the same directory on the disk):

  # srm.conf
  Alias /cgi-perl/ /home/httpd/cgi/
  
  # httpd.conf
  <Location /cgi-perl>
    #AllowOverride None
    SetHandler perl-script
    PerlHandler Apache::PerlRun
    Options ExecCGI
    allow from all
    PerlSendHeader On
  </Location>

See Apache::PerlRun - a closer look

Another ``bad'' but working method is to set MaxRequestsPerChild to 1, which forces each child to exit after serving only one request. You still get the preloaded modules, but the script is compiled on each request and the child is then killed off. This isn't good for ``high-traffic'' sites, though, as the parent server needs to fork a new child each time one is killed. You can fiddle with StartServers and MinSpareServers so the parent spawns more servers ahead of time and a killed child is immediately replaced with a fresh one. Again, that's probably not what you want.
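The child-recycling setup described above might look like this in httpd.conf (the numbers are purely illustrative, not recommendations):

```
# httpd.conf -- illustrative values only
MaxRequestsPerChild 1
StartServers       10
MinSpareServers     5
MaxSpareServers    20
```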

[TOC]


Apache::PerlRun - a closer look

Apache::PerlRun gives you the benefit of a preloaded perl and its modules. This module's handler emulates the CGI environment, allowing programmers to write scripts that run under CGI or mod_perl without any change. Unlike Apache::Registry, the Apache::PerlRun handler does not cache the script inside a subroutine. Scripts are ``compiled'' on each request. After the script has run, its namespace is flushed of all variables and subroutines. Still, you avoid the overhead of loading the perl interpreter and compiling the standard modules. (If your script is very light but uses lots of standard modules, you will see no difference between Apache::PerlRun and Apache::Registry!)

Be aware, though, that if you use packages whose internal variables form circular references, they will not be flushed! Apache::PerlRun flushes only your script's namespace, which does not include the namespaces of any other required packages. If a reference to a my() scoped variable keeps it from being destroyed after leaving the eval scope (of Apache::PerlRun), that cleanup might not happen until the server is shut down and perl_destruct() is run, which always happens after running command line scripts. Consider this example:

  package Foo;
  sub new { bless {} }
  sub DESTROY {
    warn "Foo->DESTROY\n";
  }
  
  eval <<'EOF';
  package my_script;
  my $self = Foo->new;
  #$self->{circle} = $self;
  EOF
  
  print $@ if $@;
  print "Done with script\n";

First you'll see:

  Foo->DESTROY
  Done with script

Then, uncomment the line where $self makes a circular reference, and you'll see:

  Done with script
  Foo->DESTROY

In this case, under mod_perl you wouldn't see Foo->DESTROY until the server shutdown, unless your module properly takes care of things.
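If you control the package, you can break such a cycle yourself, for example with Scalar::Util::weaken() (a facility added to perl after this guide was written, so treat this as a sketch, not part of the original recipe):

```perl
use Scalar::Util qw(weaken);

package Foo;
sub new     { bless {}, shift }
sub DESTROY { $main::destroyed++ }

package main;
our $destroyed = 0;
{
    my $self = Foo->new;
    $self->{circle} = $self;     # the circular reference
    weaken($self->{circle});     # demote it to a weak reference
}                                # refcount drops to zero right here
print "destroyed: $destroyed\n";
```

A weak reference does not count towards the reference count, so the object is destroyed as soon as the lexical goes out of scope instead of lingering until global destruction.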

[TOC]


Sharing variables between processes

META: to be completed

[TOC]


Redirecting Errors to Client instead of error_log

To trap all (or most) Perl run-time errors and send the output to the client instead of to Apache's error_log, add this line to your script:

  use CGI::Carp qw(fatalsToBrowser);

Refer to CGI::Carp man page for more related info.

You can also write a custom __DIE__/__WARN__ signal handler. I don't want users to see the error message, but I do want it emailed to me if it's severe enough. The handler traps various errors and acts according to the defined logic. My handler was written for the mod_perl environment, but it works correctly when called from the shell as well. A stripped version of the code is shown here:

  # assign the DIE sighandler to call mydie(error_message) whenever a
  # die() sub is being called. Can be added anywhere in the code.
  local $SIG{'__DIE__'} = \&mydie;
  
Do not forget the local(), unless you want this signal handler to be
invoked every time any script dies (even where this treatment is
undesirable).

  # and the handler itself
  sub mydie{
    my $why = shift;
  
    my $UNDER_MOD_PERL = ( (exists $ENV{'GATEWAY_INTERFACE'} 
                           and $ENV{'GATEWAY_INTERFACE'} =~ /CGI-Perl/)
                         or exists $ENV{'MOD_PERL'} ) ? 1 : 0;
  
    chomp $why;
    my $orig_why = $why;                # an ASCII copy for email report
  
    # handle the shell execution case (so we will not get all the HTML)
    print("Error: $why\n"), exit unless $UNDER_MOD_PERL;
  
    my $should_email = 0;
    my $message = '';
  
    $why =~ s/[<&>]/"&#".ord($&).";"/ge;    # entity escape
  
    # Now we need to trap various kinds of errors that come from CGI.pm.
    # We don't want these errors to be emailed to us, since
    # they aren't programming errors
    if ($orig_why =~ /Client attempted to POST (\d+) bytes/o) {
  
      $message = qq{
                  You can not POST messages bigger than 
                  @{[1024*$c{max_image_size}]} bytes.<BR>
                  You have tried to post $1 bytes<BR>
                  If you are trying to upload an image, make sure its size is not 
                  bigger than @{[1024*$c{max_image_size}]} bytes.<P>
                  Thank you!
                 };
  
    } elsif ($orig_why =~ /Malformed multipart POST/o) {
  
      $message = qq{
                  Have you tried to upload an image in the wrong way?<P>
                  To successfully upload an image you must use a browser that supports
                  image upload and use the 'Browse' button to select that image.
                  DO NOT type the path to the image into the upload field.<P>
                  Thank you!
                 };
  
    } elsif ($orig_why =~ /closed socket during multipart read/o) {
  
      $message = qq{
                  Have you pressed a 'STOP' button?<BR>
                  Please try again!<P>
                  Thank you!
                 };
  
    } else {
  
      $message = qq{
                    <B>There is no action to be performed on your side, since
                  the error report has been already sent to webmaster. <BR><P>
                  <B>Thank you for your patience!</B>
                 };
  
      $should_email = 1;
    }
  
  
    print qq|Content-type: text/html
  
  <HTML><BODY BGCOLOR="white">
  <B>Oops, An error has happened.</B><P>
    |;  
  
    print $message;
  
      # send email report if appropriate
    if ($should_email){
  
        # import sendmail subs
      use Mail ();
        # prepare the email error report:
      my $subject ="Error Report";
      my $body = qq|
    An error has happened:
  
    $orig_why
  
      |;
  
        # send error reports to admin and author
      send_mail($c{email}{'admin'},$c{email}{'admin'},$subject,$body);
      send_mail($c{email}{'admin'},$c{email}{'author'},$subject,$body);
      print STDERR "[".scalar localtime()."] [SIGDIE] Sending Error Email\n";
    }
  
       # print to error_log so we will know we've sent
    print STDERR "[".scalar localtime()."] [SIGDIE] $orig_why \n";
  
    exit 1;
  }                             # end of sub mydie
  

You may have noticed that I trap CGI.pm's die() calls here. I don't see any reason why my users should see ugly error messages, but that's the way CGI.pm is written. The workaround is to trap them myself.

Please note that as of version 2.49, CGI.pm provides a cgi_error() method to report the errors, and it won't die() unless you want it to.

[TOC]


Finding the line number the error/warning has been triggered at

Apache::Registry, Apache::PerlRun and modules that compile-via-eval confuse the line numbering. Files that Perl reads normally from disk have no problem with file name/line number reporting.

If you compile with the experimental PERL_MARK_WHERE=1, it shows you almost the exact line number where the error happened. Generally the compiler's line counter gets shifted. You can always stuff your code with special compiler directives that reset the counter to a value you specify. At the beginning of a line you write (the '#' must be in column 1):

  #line 298 myscript.pl
  or 
  #line 890 some_label_to_be_used_in_the_error_message

The label is optional; the filename of the script is used by default. The directive specifies the line number of the following line, not the line the directive is on. You can use a little script to stuff every N lines of your code with these directives, but then you will have to rerun it every time you add or remove code lines. The script:

  #!/usr/bin/perl
  # Puts Perl line markers in a Perl program for debugging purposes.  
  # Also takes out old line markers.
  die "No filename to process.\n" unless @ARGV;
  my $filename = $ARGV[0];
  my $lines = 100;
  open IN, $filename or die "Cannot open file: $filename: $!\n";
  open OUT, ">$filename.marked"
      or die "Cannot open file: $filename.marked: $!\n";
  my $counter = 1;
  while (<IN>) {
    print OUT "#line $counter\n" unless $counter++ % $lines;
    next if $_ =~ /^#line /;
    print OUT $_;
  }
  close OUT;
  close IN;
  chmod 0755, "$filename.marked";
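The effect of the directive is easy to demonstrate with a self-contained two-liner (the file name virtual.pl is made up for the example):

```perl
#line 100 "virtual.pl"
eval { die "oops" };
my $err = $@;
print $err;   # the error is reported at virtual.pl line 100
```

The die() is on the line immediately after the directive, so Perl reports it as line 100 of virtual.pl, regardless of where the code physically sits.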

Also notice that another solution is to move most of the code into separate modules, which ensures that line numbers are reported correctly.

To have a complete trace of calls add:

  use Carp ();
  local $SIG{__WARN__} = \&Carp::cluck;

[TOC]


Forking or Executing subprocesses from mod_perl

Generally you should not fork from your mod_perl scripts, since when you do you are forking the entire Apache web server, lock, stock and barrel. Not only is your perl code duplicated, but so are mod_ssl, mod_rewrite, mod_log, mod_proxy, mod_speling and whatever other modules your server uses, all the core routines and so on.

A much wiser approach is to spawn a sub-process, hand it the information it needs to do the task, and have it detach (close x3 + setsid()). This is wise only if the parent that spawns the process continues immediately and does not wait for the sub-process to complete. This approach is suitable when you want to trigger a long-running process through the web interface, such as processing some data or sending email to thousands of subscribed users. Otherwise, you should convert the code into a module and call its functions or methods from the CGI script.

Just making a system() call defeats the whole idea behind mod_perl: the perl interpreter and modules must be loaded again for this external program to run.

Basically, you would do:

  my $params = FreezeThaw::freeze(
        [ all data to pass to the other process ]
  );
  system("program.pl", $params);  # list form avoids shell quoting problems

and in program.pl :

  use POSIX qw(setsid);
  
  my @params = FreezeThaw::thaw(shift @ARGV);
  # check that @params is ok
  close STDIN;
  close STDOUT;
  open STDERR, ">/dev/null";
  setsid(); # to detach from the controlling terminal

At this point, program.pl is running in the ``background'' while the system() returns and permits Apache to get on with life.

This has obvious problems. Not the least of them is that $params must not be bigger than your architecture's command line length limit (which may also depend on your shell).

Also, the communication is only one way.
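As a sketch of the serialization round trip, here is the same idea using the core Storable module instead of FreezeThaw, plus MIME::Base64 to make the frozen string safe to pass on a command line (the module substitution is mine, not the original recipe):

```perl
use Storable qw(freeze thaw);
use MIME::Base64 qw(encode_base64 decode_base64);

# freeze an arbitrary structure and base64-encode it:
# the result contains only [A-Za-z0-9+/=], safe for argv
my $frozen = encode_base64(freeze({ user => 'joe', ids => [1, 2, 3] }), '');

# ...on the receiving end (e.g. inside program.pl):
my $data = thaw(decode_base64($frozen));
print "$data->{user} $data->{ids}[2]\n";
```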

However, you might be trying to do the ``wrong thing''. If what you want is to send information to the browser and then do some post-processing, look into PerlCleanupHandler.

If you are interested in deeper details, this is what actually happens when you fork and make a system call, as in:

  system("echo Hi"),exit unless fork();

What happens is that fork() gives you two execution paths: the child gets virtual memory sharing a copy of the program text (read-only) and a copy-on-write share of the data space (remember why you preload modules in mod_perl?). In the above code the parent immediately continues with the code that comes after the fork, while the forked child executes system("echo Hi") and then terminates. Note that you might need to set:

  $SIG{CHLD} = sub {wait};

or

  $SIG{CHLD} = 'IGNORE';

or the terminated process might become a zombie. Normally every process has a parent; many processes are children of PID 1, the init process. A zombie is a process that has exited but whose parent has not yet called wait() (or waitpid()) to collect its exit status; its entry stays in the process table until the parent reaps it, or until the parent itself dies and init adopts and reaps the orphan.
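Alternatively, the parent can reap the child explicitly; a minimal self-contained sketch:

```perl
defined(my $pid = fork()) or die "fork failed: $!";
if ($pid == 0) {
    # child: do the work, then exit with some status
    exit 7;
}
# parent: reap the child so it does not linger as a zombie
waitpid($pid, 0);
my $status = $? >> 8;   # the child's exit status
print "child exited with status $status\n";
```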

The only work involved is setting up the page tables for the virtual memory, and the second process goes on its separate way.

Next, Perl finds /bin/echo along the search path and invokes it directly. Perl's system() is *not* system(3) from the C library. Only when the command contains shell meta-characters does Perl invoke a real shell. That's a *very* nice optimization.

Only if you do:

  system "sh -c 'echo foo'"

does the OS actually parse your command with a shell. You exec() a copy of /bin/sh, but since one is almost certainly already running somewhere, the system notices that (via the disk inode reference) and replaces your virtual memory page table with one pointing at the already-loaded program code, plus your own data space. The shell then parses the passed command.

Since the command is echo, it executes as a shell built-in in the latter example, or as /bin/echo in the former, and is done. But this is only an example; you aren't calling system("echo Hi") in your mod_perl scripts, right? Most real cases (heavy programs executed as a subprocess) involve loading the specified command or script, possibly with some actual demand paging from the program file if you execute new code.

The only place you see real overhead from this scheme is when the parent process is huge (unfortunately, like mod_perl...) and the page table becomes large as a side effect. The whole point of mod_perl is to avoid having to fork()/exec() something on every hit, though. Perl can do just about anything by itself. However, you probably won't get into trouble until you hit about 30 forks/sec on a so-so Pentium.

[TOC]


Passing and preserving custom data structures between handlers

Let's say that you wrote a few handlers to process a request and they all need to share some custom Perl data structure. The pnotes() method comes to the rescue. Suppose that one of the handlers stored some data in the hash %hash before it finished its work:

   # First handler:
   $r->pnotes('my_info' => \%hash);

All subsequent handlers will be able to retrieve the stored data with:

   # Later handler:
   my $info = $r->pnotes('my_info');
   print $info->{foo};

The stored information will be destroyed at the end of the request.

[TOC]



Written by Stas Bekman.
Last Modified at 09/26/1999
Use of the Camel for Perl is
a trademark of O'Reilly & Associates,
and is used by permission.