Performance. Benchmarks.






[TOC]


Performance: The Overall picture

Before we dive into performance issues, there is something very important to understand. It applies to any webserver, not only Apache. All our efforts are made to make the user's web browsing experience swift. Among the various web site usability factors, speed is one of the most crucial. What is the correct speed measurement? Since the user is the one who interacts with the web site, the speed measurement is the time that passes from the moment the user follows a link or presses a submit button until the resulting page is rendered by her browser.

If we trace a data packet's movement from the moment it leaves the user's machine (request sent) until the reply arrives, we see that the packet travels through many entities on its way. It has to make its way through the network, passing many interconnection nodes; before it enters the target machine it might go through proxy (accelerator) servers; then it is served by your server; and finally it has to make the whole way back. A webserver is only one of the elements the packet sees on its way. You could work hard to fine tune your webserver for the best performance, but a slow NIC (Network Interface Card) or a slow network connection from your server might defeat it all. That's why it's important to think big and to be aware of possible bottlenecks between the server and the web. Of course there is nothing you can do if the user has a slow connection on her end.

Moreover, you might tune your scripts and webserver to process incoming requests ultra fast, so that you need only a small number of working servers, but you might then find out that the server processes are busy waiting for slow clients to complete the download. You will see more examples in this chapter.

My point is that a web service is like a car: if one of its parts or mechanisms is broken, the car will not drive smoothly, and it can even stop dead if pushed further without fixing it first.

[TOC]


Sharing Memory

A very important point is the sharing of memory. If your OS supports this (and most sane systems do), you can save a lot of memory by sharing it between child processes. This is only possible when you preload code at server startup. However, during a child process' life its memory pages become unshared, and there is no way to instruct perl to allocate memory so that (dynamic) variables land on different memory pages than constants; that's why the copy-on-write effect (explained in a moment) will hit almost at random. If you are preloading many modules, you might be able to balance the memory that stays shared against the time spent on an occasional fork by tuning MaxRequestsPerChild so that a child restarts before too much memory becomes unshared. In this case the MaxRequestsPerChild value is very specific to your scenario: you should do some measurements to see whether this really makes a difference and what a reasonable number might be. Each time a child reaches this upper limit and restarts, it releases its unshared copies, and the new child inherits pages that are shared until it scribbles on them.
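
One way to take such measurements is with the GTop module (a minimal sketch, assuming GTop and the underlying libgtop library are installed; see also the Shared Memory section below):

  use GTop ();

  # report how much of this child's memory is still shared with the parent
  my $proc_mem = GTop->new->proc_mem($$);
  my $size     = $proc_mem->size;
  my $share    = $proc_mem->share;
  warn sprintf "pid %d: size=%d share=%d unshared=%d\n",
    $$, $size, $share, $size - $share;

Calling this, for example, from a cleanup handler lets you watch how the unshared portion grows as the child serves more requests.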

It is very important to understand that your goal is not to have MaxRequestsPerChild set to 10000. Having a child serve 300 requests on precompiled code is already a huge speedup, so whether it is 100 or 10000 does not really matter if it saves you RAM by sharing. Do not forget that if you preload most of your code at server startup, the fork needed to spawn a new child will be very fast, because the child inherits most of the preloaded code and the perl interpreter from the parent process. But then, as the child works, its memory pages (which aren't really its own yet, since it uses the parent's pages) get dirty (the originally inherited and shared variables get updated or modified) and copy-on-write happens, which reduces the number of shared memory pages, thus increasing the memory demand. Killing the child and respawning a new one allows you to get the pristine shared memory from the parent process again.

The conclusion is that MaxRequestsPerChild should not be too big, otherwise you lose the benefits of memory sharing.

See Choosing MaxRequestsPerChild for more about tuning the MaxRequestsPerChild parameter.

[TOC]


Preload Perl modules at server startup

Use the PerlRequire and PerlModule directives to load commonly used modules such as CGI.pm, DBI, etc., when the server is started. On most systems, server children will be able to share the code space used by these modules. Just add the following directives to httpd.conf:

  PerlModule CGI
  PerlModule DBI

An even better approach is to create a separate startup file (where you write plain perl code) and put things like this there:

  use DBI;
  use Carp;

Then you require() this startup file with the help of the PerlRequire directive in httpd.conf, placing it before the rest of the mod_perl configuration directives:

  PerlRequire /path/to/start-up.pl
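
A minimal sketch of such a startup file (the module list is just an example; load whatever your code actually uses):

  # /path/to/start-up.pl
  use strict;
  use DBI ();
  use Carp ();
  use CGI ();
  1;  # the file must return a true value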

CGI.pm is a special case. Ordinarily CGI.pm autoloads most of its functions on an as-needed basis. This speeds up the loading time by deferring the compilation phase. However, if you are using mod_perl, FastCGI or another system that uses a persistent Perl interpreter, you will want to precompile the methods at initialization time. To accomplish this, call the package function compile() like this:

    use CGI ();
    CGI->compile(':all');

The arguments to compile() are a list of method names or sets, identical to those accepted by the use() and import() operators. Note that in most cases you will want to replace ':all' with the tag names you really use in your code, since generally only a subset of the subs is actually used.
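
For example, if your scripts call only a handful of CGI.pm methods, you might compile just those (the method list below is purely illustrative):

  use CGI ();
  CGI->compile(qw(header start_html end_html param p));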

You can also preload the Registry scripts. See Preload Registry Scripts.

[TOC]


Preload Perl modules - Real Numbers

(META: while the numbers and conclusions are mostly correct, need to rewrite the whole benchmark section using the GTop library to report the shared memory which is very important and will improve the benchmarks)

(META: Add the memory size tests when the server was compiled with EVERYTHING=1 and without it, does loading everything imposes a big change in the memory footprint? Probably the suggestion would be as follows: For a development server use EVERYTHING=1, while for a production if your server is pretty busy and/or low on memory and every bit is on account, only the required parts should be built in. BTW, remember that apache comes with many modules that are being built by default, and you might not need those!)

I have conducted a few tests to benchmark the memory usage when some modules are preloaded. The first set of tests checks the memory use with a library Perl module preload (only CGI.pm). The second set checks the compile() method of CGI.pm. The third test checks the benefit of preloading a few more library Perl modules (to see more memory saved) and also the effect of precompiling the Registry scripts with Apache::RegistryLoader.

1. In the first test, the following script was used:

  use strict;
  use CGI ();
  my $q = new CGI;
  print $q->header;
  print $q->start_html,$q->p("Hello");

Server restarted

Before the CGI.pm preload: (No other modules preloaded)

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      87004  0.0  0.0 1060 1524      - A    16:51:14  0:00 httpd
  httpd    240864  0.0  0.0 1304 1784      - A    16:51:13  0:00 httpd

After running a script which uses CGI's methods (no imports):

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root     188068  0.0  0.0 1052 1524      - A    17:04:16  0:00 httpd
  httpd     86952  0.0  1.0 2520 3052      - A    17:04:16  0:00 httpd

Observation: the child httpd has grown by 1268K

Server restarted

After the CGI.pm preload:

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root     240796  0.0  0.0 1456 1552      - A    16:55:30  0:00 httpd
  httpd     86944  0.0  0.0 1688 1800      - A    16:55:30  0:00 httpd

After running a script which uses CGI's methods (no imports):

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      86872  0.0  0.0 1448 1552      - A    17:02:56  0:00 httpd
  httpd    187996  0.0  1.0 2808 2968      - A    17:02:56  0:00 httpd

Observation: the child httpd has grown by 1168K, 100K less than without the preload - good!

Server restarted

After CGI.pm was preloaded and compiled with CGI->compile(':all'):

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      86980  0.0  0.0 2836 1524      - A    17:05:27  0:00 httpd
  httpd    188104  0.0  0.0 3064 1768      - A    17:05:27  0:00 httpd

After running a script which uses CGI's methods (no imports):

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      86980  0.0  0.0 2828 1524      - A    17:05:27  0:00 httpd
  httpd    188104  0.0  1.0 4188 2940      - A    17:05:27  0:00 httpd

Observation: the child httpd has grown by 1172K. No change! So how does CGI->compile(':all') help? I think it's because we never use all of the methods CGI provides, so in real use it's faster. So you might want to compile only the tags you are about to use - then you will benefit for sure.

2. I ran a second test to find out, using this script:

  use strict;
  use CGI qw(:all);
  print header,start_html,p("Hello");

Server restarted

After CGI.pm was preloaded and NOT compiled with CGI->compile(':all'):

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      17268  0.0  0.0 1456 1552      - A    18:02:49  0:00 httpd
  httpd     86904  0.0  0.0 1688 1800      - A    18:02:49  0:00 httpd

After running a script which imports symbols (all of them):

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      17268  0.0  0.0 1448 1552      - A    18:02:49  0:00 httpd
  httpd     86904  0.0  1.0 2952 3112      - A    18:02:49  0:00 httpd

Observation: the child httpd has grown by 1264K

Server restarted

After CGI.pm was preloaded and compiled with CGI->compile(':all'):

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      86812  0.0  0.0 2836 1524      - A    17:59:52  0:00 httpd
  httpd     99104  0.0  0.0 3064 1768      - A    17:59:52  0:00 httpd

After running a script which imports symbols (all of them):

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      86812  0.0  0.0 2832 1436      - A    17:59:52  0:00 httpd
  httpd     99104  0.0  1.0 4884 3636      - A    17:59:52  0:00 httpd

Observation: the child httpd has grown by 1868K. Why? Isn't CGI->compile(':all') supposed to make the children share the compiled code with the parent? It does work as advertised, but if you pay attention to the code, we have called only three of CGI.pm's methods - just saying use CGI qw(:all) doesn't mean we compile all the available methods - we just import their names. So actually this test is misleading. Execute compile() only on the methods you actually use, and then you will see the difference.

3. The third script:

  use strict;
  use CGI;
  use Data::Dumper;
  use Storable;
  [and many lines of code, lots of globals - so the code is huge!]

Server restarted

Nothing preloaded at startup:

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      90962  0.0  0.0 1060 1524      - A    17:16:45  0:00 httpd
  httpd     86870  0.0  0.0 1304 1784      - A    17:16:45  0:00 httpd

Script using CGI (methods), Storable, Data::Dumper called:

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      90962  0.0  0.0 1064 1436      - A    17:16:45  0:00 httpd
  httpd     86870  0.0  1.0 4024 4548      - A    17:16:45  0:00 httpd

Observation: child httpd has grown by 2764K

Server restarted

Preloaded CGI (compiled), Storable, Data::Dumper at startup:

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      26792  0.0  0.0 3120 1528      - A    17:19:21  0:00 httpd
  httpd     91052  0.0  0.0 3340 1764      - A    17:19:21  0:00 httpd

Script using CGI (methods), Storable, Data::Dumper called

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      26792  0.0  0.0 3124 1440      - A    17:19:21  0:00 httpd
  httpd     91052  0.0  1.0 6568 5040      - A    17:19:21  0:00 httpd

Observation: child httpd has grown by 3276K. Ouch: 512K more!!!

The reason is that when you precompile all of the methods at startup, they all get compiled; there are many of them and they take up a big chunk of memory. If you don't use the compile() method, only the functions that are actually used get compiled. Yes, that will slightly slow down the first response of each process, but the actual memory usage will be lower. BTW, if you write in the script:

  use CGI qw(:all);

Only the symbols of all the functions are imported. While they take up some space, it's less than the space the compiled code of these functions would occupy.

Server restarted

All the above modules + the above script PreCompiled with Apache::RegistryLoader at startup:

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      43224  0.0  0.0 3256 1528      - A    17:23:12  0:00 httpd
  httpd     26844  0.0  0.0 3488 1776      - A    17:23:12  0:00 httpd

Script using CGI (methods), Storable, Data::Dumper called:

  USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
  root      43224  0.0  0.0 3252 1440      - A    17:23:12  0:00 httpd
  httpd     26844  0.0  1.0 6748 5092      - A    17:23:12  0:00 httpd

Observation: the child httpd has grown even more: 3316K! That does not seem good!

Summary:

1. Preloading library Perl modules gave good results everywhere.

2. CGI.pm's compile() method seems to use even more memory. That's because we never use all of the methods CGI provides. Run compile() only on the tags that you are going to use; that way you save both the overhead of the first call to each not-yet-compiled method and the memory, since the compiled code will be shared across all the children.

3. Apache::RegistryLoader might make scripts load faster on the first request after a child has just started, but the memory usage is worse!!! See the numbers for yourself.

HW/SW used: apache 1.3.2 with mod_perl 1.16, running on an AIX 4.1.5 RS6000 with 1Gb of RAM.

[TOC]


Preload Registry Scripts

Apache::RegistryLoader compiles Apache::Registry scripts at server startup. It can be a good idea to preload the scripts you are going to use as well, so that the code will be shared among the children.

Here is an example of the use of this technique. This code is included in a PerlRequire'd file, and it walks the directory tree under which all the registry scripts are installed. For each .pl file encountered, it calls the Apache::RegistryLoader::handler() method to preload the script in the parent server (before pre-forking the child processes):

  use File::Find 'finddepth';
  use Apache::RegistryLoader ();
  {
      my $perl_dir = "perl/";
      my $rl = Apache::RegistryLoader->new;
      finddepth(sub {
          return unless /\.pl$/;
          my $url = "/$File::Find::dir/$_";
          print "pre-loading $url\n";
  
          my $status = $rl->handler($url);
          unless($status == 200) {
              warn "pre-load of `$url' failed, status=$status\n";
          }
      }, $perl_dir);
  }

Note that we didn't use the second argument to handler() here, as the module's manpage suggests. To make the loader smarter about the uri->filename translation, you might need to provide a trans() function to translate the uri to a filename. URI to filename translation normally doesn't happen until HTTP request time, so the module is forced to roll its own translation. If the filename is omitted and a trans() routine is not defined, the loader will try using the URI relative to ServerRoot.
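
For illustration, assuming your registry scripts live under /home/httpd/perl/ and are mapped to the /perl/ URI prefix (both paths are made up), you could either pass a trans() routine to the constructor or supply the filename as the second argument to handler():

  use Apache::RegistryLoader ();

  my $rl = Apache::RegistryLoader->new(trans => sub {
      my $uri = shift;
      $uri =~ s:^/perl/:/home/httpd/perl/:;
      return $uri;
  });
  $rl->handler("/perl/test.pl");

  # or, without a trans() routine, supply the filename explicitly:
  Apache::RegistryLoader->new->handler("/perl/test.pl",
                                       "/home/httpd/perl/test.pl");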

You should check whether this actually improves things for you though. I did some testing [ Preload Perl modules - Real Numbers ], and it seems that it takes more memory than when the scripts are compiled in the child - this is only a first impression and needs better investigation. If you aren't concerned about a few script invocations taking some time to respond while they load the code, you might not need it at all!

See also BEGIN blocks

[TOC]


Avoid Importing Functions

When possible, avoid importing a module's functions into your namespace. The aliases which are created can take up quite a bit of space. Try to use method interfaces and fully qualified names like Package::function or $Package::variable instead.
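
A small illustration, using POSIX as an example: the first form creates an alias for strftime() in your namespace, the second does not pollute your namespace at all:

  # imports strftime() into the current package:
  use POSIX qw(strftime);
  print strftime("%Y-%m-%d", localtime), "\n";

  # no imports -- call the function fully qualified instead:
  use POSIX ();
  print POSIX::strftime("%Y-%m-%d", localtime), "\n";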

[TOC]


PerlSetupEnv Off

PerlSetupEnv Off is another optimization you might consider.

mod_perl fiddles with the environment to make it appear as if the script were being called under the CGI protocol. For example, the $ENV{QUERY_STRING} environment variable is initialized with the contents of Apache::args(), and $ENV{SERVER_NAME} is filled in from the value returned by Apache::server_hostname().

But %ENV population is expensive. Those who have moved to the Perl Apache API no longer need this extra %ENV population, and can gain by turning it Off.
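
To turn it off, add to httpd.conf:

  PerlSetupEnv Off

and in your handlers use the Apache API directly instead of %ENV (a sketch):

  my $r = shift;
  my %params = $r->args;                     # instead of parsing $ENV{QUERY_STRING}
  my $host   = $r->server->server_hostname;  # instead of $ENV{SERVER_NAME}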

[TOC]


-DTWO_POT_OPTIMIZE and -DPACK_MALLOC Perl Options

Newer Perl versions also have build-time options to reduce runtime memory consumption. These options might shrink the size of your httpd by about 150k (quite a big number if you remember to multiply it by the number of children you use).

The -DTWO_POT_OPTIMIZE macro improves allocations of data with a size close to a power of two, but this works only for big allocations (starting at 16K by default). Such allocations are typical for big hashes and special-purpose scripts, especially image processing.

Perl memory allocation is done by buckets with sizes close to powers of two. Because of this the malloc overhead may be big, especially for data whose size is exactly a power of two. If PACK_MALLOC is defined, perl uses a slightly different algorithm for small allocations (up to 64 bytes long), which makes it possible to have an overhead as low as 1 byte for allocations which are powers of two (and these appear quite often).

Expected memory savings (with 8-byte alignment in alignbytes) is about 20% for typical Perl usage. Expected slowdown due to additional malloc overhead is in fractions of a percent (hard to measure, because of the effect of saved memory on speed).

You will find these and other memory improvement details in perl5004delta.pod.

[TOC]


Shared Memory

You've probably noticed that the word shared is being repeated many times in many things related to mod_perl. Indeed, shared memory might save you a lot of money, since with sharing in place you can run many more servers than without it. See the Formula and the numbers.

How much shared memory do you have? You can see it by using the memory utilities that come with your system, or you can deploy the GTop module:

  print "Shared memory of the current process: ",
    GTop->new->proc_mem($$)->share,"\n";

  print "Total shared memory: ",
    GTop->new->mem->share,"\n";

When you watch the output of the top utility, don't confuse the RSS (or RES) column with the SHARE column -- RES is the RESident memory, which is the size of the pages currently swapped in.

[TOC]


Checking script modification times

Under Apache::Registry the requested CGI script is always stat()'ed to check whether it was modified. This adds a very small overhead, but if you are into squeezing all the juices from the server, you might want to save this call. If you do -- take a look at the Apache::RegistryBB module.

[TOC]


How can I find if my mod_perl scripts have memory leaks

Apache::Leak (derived from Devel::Leak) should help you with this task. Example:

  use Apache::Leak;
  
  my $global = "FooAAA";
  
  leak_test {
    $$global = 1;
    ++$global;
  };

The argument to leak_test() is an anonymous sub, so you can just wrap it around any code you suspect might be leaking. Beware: it will run the code twice, because the first time through new SVs are created, which does not necessarily mean you are leaking; the second pass will give better evidence. You do not need to be inside mod_perl to use it. Run from the command line, the above script outputs:

  ENTER: 1482 SVs
  new c28b8 : new c2918 : 
  LEAVE: 1484 SVs
  ENTER: 1484 SVs
  new db690 : new db6a8 : 
  LEAVE: 1486 SVs
  !!! 2 SVs leaked !!!

To see dumps of the SVs, build a debuggable perl. The simplest way to have both a normal perl and a debuggable perl is to follow the hints in the SUPPORT doc for building libperld.a; when that is built, copy the perl binary from that directory to your perl bin directory, but name it dperl.

Leak explanation: $$global = 1; creates a new global variable FooAAA (via a symbolic reference) with a value of 1, which will not be destroyed until this module is destroyed.

Apache::Leak is not very user-friendly; have a look at B::LexInfo. You'll see that what might appear to be a leak is actually just a Perl optimization. For example, consider this code:

  sub foo {
    my $string = shift;
  }

  foo("a string");

B::LexInfo will show you that Perl does not release the value from $string unless you undef it. This is because Perl anticipates that the memory will be needed for another string the next time the subroutine is entered. You'll see similar behavior for @array lengths, %hash keys, and the scratch areas of the padlist for ops such as join(), `.', etc.

Apache::Status now includes a new StatusLexInfo option.

Apache::Leak works better if you've built a libperld.a (see SUPPORT) and given PERL_DEBUG=1 to mod_perl's Makefile.PL.

[TOC]


Limiting the size of the processes

Apache::SizeLimit allows you to kill off Apache httpd processes if they grow too large. See perldoc Apache::SizeLimit for more details.

By using this module, you should be able to discontinue using the Apache configuration directive MaxRequestsPerChild, although for some folks, using both in combination does the job.
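
A typical setup looks something like this (a sketch; the limit value is just an example and is given in KB):

  # in a startup file or a <Perl> section:
  use Apache::SizeLimit;
  $Apache::SizeLimit::MAX_PROCESS_SIZE = 10000;   # kill children that exceed ~10Mb

  # and in httpd.conf:
  #   PerlFixupHandler Apache::SizeLimit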

[TOC]


Limiting the resources used by httpd children

Apache::Resource uses the BSD::Resource module, which uses the C function setrlimit() to set limits on system resources such as memory and cpu usage.

To configure use:

  PerlModule Apache::Resource
    # set child memory limit in megabytes
    # (default is 64 Meg)
  PerlSetEnv PERL_RLIMIT_DATA 32:48
  
    # set child CPU limit in seconds
    # (default is 360 seconds)
  PerlSetEnv PERL_RLIMIT_CPU 120
  
  PerlChildInitHandler Apache::Resource

The following limits are expressed in megabytes: DATA, RSS, STACK, FSIZE, CORE, MEMLOCK; all others use their natural units. Prepend PERL_RLIMIT_ to the name of each limit you want to set. Refer to the setrlimit man page on your OS for other possible resources.

If the value of the variable is of the form S:H, S is treated as the soft limit, and H is the hard limit. If it is just a single number, it is used for both soft and hard limits.

To debug add:

  <Perl>
    $Apache::Resource::Debug = 1;
    require Apache::Resource;
  </Perl>
  PerlChildInitHandler Apache::Resource

and look in the error_log to see what it's doing.

Refer to perldoc Apache::Resource and man 2 setrlimit for more info.

[TOC]


Limiting the request rate speed (robots blocking)

A limitation of using pattern matching to identify robots is that it only catches the robots that you know about, and only those that identify themselves by name. A few devious robots masquerade as users by using user agent strings that identify themselves as conventional browsers. To catch such robots, you'll have to be more sophisticated.

Apache::SpeedLimit comes to your help here; see:

http://www.modperl.com/chapters/ch6.html#Blocking_Greedy_Clients

[TOC]


Benchmarks. Impressing your Boss and Colleagues.

How much faster is mod_perl than mod_cgi (aka plain perl/CGI)? There are many ways to benchmark the two. I'll present a few examples and numbers below. Check out the benchmark directory of the mod_perl distribution for more examples.

If you are going to write your own benchmarking utility, use the Benchmark module for heavy scripts and the Time::HiRes module for very fast scripts (faster than 1 sec) where you need better time precision.

There is no need to write a special benchmark though. If you want to impress your boss or colleagues, just take some heavy CGI script you have (e.g. a script that crunches some data and prints the results to STDOUT), open 2 xterms and call the same script in mod_perl mode in one xterm and in mod_cgi mode in the other. You can use lwp-get from the LWP package to emulate the web agent (browser). (The benchmark directory of the mod_perl distribution includes such an example.)
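
If you do want a quick script for this, a minimal sketch using the Benchmark module and LWP might look like the following (the URLs are hypothetical; point them at the same script served once under mod_cgi and once under mod_perl):

  use Benchmark qw(timethese);
  use LWP::Simple ();

  timethese(100, {
      mod_cgi  => sub { LWP::Simple::get("http://localhost/cgi-bin/test.pl") },
      mod_perl => sub { LWP::Simple::get("http://localhost/perl/test.pl") },
  });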

See also 2 tools for benchmarking: ApacheBench and crashme test

[TOC]


Developers Talk

Perrin Harkins writes on benchmarks or comparisons, official or unofficial:

I have used some of the platforms you mentioned and researched others. What I can tell you for sure, is that no commercially available system offers the depth, power, and ease of use that mod_perl has. Either they don't let you access the web server internals, or they make you use less productive languages than Perl, sometimes forcing you into restrictive and confusing APIs and/or GUI development environments. None of them offer the level of support available from simply posting a message to this list, at any price.

As for performance, beyond doing several important things (code-caching, pre-forking/threading, and persistent database connections) there isn't much these tools can do, and it's mostly in your hands as the developer to see that the things which really take the time (like database queries) are optimized.

The downside of all this is that most manager types seem to be unable to believe that web development software available for free could be better than the stuff that cost $25,000 per CPU. This appears to be the major reason most of the web tools companies are still in business. They send a bunch of suits to give PowerPoint presentations and hand out glossy literature to your boss, and you end up with an expensive disaster and an approaching deadline.

But I'm not bitter or anything...

Jonathan Peterson adds:

Most of the major solutions have something that they do better than the others, and each of them has faults. Microsoft's ASP has a very nice object model, and has IMO the best data access object (better than DBI to use - but less portable). It has the worst scripting language. PHP has many of the advantages of Perl-based solutions, but is less complicated for developers. Netscape's Livewire has a good object model too, and provides good server-side Java integration - if you want to leverage Java skills, it's good. Also, it has a compiled scripting language - which is great if you aren't selling your clients the source code (and a pain otherwise).

mod_perl's advantage is that it is the most powerful. It offers the greatest degree of control with one of the more powerful languages. It also offers the greatest granularity. You can use an embedding module (eg eperl) from one place, a session module (Session) from another, and your data access module from yet another.

I think the Apache::ASP module looks very promising. It has very easy to use and adequately powerful state maintenance, a good embedding system, and a sensible object model (that emulates the Microsoft ASP one). It doesn't replicate MS's ADO for data access, but DBI is fine for that.

I have always found that the developers available make the greatest impact on the decision. If you have a team with no Perl experience, and a small or medium task, using something like PHP, or Microsoft ASP, makes more sense than driving your staff into the vertical learning curve they'll need to use mod_perl.

For very large jobs, it may be worth finding the best technical solution, and then recruiting the team with the necessary skills.

[TOC]


Benchmarking a Graphic hits counter with Persistent DB Connection

Here are the numbers from Michael Parker's mod_perl presentation at the Perl Conference (Aug, 98): http://www.realtime.net/~parkerm/perl/conf98/index.htm . The script is a standard hits counter, but it logs the counts into a MySQL relational database:

    Benchmark: timing 100 iterations of cgi, perl...  [rate 1:28]
    
    cgi: 56 secs ( 0.33 usr 0.28 sys = 0.61 cpu) 
    perl: 2 secs ( 0.31 usr 0.27 sys = 0.58 cpu) 
    
    Benchmark: timing 1000 iterations of cgi,perl...  [rate 1:21]
     
    cgi: 567 secs ( 3.27 usr 2.83 sys = 6.10 cpu) 
    perl: 26 secs ( 3.11 usr 2.53 sys = 5.64 cpu)      
    
    Benchmark: timing 10000 iterations of cgi, perl   [rate 1:21]
     
    cgi: 6494 secs (34.87 usr 26.68 sys = 61.55 cpu) 
    perl: 299 secs (32.51 usr 23.98 sys = 56.49 cpu) 

We don't know what server configuration was used for these tests, but I guess the numbers speak for themselves.

The source code of the script is available at http://www.realtime.net/~parkerm/perl/conf98/sld006.htm .

[TOC]


Benchmarking scripts with execution times below 1 second :)

As noted before, for very fast scripts you will have to use the Time::HiRes module, whose usage is similar to that of Benchmark.

  use Time::HiRes qw(gettimeofday tv_interval);
  my $start_time = [ gettimeofday ];
  &sub_that_takes_a_teeny_bit_of_time();
  my $end_time = [ gettimeofday ];
  my $elapsed = tv_interval($start_time,$end_time);
  print "the sub took $elapsed secs.\n";

See also crashme test.

[TOC]


PerlHandler's Benchmarking

At http://perl.apache.org/dist/contrib/ you will find the Apache::Timeit package, which does benchmarking of PerlHandlers.

[TOC]


Tuning the Apache's configuration variables for the best performance

It's very important to get the configuration of the MinSpareServers, MaxSpareServers, StartServers, MaxClients, and MaxRequestsPerChild parameters right. There are no universally correct values: if they are set too low you will under-use the system's capabilities, and if they are set too high, chances are that the server will bring the machine to its knees.

All the above parameters should be specified on the basis of the resources you have. With a plain apache server it's no big deal if you run a few too many processes, since they are about 1Mb each and don't eat a lot of your RAM. Generally the numbers are even smaller when memory sharing is taking place. The situation is different with mod_perl: I have seen mod_perl processes of 20Mb and more. Now if you have MaxClients set to 50: 50 x 20Mb = 1Gb -- do you have 1Gb of RAM? Probably not. So how do you tune these parameters? Generally by trying different combinations and benchmarking the server. Again, mod_perl processes can be much smaller when sharing is in place.

Before you start this task you should be armed with the proper weapon: a crashme utility, which will load your server with the mod_perl scripts you possess. It needs to be able to emulate a multiuser environment, with multiple clients calling the mod_perl scripts on your server simultaneously. While there are commercial solutions, you can get away with free ones which do the same job. You can use the ApacheBench (ab) utility that comes with the apache distribution, the crashme script which uses LWP::Parallel::UserAgent, or httperf (see the Download page).

Another important issue is to make sure the testing client (the load generator) runs on a system that is more powerful than the system being tested. After all, we are trying to simulate Internet users, where many users try to reach your service at once; since the number of concurrent users can be quite large, your testing machine must be very powerful and capable of generating a heavy load. Of course you should not run the clients and the server on the same machine; if you do, your testing results will be incorrect, since the clients will eat CPU and memory that should be dedicated to the server, and vice versa.

See also 2 tools for benchmarking: ApacheBench and crashme test

[TOC]


Tuning with ab - ApacheBench

ab is a tool for benchmarking your Apache HTTP server. It is designed to give you an impression of how much performance your current Apache installation can deliver. In particular, it shows you how many requests per second your Apache server is capable of serving. The ab tool comes bundled with the apache source distribution (and it's free :).

Let's try it. We will simulate 10 users concurrently requesting a very light script at www.nowhere.com:81/test/test.pl. Each ``user'' makes 10 requests.

  % ./ab -n 100 -c 10 www.nowhere.com:81/test/test.pl

The results are:

  Concurrency Level:      10
  Time taken for tests:   0.715 seconds
  Complete requests:      100
  Failed requests:        0
  Non-2xx responses:      100
  Total transferred:      60700 bytes
  HTML transferred:       31900 bytes
  Requests per second:    139.86
  Transfer rate:          84.90 kb/s received
  
  Connection Times (ms)
                min   avg   max
  Connect:        0     0     3
  Processing:    13    67    71
  Total:         13    67    74

The only numbers we really care about are:

  Complete requests:      100
  Failed requests:        0
  Requests per second:    139.86

Let's raise the load of requests to 100 x 10 (10 users, each makes 100 requests)

  % ./ab -n 1000 -c 10 www.nowhere.com:81/perl/access/access.cgi
  Concurrency Level:      10
  Complete requests:      1000
  Failed requests:        0
  Requests per second:    139.76

As expected nothing changes -- we have the same 10 concurrent users. Now let's raise the number of concurrent users to 50:

  % ./ab -n 1000 -c 50 www.nowhere.com:81/perl/access/access.cgi
  Complete requests:      1000
  Failed requests:        0
  Requests per second:    133.01

We see that the server is capable of serving 50 concurrent users at an amazing 133 req/sec! Let's find the upper boundary. Using -n 10000 -c 1000 failed to get results (Broken Pipe?). Using -n 10000 -c 500 gave 94.82 req/sec. The server's performance went down under the higher load.

The above tests were performed with the following configuration:

  MinSpareServers 8
  MaxSpareServers 6
  StartServers 10
  MaxClients 50
  MaxRequestsPerChild 1500

Now let's kill a child after a single request, we will use the following configuration:

  MinSpareServers 8
  MaxSpareServers 6
  StartServers 10
  MaxClients 100
  MaxRequestsPerChild 1

Simulate 50 users each generating a total of 20 requests:

  % ./ab -n 1000 -c 50 www.nowhere.com:81/perl/access/access.cgi

The benchmark timed out with the above configuration. I watched the output of ps as I ran it; the parent process just wasn't capable of respawning the killed children at that rate. When I raised MaxRequestsPerChild to 10 I got 8.34 req/sec - very bad (18 times slower!). (You can't benchmark the importance of MinSpareServers, MaxSpareServers and StartServers with this kind of test.)

Now let's try to return MaxRequestsPerChild to 1500, but to lower the MaxClients to 10 and run the same test:

  MinSpareServers 8
  MaxSpareServers 6
  StartServers 10
  MaxClients 10
  MaxRequestsPerChild 1500

I got 27.12 req/sec, which is better, but still 4-5 times slower than the 133 req/sec we got with MaxClients set to 50.

Summary: I have tested a few combinations of the server configuration variables (MinSpareServers, MaxSpareServers, StartServers, MaxClients, MaxRequestsPerChild). The results we have received are as follows:

MinSpareServers, MaxSpareServers and StartServers are only important for user response times (sometimes user will have to wait a bit).

The important parameters are MaxClients and MaxRequestsPerChild. MaxClients should be neither too big, so it will not abuse your machine's memory resources, nor too small, or users will be forced to wait for a child to become free to serve them. MaxRequestsPerChild should be as big as possible, to get the full benefit of mod_perl, but watch your server at the beginning to make sure your scripts are not leaking memory, thereby causing your server (and your service) to die very fast.

It is also important to understand that we didn't test the response times in the tests above, but the ability of the server to respond under a heavy load of requests. If the script used for the test had been heavier, the numbers would be different, but the conclusions would be very similar.

The benchmarks were run with:

  HW: RS6000, 1Gb RAM
  SW: AIX 4.1.5 . mod_perl 1.16, apache 1.3.3
  Machine running only mysql, httpd docs and mod_perl servers.
  Machine was _completely_ unloaded during the benchmarking.

After each server restart when I did changes to the server's configurations, I made sure the scripts were preloaded by fetching a script at least once by every child.

It is important to notice that none of the requests timed out, even if a request was kept in the server's queue for more than a minute! (That is the way ab works, which is OK for testing purposes but unacceptable in the real world - users will not wait more than 5-10 secs for a request to complete, and the client (browser) will time out after a few minutes.)

Now let's take a look at some real code whose execution time is more than a few millisecs. We will do real testing and collect the data in tables for easier viewing.

I will use the following abbreviations:

  NR    = Total Number of Requests
  NC    = Concurrency
  MC    = MaxClients
  MRPC  = MaxRequestsPerChild
  RPS   = Requests per second

Running a mod_perl script with lots of mysql queries (the script under test is mysqld-bound) (http://www.nowhere.com:81/perl/access/access.cgi?do_sub=query_form), with this configuration:

  MinSpareServers        8
  MaxSpareServers       16
  StartServers          10
  MaxClients            50
  MaxRequestsPerChild 5000

gives us:

     NR   NC    RPS     comment
  ------------------------------------------------
     10   10    3.33    # not reliable statistics
    100   10    3.94    
   1000   10    4.62    
   1000   50    4.09    

Conclusions: Here I wanted to show that when the application is slow -- not because of perl loading, code compilation and execution, but because it is bound to some external operation, like the mysqld queries which form the bottleneck -- it almost does not matter what load we place on the server. The RPS (Requests per second) is almost the same (given that all the requests have been served; you have the ability to queue the clients, but be aware that anything that goes into the queue means a waiting client, and a client (browser) that might time out!)

Now we will benchmark the same script without the mysql part (code bound only to perl) (http://www.nowhere.com:81/perl/access/access.cgi); it's the same script, but it just returns an HTML form without making any SQL queries.

  MinSpareServers        8
  MaxSpareServers       16
  StartServers          10
  MaxClients            50
  MaxRequestsPerChild 5000

     NR   NC      RPS   comment
  ------------------------------------------------
     10   10    26.95   # not reliable statistics
    100   10    30.88   
   1000   10    29.31
   1000   50    28.01
   1000  100    29.74
  10000  200    24.92
 100000  400    24.95

Conclusions: This time the script we executed was pure perl (not bound to I/O or mysql), so we see that the server serves the requests much faster. You can see that the Requests Per Second (RPS) figure is almost the same for any load, but goes lower when the number of concurrent clients goes beyond MaxClients. With 25 RPS, a load of 400 concurrent clients will be served in 16 secs. To be more realistic and assume a maximum concurrency of 100, with 30 RPS the clients will be served in 3.5 secs, which is pretty good for a highly loaded server.

Now we will use the server at its full capacity, by keeping all MaxClients children alive all the time and setting a big MaxRequestsPerChild, so that no child will be killed during the benchmarking.

  MinSpareServers       50
  MaxSpareServers       50
  StartServers          50
  MaxClients            50
  MaxRequestsPerChild 5000
  
     NR   NC      RPS   comment
  ------------------------------------------------
    100   10    32.05
   1000   10    33.14
   1000   50    33.17
   1000  100    31.72
  10000  200    31.60

Conclusion: In this scenario there is no overhead involving the parent server loading new children, all the servers are available, and the only bottleneck is contention for the CPU.

Now we will try changing MaxClients and watching the results. Let's reduce MC to 10.

  MinSpareServers        8
  MaxSpareServers       10
  StartServers          10
  MaxClients            10
  MaxRequestsPerChild 5000
  
     NR   NC      RPS   comment
  ------------------------------------------------
     10   10    23.87   # not reliable statistics
    100   10    32.64 
   1000   10    32.82
   1000   50    30.43
   1000  100    25.68
   1000  500    26.95
   2000  500    32.53

Conclusions: Very little difference! Almost no change! 10 servers were able to serve with almost the same throughput as 50 servers. Why? My guess is CPU time-slicing: it seems that each of the 10 servers served requests about 5 times faster than in the test above with 50 servers, where each child received its CPU time slice 5 times less frequently. So having a big value for MaxClients doesn't mean the performance will be better. You have just seen the numbers!

Now we will start to drastically reduce the MaxRequestsPerChild:

  MinSpareServers        8
  MaxSpareServers       16
  StartServers          10
  MaxClients            50
  
     NR   NC    MRPC     RPS    comment
  ------------------------------------------------
    100   10      10    5.77 
    100   10       5    3.32
   1000   50      20    8.92
   1000   50      10    5.47
   1000   50       5    2.83
   1000  100      10    6.51

Conclusions: When we drastically reduce MaxRequestsPerChild, the performance starts to come close to that of plain mod_cgi. Just for comparison, here are the numbers for the same script run under mod_cgi:

  MinSpareServers        8
  MaxSpareServers       16
  StartServers          10
  MaxClients            50
  
     NR   NC    RPS     comment
  ------------------------------------------------
    100   10    1.12
   1000   50    1.14
   1000  100    1.13

Conclusion: mod_cgi is much slower :) In the NReq/NClients 100/10 test the RPS was 1.12 with mod_cgi and 32 with mod_perl, i.e. about 30 times faster!!! In the first test each client waited about 100 secs to be served, and in the second and third about 1000 secs!

[TOC]


Tuning with crashme script

This is another crashme suite, originally written by Michael Schilli and located at http://www.linux-magazin.de/ausgabe.1998.08/Pounder/pounder.html . I made a few modifications (mostly adding my() declarations). I also allowed it to accept more than one url to test, since sometimes you want to test an overall service and not just one script.

The tool provides the same results as ab above, but it also allows you to set a timeout value, so requests will fail if not served within the timeout period. You also get Latency (secs/Request) and Throughput (Requests/sec) numbers. It can give you a better picture and a more complete simulation of your favorite Netscape browser :).

I have noticed while running these two benchmarking suites that ab gave me results 2.5-3.0 times better. Both suites were run on the same machine, with the same load and the same parameters, but the implementations are different.

Sample output:

  URL(s):          http://www.nowhere.com:81/perl/access/access.cgi
  Total Requests:  100
  Parallel Agents: 10
  Succeeded:       100 (100.00%)
  Errors:          NONE
  Total Time:      9.39 secs
  Throughput:      10.65 Requests/sec
  Latency:         0.85 secs/Request

And the code:

  #!/usr/apps/bin/perl -w
  
  use LWP::Parallel::UserAgent;
  use Time::HiRes qw(gettimeofday tv_interval);
  use strict;
  
  ###
  # Configuration
  ###
  
  my $nof_parallel_connections = 10; 
  my $nof_requests_total = 100; 
  my $timeout = 10;
  my @urls = (
            'http://www.nowhere.com:81/perl/faq_manager/faq_manager.pl',
            'http://www.nowhere.com:81/perl/access/access.cgi',
           );
  
  
  ##################################################
  # Derived Class for latency timing
  ##################################################
  
  package MyParallelAgent;
  @MyParallelAgent::ISA = qw(LWP::Parallel::UserAgent);
  use strict;
  
  ###
  # Is called when connection is opened
  ###
  sub on_connect {
    my ($self, $request, $response, $entry) = @_;
    $self->{__start_times}->{$entry} = [Time::HiRes::gettimeofday];
  }
  
  ###
  # Are called when connection is closed
  ###
  sub on_return {
    my ($self, $request, $response, $entry) = @_;
    my $start = $self->{__start_times}->{$entry};
    $self->{__latency_total} += Time::HiRes::tv_interval($start);
  }
  
  sub on_failure {
    on_return(@_);  # Same procedure
  }
  
  ###
  # Access function for new instance var
  ###
  sub get_latency_total {
    return shift->{__latency_total};
  }
  
  ##################################################
  package main;
  ##################################################
  ###
  # Init parallel user agent
  ###
  my $ua = MyParallelAgent->new();
  $ua->agent("pounder/1.0");
  $ua->max_req($nof_parallel_connections);
  $ua->redirect(0);    # No redirects
  
  ###
  # Register all requests
  ###
  foreach (1..$nof_requests_total) {
    foreach my $url (@urls) {
      my $request = HTTP::Request->new('GET', $url);
      $ua->register($request);
    }
  }
  
  ###
  # Launch processes and check time
  ###
  my $start_time = [gettimeofday];
  my $results = $ua->wait($timeout);
  my $total_time = tv_interval($start_time);
  
  ###
  # Requests all done, check results
  ###
  
  my $succeeded     = 0;
  my %errors = ();
  
  foreach my $entry (values %$results) {
    my $response = $entry->response();
    if($response->is_success()) {
      $succeeded++; # Another satisfied customer
    } else {
      # Error, save the message
      $response->message("TIMEOUT") unless $response->code();
      $errors{$response->message}++;
    }
  }
  
  ###
  # Format errors if any from %errors 
  ###
  my $errors = join(',', map "$_ ($errors{$_})", keys %errors);
  $errors = "NONE" unless $errors;
  
  ###
  # Format results
  ###
  
  #@urls = map {($_,".")} @urls;
  my @P = (
        "URL(s)"          => join("\n\t\t ", @urls),
        "Total Requests"  => "$nof_requests_total",
        "Parallel Agents" => $nof_parallel_connections,
        "Succeeded"       => sprintf("$succeeded (%.2f%%)\n",
                                   $succeeded * 100 / $nof_requests_total),
        "Errors"          => $errors,
        "Total Time"      => sprintf("%.2f secs\n", $total_time),
        "Throughput"      => sprintf("%.2f Requests/sec\n", 
                                   $nof_requests_total / $total_time),
        "Latency"         => sprintf("%.2f secs/Request", 
                                   ($ua->get_latency_total() || 0) / 
                                   $nof_requests_total),
       );
  
  
  my ($left, $right);
  ###
  # Print out statistics
  ###
  format STDOUT =
  @<<<<<<<<<<<<<<< @*
  "$left:",        $right
  .
  
  while(($left, $right) = splice(@P, 0, 2)) {
    write;
  }

[TOC]


Choosing MaxClients

The MaxClients directive sets the limit on the number of simultaneous requests that can be supported; no more than this number of child server processes will be created. To configure more than 256 clients, you must edit the HARD_SERVER_LIMIT entry in httpd.h and recompile. In our case we want this variable to be as small as possible, because this way we can put an upper bound on the resources used by the server children. Since we can restrict each child's process size (see Limiting the size of the processes), the calculation of MaxClients is pretty straightforward:

               Total RAM Dedicated to the Webserver
  MaxClients = ------------------------------------
                     MAX child's process size

So if I have 400Mb left for the webserver to run with, I can set MaxClients to 40 if I know that each child is limited to 10Mb of memory (e.g. with Apache::SizeLimit).

Certainly you will wonder what happens to your server if there are more than MaxClients concurrent users at some moment. This situation is signalled by the following warning message in the error_log file:

  [Sun Jan 24 12:05:32 1999] [error] server reached MaxClients setting,
  consider raising the MaxClients setting

There is no problem -- any connection attempts over the MaxClients limit will normally be queued, up to a number based on the ListenBacklog directive. Once a child process is freed at the end of a different request, the connection will then be served.

But it is an error because clients are being put in the queue rather than getting served immediately, despite the fact that they do not get an error response. The error can be allowed to persist to balance available system resources and response time, but sooner or later you will need to get more RAM so you can start more children. The best approach is to try not to reach this condition at all, and if you reach it often you should start to worry about it.

It's important to understand how much real memory a child occupies. Your children can share memory between them (when the OS supports that and you take action to make the sharing happen -- see Preload Perl modules at server startup). If this is the case, chances are that your MaxClients can be even higher. But it seems that it's not so simple to calculate the absolute number. (If you come up with a solution please let us know!) If the shared memory stayed the same size throughout the child's life, we could derive a much better formula:

               Total_RAM + Shared_RAM_per_Child * MaxClients
  MaxClients = --------------------------------------------- - 1
                            Max_Process_Size

which can be rewritten as:

                    Total_RAM - Max_Process_Size
  MaxClients = ---------------------------------------
               Max_Process_Size - Shared_RAM_per_Child

Let's roll some calculations:

  Total_RAM            = 500Mb
  Max_Process_Size     =  10Mb
  Shared_RAM_per_Child =   4Mb

              500 - 10
 MaxClients = --------- = 81
               10 - 4

With no sharing in place

                 500
 MaxClients = --------- = 50
                 10

With sharing in place you can have 60% more servers without purchasing more RAM. And if you improve and maintain a higher sharing level, let's say:

  Total_RAM            = 500Mb
  Max_Process_Size     =  10Mb
  Shared_RAM_per_Child =   8Mb

              500 - 10
 MaxClients = --------- = 245
               10 - 8

390% more servers!!! You've got the point :)
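
If you prefer to let perl do the arithmetic, here is a tiny sketch of the same calculation (the sizes are the example values from above; measure your own, e.g. with GTop or ps):

  my $total_ram        = 500;  # Mb dedicated to the webserver
  my $max_process_size =  10;  # Mb
  my $shared_per_child =   4;  # Mb

  my $max_clients = int( ($total_ram - $max_process_size)
                       / ($max_process_size - $shared_per_child) );
  print "MaxClients $max_clients\n";   # prints: MaxClients 81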

[TOC]


Choosing MaxRequestsPerChild

The MaxRequestsPerChild directive sets the limit on the number of requests that an individual child server process will handle. After MaxRequestsPerChild requests, the child process will die. If MaxRequestsPerChild is 0, then the process will live forever.

Setting MaxRequestsPerChild to a non-zero limit has two beneficial effects: it limits the damage from memory leaks and it helps reduce the number of processes when the server load goes down.

The first reason is the most crucial for mod_perl, since sloppy programming can cause a child process to consume more memory with each request. If left unbounded, then after a certain number of requests the children will use up all the available memory and leave the server to die from memory starvation. Note that sometimes standard system libraries leak memory too, especially on OSes with bad memory management (e.g. Solaris 2.5 on the x86 arch). If this is your case you can set MaxRequestsPerChild to a small number, which will allow the system to reclaim the memory a greedy child process has consumed when it exits after MaxRequestsPerChild requests. But beware -- if you set this number too low, you will lose a fraction of the speed bonus you receive with mod_perl. Consider using Apache::PerlRun if this is the case. Also, setting MaxSpareServers to a number close to MaxClients will improve the response time (but your parent process will be busy respawning new children all the time!)

Another approach is to use Apache::SizeLimit (see Limiting the size of the processes). By using this module, you should be able to stop relying on MaxRequestsPerChild, although for some folks using both in combination does the job.

See also Preload Perl modules at server startup and Sharing Memory.

[TOC]


Choosing MinSpareServers, MaxSpareServers and StartServers

With mod_perl enabled, it might take as much as 30 seconds from the time you start the server until it is ready to serve incoming requests. This delay depends on the OS, the number of preloaded modules and the process load of the machine. So it's best to set StartServers and MinSpareServers to high numbers, so that if you get a high load just after the server has been restarted, the fresh servers will be ready to serve requests immediately. With mod_perl, it's usually a good idea to raise all 3 variables higher than normal. In order to maximize the benefits of mod_perl, you don't want to kill servers when they are idle, rather you want them to stay up and available to immediately handle new requests. I think an ideal configuration is to set MinSpareServers and MaxSpareServers to similar values, maybe even the same. Having the MaxSpareServers close to MaxClients will completely use all of your resources (if MaxClients has been chosen to take the full advantage of the resources), but it'll make sure that at any given moment your system will be capable of responding to requests with the maximum speed (given that number of concurrent requests is not higher than MaxClients.)

Let's try some numbers. For a heavily loaded web site and a dedicated machine I would think of (note 400Mb is just for example):

  Available to webserver RAM:   400Mb
  Child's memory size bounded:  10Mb
  MaxClients:                   400/10 = 40 (larger with mem sharing)
  StartServers:                 20
  MinSpareServers:              20
  MaxSpareServers:              35

However if I want to use the server for many other tasks, but make it capable of handling a high load, I'd think of:

  Available to webserver RAM:   400Mb
  Child's memory size bounded:  10Mb
  MaxClients:                   400/10 = 40
  StartServers:                 5
  MinSpareServers:              5
  MaxSpareServers:              10

(These numbers are taken off the top of my head, and it shouldn't be used as a rule, but rather as examples to show you some possible scenarios. Use this information wisely!)

[TOC]


Summary of Benchmarking to tune all 5 parameters

OK, we've run various benchmarks -- let's summarize the conclusions:

[TOC]


Persistent DB Connections

Another popular use of mod_perl is to take advantage of its ability to maintain persistent open database connections. The basic approach is as follows:

  # Apache::Registry script
  -------------------------
  use strict;
  use vars qw($dbh);
  
  $dbh ||= SomeDbPackage->connect(...);

Since $dbh is a global variable of the child, once the child has opened the connection it will use it over and over again, unless you call disconnect().

Be careful to use different handle names if you open connections to different databases!

Apache::DBI allows you to make a persistent database connection. With this module enabled, every connect() request to the plain DBI module will be forwarded to the Apache::DBI module. This looks to see whether a database handle from a previous connect() request has already been opened, and whether this handle is still valid, using the ping method. If these two conditions are fulfilled it just returns the database handle. If there is no appropriate database handle or if the ping method fails, a new connection is established and the handle is stored for later re-use. There is no need to delete the disconnect() statements from your code. They will not do a thing, as the Apache::DBI module overloads the disconnect() method with a NOP. At the child's exit there is no explicit disconnect; the child dies and so does the database connection. You may leave the use DBI; statement inside the scripts as well.

The usage is simple -- add to httpd.conf:

  PerlModule Apache::DBI

It is important to load this module before any other DBI, DBD::* and ApacheDBI* modules!

  db.pl
  ------------
  use DBI;
  use strict;
  
  my $dbh = DBI->connect( 'DBI:mysql:database', 'user', 'password',
                          { AutoCommit => 0 }
                        ) || die $DBI::errstr;
  
  ...rest of the program

If you use DBI for DB connections and Apache::DBI to make them persistent, you can also pre-open a connection to the DB for each child with the connect_on_init() method, thus saving the connection overhead on the very first request of every child.

  use Apache::DBI ();
  Apache::DBI->connect_on_init("DBI:mysql:test",
                               "login",
                               "passwd",
                               {
                                RaiseError => 1,
                                PrintError => 0,
                                AutoCommit => 1,
                               }
                              );

This can be used as a simple way to have apache children establish connections on server startup. This call should be in a startup file require()d by PerlRequire or inside a <Perl> section. It will establish a connection in each child process when that child is started. See the Apache::DBI manpage for the requirements of this method.
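
For instance, the connect_on_init() call above could live in a startup file pulled in from httpd.conf (a minimal sketch; the path is just an example):

  # httpd.conf
  PerlRequire /usr/local/apache/conf/startup.pl

  # startup.pl
  use Apache::DBI ();
  Apache::DBI->connect_on_init("DBI:mysql:test", "login", "passwd",
                               { RaiseError => 1, AutoCommit => 1 });
  1;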

You can also benefit from persistent connections by replacing prepare() with prepare_cached(). But it can add a little overhead (META: why?).
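
For example (a sketch; the table and column names are made up for illustration), prepare_cached() returns the same compiled statement handle when it is called again with the same SQL string on the same connection:

  my $sth = $dbh->prepare_cached("SELECT name FROM users WHERE id = ?");
  $sth->execute($id);
  my ($name) = $sth->fetchrow_array;
  $sth->finish;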

Another problem is with timeouts: some databases disconnect the client after a certain period of inactivity. This problem is known as the morning bug. The ping() method ensures that this will not happen. Some DBD drivers don't have this method; check the Apache::DBI manpage to see how to write a ping() method of your own.

Another approach is to change the client's connection timeout. For mysql users, starting from mysql-3.22.x you can set the wait_timeout option at mysqld server startup to change the default value. Setting it to 36 hours would probably fix the timeout problem.
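
For example, the option can go into the server's option file (a sketch; the exact syntax depends on your MySQL version, older releases use the set-variable form shown here):

  # my.cnf
  [mysqld]
  set-variable = wait_timeout=129600    # 36 hours, in seconds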

[TOC]


Using $|=1 under mod_perl and better print() techniques.

As you know, local $|=1; disables buffering of the currently selected file handle (STDOUT by default). Under mod_perl, if you enable it, ap_rflush() is called after each print(), unbuffering Apache's IO.

If you use a _bad_ style of generating output, which consists of multiple print() calls, or you just have too many of them, you will experience a degradation in performance. The severity depends on the number of calls you make.

Many old CGIs were written in the style of:

  print "<BODY BGCOLOR=\"black\" TEXT=\"white\">";
  print "<H1>";
  print "Hello";
  print "</H1>";
  print "<A HREF=\"foo.html\"> foo </A>";
  print "</BODY>";

which has the following drawbacks: multiple print() calls (a performance penalty with $|=1) and heavy backslash escaping, which makes the code less readable and makes it harder to format the HTML so that the script's output itself is easy to read. The code below solves them all:

  print qq{
    <BODY BGCOLOR="black" TEXT="white">
      <H1>
        Hello
      </H1>
      <A HREF="foo.html"> foo </A>
    </BODY>
  };

I guess you see the difference. Be careful, though, when printing the <HTML> tag. The correct way is:

  print qq{<HTML>
    <HEAD></HEAD>
    <BODY>
  };

If you try the following:

  print qq{
    <HTML>
    <HEAD></HEAD>
    <BODY>
  };

Some older browsers might not accept the output as HTML, but print it as plain text instead, since they expect the first characters after the headers and the empty line to be <HTML>, not spaces and/or an additional newline followed by <HTML>. Even if it works with your browser, it might not work for others.

Now let's go back to the $|=1 topic. I still disable buffering, for two reasons: I use few print() calls, printing multiline HTML rather than a line per print(), and I want my users to see the output immediately. So if I am about to produce the results of a DB query which might take some time to complete, I want users to get some titles ahead of time. This improves the usability of my site. Ask yourself which you like better: getting the output a bit more slowly, but steadily, from the moment you've pressed the Submit button, or watching the ``falling stars'' for a while and then receiving the whole output at once, even if it arrives a few milliseconds faster (assuming the browser did not time out in the meantime).
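
For instance, here is a sketch of that approach (run_long_query() is a made-up routine standing in for your slow DB code; the script is assumed to run under Apache::Registry with PerlSendHeader On, so it prints the header itself):

  local $| = 1;        # unbuffer the currently selected filehandle
  print "Content-type: text/html\n\n";
  print qq{<HTML>
    <HEAD><TITLE>Report</TITLE></HEAD>
    <BODY>
      <H1>Report</H1>
  };
  
  # the user already sees the title while the slow query runs
  my @rows = run_long_query();
  print map { "<P>$_</P>\n" } @rows;
  
  print qq{
    </BODY>
  </HTML>
  };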

Conclusion: Do not blindly follow suggestions; think about what is best for you in each given case.

[TOC]


More Reducing Memory Usage Tips

One of the important issues in improving performance is reducing memory usage: the less memory each server process uses, the more server processes you can run, and thus the better the performance (from the user's point of view, the response speed).

See Global Variables

See Memory "leakages"

[TOC]


Profiling

Profiling helps you determine which subroutines (or just snippets of code) take the longest to execute and which subroutines are called most often. These are probably the ones you will want to optimize to make the code more efficient.

It is possible to profile code running under mod_perl with the Devel::DProf module, available on CPAN. However, you must have apache version 1.3b3 or higher and the PerlChildExitHandler enabled (during the httpd build process). When the server is started, Devel::DProf installs an END block to write the tmon.out file. This block will be called at server shutdown. Here is how to start and stop a server with the profiler enabled:

  % setenv PERL5OPT -d:DProf
  % httpd -X -d `pwd` &
  ... make some requests to the server here ...
  % kill `cat logs/httpd.pid`
  % unsetenv PERL5OPT
  % dprofpp

The Devel::DProf package is a Perl code profiler. It collects information on the execution time of a Perl script and of the subs in that script (remember that print() and map() are just like any other subroutines you write, except that they come bundled with Perl!)

Another approach is to use Apache::DProf, which hooks Devel::DProf into mod_perl. The Apache::DProf module runs a Devel::DProf profiler inside each child server and writes the tmon.out file into the directory $ServerRoot/logs/dprof/$$ when the child is shut down (where $$ is the child's process id). All it takes is to add to httpd.conf:

  PerlModule Apache::DProf

Remember that any PerlHandler that was pulled in before Apache::DProf in httpd.conf or startup.pl will not have its code debugging info inserted. To run dprofpp, chdir to $ServerRoot/logs/dprof/$$ and run:

  % dprofpp

[TOC]


CGI.pm's object methods calls vs. function calls

Which approach is better?

  use CGI;
  my $q = new CGI;
  print $q->param('x');

versus

  use CGI qw(:standard);
  print param('x');

There isn't any performance benefit to using the object calls rather than the function calls, but there is a real memory hit when you import all of CGI.pm's functions into your process memory. This can be significant, particularly when there are many child daemons.
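
If you do prefer the function-call interface, one way to reduce the hit is to import only the functions you actually use, instead of the whole :standard set (a sketch):

  use CGI qw(param header);
  print header();
  print "x is ", param('x');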

I strongly endorse Apache::Request (libapreq), the Generic Apache Request Library. Its guts are all written in C, giving it a significant memory and performance benefit.
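
Here is a minimal sketch of the equivalent code using Apache::Request (assuming mod_perl 1.x with libapreq installed):

  use Apache ();
  use Apache::Request ();
  
  my $r   = Apache->request;             # the current request object
  my $apr = Apache::Request->new($r);    # request parsing is done in C
  my $x   = $apr->param('x');
  
  $r->send_http_header('text/plain');
  $r->print("x is $x\n");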

[TOC]


Sending plain HTML as compressed output

See Apache::GzipChain - compress HTML (or anything) in the OutputChain

[TOC]



Written by Stas Bekman.
Last Modified at 09/26/1999
Use of the Camel for Perl is a trademark of O'Reilly & Associates,
and is used by permission.