Deploying mod_perl technology to give rocket speed to your CGI/perl scripts.
Version 1.16, September 26, 1999
The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
URL: http://perl.apache.org/guide Copyright © 1998, 1999 Stas Bekman. All rights reserved.
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission.
The Apache/Perl integration project brings together the full power of the Perl programming language and the Apache HTTP server. With mod_perl it is possible to write Apache modules entirely in Perl, letting you easily do things that are more difficult or impossible in regular CGI programs, such as running sub-requests. In addition, the persistent Perl interpreter embedded in the server saves the overhead of starting an external interpreter, i.e. the penalty of Perl start-up time. Equally important is code caching: modules and scripts are loaded and compiled only once, and for the rest of the server's life they are served from the cache, so the server spends its time running code that is already loaded and compiled, which is very fast.
The primary advantages of mod_perl are power and speed. You have full access to the inner workings of the web server and can intervene at any stage of request processing. This allows for customized processing of (to name just a few of the phases) URI->filename translation, authentication, response generation, and logging. There is very little run-time overhead. In particular, it is not necessary to start a separate process, as is often done with web-server extensions. The most widespread such extension, the Common Gateway Interface (CGI), can be replaced entirely with Perl code that handles the response generation phase of request processing. mod_perl includes two general purpose modules for this purpose: Apache::Registry, which can transparently run existing perl CGI scripts, and Apache::PerlRun, which does a similar job but allows you to run ``dirtier'' (to some extent) scripts.
You can configure your httpd server and handlers in Perl (using PerlSetVar, and <Perl> sections). You can even define your own configuration directives.
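Here is a minimal sketch of what a <Perl> section might look like (the values and the example.com address are placeholders, not recommendations):

  # httpd.conf -- configuration written in Perl
  <Perl>
    $ServerAdmin = 'webmaster@example.com';
    $MaxClients  = 30;
    $Location{"/perl"} = {
      SetHandler  => 'perl-script',
      PerlHandler => 'Apache::Registry',
      Options     => 'ExecCGI',
    };
  </Perl>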
Many people wonder and ask ``How much of a performance improvement does mod_perl give?''. Well, it all depends on what you are doing with mod_perl and possibly who you ask. Developers report speed boosts from 200% to 2000%. The best way to measure is to try it and see for yourself! (See http://perl.apache.org/tidbits.html and http://perl.apache.org/stories/ for the facts.)
When you run your CGI scripts by using a configuration of:
ScriptAlias /cgi-bin/ /home/httpd/cgi-bin/
you run them under the mod_cgi handler, although you never define it explicitly. Apache does all the configuration work behind the scenes when you use a ScriptAlias.
By the way, don't confuse this with the ExecCGI configuration option, which is enabled so that the script will be executed rather than returned as a plain file. For mod_perl and Apache::Registry you would use a configuration like:

  <Location /perl>
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options ExecCGI
    PerlSendHeader On
  </Location>
META: complete
META: complete
From the point of view of the Perl API, Apache::Registry is just another handler, not conceptually different from any other handler. It reads in the file, compiles and executes it, and stores it in the cache. Since the perl interpreter keeps running from the child process' creation until its death, any code compiled by the interpreter is kept in memory until the child dies.

To prevent script name collisions, it prepends Apache::ROOT:: and the mangled path of the URI to the key of the cached script. This key is actually the name of the package the script resides in. So if you have requested the script /perl/project/test.pl, the script will be wrapped in code which starts with the package declaration:

  package Apache::ROOT::perl::project::test_2epl;
Apache::Registry also stores the script's last modification time. Every time the script changes, the cached code is discarded and recompiled from the modified source. However, it doesn't check any of the perl libraries the script might use.

Apache::Registry overrides CORE::exit() with Apache::exit(), so CGI scripts that use exit() will run correctly. We will talk about all these details in depth later.

The last thing Apache::Registry does is emulate mod_cgi's environment variables, like $ENV{SERVER_NAME}, $ENV{REMOTE_USER} and so on. PerlSetupEnv Off disables this feature and saves some memory and CPU cycles.
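If your scripts do not rely on the CGI environment variables at all, you might add the following to your server configuration (a sketch only, adjust to your setup):

  # httpd.conf -- skip the CGI-style %ENV setup on each request
  PerlSetupEnv Off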
From the point of view of mod_cgi (when you take a script that was running as a plain CGI under mod_cgi and run it under mod_perl), there is almost no difference, except a great speed improvement, though also much heavier memory usage (there is no free lunch :).

Just remember that the code is cached, so memory is not cleaned up the way it was under mod_cgi when the script exited (the script does not exit under mod_perl).

Also remember that any libraries your script might require() or use() will not be recompiled when they are changed.

Of course the book will answer all these issues in depth.
Just to show you what happens to your script when it is executed under Apache::Registry, let's take the simplest code (URI /perl/project/test.pl):

  print "Content-type: text/html\n\n";
  print "It works\n";

Apache::Registry will convert it into the following:

  package Apache::ROOT::perl::project::test_2epl;
  use Apache qw(exit);
  sub handler {
    print "Content-type: text/html\n\n";
    print "It works\n";
  }
META: Complete
This document was written in an effort to help you start using Apache's mod_perl extension as quickly and easily as possible. It includes information about the installation and configuration of Perl and the Apache web server, and delves deeply into the issues of writing and porting existing Perl scripts to run under mod_perl. Note that it does not attempt to enter the big world of using the Perl API or C API. You will find pointers covering these topics in the Getting Helped and Further Learning section of this document. This guide tries to cover most of the Apache::Registry and Apache::PerlRun modules. Along with mod_perl related topics, it covers many more issues related to administering Apache servers, debugging scripts, using databases, Perl reference, code snippets and more. The Guide's Overview will help you to find your way through the guide.

It is assumed that you know at least the basics of building and installing Perl and Apache. (If you do not, just read the INSTALL docs that come with the distribution of each package.) However, in this guide you will find specific Perl and Apache installation and configuration notes, which will help you successfully complete the mod_perl installation and get the server running in a short time.
If after reading this guide and the other documents listed in the Help section you feel that your question is still not answered, please ask the apache/mod_perl mailing list to help you. But first try to browse the mailing list archive (located at http://forum.swarthmore.edu/epigone/modperl ). Often you will find the answer to your question by searching the mailing list archive, since there is a good chance someone else has already encountered the problem and found a solution. If you ignore this advice, do not be surprised if your question goes unanswered - it bores people to answer the same question more than once (twice?). This does not mean that you should avoid asking questions, just do not abuse the available help and RTFM before you call for HELP. (You have certainly heard the infamous fable of the shepherd boy and the wolves...)

If you find incorrect details or grammar mistakes, or you want to contribute to this document, please feel free to send me an email at sbekman@iname.com .
I have used the following references while writing this guide:
mod_perl FAQ by Frank Cringle at http://perl.apache.org/faq/ .
mod_perl performance tuning guide by Vivek Khera at http://perl.apache.org/tuning/ .
mod_perl plugin reference guide by Doug MacEachern at http://perl.apache.org/src/mod_perl.html .
Quick guide for moving from CGI to mod_perl at http://perl.apache.org/dist/cgi_to_mod_perl.html .
mod_perl_traps, common traps and solutions for mod_perl users at http://perl.apache.org/dist/mod_perl_traps.html .
mod_perl mailing list emails. Answers to some of the questions posted to modperl@apache.org Apache/Perl mailing list.
My personal experience with mod_perl.
As I said, I have quoted many information snippets from FAQs and emails, and I did not credit people after each quote in the guide. I did not mean to take the credit for myself; it's just that I tried to keep track of names and became lost, so I preferred not to put credits throughout the guide, but rather to centralize them here. If you want your name to show up under your original quote, please tell me and I'll add it for you.
Major contributors:
Doug MacEachern. A big part of this guide is built upon his email replies to users' questions.
Frank Cringle. Parts of his mod_perl FAQ have been used in the guide.
Vivek Khera. For his mod_perl performance tuning guide.
Steve Reppucci, who made a thorough review of the stuff I wrote. Fixed lots of spelling and grammar errors, and made the guide readable to English speakers :)
Eric Cholet, who wrote complete sections for the guide, and pointed out errors that the guide contained.
Ken Williams, who reviewed a lot of stuff in the guide. Many snippets from his emails are included in the guide.
Credits of course go to ( alphabetically sorted ):
I want to thank all the people who donated their time and efforts to make this amazing idea of mod_perl a reality. This includes Doug MacEachern, the author of mod_perl, and all the developers who contributed bug patches, modules and help. And of course the numerous unseen users who helped to find bugs and advocated mod_perl around the world.
Before you start with mod_perl installation, you should have an overall picture of this wonderful technology. There is more than one way to use a mod_perl-enabled webserver. You have to decide which mod_perl scheme you want to use. The Picking the Right Strategy chapter presents various approaches and discusses their pros and cons.

Once you know what fits your requirements best, you should proceed to Real World Scenarios Implementation. This chapter provides very detailed scenarios of the schemes discussed in the Picking the Right Strategy chapter.

The Server Installation chapter follows on from the Real World Scenarios Implementation chapter by providing more in-depth installation details.

The Server Configuration chapter adds to the basic configurations presented in the Real World Scenarios Implementation chapter with extended configurations and various configuration examples.
The Frequent mod_perl problems chapter just collects links to other chapters. It is an attempt to stress some of the most frequently encountered mod_perl problems. So this is the first place you should check if you have got a problem.
Probably the most important chapter is CGI to mod_perl Porting. mod_perl Coding guidelines. It explains the differences between scripts running under mod_cgi and mod_perl, and what should be done in order to make existing scripts run under mod_perl. Along with the porting notes it provides guidelines for proper mod_perl programming.
Performance. Benchmarks is the biggest and a very important chapter. It explains the details of tuning mod_perl and the scripts running under it, so you can squeeze every ounce of power from your server. A big part of the chapter is benchmarks, the numbers that IT managers love to read. But these are different benchmarks: they do not compare mod_perl with similar technologies, but rather different configurations of mod_perl servers, to guide you through the tuning process. I have to admit, performance tuning is a very hard task, and demands a lot of understanding and experience. But once you acquire this knowledge you can make magic with your server.
The Things obvious to others, but not to you chapter is exactly what it claims to be. Some people have been in this business too long, and many things have become too obvious to them. This is not true for a newbie, so this chapter talks about such things.
While developing your mod_perl applications, you will begin to understand that an error_log file is your best friend. It tells you all the intimate details of what is happening to your scripts. But the problem is that it speaks a secret language. To learn the alphabet and the grammar of this language, refer to the chapter Warnings and Errors: Where and Why.
Protecting Your Site - Everything regarding security.
If you are into driving relational databases with your cgi scripts, the mod_perl and Relational Databases chapter will tell you all about the database-related goodies mod_perl has prepared for you.
If you are using good old dbm files for your databases, the mod_perl and dbm files chapter explains how to utilize them better under mod_perl.
More and more Internet Service Providers (ISPs) are evaluating the possibility of providing mod_perl services to their users. Is this possible? Is it secure? Will it work? What resources does it take? The mod_perl for ISPs. mod_perl and Virtual Hosts chapter answers all these questions. If you want to run a mod_perl-enabled server, but do not have root access, read this chapter as well, either to learn how to do it yourself, or maybe to persuade your ISP to provide this service.
If you have to administer your Apache mod_perl server the Controlling and Monitoring the Server chapter is for you. Among the topics are: server restarting and monitoring techniques, preventing the server from eating up all your disk space in a matter of minutes, and more.
The mod_perl Status. Peeking into the Server's Perl Innards chapter shows you the ways you can peek at what is going on in a mod_perl-enabled server while it is running: looking at the value of some global variable, which database connections are open, which modules were loaded and their paths, what the value of @INC is, and much more.
Every programmer needs to know how to debug her program. It is an _easy_ task with plain Perl: just invoke the program with the -d flag and debug it. Is it possible to do the same under mod_perl? After all you cannot debug every CGI script by executing it from the command line: some scripts will not run from the command line. The Debugging mod_perl chapter proves that debugging under mod_perl is possible and real.
Sometimes browsers that interact with our servers have bugs, which cause big headaches for CGI developers. Preventing these bugs from happening is discussed in the Workarounds for some known bugs in browsers chapter.
Many modules have been written to extend mod_perl's core functionality. Some important modules are covered in the Apache::* modules chapter.
Some folks decide to go with mod_perl, but they are missing a basic understanding of Perl, which is absolutely not tolerated by mod_perl. If you are such a person, there is nothing to be ashamed of; we all went through this. Get a good Perl book and start reading. The Perl Reference chapter gives some basic perl lessons, delivering the knowledge without which you cannot start to program mod_perl scripts.
The Code Snippets chapter is just a collection of code snippets I have found useful while writing the scripts.
The Choosing an Operating System and Hardware chapter gives you an idea of how to choose the software and hardware for your webserver.

The mod_perl Advocacy chapter tries to make it easier to advocate mod_perl around the world.
The Getting Helped and Further Learning chapter refers you to other related information resources, like learning Perl programming and SQL, understanding security, building databases, and more.
Appendix A: Downloading software and documentation includes pointers to the software that was explained and/or mentioned in this guide.
This document is relevant to both writing a new CGI from scratch and migrating an application from plain CGI to mod_perl.
If you are in the porting stage, use it as a reference for possible problems you might encounter when running the existent CGI in the new mode.
If you are about to write a new CGI script from scratch, it would be a good idea to learn about the possible pitfalls and avoid them in the first place.

This document also covers the case where the CGI script being ported does the job, but is too dirty to be easily altered to run as a mod_perl program (Apache::PerlRun).

If your project schedule is tight, I would suggest converting to mod_perl in the following steps: initially, run all the scripts in Apache::PerlRun mode; then, as time allows, move them into Apache::Registry mode.

It can be a good idea to tighten up some of your Perl programming practices, since Apache::Registry doesn't allow sloppy programming.
You might want to read:
This page describes the mechanics of creating, compiling, releasing, and maintaining Perl modules. http://world.std.com/~swmcd/steven/perl/module_mechanics.html
The information is very relevant to a mod_perl developer.
``Writing Apache Modules with Perl and C'' is a ``must have'' book!
See the details at http://www.modperl.com .
Let's start with some simple code, see what can go wrong with it, detect bugs and debug them, and discuss possible caveats and how to avoid them.

I will use a simple CGI script that initializes a $counter to 0, and prints its value to the screen while incrementing it.

  counter.pl:
  ----------
  #!/usr/bin/perl -w
  use strict;

  print "Content-type: text/html\r\n\r\n";

  my $counter = 0;

  for (1..5) {
    increment_counter();
  }

  sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !<BR>\n";
  }
  ----------
You would expect to see the output:

  Counter is equal to 1 !
  Counter is equal to 2 !
  Counter is equal to 3 !
  Counter is equal to 4 !
  Counter is equal to 5 !

And that's what you see when you execute this script the first time. But let's reload it a few times... See, suddenly, after a few reloads, the counter doesn't restart from 1 anymore. We continue to reload and see that it keeps on growing, but not steadily: 10, 10, 10, 15, 20... Weird...

  Counter is equal to 6 !
  Counter is equal to 7 !
  Counter is equal to 8 !
  Counter is equal to 9 !
  Counter is equal to 10 !

We saw two anomalies in this very simple script: unexpected growth of the counter beyond 5, and inconsistent growth over reloads. Let's investigate this script.
First let's peek into the error_log file... what we see is:

  Variable "$counter" will not stay shared at /home/httpd/perl/conference/counter.pl line 13.

What kind of error is this? We should ask perl to help us. I'm going to enable a special diagnostics mode by adding the following at the top of the script:
use diagnostics;
Reloading again, error_log shows:

  Variable "$counter" will not stay shared at /home/httpd/perl/conference/counter.pl line 15 (#1)

  (W) An inner (nested) named subroutine is referencing a lexical variable
  defined in an outer subroutine.

  When the inner subroutine is called, it will probably see the value of the
  outer subroutine's variable as it was before and during the *first* call
  to the outer subroutine; in this case, after the first call to the outer
  subroutine is complete, the inner and outer subroutines will no longer
  share a common value for the variable.  In other words, the variable will
  no longer be shared.

  Furthermore, if the outer subroutine is anonymous and references a lexical
  variable outside itself, then the outer and inner subroutines will never
  share the given variable.

  This problem can usually be solved by making the inner subroutine
  anonymous, using the sub {} syntax.  When inner anonymous subs that
  reference variables in outer subroutines are called or referenced, they
  are automatically rebound to the current values of such variables.
Perl actually detected a closure, which is sometimes a wanted effect, but not in our case (see perldoc perlsub for more information about closures). While diagnostics.pm is sometimes handy for debugging purposes, it drastically slows down your CGI script. Make sure you remove it on your production server.
Do you see a nested named subroutine in my script? I do not!!! What is going on? I suggest reporting a bug. But wait, maybe the perl interpreter sees the script in a different way, maybe the code goes through some changes before it actually gets executed? The easiest way to check what's actually happening is to run the script under a debugger. But since we must debug it while it is being executed by the server, a normal debugging session wouldn't help, for we have to invoke the debugger from within the webserver. Luckily Doug wrote the Apache::DB module and we will use it to debug my script. I'll do it non-interactively (although you can debug interactively with Apache::DB). I change my httpd.conf with:

  PerlSetEnv PERLDB_OPTS "NonStop=1 LineInfo=/tmp/db.out AutoTrace=1 frame=2"
  PerlModule Apache::DB
  <Location /perl>
    PerlFixupHandler Apache::DB
    SetHandler perl-script
    PerlHandler Apache::Registry::handler
    Options ExecCGI
    PerlSendHeader On
  </Location>
Comment out 'use diagnostics;', restart the server and call counter.pl from your browser. On the surface nothing has changed - we still see the correct output as before, but two things happened in the background: first, /tmp/db.out was written, with a complete trace of the code that was executed; second, the error_log file showed us the whole code that was executed, as a side effect of reporting the warning we saw before: Variable "$counter" will not stay shared at (eval 52) line 15... In any case, this is the code that actually gets executed:

  package Apache::ROOT::perl::conference::counter_2epl;
  use Apache qw(exit);
  sub handler {
    BEGIN {
      $^W = 1;
    };
    $^W = 1;

    use strict;

    print "Content-type: text/html\r\n\r\n";

    my $counter = 0;

    for (1..5) {
      increment_counter();
    }

    sub increment_counter{
      $counter++;
      print "Counter is equal to $counter !<BR>\n";
    }
  }
What do we learn from this discovery? First, that every cgi script is cached under a package whose name is composed of the Apache::ROOT:: prefix and the relative part of the script's URL (perl::conference::counter_2epl), by replacing all occurrences of / with ::. That's how mod_perl knows which script should be fetched from the cache - each script is just a package with a single subroutine named handler. Now you understand why the diagnostics pragma talked about an inner (nested) subroutine - increment_counter is indeed a nested sub. In every script, each subroutine is nested inside the handler subroutine.
One of the workarounds is to use globally declared variables, with the vars pragma:

  #!/usr/bin/perl -w
  use strict;
  use vars qw($counter);

  print "Content-type: text/html\r\n\r\n";

  $counter = 0;

  for (1..5) {
    increment_counter();
  }

  sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !<BR>\n";
  }
There is no more closure effect, since there is no my() (lexically) defined variable being used in the nested subroutine.
Another approach is to use fully qualified variables, which is even better, since less memory will be used, but it adds an overhead of extra typing:
  #!/usr/bin/perl -w
  use strict;

  print "Content-type: text/html\r\n\r\n";

  $main::counter = 0;

  for (1..5) {
    increment_counter();
  }

  sub increment_counter{
    $main::counter++;
    print "Counter is equal to $main::counter !<BR>\n";
  }

Another working but less elegant solution is to pass the variable as an argument and assign the returned value back. It's not so good when the variable is big, since copying it adds an overhead of time and memory.

  #!/usr/bin/perl -w
  use strict;

  print "Content-type: text/html\r\n\r\n";

  my $counter = 0;

  for (1..5) {
    $counter = increment_counter($counter);
  }

  sub increment_counter{
    my $counter = shift || 0;
    $counter++;
    print "Counter is equal to $counter !<BR>\n";
    return $counter;
  }
It's important to understand that the closure effect happens only with code that Apache::Registry wraps with a declaration of the handler subroutine. If you put your code into a library or module which the main script require()'s or use()'s, there is no such problem. For example, if we put the subroutine increment_counter() into mylib.pl (e.g. save it in the same directory as the main script) and require() it, there will be no problem at all. (Don't forget the 1; at the end of the library or the require() might fail.)
  mylib.pl:
  ----------
  sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !<BR>\n";
  }
  1;
  ----------

  counter.pl:
  ----------
  #!/usr/bin/perl -w
  use strict;
  require "./mylib.pl";

  print "Content-type: text/html\r\n\r\n";

  my $counter = 0;

  for (1..5) {
    increment_counter();
  }
  ----------
Personally, unless the script is very short, I have got used to writing all the code in external libraries, and having only a few lines in the main script, generally just a call to the main function of my library, which I usually name init(). I don't worry about the closure effect anymore (unless I create it myself :).

You shouldn't be intimidated by this issue at all, since Perl is your friend. Just keep the warnings mode on, and whenever this effect takes place, Perl will gladly tell you so by saying:
Variable "$counter" will not stay shared at ...[snipped]
Just don't forget to check your error_log file before going into production!

By the way, the above example was pretty boring. In my first days of using mod_perl, I wrote a cool user registration program. I'll show a very simple representation of it:

  use CGI;
  $q = new CGI;
  my $name = $q->param('name');
  print_respond();

  sub print_respond{
    print "Content-type: text/html\n\n";
    print "Thank you, $name!";
  }

My boss and I checked the program on the development server and it worked OK. So we decided to put it in production. Everything seemed OK, but my boss decided to keep on checking by submitting variations of his profile. Imagine the surprise when, after submitting his name (let's say ``Me Boss'' :), he saw the response ``Thank you, Stas Bekman!''. What had happened is that I had tried the production system as well. I was new to mod_perl and was so excited with the speed improvement that I didn't notice the closure problem, and it hit me. At first I thought that maybe Apache had started to confuse connections, returning responses from other people's requests. I was wrong of course. Why didn't we notice this when we were testing the system on our development server? Keep reading and you will understand what the problem was.
Now let's return to our original example and proceed with the second mystery we noticed: why did we see inconsistent results over numerous reloads? That's very simple. Every time the server gets a request to process, it hands it over to one of its children, generally in a round robin fashion. So if you have 10 httpd children alive, the first 10 reloads might seem to be correct. Since the closure effect kicks in from the second invocation onward, subsequent reloads return unexpected results. Moreover, children don't always serve the same requests in the same order; at any given moment one of the children may have served the same script more times than the others. That's why we saw that strange behaviour.
And now you understand why we didn't notice the problem with the user registration system in the example I presented. First, we didn't look at the error_log files. (As a matter of fact we did, but there were so many warnings in there that we couldn't tell which were important and which weren't.) Second, we didn't test the system under the -X flag (single process mode), and we had too many server children running to notice the problem.
A workaround is to run the server in single process mode. You achieve this by invoking the server with the -X parameter (httpd -X). Since there are no other server children running, you will detect the problem on the second reload. But before that, let the error_log help you detect most of the possible errors: most of the warnings can become errors, so you had better make sure to check every warning that perl detects, and probably write the code in such a way that none of the warnings show up in the error_log. If your error_log file is filled up with hundreds of lines on every script invocation, you will have a hard time locating and noticing real problems.
Of course none of the warnings will be reported if the warning mechanism is not turned on. With mod_perl it is also possible to turn on warnings globally via the PerlWarn directive; just add to httpd.conf:
PerlWarn On
You can turn warnings off within your code on a local basis (inside a block) with local $^W=0. If you write $^W=0 you disable the warning mode everywhere inside the child; $^W=1 enables it again. So if perl warns you about something you are sure is not a problem, you can locally disable the warning, e.g.:

  [snip]
  # we want perl to be quiet here -
  # we don't care whether $a was initialized
  local $^W = 0;
  print $a;
  local $^W = 1;
  [snip]
Of course this is not a way to fix initialization and other problems, but sometimes it helps.
Sometimes it's very hard to understand what a warning is complaining about. You see the source code, but you cannot understand why some specific snippet produces that warning. The mystery often lies in the fact that the code can be called from different places, e.g. when it's a subroutine.
I'll show you an example of such code.
  local $^W=1;
  good();
  bad();

  sub good{
    print_value("Perl");
  }

  sub bad{
    print_value();
  }

  sub print_value{
    my $var = shift;
    print "My value is $var\n";
  }

In the code above there is a sub that prints the passed value, a sub good that passes the value correctly, and a sub bad where we forgot to pass it. When we run the script, we get the warning:
Use of uninitialized value at ./warning.pl line 15.
We can clearly see that there is an undefined value at the line that attempts to print it:

  print "My value is $var\n";

But how do we know why it was undefined? The solution is quite simple: what we need is a full stack trace of the calls that triggered the warning.
The Carp module comes to the rescue with its cluck() function. Let's modify the script:

  use Carp ();
  local $SIG{__WARN__} = \&Carp::cluck;
  local $^W=1;
  good();
  bad();

  sub good{
    print_value("Perl");
  }

  sub bad{
    print_value();
  }

  sub print_value{
    my $var = shift;
    print "My value is $var\n";
  }

Now when we execute it, we see:

  Use of uninitialized value at /home/httpd/perl/book/warning.pl line 17.
    Apache::ROOT::perl::book::warning_2epl::print_value() called
      at /home/httpd/perl/book/warning.pl line 12
    Apache::ROOT::perl::book::warning_2epl::bad() called
      at /home/httpd/perl/book/warning.pl line 5
    Apache::ROOT::perl::book::warning_2epl::handler('Apache=SCALAR(0x84b1154)') called
      at /usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm line 139
    eval {...} called
      at /usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm line 139
    Apache::Registry::handler('Apache=SCALAR(0x84b1154)') called
      at PerlHandler subroutine `Apache::Registry::handler' line 0
    eval {...} called
      at PerlHandler subroutine `Apache::Registry::handler' line 0
Take a moment to understand the trace. The only part that we are interested in is the one that starts when the actual script is called, so we can skip the Apache::Registry trace part. We get:

  Use of uninitialized value at /home/httpd/perl/book/warning.pl line 17.
    Apache::ROOT::perl::book::warning_2epl::print_value() called
      at /home/httpd/perl/book/warning.pl line 12
    Apache::ROOT::perl::book::warning_2epl::bad() called
      at /home/httpd/perl/book/warning.pl line 5
which tells us that the code that triggered the warning was:
Apache::Registry code => bad() => print_value()
We look into bad() and indeed see that we forgot to pass the variable. Of course when you write a subroutine like print_value it would be a good idea to check the passed arguments before starting execution, but the example was ``good'' enough to show you how to ease the code debugging process.

Sure, you might say, I could have found the problem by a simple inspection of the code. You are right, but I promise you that the task becomes quite complicated and time consuming with code of thousands of lines.
Notice the local() keyword before the setting of $SIG{__WARN__}. Since %SIG is a global variable, forgetting local() will force this setting upon all the scripts running under the same process. If that is the behaviour you want, for example on a development server, you had better do it in the startup file, where you can easily switch this feature on and off when the server restarts.
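For instance, here is a minimal sketch of such a startup file entry (the DEV_SERVER environment variable is just a made-up flag for this example, not a real mod_perl setting):

  # startup.pl -- turn warnings into full stack traces,
  # but only on the development server
  use Carp ();
  $SIG{__WARN__} = \&Carp::cluck if $ENV{DEV_SERVER};
  1;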
As you have noticed, warnings report the line number at which the event happened, which is supposed to help you find the problematic code. The problem is that the reported line numbers are often incorrect, because certain uses of the eval operator and ``here'' documents are known to throw off Perl's line numbering.
META: move here the code that explains the settings of #line
While having the warning mode turned on is a must on a development server, you had better turn it globally off on a production server. If every CGI script generates just one warning per request, and your server serves millions of requests per day, your log file will eat up all of your disk space and the machine will die. My production server has the following directive in httpd.conf:
PerlWarn Off
While we are talking about control flags, another even more important flag is -T, which turns on Taint mode. Since this is a very broad topic I will not discuss it here, but if you aren't forcing all of your scripts to run under Taint mode you are asking for trouble (always keep malicious users in mind). To turn it on, add to httpd.conf:
PerlTaintCheck On
When you start running your scripts under mod_perl, you might find yourself in a situation where a script seems to work, but sometimes it screws up. And the more it runs without a restart, the more it screws up. Often you can resolve this problem quite easily: you have to test your script under a server running in single process mode (httpd -X).
Generally the problem you have is of using global variables. Since global variables don't change from one script invocation to another unless you change them, you can find your scripts do ``fancy'' things.
The first example is amazing -- Web Services. Imagine that you enter some site where you have an account (a free email account?). Now you want to see what other users are reading.

You type in the username you want to peek at and a dummy password and try to enter the account. On some services it actually works!!!

You say, why in the world does this happen? The answer is simple: global variables. You have entered the account of someone who happened to be served by the same server child as you. Because of sloppy programming, a global variable was not reset at the beginning of the program and voila, you can easily peek into other people's email! Here is an example of sloppily written code:

  use vars qw($authenticated);
  my $q = new CGI;
  my $username = $q->param('username');
  my $passwd   = $q->param('passwd');
  authenticate($username,$passwd);
  # failed, break out
  die "Wrong passwd" unless $authenticated == 1;
  # user is OK, fetch user's data
  show_user($username);

  sub authenticate{
    my ($username,$passwd) = @_;
    # some checking
    $authenticated = 1 if (SOMETHING);
  }

Do you see the catch? With the code above, I can type in any valid username and any dummy password and enter that user's account, provided someone has successfully entered his account before me using the same child process! Since $authenticated is global, once it becomes 1 it stays 1 for the rest of the child's life!!! The solution is trivial: reset $authenticated to 0 at the beginning of the program (one of many possible solutions). Of course this example is trivial -- but believe me, it happens!
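A minimal sketch of that trivial fix, keeping the rest of the snippet as above:

  use vars qw($authenticated);
  $authenticated = 0;   # reset on every request, before any checks
  authenticate($username,$passwd);
  die "Wrong passwd" unless $authenticated == 1;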
Here is just another little one-liner that can spoil your day, assuming you forgot to reset the $allowed variable. It works perfectly OK under plain mod_cgi:

  $allowed = 1 if $username eq 'admin';

But under mod_perl the line above will let any user administer your system, if an earlier request served by the same child has already set $allowed to 1.
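A sketch of a safer version, which computes the value freshly on each request instead of relying on a leftover global:

  my $allowed = ( $username eq 'admin' ) ? 1 : 0;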
Another good example is the use of the /o regular expression modifier, which compiles a regular expression once, on its first execution, and never recompiles it again. This problem can be difficult to detect, because after restarting the server each request you make may be served by a different child process, and the regex pattern for that child is then compiled afresh. Only when you make a request that happens to be served by a child which has already cached the regex will you see the problem. Generally you miss it: you press reload and it works (a fresh child), and then it doesn't (a child that has already cached the regex and won't recompile it because of /o). An example of such a case would be:

  my $pat = $q->param("keyword");
  foreach( @list ) {
    print if /$pat/o;
  }
To make sure you don't miss these bugs always test your CGI in single process. To solve this particular /o problem refer to Compiled Regular Expressions.
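One possible workaround, sketched here under the assumption that you are running perl 5.005 or later, is to drop /o and compile the pattern explicitly once per request with qr//, so a child never reuses a pattern from an earlier request:

  my $pat = $q->param("keyword");
  my $re  = qr/$pat/;        # compiled once per request, not once per child
  foreach (@list) {
    print if /$re/;
  }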
There are a few things that behave differently under mod_perl. It's good to know what they are.
Scripts under Apache::Registry do not run in package main; they run in a unique namespace based on the requested URI. For example, if your URI is /perl/test.pl, the package will be called Apache::ROOT::perl::test_2epl.
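A quick way to see this for yourself is a tiny Registry script like the following sketch:

  # where_am_i.pl -- print the package this script was compiled into
  print "Content-type: text/plain\n\n";
  print "I was compiled into the ", __PACKAGE__, " package\n";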
When you use(), require() or do() a file, Perl consults the @INC variable for the list of directories in which to search for the file. If the file that you want to load is not located in one of the listed directories, you have to tell Perl where to find it.
In order to require() a file located at /home/httpd/perl/examples/test.pl you could:

Use a path relative to one of the directories in @INC. Let's say that one of the directories is /home/httpd/perl:

  require("examples/test.pl");

or modify @INC on the fly:

  use lib qw(examples);
  require("test.pl");

That works when the script might be called from anywhere. If you always execute the script from the directory it resides in (examples here), you can do:

  use lib qw(.);
  require("test.pl");
But the last two snippets will break automatic reloading by the Apache::StatINC module (which helps to automatically reload scripts while developing code), since under mod_perl @INC is frozen once the server is up and cannot be updated afterwards. The only chance to temporarily modify @INC is while the script is being loaded and compiled for the first time; after that its value is reset to the original one. Your only chance to change @INC permanently is to modify it in the startup or Apache configuration files.
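For example, a minimal sketch of such a startup file entry (the directory names are placeholders):

  # startup.pl -- extend @INC once, at server startup
  use lib qw(/home/httpd/perl /home/httpd/perl/examples);
  1;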
You can write a full path to the script:
require("/home/httpd/perl/examples/test.pl")
or
use lib qw("/home/httpd/perl/examples/"); require("test.pl");
the latter will fail the reload of Apache::StatINC
as above. The former will set a correct entry into a %INC
variable.
This approach is rather discouraged, since if you want to move the script around it's quite difficult: you have to modify the path every time, which can be pretty painful when you move scripts from a development to a production server.

But wait, Graham Barr & Nick Ing-Simmons made a present for you by writing the FindBin module. If you know about this module, you don't need to write a hardcoded path. The following snippet does all the work for you:

  use FindBin ();
  use lib "$FindBin::Bin";
  use MyModule;

It works exactly like the hardcoded path, and it sets a correct %INC entry, just as if you had used a hardcoded path with require().
When you develop plain CGI scripts, you can just change the code, and rerun the CGI from your browser. Since the script isn't cached in memory, the next time you call it the server starts up a new perl process, which recompiles it from scratch. The effects of any modifications you've applied are immediately present.
The situation is different with Apache::Registry, since the whole idea is to get maximum performance from the server. By default, the server won't spend the time to check whether any included library modules have been changed. It assumes that they weren't, thus saving the few milliseconds needed to stat() the source file (multiplied by however many modules/libraries you are use()-ing and/or require()-ing in your script). The only check that is made is whether your main script has been changed. So if you have only one script that doesn't use() (or require()) other perl modules (or packages), there is nothing new here. If, however, you are developing a script that includes other modules, the files you use() or require() are not checked for modification.
Acknowledging this, how do we get our modperl-enabled server to recognize changes in any library modules? Well, there are a couple of techniques:
The simplest approach is to restart the server each time you apply some change to your code. See Server Restarting techniques.
After restarting the server about 100 times, you will get tired of it and look for another solution. Help comes from the Apache::StatINC module.
Apache::StatINC reloads files listed in %INC when they are updated on disk. When Perl pulls in a file via require(), it stores the filename in the global hash %INC. The next time Perl tries to require the same file, it sees the file in %INC and does not reload it from disk. This module's handler iterates over %INC and reloads any file that has changed on disk.
To enable this module just add two lines to the httpd.conf file:

  PerlModule Apache::StatINC
  PerlInitHandler Apache::StatINC

To be sure it really works, turn on debug mode on your development box with PerlSetVar StatINCDebug On. You end up with something like:

  PerlModule Apache::StatINC
  <Location /perl>
    SetHandler perl-script
    PerlHandler Apache::Registry::handler
    Options ExecCGI
    PerlSendHeader On
    PerlInitHandler Apache::StatINC
    PerlSetVar StatINCDebug On
  </Location>
Beware that only the modules located in @INC are reloaded on change, and that you can change @INC only before the server has been started (in the startup file). Whatever you do in scripts and modules that are require()'d after server startup has no lasting effect on @INC. When you do:
use lib qw(foo/bar);
@INC is changed only for the time the code is being parsed and compiled. When that is over, @INC is reset to its original value. To make sure that you have set @INC correctly, fetch http://www.nowhere.com/perl-status?inc and look at the bottom of the page. (I assume you have configured the /perl-status location.)
Also, notice the following caveat: while ``.'' is in @INC, perl knows how to require() files with paths relative to the script's directory. Once the script has been parsed, the server doesn't remember the path any more! So you can end up with a broken entry in %INC like:

  $INC{'bar.pl'} eq "bar.pl"

If you want Apache::StatINC to reload your file, either modify @INC in the server startup file, or use a full path in the require() call.
Checking all the modules in %INC on every request can add a large overhead to server response times, and you certainly would not want the Apache::StatINC module to be enabled in your production site's configuration. But sometimes you want a configuration file to be reloaded when it is updated, without restarting the server.

This is an especially important feature if, for example, someone is allowed to modify some of the tool's configuration, but it is undesirable for that person to telnet to the server to restart it, either because of a lack of professional skills or for security reasons -- you don't want to give out the root password unless you have to.

Since we are talking about configuration files, I would like to take the opportunity to show you some good and bad approaches to configuration file writing.
If you have a configuration file of just a few variables, it doesn't really matter how you do it. But generally this is not the case. Configuration files tend to grow as a project grows. This is very relevant for projects that generate HTML files, since they tend to demand many easily configurable parameters, like headers, footers, colors and so on.

So let's start with the basic approach deployed by most CGI script writers: having many variables defined in a separate configuration file. For example:

  $cgi_dir = "/home/httpd/perl";
  $cgi_url = "/perl";
  $docs_dir = "/home/httpd/docs";
  $docs_url = "/";
  $img_dir = "/home/httpd/docs/images";
  $img_url = "/images";
  ... many more config params here ...
  $color_hint   = "#777777";
  $color_warn   = "#990066";
  $color_normal = "#000000";
Now when we want to use these variables in a mod_perl script, we must declare them all with the help of use vars in the script, because the use strict; pragma demands that all variables used in the script be declared. So we start the script with:

  use strict;
  use vars qw($cgi_dir $cgi_url $docs_dir $docs_url
              ... many more config params here ....
              $color_hint $color_warn $color_normal
             );
Such a script is a nightmare to maintain, especially while not all the features have been written yet, so you keep adding and removing names from the list. But that's not the biggest problem.

Since we want our code to be clean, we start the configuration file with use strict; as well, so we have to list the variables there too. A second list to maintain.

The moment you have many scripts, you run into the problem of collisions between configuration files, for which one of the best solutions is a package declaration, which makes each configuration unique (if you declare unique package names, of course).
The moment you add a package declaration and think that you are done, you realize that the nightmare has just begun. Once you have declared the package, you cannot just require() the file and use the variables, since they now belong to a different package. So you have either to modify all your scripts to use a fully qualified notation like $My::Config::cgi_url instead of just $cgi_url, or to import the needed variables into any script that is going to use them.

Since you don't want the extra typing needed to fully qualify the variables, you go for the importing approach. But your configuration package has to export them first. That means that you have to list all the variables again, and now keep at least three variable lists updated whenever you change the names of configuration variables. And that's when you have only one script that uses the configuration file; in the general case you have many. So now our example config file looks like this:
  package My::Config;
  use strict;

  BEGIN {
    use Exporter ();

    @My::Config::ISA       = qw(Exporter);
    @My::Config::EXPORT    = qw();
    @My::Config::EXPORT_OK = qw($cgi_dir $cgi_url $docs_dir $docs_url
                                ... many more config params here ....
                                $color_hint $color_warn $color_normal);
  }

  use vars qw($cgi_dir $cgi_url $docs_dir $docs_url
              ... many more config params here ....
              $color_hint $color_warn $color_normal
             );

  $cgi_dir = "/home/httpd/perl";
  $cgi_url = "/perl";
  $docs_dir = "/home/httpd/docs";
  $docs_url = "/";
  $img_dir = "/home/httpd/docs/images";
  $img_url = "/images";
  ... many more config params here ...
  $color_hint   = "#777777";
  $color_warn   = "#990066";
  $color_normal = "#000000";
And in the code:
  use strict;
  use My::Config qw($cgi_dir $cgi_url $docs_dir $docs_url
                    ... many more config params here ....
                    $color_hint $color_warn $color_normal
                   );
  use vars       qw($cgi_dir $cgi_url $docs_dir $docs_url
                    ... many more config params here ....
                    $color_hint $color_warn $color_normal
                   );

But as we know, this approach is especially bad in the context of mod_perl, since exported variables add a memory overhead. The more variables are exported, the more memory you use. As usual, we have to multiply this overhead by the number of server processes we run, and we get a pretty big number, memory which could otherwise be used to run a few more servers.
As a matter of fact things aren't so horrible, since you can group your variables and give the groups special names called tags, which can later be used as arguments to import() or use(). You are probably familiar with:
use CGI qw(:standard :html);
We can implement this quite easily with the help of export_tags() and export_ok_tags() from Exporter. For example:

  BEGIN {
    use Exporter ();
    use vars qw(@ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);
    @ISA         = qw(Exporter);
    @EXPORT      = qw();
    @EXPORT_OK   = qw();
    %EXPORT_TAGS = (
                    vars => [qw($fname $lname)],
                    subs => [qw(reread_conf untaint_path)],
                   );
    Exporter::export_ok_tags('vars');
    Exporter::export_ok_tags('subs');
  }
Yes, you export subroutines exactly like variables, since what is actually being exported is a symbol. The definitions of these subroutines are not shown here.
In your code now you can write:
use My::Config qw(:subs :vars);
What about groups of groups, like the :all tag from CGI.pm, which is a group tag of all the other groups? It requires a little more magic on your part, but you can always save time and look up the solution in the code of CGI.pm. It's just a matter of a little code to expand all the groups recursively.
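As a non-recursive sketch (sufficient when the tags themselves don't contain other tags), you could build an :all group out of the groups already declared above:

  %EXPORT_TAGS = (
                  vars => [qw($fname $lname)],
                  subs => [qw(reread_conf untaint_path)],
                 );
  # one level of expansion only; CGI.pm does this recursively
  $EXPORT_TAGS{all} = [ map { @$_ } values %EXPORT_TAGS ];
  Exporter::export_ok_tags('vars', 'subs', 'all');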
After going through the pain of maintaining variable lists in a big project with a huge configuration file (more than 100 variables) and many files actually using them, I came up with a much simpler solution: keeping all the variables in a single hash. Now my configuration file looks like this:

  package My::Config;
  use strict;

  BEGIN {
    use Exporter ();

    @My::Config::ISA       = qw(Exporter);
    @My::Config::EXPORT    = qw();
    @My::Config::EXPORT_OK = qw(%c);
  }

  use vars qw(%c);
  %c = (
    dir => {
      cgi  => "/home/httpd/perl",
      docs => "/home/httpd/docs",
      img  => "/home/httpd/docs/images",
    },
    url => {
      cgi  => "/perl",
      docs => "/",
      img  => "/images",
    },
    color => {
      hint   => "#777777",
      warn   => "#990066",
      normal => "#000000",
    },
  );
Good perl style suggests keeping a comma after the last item of a list. That's because additional items tend to be added at the end of the list, and when the last comma is already in place, you never have to remember to add one when you append a new item.

So now the script looks like:

  use strict;
  use My::Config qw(%c);
  use vars       qw(%c);

  print "Content-type: text/plain\n\n";
  print "My url docs root: $c{url}{docs}\n";
Do you see the difference? The whole mess has gone, there is only one variable to worry about.
So far so good, but let's make it even better. I would like to get rid of the Exporter stuff altogether. I remove all the exporting code, so my config file now looks like:

  package My::Config;
  use strict;

  use vars qw(%c);
  %c = (
    dir => {
      cgi  => "/home/httpd/perl",
      docs => "/home/httpd/docs",
      img  => "/home/httpd/docs/images",
    },
    url => {
      cgi  => "/perl",
      docs => "/",
      img  => "/images",
    },
    color => {
      hint   => "#777777",
      warn   => "#990066",
      normal => "#000000",
    },
  );
And the code
  use strict;
  use My::Config ();

  print "Content-type: text/plain\n\n";
  print "My url docs root: $My::Config::c{url}{docs}\n";
Since we still want to save ourselves a lot of typing (we now have to use the fully qualified notation $My::Config::c{url}{docs}), let's use Perl's magical aliasing feature. I'll modify the code to be:

  use strict;
  use My::Config ();
  use vars qw(%c);
  *c = \%My::Config::c;

  print "Content-type: text/plain\n\n";
  print "My url docs root: $c{url}{docs}\n";
I have aliased the *c glob with a reference to the %My::Config::c hash. From now on %My::Config::c and %c are the same hash; you can read from or modify either of them, they are one and the same.

Just one last little note. Sometimes you see a lot of redundancy in the configuration variables, like:

  $cgi_dir  = "/home/httpd/perl";
  $docs_dir = "/home/httpd/docs";
  $img_dir  = "/home/httpd/docs/images";
Now if you want to move the base path "/home/httpd" to a new place, it demands lots of typing. Of course the solution is:

  $base     = "/home/httpd";
  $cgi_dir  = "$base/perl";
  $docs_dir = "$base/docs";
  $img_dir  = "$base/docs/images";

But you cannot do quite the same trick with a hash, since you cannot refer to one of its own entries while the hash is still being defined. This wouldn't work:

  %c = (
    base => "/home/httpd",
    dir  => {
      cgi  => "$base/perl",
      docs => "$base/docs",
      img  => "$base/docs/images",
    },
  );
But nothing stops us from adding an additional variable, lexically scoped with my(). The following code is correct:

  my $base = "/home/httpd";
  %c = (
    dir => {
      cgi  => "$base/perl",
      docs => "$base/docs",
      img  => "$base/docs/images",
    },
  );
We have just learned how to make configuration files easily maintainable, and how to save memory by avoiding the export of variables into a script's namespace.

Now back to the task of dynamically reloading configuration files.

First, let's look at a simple case, where we just have to watch a single plain configuration file like the one below. Imagine a script that tells you who holds the patch pumpkin of the current perl release.

Sidenote: <jargon> A humorous term for the token - the object (notional or real) that gives its possessor (the ``pumpking'' or the ``pumpkineer'') exclusive access to something, e.g. applying patches to a master copy of source (for which the pumpkin is called a ``patch pumpkin'').
  use CGI ();
  use strict;

  my $fname = "Larry";
  my $lname = "Wall";
  my $q = new CGI;

  print $q->header(-type=>'text/html');
  print $q->p("$fname $lname holds the patch pumpkin for this perl release.");

The script has a hardcoded value for the name. It's very simple: initialize the CGI object, print the proper HTTP header and tell the world who holds the patch pumpkin for this perl release.
When the patch pumpkin changes hands we don't want to modify the script. Therefore, we put the $fname and $lname variables into a configuration file:

  $fname = "Gurusamy";
  $lname = "Sarathy";
  1;
Note that there is no package declaration in the above file, so the code will be evaluated in the caller's package, or in the main:: package if none was declared. This means that the variables $fname and $lname will override (or initialize, if they weren't yet defined) the variables with the same names in the caller's namespace. This works for global variables only -- you cannot update variables defined lexically (with my()) using this technique.
You have started the server and everything is working properly. After a while you decide to modify the configuration. How do you let your running server know that the configuration has been modified, without restarting it? (Remember, we are in production, and a server restart can be quite expensive.) One of the simplest solutions is to poll the file's modification time by calling stat() before the script starts to do its real work. If we see that the file has been updated, we force a reconfiguration of the variables located in this file. We will call the function that reloads the configuration reread_conf(), and it accepts a single argument, which is a relative path to the configuration file.
If your CGI script is invoked under the Apache::Registry handler, you can put the configuration file in the same directory as the script, or below it, and pass a relative path to the file, since Apache::Registry chdir()'s to the script's directory before it starts the script's execution. Otherwise you have to make sure that the file will be found; note that do() does search the @INC libraries.

  use vars qw(%MODIFIED);
  sub reread_conf{
    my $file = shift;
    return unless $file;
    return unless -e $file and -r _;
    unless ($MODIFIED{$file} and $MODIFIED{$file} == -M _){
      my $return;
      unless ($return = do $file) {
        warn "couldn't parse $file: $@" if $@;
        warn "couldn't do $file: $!"    unless defined $return;
        warn "couldn't run $file"       unless $return;
      }
      $MODIFIED{$file} = -M _; # Update the MODIFICATION times
    }
  } # end of reread_conf
We use do() to reload the code in this file and not require(), because do() reloads the file unconditionally, while require() will not load the file if it was already loaded in one of the previous requests, since there will be an entry for it in %INC, where the key is the name of the file and the value is its path. That's how Perl keeps track of loaded files and saves the overhead of reloading a file it has already loaded. You generally don't notice this with plain perl scripts, but under mod_perl it matters all the time, since the same script is re-run for every request, and all the required files are already loaded after the first request served by each process.
Nevertheless, do()
keeps track of the current filename for
error messages, searches the @INC
libraries, updates the %INC
if the file is found.
The possible warnings the script emits when something goes wrong with the do() operation are explained here just for completeness; generally you would want to perform all these checks. If do() cannot read the file, it returns undef and sets $! to the error. If do() can read the file but cannot compile it, it returns undef and puts an error message in $@. If the file is successfully compiled, do() returns the value of the last expression evaluated.
The configuration file can also be broken if someone modifies it incorrectly. We don't want the service to break because of that, so we just trap a possible do() failure and ignore the changes by updating the modification time anyway. It might be a good idea to send an email to the system administrator about the problem, as sketched below.
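For example, here is a minimal sketch of such a notification. The sendmail path and the recipient address are assumptions and will differ on your system; you could call notify_admin($@) from inside reread_conf() when do() fails:

  use Symbol ();

  sub notify_admin {
      my $err  = shift;
      my $to   = 'webmaster@example.com';      # hypothetical address
      my $mail = Symbol::gensym();
      open $mail, "|/usr/lib/sendmail -t"
          or return warn "can't fork sendmail: $!";
      print $mail "To: $to\n",
                  "Subject: dynamic config reload failed\n\n",
                  "$err\n";
      close $mail;
  }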
Notice, however, that since do() updates %INC just as require() does, if you are using Apache::StatINC it will attempt to reload this file before the reread_conf() call, so if the file doesn't compile the request will be aborted. This shouldn't be a problem in practice, since Apache::StatINC shouldn't be used in production anyway (it slows things down by stat()'ing all the files listed in %INC).
Note that we assume that the entire purpose of this function is to reload the configuration if it has changed; that's why the function has no failure mode -- if something goes wrong we just return. This approach would be incorrect if you intend to initialize the variables through this function on the first invocation of the script. If you do, you will want to replace each occurrence of return() and warn() with die().
I used the above approach when I had a huge configuration file that was loaded only once at server startup, plus a little configuration file with just a few variables that could be updated by hand or through the web interface; those variables were duplicated from the main config file.

So if the webmaster broke the syntax of this dynamic file while updating it by hand, it wouldn't affect the main configuration file (which was write-protected) and therefore the proper execution of the programs. Soon we will see a simple web interface which allows you to modify the configuration file without actually breaking it.
A sample script using the presented subroutine would be:
  use vars qw(%MODIFIED $fname $lname);
  use CGI ();
  use strict;

  my $q = new CGI;
  print $q->header(-type=>'text/plain');

  my $config_file = "./config.pl";
  reread_conf($config_file);

  print $q->p("$fname $lname holds the patch pumpkin for this perl release.");

  sub reread_conf{
    my $file = shift;
    return unless $file;
    return unless -e $file and -r _;
    unless ($MODIFIED{$file} and $MODIFIED{$file} == -M _){
      my $return;
      unless ($return = do $file) {
        warn "couldn't parse $file: $@" if $@;
        warn "couldn't do $file: $!"    unless defined $return;
        warn "couldn't run $file"       unless $return;
      }
      $MODIFIED{$file} = -M _; # Update the MODIFICATION times
    }
  } # end of reread_conf
Remember that you should use (stat $file)[9] instead of -M $file if you modify the $^T variable. In some of my scripts I reset $^T to the time of the script invocation with $^T = time(), so that I can perform -M and similar (-A, -C) file status tests relative to the script invocation time and not the time the process was started.
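A minimal sketch of the difference, assuming a hypothetical ./config.pl path:

  my $config_file = "./config.pl";   # hypothetical path

  $^T = time();   # make -M/-A/-C relative to this request, not to process start

  my $age_in_days = -M $config_file;        # fractional days since last change
  my $mtime       = (stat $config_file)[9]; # absolute mtime -- unaffected by $^T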
If your configuration file is more sophisticated and it declares a package and exports variables, the above code will work just as well. Even if you think that you would have to re-import() the variables, you don't: when do() recompiles the code, the originally imported variables simply get updated with the values from the reloaded code.
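For instance, a minimal sketch of such a configuration file (the MyConfig name is hypothetical) and the code that reloads it might look like this:

  MyConfig.pl
  --------------------
  package MyConfig;
  use strict;
  use vars qw(@ISA @EXPORT $fname $lname);
  use Exporter ();
  @ISA    = qw(Exporter);
  @EXPORT = qw($fname $lname);
  $fname  = "Gurusamy";
  $lname  = "Sarathy";
  1;
  --------------------

  the script
  --------------------
  use vars qw($fname $lname);
  require "MyConfig.pl";
  MyConfig->import;

  # ... later, when the file has changed on disk:
  do "MyConfig.pl";
  print "$fname $lname\n";   # prints the updated values
  --------------------

The import() call creates aliases in the caller's namespace pointing at the package variables, so when do() re-runs the file and reassigns $MyConfig::fname, the alias sees the new value.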
The CGI script below allows a system administrator to dynamically update a configuration file through the web interface. Combining this with the code we have just shown for reloading modified files, you get a dynamically reconfigurable system which doesn't require a server restart and can be administered from any machine with just a web browser connected to the Internet.
Let's say you have a configuration file like:
  package MainConfig;

  use strict;
  use vars qw(%c);

  %c = (
        name     => "Larry Wall",
        release  => "5.000",
        comments => "Adding more ways to do the same thing :)",

        other    => "More config values",

        hash     => { foo  => "bar",
                      fooo => "barr",
                    },

        array    => [qw( a b c)],
       );
You want to make the variables name, release and comments dynamically configurable. That means that you want a web interface with an input form that allows you to modify these variables. Once they are modified, you want to update the configuration file and propagate the changes to all the currently running processes. Quite a simple task.
Let's see the main stages of this algorithm: create a form preset with the current values of the variables; let the administrator modify the values and submit the changes; validate that the submitted information is correctly formatted (numeric fields should carry numbers, literal fields words, etc.); update the configuration file; update the modified values in the memory of the current process; present the form as before, but with the updated fields, if any.

The only part that seems complicated to implement is the configuration file update, for a few reasons: if updating the file breaks it, the whole service stops working; and if the file is very big and includes comments and complex data structures, parsing the file can be quite a challenge.

So let's simplify the task. We won't touch the original configuration file. If all we want is to update a few variables, why not create a little configuration file containing only the variables that can be modified through the web interface, and overwrite it each time there is something to be changed. This way we don't have to parse the file before updating it. And if the main configuration file changes, we don't care -- we no longer depend on it.

Moreover, these dynamically updated variables will be duplicated: they will show up in both places, in the main file and in the dynamic file. We do this to simplify maintenance. When a new release is installed, the dynamic configuration file shouldn't exist at all; it will be created only after the first update. The only change required to the main code is a snippet that loads this file if it exists and has changed, as we just saw.

This additional code must be executed after the main configuration file has been loaded. That way the updated variables will override the default values from the main file.
META: extend on the comments:
# remember to run this code under taint mode use strict; use vars qw($q %c $dynamic_config_file %vars_to_change %validation_rules); use CGI (); use lib qw(.); use MainConfig (); *c = \%MainConfig::c; $dynamic_config_file = "./config.pl"; # load the dynamic conf file if exists, and override the default # values from the main config file do $dynamic_config_file if -e $dynamic_config_file and -r _; # fields that can be changed and their titles %vars_to_change = ( 'name' => "Patch Pumkin's Name", 'release' => "Current Perl Release", 'comments' => "Release Comments", ); %validation_rules = ( 'name' => sub { $_[0] =~ /^[\w\s\.]+$/; }, 'release' => sub { $_[0] =~ /^\d+\.[\d_]+$/; }, 'comments' => sub { 1; }, ); $q = new CGI; print $q->header(-type=>'text/html'), $q->start_html(); my %updates = (); # We always rewrite the dynamic config file, so we want all the # vars to be passed but to save the time we will only do checking # of vars that that was changed the rest will be retrieved from # the 'prev_foo' values foreach (keys %vars_to_change) { # copy var so we can modify it my $new_val = $q->param($_) || ''; # strip a possible ^M char (DOS/WIN) $new_val =~ s/\cM//g; # push to hash if was changed $updates{$_} = $new_val if defined $q->param("prev_".$_) and $new_val ne $q->param("prev_".$_); } # Notice that we cannot trust the previous values of the variables # since they were presented to user as hidden form variables, and # of course user can mangle those. In our case we don't care since # it cannot make any damage, since as you will see in a minute we # verify each variable by the rules we define. # Process if there is something to process. Will be not called if # it's invoked a first time to diplay the form or when the form # was submitted but the values weren't modified (we know that by # comparing with the previous values of the variables, which are # the hidden fields in the form) # process and update the values if valid process_change_config(%updates) if %updates; # print the update form conf_modification_form(); # update the config file but first validate that the values are correct ones ######################### sub process_change_config{ my %updates = @_; # dereference # we will list here all the malformatted vars my %malformatted = (); print $q->b("Trying to validate these values<BR>"); foreach (keys %updates) { print "<DT><B>$_</B> => <PRE>$updates{$_}</PRE>"; # now we have to handle each var to be changed very very carefully # since this file goes immediately into production! $malformatted{$_} = delete $updates{$_} unless $validation_rules{$_}->($updates{$_}); } # end of foreach my $var (keys %updates) # print warnings if there are any invalid changes print $q->hr, $q->p($q->b(qq{Warning! These variables were attempted to be changed, but found malformed, thus the original value will be preserved.}) ), join(",<BR>", map { $q->b($vars_to_change{$_}) . " : $malformatted{$_}\n" } keys %malformatted) if %malformatted; # Now complete the vars that weren't changed from the # $q->param('prev_var') values map { $updates{$_} = $q->param('prev_'.$_) unless exists $updates{$_} } keys %vars_to_change; # Now we have all the data that should be written into dynamic # config file # escape single quotes "'" while creating a file my $content = join "\n", map { $updates{$_} =~ s/(['\\])/\\$1/g; '$c{' . $_ . "} = '" . $updates{$_} . 
"';\n" } keys %updates; # now add '1;' to make require() happy $content .= "\n1;"; # keep the dummy result in $r so it'll not complain eval {my $res = $content}; if ($@) { print qq{Warning! Something went wrong with config file generation!<P> The error was : <BR><PRE>$@</PRE>}; return; } print $q->hr; # overwrite the dynamic config file use Symbol (); my $fh = Symbol::gensym(); open $fh, ">$dynamic_config_file.bak" or die "Can't open the $dynamic_config_file.bak for writing :$! \n"; flock $fh,2; # exclusive lock seek $fh,0,2; # rewind to the start truncate $fh, 0; # the file might shrink! print $fh $content; close $fh; # OK, now we make a real file rename "$dynamic_config_file.bak",$dynamic_config_file; # rerun it to update variables in the current process! Note that # it wouldn't update the variables in other processes. A special # code that watches the timestamps on the config file will do this # work for each process. Since the next invocation will update the # configuration anyway, why do we need to load it here? The reason # is simple, since we are going to fill form's input fields, with # the updated data. do $dynamic_config_file; } # end sub process_change_config ########################## sub conf_modification_form{ print $q->center($q->h3("Update Form")); print $q->hr, $q->p(qq{This form allows you to dynamically update the current configuration. You don\'t need to restart the server in order for changes to take an effect} ); # set the previous settings into the form's hidden fields, so we # know whether we have to do some changes or not map {$q->param("prev_$_",$c{$_}) } keys %vars_to_change; # raws for the table, go into the form my @configs = (); # prepare one textfield entries push @configs, map { $q->td( $q->b("$vars_to_change{$_}:"), ), $q->td( $q->textfield(-name => $_, -default => $c{$_}, -override => 1, -size => 20, -maxlength => 50, ) ), } qw(name release); # prepare multiline textarea entries push @configs, map { $q->td( $q->b("$vars_to_change{$_}:"), ), $q->td( $q->textarea(-name => $_, -default => $c{$_}, -override => 1, -rows => 10, -columns => 50, -wrap => "HARD", ) ), } qw(comments); print $q->startform('POST',$q->url),"\n", $q->center($q->table(map {$q->Tr($_),"\n",} @configs), $q->submit('','Update!'),"\n", ), map ({$q->hidden("prev_".$_, $q->param("prev_".$_))."\n" } keys %vars_to_change), # hidden previous values $q->br,"\n", $q->endform,"\n", $q->hr,"\n", $q->end_html; } # end sub conf_modification_form
Once updated the script generates a file like:
  $c{release}  = '5.6';
  $c{name}     = 'Gurusamy Sarathy';
  $c{comments} = 'Perl rules the world!';
  1;
If you want to reload the PerlHandler on each invocation, the following trick will do it:
PerlHandler "sub { do 'MyTest.pm'; MyTest::handler(shift) }"
do() will reload MyTest.pm on every request.
To make things clear before we go into details: each child process has its own %INC hash which is used to store information about its compiled modules. The keys of the hash are the names of the modules or the parameters passed to require() (or use()). The values are the full or relative paths to these modules/files. Let's say we have my-lib.pl and MyModule.pm, both located at /home/httpd/perl/my/.
require "my-lib.pl"; use MyModule.pm; print $INC{"my-lib.pl"},"\n"; print $INC{"MyModule.pm"},"\n";
prints:
/home/httpd/perl/my/my-lib.pl /home/httpd/perl/my/MyModule.pm
Adding use lib
:
use lib qw(.); require "my-lib.pl"; use MyModule.pm; print $INC{"my-lib.pl"},"\n"; print $INC{"MyModule.pm"},"\n";
prints:
my-lib.pl MyModule.pm
require "my-lib.pl"; use MyModule.pm; print $INC{"my-lib.pl"},"\n"; print $INC{"MyModule.pm"},"\n";
wouldn't work, since perl cannot find the modules.
Adding use lib
:
use lib qw(.); require "my-lib.pl"; use MyModule.pm; print $INC{"my-lib.pl"},"\n"; print $INC{"MyModule.pm"},"\n";
prints:
my-lib.pl MyModule.pm
I'm talking about a single child process below!
Let's look at three faulty namespace-related scenarios:
First, you can't have two identical module names running under the same server! Only the first one to be use()'d or require()'d will be compiled into the package; the request to load the second, identically named module will be skipped, since the server will think it's already compiled -- it's already in the child's %INC. (See the Watching the server section to find out how you can learn what is loaded and where.)
So if you have two different Foo modules in two different directories and two scripts tool1.pl and tool2.pl, placed like this:
  ./perl/tool1/Foo.pm
  ./perl/tool1/tool1.pl
  ./perl/tool2/Foo.pm
  ./perl/tool2/tool2.pl
Where a sample code could be:
  ./perl/tool1/tool1.pl
  --------------------
  use Foo;
  print "Content-type: text/html\n\n";
  print "I'm Script number One<BR>\n";
  foo();
  --------------------

  ./perl/tool1/Foo.pm
  --------------------
  sub foo{
    print "<B>I'm Tool Number One!</B><BR>\n";
  }
  1;
  --------------------

  ./perl/tool2/tool2.pl
  --------------------
  use Foo;
  print "Content-type: text/html\n\n";
  print "I'm Script number Two<BR>\n";
  foo();
  --------------------

  ./perl/tool2/Foo.pm
  --------------------
  sub foo{
    print "<B>I'm Tool Number Two!</B><BR>\n";
  }
  1;
  --------------------
Both scripts call use Foo; -- but only the first script to run will actually know about Foo. When you call the second script, it will not know about Foo at all; it's as if you had forgotten to write use Foo;. Run the server in single server mode to detect this kind of bug immediately.
You will see the following in the error_log file:
Undefined subroutine &Apache::ROOT::perl::tool2::tool2_2epl::foo called at /home/httpd/perl/tool2/tool2.pl line 4.
The above is true for files you require() as well (assuming that the required files do not declare a package). If you have:
  ./perl/tool1/config.pl
  ./perl/tool1/tool1.pl
  ./perl/tool2/config.pl
  ./perl/tool2/tool2.pl
And both scripts do:
use lib qw(.); require "config.pl";
while the contents of the scripts and config.pl files are exactly as in the example above, then only the first script will actually do the require(), for the same reason: %INC already includes the key "config.pl"! The second scenario is thus no different from the first one, since there is no difference between use() and require() if you don't need to import any symbols into the calling script.
Interestingly, the following scenario won't work either!
  ./perl/tool/config.pl
  ./perl/tool/tool1.pl
  ./perl/tool/tool2.pl
where tool1.pl and tool2.pl both require() the same config.pl.
There are three solutions for this (make sure you read all of item 3):
The first two faulty scenarios can be solved by placing your library modules in a subdirectory structure so that they have different path prefixes. The file system layout will be something like:
  ./perl/tool1/Tool1/Foo.pm
  ./perl/tool1/tool1.pl
  ./perl/tool2/Tool2/Foo.pm
  ./perl/tool2/tool2.pl
And modify the scripts:
  use Tool1::Foo;
  use Tool2::Foo;
For require()
(scenario number 2) use the following:
  ./perl/tool1/tool1-lib/config.pl
  ./perl/tool1/tool1.pl
  ./perl/tool2/tool2-lib/config.pl
  ./perl/tool2/tool2.pl
And each script does respectively:
use lib qw(.); require "tool1-lib/config.pl";
use lib qw(.); require "tool2-lib/config.pl";
But this solution isn't ideal: while it might work for you now, if you later add another script that wants to use the same module or config.pl file, it still won't work, as we saw in the third scenario. So let's look at better solutions.
Another option is to use the full path to the file, so that the full path becomes the key in %INC:
require "/full/path/to/the/config.pl";
This solution solves the first two scenarios. I was surprised, but it worked for the third scenario as well!

However, with this solution you lose portability: if you move the tool around in the file system you will have to change the base directory.
The third solution is to declare a package in the required files! (Of course the package name should be unique among all the package names you use.) %INC will then use the package name for the key. It's a good idea to use at least two-level package names for your private modules, e.g. MyProject::Carp and not Carp, since the latter would collide with an existing standard package. Even if a package of that name doesn't exist at the time you write your code, it might enter the next perl distribution as a standard module, and then your code would break. Foresee problems like this and save yourself future trouble.
What are the implications of a package declaration? When you use()'d or require()'d files without package declarations, it was very convenient, because all the variables and subroutines landed in the main:: package, so any of them could be used as if they were part of the main script. With package declarations things get more complicated -- to be precise, not complicated, but more verbose: to call a subroutine from package Package you have to write Package::function(), and to access a global variable inside that package you have to write $Package::some_variable, so you get a kind of typing overhead. You are also unable to access lexically defined variables (declared with my()) inside Package at all.
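Here is a small sketch (the Tool1::Config package name is hypothetical) showing the extra typing a package declaration costs you:

  tool1-lib/config.pl
  --------------------
  package Tool1::Config;
  use vars qw($fname);
  $fname = "Larry";
  sub greeting { return "Hello from Tool1" }
  1;
  --------------------

  tool1.pl
  --------------------
  use lib qw(.);
  require "tool1-lib/config.pl";

  # fully qualified names are now required:
  print "Content-type: text/plain\n\n";
  print $Tool1::Config::fname, "\n";
  print Tool1::Config::greeting(), "\n";
  --------------------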
You can still leave your scripts unchanged if you import the names of the global variables and subs into the main:: package's namespace, like:
use Module qw(:mysubs sub_b $var1 :myvars);
You can export both subroutines and global variables. Note, however, that this approach consumes more memory for the process. (See perldoc Exporter for information about exporting variables and other symbols.)
This solution completely covers the third scenario. By using different module names in package declarations, as explained above you solve the first two as well.
See also perlmodlib
and perlmod
manpages.
From the above discussion it should be clear that you cannot run development and production versions of the tools using the same apache server! You have to run a dedicated server for each (it can still be the same machine, but the servers must use different ports).
If you have the following:
  PerlHandler Apache::Work::Foo
  PerlHandler Apache::Work::Foo::Bar
and you make a request that pulls in Apache/Work/Foo/Bar.pm first, then the Apache::Work::Foo package gets defined, so mod_perl does not try to pull in Apache/Work/Foo.pm.
Apache::Registry
scripts cannot contain __END__
or __DATA__
tokens.
Why? Because Apache::Registry scripts are wrapped in a subroutine called handler. Take, for example, the script at URI /perl/test.pl:
print "Content-type: text/plain\n\n"; print "Hi";
When the script is executed under the Apache::Registry handler, it actually becomes:
  package Apache::ROOT::perl::test_2epl;
  use Apache qw(exit);
  sub handler {
    print "Content-type: text/plain\n\n";
    print "Hi";
  }
So if you happen to put an __END__
tag, like:
print "Content-type: text/plain\n\n"; print "Hi"; __END__ Some text that wouldn't be normally executed
it will be turned into:
  package Apache::ROOT::perl::test_2epl;
  use Apache qw(exit);
  sub handler {
    print "Content-type: text/plain\n\n";
    print "Hi";

    __END__
    Some text that wouldn't be normally executed
  }
If you try to execute this script, you will receive the following warning:
Missing right bracket at q line 4, at end of line
That's expected: Perl cuts off everything after the __END__ tag. The same applies to the __DATA__ tag.
Also, remember that whatever applies to Apache::Registry scripts in most cases applies to Apache::PerlRun scripts as well.
Output of system()
, exec()
, and open(PIPE,"|program")
calls will not be sent to the browser unless your Perl was configured with
sfio
.
The Perl tie()'d filehandle interface is not complete; format() and write() are among the missing pieces. If you configure Perl with sfio, write() and format() should work just fine.
Perl's exit() built-in function cannot be used in mod_perl scripts. Calling it causes the server child to exit (which makes the whole idea of using mod_perl irrelevant.) The Apache::exit() function should be used instead.
You might start your scripts by overriding the exit sub. (If you use Apache::exit() directly, you will have a problem testing the script from the shell, unless you stuff use Apache (); into your code.) I use the following code:
  BEGIN {
    # Auto-detect if we are running under mod_perl or CGI.
    $USE_MOD_PERL = ( (exists $ENV{'GATEWAY_INTERFACE'}
                       and $ENV{'GATEWAY_INTERFACE'} =~ /CGI-Perl/)
                      or exists $ENV{'MOD_PERL'} ) ? 1 : 0;
  }

  use subs (exit);

  # Select the correct exit way
  ########
  sub exit{
    # Apache::exit(-2) will cause the server to exit gracefully,
    # once logging happens and protocol, etc (-2 == Apache::Constants::DONE)
    $USE_MOD_PERL ? Apache::exit(0) : CORE::exit(0);
  }
Now the correct exit() will always be chosen, whether you run the script as a CGI script or from the shell.
Note that if you run the script under Apache::Registry, the Apache exit() function overrides the Perl core built-in. While you can see exit() listed in the @EXPORT_OK list of the Apache package, Apache::Registry does something you don't see and imports this function for you. This means that if your script runs under the Apache::Registry handler (Apache::PerlRun as well), you don't have to worry about exit().
Note that if you still use CORE::exit() in scripts running under mod_perl, the child will exit, but neither a proper exit nor logging will happen on the way. CORE::exit() cuts off the server's legs... If you need to shut down the child cleanly, use $r->child_terminate (which sets the internal MaxRequestsPerChild counter so that the child exits after the current request).
You can accomplish this in two ways - in the Apache::Registry
script:
Apache->request->child_terminate;
in httpd.conf:
PerlFixupHandler "sub { shift->child_terminate }"
Your scripts will not run from the command line (yet) unless you use CGI::Switch
or CGI.pm
and perl 5.004+ and do not make any direct calls to Apache->methods
.
If you are using Perl 5.004 or better, most CGI scripts can run under mod_perl untouched. If you're using 5.003, Perl's built-in read() and print() functions do not work as they do under CGI. If you're using CGI.pm, use $query->print instead of plain ol' print().
Special Perl variables like $| (buffering), $^T (time), $^W (warnings), $/ (input record separator), $\ (output record separator) and many more are all global variables. This means that you cannot localize them with my(); only local() can do that. Since the child server doesn't quit, if one of your scripts modifies such a global variable, the change stays in effect for the rest of the process's life and affects all the other scripts executed by the same process.
Bearing this in mind, you should never write code like the example below. We will use the input record separator variable to demonstrate: if you undefine this variable, the diamond operator will slurp in the whole file at once.
$/ = undef; open IN, "file" .... # slurp it all inside a variable $all_the_file = <IN>;
The proper way is to use the local() keyword before the special variable is changed, like this:
local $/ = undef; open IN, "file" .... # slurp it all inside a variable $all_the_file = <IN>;
But there is a little catch: local() propagates the changed value to all the code below it, and the change remains in effect until the script finishes, unless the variable is modified somewhere else.
A cleaner approach is to enclose the whole piece of code that is affected by the modified variable in a block, like this:
  {
    local $/ = undef;
    open IN, "file" ....
    # slurp it all inside a variable
    $all_the_file = <IN>;
  }
That way when Perl leaves the block, it restores the original value of the $/ variable, and you don't have to worry about this variable anywhere else in your program.
When writing your own handlers with Perl API the proper way to send the HTTP Header is to set the header first and then to send it. Like:
  $r->content_type('text/html');
  $r->send_http_header;
  return OK if $r->header_only;
If the client issues an HTTP HEAD request rather than the usual GET, then to be compliant with the HTTP protocol we should not send the document body, but only the HTTP headers. When Apache receives a HEAD request, it sets header_only() to true. If we see that this has happened, we return from the handler immediately with an OK status code.
Generally you don't need to set the content type explicitly, since Apache does it for you by looking up the MIME type of the request, matching the extension of the URI against the MIME tables (from the mime.types file). So if the request URI is /welcome.html, the text/html content type will be picked. However, for CGI scripts, or URIs that cannot be mapped by a known extension, you should set the appropriate type with the content_type() method.
The situation is a little different with Apache::Registry and similar handlers. If you take a basic CGI script like:
print "Content-type: text/plain\n\n"; print "Hello world";
it won't work, because the HTTP header will not be sent. By default, mod_perl does not send any headers by itself; however, you may wish to change this by adding:
PerlSendHeader On
to the <Location> section of your configuration. Now the response line and common headers will be sent just as they are by mod_cgi. And, just as with mod_cgi, PerlSendHeader will not send the MIME type and the terminating double newline. Your script must send those itself, e.g.:
print "Content-type: text/html\r\n\r\n";
Note that the book always uses ``\n\n'' and not ``\r\n\r\n''. The latter is the way to send newlines as defined by the HTTP standard, but at the moment all browsers accept the former format as well. To follow the HTTP protocol strictly you must use the ``\r\n'' format.
The PerlSendHeader On directive tells mod_perl to intercept anything that looks like a header
line (such as Content-Type:
text/plain
) and automatically turn it into a correctly formatted HTTP/1.0 header, the
same way it happens with CGI scripts running under mod_cgi. This allows you
to keep your CGI scripts unmodified.
The $ENV{PERL_SEND_HEADER} environment variable tells you whether PerlSendHeader is On or Off. You can use it in your module like this:
  if($ENV{PERL_SEND_HEADER}) {
    print "Content-type: text/html\n\n";
  } else {
    my $r = Apache->request;
    $r->content_type('text/html');
    $r->send_http_header;
  }
If you use CGI.pm's header() function to generate HTTP headers, you do not need to activate this directive because CGI.pm detects mod_perl and calls send_http_header() for you. However, it does not hurt to use this directive anyway.
There is no free lunch -- you get the mod_cgi behavior at the cost of a small but real overhead of parsing the text that is sent. Note also that mod_perl assumes that individual headers are not split across print statements.
The Apache::print() routine has to gather up the headers that your script outputs, in order to pass them to $r->send_http_header. This happens in src/modules/perl/Apache.xs (print) and Apache/Apache.pm (send_cgi_header). There is a shortcut in there, namely the assumption that each print statement contains one or more complete headers. If, for example, you generate a Set-Cookie header using multiple print() statements, like this:
print "Content-type: text/html\n"; print "Set-Cookie: iscookietext\; "; print "expires=Wednesday, 09-Nov-1999 00:00:00 GMT\; "; print "path=\/\; "; print "domain=\.mmyserver.com\; "; print "\n\n"; print "hello";
then your generated Set-Cookie header is split over a number of print() statements and gets lost. The above example won't work! Try this instead:
print "Content-type: text/html\n"; my $cookie = "Set-Cookie: iscookietext\; "; $cookie .= "expires=Wednesday, 09-Nov-1999 00:00:00 GMT\; "; $cookie .= "path=\/\; "; $cookie .= "domain=\.mmyserver.com\; "; print $cookie; print "\n\n"; print "hello";
Sometimes when you call a script you see an ugly "Content-Type: text/html" displayed at the top of the page, and of course the HTML layout gets broken. As you can understand from the above discussion, this generally happens when your code has already sent the header, so the second one gets rendered as part of the browser's page. This can happen when you call the CGI.pm $q->header method or mod_perl's $r->send_http_header more than once.
If you have a complicated application where the header might be generated from many different places, depending on the calling logic, you might want to write a special subroutine that sends the header and keeps track of whether the header has already been sent. Of course you could use a global variable to flag that the header has already been sent, but there is a more elegant solution, where the closure effect is a desired feature.
Just copy the code below, including the block's curly braces, and everywhere in your code where you print the header, use the print_header() subroutine. $need_header is the same kind of beast as a static variable in C, so it remembers its value from call to call. The first time you call print_header(), the value of $need_header becomes zero, and on any subsequent calls the header will not be sent again.
  {
    my $need_header = 1;
    sub print_header {
      my $type = shift || "text/html";
      print("Content-type: $type\n\n"), $need_header = 0 if $need_header;
    }
  }
In your code you call the above subroutine as:
print_header();
or
print_header("text/plain);
if you want to override the default (text/html) MIME type.
Let's make our smart method aware of the PerlSendHeader directive setting, so that it always does the right thing. This is especially important if you write an application that you are going to distribute, hopefully as Open Source.
  {
    my $need_header = 1;
    sub print_header {
      my $type = shift || "text/html";
      return unless $need_header;
      $need_header = 0;
      if($ENV{PERL_SEND_HEADER}) {
        print "Content-type: $type\n\n";
      } else {
        my $r = Apache->request;
        $r->content_type($type);
        $r->send_http_header;
      }
    }
  }
You can improve this subroutine even further to handle additional headers, such as cookies.
To run a Non Parsed Header CGI script under mod_perl, simply add to your code:
local $| = 1;
And if you normally set PerlSendHeader On
, add this to your server's configuration file:
  <Files */nph-*>
    PerlSendHeader Off
  </Files>
Perl executes BEGIN
blocks during the compile time of code as soon as possible. The same is
true under mod_perl. However, since mod_perl normally only compiles scripts
and modules once -- either in the parent server or once per-child -- BEGIN
blocks in that code will only be run once. As perlmod manpage
explains, once a BEGIN
has run, it is immediately undefined. In the mod_perl environment, this
means BEGIN
blocks will not be run during each incoming request unless that request
happens to be one that is compiling the code.
BEGIN
blocks in modules and files pulled in via require()
or
use()
will be executed:
Only once, if pulled in by the parent process.
Once per-child process if not pulled in by the parent process.
An additional time, once per-child process if the module is pulled in off a
disk again via Apache::StatINC
.
An additional time, in the parent process on each restart if
PerlFreshRestart
is On
.
Unpredictable if you fiddle with %INC
yourself.
BEGIN
blocks in Apache::Registry
scripts will be executed, as above plus:
Only once, if pulled in by the parent process via
Apache::RegistryLoader
- once per-child process if not pulled in by the parent process.
An additional time, once per-child process if the script file has changed on disk.
An additional time, in the parent process on each restart if pulled in by
the parent process via Apache::RegistryLoader
and
PerlFreshRestart
is On
.
Make sure you read Evil things might happen when using PerlFreshRestart.
As perlmod explains, an END
subroutine is executed as late as possible, that is, when the interpreter
exits. In the mod_perl environment, the interpreter does not exit until the
server shutdown. However, mod_perl does make a special case for
Apache::Registry
scripts.
Normally, END
blocks are executed by Perl during its perl_run()
function, which is called once each time the Perl program is executed, e.g.
once per (mod_cgi) CGI script. However, mod_perl only calls
perl_run() once, during server startup. Any END
blocks encountered during main server startup, i.e. those pulled in by the
PerlRequire
or by any PerlModule
, are suspended and run at server shutdown, aka child_exit()
(requires apache 1.3b3+).
Any END
blocks that are encountered during compilation of
Apache::Registry
scripts are called after the script has
completed (not during the cleanup phase though) including subsequent invocations when
the script is cached in memory.
All other END
blocks encountered during other Perl*Handler
call-backs, e.g. PerlChildInitHandler
, will be suspended while the process is running and called during child_exit()
when the process is shutting down. Module authors might wish to use
$r->register_cleanup()
as an alternative to END
blocks if this behavior is not desirable. $r->register_cleanup()
is being called at the CleanUp processing phase of each request and thus
can be used to emulate plain perl's END{}
block behavior.
The last paragraph is very important for the Handling the 'User pressed Stop button' case.
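For example, here is a minimal sketch of a content handler that registers a per-request cleanup instead of relying on an END block; the handler name and the logged message are only an illustration:

  package My::Handler;
  use Apache::Constants qw(OK);

  sub handler {
      my $r = shift;

      # runs at the CleanUp phase of *this* request, much as a plain
      # perl END {} block would run at the end of a CGI script
      $r->register_cleanup(sub {
          warn "done with ", $r->uri, "\n";   # goes to the error_log
      });

      $r->send_http_header('text/plain');
      print "Hello";
      return OK;
  }
  1;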
Normally when you run perl from the command line or have the shell invoke it with `#!', you may choose to pass perl switch arguments such as -w or -T. Most command line arguments have an equivalent special variable; for example, the $^W variable corresponds to the -w switch. Consult the perlvar manpage for more details. With mod_perl it is also possible to turn on warnings globally via the PerlWarn directive:
PerlWarn On
You can turn warnings off with local $^W = 0; in your scripts on a local basis (inside a block). If you write $^W = 0; you disable warnings everywhere in the process; likewise $^W = 1; enables them everywhere.
The switch which enables taint checks does not have a special variable, so mod_perl provides the PerlTaintCheck directive to turn on taint checks. In httpd.conf, enable with:
PerlTaintCheck On
Now, any and all code compiled inside httpd will be taint checked.
The environment variable PERL5OPT can be used to set additional perl startup flags such as -d and -D. See Apache::PerlRun .
If you have the shebang line (#!/bin/perl -Tw) in your script, -w will be honored (which means that you have turned warnings on for the scope of this script); -T will only produce a warning telling you to set PerlTaintCheck On, if it is not already enabled.
It's _absolutely_ mandatory (at least for development) to start all your scripts with:
use strict;
If needed, you can always turn off the strict pragma, or a part of it, inside a block, e.g.:
  {
    no strict 'refs';
    # ... some code
  }
It's more important to have the strict pragma enabled under mod_perl than anywhere else. While it's not required, it is strongly recommended; it will save you a lot of time in the long run.
And, of course, clean scripts will still run under mod_cgi (plain CGI)!
Put local $^W=1 in the script, or PerlWarn On in the server configuration file. Turning warnings on will save you a lot of trouble debugging your code. Note that all perl switches in the first magic (shebang) line of the script (#!/perl -switches) except -w are ignored by mod_perl. If you write -T you will be warned to set PerlTaintCheck On in the config file.
If you need to, you can always turn warnings off with local $^W=0 in your code, for a section where you don't want the perl compiler to warn. The correct way to do this is:
  {
    local $^W=0;
    # some code
  }
This preserves the previous value of $^W when you leave the block (so if warnings were enabled before the block, they will be enabled again after it).
In production code, it can be a good idea to turn warnings off. Otherwise
if your code isn't very clean and spits a few lines of warnings here and
there, you will end up with a huge error_log
file in a short time on the heavily loaded server. Also, enabling runtime
warning checking has a small performance impact -- in any script, not just
under mod_perl -- so your approach should be to enable warnings during
development, and then disable them when your code is production-ready.
Controlling the warnings mode through the
httpd.conf
is much better, since you can control the behavior of all of the scripts
from a central place. I have PerlWarn On
on my development server and PerlWarn Off
on the production machine.
The diagnostics pragma can shed more light on the errors and warnings you see, but again, it's better not to use it in production, since otherwise you incur the large overhead of the diagnostics pragma examining every bit of code mod_perl executes. (You can run your script with -d:DProf to check the overhead; see Devel::DProf for more info.)
This is a Perl compiler pragma which forces verbose warning diagnostics. Put at the start of your scripts:
use diagnostics;
This pragma turns on the -w
mode, but gives you much better diagnostics of the errors and warnings
encountered. Generally it explains the reason for warnings/errors you get,
shows you an example of code where the same kind of warning is being
triggered, and tells you the remedy.
Again, it's a bad idea to keep it in your production code, as it will spit ten or more lines of diagnostic messages into your error_log file for every warning perl reports for the first time (per invocation). Also, it adds significant overhead to the code's runtime. (I discovered this by using Devel::DProf!)
To pass an environment variable from a configuration file, add to it:
  PerlSetEnv key val
  PerlPassEnv key
e.g.:
PerlSetEnv PERLDB_OPTS "NonStop=1 LineInfo=/tmp/db.out AutoTrace=1"
will set $ENV{PERLDB_OPTS}
, and it'll be accessible in every child.
%ENV is only set up for CGI emulation. If you are using the Perl API, you should use $r->subprocess_env, $r->notes or $r->pnotes for passing data around between handlers. %ENV is slow because it must update the underlying C environment table, which also exposes that data to anyone who can view the process with ps.
In any case, %ENV and the tables used by those methods are all cleared after the request is served. So, no, $ENV{SESSION_ID} will not be swapped or reused by different HTTP requests.
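Here is a minimal sketch of passing data between two handlers of the same request without touching %ENV; the handler and key names are hypothetical:

  package My::FixupHandler;
  use Apache::Constants qw(DECLINED);

  sub handler {
      my $r = shift;
      $r->notes(SESSION_ID => "abc123");          # plain strings only
      $r->pnotes(user => { name => "Larry" });    # any Perl data structure
      $r->subprocess_env(REMOTE_APP => "shop");   # visible to CGI/SSI children
      return DECLINED;
  }

  package My::ContentHandler;
  use Apache::Constants qw(OK);

  sub handler {
      my $r = shift;
      my $id   = $r->notes('SESSION_ID');
      my $user = $r->pnotes('user');
      $r->send_http_header('text/plain');
      print "session $id belongs to $user->{name}\n";
      return OK;
  }
  1;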
It's always a good idea to stay away from global variables where possible. Some variables must be global so Perl can see them, such as a module's @ISA or $VERSION variables (or the fully qualified @MyModule::ISA). In common practice, a combination of the strict and vars pragmas keeps modules clean and reduces a bit of noise. However, the vars pragma also creates aliases, as the Exporter does, which eat up more memory. When possible, try to use fully qualified names instead of use vars. Example:
  package MyPackage;
  use strict;
  @MyPackage::ISA = qw(...);
  $MyPackage::VERSION = "1.00";
vs.
  package MyPackage;
  use strict;
  use vars qw(@ISA $VERSION);
  @ISA = qw(...);
  $VERSION = "1.00";
Also see Using global variables and sharing them
Files pulled in via use or require statements are not automatically reloaded when changed on disk. See Reloading Modules and Required Files for more info.
When native syslog support is enabled, the stderr stream is redirected to /dev/null! This has nothing to do with mod_perl (plain Apache does the same). Doug wrote the Apache::LogSTDERR module to work around this.
Scripts running under mod_perl can very easily leak memory! Global variables stay around indefinitely; lexical variables (declared with my()) are destroyed when they go out of scope, provided there are no references to them from outside that scope.
Perl doesn't return the memory it acquired from the kernel. It does reuse it though!
First example demonstrates reading in a whole file:
  open IN, $file or die $!;
  $/ = undef; # will read the whole file in
  $content = <IN>;
  close IN;
If your file is 5MB, the child which served that script will grow by exactly that size. Now if you have 20 children and all of them serve this CGI, together they will consume an additional 20*5MB = 100MB of RAM! If that's the case, try to use another approach to processing the file, if possible. Try to process a line at a time and print it back to the file as you go, as sketched below. (If you need to modify the file itself, use a temporary file; when finished, overwrite the source file, and make sure to provide a locking mechanism!)
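Here is a minimal sketch of the line-by-line approach, assuming a hypothetical data.txt file and a simple substitution as the per-line processing:

  use Symbol ();

  my $file = "data.txt";                 # hypothetical file
  my $in   = Symbol::gensym();
  my $out  = Symbol::gensym();

  open $in,  $file        or die "can't read $file: $!";
  open $out, ">$file.tmp" or die "can't write $file.tmp: $!";
  flock $in, 2;                          # exclusive lock while we rewrite

  while (<$in>) {
      s/foo/bar/g;                       # whatever per-line processing you need
      print $out $_;
  }

  close $out;
  rename "$file.tmp", $file or die "can't rename: $!";
  close $in;                             # releases the lock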
The second example demonstrates copying variables between functions (passing variables by value). Let's use the example above, assuming we have no choice but to read the whole file before any data processing takes place, and that you have some process() subroutine that processes the data and returns it. What happens if you pass $content by value? You have just copied another 5MB, and the child has grown in size by another 5MB (watch your swap space!). Now multiply that by 20 children again and you have 200MB of wasted RAM, which will eventually be reused, but it's still a waste! Whenever you think a variable may grow bigger than a few KB, pass it by reference!
Once I wrote a script that passed the contents of a little flat-file database to a function that processed it by value -- it worked and it was fast, but over time the database grew bigger, so passing it by value became overkill. I had to decide whether to buy more memory or to rewrite the code. It's obvious that adding more memory would only be a temporary solution. So it's better to plan ahead and pass variables by reference if a variable you are going to pass might grow bigger than you expect at the time you write the code. There are a few techniques for passing and using variables passed by reference. For example:
  my $content = qq{foobarfoobar};
  process(\$content);
  sub process{
    my $r_var = shift;
    $$r_var =~ s/foo/bar/gs;
    # nothing returned - the variable $content outside has been
    # already modified
  }

  @{$var_lr}  -- dereferences an array
  %{$var_hr}  -- dereferences a hash
For more info see perldoc perlref
.
Another approach is to use the @_ array directly; using @_ directly does the job of passing by reference!
  process($content);
  sub process{
    $_[0] =~ s/foo/bar/gs;
    # nothing returned - the variable $content outside has been
    # already modified
  }
From perldoc perlsub
:
The array @_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not possible to update)...
Be careful when you write this kind of subroutine, since it can confuse a potential user. It's not obvious that a call like process($content); modifies the passed variable -- programmers (the users of your library in this case) are used to subs that either modify variables passed by reference or return the processed variable (e.g. $content=process($content);).
The third example demonstrates working with databases. If you do database processing, you will often encounter the need to read lots of records into your program and then print them to the browser after they are formatted. (I won't even mention the horrible case where programmers read in the whole DB and then use perl to process it!!! Use a relational DB and let the SQL do the job, so you get only the records you need!)
We will use DBI
for this (assume that we are already connected to the DB) (refer to perldoc DBI
for a complete manual of the DBI
module):
  $sth->execute;
  while(@row_ary = $sth->fetchrow_array) {
     <do DB accumulation into some variable>
  }
  <print the output using the data returned from the DB>
In the example above the httpd_process will grow up by the size of the variables that have been allocated for the records that matched the query. (Again remember to multiply it by the number of the children your server runs!).
A better approach is not to accumulate the records, but to print them as they are fetched from the DB. Moreover, we will use the bind_col() and $sth->fetchrow_arrayref() (aliased to $sth->fetch()) methods, to fetch the data in the fastest possible way. The example below prints an HTML TABLE with the matched data; the only memory used is a @cols array to hold temporary row values:
  my @select_fields = qw(a b c);

  # create a list of cols values
  my @cols = ();
  @cols[0..$#select_fields] = ();

  $sth = $dbh->prepare($do_sql);
  $sth->execute;
  # Bind perl variables to columns.
  $sth->bind_columns(undef,\(@cols));
  print "<TABLE>";
  while($sth->fetch) {
    print "<TR>", map("<TD>$_</TD>", @cols), "</TR>";
  }
  print "</TABLE>";
Note: the above method doesn't allow you to know how many records have been matched. The workaround is to run an identical query before the code above, using SELECT count(*) ... instead of SELECT ... to get the number of matched records, as sketched below. It should be much faster, since you can remove any ORDER BY and similar clauses.
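A minimal sketch, assuming $dbh is an already connected database handle; the table and column names are hypothetical:

  my ($count) = $dbh->selectrow_array(
      "SELECT count(*) FROM tracker WHERE user = ?", undef, $user);

  print "<P>$count records found</P>";

  # ... then run the real query and print the rows as they are
  # fetched, as shown above.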
For those who think that $sth->rows will do the job, here is the quote from the DBI
manpage:
rows();
$rv = $sth->rows;
Returns the number of rows affected by the last database altering command, or -1 if not known or not available. Generally you can only rely on a row count after a do or non-select execute (for some specific operations like update and delete) or after fetching all the rows of a select statement.
For select statements it is generally not possible to know how many rows will be returned except by fetching them all. Some drivers will return the number of rows the application has fetched so far but others may return -1 until all rows have been fetched. So use of the rows method with select statements is not recommended.
As a bonus, I wanted to write a single sub that flexibly processes any query, accepting: conditions, call-back closure sub, select fields and restrictions.
# Usage: # $o->dump(\%conditions,\&callback_closure,\@select_fields,@restrictions); # sub dump{ my $self = shift; my %param = %{+shift}; # dereference hash my $rsub = shift; my @select_fields = @{+shift}; # dereference list my @restrict = shift || ''; # create a list of cols values my @cols = (); @cols[0..$#select_fields] = (); my $do_sql = ''; my @where = (); # make a @where list map { push @where, "$_=\'$param{$_}\'" if $param{$_};} keys %param; # prepare the sql statement $do_sql = "SELECT "; $do_sql .= join(" ", @restrict) if @restrict;# append the restriction list $do_sql .= " " .join(",", @select_fields) ; # append the select list $do_sql .= " FROM $DBConfig{TABLE} "; # from table # we will not add the WHERE clause if @where is empty $do_sql .= " WHERE " . join " AND ", @where if @where; print "SQL: $do_sql \n" if $debug; $dbh->{RaiseError} = 1; # do this, or check every call for errors $sth = $dbh->prepare($do_sql); $sth->execute; # Bind perl variables to columns. $sth->bind_columns(undef,\(@cols)); while($sth->fetch) { &$rsub(@cols); } # print the tail or "no records found" message # according to the previous calls &$rsub(); } # end of sub dump
Now a callback closure sub can do lots of things. We need a closure so that it knows what stage it is in: header, body or tail. For example, here is a callback closure for formatting the rows to print:
my $rsub = eval { # make a copy of @fields list, since it might go # out of scope when this closure will be called my @fields = @fields; my @query_fields = qw(user dir tool act); # no date field!!! my $header = 0; my $tail = 0; my $counter = 0; my %cols = (); # columns name=> value hash # Closure with the following behavior: # 1. Header's code will be executed on the first call only and # if @_ was set # 2. Row's printing code will be executed on every call with @_ set # 3. Tail's code will be executed only if Header's code was # printed and @_ isn't set # 4. "No record found" code will be executed if Header's code # wasn't executed sub { # Header if (@_ and !$header){ print "<TABLE>\n"; print $q->Tr(map{ $q->td($_) } @fields ); $header = 1; } # Body if (@_) { print $q->Tr(map{$q->td($_)} @_ ); $counter++; return; } # Tail, will be printed only at the end if ($header and !($tail or @_)){ print "</TABLE>\n $counter records found"; $tail = 1; return; } # No record found unless ($header){ print $q->p($q->center($q->b("No record was found!\n"))); } } # end of sub {} }; # end of my $rsub = eval {
You might also want to check Limiting the size of the processes and Limiting the resources used by httpd children.
When you write a script running under mod_cgi, you can get away with sloppy programming, like opening a file and letting the interpreter close it for you when the script has finished its run:
open IN, "in.txt" or die "Cannot open in.txt for reading : $!\n";
Under mod_perl you must close() the files you opened:

  close IN;

somewhere before the end of the script. If you forget to close(), you might get file descriptor leakage and, if you flock()ed on the file descriptor, locking problems as well. Even if you do close the file, if for some reason the interpreter was stopped before the close() call (for example because the user aborted the script -- see Handling the 'User pressed Stop button' case), the leakage is still there. In the long run your machine might run out of file descriptors, and even worse, a file might be left locked and unusable by other invocations of the same and other scripts.
What can you do? Use IO::File (and the other IO::* modules), which allows you to assign the file handle to a variable which can be my() (lexically) scoped. When that variable goes out of scope, the file (or other file system entity) is properly closed and unlocked (if it was locked). A lexically scoped variable will always go out of scope at the end of the script's run, even if the script was aborted in the middle, or earlier than that if it was defined inside some internal block. For example:
{ my $fh = new IO::File("filename") or die $!; # read from $fh } # ...$fh is closed automatically at end of block, without leaks.
As just mentioned, you don't have to create a special block for this purpose: the file your code is written in is a virtual block as well, so you can simply write:
my $fh = new IO::File("filename") or die $!; # read from $fh # ...$fh is closed automatically at end of block, without leaks.
What the first technique (using a { BLOCK }) ensures is that the file is closed the moment the block is finished.
An even faster and lighter technique is to use Symbol.pm:
  use Symbol ();
  my $fh = Symbol::gensym();
  open $fh, "filename" or die $!;
Use these approaches to ensure you have no leakage, but don't be lazy about writing close() statements -- make it a habit.
You can still benefit from using mod_perl. One approach is to replace the Apache::Registry handler with Apache::PerlRun and define a new location (the script can reside in the same directory on the disk):
  # srm.conf
  Alias /cgi-perl/ /home/httpd/cgi/

  # httpd.conf
  <Location /cgi-perl>
    #AllowOverride None
    SetHandler perl-script
    PerlHandler Apache::PerlRun
    Options ExecCGI
    allow from all
    PerlSendHeader On
  </Location>
See Apache::PerlRun - a closer look
Another ``bad'', but working, method is to set MaxRequestsPerChild to 1, which forces each child to exit after serving only one request. You still get the preloaded modules, etc., but the script will be compiled for each request and then the child is killed off. This isn't good for high-traffic sites, as the parent server needs to fork a new child each time one is killed. You can fiddle with StartServers, MinSpareServers and MaxSpareServers so that the parent spawns more servers ahead of time and a killed child is immediately replaced by a fresh one. Again, this is probably not what you want; a minimal configuration sketch follows below.
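Something like this in httpd.conf (the numbers are only an illustration):

  MaxRequestsPerChild 1
  StartServers        30
  MinSpareServers     20
  MaxSpareServers     30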
Apache::PerlRun gives you the benefit of a preloaded perl and its modules. This module's handler emulates the CGI environment, allowing programmers to write scripts that run under CGI or mod_perl without change. Unlike Apache::Registry, the Apache::PerlRun handler does not cache the script inside a subroutine. Scripts are ``compiled'' on each request. After the script has run, its namespace is flushed of all variables and subroutines. Still, you don't have the overhead of loading perl and compiling the standard modules. (If your script is very light but uses lots of standard modules, you will see no difference between Apache::PerlRun and Apache::Registry!)
Be aware, though, that if you use packages whose internal variables have circular references, they will not be flushed!!! Apache::PerlRun only flushes your script's namespace, which does not include any other required packages' namespaces. If there is a reference to a my() scoped variable that keeps it from being destroyed after leaving the eval scope (of Apache::PerlRun), that cleanup might not be taken care of until the server shuts down and perl_destruct() is run, which always happens after running command line scripts. Consider this example:
  package Foo;
  sub new { bless {} }
  sub DESTROY {
    warn "Foo->DESTROY\n";
  }

  eval <<'EOF';
  package my_script;
  my $self = Foo->new;
  #$self->{circle} = $self;
  EOF

  print $@ if $@;
  print "Done with script\n";
First you'll see:
Foo->DESTROY Done with script
Then, uncomment the line where $self
makes a circular reference, and you'll see:
Done with script Foo->DESTROY
In this case, under mod_perl you would not see Foo->DESTROY until the server shuts down, unless your module takes care of breaking the circular reference itself.
META: to be completed
Global variables initialized at server startup, through the Perl startup file, are shared between processes until modified by one of the processes. For example, when you write:
$My::debug = 1;
all processes read the same value. If one of the processes changes that value to 0, it will still be equal to 1 for every process except the one that actually made the change. When a process modifies the variable, the variable becomes that process's private copy.
IPC::Shareable can be used to share variables between children (see the sketch after this list).
libmm
other methods?
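For example, a minimal sketch of a hit counter shared between all the children via IPC::Shareable; the glue string 'cnt1' is arbitrary:

  use IPC::Shareable ();

  my %shared;
  tie %shared, 'IPC::Shareable', 'cnt1', { create => 1, mode => 0600 };

  $shared{hits}++;    # every child sees and updates the same value
  print "This server has served $shared{hits} requests so far\n";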
To trap all/most Perl run-time errors and send the output to the client instead of Apache's error log add this line to your script.
use CGI::Carp qw(fatalsToBrowser);
Refer to CGI::Carp
man page for more related info.
You can also write your own custom DIE/WARN signal handler. For example, I don't want users to see the error message, but I do want it emailed to me if it's severe enough. My handler traps various errors and acts according to the defined logic. It was written for the mod_perl environment, but works correctly when called from the shell. A stripped-down version of the code is shown here:
  # assign the DIE sighandler to call mydie(error_message) whenever a
  # die() sub is being called. Can be added anywhere in the code.
  local $SIG{'__DIE__'} = \&mydie;

Do not forget the local(), unless you want this signal handler to be invoked every time any script dies (even where this treatment is undesirable).
# and the handler itself sub mydie{ my $why = shift; my $UNDER_MOD_PERL = ( (exists $ENV{'GATEWAY_INTERFACE'} and $ENV{'GATEWAY_INTERFACE'} =~ /CGI-Perl/) or exists $ENV{'MOD_PERL'} ) ? 1 : 0; chomp $why; my $orig_why = $why; # an ASCII copy for email report # handle the shell execution case (so we will not get all the HTML) print("Error: $why\n"), exit unless $UNDER_MOD_PERL; my $should_email = 0; my $message = ''; $why =~ s/[<&>]/"&#".ord($&).";"/ge; # entity escape # Now we need to trap various kinds of errors, that come from CGI.pm # And we don't want these errors to be emailed to us, since # these aren't programmatical errors if ($orig_why =~ /Client attempted to POST (\d+) bytes/o) { $message = qq{ You can not POST messages bigger than @{[1024*$c{max_image_size}]} bytes.<BR> You have tried to post $1 bytes<BR> If you are trying to upload an image, make sure its size is not bigger than @{[1024*$c{max_image_size}]} bytes.<P> Thank you! }; } elsif ($orig_why =~ /Malformed multipart POST/o) { $message = qq{ Have you tried to upload an image in the wrong way?<P> To sucessfully upload an image you must use a browser that supports image upload and use the 'Browse' button to select that image. DO NOT type the path to the image into the upload field.<P> Thank you! }; } elsif ($orig_why =~ /closed socket during multipart read/o) { $message = qq{ Have you pressed a 'STOP' button?<BR> Please try again!<P> Thank you! }; } else { $message = qq{ <B>There is no action to be performed on your side, since the error report has been already sent to webmaster. <BR><P> <B>Thank you for your patience!</B> }; $should_email = 1; } print qq|Content-type: text/html <HTML><BODY BGCOLOR="white"> <B>Oops, An error has happened.</B><P> |; print $message; # send email report if appropriate if ($should_email){ # import sendmail subs use Mail (); # prepare the email error report: my $subject ="Error Report"; my $body = qq| An error has happened: $orig_why |; # send error reports to admin and author send_mail($c{email}{'admin'},$c{email}{'admin'},$subject,$body); send_mail($c{email}{'admin'},$c{email}{'author'},$subject,$body); print STDERR "[".scalar localtime()."] [SIGDIE] Sending Error Email\n"; } # print to error_log so we will know we've sent print STDERR "[".scalar localtime()."] [SIGDIE] $orig_why \n"; exit 1; } # end of sub mydie
You may have noticed that I trap CGI.pm's die() calls here. I don't see any reason why my users should see ugly error messages, but that's the way CGI.pm is written. The workaround is to trap them myself.
Please note that as of version 2.49, CGI.pm provides a cgi_error() method to report the errors and will not die() unless you want it to.
Apache::Registry
, Apache::PerlRun
and modules that compile-via-eval confuse the line numbering. Other files
that are read normally by Perl from disk have no problem with file
name/line number.
If you build with the experimental PERL_MARK_WHERE=1 option, it shows you almost the exact line number where the problem happens. Otherwise the compiler's line counter is shifted. You can always stuff your code with special compiler directives that reset the counter to a value you specify. At the beginning of a line you should write (with the '#' in column 1):
  #line 298 myscript.pl
or
  #line 890 some_label_to_be_used_in_the_error_message
The label is optional - the filename of the script will be used by default. This specifies the line number of the following line, not the line the directive is on. You can use a little script to stuff every N lines of your code with these directives, but then you will have to rerun this script every time you add or remove code lines. The script:
<META> This example was double incrementing $counter. I took the second increment out -- sgr. </META>
  #!/usr/bin/perl
  # Puts Perl line markers in a Perl program for debugging purposes.
  # Also takes out old line markers.

  die "No filename to process.\n" unless @ARGV;
  my $filename = $ARGV[0];
  my $lines = 100;

  open IN, $filename or die "Cannot open file: $filename: $!\n";
  open OUT, ">$filename.marked"
      or die "Cannot open file: $filename.marked: $!\n";

  my $counter = 1;
  while (<IN>) {
    print OUT "#line $counter\n" unless $counter++ % $lines;
    next if $_ =~ /^#line /;
    print OUT $_;
  }

  close OUT;
  close IN;
  chmod 0755, "$filename.marked";
Another solution is to move most of the code into separate modules, which ensures that the line numbers will be reported correctly.
To have a complete trace of calls add:
use Carp (); local $SIG{__WARN__} = \&Carp::cluck;
Generally you should not fork from your mod_perl scripts, since when you do -- you are forking the entire apache web server, lock, stock and barrel. Not only is your perl code being duplicated, but so is mod_ssl, mod_rewrite, mod_log, mod_proxy, mod_spelling or whatever modules you have used in your server, all the core routines and so on.
A much wiser approach is to spawn a sub-process, hand it the information it needs to do the task, and have it detach (close the three standard filehandles and call setsid()). This is wise only if the parent that spawns the sub-process continues immediately and does not wait for the sub-process to complete. This approach is suitable when you want to trigger a long-running process through the web interface, such as crunching some data or sending email to thousands of subscribed users. Otherwise, you should convert the code into a module and call its functions or methods from your CGI script.
Just making a system() call defeats the whole idea behind mod_perl: the Perl interpreter and the modules have to be loaded all over again for this external program to run.
Basically, you would do:
$params=FreezeThaw::freeze( [all data to pass to the other process] ); system("program.pl $params");
and in program.pl
:
  @params=FreezeThaw::thaw(shift @ARGV);
  # check that @params is ok
  close STDIN;
  close STDOUT;
  open STDERR, ">/dev/null";
  setsid(); # to detach
At this point, program.pl
is running in the ``background'' while the
system()
returns and permits apache to get on with life.
This has obvious problems. Not the least of which is that @params must not be bigger than whatever your architecture's limit is (it could depend on your shell).
Also, the communication is only one way.
However, you might be trying to do the ``wrong thing''. If what you want is to send information to the browser and then do some post-processing, look into PerlCleanupHandler, as sketched below.
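Here is a minimal sketch of that idea, assuming mod_perl 1.x and a hypothetical do_heavy_logging() helper:

  use Apache ();

  my $r = Apache->request;

  # send the complete response to the client first ...
  $r->send_http_header('text/plain');
  print "Your request has been accepted\n";

  # ... and schedule the slow part for the cleanup phase, which runs
  # after the response has been delivered (the same idea as the
  # PerlCleanupHandler configuration directive)
  $r->register_cleanup(sub {
      my $r = shift;
      do_heavy_logging($r->uri);   # hypothetical post-processing
      return 0;
  });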
If you are interested in more deep level details, this is what actually happens when you fork and make a system call, like
system("echo Hi"),exit unless fork();
What happens is that fork() gives you two execution paths: the child gets virtual memory sharing a copy of the program text (read only) and a copy-on-write copy of the data space (remember why you pre-load modules in mod_perl?). In the above code the parent immediately continues with the code that comes after the fork, while the forked child executes system("echo Hi") and then terminates itself. Note that you might need to set:
$SIG{CHLD} = sub {wait};
or
$SIG{CHLD} = 'IGNORE';
or the terminated process might become a zombie. Normally every process has a parent; many processes are children of PID 1, the init process. When a child quits, it reports its termination status to its parent. A zombie is a child that has exited but whose status has not yet been collected (with wait()) by its parent, so the dead process keeps occupying a slot in the process table.
The only work is setting up the page tables for the virtual memory and the second process goes on its separate way.
Next, Perl will find /bin/echo
along the search path, and invoke it directly. Perl system()
is *not* system(3)
[C-library]. Only when the command has shell meta-chars does Perl invoke a
real shell. That's a *very* nice optimization.
Only if you do:
system "sh -c 'echo foo'"
OS actually parses your command with a shell so you exec()
a
copy of
/bin/sh
, but since one is almost certainly already running somewhere, the system
will notice that (via the disk inode reference) and replace your virtual
memory page table with one pointed at the already-loaded program code plus
your own data space. Then the shell parses the passed command.
Since the command is echo, the shell will execute it as a built-in in the latter example, or /bin/echo will be run in the former, and we're done. But this is only an example -- you aren't calling system("echo Hi") in your mod_perl scripts, right? Most real cases (heavy programs executed as a subprocess) repeat this process to load the specified command or script, and may involve some actual demand paging from the program file if you execute new code.
The only place you see real overhead from this scheme is when the parent
process is huge (unfortunately like mod_perl...) and the page table becomes
large as a side effect. The whole point of mod_perl is to avoid having to
fork()
/ exec()
something on every hit, though.
Perl can do just about anything by itself. However, you probably won't get
in trouble until you hit about 30 forks/sec on a so-so pentium.
Let's say that you wrote a few handlers to process a request, and they all need to share some custom Perl data structure. The pnotes() method comes to the rescue. Suppose that one of the handlers stores a reference to a hash before it finishes its work:
# First handler: $r->pnotes('my_info' => \%hash);
All the following handlers will be able to retrieve the stored data with:
# Later handler: my $info = $r->pnotes('my_info'); print $info->{foo};
The stored information will be destroyed at the end of the request.
Before we dive into performance issues, there is something very important to understand. It applies to any web server, not only Apache. All our efforts are aimed at making the user's web browsing experience swift. Among the various web site usability factors, speed is one of the most crucial ones. What is the correct speed measurement? Since the user is the one who interacts with the web site, the speed measurement is the time that passes from the moment the user follows a link or presses a submit button until the resulting page is rendered by her browser.

So if we trace a data packet's movement from the moment it leaves the user's machine (request sent) until the reply arrives, the packet travels through many entities on its way. It has to make its way through the network, passing many interconnection nodes; before it enters the target machine it might go through proxy (accelerator) servers; then it is served by your server; and finally it has to make the whole journey back. A web server is only one of the elements the packet sees on its way. You could work hard to fine tune your web server for the best performance, but a slow NIC (Network Interface Card) or a slow network connection from your server might defeat it all. That's why it's important to think big and to be aware of possible bottlenecks between the server and the Web. Of course there is nothing you can do if the user has a slow connection on her side.
Moreover, you might tune your scripts and web server to process incoming requests ultra fast, so that you need only a small number of working servers, yet you might find out that the server processes are busy waiting for slow clients to complete the download. You will see more examples in this chapter.
My point is that a web service is like a car: if one of its parts is broken the car will not drive smoothly, and it can even stop dead if pushed further without fixing it first.
A very important point is the sharing of memory. If your OS supports this
(and most sane systems do), you might save more memory by sharing it
between child processes. This is only possible when you preload code at
server startup. However, during a child process's life its memory pages become unshared, and there is no way to make Perl allocate memory so that (dynamic) variables land on different memory pages from constants; that's why the copy-on-write effect (explained in a moment) will hit almost at random. If you are
pre-loading many modules you might be able to balance the memory that stays
shared against the time for an occasional fork by tuning the
MaxRequestsPerChild
to a point where you restart before too much becomes unshared. In this case
the MaxRequestsPerChild
is very specific to your scenario. You should do some measurements and you
might see if this really makes a difference and what a reasonable number
might be. Each time a child reaches this upper limit and restarts it should
release the unshared copies and the new child will inherit pages that are
shared until it scribbles on them.
It is very important to understand that your goal is not to have MaxRequestsPerChild set to 10000. Having a child serve 300 requests on precompiled code is already a huge speedup, so whether it is 100 or 10000 does not really matter much, as long as it saves you RAM through sharing. Do not forget that if you preload most
of your code at the server startup, the fork to spawn a new child will be
very very fast, because it inherits most of the preloaded code and the perl
interpreter from the parent process. But then, as the child works, its memory pages (which aren't really its own yet, since it uses the parent's pages) get dirty (the originally inherited and shared variables get updated/modified) and copy-on-write happens, which reduces the number of shared memory pages - thus enlarging the memory demands. Killing the child and respawning a new one allows you to get the pristine shared memory of the parent process again.
The conclusion is that MaxRequestsPerChild should not be too big, otherwise you lose the benefits of memory sharing.
See Choosing MaxRequestsPerChild for more about tuning the MaxRequestsPerChild
parameter.
Use the PerlRequire
and PerlModule
directives to load commonly used modules such as CGI.pm
, DBI, etc., when the server is started. On most systems, server children will
be able to share the code space used by these modules. Just add the
following directives into httpd.conf
:
  PerlModule CGI
  PerlModule DBI
An even better approach is to create a separate startup file (where you code in plain perl) and put things like this in it:
  use DBI;
  use Carp;
Then you require()
this startup file with help of PerlRequire
directive from httpd.conf
, by placing it before the rest of the mod_perl configuration directives:
PerlRequire /path/to/start-up.pl
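For example, a minimal start-up.pl might look like the following sketch (the module list is only an illustration -- preload whatever your code really uses):

  # /path/to/start-up.pl -- preloads commonly used modules in the parent
  use strict;

  use Apache::Registry ();    # the handler that will run our scripts
  use Apache::Constants ();
  use CGI ();
  use DBI ();
  use Carp ();

  1;    # a file loaded with require() must return a true value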
CGI.pm
is a special case. Ordinarily CGI.pm
autoloads most of its functions on an as-needed basis. This speeds up the
loading time by deferring the compilation phase. However, if you are using
mod_perl, FastCGI or another system that uses a persistent Perl
interpreter, you will want to precompile the methods at initialization
time. To accomplish this, call the package function compile()
like this:
use CGI (); CGI->compile(':all');
The arguments to compile()
are a list of method names or sets, and are identical to those accepted by
the use()
and import()
operators. Note that in most cases you will want to replace ':all'
with tag names you really use in your code, since generally only a subset
of subs is actually being used.
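For example, here is a hedged sketch that compiles only a few groups rather than ':all' (adjust the list to the tags your scripts really call):

  use CGI ();
  # precompile only the method groups we actually use
  CGI->compile(qw(:html2 :html3 :form param header));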
You can also preload the Registry scripts. See Preload Registry Scripts.
(META: while the numbers and conclusions are mostly correct, need to rewrite the whole benchmark section using the GTop library to report the shared memory which is very important and will improve the benchmarks)
(META: Add the memory size tests when the server was compiled with EVERYTHING=1 and without it, does loading everything imposes a big change in the memory footprint? Probably the suggestion would be as follows: For a development server use EVERYTHING=1, while for a production if your server is pretty busy and/or low on memory and every bit is on account, only the required parts should be built in. BTW, remember that apache comes with many modules that are being built by default, and you might not need those!)
I have conducted a few tests to benchmark the memory usage when some modules are preloaded. The first set of tests checks the memory use when only one library Perl module (CGI.pm) is preloaded. The second set checks the compile() method of CGI.pm. The third test checks the benefit of preloading a few more library Perl modules (to see more memory saved) and also the effect of precompiling the Registry scripts with Apache::RegistryLoader.
1. In the first test, the following script was used:
use strict; use CGI (); my $q = new CGI; print $q->header; print $q->start_html,$q->p("Hello");
Server restarted
Before the CGI.pm
preload: (No other modules preloaded)
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    87004  0.0  0.0 1060 1524   - A    16:51:14  0:00 httpd
  httpd  240864  0.0  0.0 1304 1784   - A    16:51:13  0:00 httpd
After running a script which uses CGI's methods (no imports):
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root   188068  0.0  0.0 1052 1524   - A    17:04:16  0:00 httpd
  httpd   86952  0.0  1.0 2520 3052   - A    17:04:16  0:00 httpd
Observation: child httpd has grown up by 1268K
Server restarted
After the CGI.pm
preload:
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root   240796  0.0  0.0 1456 1552   - A    16:55:30  0:00 httpd
  httpd   86944  0.0  0.0 1688 1800   - A    16:55:30  0:00 httpd
after running a script which uses CGI's methods (no imports):
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    86872  0.0  0.0 1448 1552   - A    17:02:56  0:00 httpd
  httpd  187996  0.0  1.0 2808 2968   - A    17:02:56  0:00 httpd
Observation: the child httpd has grown by 1168K, 100K less than without preloading - good!
Server restarted
After CGI.pm
preloaded and compiled with CGI->compile(':all');
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    86980  0.0  0.0 2836 1524   - A    17:05:27  0:00 httpd
  httpd  188104  0.0  0.0 3064 1768   - A    17:05:27  0:00 httpd
After running a script which uses CGI's methods (no imports):
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    86980  0.0  0.0 2828 1524   - A    17:05:27  0:00 httpd
  httpd  188104  0.0  1.0 4188 2940   - A    17:05:27  0:00 httpd
Observation: the child httpd has grown by 1172K. No change! So how does CGI->compile(':all') help? I think it's because we never use all of the methods CGI provides - so in real use it's faster. You might want to compile only the tags you are about to use - then you will benefit for sure.
2. I ran a second test to find out. I used this script:
use strict; use CGI qw(:all); print header,start_html,p("Hello");
Server restarted
After CGI.pm
was preloaded and NOT compiled with CGI->compile(':all'):
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    17268  0.0  0.0 1456 1552   - A    18:02:49  0:00 httpd
  httpd   86904  0.0  0.0 1688 1800   - A    18:02:49  0:00 httpd
After running a script which imports symbols (all of them):
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    17268  0.0  0.0 1448 1552   - A    18:02:49  0:00 httpd
  httpd   86904  0.0  1.0 2952 3112   - A    18:02:49  0:00 httpd
Observation: child httpd has grown up by 1264K
Server restarted
After CGI.pm
was preloaded and compiled with CGI->compile(':all'):
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    86812  0.0  0.0 2836 1524   - A    17:59:52  0:00 httpd
  httpd   99104  0.0  0.0 3064 1768   - A    17:59:52  0:00 httpd
After running a script which imports symbols (all of them):
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    86812  0.0  0.0 2832 1436   - A    17:59:52  0:00 httpd
  httpd   99104  0.0  1.0 4884 3636   - A    17:59:52  0:00 httpd
Observation: the child httpd has grown by 1868K. Why? Isn't CGI->compile(':all') supposed to make the children share the compiled code with the parent? It does work as advertised, but if you pay attention to the code, we have called only three of CGI.pm's methods - just saying use CGI qw(:all) doesn't mean we compile all the available methods, we only import their names. So actually this test is misleading. Execute compile() only on the methods you actually use and then you will see the difference.
3. The third script:
use strict; use CGI; use Data::Dumper; use Storable; [and many lines of code, lots of globals - so the code is huge!]
Server restarted
Nothing preloaded at startup:
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    90962  0.0  0.0 1060 1524   - A    17:16:45  0:00 httpd
  httpd   86870  0.0  0.0 1304 1784   - A    17:16:45  0:00 httpd
Script using CGI (methods), Storable, Data::Dumper called:
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    90962  0.0  0.0 1064 1436   - A    17:16:45  0:00 httpd
  httpd   86870  0.0  1.0 4024 4548   - A    17:16:45  0:00 httpd
Observation: child httpd has grown by 2764K
Server restarted
Preloaded CGI (compiled), Storable, Data::Dumper at startup:
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    26792  0.0  0.0 3120 1528   - A    17:19:21  0:00 httpd
  httpd   91052  0.0  0.0 3340 1764   - A    17:19:21  0:00 httpd
Script using CGI (methods), Storable, Data::Dumper called
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    26792  0.0  0.0 3124 1440   - A    17:19:21  0:00 httpd
  httpd   91052  0.0  1.0 6568 5040   - A    17:19:21  0:00 httpd
Observation: child httpd has grown by 3276K. Ouch: 512K more!!!
The reason is that when you preload all of the methods at startup, they all get precompiled; there are many of them and they take a big chunk of memory. If you don't use the compile() method, only the functions that are actually used will be compiled. Yes, this slightly slows down the first response of each process, but the actual memory usage will be lower. BTW, if you write in the script:
  use CGI qw(:all);
only the symbols of all the functions are imported. While they take some space, it is less than the space the compiled code of these functions would occupy.
Server restarted
All the above modules + the above script PreCompiled with
Apache::RegistryLoader
at startup:
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    43224  0.0  0.0 3256 1528   - A    17:23:12  0:00 httpd
  httpd   26844  0.0  0.0 3488 1776   - A    17:23:12  0:00 httpd
Script using CGI (methods), Storable, Data::Dumper called:
  USER      PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
  root    43224  0.0  0.0 3252 1440   - A    17:23:12  0:00 httpd
  httpd   26844  0.0  1.0 6748 5092   - A    17:23:12  0:00 httpd
Observation: the child httpd has grown even more: 3316K! That does not seem good!
Summary:
1. Library Perl Modules Preloading gave good results everywhere.
2. CGI.pm's compile() method seems to use even more memory. That's because we never use all of the methods CGI provides. compile() only the tags that you are going to use; you will save the overhead of the first call for each method that has not yet been called, and save memory as well - since the compiled code will be shared across all the children.
3. Apache::RegistryLoader
might make scripts load faster on the first request after the child has
just started but the memory usage is worse!!! See the numbers by yourself.
HW/SW used : The server is apache 1.3.2, mod_perl 1.16 running on AIX 4.1.5 RS6000 1G RAM.
Apache::RegistryLoader
compiles Apache::Registry
scripts at server startup. It can be a good idea to preload the scripts you
are going to use as well. So the code will be shared among the children.
Here is an example of the use of this technique. This code is included in a PerlRequire
'd file, and walks the directory tree under which all registry scripts are
installed. For each .pl
file encountered, it calls the Apache::RegistryLoader::handler()
method to preload the script in the parent server (before pre-forking the
child processes):
  use File::Find 'finddepth';
  use Apache::RegistryLoader ();

  {
    my $perl_dir = "perl/";
    my $rl = Apache::RegistryLoader->new;
    finddepth(sub {
      return unless /\.pl$/;
      my $url = "/$File::Find::dir/$_";
      print "pre-loading $url\n";

      my $status = $rl->handler($url);
      unless($status == 200) {
        warn "pre-load of `$url' failed, status=$status\n";
      }
    }, $perl_dir);
  }
Note that we didn't use the second argument to handler() here, as the module's manpage suggests. To make the loader smarter about the URI->filename translation, you might need to provide a trans() function to translate the URI to a filename. URI to filename translation normally doesn't happen until HTTP request time, so the module is forced to roll its own translation. If the filename is omitted and a trans() routine was not defined, the loader will try using the URI relative to ServerRoot, as sketched below.
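A minimal sketch of a trans() routine, assuming the Registry scripts live under /home/httpd/perl/ and are served from the /perl/ URI prefix (both paths are illustrative):

  use Apache::RegistryLoader ();

  my $rl = Apache::RegistryLoader->new(
      trans => sub {
          my $uri = shift;
          # map the /perl/ URI prefix to the directory on disk
          $uri =~ s|^/perl/|/home/httpd/perl/|;
          return $uri;
      },
  );

  # now the loader can work out the filename on its own
  $rl->handler("/perl/test.pl");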
You have to check whether this makes any improvement for you though. I did some testing [ Preload Perl modules - Real Numbers ], and it seems that it takes more memory than when the scripts are loaded by the child on first request - but this is only a first impression and needs better investigation. If you aren't concerned about the few script invocations which will take some time to respond while they load the code, you might not need it at all!
See also BEGIN blocks
When possible, avoid importing a module's functions into your name space.
The aliases which are created can take up quite a bit of space. Try to use
method interfaces and fully qualified
Package::function
or $Package::variable
like names instead.
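For example, compare the importing and the fully qualified styles (both snippets do the same thing):

  # imports strftime() into your namespace (creates an alias):
  #   use POSIX qw(strftime);
  #   print strftime("%Y-%m-%d", localtime), "\n";

  # keeps your namespace clean -- load without importing and use
  # fully qualified names or the method interface instead:
  use POSIX ();
  print POSIX::strftime("%Y-%m-%d", localtime), "\n";

  use CGI ();
  my $q = CGI->new;          # method interface, nothing imported
  print $q->header;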
PerlSetupEnv Off
is another optimization you might consider.
mod_perl fiddles with the environment to make it appear as if the script were being
called under the CGI protocol. For example, the
$ENV{QUERY_STRING}
environment variable is initialized with the contents of Apache::args(), and $ENV{SERVER_NAME}
is filled in from the value returned by Apache::server_hostname().
But %ENV population is expensive. Those who have moved to the Perl Apache API no longer need this extra %ENV population and can gain by turning it off.
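A minimal sketch of the idea: turn the population off in httpd.conf and fetch what you need through the request object instead of %ENV.

  # httpd.conf:
  #   PerlSetupEnv Off

  # in the handler or Apache::Registry script:
  use Apache ();
  my $r = Apache->request;

  my $query_string = $r->args;                      # instead of $ENV{QUERY_STRING}
  my $server_name  = $r->server->server_hostname;   # instead of $ENV{SERVER_NAME}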
Newer Perl versions also have build time options to reduce runtime memory consumption. These options might shrink the size of your httpd by about 150k - quite a big number if you remember to multiply it by the number of children you use.
-DTWO_POT_OPTIMIZE
macro improves allocations of data with size close to a power of two; but
this works for big allocations (starting with 16K by default). Such
allocations are typical for big hashes and special-purpose scripts,
especially image processing.
Perl's memory allocation is done in buckets with sizes close to powers of two. Because of this, malloc overhead may be big, especially for data whose size is exactly a power of two. If PACK_MALLOC is defined, perl uses a slightly different algorithm for small allocations (up to 64 bytes long), which makes it possible to have overhead down to 1 byte for allocations which are powers of two (and these appear quite often).
Expected memory savings (with 8-byte alignment in alignbytes
) is about 20% for typical Perl usage. Expected slowdown due to additional
malloc overhead is in fractions of a percent (hard to measure, because of
the effect of saved memory on speed).
You will find these and other memory improvement details in
perl5004delta.pod
.
You've probably noticed that the word shared is being repeated many times in many things related to mod_perl. Indeed, shared memory might save you a lot of money, since with sharing in place you can run many more servers than without it. See the Formula and the numbers.
How much shared memory do you have? You can see it by either using the
memory utils that comes with your system or you can deploy GTop
module:
print "Shared memory of the current process: ", GTop->new->proc_mem($$)->share,"\n";
print "Total shared memory: ", GTop->new->mem->share,"\n";
When you watch the output of the top utility, don't confuse the RSS (or RES) column with the SHARE column -- RES is the RESident memory, which is the size of the pages currently swapped in (i.e. present in physical memory).
Under Apache::Registry the requested CGI script is always stat()'ed to check whether it was modified. This adds very little overhead, but if you are into squeezing all the juice from the server, you might want to save this call. If you do -- take a look at the Apache::RegistryBB module, configured as sketched below.
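A hedged configuration sketch -- Apache::RegistryBB is used just like Apache::Registry, only the handler name changes (the /perl-bb location is hypothetical):

  PerlModule Apache::RegistryBB
  <Location /perl-bb>
    SetHandler perl-script
    PerlHandler Apache::RegistryBB
    Options ExecCGI
  </Location>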
Apache::Leak
(derived from Devel::Leak
) should help you with this task. Example:
use Apache::Leak; my $global = "FooAAA"; leak_test { $$global = 1; ++$global; };
The argument to leak_test() is an anonymous sub, so you can just wrap it around any code you suspect might be leaking. Beware: it will run the code twice, because the first time through new SVs are created, which does not mean you are leaking; the second pass will give better evidence. You do not need to be inside mod_perl to use it; run from the command line, the above script outputs:
  ENTER: 1482 SVs
  new c28b8 :
  new c2918 :
  LEAVE: 1484 SVs
  ENTER: 1484 SVs
  new db690 :
  new db6a8 :
  LEAVE: 1486 SVs
  !!! 2 SVs leaked !!!
Build a debuggable perl to see dumps of the SVs. The simple way to have both a normal and a debuggable perl is to follow the hints in the SUPPORT document for building libperld.a; when that is built, copy the perl binary from that directory to your perl bin directory, naming it dperl.
Leak explanation: $$global = 1; creates a new global variable FooAAA with a value of 1, which will not be destroyed until this module is destroyed.
Apache::Leak is not very user-friendly; have a look at B::LexInfo instead. You'll see that what might appear to be a leak is often actually just a Perl optimization. For example, consider this code:
sub foo { my $string = shift; }
foo("a string");
B::LexInfo will show you that Perl does not release the value of $string unless you undef it. This is because Perl anticipates that the memory will be needed for another string the next time the subroutine is entered. You'll see similar behaviour for @array lengths, %hash keys, and the scratch areas of the padlist for ops such as join(), `.', etc.
Apache::Status
now includes a new StatusLexInfo option.
Apache::Leak works better if you've built a libperld.a (see SUPPORT) and given PERL_DEBUG=1 to mod_perl's Makefile.PL
Apache::SizeLimit
allows you to kill off Apache httpd processes if they grow too large. see
perldoc Apache::SizeLimit
for more details.
By using this module, you should be able to discontinue using the Apache
configuration directive MaxRequestsPerChild
, although for some folks, using both in combination does the job.
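A hedged sketch of the usual setup (the 10000KB limit is only an example; check the module's manpage for the variables supported on your platform):

  # in your startup file:
  use Apache::SizeLimit ();
  $Apache::SizeLimit::MAX_PROCESS_SIZE = 10000;   # KB, i.e. ~10Mb per child

  # in httpd.conf:
  #   PerlFixupHandler Apache::SizeLimit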
Apache::Resource
uses the BSD::Resource
module, which uses the C function setrlimit()
to set limits on system resources such as memory and cpu usage.
To configure use:
  PerlModule Apache::Resource
  # set child memory limit in megabytes
  # (default is 64 Meg)
  PerlSetEnv PERL_RLIMIT_DATA 32:48

  # set child CPU limit in seconds
  # (default is 360 seconds)
  PerlSetEnv PERL_RLIMIT_CPU 120

  PerlChildInitHandler Apache::Resource
The following limit values are in megabytes: DATA
, RSS
,
STACK
, FSIZE
, CORE
, MEMLOCK
; all others are treated as their natural unit. Prepend PERL_RLIMIT_
for each one you want to use. Refer to setrlimit
man page on your OS for other possible resources.
If the value of the variable is of the form S:H
, S
is treated as the soft limit, and H
is the hard limit. If it is just a single number, it is used for both soft
and hard limits.
To debug add:
  <Perl>
    $Apache::Resource::Debug = 1;
    require Apache::Resource;
  </Perl>
  PerlChildInitHandler Apache::Resource
and look in the error_log to see what it's doing.
Refer to perldoc Apache::Resource
and man 2 setrlimit
for more info.
A limitation of using pattern matching to identify robots is that it only catches the robots that you know about, and only those that identify themselves by name. A few devious robots masquerade as users by using user agent strings that identify themselves as conventional browsers. To catch such robots, you'll have to be more sophisticated.
Apache::SpeedLimit
comes for you to help, see:
http://www.modperl.com/chapters/ch6.html#Blocking_Greedy_Clients
How much faster is mod_perl than mod_cgi (aka plain perl/CGI)? There are many ways to benchmark the two. I'll present a few examples and numbers below. Check out the benchmark directory of the mod_perl distribution for more examples.
If you are going to write your own benchmarking utility, use the Benchmark module for heavy scripts and the Time::HiRes module for very fast scripts (faster than 1 sec), where you need better time precision.
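For example, a minimal Benchmark sketch that compares two hypothetical implementations of the same task:

  use Benchmark qw(timethese);

  # run each code snippet 10000 times and report the CPU time used
  timethese(10_000, {
      concat => sub { my $s = '';  $s .= $_ for 1..100 },
      join   => sub { my $s = join '', 1..100          },
  });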
There is no need to write a special benchmark though. If you want to
impress your boss or colleagues, just take some heavy CGI script you have
(e.g. a script that crunches some data and prints the results to STDOUT),
open 2 xterms and call the same script in mod_perl mode in one xterm and in
mod_cgi mode in the other. You can use lwp-get
from LWP
package to emulate the web agent (browser). (benchmark
directory of mod_perl distribution includes such an example)
See also 2 tools for benchmarking: ApacheBench and crashme test
Perrin Harkins writes on benchmarks or comparisons, official or unofficial:
I have used some of the platforms you mentioned and researched others. What I can tell you for sure, is that no commercially available system offers the depth, power, and ease of use that mod_perl has. Either they don't let you access the web server internals, or they make you use less productive languages than Perl, sometimes forcing you into restrictive and confusing APIs and/or GUI development environments. None of them offer the level of support available from simply posting a message to this list, at any price.
As for performance, beyond doing several important things (code-caching, pre-forking/threading, and persistent database connections) there isn't much these tools can do, and it's mostly in your hands as the developer to see that the things which really take the time (like database queries) are optimized.
The downside of all this is that most manager types seem to be unable to believe that web development software available for free could be better than the stuff that cost $25,000 per CPU. This appears to be the major reason most of the web tools companies are still in business. They send a bunch of suits to give PowerPoint presentations and hand out glossy literature to your boss, and you end up with an expensive disaster and an approaching deadline.
But I'm not bitter or anything...
Jonathan Peterson adds:
Most of the major solutions have something that they do better than the others, and each of them has faults. Microsoft's ASP has a very nice objects model, and has IMO the best data access object (better than DBI to use - but less portable) It has the worst scripting language. PHP has many of the advantages of Perl-based solutions, but is less complicated for developers. Netscape's Livewire has a good object model too, and provides good server-side Java integration - if you want to leverage Java skills, it's good. Also, it has a compiled scripting language - which is great if you aren't selling your clients the source code (and a pain otherwise).
mod_perl's advantage is that it is the most powerful. It offers the greatest degree of control with one of the more powerful languages. It also offers the greatest granularity. You can use an embedding module (eg eperl) from one place, a session module (Session) from another, and your data access module from yet another.
I think the
Apache::ASP
module looks very promising. It has very easy to use and adequately powerful state maintenance, a good embedding system, and a sensible object model (that emulates the Microsoft ASP one). It doesn't replicate MS's ADO for data access, but DBI is fine for that.

I have always found that the developers available make the greatest impact on the decision. If you have a team with no Perl experience, and a small or medium task, using something like PHP or Microsoft ASP makes more sense than driving your staff up the vertical learning curve they'll need to use mod_perl.
For very large jobs, it may be worth finding the best technical solution, and then recruiting the team with the necessary skills.
Here are the numbers from Michael Parker's mod_perl presentation at Perl Conference (Aug, 98) http://www.realtime.net/~parkerm/perl/conf98/index.htm . The script is a standard hits counter, but it logs the counts into the mysql relational DataBase:
  Benchmark: timing 100 iterations of cgi, perl...  [rate 1:28]
     cgi:   56 secs ( 0.33 usr  0.28 sys =  0.61 cpu)
    perl:    2 secs ( 0.31 usr  0.27 sys =  0.58 cpu)

  Benchmark: timing 1000 iterations of cgi, perl... [rate 1:21]
     cgi:  567 secs ( 3.27 usr  2.83 sys =  6.10 cpu)
    perl:   26 secs ( 3.11 usr  2.53 sys =  5.64 cpu)

  Benchmark: timing 10000 iterations of cgi, perl   [rate 1:21]
     cgi: 6494 secs (34.87 usr 26.68 sys = 61.55 cpu)
    perl:  299 secs (32.51 usr 23.98 sys = 56.49 cpu)
We don't know what server configuration was used for these tests, but I guess the numbers speak for themselves.
The source code of the script is available at http://www.realtime.net/~parkerm/perl/conf98/sld006.htm .
As noted before, for very fast scripts you will have to use the Time::HiRes module; its usage is similar to Benchmark's.
  use Time::HiRes qw(gettimeofday tv_interval);

  my $start_time = [ gettimeofday ];
  &sub_that_takes_a_teeny_bit_of_time();
  my $end_time = [ gettimeofday ];

  my $elapsed = tv_interval($start_time,$end_time);
  print "the sub took $elapsed secs.";
See also crashme test.
At http://perl.apache.org/dist/contrib/
you will find
Apache::Timeit
package which does PerlHandler
's Benchmarking.
It's very important to make a correct configuration of the
MinSpareServers
, MaxSpareServers
, StartServers
,
MaxClients
, and MaxRequestsPerChild
parameters. There are no universally correct values: these settings are very important, since if they are too ``low'' you will under-use the system's capabilities, and if they are too ``high'' chances are that the server will bring the machine to its knees.
All the above parameters should be specified on the basis of the resources
you have. With a plain apache server it is not a big deal if you run too many servers (within reason), since the processes are about 1Mb each and aren't eating a lot of your RAM. Generally the numbers are even smaller if memory sharing is taking place. The situation is different with mod_perl. I have seen mod_perl processes of 20Mb and more. Now if you have MaxClients
set to 50: 50x20Mb = 1Gb - do you have 1Gb of RAM? Probably not. So how do
you tune these parameters? Generally by trying different combinations and
benchmarking the server. Again mod_perl processes can be of much smaller
size if sharing is in place.
Before you start this task you should be armed with a proper weapon. You need a crashme utility, which will load your server with the mod_perl scripts you possess. It must be able to emulate a multiuser environment, i.e. emulate many clients calling the mod_perl scripts on your server simultaneously. While there are commercial solutions, you can get away with free ones which do the same job. You can use the ApacheBench ab utility that comes with the apache distribution, a crashme script which uses LWP::Parallel::UserAgent, or httperf (see the Download page).
Another important issue is to make sure that you run the testing client (load generator) on a system that is more powerful than the system being tested. After all, we are trying to simulate Internet users, where many users try to reach your service at once -- since the number of concurrent users can be quite large, your testing machine must be very powerful and capable of generating a heavy load. Of course you should not run the clients and the server on the same machine; if you do, your testing results will be incorrect, since the clients will eat CPU and memory that should be dedicated to the server, and vice versa.
See also 2 tools for benchmarking: ApacheBench and crashme test
ab is a tool for benchmarking your Apache HTTP server. It is designed to give you an impression of how much performance your current Apache installation can deliver. In particular, it shows you how many requests per second your Apache server is capable of serving. The ab tool comes bundled with the apache source distribution (and it's free :).
Let's try it. We will simulate 10 users concurrently requesting a very
light script at www.nowhere.com:81/test/test.pl
. Each ``user'' makes 10 requests.
% ./ab -n 100 -c 10 www.nowhere.com:81/test/test.pl
The results are:
  Concurrency Level:     10
  Time taken for tests:  0.715 seconds
  Complete requests:     100
  Failed requests:       0
  Non-2xx responses:     100
  Total transferred:     60700 bytes
  HTML transferred:      31900 bytes
  Requests per second:   139.86
  Transfer rate:         84.90 kb/s received

  Connection Times (ms)
               min   avg   max
  Connect:       0     0     3
  Processing:   13    67    71
  Total:        13    67    74
The only numbers we really care about are:
  Complete requests:     100
  Failed requests:       0
  Requests per second:   139.86
Let's raise the load of requests to 100 x 10 (10 users, each makes 100 requests)
  % ./ab -n 1000 -c 10 www.nowhere.com:81/perl/access/access.cgi

  Concurrency Level:     10
  Complete requests:     1000
  Failed requests:       0
  Requests per second:   139.76
As expected nothing changes -- we have the same 10 concurrent users. Now let's raise the number of concurrent users to 50:
  % ./ab -n 1000 -c 50 www.nowhere.com:81/perl/access/access.cgi

  Complete requests:     1000
  Failed requests:       0
  Requests per second:   133.01
We see that the server is capable of serving 50 concurrent users at an amazing 133 req/sec! Let's find the upper boundary. Using -n 10000 -c 1000 failed to get results (Broken Pipe?). Using -n 10000 -c 500 gave 94.82 req/sec. The server's performance went down under the higher load.
The above tests were performed with the following configuration:
  MinSpareServers      8
  MaxSpareServers      6
  StartServers        10
  MaxClients          50
  MaxRequestsPerChild 1500
Now let's kill a child after a single request, we will use the following configuration:
  MinSpareServers      8
  MaxSpareServers      6
  StartServers        10
  MaxClients         100
  MaxRequestsPerChild  1
Simulate 50 users each generating a total of 20 requests:
% ./ab -n 1000 -c 50 www.nowhere.com:81/perl/access/access.cgi
The benchmark timed out with the above configuration. I watched the output of ps as I ran it; the parent process just wasn't capable of respawning the killed children at that rate. When I raised MaxRequestsPerChild to 10 I got 8.34 req/sec - very bad, roughly 16 times slower! (You can't benchmark the importance of the MinSpareServers, MaxSpareServers and StartServers settings with this kind of test.)
Now let's return MaxRequestsPerChild to 1500, but lower MaxClients to 10, and run the same test:
  MinSpareServers      8
  MaxSpareServers      6
  StartServers        10
  MaxClients          10
  MaxRequestsPerChild 1500
I got 27.12 req/sec, which is better but still 4-5 times slower than the 133 req/sec we saw with a MaxClients of 50.
Summary: I have tested a few combinations of the server configuration variables (MinSpareServers, MaxSpareServers, StartServers, MaxClients, MaxRequestsPerChild). The results are as follows:
MinSpareServers, MaxSpareServers and StartServers are only important for user response times (sometimes users will have to wait a bit).
The important parameters are MaxClients and MaxRequestsPerChild. MaxClients should be neither too big, so that it does not abuse your machine's memory resources, nor too small, or users will be forced to wait for children to become free to serve them. MaxRequestsPerChild should be as big as possible, to get the full benefit of mod_perl, but watch your server at the beginning to make sure your scripts are not leaking memory, thereby causing your server (and your service) to die very fast.
Also it is important to understand that we didn't test the response times in the tests above, but the ability of the server to respond under a heavy load of requests. If the script that was used to test was heavier, the numbers would be different but the conclusions are very similar.
The benchmarks were run with:
  HW: RS6000, 1Gb RAM
  SW: AIX 4.1.5, mod_perl 1.16, apache 1.3.3
  Machine running only mysql, httpd docs and mod_perl servers.
  Machine was _completely_ unloaded during the benchmarking.
After each server restart, whenever I changed the server's configuration, I made sure the scripts were preloaded by having every child fetch a script at least once.
It is important to notice that none of the requests timed out, even if a request was kept in the server's queue for more than 1 minute! (That is the way ab works, which is OK for testing purposes but will be unacceptable in the real world - users will not wait more than 5-10 secs for a request to complete, and the client (browser) will time out after a few minutes.)
Now let's take a look at some real code whose execution time is more than a few millisecs. We will do real testing and collect the data in tables for easier viewing.
I will use the following abbreviations:
  NR   = Total Number of Requests
  NC   = Concurrency
  MC   = MaxClients
  MRPC = MaxRequestsPerChild
  RPS  = Requests per second
Running a mod_perl script with lots of mysql queries (the script under test is mysqld bounded) (http://www.nowhere.com:81/perl/access/access.cgi?do_sub=query_form), with configuration:
  MinSpareServers      8
  MaxSpareServers     16
  StartServers        10
  MaxClients          50
  MaxRequestsPerChild 5000
gives us:
  NR     NC   RPS    comment
  ------------------------------------------------
  10     10   3.33   # not a reliable statistics
  100    10   3.94
  1000   10   4.62
  1000   50   4.09
Conclusions: Here I wanted to show that when the application is slow -- not due to perl loading, code compilation and execution, but bound by some external operation such as mysqld queries, which is the real bottleneck -- it almost does not matter what load we place on the server. The RPS (Requests per second) is almost the same. (Given that all the requests have been served, you have the ability to queue the clients, but be aware that anything that goes into the queue means a waiting client and a client (browser) that might time out!)
Now we will benchmark the same script without the mysql queries (Perl-only code) (http://www.nowhere.com:81/perl/access/access.cgi); it's the same script, but it just returns an HTML form without making any SQL queries.
  MinSpareServers      8
  MaxSpareServers     16
  StartServers        10
  MaxClients          50
  MaxRequestsPerChild 5000
  NR      NC    RPS    comment
  ------------------------------------------------
  10      10    26.95  # not a reliable statistics
  100     10    30.88
  1000    10    29.31
  1000    50    28.01
  1000    100   29.74
  10000   200   24.92
  100000  400   24.95
Conclusions: This time the script we executed was pure perl (not bound by I/O or mysql), so we see that the server serves the requests much faster. You can see that the Requests Per Second (RPS) is almost the same for any load, but goes lower when the number of concurrent clients goes beyond MaxClients. With 25 RPS, a load of 400 concurrent clients will be served in 16 secs. To be more realistic and assume a maximum concurrency of 100, with 30 RPS the clients will be served within 3.5 secs, which is pretty good for a highly loaded server.
Now we will use the server at its full capacity, by keeping all MaxClients children alive all the time and setting a big MaxRequestsPerChild, so that no server is killed during the benchmarking.
  MinSpareServers     50
  MaxSpareServers     50
  StartServers        50
  MaxClients          50
  MaxRequestsPerChild 5000

  NR      NC    RPS    comment
  ------------------------------------------------
  100     10    32.05
  1000    10    33.14
  1000    50    33.17
  1000    100   31.72
  10000   200   31.60
Conclusion: In this scenario there is no overhead involving the parent server loading new children, all the servers are available, and the only bottleneck is contention for the CPU.
Now we will change MaxClients and watch the results. Let's reduce MC to 10.
  MinSpareServers      8
  MaxSpareServers     10
  StartServers        10
  MaxClients          10
  MaxRequestsPerChild 5000

  NR      NC    RPS    comment
  ------------------------------------------------
  10      10    23.87  # not a reliable statistics
  100     10    32.64
  1000    10    32.82
  1000    50    30.43
  1000    100   25.68
  1000    500   26.95
  2000    500   32.53
Conclusions: Very little difference! Almost no change! 10 servers were able to serve with almost the same throughput as 50 servers. Why? My guess is that it's because of CPU throttling: it seems that each of the 10 servers served requests 5 times faster than in the test above with 50 servers, where each child received its CPU time slice 5 times less frequently. So having a big value for MaxClients doesn't mean that the performance will be better. You have just seen the numbers!
Now we will start to drastically reduce the MaxRequestsPerChild
:
  MinSpareServers      8
  MaxSpareServers     16
  StartServers        10
  MaxClients          50

  NR     NC    MRPC    RPS    comment
  ------------------------------------------------
  100    10    10      5.77
  100    10    5       3.32
  1000   50    20      8.92
  1000   50    10      5.47
  1000   50    5       2.83
  1000   100   10      6.51
Conclusions: When we drastically reduce the MaxRequestsPerChild
, the performance starts to become closer to the plain mod_cgi. Just for
comparison with mod_cgi, here are the numbers of this run with mod_cgi:
  MinSpareServers      8
  MaxSpareServers     16
  StartServers        10
  MaxClients          50

  NR     NC    RPS    comment
  ------------------------------------------------
  100    10    1.12
  1000   50    1.14
  1000   100   1.13
Conclusion: mod_cgi is much slower :) In the 100/10 (NR/NC) test the RPS for mod_cgi was 1.12 versus 32 for mod_perl, which is about 30 times faster!!! In the first test each client waited about 100 secs to have all of its requests served; in the second and third tests, about 1000 secs!
This is another crashme suite originally written by Michael Schilli and
located at http://www.linux-magazin.de/ausgabe.1998.08/Pounder/pounder.html
. I did a few modifications (mostly adding my()
operands). I
also allowed it to accept more than one url to test, since sometimes you
want to test an overall and not just one script.
The tool provides the same results as ab above but it also allows you to set the timeout value, so requests will fail if not served within the time out period. You also get Latency (secs/Request) and Throughput (Requests/sec) numbers. It can give you a better picture and make a complete simulation of your favorite Netscape browser :).
I have noticed while running these two benchmarking suites that ab gave me results 2.5-3.0 times better. Both suites were run on the same machine, with the same load and the same parameters, but the implementations are different.
Sample output:
  URL(s):          http://www.nowhere.com:81/perl/access/access.cgi
  Total Requests:  100
  Parallel Agents: 10
  Succeeded:       100 (100.00%)
  Errors:          NONE
  Total Time:      9.39 secs
  Throughput:      10.65 Requests/sec
  Latency:         0.85 secs/Request
And the code:
  #!/usr/apps/bin/perl -w

  use LWP::Parallel::UserAgent;
  use Time::HiRes qw(gettimeofday tv_interval);
  use strict;

  ###
  # Configuration
  ###

  my $nof_parallel_connections = 10;
  my $nof_requests_total = 100;
  my $timeout = 10;
  my @urls = (
              'http://www.nowhere.com:81/perl/faq_manager/faq_manager.pl',
              'http://www.nowhere.com:81/perl/access/access.cgi',
             );

  ##################################################
  # Derived Class for latency timing
  ##################################################

  package MyParallelAgent;
  @MyParallelAgent::ISA = qw(LWP::Parallel::UserAgent);
  use strict;

  ###
  # Is called when connection is opened
  ###
  sub on_connect {
    my ($self, $request, $response, $entry) = @_;
    $self->{__start_times}->{$entry} = [Time::HiRes::gettimeofday];
  }

  ###
  # Are called when connection is closed
  ###
  sub on_return {
    my ($self, $request, $response, $entry) = @_;
    my $start = $self->{__start_times}->{$entry};
    $self->{__latency_total} += Time::HiRes::tv_interval($start);
  }

  sub on_failure {
    on_return(@_);  # Same procedure
  }

  ###
  # Access function for new instance var
  ###
  sub get_latency_total {
    return shift->{__latency_total};
  }

  ##################################################
  package main;
  ##################################################

  ###
  # Init parallel user agent
  ###
  my $ua = MyParallelAgent->new();
  $ua->agent("pounder/1.0");
  $ua->max_req($nof_parallel_connections);
  $ua->redirect(0);    # No redirects

  ###
  # Register all requests
  ###
  foreach (1..$nof_requests_total) {
    foreach my $url (@urls) {
      my $request = HTTP::Request->new('GET', $url);
      $ua->register($request);
    }
  }

  ###
  # Launch processes and check time
  ###
  my $start_time = [gettimeofday];
  my $results = $ua->wait($timeout);
  my $total_time = tv_interval($start_time);

  ###
  # Requests all done, check results
  ###
  my $succeeded = 0;
  my %errors = ();

  foreach my $entry (values %$results) {
    my $response = $entry->response();
    if($response->is_success()) {
      $succeeded++; # Another satisfied customer
    } else {
      # Error, save the message
      $response->message("TIMEOUT") unless $response->code();
      $errors{$response->message}++;
    }
  }

  ###
  # Format errors if any from %errors
  ###
  my $errors = join(',', map "$_ ($errors{$_})", keys %errors);
  $errors = "NONE" unless $errors;

  ###
  # Format results
  ###
  #@urls = map {($_,".")} @urls;
  my @P = (
    "URL(s)"          => join("\n\t\t ", @urls),
    "Total Requests"  => "$nof_requests_total",
    "Parallel Agents" => $nof_parallel_connections,
    "Succeeded"       => sprintf("$succeeded (%.2f%%)\n",
                                 $succeeded * 100 / $nof_requests_total),
    "Errors"          => $errors,
    "Total Time"      => sprintf("%.2f secs\n", $total_time),
    "Throughput"      => sprintf("%.2f Requests/sec\n",
                                 $nof_requests_total / $total_time),
    "Latency"         => sprintf("%.2f secs/Request",
                                 ($ua->get_latency_total() || 0) /
                                 $nof_requests_total),
  );

  my ($left, $right);

  ###
  # Print out statistics
  ###
  format STDOUT =
  @<<<<<<<<<<<<<<< @*
  "$left:",        $right
  .

  while(($left, $right) = splice(@P, 0, 2)) {
    write;
  }
The MaxClients
directive sets the limit on the number of simultaneous requests that can be
supported; no more than this number of child server processes will be
created. To configure more than 256 clients, you must edit the HARD_SERVER_LIMIT
entry in httpd.h
and recompile. In our case we want this variable to be as small as possible, because this way we can limit the resources used by the server children. Since we can restrict each child's process size (see Limiting the size of the processes), the calculation of MaxClients is pretty straightforward:
                Total RAM Dedicated to the Webserver
  MaxClients = --------------------------------------
                      MAX child's process size
So if I have 400Mb left for the webserver to run with, I can set MaxClients to 40 if I know that each child is limited to 10Mb of memory (e.g. with Apache::SizeLimit).
Certainly you will wonder what happens to your server if there are more than MaxClients concurrent users at some moment. This situation is signalled by the following warning message in the error_log file:
[Sun Jan 24 12:05:32 1999] [error] server reached MaxClients setting, consider raising the MaxClients setting
There is no problem -- any connection attempts over the MaxClients
limit will normally be queued, up to a number based on the
ListenBacklog
directive. Once a child process is freed at the end of a different request,
the connection will then be served.
But it is an error, because clients are being put in the queue rather than getting served immediately, despite the fact that they do not get an error response. The error can be allowed to persist to balance available system resources and response time, but sooner or later you will need to get more RAM so you can start more children. The best approach is to try not to reach this condition at all, and if you reach it often you should start to worry about it.
It's important to understand how much real memory a child occupies. Your children can share memory between them (when the OS supports that and you take action to make the sharing happen - see Preload Perl modules at server startup). If this is the case, chances are that your MaxClients can be even higher. But it seems that it's not so simple to calculate the absolute number. (If you come up with a solution please let us know!) If the shared memory stayed the same size throughout the child's life, we could derive a much better formula:
               Total_RAM + Shared_RAM_per_Child * MaxClients
  MaxClients = --------------------------------------------- - 1
                            Max_Process_Size
which is:
                    Total_RAM - Max_Process_Size
  MaxClients = ---------------------------------------
               Max_Process_Size - Shared_RAM_per_Child
Let's roll some calculations:
  Total_RAM            = 500Mb
  Max_Process_Size     = 10Mb
  Shared_RAM_per_Child = 4Mb
               500 - 10
  MaxClients = -------- = 81
                10 - 4
With no sharing in place
               500
  MaxClients = --- = 50
                10
With sharing in place you can run 60% more servers without buying more RAM. If you improve the sharing level further and keep it, say:
  Total_RAM            = 500Mb
  Max_Process_Size     = 10Mb
  Shared_RAM_per_Child = 8Mb
               500 - 10
  MaxClients = -------- = 245
                10 - 8
390% more servers!!! You've got the point :)
The MaxRequestsPerChild
directive sets the limit on the number of requests that an individual child
server process will handle. After
MaxRequestsPerChild
requests, the child process will die. If
MaxRequestsPerChild
is 0, then the process will live forever.
Setting MaxRequestsPerChild
to a non-zero limit has two beneficial effects: it solves memory leakages
and helps reduce the number of processes when the server load reduces.
The first reason is the most crucial for mod_perl, since sloppy programming
will cause a child process to consume more memory after each request. If
left unbounded, then after a certain number of requests the children will
use up all the available memory and leave the server to die from memory
starvation. Note, that sometimes standard system libraries leak memory too,
especially on OSes with bad memory management (e.g. Solaris 2.5 on x86
arch). If this is your case you can set MaxRequestsPerChild to a small number, which will allow the system to reclaim the memory a greedy child process has consumed when it exits after MaxRequestsPerChild requests. But beware -- if you set this number too low, you will lose a fraction of the speed bonus you get from mod_perl. Consider using Apache::PerlRun if this is the case. Also, setting MaxSpareServers to a number close to MaxClients will improve the response time (but your parent process will be busy respawning new children all the time!)
Another approach is to use Apache::SizeLimit
(See Limiting the size of the processes). By using this module, you should be able to discontinue using the
MaxRequestsPerChild
, although for some folks, using both in combination does the job.
See also Preload Perl modules at server startup and Sharing Memory.
With mod_perl enabled, it might take as much as 30 seconds from the time
you start the server until it is ready to serve incoming requests. This
delay depends on the OS, the number of preloaded modules and the process
load of the machine. So it's best to set
StartServers
and MinSpareServers
to high numbers, so that if you get a high load just after the server has
been restarted, the fresh servers will be ready to serve requests
immediately. With mod_perl, it's usually a good idea to raise all 3
variables higher than normal. In order to maximize the benefits of
mod_perl, you don't want to kill servers when they are idle, rather you
want them to stay up and available to immediately handle new requests. I
think an ideal configuration is to set MinSpareServers
and MaxSpareServers
to similar values, maybe even the same. Having the MaxSpareServers
close to MaxClients
will completely use all of your resources (if
MaxClients
has been chosen to take the full advantage of the resources), but it'll
make sure that at any given moment your system will be capable of
responding to requests with the maximum speed (given that number of
concurrent requests is not higher than
MaxClients
.)
Let's try some numbers. For a heavily loaded web site and a dedicated machine I would think of (note 400Mb is just for example):
Available to webserver RAM:   400Mb
Child's memory size bounded:  10Mb
MaxClients:                   400/10 = 40 (larger with mem sharing)
StartServers:                 20
MinSpareServers:              20
MaxSpareServers:              35
However if I want to use the server for many other tasks, but make it capable of handling a high load, I'd think of:
Available to webserver RAM:   400Mb
Child's memory size bounded:  10Mb
MaxClients:                   400/10 = 40
StartServers:                 5
MinSpareServers:              5
MaxSpareServers:              10
(These numbers are taken off the top of my head and should not be used as a rule, but rather as examples to show you some possible scenarios. Use this information wisely!)
OK, we've run various benchmarks -- let's summarize the conclusions:
If your scripts are clean and don't leak memory, set this variable to a
number as large as possible (10000?). If you use
Apache::SizeLimit
, you can set this parameter to 0 (equal to infinity). You will want this
parameter to be smaller if your code becomes unshared over the process'
life.
If you keep a small number of servers active most of the time, keep this
number low. Especially if MaxSpareServers
is low, as it'll kill the freshly started servers before they have been utilized at
all (if there is no load). If your service is heavily loaded, make this
number close to
MaxClients
(and keep MaxSpareServers
equal to MaxClients
as well.)
If your server performs other work besides web serving, make this low so the memory of unused children will be freed when there is no big load. If your server's load varies (you get loads in bursts) and you want fast response for all clients at any time, you will want to make it high, so that new children will be respawned in advance and be waiting to handle bursts of requests.
The logic is the same as for MinSpareServers
- low if you need the machine for other tasks, high if it's a dedicated web
host and you want a minimal response delay.
Not too low, so you don't get into a situation where clients are waiting for the server to start serving them (they might wait, but not for too long). Do not set it too high either: if you get a high load and all requests are immediately granted and served, your CPU will have a hard time keeping up, and if the child's size * number of running children is larger than the total available RAM, your server will start swapping (which will slow everything down, in turn making things even slower, until eventually your machine dies). It's important that you take pains to ensure that swapping does not normally happen. Swap space is an emergency pool, not a resource to be used on a consistent basis. If you are low on memory and you badly need it - buy it, memory is amazingly cheap these days.
But based on the test I conducted above, even if you have plenty of memory
like I have (1Gb), increasing MaxClients
sometimes will give you no speedup. The more clients that are running, the more CPU time is required and the fewer CPU time slices each process receives. The response latency (the time to respond to a request) will grow,
so you won't see the expected improvement. The best approach is to find the
minimum requirement for your kind of service and the maximum capability of
your machine. Then start at the minimum and test like I did, successively
raising this parameter until you find the point on the curve of the graph
of the latency or/and throughput where the improvement becomes smaller.
Stop there and use it. Of course when you use these parameters in
production server, you will have the ability to tune them more precisely,
since then you will see the real numbers. Also don't forget that if you add more scripts, or just modify the running ones, the parameters will most probably need to be recalculated, since the processes will grow in size as you compile in more code.
Another popular use of mod_perl is to take advantage of its ability to maintain persistent open database connections. The basic approach is as follows:
# Apache::Registry script
-------------------------
use strict;
use vars qw($dbh);

$dbh ||= SomeDbPackage->connect(...);
Since $dbh
is a global variable for the child, once the child has opened the
connection it will use it over and over again, unless you perform disconnect()
.
Be careful to use different names for the handles if you open connections to different databases!
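A minimal sketch of what is meant by different names (the database names and credentials here are of course made up):

  use strict;
  use DBI ();
  use vars qw($dbh_users $dbh_logs);   # one global handle per database

  $dbh_users ||= DBI->connect('DBI:mysql:users', 'user', 'passwd');
  $dbh_logs  ||= DBI->connect('DBI:mysql:logs',  'user', 'passwd');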
Apache::DBI
allows you to make a persistent database connection. With this module
enabled, every connect()
request to the plain DBI
module will be forwarded to the Apache::DBI
module. This looks to see whether a database handle from a previous
connect()
request has already been opened, and if this handle is still valid using
the ping method. If these two conditions are fulfilled it just returns the
database handle. If there is no appropriate database handle or if the ping
method fails, a new connection is established and the handle is stored for
later re-use. There is no need to delete the disconnect()
statements
from your code. They will not do a thing, as the Apache::DBI
module overloads the disconnect()
method with a NOP. When a child exits there is no explicit disconnect: the
child dies and so does the database connection. You may leave the use DBI;
statement inside the scripts as well.
The usage is simple -- add to httpd.conf
:
PerlModule Apache::DBI
It is important to load this module before any other DBI
,
DBD::*
and ApacheDBI*
modules!
db.pl
------------
use DBI;
use strict;

my $dbh = DBI->connect( 'DBI:mysql:database', 'user', 'password',
                        { AutoCommit => 0 } )
          || die $DBI::errstr;

...rest of the program
If you use DBI
for DB connections, and you use Apache::DBI
to make them persistent, it also allows you to preopen connections to DB
for each child with connect_on_init()
method, thus saving the connection overhead on the very first request of
every child.
use Apache::DBI ();

Apache::DBI->connect_on_init("DBI:mysql:test", "login", "passwd",
                             { RaiseError => 1,
                               PrintError => 0,
                               AutoCommit => 1,
                             }
                            );
This can be used as a simple way to have apache children establish
connections on server startup. This call should be in a startup file
require()d
by PerlRequire
or inside <Perl> section. It will establish a connection when a child is started in
that child process. See the Apache::DBI
manpage to see the requirements for this method.
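Putting the pieces together, a startup file pulled in with PerlRequire might look like this sketch (the path and the connection parameters are placeholders, not a recommendation):

  # httpd.conf:
  #   PerlRequire /usr/local/apache/conf/startup.pl

  # startup.pl
  use strict;
  use Apache::DBI ();

  Apache::DBI->connect_on_init("DBI:mysql:test", "login", "passwd",
                               { RaiseError => 1,
                                 PrintError => 0,
                                 AutoCommit => 1,
                               });
  1;   # a require()d file must return a true value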
You can also benefit from persistent connections by replacing
prepare()
with prepare_cached().
But it can
produce a little overhead (META, why?).
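The change is a one-liner; in this sketch $dbh is the persistent handle from above and the SQL and table are made up:

  # the statement handle is cached inside $dbh, keyed by the SQL string,
  # so later requests in the same child reuse the already prepared statement
  my $sth = $dbh->prepare_cached('SELECT name FROM users WHERE id = ?');
  $sth->execute($id);
  my ($name) = $sth->fetchrow_array;
  $sth->finish;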
Another problem is with timeouts: some databases disconnect the client
after a certain time of inactivity. This problem is known as morning
bug. The ping()
method ensures that this will not happen. Some
DBD
drivers don't have this method, check the Apache::DBI
manpage to see how to write a ping()
method.
Another approach is to change the client's connection timeout. For mysql
users, starting from mysql-3.22.x you can set a wait_timeout
option at mysqld server startup to change the default value. Setting it to
36 hours probably would fix the timeout problem.
As you know local $|=1;
disables the buffering of the currently selected file handle (default is STDOUT
). If you enable it,
ap_rflush()
is called after each print()
, unbuffering Apache's IO.
If you are using a _bad_ style in generating output, which consists of
multiple print()
calls, or you just have too many of them, you will experience a degradation
in performance. The severity depends on the number of the calls you make.
Many old CGIs were written in the style of:
print "<BODY BGCOLOR=\"black\" TEXT=\"white\">"; print "<H1>"; print "Hello"; print "</H1>"; print "<A HREF=\"foo.html\"> foo </A>"; print "</BODY>";
which reveals the following drawbacks: multiple print() calls cause performance degradation with $|=1, and the backslashes make the code less readable and make it harder to format the HTML so that it is easily readable as the CGI's output. The code below solves them all:
print qq{
  <BODY BGCOLOR="black" TEXT="white">
  <H1>
  Hello
  </H1>
  <A HREF="foo.html"> foo </A>
  </BODY>
};
I guess you see the difference. Be careful though, when printing a
<HTML
> tag. The correct way is:
print qq{<HTML>
  <HEAD></HEAD>
  <BODY>
}
If you try the following:
print qq{
  <HTML>
  <HEAD></HEAD>
  <BODY>
}
Some older browsers might not accept the output as HTML, but instead render it as plain text, since they expect the first characters after the
headers and empty line to be <HTML
> and not spaces and/or additional newline and then <HTML
>. Even if it works with your browser, it might not work for others.
Now let's go back to the $|=1
topic. I still disable buffering, for 2 reasons: I use only a few print() calls (by printing out multiline HTML rather than a line per print()) and I want my users to see output immediately. So if I am about to produce the results of a DB query which might take some time to complete, I want users to get some titles ahead of it. This improves the usability of my site. Ask yourself which you like better: getting the output a bit slower, but steadily, from the moment you've pressed the Submit button, or watching the ``falling stars'' for a while and then receiving the whole output at once, even a few milliseconds faster (assuming the client (browser) did not time out in the meantime).
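Here is a minimal Apache::Registry-style sketch of that idea (run_slow_db_query() is a made-up stand-in for the real database work):

  use strict;

  local $| = 1;   # unbuffer the output so each print reaches the client at once

  print "Content-type: text/html\r\n\r\n";

  # the user sees the title while the slow part is still running
  print qq{<HTML>
    <HEAD><TITLE>Search results</TITLE></HEAD>
    <BODY>
    <H1>Search results</H1>
  };

  my @rows = run_slow_db_query();

  print map { "<P>$_</P>\n" } @rows;
  print qq{</BODY></HTML>};

  sub run_slow_db_query { sleep 3; return ('first row', 'second row') }   # stub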
Conclusion: Do not blindly follow suggestions, but think what is best for you in every given case.
One of the important issues in improving performance is reducing memory usage - the less memory each server process uses, the more server processes you can start, and thus the better performance you get (from the user's point of view: response speed).
See Global Variables
Profiling helps you determine which subroutines or just snippets of code take the longest time to execute and which subroutines are called most often. You will probably want to optimize those to make the code more efficient.
It is possible to profile code running under mod_perl with the
Devel::DProf
module, available on CPAN. However, you must have apache version 1.3b3 or
higher and the PerlChildExitHandler
enabled (during the httpd build process). When the server is started,
Devel::DProf
installs an END
block to write the tmon.out
file. This block will be called at the server shutdown. Here is how to
start and stop a server with the profiler enabled:
% setenv PERL5OPT -d:DProf
% httpd -X -d `pwd` &
... make some requests to the server here ...
% kill `cat logs/httpd.pid`
% unsetenv PERL5OPT
% dprofpp
The Devel::DProf
package is a Perl code profiler. It will collect information on the
execution time of a Perl script and of the subs in that script (remember
that print()
and map()
are just like any other subroutines you write, but they come bundled
with Perl!)
Another approach is to use Apache::DProf
, which hooks
Devel::DProf
into mod_perl. The Apache::DProf
module will run a
Devel::DProf
profiler inside each child server and write the
tmon.out
file in the directory $ServerRoot/logs/dprof/$$
when the child is shut down (where $$
is the process id of the child). All it takes is to add to httpd.conf
:
PerlModule Apache::DProf
Remember that any PerlHandler that was pulled in before
Apache::DProf
in the httpd.conf
or startup.pl, would not have its code debugging info inserted. To run dprofpp
, chdir to
$ServerRoot/logs/dprof/$$
and run:
% dprofpp
Which approach is better?
use CGI;
my $q = new CGI;
print $q->param('x');
versus
use CGI qw(:standard);
print param('x');
There is no performance benefit to using the object calls rather than the function calls, but there is a real memory hit when you import all of CGI.pm
's function calls into your process memory. This can be significant,
particularly when there are many child daemons.
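If you prefer the function interface but want to limit the memory hit, CGI.pm also lets you import just the functions you actually use, for example:

  use CGI qw(param header);

  print header();
  print param('x');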
I strongly endorse Apache::Request (libapreq) - Generic Apache Request Library. Its guts are all written in C, giving it a significant memory and performance benefit.
See Apache::GzipChain - compress HTML (or anything) in the OutputChain
There is no such a thing as a single RIGHT strategy in web server business, though there are many wrong ones. Never believe a person who says: "Do it this way, this is the best!". As the old saying goes: "Trust but verify". There are too many technologies out there to choose from, and it would take an enormous investment of time and money to try to validate each one before deciding which is the best choice for your situation. Keeping this idea in mind, I will present some different combinations of mod_perl and other technologies or just standalone mod_perl. I'll describe how these things work together, and offer my opinions on the pros and cons of each, the relative degree of difficulty in installing and maintaining them, some hints on approaches that should be used and things to avoid.
To be clear, I will not address all technologies and tools, but limit this discussion to those complementing mod_perl.
Please let me stress it again: DO NOT blindly copy someone's setup and hope for a good result. Choose what is best for your situation -- it might take some effort to find it out.
There are several different ways to build, configure and deploy your mod_perl enabled server. Some of them are:
Having one binary and one config file (one big binary for mod_perl).
Having two binaries and two config files (one big binary for mod_perl and one small for static objects like images.)
Having one DSO-style binary, a mod_perl loadable object and two config files. (Dynamic linking lets you compile once and have both a big and a small binary in memory, BUT you have to deal with a young solution that has weak documentation, is still subject to change, and is rather more complex.)
Any of the above plus a reverse proxy server in http accelerator mode.
If you are a newbie, I would recommend that you start with the first option and work on getting your feet wet with apache and mod_perl. Later, you can decide whether to move to the second one which allows better tuning at the expense of more complicated administration, or to the third option -- the more state-of-the-art-yet-suspiciously-new DSO system, or to the fourth option which gives you even more power.
The first option will kill your production site if you serve a lot of static data with ~2-12 MB webserver processes. On the other hand, while testing you will have no other server interaction to mask or add to your errors.
The second option allows you to seriously tune the two servers for maximum performance. On the other hand you have to deal with proxying or fancy site design to keep the two servers in synchronization. In this configuration, you also need to choose between running the two servers on multiple ports, multiple IPs, etc... This adds the burden of administrating more than one server.
The third option (DSO) -- as mentioned above -- means playing with the
bleeding edge. In addition mod_so
(the DSO module) adds size and complexity to your binaries. With DSO,
modules can be added and removed without recompiling the server, and
modules are even shared among multiple servers. Again, it is bleeding edge
and still somewhat platform specific, but your mileage may vary. See mod_perl server as DSO.
The fourth option (proxy in http accelerator mode), once correctly configured and tuned, improves the performance of any of the above three options by caching and buffering page results.
The rest of this chapter discusses the pros and the cons of each of these presented configurations. Real World Scenarios Implementation describes the implementation techniques of these schemes.
The first approach is to implement a straightforward mod_perl server. Just take your plain apache server and add mod_perl, like you add any other apache module. You continue to run it at the port it was running before. You probably want to try this before you proceed to more sophisticated and complex techniques.
The advantages:
Simplicity. You just follow the installation instructions, configure it, restart the server and you are done.
No network changes. You do not have to worry about using additional ports as we will see later.
Speed. You get a very fast server, you see an enormous speedup from the first moment you start to use it.
The disadvantages:
The process size of a mod_perl-enabled Apache server is huge (starting from 4Mb at startup and growing to 10Mb and more, depending on how you use it) compared to the typical plain Apache. Of course if memory sharing is in place -- RAM requirements will be smaller.
You probably have a few tens of children processes. The additional memory requirements add up in direct relation to the number of children processes. Your memory demands are growing by an order of magnitude, but this is the price you pay for the additional performance boost of mod_perl. With memory prices so cheap nowadays, the additional cost is low -- especially when you consider the dramatic performance boost mod_perl gives to your services with every 100Mb of RAM you add.
While you will be happy to have these monster processes serving your scripts with monster speed, you should be very worried about having them serve static objects such as images and html files. Each static request served by a mod_perl-enabled server means another large process running, competing for system resources such as memory and CPU cycles. The real overhead depends on the static object request rate. Remember that if your mod_perl code produces HTML which includes images, each one will turn into another static object request. Having another plain webserver to serve the static objects removes this unpleasant overhead. Having a proxy server as a front end, caching the static objects and freeing the mod_perl processes from this burden, is another solution. We will discuss both below.
Another drawback of this approach is that when serving output to a client with a slow connection, the huge mod_perl-enabled server process (with all of its system resources) will be tied up until the response is completely written to the client. While it might take a few milliseconds for your script to complete the request, there is a chance it will be still busy for some number of seconds or even minutes if the request is from a slow connection client. As in the previous drawback, a proxy solution can solve this problem. More on proxies later.
Proxying dynamic content is not going to help much if all the clients are on a fast local net (for example, if you are administering an Intranet.) On the contrary, it can decrease performance. Still, remember that some of your Intranet users might work from home through the slow modem links.
If you are new to mod_perl, this is probably the best way to get yourself started.
And of course, if your site is serving only mod_perl scripts (close to zero static objects, like images), this might be the perfect choice for you!
For implementation notes see : One Plain and One mod_perl enabled Apache Servers
As I have mentioned before, when running scripts under mod_perl, you will notice that the httpd processes consume a huge amount of virtual memory, from 5Mb to 15Mb and even more. That is the price you pay for the enormous speed improvements under mod_perl. (Again -- shared memory keeps the real memory that is being used much smaller :)
Using these large processes to serve static objects like images and html documents is overkill. A better approach is to run two servers: a very light, plain apache server to serve static objects and a heavier mod_perl-enabled apache server to serve requests for dynamic (generated) objects (aka CGI).
From here on, I will refer to these two servers as httpd_docs (vanilla apache) and httpd_perl (mod_perl enabled apache).
The advantages:
The heavy mod_perl processes serve only dynamic requests, which allows the deployment of fewer of these large servers.
MaxClients
, MaxRequestsPerChild
and related parameters can now be optimally tuned for both httpd_docs
and httpd_perl
servers, something we could not do before. This allows us to fine tune the
memory usage and get a better server performance.
Now we can run many lightweight httpd_docs
servers and just a few heavy httpd_perl
servers.
An important note: when a user browses static pages and the base URL in the Location window points to the static server, for example
http://www.nowhere.com/index.html
-- all relative URLs (e.g. <A
HREF="/main/download.html"
>) are being served by the light plain apache server. But this is not
the case with dynamically generated pages. For example when the base URL in
the Location window points to the dynamic server -- (e.g. http://www.nowhere.com:8080/perl/index.pl
) all relative URLs in the dynamically generated HTML will be served by the
heavy mod_perl processes. You must use fully qualified URLs and not
relative ones! http://www.nowhere.com/icons/arrow.gif
is a full URL, while
/icons/arrow.gif
is a relative one. Using <BASE
HREF="http://www.nowhere.com/"
> in the generated HTML is another way to handle this problem. Also the httpd_perl
server could rewrite the requests back to httpd_docs
(much slower), but it still requires the attention of the heavy servers. This is
not an issue if you hide the internal port implementations, so client sees
only one server running on port 80
. (See Publishing port numbers different from 80)
The disadvantages:
An administration overhead.
A need for two different sets of configuration, log and other files. We
need a special directory layout to manage these. While some directories can
be shared between the two servers (like the include
directory, containing the apache include files -- assuming that both are
built from the same source distribution), most of them should be separated
and the configuration files updated to reflect the changes.
A need for two sets of controlling scripts (startup/shutdown) and watchdogs.
If you are processing log files, now you probably will have to merge the two separate log files into one before processing them.
We still have the problem of a mod_perl process spending its precious time serving slow clients, when the processing portion of the request was completed a long time ago, exactly as in the one server approach. Deploying a proxy solves this, and will be covered in the next sections.
As with only one server approach, this is not a major disadvantage if you are on a fast local Intranet. It is likely that you do not want a buffering server in this case.
Before you go on with this solution you really want to look at the Adding a Proxy Server in http Accelerator Mode section.
For implementation notes see : One Plain and One mod_perl enabled Apache Servers
If the only requirement from the light server is for it to serve static
objects, then you can get away with non-apache servers having an even
smaller memory footprint. thttpd
has been reported to be about 5 times faster than apache (especially under
a heavy load), since it is very simple and uses almost no memory (260k) and
does not spawn child processes.
Meta: Hey, No personal experience here, only rumours. Please let me know if I have missed some pros/cons here. Thanks!
The Advantages:
All the advantages of the 2 servers scenario.
More memory saving. Apache is about 4 times bigger than thttpd. If you spawn 30 children you use about 30M of memory, while thttpd uses only 260k - 100 times less! You could use the saved 30M to run more mod_perl servers.
Note that this is not true if your OS supports memory sharing and you configured apache to use it (this is a DSO approach; there is no memory sharing if apache modules are statically compiled into httpd). If you do allow memory sharing, 30 light apache servers ought to use only about 3-4Mb, because most of it will be shared. If this is the case, the savings with thttpd are much smaller.
Reported to be about 5 times faster than plain apache at serving static objects.
The Disadvantages:
Lacks some of apache's features, like access control, error redirection, customizable log file formats, and so on.
At the beginning there were 2 servers: one - plain apache server, which was very light, and configured to serve static objects, the other -- mod_perl enabled,
which was very heavy and aimed to serve mod_perl scripts. We named them: httpd_docs
and httpd_perl
appropriately. The two servers coexisted at the same IP(DNS)
by listening to different ports: 80 -- for httpd_docs
(e.g. http://www.nowhere.com/images/test.gif
) and 8080 -- for
httpd_perl
(e.g. http://www.nowhere.com:8080/perl/test.pl
). Note that I did not write http://www.nowhere.com:80 for the
first example, since port 80 is a default http port. (Later on, I will be
moving the
httpd_docs
server to port 81.)
Now I am going to convince you that you want to use a proxy server (in the http accelerator mode). The advantages are:
Allow serving of static objects from the proxy's cache (objects that
previously were entirely served by the httpd_docs
server).
You get less I/O activity reading static objects from the disk (proxy serves the most ``popular'' objects from the RAM memory - of course you benefit more if you allow the proxy server to consume more RAM). Since you do not wait for the I/O to be completed you are able to serve the static objects much faster.
The proxy server acts as a sort of output buffer for the dynamic content. The mod_perl server sends the entire response to the proxy and is then free to deal with other requests. The proxy server is responsible for sending the response to the browser. So if the transfer is over a slow link, the mod_perl server is not waiting around for the data to move.
Using numbers is always more convincing :) Let's take a user connected to your site with a 28.8 kbps (bps == bits/sec) modem. It means that the speed of the user's link is 28.8/8 = 3.6 kbytes/sec. I assume an average generated HTML page to be 10kb (kb == kilobytes) and an average script to generate this output in 0.5 secs. How long will the server wait before the user has received the whole response? A simple calculation reveals pretty scary numbers - it will have to wait for about another 3 secs (10kb/3.6kb), when it could serve another 6 (3/0.5) dynamic requests in this time. This very simple example shows us that with a buffering front end we would need only one sixth the number of children running, which means you would need only one sixth of the memory (not quite true, because some parts of the code are shared). But you know that nowadays scripts return pages which are sometimes blown up with javascript code and similar, which makes them 100kb in size, with a download time of... (This calculation is left to you as an exercise :)
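The same back-of-the-envelope arithmetic in a few lines of perl, so you can plug in your own page sizes and link speeds (all the inputs below are just the assumptions from the example):

  #!/usr/bin/perl -w
  use strict;

  my $page_size   = 10;    # kbytes in an average generated page
  my $link_speed  = 3.6;   # kbytes/sec of the client's link (28.8 kbps modem)
  my $script_time = 0.5;   # secs the script needs to generate the page

  my $feed_time = $page_size / $link_speed;    # time spent feeding the client
  my $lost      = $feed_time / $script_time;   # requests we could have served

  printf "feeding the client takes %.1f secs -- about %.0f requests lost\n",
         $feed_time, $lost;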
To make your estimation of download times even worse, let me remind you that many users like to open many browser windows and do many things at once (download files and browse heavy sites). So the 3.6kb/sec speed we assumed before may often be 5-10 times slower.
Also we are going to hide the details of the server's implementation. Users will never see ports in the URLs (more on that topic later). And you can have a few boxes serving the requests, and only one serving as a front end, which spreads the jobs between the servers in the way you configured it to. So you can actually take one server down for an upgrade, and the end user will never notice, because the front end server will dispatch the jobs to the other servers. (Of course this is a pretty big issue, and it will not be discussed in the scope of this document.)
For security reasons, using any httpd accelerator (or a proxy in httpd accelerator mode) is essential because you do not let your internal server get directly attacked by arbitrary packets from whomever. The httpd accelerator and internal server communicate in expected HTTP requests. This allows for only your public ``bastion'' accelerating www server to get hosed in a successful attack, while leaving your internal data safe.
The disadvantages are:
Of course there are drawbacks. Luckily, these are not functionality drawbacks, but more of an administration hassle. You add another daemon to worry about, and while proxies are generally stable, you have to make sure to prepare proper startup and shutdown scripts, which are run at boot and reboot as appropriate. You may also want a watchdog script run from the crontab.
Proxy servers can be configured to be light or heavy, the admin must decide what gives the highest performance for his application. A proxy server like squid is light in the concept of having only one process serving all requests. But it can appear pretty heavy when it loads objects into memory for faster service.
Have I succeeded in convincing you that you want the proxy server?
If you are on a local area network (LAN), then the big benefit of the proxy buffering the output and feeding a slow client is gone. You are probably better off sticking with a straight mod_perl server in this case.
As of this writing, two proxy implementations are known to be used in tandem with mod_perl - the squid proxy server and mod_proxy, which is a part of the apache server. Let's compare the two of them.
The Advantages:
Caching of static objects. So these are being served much faster assuming that your cache size is big enough to keep the most requested objects in the cache.
Buffering of dynamic content, by taking the burden of returning the content generated by mod_perl servers to slow clients, thus freeing mod_perl servers from waiting for the slow clients to download the data. Freed servers immediately switch to serve other requests, thus your number of required servers goes dramatically down.
Non-linear URL space / server setup. You can use Squid to play some tricks with the URL space and/or domain based virtual server support.
The Disadvantages:
Proxying dynamic content is not going to help much if all the clients are on a fast local net. Also, a message on the squid mailing list implied that squid only buffers in 16k chunks, so it would not allow a mod_perl process to complete immediately if the output is larger.
Speed. Squid is not very fast today when compared to the plain file based web servers available. Only if you are using a lot of dynamic features such as mod_perl or similar is speed a reason to use Squid, and then only if the application and server are designed with caching in mind.
Memory usage. Squid uses quite a bit of memory.
HTTP protocol level. Squid is pretty much a HTTP/1.0
server, which seriously limits the deployment of HTTP/1.1
features.
HTTP headers, dates and freshness. The squid server might give out ``old'' pages, confusing downstream/client caches. Also chances are that you will be giving out stale pages. (You update some documents on the site, but squid will still serve the old ones.)
Stability. Compared to plain web servers Squid is not the most stable.
The presented pros and cons lead to the idea that you probably want squid mostly for its dynamic content buffering features, and only if your server serves mostly dynamic requests. So in this situation it is better to have a plain apache server serving the static objects, and squid proxying only the mod_perl enabled server. At least when performance is the goal.
For implementation details see: Running 1 webserver and squid in httpd accelerator mode and Running 2 webservers and squid in httpd accelerator mode
I do not think the difference in speed between apache's mod_proxy and squid is relevant for most sites, since the real value of what they do is buffering for slow client connections. However squid runs as a single process and probably consumes fewer system resources. The trade-off is that mod_rewrite is easy to use if you want to spread parts of the site across different back end servers, and mod_proxy knows how to fix up redirects containing the back-end server's idea of the location. With squid you can run a redirector process to proxy to more than one back end, but there is a problem in fixing redirects in a way that keeps the client's view of both server names and port numbers in all cases. The difficult case being where you have DNS aliases that map to the same IP address for an alias and you want the redirect to use port 80 (when the server is really on a different port) but you want it to keep the specific name the browser sent so it does not change in the client's Location window.
The Advantages:
No additional server is needed. We keep the one plain plus one mod_perl enabled apache server setup. All you need is to enable the
mod_proxy
in the httpd_docs
server and add a few lines to
httpd.conf
file.
ProxyPass
and ProxyPassReverse
directives allow you to hide the internal redirects, so if http://nowhere.com/modperl/
is actually
http://localhost:81/modperl/
, it will be absolutely transparent to the user. ProxyPass redirects the request to the mod_perl server, and when it gets the response, ProxyPassReverse rewrites the URL back to the original one, e.g.:
ProxyPass        /modperl/ http://localhost:81/modperl/
ProxyPassReverse /modperl/ http://localhost:81/modperl/
It does mod_perl output buffering like squid does. See the Using mod_proxy notes for more details.
It even does caching. You have to produce correct Content-Length
,
Last-Modified
and Expires
http headers for it to work. If some dynamic content is not to change
constantly, you can dramatically increase performance by caching it with ProxyPass
.
ProxyPass
happens before the authentication phase, so you do not have to worry about
authenticating twice.
Apache is able to accel https (secure) requests completely, while also doing http accel. (with squid you have to use an external redirection program for that).
The latest Apache proxy accel mode (from apache 1.3.6) is reported to be very stable.
The Disadvantages:
Users reported that it might be a bit slow, but the latest version is fast enough. (How fast is enough? :)
For implementation see Using mod_proxy.
The installation is very simple (this example shows installation on the Linux OS):
% cd /usr/src
% lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
% lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
% tar zvxf apache_x.x.x.tar.gz
% tar zvxf mod_perl-x.xx.tar.gz
% cd mod_perl-x.xx
% perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
  DO_HTTPD=1 USE_APACI=1 PERL_MARK_WHERE=1 EVERYTHING=1
% make && make test && make install
% cd ../apache_x.x.x
% make install
That's all!
Notes: Replace x.x.x with the real version numbers of mod_perl and apache.
gnu tar
uncompresses as well (with z
flag).
First download the sources of both packages, e.g. you can use
lwp-download
utility to do it. lwp-download
is a part of the LWP (or libwww
) package. You will need to have it installed in order for mod_perl's make test to pass. Once you have installed this package (unless it was already installed), lwp-download will be available to you as well.
% lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
% lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
Extract both sources. Usually I open all the sources in /usr/src/
, but your mileage may vary. So move the sources there and chdir to the directory you want to put the sources in. The Gnu tar utility knows how to uncompress too (with the z flag); if you have a non-gnu tar utility, it will be incapable of decompressing, so you would do it in two
steps: first uncompressing the packages with gzip -d apache_x.xx.tar.gz
and gzip -d mod_perl-x.xx.tar.gz
, second un-tarring them with tar
xvf apache_x.xx.tar
and tar xvf mod_perl-x.xx.tar
.
% cd /usr/src
% tar zvxf apache_x.x.x.tar.gz
% tar zvxf mod_perl-x.xx.tar.gz
chdir
to the mod_perl source directory:
% cd mod_perl-x.xx
Now build the Makefile. For basic use and a first-time installation the parameters in the example below are the only ones you will need. APACHE_SRC tells where the apache src directory is. If you have followed my suggestion and have extracted both sources under the same directory (/usr/src), do:
% perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
  DO_HTTPD=1 USE_APACI=1 PERL_MARK_WHERE=1 EVERYTHING=1
There are many additional parameters. You can find some of them in the configuration-dedicated and other sections. While running perl Makefile.PL ... the process will check for prerequisites and tell you if something is missing. If you are missing some of the perl packages or other software, you will have to install these before you proceed.
Now we make the project (by building the mod_perl extension and calling make in the apache source directory to build httpd),
test it (by running various tests) and install the mod_perl modules.
% make && make test && make install
Note that if make fails, neither make test nor make install will be executed. If make test fails, make install will not be executed.
Now change to apache source directory and run make install
to install apache's headers, default configuration files, to build apache
directory tree and to put the httpd
there.
% cd ../apache_x.x.x
% make install
When you execute the above command, apache installation process will tell
you how to start a freshly built webserver (the path of the
apachectl
, more about it later) and where the configuration files are. Remember (or
even better write down) both, since you will need this information very
soon. On my machine the two important paths are:
/usr/local/apache/bin/apachectl
/usr/local/apache/conf/httpd.conf
Now the build and the installation processes are completed. Just configure httpd.conf
and start the webserver.
A basic configuration is simple. First configure apache as you always do (set Port, User, Group, a correct ErrorLog and other file paths, etc.), start the server and make sure it works. One
of the ways to start and stop the server is to use
apachectl
utility:
% /usr/local/apache/bin/apachectl start
% /usr/local/apache/bin/apachectl stop
Shut the server down, open the httpd.conf
in your favorite editor and scroll to the end of the file, where we will
add the mod_perl configuration directives (of course you can place them
anywhere in the file).
Add the following configuration directives:
Alias /perl/ /home/httpd/perl/
This assumes that you put all the scripts that should be executed by the mod_perl enabled server under the /home/httpd/perl/ directory.
PerlModule Apache::Registry

<Location /perl>
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
  PerlSendHeader On
  allow from all
</Location>
Now put a test script into /home/httpd/perl/
directory:
test.pl
-------
#!/usr/bin/perl -w
use strict;
print "Content-type: text/html\r\n\r\n";
print "It worked!!!\n";
-------
Make it executable and readable by the server. If your server is running as
user nobody
(hint: look for User
directive in httpd.conf
file), do the following:
% chown nobody /home/httpd/perl/test.pl
% chmod u+rx   /home/httpd/perl/test.pl
Test that the script is running from the command line, by executing it:
% /home/httpd/perl/test.pl
You should see:
Content-type: text/html

It worked!!!
Now it is time to test our mod_perl server. Assuming that your config
file includes Port 80
, go to your favorite Netscape browser and fetch the following URL (after
you have started the server):
http://localhost/perl/test.pl
Make sure that you have a loop-back device configured, if not -- use the real server name for this test, for example:
http://www.nowhere.com/perl/test.pl
You should see:
It worked!!!
If something went wrong, go through the installation process again, and
make sure you didn't make a mistake. If that doesn't help, read the INSTALL
pod document (perldoc INSTALL
) in the mod_perl distribution directory.
Now copy some of your perl/CGI scripts into the /home/httpd/perl/ directory and see them working much, much faster from the newly configured base URL (/perl/). Some of your scripts will not work out of the box and will demand some minor tweaking or a major rewrite to work properly with the mod_perl enabled server. Chances are that if you do not practice sloppy programming techniques, the scripts will work without any modifications at all.
The above setup is very basic, it will help you to have a mod_perl enabled server running and to get a good feeling from watching your previously slow CGIs now flying.
As with perl, you can start to benefit from mod_perl from the very first moment you try it. When you become more familiar with mod_perl you will want to start writing apache handlers and deploy more of the mod_perl power.
Since we are going to run two apache servers we will need two different sets of configuration, log and other files. We need a special directory layout. While some of the directories can be shared between the two servers (assuming that both are built from the same source distribution), others should be separated. From now on I will refer to these two servers as httpd_docs (vanilla Apache) and httpd_perl (Apache/mod_perl).
For this illustration, we will use /usr/local
as our root
directory. The Apache installation directories will be stored under this
root (/usr/local/bin
, /usr/local/etc
and etc...)
First let's prepare the sources. We will assume that all the sources go
into /usr/src
dir. It is better to use two separate copies of the apache sources, since you will probably want to tune each apache version separately and to do modifications and recompilations as time goes by. Having two
independent source trees will prove helpful, unless you use DSO
, which is covered later in this section.
Make two subdirectories:
% mkdir /usr/src/httpd_docs
% mkdir /usr/src/httpd_perl
Put the Apache sources into a /usr/src/httpd_docs
directory:
% cd /usr/src/httpd_docs
% gzip -dc /tmp/apache_x.x.x.tar.gz | tar xvf -
If you have a gnu tar:
% tar xvzf /tmp/apache_x.x.x.tar.gz
Replace the /tmp directory with the path to the downloaded file and
x.x.x
with the version of the server you have.
% cd /usr/src/httpd_docs
% ls -l
drwxr-xr-x  8 stas  stas  2048 Apr 29 17:38 apache_x.x.x/
Now we will prepare the httpd_perl
server sources:
% cd /usr/src/httpd_perl
% gzip -dc /tmp/apache_x.x.x.tar.gz  | tar xvf -
% gzip -dc /tmp/modperl-x.xx.tar.gz | tar xvf -
% ls -l
drwxr-xr-x  8 stas  stas  2048 Apr 29 17:38 apache_x.x.x/
drwxr-xr-x  8 stas  stas  2048 Apr 29 17:38 modperl-x.xx/
Time to decide on the desired directory structure layout (where the apache files go):
ROOT = /usr/local
The two servers can share the following directories (so we will not duplicate data):
/usr/local/bin/
/usr/local/lib/
/usr/local/include/
/usr/local/man/
/usr/local/share/
Important: we assume that both servers are built from the same Apache source version.
Servers store their specific files either in httpd_docs
or
httpd_perl
sub-directories:
/usr/local/etc/httpd_docs/
               httpd_perl/
/usr/local/sbin/httpd_docs/
                httpd_perl/
/usr/local/var/httpd_docs/logs/
                          proxy/
                          run/
               httpd_perl/logs/
                          proxy/
                          run/
After the compilation and installation of both servers are complete, you will need to configure them. To make things clear before we proceed into the details: you should configure /usr/local/etc/httpd_docs/httpd.conf as a plain apache, with the Port directive set to 80 for example, and /usr/local/etc/httpd_perl/httpd.conf for the mod_perl server, whose Port of course should be different from the one the httpd_docs server listens to (e.g. 8080). The port numbers issue will be discussed
later.
The next step is to configure and compile the sources: Below are the procedures to compile both servers taking into account the directory layout I have just suggested to use.
Let's proceed with installation. I will use x.x.x instead of real version numbers so this document will never become obsolete :).
% cd /usr/src/httpd_docs/apache_x.x.x
% make clean
% env CC=gcc \
  ./configure --prefix=/usr/local \
    --sbindir=/usr/local/sbin/httpd_docs \
    --sysconfdir=/usr/local/etc/httpd_docs \
    --localstatedir=/usr/local/var/httpd_docs \
    --runtimedir=/usr/local/var/httpd_docs/run \
    --logfiledir=/usr/local/var/httpd_docs/logs \
    --proxycachedir=/usr/local/var/httpd_docs/proxy
If you need some other modules, like mod_rewrite and mod_include (SSI), add them here as well:
--enable-module=include --enable-module=rewrite
Note: on the AIX OS, gcc compiles an httpd that is 100K+ smaller than the one cc produces. Remove the env CC=gcc part if you want to use the default compiler. If you do want to use it and you are a (ba)?sh user, you will not need the env prefix; t?csh users will have to keep it in.
Note: add --layout
to see the resulting directories' layout without actually running the
configuration process.
% make
% make install
Rename httpd
to httpd_docs
% mv /usr/local/sbin/httpd_docs/httpd \
     /usr/local/sbin/httpd_docs/httpd_docs
Now update the apachectl utility to point to the renamed httpd, using your favorite text editor or perl:
% perl -p -i -e 's|httpd_docs/httpd|httpd_docs/httpd_docs|' \
  /usr/local/sbin/httpd_docs/apachectl
Before you start to configure the mod_perl sources, you should be aware
that there are a few Perl modules that have to be installed before building
mod_perl. You will be alerted if any required modules are missing when you
run the perl Makefile.PL
command line below. If you discover that some are missing, pick them from
your nearest CPAN repository (if you do not know what that is, visit http://www.perl.com/CPAN ) or run
the CPAN
interactive shell via the command line perl -MCPAN -e shell
.
Make sure the sources are clean:
% cd /usr/src/httpd_perl/apache_x.x.x
% make clean
% cd /usr/src/httpd_perl/mod_perl-x.xx
% make clean
It is important to make clean since some of the versions are not binary compatible (e.g apache 1.3.3 vs 1.3.4) so any ``third-party'' C modules need to be re-compiled against the latest header files.
Here I did not find a way to compile with gcc (my perl was compiled with cc, so we have to compile with the same compiler!):
% cd /usr/src/httpd_perl/mod_perl-x.xx
% /usr/local/bin/perl Makefile.PL \
  APACHE_PREFIX=/usr/local/ \
  APACHE_SRC=../apache_x.x.x/src \
  DO_HTTPD=1 \
  USE_APACI=1 \
  PERL_MARK_WHERE=1 \
  PERL_STACKED_HANDLERS=1 \
  ALL_HOOKS=1 \
  APACI_ARGS=--sbindir=/usr/local/sbin/httpd_perl, \
    --sysconfdir=/usr/local/etc/httpd_perl, \
    --localstatedir=/usr/local/var/httpd_perl, \
    --runtimedir=/usr/local/var/httpd_perl/run, \
    --logfiledir=/usr/local/var/httpd_perl/logs, \
    --proxycachedir=/usr/local/var/httpd_perl/proxy
Notice that all APACI_ARGS
(above) must be passed as one long line if you work with t?csh
!!! However it works correctly the way it is shown above with (ba)?sh
(by breaking the long lines with '\
'). If you work with t?csh
it does not work, since t?csh
passes APACI_ARGS
arguments to ./configure
by keeping the new lines untouched, but stripping the original '\
', thus breaking the configuration process.
As with httpd_docs
you might need other modules like
mod_rewrite
, so add them here:
--enable-module=rewrite
Note: PERL_STACKED_HANDLERS=1
is needed for Apache::DBI
Now, build, test and install the httpd_perl
.
% make && make test && make install
Note: apache puts a stripped version of httpd
at
/usr/local/sbin/httpd_perl/httpd
. The original version which includes debugging symbols (if you need to run
a debugger on this executable) is located at
/usr/src/httpd_perl/apache_x.x.x/src/httpd
.
Note: You may have noticed that we did not run make install
in the apache's source directory. When USE_APACI
is enabled,
APACHE_PREFIX
will specify the --prefix
option for apache's
configure
utility, specifying the installation path for apache. When this option is
used, mod_perl's make install
will also
make install
on the apache side, installing the httpd binary, support tools, along with
the configuration, log and document trees.
If make test
fails, look into t/logs
and see what is in there. Also see make test fails.
While doing perl Makefile.PL ...
mod_perl might complain, warning you about a missing libgdbm. Users reported that it is actually crucial, and you must have it in order
to successfully complete the mod_perl building process.
Now rename the httpd
to httpd_perl
:
% mv /usr/local/sbin/httpd_perl/httpd \
     /usr/local/sbin/httpd_perl/httpd_perl
Update the apachectl utility to point to renamed httpd name:
% perl -p -i -e 's|httpd_perl/httpd|httpd_perl/httpd_perl|' \
  /usr/local/sbin/httpd_perl/apachectl
Now that we have completed the build process, the last stage before running the servers is to configure them.
Configuring the httpd_docs
server is a very easy task. Open
/usr/local/etc/httpd_docs/httpd.conf
into your favorite editor (starting from version 1.3.4 of Apache - there is
only one file to edit). And configure it as you always do. Make sure you
configure the log files and other paths according to the directory layout
we decided to use.
Start the server with:
/usr/local/sbin/httpd_docs/apachectl start
Here we will make a basic configuration of the httpd_perl
server. We edit the /usr/local/etc/httpd_perl/httpd.conf
file. As with
httpd_docs
server configuration, make sure that ErrorLog
and other file's location directives are set to point to the right places,
according to the chosen directory layout.
The first thing to do is to set a Port
directive - it should be different from 80
since we cannot bind 2 servers to use the same port number on the same
machine. Here we will use 8080
. Some developers use port 81
, but you can bind to it only if you have root permissions. If you are running on a multiuser machine, there is a chance that someone already uses that port, or will start using it in the future, which as you understand might cause a collision. If you are the only user on your machine, you can basically pick any unused port number. Choosing a port number is a controversial topic, since many organizations use firewalls which may block some of the ports or allow only well-known ones. From my experience the most used port
numbers are: 80
, 81
, 8000
and 8080
. Personally, I prefer the port 8080
. Of course with 2 server scenario you can hide the nonstandard port number
from firewalls and users, by either using the mod_proxy's ProxyPass
or proxy server like squid.
For more details see Publishing port numbers different from 80 , Running 1 webserver and squid in httpd accelerator mode, Running 2 webservers and squid in httpd accelerator mode and Using mod_proxy.
Now we proceed to mod_perl specific directives. A good idea will be to add
them all at the end of the httpd.conf
, since you are going to fiddle a lot with them at the beginning.
First, you need to specify the location where all mod_perl scripts will be located.
Add the following configuration directive:
# mod_perl scripts will be called from
Alias /perl/ /usr/local/myproject/perl/
From now on, all requests starting with /perl
will be executed under mod_perl
and will be mapped to the files in
/usr/local/myproject/perl/
.
Now we should configure the /perl
location.
PerlModule Apache::Registry
<Location /perl>
  #AllowOverride None
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
  allow from all
  PerlSendHeader On
</Location>
This configuration causes all scripts that are called with a /perl
path prefix to be executed under the Apache::Registry
module and as CGI scripts (hence the ExecCGI option; if you omit it the script will be sent to the user's browser as plain text or will possibly trigger a 'Save-As' window). The Apache::Registry module lets you run almost unaltered CGI/perl scripts under mod_perl. The PerlModule directive is the equivalent of perl's require()
. We load the
Apache::Registry
module before we use it in the PerlHandler
in the Location
configuration.
PerlSendHeader On
tells the server to send an HTTP header to the browser on every script
invocation. You will want to turn this off for nph (non-parsed-headers)
scripts.
This is only a very basic configuration. Server Configuration section covers the rest of the details.
Now start the server with:
/usr/local/sbin/httpd_perl/apachectl start
While I have detailed the mod_perl server installation, you are on your own
with installing the squid server (See Getting Helped for more details). I run linux, so I downloaded the rpm package, installed
it, configured the /etc/squid/squid.conf
, fired off the server and was all set. Basically once you have the squid
installed, you just need to modify the default squid.conf
the way I will explain below, then you are ready to run it.
First, let's understand what we have in hand and what we want from squid. We have httpd_docs and httpd_perl servers listening on ports 81 and 8080 respectively (we have to move the httpd_docs server to port 81, since port 80 will be taken over by squid). Both reside on the same machine as squid. We want squid to listen on port 80, forwarding static object requests to the port the httpd_docs server listens to, and dynamic requests to httpd_perl's port. Both servers return the data to the proxy server (unless it is already cached in squid), so the user never sees the other ports and never knows that there might be more than one server running. The proxy server makes all the magic behind it transparent to the user. Do not confuse this with mod_rewrite, where a server redirects the request somewhere according to the rules and forgets about it. The described functionality is known as httpd accelerator mode in proxy dialect.
You should understand that squid can be used as a straightforward proxy
server, generally used at companies and ISPs to cut down the incoming
traffic by caching the most popular requests. However we want to run it in
the httpd accelerator mode
. Two directives:
httpd_accel_host
and httpd_accel_port
enable this mode. We will see more details in a few seconds. If you are
currently using the squid in the regular proxy mode, you can extend its
functionality by running both modes concurrently. To accomplish this, you
extend the existent squid configuration with httpd accelerator mode
's related directives or you just create one from scratch.
As stated before, squid listens now to the port 80, we have to move the httpd_docs server to listen for example to the port 81 (your mileage may vary :). So you have to modify the httpd.conf in the httpd_docs configuration directory and restart the httpd_docs server (But not before we get the squid running if you are working on the production server). And as you remember httpd_perl listens to port 8080.
Let's go through the changes we should make to the default configuration
file. Since this file (/etc/squid/squid.conf
) is huge (about 60k+) and we will not use 95% of it, my suggestion is to
write a new one including only the modified directives.
We want to enable the redirect feature, to be able to serve requests by more than one server (in our case the httpd_docs and httpd_perl servers). So we specify httpd_accel_host
as virtual. This assumes that your server has multiple interfaces - Squid
will bind to all of them.
httpd_accel_host virtual
Then we define the default port - by default, if not redirected, httpd_docs will serve the pages. We assume that most requests will be of the static nature. We have our httpd_docs listening on port 81.
httpd_accel_port 81
And as described before, squid listens to port 80.
http_port 80
We do not use ICP (ICP is used for cache sharing between neighboring machines), which is more relevant in the proxy mode.
icp_port 0
hierarchy_stoplist
defines a list of words which, if found in a URL, causes the object to be
handled directly by this cache. In other words, use this to not query
neighbor caches for certain objects. Note that I have configured the /cgi-bin
and /perl
aliases for my dynamic documents, if you named them in a different way,
make sure to use the correct aliases here.
hierarchy_stoplist /cgi-bin /perl
Now we tell squid not to cache dynamic pages.
acl QUERY urlpath_regex /cgi-bin /perl
no_cache deny QUERY
Please note that the last two directives are controversial ones. If you
want your scripts to be more complying with the HTTP standards, the headers
of your scripts should carry the Caching Directives
according to the HTTP specs. You will find a complete tutorial about this
topic in Tutorial on HTTP Headers for mod_perl users
by Andreas J. Koenig (at http://perl.apache.org ). If you set the
headers correctly there is no need to tell squid accelerator to NOT
try to cache something. The headers I am talking about are
Last-Modified
and Expires
. What are they good for? Squid would not bother your mod_perl server a
second time if a request is (a) cachable and (b) still in the cache. Many
mod_perl applications will produce identical results on identical requests
at least if not much time goes by between the requests. So your squid might
have a hit ratio of 50%, which means that the mod_perl servers will have only half as much work to do as before. This is only possible by setting the
headers correctly.
Even if you insert a user ID and date in your page, caching can save resources if you set the expiration time to just one second. A user might double click where a single click would do, thus sending two requests in parallel; squid can then serve the second request from its cache.
But if you are lazy, or just have too many things to deal with, you can leave the above directives the way I described. Just keep in mind that one day you will want to reread this section and Andreas' tutorial, and squeeze even more power from your servers without spending money on additional memory and better hardware.
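To illustrate, here is a minimal sketch of how an Apache::Registry script might set those two headers so that squid can cache its output. The 10-minute lifetime and the use of HTTP::Date (from the LWP distribution) are my own choices, not part of the setup described above:

    use HTTP::Date qw(time2str);

    my $r = Apache->request;
    $r->content_type('text/html');
    # allow caches (squid included) to keep this page for 10 minutes
    $r->header_out('Expires' => time2str(time + 600));
    # report when the underlying resource last changed
    # (here: the script file itself, just as an example)
    $r->header_out('Last-Modified' => time2str((stat $0)[9]));
    $r->send_http_header;
    print "<HTML><BODY>cacheable content</BODY></HTML>\n";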
While testing you might want to enable the debugging options and watch the log files in /var/log/squid/. But turn them off on your production server. I list the directive commented out (debug section 28 covers access control):
# debug_options ALL,1 28,9
We need to provide a way for squid to dispatch requests to the correct servers: static object requests should be redirected to httpd_docs (unless they are already cached), while dynamic requests should go to the httpd_perl server. The configuration below tells squid to fire off 10 redirect daemons at the specified path of the redirect daemon, and disables rewriting of any Host: headers in redirected requests (as suggested by squid's documentation). The redirector script itself is listed below.
redirect_program /usr/lib/squid/redirect.pl
redirect_children 10
redirect_rewrites_host_header off
Maximum allowed request size in kilobytes. This one is pretty obvious. If you are using POST to upload files, then set this to the largest file's size plus a few extra kbytes.
request_size 1000 KB
Then we have the access permissions, which I will not explain here. You might want to read the documentation to avoid any security flaws.
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl myserver src 127.0.0.1/255.255.255.255
acl SSL_ports port 443 563
acl Safe_ports port 80 81 8080 443 563
acl CONNECT method CONNECT

http_access allow manager localhost
http_access allow manager myserver
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
# http_access allow all
Since squid should not run as root, you need these directives if you start squid as root.
cache_effective_user squid
cache_effective_group squid
Now configure the amount of memory to be used for caching. The squid documentation warns that the actual size of the squid process can grow to three times the value you set here.
cache_mem 20 MB
Keep pools of allocated (but unused) memory available for future use. Read more about it in the squid documents.
memory_pools on
Now tighten the runtime permissions of the cache manager CGI script (cachemgr.cgi, which comes bundled with squid) on your production server.
cachemgr_passwd disable shutdown
#cachemgr_passwd none all
Now the redirector script (put it at the location specified by the redirect_program parameter in the config file above, and of course make it executable by the webserver):
#!/usr/local/bin/perl
$|=1;
while (<>) {
    # redirect to mod_perl server (httpd_perl)
    print($_), next if s|(:81)?/perl/|:8080/perl/|o;

    # send it unchanged to plain apache server (httpd_docs)
    print;
}
In my scenario the proxy and the apache servers run on the same machine, which is why I just substitute the port. In the presented squid configuration, requests that have passed through squid are converted to point to localhost (which is 127.0.0.1). The above redirector can of course be more complex, but you know Perl, right?
A few notes regarding the redirector script:
You must disable buffering; $|=1; does the job. If you do not disable buffering, STDOUT will be flushed only when the buffer becomes full, and its default size is about 4096 characters. So if you have an average URL of 70 characters, the buffer will be flushed only after about 59 (4096/70) requests, and only then will those requests finally reach the target server. Your users will just be kept waiting until the buffer fills up.
If you think that this is a very inefficient way to redirect, let me try to convince you otherwise. The redirector runs as a daemon: squid fires up N redirect daemons, so there is no perl interpreter start-up overhead -- exactly as with mod_perl, perl stays loaded and the code is already compiled, so redirection is very fast (not noticeably slower than if the redirector were written in C or the like). Squid keeps an open pipe to each redirect daemon, so there is not even the overhead of expensive system calls.
Now it is time to restart the server; on Linux I do it with:
/etc/rc.d/init.d/squid restart
Now the setup is complete ...
Almost... When you try the new setup, you will be surprised and upset to discover port 81 showing up in the URLs of the static objects (HTML pages and the like). But we did not want users to see port 81 and use it instead of 80, since they would then bypass the squid server, and all the hard work we went through would have been a waste of time.
The solution is to run both squid and httpd_docs at the same port. This can
be accomplished by binding each one to a specific interface. Modify the httpd.conf
in the httpd_docs
configuration directory:
Port 80
BindAddress 127.0.0.1
Listen 127.0.0.1:80
Modify the squid.conf
:
http_port 80
tcp_incoming_address 123.123.123.3
tcp_outgoing_address 127.0.0.1
httpd_accel_host 127.0.0.1
httpd_accel_port 80
Where 123.123.123.3 should be replaced with the IP of your main server. Now restart squid and httpd_docs in either order, and voila -- the port number is gone.
You must also have an entry in /etc/hosts (chances are it's already there):
127.0.0.1 localhost.localdomain localhost
Now, if your scripts were generating HTML with fully qualified self references using port 8080 or some other port, you should fix them to generate links pointing to port 80 (which means not specifying the port at all). If you do not, users will bypass squid, as if it were not there at all, by making direct requests to the mod_perl server's port.
The only question left is what to do about users who have bookmarked your services with port 8080 in the URL. Do not worry about it. The most important thing is for your scripts to return full URLs, so if a user comes in through a link with port 8080 in it, let it be; just make sure that all subsequent requests to your server are rewritten correctly. Over time users will update their bookmarks. You can also send them an email if you have their address, or leave a note on your pages asking them to update their bookmarks. You could have avoided this problem by not publishing the non-80 port in the first place. See Publishing port numbers different from 80.
<META> Need to write up a section about server logging with squid. One thing I sure would like to know is how requests are logged with this setup. I have, as most everyone I imagine, log rotation, analysis, archiving scripts and they all assume a single log. Does one have different logs that have to be merged (up to 3 for each server + squid) ? Even when squid responds to a request out of its cache I'd still want the thing to be logged. </META>
See Using mod_proxy for information about X-Forwarded-For
.
To save you some keystrokes, here is the whole modified squid.conf
:
http_port 80
tcp_incoming_address 123.123.123.3
tcp_outgoing_address 127.0.0.1
httpd_accel_host 127.0.0.1
httpd_accel_port 80

icp_port 0

hierarchy_stoplist /cgi-bin /perl
acl QUERY urlpath_regex /cgi-bin /perl
no_cache deny QUERY

# debug_options ALL,1 28,9

redirect_program /usr/lib/squid/redirect.pl
redirect_children 10
redirect_rewrites_host_header off

request_size 1000 KB

acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl myserver src 127.0.0.1/255.255.255.255
acl SSL_ports port 443 563
acl Safe_ports port 80 81 8080 443 563
acl CONNECT method CONNECT

http_access allow manager localhost
http_access allow manager myserver
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
# http_access allow all

cache_effective_user squid
cache_effective_group squid

cache_mem 20 MB

memory_pools on

cachemgr_passwd disable shutdown
Note that all directives should start at the beginning of the line.
When I was first told about squid, I thought: ``Hey, now I can drop the httpd_docs server and have only the squid and httpd_perl servers. Since all my static objects will be cached by squid, I do not need the light httpd_docs server.'' But it was a wrong assumption. Why? Because you still have the overhead of loading the objects into squid the first time, and if your site has many of them, not all will be cached (unless you devote a huge chunk of memory to squid), so the heavy mod_perl servers will still carry some of the overhead of serving static objects. How would one measure this overhead? The difference between the two servers is memory consumption; everything else (e.g. I/O) should be roughly equal. So you have to estimate the time needed to fetch each static object for the first time at a peak period, and from that the number of additional servers you need for serving static objects. This will allow you to calculate the additional memory requirements. I can imagine that this amount could be significant in some installations.
So I decided to accept the extra administration overhead and to stick with the squid, httpd_docs and httpd_perl scenario, where I can optimize and fine-tune everything. Of course this may not apply to your case. If you feel that the scenario from the previous section is too complicated for you, make it simpler: have only one server with mod_perl built in and let squid do most of the job that the plain light apache used to do. As I explained in the previous paragraph, you should pick this lighter setup only if squid can cache most of your static objects. If it cannot, your mod_perl server will end up doing work we do not want it to do.
If you are still with me, install apache with mod_perl, and squid. Then use a configuration similar to the previous section's, but now httpd_docs is not there anymore. Also, we do not need the redirector anymore, and we set httpd_accel_host to the name of the server rather than to virtual. There is no need to bind two servers to the same port, because we do not redirect, and there are no longer BindAddress or Listen directives in httpd.conf.
The modified configuration (see the explanations in the previous section):
httpd_accel_host put.your.hostname.here
httpd_accel_port 8080
http_port 80

icp_port 0

hierarchy_stoplist /cgi-bin /perl
acl QUERY urlpath_regex /cgi-bin /perl
no_cache deny QUERY

# debug_options ALL,1 28,9

# redirect_program /usr/lib/squid/redirect.pl
# redirect_children 10
# redirect_rewrites_host_header off

request_size 1000 KB

acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl myserver src 127.0.0.1/255.255.255.255
acl SSL_ports port 443 563
acl Safe_ports port 80 81 8080 443 563
acl CONNECT method CONNECT

http_access allow manager localhost
http_access allow manager myserver
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
# http_access allow all

cache_effective_user squid
cache_effective_group squid

cache_mem 20 MB

memory_pools on

cachemgr_passwd disable shutdown
To build it into apache just add --enable-module=proxy during the apache configure stage.
Now we will talk about apache's mod_proxy and understand how it works.
The server on port 80 answers http requests directly and proxies the mod_perl enabled server in the following way:
ProxyPass        /modperl/ http://localhost:81/modperl/
ProxyPassReverse /modperl/ http://localhost:81/modperl/
ProxyPassReverse is the saving grace here, and it is what makes apache a win over squid: it rewrites the redirect on its way back to the original URI.
You can control the buffering feature with ProxyReceiveBufferSize
directive:
ProxyReceiveBufferSize 1048576
The above setting sets the buffer size to 1MB. If it is not set explicitly, the default buffer size is used, which depends on the OS; for Linux I suspect it is somewhere below 32k. So basically, to release the mod_perl server immediately from waiting on a slow client, ProxyReceiveBufferSize should be set to a value greater than the largest response produced by any mod_perl script.
The ProxyReceiveBufferSize
directive specifies an explicit buffer size for outgoing HTTP and FTP connections. It has to be greater than 512 or set to 0 to
indicate that the system's default buffer size should be used.
As the name states, its buffering feature applies only to downstream data (coming from the origin server to the proxy) and not upstream (i.e. buffering the data being uploaded from the client browser to the proxy, thus freeing the httpd_perl origin server from being tied up during a large POST such as a file upload).
Apache does caching as well. It's relevant to mod_perl only if you produce proper headers, so your scripts' output can be cached. See apache documentation for more details on configuration of this capability.
Ask Bjoern Hansen has written a mod_proxy_add_forward module for apache that sets the X-Forwarded-For header when doing a ProxyPass, similar to what squid can do. (Its location is given in the help section.) Basically, the module adds an extra HTTP header to proxied requests. You can access that header in the mod_perl-enabled server and use it to set the IP address of the remote client. You won't need to compile anything into the back-end server; if you are using Apache::{Registry,PerlRun}, just put something like the following into start-up.pl:
sub My::ProxyRemoteAddr ($) {
    my $r = shift;

    # we'll only look at the X-Forwarded-For header if the request
    # comes from our proxy at localhost
    return OK unless ($r->connection->remote_ip eq "127.0.0.1");

    if (my ($ip) = $r->header_in('X-Forwarded-For') =~ /([^,\s]+)$/) {
        $r->connection->remote_ip($ip);
    }

    return OK;
}
And in httpd.conf
:
PerlPostReadRequestHandler My::ProxyRemoteAddr
Different sites have different needs. If you're using the header to set the IP address apache believes it is dealing with (in the logs and so on), you really don't want anyone but your own systems to be able to set that header. That's why the above ``recommended code'' checks where the request is really coming from before changing the remote_ip.
Generally you shouldn't trust the X-Forwarded-For
header. You only want to rely on X-Forwarded-For
headers from proxies you control yourself. If you know how to spoof a
cookie you've probably got the general idea on making HTTP headers and can
spoof the
X-Forwarded-For
header as well. The only address *you* can count on as being a reliable
value is the one from
r->connection->remote_ip
.
From that point on, the remote IP address is correct. You should be able to
access REMOTE_ADDR
as usual.
You could do the same thing with other environment variables (though I think several of them are preserved, you will want to run some tests to see which ones).
To build mod_perl as a DSO, add USE_DSO=1 to the rest of the configuration parameters (to build libperl.so instead of libperl.a), like:
perl Makefile.PL USE_DSO=1 ...
If you run ./configure
from apache source do not forget to add:
--enable-shared=perl
Then just add the LoadModule
directive into your httpd.conf
.
You will find a complete explanation in the INSTALL.apaci
pod which can be found in the mod_perl distribution.
Some people have reported that DSO-compiled mod_perl would not run on specific OS/perl version combinations. A thread-enabled perl has also been reported to sometimes break mod_perl built as a DSO. But it may still work fine for you.
Assuming that you have a setup with one ``front-end'' server which proxies the ``back-end'' (mod_perl) server: if you need to perform authentication in the ``back-end'' server, it should handle all authentication itself. If apache proxies correctly, it passes through all the authentication information, making the ``front-end'' apache somewhat ``dumb'': it does nothing but pass through all the information.
The only possible caveat in the config file is that your Auth stuff needs to be inside <Directory ...> ... </Directory> tags, because if you use <Location /...> ... </Location> the proxying server takes the auth info for its own authentication and does not pass it on.
The same applies to mod_ssl: if it is plugged into the front-end server, all SSL requests will be encoded/decoded properly by it.
Make sure you have perl installed -- the newer the stable version, the better (minimum perl 5.004!). If you don't have it, install it. Follow the instructions in the distribution's INSTALL file. During the configuration stage (while running ./Configure), make sure you answer YES to the question:
Do you wish to use dynamic loading? [y]
Answer y so that Perl module extensions can be loaded dynamically.
It is a good idea to try to install the apache webserver without mod_perl first. This way, if something goes wrong, you will know that it's not the apache server's problem. But you can skip this stage if you already have a working (non-mod_perl) apache server, or if you are just the daring type. In any case you should unpack the apache source distribution, preferably at the same level as the mod_perl distribution.
% ls -l /usr/src
drwxr-xr-x  8 stas  bar  2048 Oct  6 09:46 apache_x.x.x/
drwxr-xr-x 19 stas  bar  4096 Oct  2 14:33 mod_perl-x.xx/
Now we come to the main point of this document.
Here I will give only a short example of the mod_perl installation. You should read the Real World Scenarios Implementation section for a more complete description.
As with any perl package, the installation of mod_perl is very easy and
standard. perldoc INSTALL
will guide you through the configuration and the installation processes.
The fastest way to install would be:
% perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
  DO_HTTPD=1 USE_APACI=1 PERL_MARK_WHERE=1 EVERYTHING=1
% make && make test && make install
Note: replace x.x.x with the version numbers you actually use.
To change the installation target (either if you are not root
or you need to install a second copy for testing purposes), assuming you
use /foo/server
as a base directory, you have to run this:
% perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
  DO_HTTPD=1 PERL_MARK_WHERE=1 EVERYTHING=1 \
  APACHE_PREFIX=/foo/server PREFIX=/foo/server
Where PREFIX
specifies where to install the perl modules,
APACHE_PREFIX
-- the same for the apache files.
The next step is to configure the mod_perl sections of the apache configuration file. (See ModPerlConfiguration).
Fire up the server with /foo/server/sbin/apachectl start. If the server does not start up, look for error reports in the error_log file (no error message will be printed to the console!).
There are a few ways to tell. In older versions of apache (< 1.3.6 ?) you could check by running httpd -v; that no longer works. Now you should use httpd -l. Note that it is not enough to have mod_perl installed -- you must of course also configure the server for mod_perl and restart it.
When starting the server, just check the error_log
file for the following message:
[Thu Dec 3 17:27:52 1998] [notice] Apache/1.3.1 (Unix) mod_perl/1.15 configured ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -- resuming normal operations
Assuming that you have configured the <Location /perl-status
> section in the server configuration file fetch: http://www.nowhere.com/perl-status
using your favorite Netscape browser :-)
You should see something like this:
Embedded Perl version 5.00502 for Apache/1.3.1 (Unix) mod_perl/1.19 process 50880, running since Tue Oct 6 14:31:45 1998
Knowing the port you have configured apache to listen on, you can use
telnet
to talk directly to it.
Assuming that your mod_perl enabled server listens to port 8080, telnet to
your server at port 8080, and type HEAD / HTTP/1.0
then press the <ENTER> key TWICE:
% telnet localhost 8080<ENTER> HEAD / HTTP/1.0<ENTER><ENTER>
You should see a response like this:
HTTP/1.1 200 OK
Date: Tue, 01 Dec 1998 12:27:52 GMT
Server: Apache/1.3.6 (Unix) mod_perl/1.19
Connection: close
Content-Type: text/html

Connection closed.
The line: Server: Apache/1.3.6 (Unix) mod_perl/1.19
--confirms that you do have mod_perl installed and its version is 1.19
. Of course in your case it would be the version you have installed.
However, just because you have got mod_perl linked in there, that does not mean that you have configured your server to handle Perl scripts with mod_perl. You will find the configuration assistance at ModPerlConfiguration
Another method is to invoke a CGI script which dumps the server's environment.
Copy and paste the script below (no need for the first perl calling
(shebang) line!). Let's say you named it test.pl
, saved it at the root of the CGI scripts and CGI root is mapped directly
to the
/perl
location of your server.
print "Content-type: text/html\n\n";
print "Server's environment<P>\n";
print "<TABLE>";
foreach ( keys %ENV ) {
    print "<TR><TD>$_</TD><TD>$ENV{$_}</TD></TR>";
}
print "</TABLE>";
Make it readable and executable by server:
% chmod a+rx test.pl
(you will want to tune permissions on the public host).
Now fetch the URL http://www.nowhere.com:8080/perl/test.pl (replace 8080 with the port your mod_perl enabled server is listening on). You should see something like this (the generated output has been trimmed):
SERVER_SOFTWARE     Apache/1.3.6 (Unix) mod_perl/1.19
GATEWAY_INTERFACE   CGI-Perl/1.1
REQUEST_METHOD      GET
HTTP_ACCEPT         image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
MOD_PERL            1.19
REQUEST_URI         /perl/test.pl
SCRIPT_NAME         /perl/test.pl
[...snipped]
Now run the same script in mod_cgi mode (configured with the /cgi-bin Alias; you will need to add the perl invocation line #!/bin/perl to the above script) and fetch http://www.nowhere.com/cgi-bin/test.pl:
SERVER_SOFTWARE     Apache/1.3.6 (Unix)
GATEWAY_INTERFACE   CGI/1.1
[...snipped]
You will see that two variables, SERVER_SOFTWARE and GATEWAY_INTERFACE, differ from the case above. This gives you a hint of how to tell, inside your CGI scripts, in what mode you are running. I start all my mod_perl-aware CGI scripts with:
BEGIN {
    # Auto-detect if we are running under mod_perl or CGI.
    $USE_MOD_PERL = ((exists $ENV{'GATEWAY_INTERFACE'}
                      and $ENV{'GATEWAY_INTERFACE'} =~ /CGI-Perl/)
                     or exists $ENV{'MOD_PERL'});
    # perl5.004 is a must under mod_perl
    require 5.004 if $USE_MOD_PERL;
}
You might wonder why in the world you would need to know in what mode you are running. For example, you will want to use Apache::exit() and not CORE::exit() in your modules, but if your script might be used in both environments (mod_cgi vs. mod_perl), you will have to override the exit() subroutine and make a runtime decision about which method to use.
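As an illustration, here is a minimal sketch of such an override; the name my_exit() is made up, and it assumes the $USE_MOD_PERL flag set in the BEGIN block shown above:

    sub my_exit {
        my $status = shift || 0;
        if ($USE_MOD_PERL) {
            # stop the script but keep the child alive for further requests
            Apache::exit($status);
        }
        else {
            # plain CGI: really terminate the process
            CORE::exit($status);
        }
    }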
Note that if you run scripts under the Apache::Registry handler, it takes care of overriding the exit() call for you, so this is not an issue in that case. For reasons and implementations see: Using exit() and the whole Writing Mod Perl scripts and Porting plain CGIs to it page.
Yet another one. Why do I show all these approaches? While here they are serving a very simple purpose, they can be helpful in other situations.
Assuming you have the libwww-perl
(LWP
) package installed (you will need it installed in order to pass mod_perl's make test
anyway):
% lwp-request -e -d http://www.nowhere.com
Will show you all the headers. (The -d
option disables printing the response content.)
% lwp-request -e -d http://www.nowhere.com | egrep '^Server:'
To see the server's version only.
Use http://www.nowhere.com:port_number
if your server is listening to a non-default 80 port.
Yes, no problem with that. Follow the installation instructions, and when you encounter APACI_ARGS use your home directory (or some other directory to which you have write access) as a prefix (e.g. /home/stas/www), and everything will be installed there. There is a chance that some perl libraries are not installed on your server by root, and you will have to install these locally too. See http://www.singlesheaven.com/stas/TULARC/webmaster/myfaq.html#7 for more information on local perl installations.
So you end up with something like:
$ gunzip <apache_x.x.xx.tar.gz | tar xvf -
$ gunzip <mod_perl-x.xx.tar.gz | tar xvf -
$ cd mod_perl-x.xx
$ perl Makefile.PL \
  APACHE_SRC=../apache-1.3.X/src \
  DO_HTTPD=1 \
  USE_APACI=1 \
  EVERYTHING=1 \
  APACI_ARGS=--sbindir=/home/stas/sbin/httpd_perl, \
    --sysconfdir=/home/stas/etc/httpd_perl, \
    --localstatedir=/home/stas/var/httpd_perl, \
    --runtimedir=/home/stas/var/httpd_perl/run, \
    --logfiledir=/home/stas/var/httpd_perl/logs, \
    --proxycachedir=/home/stas/var/httpd_perl/proxy
You will not be able to have the server listen on a port lower than 1024 if you are not starting it as root, so choose a port number above 1024 (I use 8080 in most cases). Note that you will then have to use a URL like http://www.nowhere.com:8080, but that is not a problem, since users usually do not access CGI script URLs directly; rather they are directed to them from a link on a web page or via the 'ACTION' of an HTML form, so they need not know that the port differs from the default port 80.
If you want your apache server to start automatically on system reboot, you
will need to invoke the server startup script from somewhere within the
init scripts on your host. This is often somewhere under /etc/rc.d
, but this path can vary depending upon the flavor of Unix you are using.
One more important thing to keep in mind is system resources. mod_perl is memory hungry -- if you run a lot of mod_perl processes on a public, multiuser (not dedicated) machine -- most likely the system administrator of the host will ask you to use less resources and even to shut down your mod_perl server and to find another home for it. You have a few solutions:
Reduce resources usage (see Limiting the size of the processes).
Ask your ISP if you can put a dedicated machine into their computer room and be root there.
Look for another ISP with lots of resources or one that supports mod_perl. You can find a list of these ISP at http://perl.apache.org .
Sure -- take a look at the symbols inside the httpd executable. For example, if you want to see whether you enabled PERL_AUTH=1 while building mod_perl, run:
nm httpd | grep perl_authenticate
It is possible to determine which options were given to modperl's
Makefile.PL
during the configuration stage, so to be used later in recreating the same
build tree when rebuilding the server. This is relevant only if you did not
use the default config parameters and altered some of them during the
configuration stage.
I have run into this problem many times: I build something, passing some non-default parameters to the configuration script, and later, when I need to rebuild the tool, either to upgrade it or to make an identical copy on another machine, I find that I do not remember which parameters I altered.
The best solution to this problem is to prepare a file with all the parameters that are about to be used, and run it instead of typing them by hand. Later the script will be handy for reuse.
mod_perl suggests using the makepl_args.mod_perl
file which comes with mod_perl distribution. This is the file where you
should specify all the parameters you are going to use.
But if you find yourself with a compiled tool and no trace of the parameters you specified, you can usually still find them out, provided the sources were not make clean'd. You will find the apache-specific parameters in apache_x.x.x/config.status and mod_perl's in mod_perl_x.xx/apaci/mod_perl.config.
While running make test you may notice that some of the tests are reported as skipped. The real reason is that you are missing some optional modules needed for these tests to pass. For a hint you might want to peek at the contents of each test (you will find them all in the ./t directory -- mnemonic: t for tests). I'll list a few examples, but of course the requirements might change in the future.
> modules/cookie......skipping test on this platform
install libapreq
> modules/psections...skipping test on this platform
install Devel::Symdump / Data::Dumper
> modules/request.....skipping test on this platform
libapreq
> modules/sandwich....skipping test on this platform
Apache::Sandwich
> modules/stage.......skipping test on this platform
Apache::Stage
> modules/symbol......skipping test on this platform
Devel::Symdump
Chances are that all of these are installed if you use CPAN.pm
to
install Bundle::Apache
.
There are two configuration parameters: PREP_HTTPD
and DO_HTTPD
, that you can use in:
perl Makefile.PL [options]
DO_HTTPD=1 means default to 'y' for the two prompts from apache's configure utility: (a) 'which source tree to configure against' and (b) 'whether to build the httpd in that tree'. PREP_HTTPD=1 just means default to 'n' for the second prompt -- i.e., do not build (make) httpd in the apache source tree. In other words, if you use PREP_HTTPD=1 the httpd will not be built; it will be built only if you use the DO_HTTPD=1 option and do not use PREP_HTTPD=1.
If you did not build the httpd, chdir to the apache source, and execute:
make
Then return to the mod_perl source and run:
make test
make install
Note that you would have to do the same if you do not pass
APACHE_PREFIX=/path_to_installation_prefix
during the perl
Makefile.PL [options]
stage.
You will see this message when you try to run httpd if a stale old apache header layout was in one of the include paths during the build process. Use the find (or locate) utility to locate the ap_mmn.h file. In my case I had a /usr/local/include/ap_mmn.h which had been installed by the RedHat install process. If this is the case, get rid of it and rebuild.
For all RedHat fans: before you build apache yourself, run
rpm -e apache
to remove the pre-installed package first!
Yes, you should. You have to rebuild the mod_perl enabled server, since it has a hard coded @INC which points to the old perl, and it is probably linked to an old libperl library. You can try to modify @INC in the startup script (if you keep the old perl version around), but it is better to build a fresh server and save yourself a mess.
If you are a user of mod_auth_dbm or mod_auth_db, you may need to edit Perl's Config
module. When Perl is configured it attempts to find libraries for ndbm,
gdbm, db, etc., for the *DBM*_File modules. By default, these libraries are
linked with Perl and remembered by the Config module. When mod_perl is configured with apache, the ExtUtils::Embed module returns these libraries to be linked with httpd so Perl extensions
will work under mod_perl. However, the order in which these libraries are
stored in
Config.pm, may confuse mod_auth_db*
. If mod_auth_db*
does not work with mod_perl, take a look at this order with the following
command:
% perl -V:libs
If -lgdbm
or -ldb
is before -lndbm
, example:
libs='-lnet -lnsl_s -lgdbm -lndbm -ldb -ldld -lm -lc -lndir -lcrypt';
Edit Config.pm and move -lgdbm
and -ldb
to the end of the list. Here's how to find Config.pm:
% perl -MConfig -e 'print "$Config{archlibexp}/Config.pm\n"'
Another solution for building Apache/mod_perl+mod_auth_dbm under Solaris is to remove the DBM and NDBM ``emulation'' from libgdbm.a. Seems Solaris already provides its own DBM and NDBM, and there's no reason to build GDBM with them (for us anyway).
In our Makefile for GDBM, we changed
OBJS = $(DBM_OF) $(NDBM_OF) $(GDBM_OF)
to
OBJS = $(GDBM_OF)
Rebuild libgdbm, then Apache/mod_perl.
Since most of the functionality that the various apache mod_* modules provide is implemented in Apache::* perl modules, it has been reported that one can build an apache server with mod_perl only. If you can reduce the problem down to whatever mod_perl can handle, you can eliminate nearly every other module. Then you basically have a perl server, with C code handling the tricky HTTP bits. The only module you will need to leave in is mod_actions.
The next step after building and installing your new mod_perl enabled
apache server, is to configure the server. To learn how to modify apache's
configuration files, please refer to the documentation included with the
apache distribution, or just view the files in
conf
directory and follow the instructions in these files - the embedded
comments within the file do a good job of explaining the options.
Before you start with mod_perl specific configuration, first configure apache, and see that it works. When done, return here to continue...
[ Note that prior to version 1.3.4, the default apache install used three configuration files -- httpd.conf, srm.conf, and access.conf. The 1.3.4 version began distributing the configuration directives in a single file -- httpd.conf. The remainder of this chapter refers to the location of the configuration directives using their historical location. ]
First, you need to specify the locations on a file-system for the scripts to be found.
Add the following configuration directives:
# for plain cgi-bin:
ScriptAlias /cgi-bin/ /usr/local/myproject/cgi/

# for Apache::Registry mode
Alias /perl/ /usr/local/myproject/cgi/

# for Apache::PerlRun mode
Alias /cgi-perl/ /usr/local/myproject/cgi/
Alias
provides a mapping of URL to file system object under
mod_perl
. ScriptAlias
is being used for mod_cgi
.
Alias defines the start of the URL path to the script you are referencing.
For example, using the above configuration, fetching
http://www.nowhere.com/perl/test.pl
, will cause the server to look for the file test.pl
at /usr/local/myproject/cgi
, and execute it as an Apache::Registry
script if we define Apache::Registry
to be the handler of /perl
location (see below). The URL
http://www.nowhere.com/perl/test.pl
will be mapped to
/usr/local/myproject/cgi/test.pl
. This means that you can have all your CGIs located at the same place in
the file-system, and call the script in any of three modes simply by
changing the directory name component of the URL (cgi-bin|perl|cgi-perl
) - is not this neat? (That is the configuration you see above - all three
Aliases point to the same directory within your file system, but of course
they can be different). If your script does not seem to be working while
running under mod_perl, you can easily call the script in straight mod_cgi
mode without making any script changes (in most cases), but rather by
changing the URL you invoke it by.
FYI: for modperl ScriptAlias
is the same thing as:
Alias /foo/ /path/to/foo/
SetHandler cgi-handler
where SetHandler cgi-handler
invokes mod_cgi. The latter will be overwritten if you enable Apache::Registry
. In other words,
ScriptAlias
does not work for mod_perl, it only appears to work when the additional
configuration is in there. If the
Apache::Registry
configuration came before the ScriptAlias
, scripts would be run under mod_cgi. While handy, ScriptAlias
is a known kludge, always better to use Alias
and SetHandler
.
Of course you can choose any other Alias (you will use it later in
httpd.conf
), you can choose to use all three modes or only one of these. It is
undesirable to run scripts in plain mod_cgi from a mod_perl-enabled server
- the price is too high, it is better to run these on plain apache server.
(See Standalone mod_perl Enabled Apache Server)
Now we will work with the httpd.conf
file. I add all the mod_perl stuff at the end of the file, after the native
apache configurations.
First we add:
<Location /perl>
  #AllowOverride None
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
  allow from all
  PerlSendHeader On
</Location>
This configuration causes all scripts that are called with a /perl path prefix to be executed under the Apache::Registry module and as CGI scripts (hence the ExecCGI option; if you omit it, the script will be sent to the browser as plain text, or may trigger a 'Save As' dialog).
PerlSendHeader On tells the server to send an HTTP header to the browser on every script invocation. You will want to turn this off for nph (non-parsed-headers) scripts. PerlSendHeader On means that mod_perl calls ap_send_http_header() after parsing your script's headers. It is only meant for CGI emulation; it is always better to use CGI->header from the CGI.pm module or $r->send_http_header directly.
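For example, a script that emits its own header (so that PerlSendHeader can stay Off for it) might do either of the following -- a minimal sketch showing the two APIs just mentioned:

    use CGI ();
    my $q = CGI->new;
    print $q->header(-type => 'text/html');

    # or, using the mod_perl API directly:
    my $r = Apache->request;
    $r->content_type('text/html');
    $r->send_http_header;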
Remember the Alias from the section above? We must use the same
Alias
here, if you use Location
that does not have the same
Alias
defined in srm.conf
, the server will fail to locate the script in the file system. (We are
talking about script execution here -- there are cases where Location
is something that is being executed by the server itself, without having
the corresponding file, like /perl-status
location.)
Note that sometimes you will have to add :
PerlModule Apache::Registry
before you specify the location that uses Apache::Registry
as a
PerlHandler
. Basically you can start running the scripts in the
Apache::Registry
mode...
There is nothing to configure for the /cgi-bin location (mod_cgi), since it has nothing to do with mod_perl.
Here is a similar location configuration for Apache::PerlRun
(More about Apache::PerlRun):
<Location /cgi-perl>
  #AllowOverride None
  SetHandler perl-script
  PerlHandler Apache::PerlRun
  Options ExecCGI
  allow from all
  PerlSendHeader On
</Location>
You may load modules from the config file at server startup via:
PerlModule Apache::DBI CGI DBD::Mysql
There is a limit of 10 PerlModule directives; if you need more modules loaded when the server starts, use one PerlModule to pull in many, or write them all in regular perl syntax and put them into a startup file which can be loaded with the PerlRequire directive.
PerlRequire /home/httpd/perl/lib/startup.pl
Both PerlModule and PerlRequire are implemented via require(), but there is a subtle difference. PerlModule works like use(), expecting a module name without the .pm extension and without slashes: Apache::DBI is OK, while Apache/DBI.pm is not. PerlRequire is the opposite of PerlModule -- it expects a relative or full path to the module or a filename, as in the example above.
As with any file that is require()'d, it must return a true value; to ensure that this happens, don't forget to add 1; at the end of such files.
We must stress that all the code that is run at server initialization time runs with root privileges if you start the server as root (which you have to unless you choose an unprivileged port, above 1024 -- something you might have to do if you don't have root access. Just remember that you'd better pick a well known port like 8000 or 8080, since other non-standard ports might be blocked by the firewalls that protect many organizations and individuals). This means that anyone who has write access to a script or module that is loaded by PerlModule or PerlRequire effectively has root access to the system. You might want to take a look at the new and experimental PerlOpmask directive and the PERL_OPMASK_DEFAULT compile time option to try to disable some dangerous operators.
As you know, Apache specifies about 11 phases of the request loop, namely, in this order: Post-Read-Request, URI Translation, Header Parsing, Access Control, Authentication, Authorization, MIME type checking, FixUp, Response (the content phase), Logging and finally Cleanup. These are the stages of a request where the Apache API allows a module to step in and do something. There is a dedicated Perl*Handler for each of these stages, namely:
PerlChildInitHandler
PerlPostReadRequestHandler
PerlInitHandler
PerlTransHandler
PerlHeaderParserHandler
PerlAccessHandler
PerlAuthenHandler
PerlAuthzHandler
PerlTypeHandler
PerlFixupHandler
PerlHandler
PerlLogHandler
PerlCleanupHandler
PerlChildExitHandler
The first 4 handlers cannot be used in <Location>, <Directory> or <Files> sections or in a .htaccess file; the main reason is that all of these containers require a known path to the file in order to bind a requested path to one or more of them. Starting from PerlHeaderParserHandler (the 5th), the URI has already been mapped to a physical pathname, and thus can be matched against <Location>, <Directory> or <Files> configuration sections, or against a .htaccess file if one exists in the directory of the translated path.
The Apache documentation (or even better -- the ``Writing Apache Modules with Perl and C'' book by Doug MacEachern and Lincoln Stein) will tell you all about those stages and what your modules can do. By default, these hooks are disabled at compile time, see the INSTALL document for information on enabling these hooks.
Note that by default the Perl API expects a subroutine called handler to handle the request in the registered Perl*Handler module. Thus, if your module implements this subroutine, you can register the handler simply by writing:
Perl*Handler Apache::SomeModule
where Perl*Handler is replaced by the name of the handler directive you want. mod_perl will load the specified module for you. But if you decide to give the handler subroutine a different name, like my_handler, you must preload the module and write the chosen name explicitly:
PerlModule Apache::SomeModule
Perl*Handler Apache::SomeModule::my_handler
Please note that the former approach will not preload the module at startup, so either explicitly preload it with the PerlModule directive, add it to the startup file, or use the nice shortcut that the Perl*Handler syntax provides:
Perl*Handler +Apache::SomeModule
Notice the leading + character. It is equivalent to:
PerlModule Apache::SomeModule
Perl*Handler Apache::SomeModule
If a module wishes to know what handler is currently being run, it can find out with the current_callback method. This method is most useful to PerlDispatchHandlers who wish to only take action for certain phases.
if ($r->current_callback eq "PerlLogHandler") {
    $r->warn("Logging request");
}
With the mod_perl stacked handlers mechanism, it is possible for more than
one Perl*Handler
to be defined and run during each stage of a request.
Perl*Handler directives can define any number of subroutines, e.g. (in config files)
PerlTransHandler OneTrans TwoTrans RedTrans BlueTrans
With the Apache->push_handlers() method, callbacks can be added to the stack at runtime by mod_perl scripts. Apache->push_handlers() takes the callback hook name as its first argument and a subroutine name or reference as its second, e.g.:
Apache->push_handlers("PerlLogHandler", \&first_one);

$r->push_handlers("PerlLogHandler", sub {
    print STDERR "__ANON__ called\n";
    return 0;
});
After each request, this stack is cleared out.
All handlers will be called unless a handler returns a status other than OK
or DECLINED
.
example uses:
CGI.pm
maintains a global object for its plain function interface. Since the
object is global, it does not go out of scope, DESTROY is never called. CGI->new
can call:
Apache->push_handlers("PerlCleanupHandler", \&CGI::_reset_globals);
This function will be called during the final stage of a request,
refreshing CGI.pm
's globals before the next request comes in.
Apache::DCELogin
establishes a DCE login context which must exist for the lifetime of a
request, so the DCE::Login
object is stored in a global variable. Without stacked handlers, users must
set
PerlCleanupHandler Apache::DCELogin::purge
in the configuration files to destroy the context. This is not
``user-friendly''. Now, Apache::DCELogin::handler
can call:
Apache->push_handlers("PerlCleanupHandler", \&purge);
Persistent database connection modules such as Apache::DBI could push a PerlCleanupHandler that iterates over %Connected, refreshing connections or just checking that they have not gone stale. Remember, by the time we get to PerlCleanupHandler the client has what it wants and has gone away, so we can spend as much time as we want here without slowing the response to the client (although the process is unavailable for serving new requests before the operation is completed).
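Here is a hypothetical sketch of such a cleanup handler; the %Connected hash and the ping() test stand in for whatever bookkeeping your connection-caching module really does:

    package My::DBCleanup;
    use strict;
    use vars qw(%Connected);

    sub handler {
        my $r = shift;
        # drop cached handles that no longer respond
        foreach my $key (keys %Connected) {
            my $dbh = $Connected{$key};
            delete $Connected{$key} unless $dbh && $dbh->ping;
        }
        return 0;
    }
    1;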
PerlTransHandlers
may decide, based on URI or other condition, whether or not to handle a
request, e.g. Apache::MsqlProxy
. Without stacked handlers, users must configure:
PerlTransHandler Apache::MsqlProxy::translate PerlHandler Apache::MsqlProxy
PerlHandler is never actually invoked unless translate() sees that the request is a proxy request ($r->proxyreq). If it is a proxy request, translate() sets $r->handler("perl-script"), and only then will PerlHandler handle the request. Now users do not have to specify PerlHandler Apache::MsqlProxy; the translate() function can set it with push_handlers().
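A hypothetical skeleton of such a pair of handlers (not Apache::MsqlProxy itself, just the pattern described above) might look like this:

    package My::ProxyDispatch;
    use strict;
    use Apache::Constants qw(OK DECLINED);

    sub translate {
        my $r = shift;
        # not a proxy request: let the default translation happen
        return DECLINED unless $r->proxyreq;
        # claim the content phase for mod_perl and register our handler
        $r->handler("perl-script");
        $r->push_handlers(PerlHandler => \&handler);
        return OK;
    }

    sub handler {
        my $r = shift;
        # ... serve the proxied request here ...
        return OK;
    }
    1;

With this, the only configuration needed would be a single PerlTransHandler My::ProxyDispatch::translate line.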
Includes, footers, headers, etc., piecing together a document, imagine (no need for SSI parsing!):
PerlHandler My::Header Some::Body A::Footer
A little test:
# My.pm
package My;

sub header {
    my $r = shift;
    $r->content_type("text/plain");
    $r->send_http_header;
    $r->print("header text\n");
}

sub body   { shift->print("body text\n")   }
sub footer { shift->print("footer text\n") }

1;
__END__
# in config
<Location /foo>
  SetHandler "perl-script"
  PerlHandler My::header My::body My::footer
</Location>
Parsing the output of another PerlHandler? this is a little more tricky, but consider:
<Location /foo>
  SetHandler "perl-script"
  PerlHandler OutputParser SomeApp
</Location>

<Location /bar>
  SetHandler "perl-script"
  PerlHandler OutputParser AnotherApp
</Location>
Now, OutputParser goes first, but it untie()'s
*STDOUT
and re-tie()'s to its own package like so:
package OutputParser;
sub handler {
    my $r = shift;
    untie *STDOUT;
    tie *STDOUT => 'OutputParser', $r;
}

sub TIEHANDLE {
    my($class, $r) = @_;
    bless { r => $r }, $class;
}

sub PRINT {
    my $self = shift;
    for (@_) {
        # do whatever you want to $_
        $self->{r}->print($_ . "[insert stuff]");
    }
}
1; __END__
To build in this feature, configure with:
% perl Makefile.PL PERL_STACKED_HANDLERS=1 [PERL_FOO_HOOK=1,etc]
Another method Apache->can_stack_handlers
will return TRUE if mod_perl was configured with PERL_STACKED_HANDLERS=1
, FALSE otherwise.
If a Perl*Handler is prototyped with $$, this handler will be invoked as a method, e.g.:
package My;
@ISA = qw(BaseClass);

sub handler ($$) {
    my($class, $r) = @_;
    ...;
}

package BaseClass;

sub method ($$) {
    my($class, $r) = @_;
    ...;
}

__END__
Configuration:
PerlHandler My
or
PerlHandler My->handler
Since the handler is invoked as a method, it may inherit from other classes:
PerlHandler My->method
In this case, the My
class inherits this method from BaseClass
.
To build in this feature, configure with:
% perl Makefile.PL PERL_METHOD_HANDLERS=1 [PERL_FOO_HOOK=1,etc]
To reload PerlRequire
, PerlModule
, other use()
'd modules and flush the Apache::Registry
cache on server restart, add:
PerlFreshRestart On

Make sure you read Evil things might happen when using PerlFreshRestart.
This is a very useful feature: you can watch what happens to the perl guts of the server. Below you will find instructions for configuring and using this feature.
Add this to httpd.conf
:
<Location /perl-status>
  SetHandler perl-script
  PerlHandler Apache::Status
  order deny,allow
  #deny from all
  #allow from
</Location>
If you are going to use Apache::Status
, it's important to put it as a first module in the start-up file, or in
the httpd.conf
(after
Apache::Registry
):
# startup.pl
use Apache::Registry ();
use Apache::Status ();
use Apache::DBI ();
If you don't put Apache::Status
before Apache::DBI
then you don't get Apache::DBI
's menu entry in status.
Assuming that your mod_perl server listens to port 81, fetch http://www.nowhere.com:81/perl-status
Embedded Perl version 5.00502 for Apache/1.3.2 (Unix) mod_perl/1.16 process 187138, running since Thu Nov 19 09:50:33 1998
This is the linked menu that you should see:
Signal Handlers
Enabled mod_perl Hooks
PerlRequire'd Files
Environment
Perl Section Configuration
Loaded Modules
Perl Configuration
ISA Tree
Inheritance Tree
Compiled Registry Scripts
Symbol Table Dump
Let's follow for example : PerlRequire'd Files -- we see:
PerlRequire                            Location
/usr/myproject/lib/apache-startup.pl   /usr/myproject/lib/apache-startup.pl
From some menus you can continue deeper to peek at the perl internals of the server, to watch the values of the global variables in the packages, to the list of cached scripts and modules and much more. Just click around...
Sometimes when you fetch /perl-status and follow the Compiled Registry Scripts link from the status menu, you see no listing of scripts at all. This is absolutely correct: Apache::Status shows the registry scripts compiled in the particular httpd child that is serving your request for /perl-status. If that child has not yet compiled the script you are asking about, /perl-status will just show you the main menu. This usually happens when the child has only just been spawned.
PerlSetEnv key val
PerlPassEnv key
PerlPassEnv passes, and PerlSetEnv sets and passes, environment variables to your scripts. You can access them in your scripts through %ENV (e.g. $ENV{"key"}).
Regarding the setting of PerlPassEnv PERL5LIB in httpd.conf: if you turn on taint checks (PerlTaintCheck On), $ENV{PERL5LIB} will be ignored (unset).
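A common workaround (an assumption on my part, not a directive from the guide) is to extend @INC from the startup file instead, which taint mode does not interfere with; the path below is just an example:

    # in startup.pl -- replaces the ignored $ENV{PERL5LIB}
    use lib qw(/home/httpd/perl/lib);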
PerlSetVar
is very similar to PerlSetEnv
, but you extract it with another method. In <Perl> sections:
push @{ $Location{"/"}->{PerlSetVar} }, [ 'FOO' => BAR ];
and in the code you read it with:
my $r = Apache->request; print $r->dir_config('FOO');
Since you often have to add many perl directives to the configuration file, it can be a good idea to put all of these into one separate file, so the configuration file stays cleaner. Add the following line to httpd.conf:
# startup.perl loads all functions that we want to use
# within mod_perl
PerlRequire /path/to/startup.pl
before the rest of the mod_perl configuration directives.
You can also run perl -c perl-startup to test the file's syntax. What goes into this file?
An example of perl-startup file:
use strict;

# extend @INC if needed
use lib qw(/dir/foo /dir/bar);

# make sure we are in a sane environment.
$ENV{GATEWAY_INTERFACE} =~ /^CGI-Perl/ or die "GATEWAY_INTERFACE not Perl!";

# for things in the "/perl" URL
use Apache::Registry;

# load perl modules of your choice here
# this code is interpreted *once* when the server starts
use LWP::UserAgent ();
use DBI ();

# tell me more about warnings
use Carp ();
$SIG{__WARN__} = \&Carp::cluck;

# Load CGI.pm and call its compile() method to precompile
# (but not to import) its autoloaded methods.
use CGI ();
CGI->compile(':all');
Note that starting with $CGI::VERSION
2.46, the recommended method to precompile the code in CGI.pm
is:
use CGI qw(-compile :all);
But the old method is still available for backward compatibility.
See also Apache::Status
Modules that are loaded at server startup are shared among the server children, so only one copy of each module is loaded, saving a lot of RAM. Usually I put most of the code I develop into modules and preload them from here. You can even preload your CGI scripts with Apache::RegistryLoader and preopen the DB connections with Apache::DBI. (See Preload Perl modules at server startup).
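For instance, a startup file could preload a couple of Registry scripts and pre-arrange the database connections like this (a sketch only; the script paths and the DSN are examples, not part of the setup above):

    use Apache::RegistryLoader ();
    use Apache::DBI ();

    # precompile these scripts in the parent, so every child is born
    # with them already compiled
    my $rl = Apache::RegistryLoader->new;
    $rl->handler("/perl/test.pl",  "/usr/local/myproject/cgi/test.pl");
    $rl->handler("/perl/stuff.pl", "/usr/local/myproject/cgi/stuff.pl");

    # open a DBI connection as soon as each child starts
    Apache::DBI->connect_on_init("DBI:mysql:database=test;host=localhost",
                                 "user", "password", { RaiseError => 1 });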
Many people wonder why there is a need to duplicate the use() clause in both the startup file and the script itself. The question arises from a misunderstanding of use(). A use() call consists of two other calls, namely require() and import(). So when you write:
use Foo qw(bar);
perl actually does:
require Foo;
import Foo qw(bar);
When you write:
use Foo qw();
perl actually does:
require Foo;
import Foo qw();
which means that the caller does not want any symbols to be imported. Why is this important? Some modules have @EXPORT set to a list of symbols to be exported by default, so when you write:
use Foo;
and think nothing is being imported, the import() call is still executed, and some symbols probably do get imported. See the docs/source of the module in question to make sure you use() it correctly. When you write your own modules, always remember that it is better to use @EXPORT_OK instead of @EXPORT, since the former does not export symbols unless asked to.
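For example, a small module of your own written in this spirit might look like the following (My::Tools and slurp() are made-up names, used only for illustration):

    package My::Tools;
    use strict;
    use Exporter ();
    use vars qw(@ISA @EXPORT_OK);
    @ISA       = qw(Exporter);
    @EXPORT_OK = qw(slurp);    # nothing is exported unless asked for

    # read a whole file into a scalar
    sub slurp {
        my $file = shift;
        local *FH;
        open FH, $file or die "can't open $file: $!";
        local $/;
        return <FH>;
    }

    1;

A caller then imports explicitly with use My::Tools qw(slurp); while an innocent use My::Tools; pollutes nothing.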
Since the symbols that you import into the startup script's namespace are not visible to any of the children's scripts, a script that needs module Foo's exported symbols has to pull it in just as if you had not preloaded Foo in the startup file. For example, just because you have use()d Apache::Constants in the startup script does not mean you can have the following handler:
package MyModule;

sub handler {
    my $r = shift;
    ## Cool stuff goes here
    return OK;
}

1;
You would either need to add:
use Apache::Constants qw( OK );
Or instead of return OK;
say:
return Apache::Constants::OK;
See the manpage/perldoc on Exporter
and perlmod
for more on
import()
.
PerlRequire allows you to execute code that preloads modules and does other things. Variables imported or defined there are visible in the scope of the startup file. It is a wrong assumption that global variables defined in the startup file will be accessible by the scripts running in the child processes.
You have to define/import variables in your scripts; they will then be visible inside the child process that runs the script, but they will not be shared between siblings. Remember that every script runs in its own specially (uniquely) named package, so it cannot access variables from other packages unless it inherits from them or use()'s them.
apachectl configtest tests the configuration file without starting the server. You can safely modify the configuration file on your production server if you run this test before you restart the server. Of course it is not 100% foolproof, but it will reveal any syntax errors you might have made while editing the file.
'apachectl configtest' is the same as 'httpd -t', and it actually executes the code in startup.pl, not just parses it. <Perl> configuration has always started Perl during the configuration read, and Perl{Require,Module} do so as well.
If you want your startup code to get a control over the -t
(configtest
) server launch, start the server configuration test with:
httpd -t -Dsyntax_check
and in your startup file, add (at the top):
return if Apache->define('syntax_check');
if you want to prevent the code in the file from being executed.
For PerlWarn and PerlTaintCheck see Switches -w, -T
See Tuning the Apache's configuration variables for the best performance
It is advised not to publish the 8080 (or similar) port number in URLs. Instead, use a proxying rewrite rule in the thin (httpd_docs) server:
RewriteRule .*/perl/(.*) http://my.url:8080/perl/$1 [P]
One problem with publishing non-80 port numbers is that IE 4.x is reported to have a bug when re-posting data to a non-port-80 URL: it drops the port designator and uses port 80 anyway.
With <Perl></Perl>
sections, it is possible to configure your server entirely in Perl.
<Perl> sections can contain *any* and as much Perl code as you wish. These sections are compiled into a special package whose symbol table mod_perl can then walk, feeding the names and values of Perl variables/structures through the apache core configuration gears. Most of the configuration directives can be represented as scalars ($scalar) or lists (@list). A @List inside these sections is simply converted into a space delimited string for you. Here is an example:
# httpd.conf
<Perl>
@PerlModule = qw(Mail::Send Devel::Peek);

# run the server as whoever starts it
$User  = getpwuid($>) || $>;
$Group = getgrgid($)) || $);

$ServerAdmin = $User;
</Perl>
Block sections such as <Location ...> ... </Location> are represented in a %Location hash, e.g.:
$Location{"/~dougm/"} = {
    AuthUserFile   => '/tmp/htpasswd',
    AuthType       => 'Basic',
    AuthName       => 'test',
    DirectoryIndex => [qw(index.html index.htm)],
    Limit          => {
        METHODS => 'GET POST',
        require => 'user dougm',
    },
};
If a directive can take two *or* three arguments, you may push strings (the lowest number of arguments will be shifted off the @List), or use an array reference to handle any number of arguments greater than the minimum for that directive:
push @Redirect, "/foo", "http://www.foo.com/";
push @Redirect, "/imdb", "http://www.imdb.com/";
push @Redirect, [qw(temp "/here" "http://www.there.com")];
Other section counterparts include %VirtualHost
, %Directory
and
%Files
.
To pass all environment variables to the children with a single configuration directive, rather than listing each one via PassEnv or PerlPassEnv, a <Perl> section could read in a file and:
push @PerlPassEnv, [$key => $val];
or
Apache->httpd_conf("PerlPassEnv $key $val");
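Here is a sketch of the file-reading approach just described; the file name and its one-pair-per-line ``KEY value'' format are assumptions of mine:

    <Perl>
    open PASSENV, "/usr/local/apache/conf/passenv.lst"
        or die "passenv.lst: $!";
    while (<PASSENV>) {
        next if /^\s*(#|$)/;              # skip comments and blank lines
        chomp;
        my ($key, $val) = split ' ', $_, 2;
        push @PerlPassEnv, [$key => $val];
    }
    close PASSENV;
    </Perl>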
These are somewhat simple examples, but they should give you the basic idea. You can mix in any Perl code your heart desires. See eg/httpd.conf.pl and eg/perl_sections.txt in the mod_perl distribution for more examples.
A tip for syntax checking outside of httpd:
  <Perl>
  # !perl

  #... code here ...

  __END__
  </Perl>
Now you may run:
perl -cx httpd.conf
To enable <Perl
> sections you should build mod_perl with perl
Makefile.PL PERL_SECTIONS=1.
You can see how you have configured the <Perl> sections through the /perl-status location, by choosing Perl Sections from the menu.
You can dump the configuration created by your <Perl> sections this way:
  <Perl>
  use Apache::PerlSections();
  ...
  print STDERR Apache->PerlSections->dump();
  </Perl>
Alternatively you can store it in a file:
Apache::PerlSections->store("httpd_config.pl");
You can then require()
that file in some other <Perl
> section.
mod_macro is an Apache module written by Fabien Coelho that lets you define and use macros in the Apache configuration file.
mod_macro proves really useful when you have many virtual hosts, each with a number of scripts or modules, most of them requiring a moderately complex configuration setup.
First download the latest version of mod_macro from http://www.cri.ensmp.fr/~coelho/mod_macro/ , and configure your Apache server to use this module.
Here are some useful macros for mod_perl users:
  # set up a registry script
  <Macro registry>
  SetHandler "perl-script"
  PerlHandler Apache::Registry
  Options +ExecCGI
  </Macro>
  # example
  Alias /stuff /usr/www/scripts/stuff
  <Location /stuff>
  Use registry
  </Location>
If your registry scripts are all located in the same directory, and your aliasing rules are consistent, you can use this macro:
  # set up a registry script for a specific location
  <Macro registry $location $script>
  Alias /script /usr/www/scripts/$script
  <Location $location>
  SetHandler "perl-script"
  PerlHandler Apache::Registry
  Options +ExecCGI
  </Location>
  </Macro>
  # example
  Use registry stuff stuff.pl
If you're using content handlers packaged as modules, you can use the following macro:
  # set up a mod_perl content handler module
  <Macro modperl $module>
  SetHandler "perl-script"
  Options +ExecCGI
  PerlHandler $module
  </Macro>
  #examples
  <Location /perl-status>
  PerlSetVar StatusPeek On
  PerlSetVar StatusGraph On
  PerlSetVar StatusDumper On
  Use modperl Apache::Status
  </Location>
The following macro sets up a Location for use with HTML::Embperl. Here we define all ``.html'' files to be processed by Embperl.
  <Macro embperl>
  SetHandler "perl-script"
  Options +ExecCGI
  PerlHandler HTML::Embperl
  PerlSetEnv EMBPERL_FILESMATCH \.html$
  </Macro>
  # examples
  <Location /mrtg>
  Use embperl
  </Location>
Macros are also very useful for things that tend to be verbose, such as setting up Basic Authentication:
  # Sets up Basic Authentication
  <Macro BasicAuth $realm $group>
  Order deny,allow
  Satisfy any
  AuthType Basic
  AuthName $realm
  AuthGroupFile /usr/www/auth/groups
  AuthUserFile /usr/www/auth/users
  Require group $group
  Deny from all
  </Macro>
  # example of use
  <Location /stats>
  Use BasicAuth WebStats Admin
  </Location>
Finally, here is a complete example that uses macros to set up simple virtual hosts. It uses the BasicAuth macro defined previously (yes, macros can be nested!).
  <Macro vhost $ip $domain $docroot $admingroup>
  <VirtualHost $ip>
  ServerAdmin webmaster@$domain
  DocumentRoot /usr/www/htdocs/$docroot
  ServerName www.$domain
  <Location /stats>
  Use BasicAuth Stats-$domain $admingroup
  </Location>
  </VirtualHost>
  </Macro>
  # define some virtual hosts
  Use vhost 10.1.1.1 example.com example example-admin
  Use vhost 10.1.1.2 example.net examplenet examplenet-admin
mod_macro is also useful in a non-vhost setting. Some sites, for example, have lots of scripts that people use to view various statistics, email settings and the like. It is much easier to read things like:
  use /forwards email/showforwards
  use /webstats web/showstats
Check your configuration files and make sure that ``ExecCGI'' is turned on in your configuration.
  <Location /perl>
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
  allow from all
  PerlSendHeader On
  </Location>
Did you put PerlSendHeader On in the configuration part of the <Location foo></Location>?
No. Any virtual host will be able to see the routines from a startup.pl loaded for any other virtual host.
You can use 'PerlSetEnv PERL5LIB ...' or a PerlFixupHandler with the lib pragma. An even better way is to use Apache::PerlVINC.
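If you want to try the fixup handler idea, here is a minimal sketch. The module name My::INCFixup and the PrivateLib variable are hypothetical, and it manipulates @INC directly rather than via the lib pragma (use lib runs at compile time, which is too early for a per-request handler):

  package My::INCFixup;
  use strict;
  use Apache::Constants qw(OK);

  # httpd.conf (per virtual host):
  #   PerlSetVar       PrivateLib /home/site1/lib
  #   PerlFixupHandler My::INCFixup

  sub handler {
    my $r = shift;
    if (my $dir = $r->dir_config('PrivateLib')) {
      # prepend this host's private library path for the current request
      unshift @INC, $dir unless grep { $_ eq $dir } @INC;
    }
    return OK;
  }

  1;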
This has been a bug before, last fixed in 1.15_01, i.e. if you are running 1.15, that could be the problem. You should set this variable in a startup file (PerlRequire):
$Apache::Registry::NameWithVirtualHost = 1;
But, as we know, sometimes a bug turns into a feature. If the same script runs for more than one virtual host on the same machine, this can be a waste, right? Set the variable to 0 in a startup script if you want to turn it off and keep this bug as a feature. (This only makes sense if you are sure that no other script will be named by the same path/name.) It also saves you some memory along the way.
$Apache::Registry::NameWithVirtualHost = 0;
The problem was reported by users who declared the mod_perl configuration inside a <Directory> section for all files matching *.pl. The problem went away after placing the mod_perl configuration in a <Files> section.
It is better not to advertise to the outside world the port that the mod_perl server is running on, as doing so creates a potential security risk by revealing which module(s) and/or OS your web server is running.
The more modules you have in your web server, the more complex the code in your webserver.
The more complex the code in your web server, the more chances for bugs.
The more chance for bugs, the more chance that some of those bugs may involve security.
I was never completely sure why the default of the ServerTokens directive in Apache is Full rather than Minimal. It seems you would only set it to Full if you were debugging.
For more information see Publishing port numbers different from 80
Another approach is to modify the httpd sources so that no unwanted information is revealed; then even if someone knows the port, a HEAD request will return an empty or phony Server: field.
Let's say that you want all the files in a specific directory and below to be handled the same way, except for a few which should be handled somewhat differently. For example:
  <Directory /home/foo>
  <FilesMatch "\.(html|txt)$">
  SetHandler perl-script
  PerlHandler Apache::AddrMunge
  </FilesMatch>
  </Directory>
Alternatively you can use <Files> inside an .htaccess file. Note that you cannot have a Files directive inside Location, but you can have Files inside Directory.
When the server is restarted, the configuration and module initialization phases are called again. To ensure that a future restart will work correctly, Apache actually runs these two phases twice during server startup, to check that all modules can survive a restart.
(META: And add an example that writes to the log file - I was restarted 1, 2 times)
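A minimal sketch of such an example (not part of the original text): put this at the top of your startup.pl and watch the error_log; during a normal startup the message should appear twice, because Apache runs the configuration and initialization phases twice.

  use vars qw($startup_count);
  $startup_count++;
  warn "startup.pl has been executed $startup_count time(s)\n";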
This new document was born because some problems come up so often on the mailing list that they should be stressed in the guide as among the most important things to read and beware of. So I have tried to list them in this document. If you think some important problem that is frequently reported on the list and covered in the guide is not included below, please tell me.
See my() scoped variable in nested subroutines.
See Evil things might happen when using PerlFreshRestart
All of these techniques require that you know the server PID (Process ID).
The easiest way to find the PID is to look it up in the httpd.pid
file. With my configuration it exists as
/usr/local/var/httpd_perl/run/httpd.pid
. It's easy to discover where to look by checking the httpd.conf file. Open the file and locate the PidFile entry:
PidFile /usr/local/var/httpd_perl/run/httpd.pid
Another way is to use the ps
and grep
utilities:
% ps auxc | grep httpd_perl
or maybe:
% ps -ef | grep httpd_perl
This will produce a list of all httpd_perl processes (the parent and the children). You are looking for the parent process. If you run your server as root, you will easily locate it, since it belongs to root. If you run the server as a non-root user (when you don't have root access), most likely all the processes will belong to that user (unless defined differently in httpd.conf), but it's still easy to tell which is the parent -- it's the one with the smallest size...
You will notice many httpd_perl
executables running on your system, but you should not send signals to any
of them except the parent, whose pid is in the PidFile
. That is to say you shouldn't ever need to send signals to any process
except the parent. There are three signals that you can send the parent: TERM, HUP, and USR1.
We will concentrate here on the implications of sending these signals to a mod_perl enabled server. For documentation on the implications of sending these signals to a plain Apache server see http://www.apache.org/docs/stopping.html .
Sending the TERM signal to the parent causes it to immediately attempt to kill off all of its children. This process may take several seconds to complete, following which the parent itself exits. Any requests in progress are terminated, and no further requests are served.
That's the moment when the accumulated END blocks will be executed! Note that if you use Apache::Registry or Apache::PerlRun, END blocks are executed at the end of each request.
Sending the HUP signal to the parent causes it to kill off its children like in TERM (Any requests in progress are terminated) but the parent doesn't exit. It re-reads its configuration files, and re-opens any log files. Then it spawns a new set of children and continues serving hits.
The server will reread its configuration files, flush all the compiled and preloaded modules, and rerun any startup files. It's equivalent to stopping, then restarting a server.
Note: If your configuration file has errors in it when you issue a restart then your parent will not restart but exit with an error. See below for a method of avoiding this.
The USR1 signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they're not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off the parent replaces it with a child from the new generation of the configuration, which begins serving new requests immediately.
The only difference between USR1 and HUP is that USR1 allows children to complete any in-progress request prior to killing them off.
By default, if a server is restarted (e.g. with kill -USR1 `cat logs/httpd.pid` or with the HUP signal), Perl scripts and modules are not reloaded. To reload PerlRequire's, PerlModule's and other use()'d modules, and to flush the Apache::Registry cache, enable this directive:
PerlFreshRestart On (in httpd.conf)
Make sure you read Evil things might happen when using PerlFreshRestart.
It's worth mentioning that restart or termination can sometimes take quite
a lot of time. Check out the PERL_DESTRUCT_LEVEL=-1
option during the mod_perl perl Makefile.PL
stage, which speeds this up and leads to more robust operation in the face
of problems, like running out of memory. It is only usable if no
significant cleanup has to be done by perl END
blocks and DESTROY
methods when the child terminates, of course. What constitutes significant
cleanup? Any change of state outside of the current process that would not
be handled by the operating system itself. So committing database
transactions is significant but closing an ordinary file isn't.
Some folks prefer to specify signals using numerical values rather than symbolic names. If you are looking for these, check out your kill(3) man page. Mine points to /usr/include/sys/signal.h; the relevant entries are:
  #define SIGHUP     1    /* hangup, generated when terminal disconnects */
  #define SIGTERM   15    /* software termination signal */
  #define SIGUSR1   30    /* user defined signal 1 */
Apache's distribution provides a nice script to control the server. It's called apachectl and it's installed into the same location as httpd. In our scenario that is /usr/local/sbin/httpd_perl/apachectl.
Start httpd:
% /usr/local/sbin/httpd_perl/apachectl start
Stop httpd:
% /usr/local/sbin/httpd_perl/apachectl stop
Restart httpd if running by sending a SIGHUP or start if not running:
% /usr/local/sbin/httpd_perl/apachectl restart
Do a graceful restart by sending a SIGUSR1 or start if not running:
% /usr/local/sbin/httpd_perl/apachectl graceful
Do a configuration syntax test:
% /usr/local/sbin/httpd_perl/apachectl configtest
Replace httpd_perl
with httpd_docs
in the above calls to control the httpd_docs server.
There are other options for apachectl; use the help option to see them all.
It's important to understand that this script is based on the PID file
which is PIDFILE=/usr/local/var/httpd_perl/run/httpd.pid
. If you delete the file by hand - apachectl will fail to run.
Also, note that apachectl is suitable for use from within your Unix system's startup files, so that your web server is automatically restarted upon system reboot. Either copy the apachectl file to the appropriate location (/etc/rc.d/rc3.d/S99apache works on my RedHat Linux system) or create a symlink with that name pointing to the canonical location. (If you do this, make certain that the script is writable only by root -- the startup scripts have root privileges during init processing, and you don't want to be opening any security holes.)
For those who want to use a SUID startup script, here is an example. This script is SUID to root, and should be executable only by members of some special group at your site. Note the line which ``fixes an obscure error when starting apache/mod_perl'' by setting the real UID to the effective UID. As others have pointed out, it is the mismatch between the real and the effective UIDs that causes Perl to croak on the -e switch.
Note that you must be using a version of Perl that recognizes and emulates
the suid bits in order for this to work. The script will do different
things depending on whether it is named start_http
,
stop_http
or restart_http
. You can use symbolic links for this purpose.
  #!/usr/bin/perl

  # These constants will need to be adjusted.
  $PID_FILE = '/home/www/logs/httpd.pid';
  $HTTPD = '/home/www/httpd -d /home/www';

  # These prevent taint warnings while running suid
  $ENV{PATH}='/bin:/usr/bin';
  $ENV{IFS}='';

  # This sets the real to the effective ID, and prevents
  # an obscure error when starting apache/mod_perl
  $< = $>;
  $( = $) = 0; # set the group to root too

  # Do different things depending on our name
  ($name) = $0 =~ m|([^/]+)$|;

  if ($name eq 'start_http') {
      system $HTTPD and die "Unable to start HTTP";
      print "HTTP started.\n";
      exit 0;
  }

  # extract the process id and confirm that it is numeric
  $pid = `cat $PID_FILE`;
  $pid =~ /(\d+)/ or die "PID $pid not numeric";
  $pid = $1;

  if ($name eq 'stop_http') {
      kill 'TERM',$pid or die "Unable to signal HTTP";
      print "HTTP stopped.\n";
      exit 0;
  }

  if ($name eq 'restart_http') {
      kill 'HUP',$pid or die "Unable to signal HTTP";
      print "HTTP restarted.\n";
      exit 0;
  }

  die "Script must be named start_http, stop_http, or restart_http.\n";
With mod_perl many things can happen to your server. The worst is the possibility that the server will die when you are not around. As with any other critical service, you need to run some kind of watchdog.
One simple solution is to use a slightly modified apachectl script, which I have called apache.watchdog, and to put it into the crontab to be called every 30 minutes -- or even every minute, if it's critical to make sure the server is up all the time.
The crontab entry:
0,30 * * * * /path/to/the/apache.watchdog >/dev/null 2>&1
The script:
  #!/bin/sh

  # this script is a watchdog to see whether the server is online
  # It tries to restart the server if it's
  # down and sends an email alert to admin

  # admin's email
  EMAIL=webmaster@somewhere.far
  #EMAIL=root@localhost

  # the path to your PID file
  PIDFILE=/usr/local/var/httpd_perl/run/httpd.pid

  # the path to your httpd binary, including options if necessary
  HTTPD=/usr/local/sbin/httpd_perl/httpd_perl

  # check for pidfile
  if [ -f $PIDFILE ] ; then
    PID=`cat $PIDFILE`

    if kill -0 $PID; then
      STATUS="httpd (pid $PID) running"
      RUNNING=1
    else
      STATUS="httpd (pid $PID?) not running"
      RUNNING=0
    fi
  else
    STATUS="httpd (no pid file) not running"
    RUNNING=0
  fi

  if [ $RUNNING -eq 0 ]; then
    echo "$0 $ARG: httpd not running, trying to start"
    if $HTTPD ; then
      echo "$0 $ARG: httpd started"
      mail $EMAIL -s "$0 $ARG: httpd started" </dev/null >& /dev/null
    else
      echo "$0 $ARG: httpd could not be started"
      mail $EMAIL -s "$0 $ARG: httpd could not be started" </dev/null >& /dev/null
    fi
  fi
Another approach, probably even more practical, is to use the cool LWP Perl package to test the server by trying to fetch some document (script) served by the server. Why is it more practical? Because, while the server can be up as a process, it can be stuck and not working. Failing to get the document will then trigger a restart, and ``probably'' the problem will go away. (Just replace start with restart in $restart_command below.)
Again we put this script into the crontab to call it every 30 minutes. Personally I call it every minute, fetching some very light script. Why so often? If your server starts to spin and trash your disk space with multiple error messages, in 5 minutes you might run out of free space, which might bring your system to its knees. Most likely no other child will be able to serve requests, since the system will be too busy writing to the error_log file. Think big -- if you are running a heavy service (which is very fast, since you are running under mod_perl), adding one more request every minute will not be felt by the server at all.
So we end up with crontab entry:
* * * * * /path/to/the/watchdog.pl >/dev/null 2>&1
And the watchdog itself:
  #!/usr/local/bin/perl -w

  use strict;
  use diagnostics;
  use URI::URL;
  use LWP::MediaTypes qw(media_suffix);

  my $VERSION = '0.01';
  use vars qw($ua $proxy);
  $proxy = '';
  require LWP::UserAgent;
  use HTTP::Status;

  ###### Config ########
  my $test_script_url = 'http://www.stas.com:81/perl/test.pl';
  my $monitor_email   = 'root@localhost';
  my $restart_command = '/usr/local/sbin/httpd_perl/apachectl restart';
  my $mail_program    = '/usr/lib/sendmail -t -n';
  ######################

  $ua = new LWP::UserAgent;
  $ua->agent("$0/Stas " . $ua->agent);
  # Uncomment the proxy if you don't use it!
  # $proxy = "http://www-proxy.com";
  $ua->proxy('http', $proxy) if $proxy;

  # if it returns '1' we are alive
  exit 1 if checkurl($test_script_url);

  # We have got a problem - the server seems to be down. Try to
  # restart it.
  my $status = system $restart_command;
  # print "Status $status\n";

  my $message = ($status == 0)
              ? "Server was down and successfully restarted!"
              : "Server is down. Can't restart.";

  my $subject = ($status == 0)
              ? "Attention! Webserver restarted"
              : "Attention! Webserver is down. Can't restart";

  # email the monitoring person
  my $to   = $monitor_email;
  my $from = $monitor_email;
  send_mail($from,$to,$subject,$message);

  # input:  URL to check
  # output: 1 for success, 0 for failure
  #######################
  sub checkurl{
    my ($url) = @_;

    # Fetch document
    my $res = $ua->request(HTTP::Request->new(GET => $url));

    # Check the result status
    return 1 if is_success($res->code);

    # failed
    return 0;
  } # end of sub checkurl

  # sends email about the problem
  #######################
  sub send_mail{
    my ($from,$to,$subject,$messagebody) = @_;

    open MAIL, "|$mail_program"
        or die "Can't open a pipe to a $mail_program :$!\n";

    print MAIL <<__END_OF_MAIL__;
  To: $to
  From: $from
  Subject: $subject

  $messagebody

  __END_OF_MAIL__

    close MAIL;
  }
Often while developing new code, you will want to run the server in single process mode. See Sometimes it works Sometimes it does Not and Names collisions with Modules and libs. Running in single process mode inhibits the server from ``daemonizing'', allowing you to run it more easily under debugger control.
% /usr/local/sbin/httpd_perl/httpd_perl -X
When you execute the above, the server will run in the foreground of the shell you called it from, so to stop it you simply kill it with Ctrl-C.
Note that in -X
mode the server will run very slowly while fetching images. If you use
Netscape while your server is running in single-process mode, HTTP's KeepAlive
feature gets in the way. Netscape tries to open multiple connections and
keep them open. Because there is only one server process listening, each
connection has to time-out before the next succeeds. Turn off
KeepAlive
in httpd.conf
to avoid this effect while developing, or you can press STOP after a few seconds (assuming you use the image size parameters, so that Netscape will be able to render the rest of the page).
In addition you should know that when running with -X you will not see any control messages that the parent server normally writes to the error_log (like ``server started'', ``server stopped'', etc.).
Since
httpd -X
causes the server to handle all requests itself, without forking any
children, there is no controlling parent to write status messages.
If you are the only developer working on the specific server:port, you have no problems, since you have complete control over the server. However, often you have a group of developers who need to develop their mod_perl scripts concurrently. This means that each one will want to have control over the server -- to kill it, to run it in single server mode, to restart it, etc. -- as well as control over the location of the log files and other configuration settings such as MaxClients. You can work around this problem by preparing a few httpd.conf files and forcing each developer to use:
httpd_perl -f /path/to/httpd.conf
I have approached it in another way, using the -Dparameter startup option of the server. I start my version of the server with:
% httpd_perl -Dsbekman
In httpd.conf
I wrote:
  # Personal development Server for sbekman
  # sbekman uses the server running on port 8000
  <IfDefine sbekman>
  Port 8000
  PidFile /usr/local/var/httpd_perl/run/httpd.pid.sbekman
  ErrorLog /usr/local/var/httpd_perl/logs/error_log.sbekman
  Timeout 300
  KeepAlive On
  MinSpareServers 2
  MaxSpareServers 2
  StartServers 1
  MaxClients 3
  MaxRequestsPerChild 15
  </IfDefine>

  # Personal development Server for userfoo
  # userfoo uses the server running on port 8001
  <IfDefine userfoo>
  Port 8001
  PidFile /usr/local/var/httpd_perl/run/httpd.pid.userfoo
  ErrorLog /usr/local/var/httpd_perl/logs/error_log.userfoo
  Timeout 300
  KeepAlive Off
  MinSpareServers 1
  MaxSpareServers 2
  StartServers 1
  MaxClients 5
  MaxRequestsPerChild 0
  </IfDefine>
What we have achieved with this technique: Full control over start/stop, number of children, separate error log file, and port selection. This saves me from getting called every few minutes - ``Stas, I'm going to restart the server''.
To make things even easier (in the above technique you have to discover the PID of your parent httpd_perl process, written in /usr/local/var/httpd_perl/run/httpd.pid.userfoo), we change the apachectl script to do the work for us. We make a copy for each developer, called apachectl.username, and we change 2 lines in the script:
  PIDFILE=/usr/local/var/httpd_perl/run/httpd.pid.sbekman
  HTTPD='/usr/local/sbin/httpd_perl/httpd_perl -Dsbekman'
Of course you might think you could use only one control file and figure out who is calling by using the UID, but since you have to be root to start the server, it is not so simple.
The last thing was to give developers the option to run in single process mode:
/usr/local/sbin/httpd_perl/httpd_perl -Dsbekman -X
In addition to making life easier, we decided to use relative links everywhere in the static docs (including the calls to CGIs). You may ask how, using a relative link, you will get to the right server? Very simple -- we have utilized mod_rewrite to solve the problem:
In access.conf
of the httpd_docs
server we have the following code: (you have to configure your httpd_docs
server with
--enable-module=rewrite
)
  # sbekman's server
  # port = 8000
  RewriteCond %{REQUEST_URI} ^/(perl|cgi-perl)
  RewriteCond %{REMOTE_ADDR} 123.34.45.56
  RewriteRule ^(.*) http://nowhere.com:8000/$1 [R,L]

  # userfoo's server
  # port = 8001
  RewriteCond %{REQUEST_URI} ^/(perl|cgi-perl)
  RewriteCond %{REMOTE_ADDR} 123.34.45.57
  RewriteRule ^(.*) http://nowhere.com:8001/$1 [R,L]

  # all the rest
  RewriteCond %{REQUEST_URI} ^/(perl|cgi-perl)
  RewriteRule ^(.*) http://nowhere.com:81/$1 [R]
where the IP numbers are those of the developers' client machines (the machines where they run their web browsers). (I tried to use REMOTE_USER, since all our users are authenticated, but it did not work for me.)
So if I have a relative URL like /perl/test.pl
written in some html or even http://www.nowhere.com/perl/test.pl
in my case (user at machine of sbekman
) it will be redirected by httpd_docs
to
http://www.nowhere.com:8000/perl/test.pl
.
Of course you have another problem: the CGI generates some HTML, which will be requested again. If it generates a URL with a hard-coded port, the above scheme will not work. There are 2 solutions:
First, generate a relative URL so it will reuse the technique above with a redirect (which is transparent to the user), but this will not work if you have something to POST (the redirect loses all the data!).
Second, use a general configuration module which generates a correct full URL according to REMOTE_USER, so if $ENV{REMOTE_USER} eq 'sbekman', I return http://www.nowhere.com:8000/perl/ as cgi_base_url. Again, this will only work if the user is authenticated.
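Here is a minimal sketch of such a configuration module; the package name My::DevConfig and the user-to-port map are invented for the example:

  package My::DevConfig;
  use strict;

  # map each developer to his personal server's port
  my %port_for = (sbekman => 8000, userfoo => 8001);

  sub cgi_base_url {
    my $user = $ENV{REMOTE_USER} || '';
    my $port = $port_for{$user} || 81;   # fall back to the generic server
    return "http://www.nowhere.com:$port/perl/";
  }

  1;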
All this is good for development. It is better to use full URLs in production, since if you have a static form whose Action is relative while the static document is located on another server, pressing the form's submit button will cause a redirect to the mod_perl server, and all the form's data will be lost during the redirect.
Many times you start off debugging your script by running it from your favorite shell. Sometimes you encounter a very weird situation where the script runs from the shell but dies when called as a CGI. The real problem lies in the difference between the environment used by your server and the one used by your shell. Examples are a different perl path, or a PERL5LIB environment variable that includes paths which are not in the @INC of the perl compiled into the mod_perl server and configured during startup.
The best debugging approach is to write a wrapper that emulates the exact
environment of the server, by first deleting the environment variables like PERL5LIB
and calling the same perl binary that it is being used by the server. Next,
set the environment identical to the server's by copying the perl run
directives from server startup and configuration files. It will also allow
you to remove completely the first line of the script - since mod_perl
skips it and the wrapper knows how to call the script.
Below is an example of such a script. Note that we force -Tw when we call the real script. (I have also added the ability to pass parameters, which will not happen when you call the CGI from the web.)
  #!/usr/local/bin/perl -w

  # This is a wrapper example
  # It simulates the web server environment by setting @INC and other
  # stuff, so that what runs under this wrapper will run under the web
  # server and vice versa.
  #
  # Usage: wrap.pl some_cgi.pl
  #
  BEGIN {
    use vars qw($basedir);
    $basedir = "/usr/local";

    # we want to make a complete emulation,
    # so we must remove the user's environment
    # (note: qw() does not interpolate, so build the paths with map)
    @INC = ();
    push @INC,
         map { "$basedir/$_" }
         qw(lib/perl5/5.00502/aix
            lib/perl5/5.00502
            lib/perl5/site_perl/5.005/aix
            lib/perl5/site_perl/5.005
           );
  }

  use strict;
  use File::Basename;

  # process the passed params
  my $cgi = shift || '';
  my $params = (@ARGV) ? join(" ", @ARGV) : '';

  die "Usage:\n\t$0 some_cgi.pl\n" unless $cgi;

  # Set the environment
  my $PERL5LIB = join ":", @INC;

  # if the path includes the directory
  # we extract it and chdir there
  if ($cgi =~ m|/|) {
    my $dirname = dirname($cgi);
    chdir $dirname or die "Can't chdir to $dirname: $!\n";
    $cgi =~ m|$dirname/(.*)|;
    $cgi = $1;
  }

  # run the cgi from the script's directory
  # Note that we invoke warnings and Taint mode ON!!!
  system qq{$basedir/bin/perl -I$PERL5LIB -Tw $cgi $params};
A little bit off topic but good to know and use with mod_perl where your error_log can grow at a 10-100Mb per day rate if your scripts spit out lots of warnings...
To rotate the logs do:
  mv access_log access_log.renamed
  kill -HUP `cat httpd.pid`
  sleep 10; # allow some children to complete requests and logging
            # now it's safe to use access_log.renamed
  .....
The effect of SIGUSR1 and SIGHUP is detailed in: http://www.apache.org/docs/stopping.html .
I use this script:
  #!/usr/local/bin/perl -Tw

  # this script does a log rotation. Called from crontab.

  use strict;
  $ENV{PATH}='/bin:/usr/bin';

  ### configuration
  my @logfiles = qw(access_log error_log);
  umask 0;
  my $server = "httpd_perl";
  my $logs_dir = "/usr/local/var/$server/logs";
  my $restart_command = "/usr/local/sbin/$server/apachectl restart";
  my $gzip_exec = "/usr/bin/gzip";

  my ($sec,$min,$hour,$mday,$mon,$year) = localtime(time);
  my $time = sprintf "%0.2d.%0.2d.%0.2d-%0.2d.%0.2d.%0.2d",
                     $year,++$mon,$mday,$hour,$min,$sec;
  $^I = ".".$time;

  # rename log files
  chdir $logs_dir;
  @ARGV = @logfiles;
  while (<>) {
    close ARGV;
  }

  # now restart the server so the logs will be restarted
  system $restart_command;

  # compress log files
  foreach (@logfiles) {
    system "$gzip_exec $_.$time";
  }
Randal L. Schwartz contributed this:
Cron fires off setuid script called log-roller that looks like this:
  #!/usr/bin/perl -Tw
  use strict;
  use File::Basename;

  $ENV{PATH} = "/usr/ucb:/bin:/usr/bin";

  my $ROOT = "/WWW/apache";           # names are relative to this
  my $CONF = "$ROOT/conf/httpd.conf"; # master conf
  my $MIDNIGHT = "MIDNIGHT";          # name of program in each logdir

  my ($user_id, $group_id, $pidfile); # will be set during parse of conf
  die "not running as root" if $>;

  chdir $ROOT or die "Cannot chdir $ROOT: $!";

  my %midnights;
  open CONF, "<$CONF" or die "Cannot open $CONF: $!";
  while (<CONF>) {
    if (/^User (\w+)/i) { $user_id = getpwnam($1); next; }
    if (/^Group (\w+)/i) { $group_id = getgrnam($1); next; }
    if (/^PidFile (.*)/i) { $pidfile = $1; next; }
    next unless /^ErrorLog (.*)/i;
    my $midnight = (dirname $1)."/$MIDNIGHT";
    next unless -x $midnight;
    $midnights{$midnight}++;
  }
  close CONF;

  die "missing User definition" unless defined $user_id;
  die "missing Group definition" unless defined $group_id;
  die "missing PidFile definition" unless defined $pidfile;

  open PID, $pidfile or die "Cannot open $pidfile: $!";
  <PID> =~ /(\d+)/;
  my $httpd_pid = $1;
  close PID;
  die "missing pid definition" unless defined $httpd_pid and $httpd_pid;

  kill 0, $httpd_pid or die "cannot find pid $httpd_pid: $!";

  for (sort keys %midnights) {
    defined(my $pid = fork) or die "cannot fork: $!";
    if ($pid) {
      ## parent:
      waitpid $pid, 0;
    } else {
      my $dir = dirname $_;
      ($(,$)) = ($group_id,$group_id);
      ($<,$>) = ($user_id,$user_id);
      chdir $dir or die "cannot chdir $dir: $!";
      exec "./$MIDNIGHT";
      die "cannot exec $MIDNIGHT: $!";
    }
  }

  kill 1, $httpd_pid or die "Cannot sighup $httpd_pid: $!";

And then individual MIDNIGHT scripts can look like this:
#!/usr/bin/perl -Tw use strict; die "bad guy" unless getpwuid($<) =~ /^(root|nobody)$/; my @LOGFILES = qw(access_log error_log); umask 0; $^I = ".".time; @ARGV = @LOGFILES; while (<>) { close ARGV; }Can you spot the security holes? Our trusted user base can't or won't. :) But these shouldn't be used in hostile situations.
Sometimes calling an undefined subroutine in a module can cause a tight loop that consumes all memory. Here is a way to catch such errors. Define an autoload subroutine:
  sub UNIVERSAL::AUTOLOAD {
    my $class = shift;
    warn "$class can't \$UNIVERSAL::AUTOLOAD!\n";
  }
It will produce a nice error in error_log, giving the line number of the call and the name of the undefined subroutine.
Sometimes an error happens and causes the server to write millions of lines into your error_log file and in a few minutes brings your server to its knees. For example, the error Callback called exit sometimes shows up in my error_log file many times over; the error_log file grows to 300 MB in size in a few minutes. You should run a cron job to make sure this does not happen, and if it does, to take care of it.
Andreas J. Koenig is running this shell script every minute:
  S=`ls -s /usr/local/apache/logs/error_log | awk '{print $1}'`
  if [ "$S" -gt 100000 ] ; then
    mv /usr/local/apache/logs/error_log /usr/local/apache/logs/error_log.old
    /etc/rc.d/init.d/httpd restart
    date | /bin/mail -s "error_log $S kB on inx" myemail@domain.com
  fi
It seems that his script will trigger a restart every minute, since once the logfile grows past the 100000 limit it will stay that size unless you remove or rename it before the restart. On my server I run a watchdog every five minutes which restarts the server if it gets stuck (it always works, since when some mod_perl child process goes wild, the I/O it causes is so heavy that its brother processes cannot serve requests normally). See Monitoring the Server for more hints.
Also check out the daemontools from ftp://koobera.math.uic.edu/www/daemontools.html :
  ,-----
  | cyclog writes a log to disk. It automatically synchronizes the log
  | every 100KB (by default) to guarantee data integrity after a crash. It
  | automatically rotates the log to keep it below 1MB (by default). If
  | the disk fills up, cyclog pauses and then tries again, without losing
  | any data.
  `-----
This document describes ``special'' traps you may encounter when running
your plain CGIs under Apache::Registry
and Apache::PerlRun
.
In a non-mod_perl script (standalone or CGI), there is no problem writing code like this:
  use CGI qw/param/;
  my $x = param('x');
  sub printit {
    print "$x\n";
  }
However, when the script is run under Apache::Registry, it will in fact be repackaged into something like this:
  package $mangled_package_name;
  sub handler {
    #line1 $original_filename
    use CGI qw/param/;
    my $x = param('x');
    sub printit {
      print "$x\n";
    }
  }
Now printit()
is an inner named subroutine. Because it is referencing a lexical variable
from an enclosing scope, a closure is created.
The first time the script is run, the correct value of $x
will
be printed. However on subsequent runs, printit()
will retain the initial value of $x
-- not what you want.
Always use -w
(or/and PerlWarn ON
)! Perl will then emit a warning like:
Value of $x will not stay shared at - line 5.
NOTE: Subroutines defined inside BEGIN{}
and END{}
cannot trigger this message, since each BEGIN{}
and END{}
is defined to be called exactly once. (To understand why, read about the
closures at
perlref
or perlfaq
13.12)
PERLDIAG manpage says:
An inner (nested) named subroutine is referencing a lexical variable defined in an outer subroutine.
When the inner subroutine is called, it will probably see the value of the outer subroutine's variable as it was before and during the *first* call to the outer subroutine; in this case, after the first call to the outer subroutine is complete, the inner and outer subroutines will no longer share a common value for the variable. In other words, the variable will no longer be shared.
Check your code by running Apache in single-child mode (httpd -X). Since a my variable retains its value per child process, the closure problem can be difficult to track down in multi-user mode. It will appear to work fine until you have cycled through all the httpd children.
If a variable needs file scope, use a global variable:
  use vars qw/$x/;
  use CGI qw/param/;
  $x = param('x');
  sub printit {
    print "$x\n";
  }
You can safely use a my()
scoped variable if its value is constant:
  use vars qw/$x/;
  use CGI qw/param/;
  $x = param('x');
  my $y = 5;
  sub printit {
    print "$x, $y\n";
  }
Also see the clarification of my()
vs. use vars
- Ken Williams writes:
Yes, there is quite a bit of difference! With use vars(), you are making an entry in the symbol table, and you are telling the compiler that you are going to be referencing that entry without an explicit package name. With my(), NO ENTRY IS PUT IN THE SYMBOL TABLE. The compiler figures out _at_ _compile_time_ which my() variables (i.e. lexical variables) are the same as each other, and once you hit execute time you can not go looking those variables up in the symbol table.
And my()
vs. local()
- Randal Schwartz writes:
local() creates a temporal-limited package-based scalar, array, hash, or glob -- when the scope of definition is exited at runtime, the previous value (if any) is restored. References to such a variable are *also* global... only the value changes. (Aside: that is what causes variable suicide. :) my() creates a lexically-limited non-package-based scalar, array, or hash -- when the scope of definition is exited at compile-time, the variable ceases to be accessible. Any references to such a variable at runtime turn into unique anonymous variables on each scope exit.
For more information see: Using global variables and sharing them between modules/packages and an article by Mark-Jason Dominus about how Perl handles variables and
namespaces, and the difference between use vars()
and my()
- http://www.plover.com/~mjd/perl/FAQs/Namespaces.html
.
When using a regular expression that contains an interpolated Perl
variable, if it is known that the variable (or variables) will not vary
during the execution of the program, a standard optimization technique
consists of adding the /o
modifier to the regexp pattern. This directs the compiler to build the
internal table once, for the entire lifetime of the script, rather than
every time the pattern is executed. Consider:
  my $pat = '^foo$'; # likely to be input from an HTML form field
  foreach( @list ) {
    print if /$pat/o;
  }
This is usually a big win in loops over lists, or when using grep()
or map()
operators.
In long-lived mod_perl scripts, however, this can pose a problem if the variable changes according to the invocation. The first invocation of a fresh httpd child will compile the regex and perform the search correctly. However, all subsequent uses by the httpd child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern is dependent on. Your script will appear broken.
There are two solutions to this problem:
The first -- is to use eval q//
, to force the code to be evaluated each time. Just make sure that the eval
block covers the entire loop of processing, and not just the pattern match
itself.
The above code fragment would be rewritten as:
  my $pat = '^foo$';
  eval q{
    foreach( @list ) {
      print if /$pat/o;
    }
  }
Just saying:
  foreach( @list ) {
    eval q{ print if /$pat/o; };
  }
is going to be a horribly expensive proposition.
You can use this approach if you require more than one pattern match
operator in a given section of code. If the section contains only one
operator (be it an m//
or s///
), you can rely on the property of the null pattern, that reuses the last
pattern seen. This leads to the second solution, which also eliminates the
use of eval.
The above code fragment becomes:
my $pat = '^foo$'; "something" =~ /$pat/; # dummy match (MUST NOT FAIL!) foreach( @list ) { print if //; }
The only gotcha is that the dummy match that boots the regular expression
engine must absolutely, positively succeed, otherwise the pattern will not
be cached, and the //
will match everything. If you can't count on fixed text to ensure the match
succeeds, you have two possibilities.
If you can guarantee that the pattern variable contains no meta-characters (things like *, +, ^, $...), you can use the dummy match:
"$pat" =~ /\Q$pat\E/; # guaranteed if no meta-characters present
If there is a possibility that the pattern can contain meta-characters, you should search for the pattern or the unsearchable \377 character as follows:
"\377" =~ /$pat|^[\377]$/; # guaranteed if meta-characters present
Another approach:
It depends on the complexity of the regexp you apply this technique to. One common usage where a compiled regexp is usually more efficient is to ``match any one of a group of patterns'' over and over again.
Maybe with some helper routine it's easier to remember. Here is one, slightly modified from Jeffrey Friedl's example in his book ``Mastering Regular Expressions''.
  #####################################################
  # Build_MatchMany_Function
  # -- Input:  list of patterns
  # -- Output: A code ref which matches its $_[0]
  #            against ANY of the patterns given in the
  #            "Input", efficiently.
  #
  sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @R: $@" if $@;
    $matchsub;
  }
Example usage:
  @some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
  $Known_Browser = Build_MatchMany_Function(@some_browsers);

  while (<ACCESS_LOG>) {
    # ...
    $browser = get_browser_field($_);
    if ( ! &$Known_Browser($browser) ) {
      print STDERR "Unknown Browser: $browser\n";
    }
    # ...
  }
Running in httpd -X mode. (good only for testing during development phase).
You want to test that your application correctly handles global variables
(if you have any - the less you have of them the better, but sometimes you
just can't without them). It's hard to test with multiple servers serving
your cgi since each child has a different value for its global variables.
Imagine that you have a random()
sub that returns a random number and you have the following script.
  use vars qw($num);
  $num ||= random();
  print ++$num;
This script initializes the variable $num
with a random value, then increments it on each request and prints it out.
Running this script in multiple server environments will result in
something like 1
,
9
, 4
, 19
(number per reload), since each time your script will be served by a
different child. (On some OSes, the parent httpd process will assign all of
the requests to the same child process if all of the children are idle...
AIX...). But if you run in httpd -X
single server mode you will get 2
, 3
, 4
, 5
... (assuming that the random()
returned 1
at the first call)
But do not get too obsessive with this mode, since working only in single server mode sometimes hides problems that show up when you switch to a normal (multi) server mode. Consider an application that allows you to change the configuration at run time.
Let's say the script produces a form to change the background color of the page. It's not a good design, but for the sake of demonstrating the potential problem, we will assume that our script doesn't write the changed background color to the disk, but simply changes it in memory, like:
  use vars qw($bgcolor);

  # assign default value at first invocation
  $bgcolor ||= "white";

  # modify the color if requested to
  $bgcolor = $q->param('bgcolor') || $bgcolor;
So you have typed in a new color, and in response, your script prints back the html with a new color - you think that's it! It was so simple. And if you keep running in single server mode you will never notice that you have a problem...
If you run the same code in normal (multi-server) mode, after you submit the color change you will get the result you expect; but when you request the same URL again (not a reload!), chances are that you will get back the original default color (white in our case), since no one except the child which processed the color change request knows about the change to its global variable. Just remember that children cannot share information, other than what they inherited from their parent when they were loaded. Of course you should use a hidden variable to carry the color along, or store it on the server side (database, shared memory, etc.).
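A minimal sketch of the hidden-field remedy, assuming CGI.pm:

  use CGI ();
  my $q = CGI->new;

  my $bgcolor = $q->param('bgcolor') || 'white';

  print $q->header,
        $q->start_html(-bgcolor => $bgcolor),
        $q->start_form,
        # the chosen color travels with every request in a hidden field,
        # so it does not matter which child serves the next hit
        $q->hidden(-name => 'bgcolor', -value => $bgcolor, -override => 1),
        $q->submit(-value => 'Reload with my color'),
        $q->end_form,
        $q->end_html;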
Also note that since the server is running in single-process mode, if the output returns HTML with <IMG> tags, loading the images will take a long time. When you use a Netscape client while your server is running in single-process mode, HTTP's KeepAlive feature gets in the way: Netscape tries to open multiple connections and keep them open. Because there is only one server process listening, each connection has to time out before the next succeeds. Turn off KeepAlive in httpd.conf to avoid this effect.
In addition you should know that when running with -X
you will not see any control messages that the parent server normally
writes to the error_log. (Like ``server started, server stopped and etc''.)
Since
httpd -X
causes the server to handle all requests itself, without forking any
children, there is no controlling parent to write status messages.
Under mod_perl, files that were created after the server's startup are reported as having a negative age by the -M (-C, -A) file tests. This is obvious if you remember that these tests compute the age relative to the time the process started ($^T): you get a negative result when the server was started before the file was created. This is normal behavior with any perl.
If you want the -M test to measure time relative to the current request, you should reset the $^T variable, just as you would in any other perl script. Just add $^T = time; at the beginning of your scripts.
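A minimal sketch (the file path is only a placeholder):

  $^T = time;   # measure -M/-A/-C relative to this request, not to server start

  my $age_in_days = -M "/home/httpd/docs/index.html";
  print "Content-type: text/plain\n\n";
  print "The file was modified $age_in_days days ago\n";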
When a user presses the STOP button, Apache will detect that via $SIG{PIPE} and will cease the script's execution. With mod_cgi there is generally no problem, since all open files will be closed and all resources freed (well, almost all -- if you happen to use external lock files, the resources locked by them will most likely be left locked and unusable by anyone else using the same advisory locking scheme).
It's important to notice that when the user hits the browser's STOP button, the mod_perl script is blissfully unaware until it tries to send some data to the browser. At that point, Apache realizes that the browser is gone, and all the good cleanup stuff happens.
Starting from Apache 1.3.6, Apache will no longer catch SIGPIPE, and mod_perl handles it much better. Here is an excerpt from Apache 1.3.6's CHANGES file:
  *) SIGPIPE is now ignored by the server core. The request write
     routines (ap_rputc, ap_rputs, ap_rvputs, ap_rwrite, ap_rprintf,
     ap_rflush) now correctly check for output errors and mark the
     connection as aborted. Replaced many direct (unchecked) calls to
     ap_b* routines with the analogous ap_r* calls. [Roy Fielding]
What happens if your mod_perl script has some global variables, that are being used for resource locking?
It's possible not to notice the pitfall if the critical section between the lock and the unlock is very short and finishes quickly, so you never see it happen (you aren't fast enough to stop the code in the middle). But look at the following scenario:
  1. lock resource
     <critical section starts>
  2. sleep 20 (== do some time consuming processing)
     <critical section ends>
  3. unlock resource
If the user presses STOP and Apache sends SIGPIPE before step 3 is reached, then since we are running under mod_perl and the lock variable is cached, it will not be unlocked. A kind of deadlock exists.
Here is a working example. Run the server with -X and press STOP before the count-up to 10 has finished. Then rerun the script: it will hang in while(1)! The resource is no longer available to this child.
  use vars qw(%CACHE);
  use CGI;
  $|=1;
  my $q = new CGI;
  print $q->header, $q->start_html;
  print $q->p("$$ Going to lock!\n");
  # actually the while loop below is not needed
  # (since it's an internal lock, accessible only by the same process,
  # and if it's locked... it's locked for the whole child's life)
  while (1) {
    unless (defined $CACHE{LOCK} and $CACHE{LOCK} == 1) {
      $CACHE{LOCK} = 1;
      print $q->p("Got the lock!\n");
      last;
    }
  }
  print $q->p("Going to sleep (I mean working)!");
  my $c = 0;
  foreach (1..10) {
    sleep 1;
    print $c++, "\n<BR>";
  }
  print $q->p("Going to unlock!");
  $CACHE{LOCK} = 0;
  print $q->p("Unlock!\n");
You may ask, what is the solution to this problem? As noted in the END blocks section, any END blocks that are encountered during compilation of Apache::Registry scripts are called after the script has finished running, including on subsequent invocations when the script is cached in memory. So if you are running under Apache::Registry, the following is your remedy:
END { $CACHE{LOCK} = 0; }
Notice that the END
block will be run after the
Apache::Registry::handler
is finished (not during the cleanup phase though).
If you are into a perl API, use the register_cleanup()
method of Apache.
$r->register_cleanup(sub {$CACHE{LOCK} = 0;});
Within the Apache API, the Apache->request->connection->aborted() construct can be used to test for an aborted connection.
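A minimal sketch combining the two APIs mentioned above -- the cleanup is registered before the critical section, and aborted() is consulted before producing further output:

  my $r = Apache->request;

  # make sure the lock is always released, even if the client hits STOP
  $r->register_cleanup(sub { $CACHE{LOCK} = 0 });

  $CACHE{LOCK} = 1;
  # ... time consuming work ...
  $CACHE{LOCK} = 0;

  unless ($r->connection->aborted) {
    $r->print("Done!\n");
  }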
I hope you noticed that this example is somewhat misleading, since there is a different instance of %CACHE in every child; if you modify it, the change is known only inside the same child, and none of the %CACHE variables in the other children are affected. But if you work with code that makes variables visible to every child (some external shared memory or another approach), the hazard this example demonstrates still applies. Make sure you unlock the resources either when you stop using them, or when the script is aborted in the middle, before the actual unlocking would have happened.
A situation similar to the 'Pressed Stop button' disease happens when the client (browser) times out the connection (is it about 2 minutes?). There are cases when your script is about to perform a very long operation and there is a chance that it will take longer than the client's timeout. One case I can think of is database interaction, where the DB engine hangs or needs a lot of time to return results. If this is the case, use $SIG{ALRM} to limit the operation's duration:
  $timeout = 10; # seconds

  eval {
    local $SIG{ALRM} = sub { die "Sorry timed out. Please try again\n" };
    alarm $timeout;
    ... db stuff ...
    alarm 0;
  };

  die $@ if $@;
But, as was recently discovered, local $SIG{'ALRM'} does not restore the original underlying C handler. This was fixed in mod_perl 1.19_01 (the CVS version). As a matter of fact, none of the local $SIG{FOO} handlers restore the original C handler -- read Debugging Signal Handlers ($SIG{FOO}) for a debugging technique and a possible workaround.
Your CGI does not work and you want to see what the problem is. The best idea is to check any errors that the server may be reporting. Where can I find these errors?
Generally all errors are logged into an error_log file. The exact file location and name are defined in the httpd.conf file. Look for the ErrorLog parameter. My httpd.conf says:
ErrorLog var/logs/error_log
Hey, where is the beginning of the path? There is another Apache parameter called ServerRoot. Every time Apache sees a parameter whose value is a relative path (e.g. my.txt) rather than an absolute path (e.g. /tmp/my.txt), it prepends the value of ServerRoot to it. I have:
ServerRoot /usr/local/apache
So I will look for the error_log file at /usr/local/apache/var/logs/error_log. Of course you can also use an absolute path to define the file's location in the file system.
<META>: is this 100% correct?
But there are cases when errors don't go to the error_log file. For example, some errors are printed to the console (tty) from which you executed httpd (unless you redirected httpd's stderr stream). This happens when the server has not yet opened the error_log file for writing.
For example, if you have mistakenly entered a non-existent directory path in your ErrorLog directive, the error message will be printed on the controlling tty. Also, if an error happens when the server executes a PerlRequire or PerlModule directive, you might see the errors there as well.
You are probably wondering where all the errors go when you are running the server in single-process mode (httpd -X). They go to the console. That is because when running in single-process mode there is no parent httpd process to perform the logging. This includes all the status messages that usually show up in the error_log file.
</META>
Perl uses sh for its system() and piped open() calls. So when you want to set a temporary environment variable when calling a script from your CGI, you do:
open UTIL, "USER=stas ; script.pl | " or die "...: $!\n";
or
system "USER=stas ; script.pl";
This is useful for example if you need to invoke a script that uses CGI.pm from within a mod_perl script. We are tricking the perl script to think it's a simple CGI, which is not running under mod_perl.
open(PUBLISH, "GATEWAY_INTERFACE=CGI/1.1 ; script.cgi \"param1=value1¶m2=value2\" |") or die "...: $!\n";
Make sure that the parameters you pass are shell-safe (all ``unsafe'' characters like the single quote should be properly escaped).
However, you are forking to run a Perl script, so you have thrown the hard-won performance gain out the window. Whatever script.cgi is, it should be converted into a module with a subroutine that you can call directly from your script, avoiding the fork.
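A minimal sketch of that conversion; the package and subroutine names are hypothetical:

  package My::Publish;      # what used to be script.cgi
  use strict;

  sub publish {
    my (%args) = @_;
    # ... the code that used to live in script.cgi ...
    return "published $args{param1}";
  }

  1;

and then in the mod_perl script, instead of the piped open() above:

  use My::Publish ();
  my $result = My::Publish::publish(param1 => 'value1', param2 => 'value2');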
Enabling use diagnostics;
generally helps you to determine the source of the problem and how to solve
it. See diagnostics pragma for more info.
See Use of uninitialized value at (eval 80) line 12.
my() scoped variable in nested subroutines.
my() scoped variable in nested subroutines.
That message happens when the client breaks the connection while your script is trying to write to the client. With Apache 1.3.x, you should only see the rwrite messages if LogLevel is set to debug.
This bug has been fixed in mod_perl 1.19_01 (CVS version).
  [modperl] caught SIGPIPE in process 1234
  [modperl] process 1234 going to Apache::exit with status...
That's the $SIG{PIPE} handler installed by mod_perl/Apache::SIG, called when a connection times out or the client presses the 'Stop' button. It gives you an opportunity to do cleanups if the script was aborted in the middle of its execution. See Handling the 'User pressed Stop button' case for more info.
If your mod_perl version is earlier than 1.17, you might instead get the message shown in the following section...
Client hit STOP or Netscape bit it! Process 2493 going to Apache::exit with status=-2
You will see this message in mod_perl < 1.17. See caught SIGPIPE in process.
That's a mandatory warning inside Perl. It happens only if you modify your script and Apache::Registry reloads it. Perl is warning you that the subroutine(s) were redefined. It is mostly harmless. If you don't like seeing these warnings, just do a kill -USR1 (graceful restart) of Apache when you modify your scripts.
<META> Someone said: You won't see that warning in this case with 5.004_05 or 5.005+.
I'm running perl5.00502 and I still get these warnings???
Who is right? </META>
The script below will print a warning like the one above; moreover, it will print the whole script as part of the warning message:
  #!/usr/bin/perl -w
  use strict;
  print "Content-type: text/html\n\n";
  print "Hello $undefined";
The warning:
Global symbol "$undefined" requires explicit package name at /usr/apps/foo/cgi/tmp.pl line 4. eval 'package Apache::ROOT::perl::tmp_2epl;use Apache qw(exit);sub handler { #line 1 /usr/apps/foo/cgi/tmp.pl BEGIN {$^W = 1;}#!/usr/bin/perl -w use strict; print "Content-type: text/html\\n\\n"; print "Hello $undefined"; } ;' called at /usr/apps/lib/perl5/site_perl/5.005/aix/Apache/Registry.pm line 168 Apache::Registry::compile('package Apache::ROOT::perl::tmp_2epl;use Apache qw(exit);sub han...') called at /usr/apps/lib/perl5/site_perl/5.005/aix/Apache/Registry.pm line 121 Apache::Registry::handler('Apache=SCALAR(0x205026c0)') called at /usr/apps/foo/cgi/tmp.pl line 4 eval {...} called at /usr/apps/foo/cgi/tmp.pl line 4 [Sun Nov 15 15:15:30 1998] [error] Undefined subroutine &Apache::ROOT::perl::tmp_2epl::handler called at / usr/apps/lib/perl5/site_perl/5.005/aix/Apache/Registry.pm line 135. [Sun Nov 15 15:15:30 1998] [error] Goto undefined subroutine &Apache::Constants::SERVER_ERROR at /usr/apps /lib/perl5/site_perl/5.005/aix/Apache/Constants.pm line 23.
The error is simple to fix. When you use the use strict; pragma (and you should...), all variables must be declared before being used.
The bad thing is that sometimes the whole script (possibly thousands of lines) is printed to the error_log file as the code that the server has tried to eval()uate.
As Doug answered to this question:
Looks like you have a $SIG{__DIE__} handler installed (Carp::confess?). That's what's expected if so.
It wasn't in my case, but it may be in yours.
Bryan Miller said:
You might wish to try something more terse such as "local $SIG{__WARN__} = \&Carp::cluck;" The confess method is _very_ verbose and will tell you more than you might wish to know including full source.
Can't undef active subroutine at /usr/apps/lib/perl5/site_perl/5.005/aix/Apache/Registry.pm line 102. Called from package Apache::Registry, filename /usr/apps/lib/perl5/site_perl/5.005/aix/Apache/Registry.pm, line 102
This problem is caused when a client drops the connection while httpd is in the middle of a write. httpd times out, sending a SIGPIPE, and Perl in that child is stuck in the middle of its eval context. This is fixed by the Apache::SIG module, which is used by default. It should not happen unless you have code that is messing with $SIG{PIPE}. It is also triggered only when you have changed your script on disk and mod_perl is trying to reload it.
Your code includes some undefined variable that you have used as if it was already defined and initialized. For example:
$param = $q->param('test');
print $param;
vs.
$param = $q->param('test') || '';
print $param;
In the second case, $param will always be defined, whether $q->param('test') returns a value or undef.
Also read about finding the line number the error/warning has been triggered at.
See Names collisions with Modules and libs.
Check that all your modules are compiled with the same perl that was built into mod_perl. Perl 5.005 and 5.004 are not binary compatible by default.
Other known causes of this problem: OS distributions that ship with a (broken) binary Perl installation, where the `perl' program and `libperl.a' library are somehow built with different binary compatibility flags.
The solution to these problems is to rebuild Perl and any extension modules from a fresh source tree. Tip for running Perl's Configure script: use the `-des' flags to accept defaults and the `-D' flag to override certain attributes:
% ./Configure -des -Dcc=gcc ... && make test && make install
Read Perl's INSTALL doc for more details.
Solaris OS specific:
``Can't load DBI'' or a similar error for the IO module, or whatever dynamic module mod_perl tries to pull in first. The solution is to re-configure, re-build and re-install Perl and the dynamic modules with the following flags when Configure asks for ``additional LD flags'': -Xlinker --export-dynamic or
-Xlinker -E
This problem is only known to be caused by installing gnu ld under Solaris.
See Out_of_memory!
I've just discovered that my server is not responding and its error log has filled up the remaining space on the file system (about a gig's worth). The error_log includes millions of lines:
Callback called exit at -e line 33, <HTML> chunk 1.
Why the looping?
Perl can get *very* confused inside an endless loop in your code. It doesn't mean your code called 'exit': Perl's malloc went haywire and called croak(), but no memory is left to properly report the error, so Perl gets stuck in a loop writing that same message to stderr.
Perl 5.005+ is recommended for its improved malloc.c and the features mentioned in mod_perl_traps.pod, which are on by default.
If something goes really wrong with your code, Perl may die with an ``Out of memory!'' message and/or ``Callback called exit''. Common causes of this are never-ending loops, deep recursion, or calling an undefined subroutine. Here's one way to catch the problem: See Perl's INSTALL document for this item:
-DPERL_EMERGENCY_SBRK
If PERL_EMERGENCY_SBRK is defined, running out of memory need not be a fatal error: a memory pool can be allocated by assigning to the special variable $^M. See perlvar(1) for more details.
If you compile with that option and add 'use Apache::Debug level => 4;' to your PerlScript, it will allocate the $^M emergency pool and the $SIG{__DIE__} handler will call Carp::confess, giving you a stack trace which should reveal where the problem is. See the Apache::Resource module for prevention of spinning httpds.
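As a sketch, the pool can also be pre-allocated by hand in the startup file, as perlvar documents (again, this is only effective if Perl was built with -DPERL_EMERGENCY_SBRK):

# startup.pl (sketch): pre-allocate a 64K emergency pool for Perl's malloc
$^M = 'a' x (1 << 16);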
From mod_perl.pod: With Apache versions 1.3.0 and higher, mod_perl will call the perl_destruct() Perl API function during the child exit phase. This will cause proper execution of END blocks found during server startup, along with invoking the DESTROY method on global objects which are still alive. It is possible that this operation may take a long time to finish, causing problems during a restart. If your code does not contain any END blocks or DESTROY methods which need to be run during child server shutdown, this destruction can be avoided by setting the PERL_DESTRUCT_LEVEL environment variable to -1.
RegistryLoader: Cannot translate the URI /home/httpd/perl/test.pl into a real path to the filename. Please refer to the manpage for more information or use the complete method's call like: $r->handler(uri,filename);\n";
This warning shows up when RegistryLoader fails to translate the URI into the corresponding filesystem path. Most failures happen when one passes a file path instead of a URI. (A reminder: /home/httpd/perl/test.pl is a file path, while /perl/test.pl is a URI.) In most cases all you have to do is pass what RegistryLoader expects to get - the URI - but there are more complex cases. RegistryLoader's man page shows how to handle these cases as well (watch for the trans() sub).
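For example, a startup file fragment that preloads a script by URI; the paths here are assumptions for illustration, and the second argument is only needed when the URI cannot be translated automatically:

use Apache::RegistryLoader ();

my $rl = Apache::RegistryLoader->new;

# preload /perl/test.pl; pass the filename explicitly if translation fails
$rl->handler("/perl/test.pl", "/home/httpd/perl/test.pl");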
Unfortunately, not all Perl modules are robust enough to survive a reload - for them it is an unusual situation. PerlFreshRestart does not do much more than:
while (my($k,$v) = each %INC) {
    delete $INC{$k};
    require $k;
}
Besides that, it flushes the Apache::Registry cache, and empties any dynamic stacked handlers (e.g. PerlChildInitHandler).
Lots of segfaults and other problems were reported by users who turned PerlFreshRestart On. Most of them went away when it was turned off. That doesn't mean you shouldn't use it, if it works for you. Just be aware of the dragons...
See Choosing MaxClients.
syntax error at /dev/null line 1, near "line arguments:"
Execution of /dev/null aborted due to compilation errors.
parse: Undefined error: 0
There is a chance that your /dev/null device is broken. Try:
% sudo echo > /dev/null
Let's state it: your site or service can easily become a target for Internet ``terrorists'', because of something you said, because of the success your site has, or for no obvious reason at all. Your site can be broken into, all the data can be deleted, important information can be stolen, or - much easier - the site can be rendered useless with a simple Denial of Service attack. What can you do about it? Protect yourself! Whatever you do, your site will still be vulnerable as long as you are connected to the network. Cut the connections, turn off your machine and put it into a safe - now it is better protected, but quite useless.
Let's first get acquainted with security related terminology:
When you want to make sure that a user is who she claims to be, you generally ask her for a username and a password. Once you have both, you can check them against your database of username/password pairs, and if they match you know the user has passed the Authentication stage. From this moment on, if you keep the session open, all you need to remember is the username.
You might want to let user foo access some resource, but restrict her from accessing another resource, which in turn is accessible only to user bar. The process of checking access rights is called Authorization. For Authorization all you need is a username or some other attribute to authorize upon. For example, you can authorize upon IP address, allowing only your local users to use some service. Be warned that IP addresses and session_ids can be spoofed, which is why you should not do Authorization without Authentication.
Actually you have been familiar with both terms for a long time: when you telnet to your account on some machine you go through a login process (Authentication), and when you try to read some file on your file system the kernel first checks the permissions on it (Authorization). That is why you may also hear about Access control, which is another name for the same thing.
I am going to present some real world security requirements and their implementations.
If you run an Extranet (very similar to an Intranet, but partly accessible from the outside, e.g. read only), you might want to give your internal users unrestricted access to your web server, but if these users connect from outside your organization, you want to make sure they really are your employees.
The first requirement is achieved very simply, using the IP patterns of the organization in a Perl Access Handler in an .htaccess file, which consequently sets the REMOTE_USER environment variable to a generic organization username, so that certain scripts which rely on the REMOTE_USER environment variable will work properly.
The second requirement is to detect that the IP comes from the outside, in which case the user should be authenticated before being allowed in.
As you have understood, this is a pure authentication problem. Once the user passes the authentication stage, either by bypassing it because of his IP address or by entering a correct login/password pair, the REMOTE_USER variable is set. Now that we have it set we can talk about authorization.
OK, let's see the implementation. First we modify httpd.conf:
PerlModule My::Auth

<Location /private>
  PerlAccessHandler My::Auth::access_handler
  PerlSetVar Intranet "100.100.100.1 => userA, 100.100.100.2 => userB"
  PerlAuthenHandler My::Auth::authen_handler
  AuthName realm
  AuthType Basic
  Require valid-user
</Location>
Now the code of My/Auth.pm:
sub access_handler {
    my $r = shift;

    unless ($r->some_auth_required) {
        $r->log_reason("No authentication has been configured");
        return FORBIDDEN;
    }

    # get list of IP addresses
    my %ips = split /\s*(?:=>|,)\s*/, $r->dir_config("Intranet");

    if (my $user = $ips{$r->connection->remote_ip}) {
        # update connection record
        $r->connection->user($user);
        # do not ask for a password
        $r->set_handlers(PerlAuthenHandler => [\&OK]);
    }
    return OK;
}

sub authen_handler {
    my $r = shift;

    # get user's authentication credentials
    my ($res, $sent_pw) = $r->get_basic_auth_pw;
    return $res if $res != OK;
    my $user = $r->connection->user;

    # authenticate through DBI
    my $reason = authen_dbi($r, $user, $sent_pw, $niveau);

    if ($reason) {
        $r->note_basic_auth_failure;
        $r->log_reason($reason, $r->uri);
        return AUTH_REQUIRED;
    }
    return OK;
}

sub authen_dbi {
    my ($r, $user, $sent_pw, $niveau) = @_;

    # validate username/passwd
    return 0 if (*PASSED*);
    return "Failed for X reason";
}
Either implement your own authen_dbi() routine, or replace authen_handler() with an existing authentication handler such as Apache::AuthenDBI.
access_handler() sets REMOTE_USER to either userA or userB according to the IP address. If no IP matched, the PerlAuthenHandler will not be set to \&OK, and the next Authentication stage will ask the user for a login and password.
To force an authenticated user to reauthenticate, just send the following header to the browser:
WWW-Authenticate: Basic realm="My Realm"
HTTP/1.0 401 Unauthorized
This will pop up (in Netscape at least) a window saying ``Authorization Failed. Retry?'' with OK and Cancel buttons. When that window pops up you know that the password has been cleared. If the user hits the Cancel button the username will also be cleared, and you can output a page after the header if this is a CGI script (or PHP, or anything else like that). If the user hits the OK button, the authentication window will be brought up again with the previous username already in the username field.
In the Perl API you would use the note_basic_auth_failure() method to force reauthentication.
Since the browser's behaviour is in no way guaranteed, this may not work either, and that should be noted.
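A minimal sketch of the Perl API variant inside an authentication handler:

# refuse the credentials and make the browser prompt for them again
$r->note_basic_auth_failure;
return AUTH_REQUIRED;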
When your authentication handler returns OK, it means that the user has authenticated correctly, and from now on $r->connection->user will have the username set for all subsequent requests. For Apache::Registry and friends, where environment variable setting wasn't turned off, an equivalent $ENV{REMOTE_USER} variable will be available. The password is available only through the Perl API, with the help of the get_basic_auth_pw() method.
If there is a failure, the returned AUTH_REQUIRED flag will tell the browser to pop up an authentication window so the user can try again, unless it's the first time. For example:
my($status, $sent_pw) = $r->get_basic_auth_pw;
unless ($r->connection->user and $sent_pw) {
    $r->note_basic_auth_failure;
    $r->log_reason("Both a username and password must be provided");
    return AUTH_REQUIRED;
}
Let's say that you have a mod_perl authentication handler, where user credentials are checked against a database, and it returns either OK or AUTH_REQUIRED. One possible authentication failure case is when the username/password are correct, but the user's account has been suspended temporarily.
If this is the case you would like to make the user aware of it by displaying a page, instead of having the browser pop up the authentication dialog again. At the same time you need to refuse the authentication, of course.
The solution is to return FORBIDDEN, but before that you should set a custom error page for this specific handler, with the help of $r->custom_response. Something like:
use Apache::Constants qw(:common);

$r->custom_response(FORBIDDEN, "/errors/suspended_account.html");
return FORBIDDEN if $suspended;
Nowadays millions of users surf the Internet, and there are millions of terabytes of data lying around. To manipulate that data, new smart techniques and technologies were invented. One of the major inventions was the relational database, which allows us to search and modify huge data stores in a very short time. We use SQL (Structured Query Language) to manipulate the contents of these databases. Of course, once we started to use the web, we found the need to write web interfaces to these databases, and CGI was and is the most widely used technology for building such interfaces. The main limitation of a CGI script driving a database application is its statelessness: on every request the CGI script has to initiate a connection to the database, and when the request is completed the connection is lost. Apache::DBI was written to remove this limitation. When you use it, you have a persistent database connection over the process' life. This is possible only with CGI scripts running under a mod_perl enabled server, since the child process does not quit when the request has been served. So when a mod_perl script needs to talk to a database, it starts talking right away, without first initiating a database connection; Apache::DBI takes care of providing a valid connection immediately. Of course there are more nuances, which will be discussed in the following sections.
This module initiates a persistent database connection. It is possible only with mod_perl.
When the DBI module is loaded (do not confuse this with the Apache::DBI module), it checks whether the environment variable GATEWAY_INTERFACE starts with 'CGI-Perl' and whether the module Apache::DBI has been loaded. In this case every connect request will be forwarded to the Apache::DBI module. Apache::DBI then checks whether a database handle from a previous connect request is already stored, and whether this handle is still valid, using the ping method. If these two conditions are fulfilled it just returns the database handle. If there is no appropriate database handle or if the ping method fails, a new connection is established and the handle is stored for later re-use. In other words, when the script is run again by a child that has already connected (and is still connected), the host/username/password is checked against the cache of open connections, and if one is available it is used. There is no need to delete the disconnect statements from your code - they won't do anything, because the Apache::DBI module overloads the disconnect method with a NOP (an empty call).
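In other words, an ordinary DBI script needs no changes at all. A sketch (the DSN, table and credentials are made up for illustration):

#!/usr/bin/perl -w
# a plain DBI script: under mod_perl with Apache::DBI loaded, connect()
# below is served from the per-child cache and disconnect() is a no-op
use strict;
use DBI ();

my $dbh = DBI->connect("DBI:mysql:myDB::myserver", "username", "passwd",
                       { RaiseError => 1, AutoCommit => 1 })
    or die "Cannot connect: $DBI::errstr";

my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM users");

print "Content-type: text/plain\n\n";
print "users: $count\n";

$dbh->disconnect;   # overloaded by Apache::DBI, silently ignored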
You want to use this module if you are opening only a few DB connections to the server. Apache::DBI will make them persistent per child, so if you have 10 children and each opens 2 different connections you will have 20 open persistent connections in total. Thus after the initial connect you save the connection time for every connect request from your DBI module - a huge benefit for a mod_perl apache server with high traffic of users deploying a relational DB.
As you understand, you must NOT use this module if you are opening a separate connection for each of your users, since each of them will stay persistent and in a short time the number of connections will be so big that your machine will scream and die. If you want to use Apache::DBI in both situations, currently the only available solution is to run two mod_perl servers, one using Apache::DBI and one not.
After installing this module, the configuration is simple - add the following directive to httpd.conf:
PerlModule Apache::DBI
Note that it is important to load this module before any other ApacheDBI module!
You can skip preloading DBI, since Apache::DBI does that. But there is no harm if you leave it in.
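For example, a httpd.conf fragment respecting that ordering might look like this (Apache::AuthDBI and the startup.pl path are just for illustration):

# Apache::DBI must be loaded first, so it can overload DBI's connect()
PerlModule Apache::DBI
PerlModule Apache::AuthDBI
PerlRequire /usr/local/apache/conf/startup.pl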
If you want the connection to be already pre-opened when your script is first called after a server restart, you should use the connect_on_init() method in the startup file to preload every connection you are going to use. For example:
Apache::DBI->connect_on_init("DBI:mysql:myDB::myserver", "username", "passwd",
   {
    PrintError => 1, # warn() on errors
    RaiseError => 0, # don't die on error
    AutoCommit => 1, # commit executes immediately
   }
  );
As noted before, it is wise to use this method only if you want all of Apache to be able to connect to the database server as one user (or a few users).
Be warned though, that if your database is down, Apache children will be delayed trying to connect_on_init(). They won't begin serving requests until they are either logged in or the login attempt fails (which can take several minutes, depending on your DBD driver).
If you are not sure this module is working as advertised, you should enable the Debug mode in the startup script by:
$Apache::DBI::DEBUG = 1;
From now on you will see in the error_log file when Apache::DBI initializes a connection and when it just returns one from its cache. Use the following command to watch it in real time (your error_log file might be located at a different path):
tail -f /usr/local/apache/logs/error_log
I use an alias (in tcsh) so I do not have to remember the path:
alias err "tail -f /usr/local/apache/logs/error_log"
Another approach is to add to httpd.conf (which does the same):
PerlModule Apache::DebugDBI
SQL server keeps the connection to the client open for a limited period of time. Many developers were bitten by the so-called Morning bug, when every morning the first users to use the site would receive a:
No Data Returned
message, but then everything worked as usual. The error was caused by Apache::DBI returning a handle for an invalid connection (the server had closed it because of the timeout), and the script dying on that error. The infamous, and well documented in the man page, ping() method was introduced to solve this problem, but it seems that people are still being bitten by it. Another solution was found - to raise the timeout parameter at SQL server startup. Currently I start the mySQL server with the safe_mysql script, so I have updated it to use this option:
nohup $ledir/mysqld [snipped other options] -O wait_timeout=172800
Where 172800 seconds is equal to 48 hours. And it works.
Note that starting from version 0.82, Apache::DBI implements ping() inside an eval block, so if the handle has timed out it should reconnect without triggering the morning bug.
Q: Currently I am running into a problem where I found out that Apache::DBI insists that the connection be opened in exactly the same way before it will decide to use a cached connection. I.e. if I have one script that sets LongReadLen and one that does not, there will be two different connections. Instead of having a maximum of 40 open connections, I end up with 80.
A: Indeed, Apache::DBI returns a cached database handle if and only if all parameters, including all options, are identical. But you are free to modify the handle right after you get it from the cache. Initiate the connections always using the same parameters and set LongReadLen afterwards.
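A minimal sketch of that approach (the connect parameters are assumptions):

# always connect with identical parameters, so Apache::DBI can share the handle
my $dbh = DBI->connect("DBI:mysql:myDB::myserver", "username", "passwd",
                       { RaiseError => 1, AutoCommit => 1 })
    or die "Cannot connect: $DBI::errstr";

# per-script tuning goes on the handle *after* the connect
$dbh->{LongReadLen} = 64 * 1024;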
Q: I cannot find the handle name with which to manipulate my connection; hence I seem to be unable to do anything to my database.
A: You did not use DBI::connect() as in normal DBI usage to get your $dbh database handle.
Using Apache::DBI does not eliminate the need to write proper DBI code. As the man page states - you should program as if you were not using Apache::DBI at all. Apache::DBI will override DBI's connect and return you a cached connection, and a disconnect() call will simply be ignored.
Make sure you have it installed.
Make sure you configured mod_perl with EVERYTHING=1.
Use the example script eg/startup.pl (remove the comment from the line `#use Apache::DebugDBI;' and adapt the connect string). Do not change anything in your scripts in order to use Apache::DBI.
Does your error_log look like this?
10169 Apache::DBI PerlChildInitHandler
10169 Apache::DBI skipping connection cache during server startup
Database handle destroyed without explicit disconnect at /usr/lib/perl5/site_perl/5.005/Apache/DBI.pm line 29.
then you are trying to open a database connection in the parent httpd process. If you do, children will get a copy of this handle, causing clashes when the handle is used by two processes at the same time. Each child must have its own unique connection handle.
To avoid this problem, Apache::DBI checks whether it is called during server startup. If so, the module skips the connection cache and returns immediately without a database handle.
You have to use the Apache::DBI->connect_on_init() method in the startup file instead.
Since many mod_perl developers use mysql as their preferred engine, these notes explain the difference between mysql_use_result() and mysql_store_result(). The two influence the speed and size of the processes. The DBD::mysql documentation (version 2.0217) includes the following snippet:
mysql_use_result attribute: This forces the driver to use mysql_use_result rather than mysql_store_result. The former is faster and less memory consuming, but tends to block other processes. (That's why mysql_store_result is the default.)
Think about it in client/server terms. When you ask the server to spoon-feed you the data as you use it, the server process must buffer the data, tie up that thread, and possibly keep any database locks open for a much longer time. So if you read a row of data and ponder it for a while, the tables you have locked are still locked, and the server is busy talking to you every so often. That is mysql_use_result().
If you just suck down the whole dataset to the client, then the server is free to go about its business serving other requests. This results in parallelism, since the server and client are doing work at the same time, rather than blocking on each other doing frequent I/O. That is mysql_store_result().
As the mysql manual suggests: you should not use mysql_use_result() if you are doing a lot of processing for each row on the client side. This will tie up the server and prevent other threads from updating the tables from which the data is being fetched.
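If you do want the streaming behaviour for a particular query, DBD::mysql lets you request it per statement. A sketch (the table is made up; check your DBD::mysql version's documentation for this attribute):

# ask DBD::mysql to stream rows instead of buffering the whole result set
my $sth = $dbh->prepare("SELECT id, email FROM users",
                        { mysql_use_result => 1 });
$sth->execute;
while (my ($id, $email) = $sth->fetchrow_array) {
    # keep the per-row work short - the table stays busy until we finish
}
$sth->finish;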
In this section you will find scripts, modules and code snippets to help you get started using relational databases with mod_perl scripts. Note that I work with mysql ( http://www.mysql.com ), so the code you will find here works out of the box with mysql; if you use some other SQL engine, it might work for you as well, or some changes may be needed.
Having to write many queries in my CGI scripts made me write a stand-alone module that saves me a lot of time in writing and debugging my code. It also makes my scripts much smaller and easier to read. I will present the module here, with examples following:
Notice the DESTROY block at the end of the module, which performs various cleanups and allows this module to be used under mod_cgi as well.
(note that you will not find it on CPAN)
package My::DB; use strict; use 5.004; use DBI; use vars qw(%c); %c = ( # DB debug #db_debug => 1, db_debug => 0, db => { DB_NAME => 'foo', SERVER => 'localhost', USER => 'put_username_here', USER_PASSWD => 'put_passwd_here', }, ); use Carp qw(croak verbose); #local $SIG{__WARN__} = \&Carp::cluck; # untaint the path by explicit setting local $ENV{PATH} = '/bin:/usr/bin'; ####### sub new { my $proto = shift; my $class = ref($proto) || $proto; my $self = {}; # connect to the DB, Apache::DBI worries to cache the connections # save into a dbh - Database handle object $self->{dbh} = DBI->connect("DBI:mysql:$c{db}{DB_NAME}::$c{db}{SERVER}", $c{db}{USER}, $c{db}{USER_PASSWD}, { PrintError => 1, # warn() on errors RaiseError => 0, # don't die on error AutoCommit => 1, # commit executes immediately } ) or DBI->disconnect("Cannot connect to database: $DBI::errstr\n"); # we want to die on errors if in debug mode $self->{dbh}->{RaiseError} = 1 if $c{'db_debug'}; # init the sth - Statement handle object $self->{sth} = ''; bless ($self, $class); $self; } # end of sub new ###################################################################### ################################### ### ### ### SQL Functions ### ### ### ################################### ###################################################################### # print debug messages sub d{ # we want to print in debug mode the trace print "<DT><B>".join("<BR>", @_)."</B>\n" if $c{'db_debug'}; } # end of sub d ###################################################################### # return a count of matched rows, by conditions # # $count = sql_count_matched($table_name,\@conditions); # # conditions must be an array so we can path more than one column with # the same name. # # @conditions = ( column => ['comp_sign','value'], # foo => ['>',15], # foo => ['<',30], # ); # # The sub knows automatically to detect and quote strings # ########################## sub sql_count_matched{ my $self = shift; my $table = shift || ''; my $r_conds = shift || []; # we want to print in debug mode the trace d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]"); # build the query my $do_sql = "SELECT COUNT(*) FROM $table "; my @where = (); for(my $i=0;$i<@{$r_conds};$i=$i+2) { push @where, join " ", $$r_conds[$i], $$r_conds[$i+1][0], sql_quote(sql_escape($$r_conds[$i+1][1])); } # Add the where clause if we have one $do_sql .= "WHERE ". join " AND ", @where if @where; d("SQL: $do_sql"); # do query $self->{sth} = $self->{dbh}->prepare($do_sql); $self->{sth}->execute(); my ($count) = $self->{sth}->fetchrow_array; d("Result: $count"); $self->{sth}->finish; return $count; } # end of sub sql_count_matched ###################################################################### # return a single (first) matched value or undef, by conditions and # restrictions # # sql_get_matched_value($table_name,$column,\@conditions,\@restrictions); # # column is a name of the column # # conditions must be an array so we can path more than one column with # the same name. # @conditions = ( column => ['comp_sign','value'], # foo => ['>',15], # foo => ['<',30], # ); # The sub knows automatically to detect and quote strings # # restrictions is a list of restrictions like ('order by email') # ########################## sub sql_get_matched_value{ my $self = shift; my $table = shift || ''; my $column = shift || ''; my $r_conds = shift || []; my $r_restr = shift || []; # we want to print in debug mode the trace d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". 
(caller(0))[3]."]"); # build the query my $do_sql = "SELECT $column FROM $table "; my @where = (); for(my $i=0;$i<@{$r_conds};$i=$i+2) { push @where, join " ", $$r_conds[$i], $$r_conds[$i+1][0], sql_quote(sql_escape($$r_conds[$i+1][1])); } # Add the where clause if we have one $do_sql .= " WHERE ". join " AND ", @where if @where; # restrictions (DONT put commas!) $do_sql .= " ". join " ", @{$r_restr} if @{$r_restr}; d("SQL: $do_sql"); # do query return $self->{dbh}->selectrow_array($do_sql); } # end of sub sql_get_matched_value ###################################################################### # return a single row of first matched rows, by conditions and # restrictions. The row is being inserted into @results_row array # (value1,value2,...) or empty () if non matched # # sql_get_matched_row(\@results_row,$table_name,\@columns,\@conditions,\@restrictions); # # columns is a list of columns to be returned (username, fname,...) # # conditions must be an array so we can path more than one column with # the same name. # @conditions = ( column => ['comp_sign','value'], # foo => ['>',15], # foo => ['<',30], # ); # The sub knows automatically to detect and quote strings # # restrictions is a list of restrictions like ('order by email') # ########################## sub sql_get_matched_row{ my $self = shift; my $r_row = shift || {}; my $table = shift || ''; my $r_cols = shift || []; my $r_conds = shift || []; my $r_restr = shift || []; # we want to print in debug mode the trace d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]"); # build the query my $do_sql = "SELECT "; $do_sql .= join ",", @{$r_cols} if @{$r_cols}; $do_sql .= " FROM $table "; my @where = (); for(my $i=0;$i<@{$r_conds};$i=$i+2) { push @where, join " ", $$r_conds[$i], $$r_conds[$i+1][0], sql_quote(sql_escape($$r_conds[$i+1][1])); } # Add the where clause if we have one $do_sql .= " WHERE ". join " AND ", @where if @where; # restrictions (DONT put commas!) $do_sql .= " ". join " ", @{$r_restr} if @{$r_restr}; d("SQL: $do_sql"); # do query @{$r_row} = $self->{dbh}->selectrow_array($do_sql); } # end of sub sql_get_matched_row ###################################################################### # return a ref to hash of single matched row, by conditions # and restrictions. return undef if nothing matched. # (column1 => value1, column2 => value2) or empty () if non matched # # sql_get_hash_ref($table_name,\@columns,\@conditions,\@restrictions); # # columns is a list of columns to be returned (username, fname,...) # # conditions must be an array so we can path more than one column with # the same name. # @conditions = ( column => ['comp_sign','value'], # foo => ['>',15], # foo => ['<',30], # ); # The sub knows automatically to detect and quote strings # # restrictions is a list of restrictions like ('order by email') # ########################## sub sql_get_hash_ref{ my $self = shift; my $table = shift || ''; my $r_cols = shift || []; my $r_conds = shift || []; my $r_restr = shift || []; # we want to print in debug mode the trace d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]"); # build the query my $do_sql = "SELECT "; $do_sql .= join ",", @{$r_cols} if @{$r_cols}; $do_sql .= " FROM $table "; my @where = (); for(my $i=0;$i<@{$r_conds};$i=$i+2) { push @where, join " ", $$r_conds[$i], $$r_conds[$i+1][0], sql_quote(sql_escape($$r_conds[$i+1][1])); } # Add the where clause if we have one $do_sql .= " WHERE ". join " AND ", @where if @where; # restrictions (DONT put commas!) $do_sql .= " ". 
join " ", @{$r_restr} if @{$r_restr}; d("SQL: $do_sql"); # do query $self->{sth} = $self->{dbh}->prepare($do_sql); $self->{sth}->execute(); return $self->{sth}->fetchrow_hashref; } # end of sub sql_get_hash_ref ###################################################################### # returns a reference to an array, matched by conditions and # restrictions, which contains one reference to array per row. If # there are no rows to return, returns a reference to an empty array: # [ # [array1], # ...... # [arrayN], # ]; # # $ref = sql_get_matched_rows_ary_ref($table_name,\@columns,\@conditions,\@restrictions); # # columns is a list of columns to be returned (username, fname,...) # # conditions must be an array so we can path more than one column with # the same name. @conditions are being cancatenated with AND # @conditions = ( column => ['comp_sign','value'], # foo => ['>',15], # foo => ['<',30], # ); # results in # WHERE foo > 15 AND foo < 30 # # to make an OR logic use (then ANDed ) # @conditions = ( column => ['comp_sign',['value1','value2']], # foo => ['=',[15,24] ], # bar => ['=',[16,21] ], # ); # results in # WHERE (foo = 15 OR foo = 24) AND (bar = 16 OR bar = 21) # # The sub knows automatically to detect and quote strings # # restrictions is a list of restrictions like ('order by email') # ########################## sub sql_get_matched_rows_ary_ref{ my $self = shift; my $table = shift || ''; my $r_cols = shift || []; my $r_conds = shift || []; my $r_restr = shift || []; # we want to print in debug mode the trace d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]"); # build the query my $do_sql = "SELECT "; $do_sql .= join ",", @{$r_cols} if @{$r_cols}; $do_sql .= " FROM $table "; my @where = (); for(my $i=0;$i<@{$r_conds};$i=$i+2) { if (ref $$r_conds[$i+1][1] eq 'ARRAY') { # multi condition for the same field/comparator to be ORed push @where, map {"($_)"} join " OR ", map { join " ", $r_conds->[$i], $r_conds->[$i+1][0], sql_quote(sql_escape($_)); } @{$r_conds->[$i+1][1]}; } else { # single condition for the same field/comparator push @where, join " ", $r_conds->[$i], $r_conds->[$i+1][0], sql_quote(sql_escape($r_conds->[$i+1][1])); } } # end of for(my $i=0;$i<@{$r_conds};$i=$i+2 # Add the where clause if we have one $do_sql .= " WHERE ". join " AND ", @where if @where; # restrictions (DONT put commas!) $do_sql .= " ". join " ", @{$r_restr} if @{$r_restr}; d("SQL: $do_sql"); # do query return $self->{dbh}->selectall_arrayref($do_sql); } # end of sub sql_get_matched_rows_ary_ref ###################################################################### # insert a single row into a DB # # sql_insert_row($table_name,\%data,$delayed); # # data is hash of type (column1 => value1 ,column2 => value2 , ) # # $delayed: 1 => do delayed insert, 0 or none passed => immediate # # * The sub knows automatically to detect and quote strings # # * The insert id delayed, so the user will not wait untill the insert # will be completed, if many select queries are running # ########################## sub sql_insert_row{ my $self = shift; my $table = shift || ''; my $r_data = shift || {}; my $delayed = (shift) ? 'DELAYED' : ''; # we want to print in debug mode the trace d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". 
(caller(0))[3]."]"); # build the query my $do_sql = "INSERT $delayed INTO $table "; $do_sql .= "(".join(",",keys %{$r_data}).")"; $do_sql .= " VALUES ("; $do_sql .= join ",", sql_quote(sql_escape( values %{$r_data} ) ); $do_sql .= ")"; d("SQL: $do_sql"); # do query $self->{sth} = $self->{dbh}->prepare($do_sql); $self->{sth}->execute(); } # end of sub sql_insert_row ###################################################################### # update rows in a DB by condition # # sql_update_rows($table_name,\%data,\@conditions,$delayed); # # data is hash of type (column1 => value1 ,column2 => value2 , ) # # conditions must be an array so we can path more than one column with # the same name. # @conditions = ( column => ['comp_sign','value'], # foo => ['>',15], # foo => ['<',30], # ); # # $delayed: 1 => do delayed insert, 0 or none passed => immediate # # * The sub knows automatically to detect and quote strings # # ########################## sub sql_update_rows{ my $self = shift; my $table = shift || ''; my $r_data = shift || {}; my $r_conds = shift || []; my $delayed = (shift) ? 'LOW_PRIORITY' : ''; # we want to print in debug mode the trace d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]"); # build the query my $do_sql = "UPDATE $delayed $table SET "; $do_sql .= join ",", map { "$_=".join "",sql_quote(sql_escape($$r_data{$_})) } keys %{$r_data}; my @where = (); for(my $i=0;$i<@{$r_conds};$i=$i+2) { push @where, join " ", $$r_conds[$i], $$r_conds[$i+1][0], sql_quote(sql_escape($$r_conds[$i+1][1])); } # Add the where clause if we have one $do_sql .= " WHERE ". join " AND ", @where if @where; d("SQL: $do_sql"); # do query $self->{sth} = $self->{dbh}->prepare($do_sql); $self->{sth}->execute(); # my ($count) = $self->{sth}->fetchrow_array; # # d("Result: $count"); } # end of sub sql_update_rows ###################################################################### # delete rows from DB by condition # # sql_delete_rows($table_name,\@conditions); # # conditions must be an array so we can path more than one column with # the same name. # @conditions = ( column => ['comp_sign','value'], # foo => ['>',15], # foo => ['<',30], # ); # # * The sub knows automatically to detect and quote strings # # ########################## sub sql_delete_rows{ my $self = shift; my $table = shift || ''; my $r_conds = shift || []; # we want to print in debug mode the trace d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]"); # build the query my $do_sql = "DELETE FROM $table "; my @where = (); for(my $i=0;$i<@{$r_conds};$i=$i+2) { push @where, join " ", $$r_conds[$i], $$r_conds[$i+1][0], sql_quote(sql_escape($$r_conds[$i+1][1])); } # Must be very carefull with deletes, imagine somehow @where is # not getting set, "DELETE FROM NAME" deletes the contents of the table warn("Attempt to delete a whole table $table from DB\n!!!"),return unless @where; # Add the where clause if we have one $do_sql .= " WHERE ". join " AND ", @where; d("SQL: $do_sql"); # do query $self->{sth} = $self->{dbh}->prepare($do_sql); $self->{sth}->execute(); } # end of sub sql_delete_rows ###################################################################### # executes the passed query and returns a reference to an array which # contains one reference per row. If there are no rows to return, # returns a reference to an empty array. 
# # $r_array = sql_execute_and_get_r_array($query); # # ########################## sub sql_execute_and_get_r_array{ my $self = shift; my $do_sql = shift || ''; # we want to print in debug mode the trace d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]"); d("SQL: $do_sql"); $self->{dbh}->selectall_arrayref($do_sql); } # end of sub sql_execute_and_get_r_array # # # return current date formatted for a DATE field type # YYYYMMDD # ############ sub sql_date{ my $self = shift; my ($sec,$min,$hour,$mday,$mon,$year) = localtime(); $year = ($year>70) ? sprintf "19%0.2d",$year : sprintf "20%0.2d",$year; return sprintf "%0.4d%0.2d%0.2d",$year,++$mon,$mday; } # end of sub sql_date # # # return current date formatted for a DATE field type # YYYYMMDDHHMMSS # ############ sub sql_datetime{ my $self = shift; my ($sec,$min,$hour,$mday,$mon,$year) = localtime(); $year = ($year>70) ? sprintf "19%0.2d",$year : sprintf "20%0.2d",$year; return sprintf "%0.4d%0.2d%0.2d%0.2d%0.2d%0.2d",$year,++$mon,$mday,$hour,$min,$sec; } # end of sub sql_datetime # Quote the list of parameters , alldigits parameters are unquoted (int) # print sql_quote("one",2,"three"); => 'one' 2 'three' ############# sub sql_quote{ map{ /^(\d+|NULL)$/ ? $_ : "\'$_\'" } @_ } # Escape the list of parameters (all unsafe chars like ",' are escaped ) # must make a copy of @_ since we might try to change the passed # (Modification of a read-only value attempted) ############## sub sql_escape{ my @a = @_; map { s/([\'])/\\$1/g;$_} @a } # DESTROY makes all kinds of cleanups if the fuctions were interuppted # before their completion and haven't had a chance to make a clean up. ########### sub DESTROY{ my $self = shift; $self->{sth}->finish if defined $self->{sth} and $self->{sth}; $self->{dbh}->disconnect if defined $self->{dbh} and $self->{dbh}; } # end of sub DESTROY # Don't remove 1;
In your code that wants to use My::DB, you have to create a My::DB object first:
use vars qw($db_obj);
my $db_obj = new My::DB or croak "Can't initialize My::DB object: $!\n";
From this moment on, you can use any of My::DB's methods. I will start with a very simple query - I want to know where the users are and produce statistics. tracker is the name of the table.
# fetch the statistics of where users are
my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
  ("tracker",
   [qw(where_user_are)],
  );

my %stats = ();
my $total = 0;
foreach my $r_row (@$r_ary){
  $stats{$r_row->[0]}++;
  $total++;
}
Now let's count how many users we have (in the users table):
my $count = $db_obj->sql_count_matched("users");
Check whether user exists:
my $username = 'stas';
my $exists = $db_obj->sql_count_matched
  ("users",
   [username => ["=", $username]]
  );
Check whether the user is online, and get the time since she went online (since is a column in the tracker table telling since when the user has been online):
my @row = ();
$db_obj->sql_get_matched_row
  (\@row,
   "tracker",
   ['UNIX_TIMESTAMP(since)'],
   [username => ["=", $username]]
  );

if (@row) {
  my $idle = int( (time() - $row[0]) / 60);
  return "Current status: Is Online and idle for $idle minutes.";
}
A more complex query: I do a join of two tables, and want to get a reference to an array which will store a slice of the matched rows (LIMIT $offset,$hits), sorted by username, with each row in the array ref including the fields from the users table, but only those listed in @verbose_cols. Then we print it out.
my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
  ("tracker STRAIGHT_JOIN users",
   [map {"users.$_"} @verbose_cols],
   [],
   ["WHERE tracker.username=users.username",
    "ORDER BY users.username",
    "LIMIT $offset,$hits"],
  );

foreach my $r_row (@$r_ary){
  print ...
}
Another complex query: the user checks checkboxes to be queried by, selects from lists and types in match strings; we process the input and build the @where array. Then we want to get both the number of matches and the matched rows.
my @where = ();

# process checkboxes - we turn them into a REGEXP
foreach (keys %search_keys) {
  next unless defined $q->param($_) and $q->param($_);
  my $regexp = "[".join("",$q->param($_))."]";
  push @where, ($_ => ['REGEXP',$regexp]);
}

# Now add all the single answers, selected => exact match
push @where,(country => ['=',$q->param('country')])
  if $q->param('country');

# Now add all the typed params
foreach (qw(city state)) {
  push @where,($_ => ['LIKE',$q->param($_)]) if $q->param($_);
}

# Do the count-all-matched query
my $total_matched_users = $db_obj->sql_count_matched
  ("users",
   \@where,
  );

# Now process the orderby
my $orderby = $q->param('orderby') || 'username';

# Do the query and fetch the data
my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
  ("users",
   \@display_columns,
   \@where,
   ["ORDER BY $orderby",
    "LIMIT $offset,$hits"],
  );
sql_get_matched_rows_ary_ref knows how to handle both ORed and ANDed params. This example shows how to use OR on parameters:
This snippet is an implementation of a watchdog. Users register the usernames of the people they want to be notified about when they come online, so we have to make two queries: one to get the list of these usernames, and a second to query whether any of these users is online. In the second query we use the OR keyword.
# check who we are looking for
$r_ary = $db_obj->sql_get_matched_rows_ary_ref
  ("watchdog",
   [qw(watched)],
   [username => ['=', $username]],
  );

# put them into an array
my @watched = map {$_->[0]} @{$r_ary};

my %matched = ();

# Does the user have some registered usernames?
if (@watched) {
  # try to bring all the users who match (exactly) the usernames - put
  # them into an array and compare with a hash!
  $r_ary = $db_obj->sql_get_matched_rows_ary_ref
    ("tracker",
     [qw(username)],
     [username => ['=', \@watched]],
    );

  map {$matched{$_->[0]} = 1} @{$r_ary};
}

# Now %matched includes the usernames of the users who are being
# watched by $username and are currently online.
dbm files were among the first implementations of databases; they originated on Unix systems and are still used in many Unix applications where simple key-value pairs have to be stored and manipulated. As of this writing, Berkeley DB is the most powerful dbm implementation. If you need a light database with an easy API, this is a solution to consider first - but only if you are sure the DB you are going to use will stay small. I would say under 5000-10000 records, but it depends on your hardware, which can raise or lower those numbers. It is a much better solution than flat file databases, which become pretty slow on insert, update and delete operations when the number of records grows beyond a thousand. The situation is even worse when we need to sort such a DB.
dbm files are manipulated much faster than their flat file brothers, since the whole DB almost never has to be read into memory, and because of the smart storage technique. You can use the HASH algorithm, which gives O(1) search and update, fast insert and delete, but slow sort, since you have to do the sorting yourself. BTREE allows arbitrary key/value pairs to be stored in a sorted, balanced binary tree, which allows us to get a sorted sequence of data pairs in O(1), but with slower insert, update and delete operations. The RECNO algorithm is more complicated, and enables both fixed-length and variable-length flat text files to be manipulated using the same key/value pair interface as in HASH and BTREE; in this case the key consists of a record (line) number. Most likely you will want to use the HASH format, but the choice depends very much on your application.
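For example, a BTREE-tied hash hands its keys back in sorted order when traversed. A small sketch (the file name and contents are made up):

use DB_File;
use Fcntl;

# tie a BTREE dbm file and walk it in key order
my %db;
tie %db, 'DB_File', "/tmp/users.db", O_RDWR|O_CREAT, 0640, $DB_BTREE
    or die "Cannot tie /tmp/users.db: $!";

while (my ($user, $hits) = each %db) {
    print "$user => $hits\n";     # keys come back sorted
}

untie %db;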
dbm databases are not limited to storing key/value pairs; with the help of the MLDBM module they can store more complicated structures. MLDBM can dump and restore the whole symbol table of your script, including arrays, hashes and other complicated data structures.
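A minimal sketch of storing a nested structure with MLDBM (the serializer choice and the file name are assumptions):

use MLDBM qw(DB_File Storable);   # DB_File for storage, Storable to serialize
use Fcntl;

my %db;
tie %db, 'MLDBM', "/tmp/complex.db", O_RDWR|O_CREAT, 0640
    or die "Cannot tie /tmp/complex.db: $!";

# you must assign the whole value; modifying $db{stas}{hits} in place
# would not be written back to the dbm file
$db{stas} = { name => 'Stas', hits => 3 };

untie %db;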
Another important thing to note is that you cannot convert a dbm file from one storage algorithm to another simply by tying it using the desired format. The only way is to dump it into a flat file and then restore it using the new format. You can use a script like this:
#!/usr/bin/perl -w
#
# This script gets as a parameter one or more Berkeley DB files stored
# with the DB_BTREE algorithm, backs each up with a .btree extension and
# creates instead a db with the same records but stored with the DB_HASH
# algorithm
#
# Usage: btree2hash.pl filename(s)

use strict;
use DB_File;
use File::Copy;

# Do checks
die "Usage: btree2hash.pl filename(s)\n" unless @ARGV;

foreach my $filename (@ARGV) {

  die "Can't find $filename: $!\n"
    unless -e $filename and -r $filename;

  # First backup the filename
  move("$filename","$filename.btree")
    or die "can't move $filename $filename.btree: $!\n";

  my %hash;
  my %btree;

  # tie both dbs (db_hash is a fresh one!)
  tie %btree, 'DB_File', "$filename.btree", O_RDWR|O_CREAT, 0660, $DB_BTREE
    or die "Can't tie %btree";
  tie %hash,  'DB_File', "$filename",       O_RDWR|O_CREAT, 0660, $DB_HASH
    or die "Can't tie %hash";

  # copy DB
  %hash = %btree;

  # untie
  untie %btree;
  untie %hash;
}
Note that some dbm implementations come with other conversion utilities as well.
Where does mod_perl enter the picture? If you are using a read-only dbm file you can make it work faster by keeping it open (tied) all the time, so that when your CGI script wants to access the database it is already tied and ready to be used. This will work with dynamic dbm databases as well, but you need to use locking to avoid data corruption. This feature can give a huge speedup to your CGIs, but you should be very careful. What has to be taken into account is db locking, handling possible die() cases, and child exits. A stale lock can deactivate your whole site if your locking mechanism cannot handle dropped locks. You can enter a deadlock situation if two processes try to acquire locks on two databases, but get stuck because each has its hands on one of the two databases, and to release it each process needs the second one, which will never be freed, because that is the condition for the first one to be released (this is possible only if the processes do not all ask for their DB files in the same order). If you modify the DB you should be very careful to flush and synchronize it, especially when your CGI unexpectedly dies. In general your application should be tested very thoroughly before you put it into production to handle important data.
Let's have a lock status as a global variable, so it will persist from request to request. If we are requesting a lock - READ (shared) or WRITE (exclusive) - the current lock status is obtained first.
If we get a READ lock request, it is granted as soon as the file becomes unlocked or is already locked for READ. The lock status is now READ.
If we get a WRITE lock request, it is granted as soon as the file becomes unlocked. The lock status is now WRITE.
What happens to the WRITE lock request is the most important thing. If the DB is READ locked, a process that wants to write will poll until there are no reading or writing processes left. Lots of processes can successfully read the file, since they do not block each other from doing so. This means that a process that wants to write to the file (first obtaining an exclusive lock) never gets a chance to squeeze in. The following diagram represents a possible scenario where everybody can read but no one can write:
[-p1-] [--p1--] [--p2--] [---------p3---------] [------p4-----] [--p5--] [----p5----]
So you get a starving process, which will most certainly time out the request, and the DB will not be updated.
So you have another reason not to cache the dbm handle with dynamic dbm files. It will work perfectly with static dbm files, though, without any need to lock the files at all. Ken Williams solved the above problem with his Tie::DB_Lock module, which I will present in the next section.
Tie::DB_Lock - ties hashes to databases using shared and exclusive locks. A module by Ken Williams, which solves the problem raised in the previous section.
The main difference from what I have described before is that Tie::DB_Lock copies a dbm file on read, so that reader processes do not have to keep the file locked while they read it, and writers can still access it while others are reading. It works best when you have lots of long-duration reading and a few short bursts of writing.
The drawback of this module is the heavy IO performed when every reader makes a fresh copy of the DB. With big dbm files this can be quite a disadvantage and slowdown. An improvement that would cut the number of files being copied would be to have only one copy of the dbm image shared by all the reader processes. This would put the responsibility of copying the read-only file on the writer, not the reader. It would take some care to make sure it does not disturb readers when putting a new read-only copy into place.
I have discussed what can be achieved with mod_perl and dbm files, the pros and cons. Now it is time to show some code. I wrote a simple wrapper around the DB_File module, and extended it to handle locking and proper exits. Note that this code still needs some testing, so be careful if you use it on your production machine as is.
So here is DB_File::Wrap (note that you will not find it on CPAN):
package DB_File::Wrap; require 5.004; use strict; BEGIN { # RCS/CVS complient: must be all one line, for MakeMaker $DB_File::Wrap::VERSION = do { my @r = (q$Revision: 1.1.1.1 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r }; } use DB_File; use Fcntl qw(:flock O_RDWR O_CREAT); use Carp qw(croak carp verbose); use IO::File; use vars qw($debug); #$debug = 1; $debug = 0; # my $db = DB_File::Wrap \%hash, $filename, [lockmode]; # from now one we can work with both %hash (tie API) and $db (direct API) ######### sub new{ my $class = shift; my $hr_hash = shift; my $file = shift; my $lock_mode = shift || ''; my $db_type = shift || 'HASH'; my $self; $self = bless { db_type => 'DB_File', flags => O_RDWR|O_CREAT, mode => 0660, hash => $hr_hash, how => $DB_HASH, }, $class ; # by default we tie with HASH alg and if requested with BTREE $self->{'how'} = ($db_type eq 'BTREE') ? $DB_BTREE : $DB_HASH; # tie the object $self->{'db_obj'} = tie %{$hr_hash}, $self->{'db_type'},$file, $self->{'flags'},$self->{'mode'}, $self->{'how'} or croak "Can't tie $file:$!\n"; ; my $fd = $self->{'db_obj'}->fd; croak "Can't get fd :$!" unless defined $fd and $fd; $self->{'fh'}= new IO::File "+<&=$fd" or croak "[".__PACKAGE__."] Can't dup: $!"; # set the lock status to unlocked $self->{'lock'} = 0; # do the lock here if requested $self->lock($lock_mode) if $lock_mode; return $self; } # end of sub new # lock the fd either exclusive or shared lock (write/read) # default is read (shared) ########### sub lock{ my $self = shift; my $lock_mode = shift || 'read'; # lock codes: # 0 == not locked # 1 == read locked # 2 == write locked if ($lock_mode eq 'write') { # Get the exclusive write lock unless (flock ($self->{'fh'}, LOCK_EX | LOCK_NB)) { unless (flock ($self->{'fh'}, LOCK_EX)) { croak "exclusive flock: $!" } } # save the status of lock $self->{'lock'} = 2; } elsif ($lock_mode eq 'read'){ # Get the shared read lock unless (flock ($self->{'fh'}, LOCK_SH | LOCK_NB)) { unless (flock ($self->{'fh'}, LOCK_SH)) { croak "shared flock: $!" } } # save the status of lock $self->{'lock'} = 1; } else { # incorrect mode carp "Can't lock. Unknown mode: $lock_mode\n"; } } # end of sub lock # unlock ########### sub unlock{ my $self = shift; $self->{'db_obj'}->sync() if defined $self->{'db_obj'}; # flush flock($self->{'fh'}, LOCK_UN); $self->{'lock'} = 0; } # untie the hash # and close all the handlers # if wasn't unlocked, end() will unlock as well ########### sub end{ my $self = shift; # unlock if stilllocked $self->unlock() if $self->{'lock'}; delete $self->{'db_obj'} if $self->{'db_obj'}; untie %{$self->{'hr_hash'}} if $self->{'hr_hash'}; $self->{'fh'}->close if $self->{'fh'}; } # DESTROY makes all kinds of cleanups if the fuctions were interuppted # before their completion and haven't had a chance to make a clean up. ########### sub DESTROY{ my $self = shift; # just to be sure that we properly closed everything $self->end(); print "Destroying ".__PACKAGE__."\n" if $debug; undef $self if $self; } #### END { print "Calling the END from ".__PACKAGE__."\n" if $debug; } 1;
And this is how you use it:
use DB_File::Wrap ();
A simple tie, READ lock, and untie:
my $dbfile = "/tmp/test";
my %mydb = ();
my $db = new DB_File::Wrap \%mydb, $dbfile, 'read';

print $mydb{'stas'} if exists $mydb{'stas'};

# sync and untie
$db->end();
You can even skip the end() call, if you leave the scope that $db is defined in:
sub user_exists{
  my $user = shift;

  my $result = 0;

  my %mydb = ();
  my $db = new DB_File::Wrap \%mydb, $dbfile, 'read';

  # if we match the username return 1
  $result = 1 if $mydb{$user};

  $result;
} # end of sub user_exists
Perform both read and write operations:
my $dbfile = "/tmp/test";
my %mydb = ();
my $db = new DB_File::Wrap \%mydb, $dbfile;

print $mydb{'stas'} if exists $mydb{'stas'};

# lock the db, we gonna change it!
$db->lock('write');
$mydb{'stas'} = 1;

# unlock the db for write
# sync and untie
$db->end();
If your CGI is interrupted in the middle, the DESTROY block will take care of unlocking the dbm file and flushing the changes. Note that I have seen db corruption even with this code on huge dbm files (10000+ records), so be careful when you use it. I thought that I had covered all the possible failures, but apparently not all of them. In the end I moved everything to mysql. So if you figure out where the problem is, you are very welcome to tell me about it.
You have fallen in love with mod_perl at first sight, from the moment you installed it on your home box. But when you wanted to convert your CGI scripts, currently running on your favorite ISP's machine, to run under mod_perl, you discovered that your ISP has either never heard of such a beast or refuses to install it for you.
You are an old sailor in the ISP business, you have seen it all, you know how many ISPs are out there and you know that the sales margins are too low to keep you happy. You are looking for some new service that almost no one else provides, to attract more clients to become your users and, hopefully, to get a bigger slice of the market than the neighboring ISP.
If you are a user asking for mod_perl service, or an ISP considering providing this service, this section should make things clearer for both of you.
An ISP has three options to choose from:
An ISP cannot afford to have a user running scripts under mod_perl on the main server, since it will die very soon, for one of many reasons: sloppy programming, a user testing a just-updated script which probably has syntax errors, and so on - no need to explain why if you are familiar with mod_perl's peculiarities. The only scripts that CAN BE ALLOWED are the ones written by the ISP itself and not modifiable by the user (guest books, counters and the like - the same standard scripts ISPs have been providing since they were born). So you have to say NO to this option.
More things to think about are file permissions (any user who is allowed to write and run a CGI script can at least read, if not write, any other file that is accessible to the web server. This has nothing to do with mod_perl, and there are solutions for it - suEXEC and cgiwrap, for example) and Apache::DBI connections (a user can pick a connection from the pool of cached connections opened by someone else by hacking the Apache::DBI code).
But hey, why can't I let my user run his own server, so that I can wash my hands of it and not care how dirty and sloppy the user's code is (assuming that the user runs the server under his own username)?
This option is fine as long as you are aware of the new system requirements. If you have even some very limited experience with mod_perl, you know that a mod_perl enabled Apache server, while freeing up your CPU and letting you run scripts much, much faster, has huge memory demands (5-20 times what plain Apache uses). The size depends on the length of the code, the sloppiness of the programmer, possible memory leaks in the code, all multiplied by the number of children each server spawns. A very simple example: a server demanding 10MB of memory which spawns 10 children already raises your memory requirements by 100MB (the real requirement is actually smaller if your OS allows code sharing between processes and the programmer exploits these features in her code). Now multiply that number by the number of users you intend to host and you get your memory requirements. Since ISPs never say no, you had better use the opposite approach - think of the largest amount of memory you can afford, then divide it by one user's requirements as shown in the example, and you will know how many mod_perl users you can afford :)
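Here is that division spelled out as a tiny sketch (all the numbers are made up; plug in your own measurements):

  #!/usr/bin/perl -w
  # how many mod_perl users can I afford? (all numbers are assumptions)
  my $total_mb    = 1024;   # RAM you can dedicate to user servers
  my $child_mb    = 10;     # size of one httpd child
  my $children    = 10;     # children each user's server spawns
  my $per_user_mb = $child_mb * $children;       # 100 MB per user
  printf "You can afford about %d mod_perl users\n",
         $total_mb / $per_user_mb;               # => 10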
But who am I to predict how much memory your users will use? A user's requirements from a single server can be very modest, but do you know how many servers she will run? After all, she has full control over httpd.conf - and it has to be that way, since this is essential for a user running mod_perl.
All this rambling about memory leads to a single question: can you prevent a user from using more than X amount of memory? Or another variation of the question: assuming you have as much memory as you want, can you charge the user for her average memory usage?
If the answer to either of the above questions is positive, you are all set and your clients will praise your name for letting them run mod_perl! There are tools to restrict resource usage (see for example the man pages for ulimit(3), getrlimit(2), setrlimit(2) and sysconf(3)).
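From Perl, one way to impose such limits is the BSD::Resource module from CPAN (there is also an Apache::Resource module built on top of it). This is only a sketch with made-up limits, and the set of RLIMIT_* resources you can use depends on your OS:

  use BSD::Resource qw(setrlimit RLIMIT_DATA RLIMIT_CPU);

  # hypothetical limits: 32MB of data segment and 60 CPU seconds per process
  setrlimit(RLIMIT_DATA, 32*1024*1024, 32*1024*1024)
    or die "Can't set the memory limit";
  setrlimit(RLIMIT_CPU, 60, 60)
    or die "Can't set the CPU limit";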
<META> If you have an experience with some restriction techniques please share with us. Thank you! </META>
If you have picked this option, you have to provide your client with:
Shutdown/startup scripts installed together with the rest of your daemon startup scripts (e.g. in the /etc/rc.d directory), so that when you reboot your machine the user's server will be correctly shut down and will be back online the moment your system comes up again. Also make sure that each server is started under the username it belongs to, unless you are looking for big trouble.
Proxy services (in forward or httpd accelerator mode) for the user's virtual host. Since the user will have to run her server on an unprivileged port (>1024), you will have to forward all requests from user.given.virtual.hostname:80 (which is user.given.virtual.hostname without the port - 80 is the default) to your.machine.ip:port_assigned_to_user, and the user must code her scripts so that self referencing URLs use the user.given.virtual.hostname base, of course.
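On the front-end (port 80) server this forwarding can be done with mod_proxy. A minimal configuration sketch, where the hostname, IP and port 8001 are made-up values (ProxyPassReverse needs a reasonably recent Apache 1.3.x):

  <VirtualHost your.machine.ip>
    ServerName user.given.virtual.hostname
    ProxyPass        / http://your.machine.ip:8001/
    ProxyPassReverse / http://your.machine.ip:8001/
  </VirtualHost>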
Letting a user run a mod_perl server immediately adds the requirement that the user be able to restart and configure her own server. But only root can bind to port 80. That is why the user has to use a port number above 1024.
Another problem you will have to solve is how to assign ports to users. Since a user can pick any port above 1024 to run his server on, you will have to introduce some regulation here. A simple example will stress the importance of this problem: suppose I am a malicious user, or just a rival of somebody who runs his own server at your ISP. All I have to do is find out what port his server listens to (e.g. with the help of netstat(8)) and configure my own server to listen on the same port. Although I cannot bind to the port while his server holds it, imagine what happens when you reboot your system and my startup script happens to run before my rival's! I get the port first, all requests are served by my server, and you can let your imagination run wild about the nasty things that might happen then. Of course the ugly things will be revealed pretty soon, but the damage will have been done.
A much better, but more costly, solution is co-location. Let the user hook her (or the ISP's) standalone machine into your network, and forget about her. Of course either the user or you will have to do all the system administration chores, and it will cost your client more money.
All in all, who are the people who seek mod_perl support? Those who run serious projects and businesses, who can afford a standalone box, thus achieving their goal of autonomy while keeping their ISP happy. So money is not an obstacle.
If you are about to use Virtual Hosts you might want to read these sections:
Easing the chores of configuring the virtual hosts with mod_macro
Is there a way to provide a different startup.pl file for each individual virtual host
Is there a way to modify @INC on a per-virtual-host basis
This is a very useful feature. You can watch what happens to the Perl parts of the server. Here are the instructions for configuring and using this feature:
Add this to httpd.conf:
<Location /perl-status>
  SetHandler perl-script
  PerlHandler Apache::Status
  order deny,allow
  #deny from all
  #allow from 
</Location>
If you are going to use Apache::Status
it's important to put it as the first module in the start-up file, or in
the httpd.conf:
# startup.pl
use Apache::Registry ();
use Apache::Status ();
use Apache::DBI ();
If you don't put Apache::Status before Apache::DBI, you won't get Apache::DBI's menu entry in the status page.
For more about Apache::DBI
see Persistent DB Connections.
Assuming that your mod_perl server listens on port 81, fetch http://www.myserver.com:81/perl-status
Embedded Perl version 5.00502 for Apache/1.3.2 (Unix) mod_perl/1.16 process 187138, running since Thu Nov 19 09:50:33 1998
Below all sections should be links:
Signal Handlers
Enabled mod_perl Hooks
PerlRequire'd Files
Environment
Perl Section Configuration
Loaded Modules
Perl Configuration
ISA Tree
Inheritance Tree
Compiled Registry Scripts
Symbol Table Dump
Let's follow, for example, PerlRequire
'd Files. We see:
PerlRequire                     Location
/home/perl/apache-startup.pl    /home/perl/apache-startup.pl
From some menus you can continue deeper to peek into the internals of the server, to see the values of the global variables in the packages, to the cached scripts and modules, and much more. Just click around...
Sometimes when you fetch /perl-status and follow the Compiled Registry Scripts link you see no listing of scripts at all. This is absolutely correct: Apache::Status shows the registry scripts compiled in the httpd child which is serving your request for /perl-status. If that child has not yet compiled the script you are asking about, /perl-status will just show you the main menu.
See Sometimes it Works Sometimes it does Not
To debug scripts running under mod_perl either use Apache::DB (interactive Perl debugging) or an older non-interactive method as described below.
The NonStop debugger option enables us to get some decent debugging information when running under mod_perl. For example, before starting the server:
% setenv PERL5OPT -d
% setenv PERLDB_OPTS "NonStop=1 LineInfo=db.out AutoTrace=1 frame=2"
Now watch db.out for line:filename information. This is most useful for tracking down those core dumps that normally leave us guessing, even with a stack trace from gdb. db.out will show you what Perl code triggered the core. See 'man perldebug' for more PERLDB_OPTS. Note that Perl will ignore PERL5OPT if PerlTaintCheck is On.
Perl ships with a very useful interactive debugger, however, it does not
run ``out-of-the-box'' in the Apache/mod_perl environment.
Apache::DB
makes a few adjustments so the two will cooperate.
To configure it use:
<Location /perl>
  PerlFixupHandler +Apache::DB
  SetHandler perl-script
  PerlHandler +Apache::Registry
  Options +ExecCGI
</Location>
You must run the server in single server mode (with -X) to use Apache::DB.
When you run the script for the first time, you should let it run until it finishes. Starting from the second run you can run it as if it was a regular perl script.
Modules and scripts that were preloaded and compiled during the server startup will not be debuggable.
The filename and lines of Apache::Registry scripts are not displayed.
See the SUPPORT document for hints on getting a stack trace. To do this with make test, build mod_perl passing Makefile.PL the PERL_DEBUG=1 flag. Now execute:
gdb ../apache_x.x.x/src/httpd
(gdb) thttpd
(thttpd is defined in .gdbinit)
then, in another terminal, run make run_tests, and at the gdb prompt:
(gdb) bt
To enable mod_perl debug tracing configure mod_perl with the PERL_TRACE option:
perl Makefile.PL PERL_TRACE=1
The trace levels can then be enabled via the MOD_PERL_TRACE
environment variable which can contain any combination of:
d   - Trace directive handling during configuration read
s   - Trace processing of perl sections
h   - Trace Perl*Handler callbacks
g   - Trace global variable handling, interpreter construction, END blocks, etc.
all - all of the above
add to httpd.conf:
PerlSetVar MOD_PERL_TRACE all
For example if you want to see a trace of the PerlRequire's and PerlModule's as they are loaded, use:
PerlSetVar MOD_PERL_TRACE d
As you know, you need an unstripped executable to be able to debug it. While you can compile mod_perl with -g (or PERL_DEBUG=1), the Apache install strips the symbols.
Makefile.tmpl contains a line:
IFLAGS_PROGRAM = -m 755 -s
Removing the -s does the trick.
As I have mentioned before, the error_log file is your best friend in the CGI code debugging process.
While debugging my mod_perl and general CGI code, I keep my error_log file open in a dedicated terminal window (xterm), so I can see right away what errors and warnings are being reported by the server. I do it with:
tail -f /usr/local/apache/logs/error_log
which shows all the lines that are being added to the file.
If you cannot access your error_log file because you are unable to telnet to your machine (generally the case with ISPs who provide user CGI support but no telnet access), you might want to use a CGI script I wrote to fetch the latest lines from the file (with the bonus of colored output for easier reading). You might need to ask your ISP to install this script for general usage. See Watching the error_log file without telneting to the server.
The current Perl implementation does not restore the original Apache C handler when you use a local $SIG{FOO} clause. While save/restore of $SIG{ALRM} was fixed in mod_perl 1.19_01 (CVS version), other signals are not yet fixed. The real fix should probably be in Perl itself.
Until recently, local $SIG{ALRM} restored the SIGALRM handler to Perl's handler, not the handler it was in the first place (Apache's alrm_handler()). If you build mod_perl with PERL_TRACE=1 and set the MOD_PERL_TRACE environment variable to g, you will see this in the error_log file:
mod_perl: saving SIGALRM (14) handler 0x80b1ff0
mod_perl: restoring SIGALRM (14) handler from: 0x0 to: 0x80b1ff0
If nobody touched $SIG{ALRM}
, 0x0
would be the same address as the others.
If you work with signal handlers, take a look at the Sys::Signal module, which solves the problem: Sys::Signal - Set signal handlers with restoration of the existing C sighandler. Get it from CPAN.
The usage is simple. If the original code was:
eval {
  local $SIG{ALRM} = sub { die "timeout\n" };
  alarm $timeout;
  ... db stuff ...
  alarm 0;
};
die $@ if $@;
If a timeout happens and SIGALRM is thrown, the alarm() will be reset; otherwise alarm 0 is reached and the timer is reset as well.
Now you would write:
use Sys::Signal ();
eval {
  my $h = Sys::Signal->set(ALRM => sub { die "timeout\n" });
  alarm $timeout;
  ... do something that may time out ...
  alarm 0;
};
die $@ if $@;
To see where an httpd is ``spinning'', try adding this to your script or a startup file:
use Carp ();
$SIG{'USR1'} = sub {
  Carp::confess("caught SIGUSR1!");
};
Then issue the command line:
kill -USR1 <spinning_httpd_pid>
It is possible to profile code run under mod_perl with the
Devel::DProf
module available on CPAN. However, you must have apache version 1.3b3 or
higher and the PerlChildExitHandler
enabled. When the server is started, Devel::DProf
installs an
END
block to write the tmon.out
file; the block will be run when the server is shut down. Here's how to start
and stop a server with the profiler enabled:
% setenv PERL5OPT -d:DProf
% httpd -X -d `pwd` &
... make some requests to the server here ...
% kill `cat logs/httpd.pid`
% unsetenv PERL5OPT
% dprofpp
See also: Apache::DProf
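Apache::DProf wraps Devel::DProf for the mod_perl environment; as far as I remember it is enough to load it from httpd.conf, and each child then writes its profile data under a dprof directory inside the server's log directory (check the module's documentation for the exact layout):

  # httpd.conf - hook the profiler into every child
  PerlModule Apache::DProf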
(META: below are some snippets of strace outputs from list's emails)
[there was a discussion about streaming LWP through mod_perl, and the topic of the optimal buffer size came up]
The optimal buffer size depends on your system configuration. Watch Apache with strace -p (or truss) when it is sending a static file; here is perlfunc.pod on my laptop (Linux 2.2.7):
writev(4, [{"HTTP/1.1 200 OK\r\nDate: Wed, 02"..., 289}, {"=head1 NAME\n\nperlfunc - Perl b"..., 32768}], 2) = 33057 alarm(300) = 300 write(4, "m. In older versions of Perl, i"..., 32768) = 32768 alarm(300) = 300 write(4, "hout waiting for the user to hit"..., 32768) = 32768 alarm(300) = 300 write(4, ">&STDOUT") || die "Can't dup "..., 32768) = 32768 alarm(300) = 300 write(4, "LEHANDLE is supplied. This has "..., 32768) = 32768 alarm(300) = 300 write(4, "ite>,\nC<seek>, C<tell>, or C<eo"..., 25657) = 25657
Devel::Peek - A data debugging tool for the XS programmer
Let's see an example of Perl allocating a buffer only once, regardless of my() scoping, although it will realloc if the size is > SvLEN:
use Devel::Peek;

for (1..3) {
  foo();
}

sub foo {
  my $sv;
  Dump $sv;
  $sv = 'x' x 100_000;
  $sv = "";
}
The output:
SV = NULL(0x0) at 0x8138008
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY)
SV = PV(0x80e5794) at 0x8138008
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY)
  PV = 0x815f808 ""\0
  CUR = 0
  LEN = 100001
SV = PV(0x80e5794) at 0x8138008
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY)
  PV = 0x815f808 ""\0
  CUR = 0
We can see that on subsequent calls (after the first one) $sv already has preallocated memory.
So, if you can afford the memory, a larger buffer means fewer brk() syscalls. If you watch that example with strace, you will only see calls to brk() the first time through the loop. So this is a case where your module might want to pre-allocate the buffer, for example for LWP, in a file scoped lexical, like so:
package Your::Proxy;

my $buffer = ' ' x 100_000;
$buffer = "";
This way, only the parent has to brk() at server startup; each child will already have the buffer allocated, just reset it to "" when you are done.
In a URL such as http://my.site.com/foo.pl?foo=bar&reg=foobar, some browsers will interpret &reg as a magic entity and encode it as ®, which will result in a corrupted QUERY_STRING. If you encounter this problem you should either avoid using such keys or separate parameter pairs with ; instead of &. Both CGI.pm and Apache::Request support a semicolon instead of an ampersand as a separator. So your URI should look like: http://my.site.com/foo.pl?foo=bar;reg=foobar
Note that this is only an issue when you are building your own URLs with query strings. It is not a problem when the URL is the result of submitting a form because the browsers _have_ to get that right.
One problem with publishing 8080 port numbers is that (so I was told) IE 4.x has a bug when re-posting data to a non-port-80 URL. It drops the port designator and uses port 80 anyway.
See Publishing port numbers different from 80
This module provides the Apache/mod_perl user with a mechanism for storing persistent user data in a global hash, independent of the underlying storage mechanism. Currently you can choose between these storage mechanisms: Apache::Session::DBI, Apache::Session::Win32, Apache::Session::File and Apache::Session::IPC. Read the man page of the mechanism you want to use for a complete reference.
What Apache::Session
does is provide persistence to a data structure. The data structure has an
ID number, and you can retrieve it by using the ID number. In the case of
Apache, you would store the ID number in a cookie or the URL to associate
it with one browser, but the method of dealing with the ID is completely up
to you. The flow of things is generally:
Tie a session to Apache::Session.
Get the ID number.
Store the ID number in a cookie.
End of Request 1.
(time passes)
Get the cookie.
Restore your hash using the ID number in the cookie.
Use whatever data you put in the hash.
End of Request 2.
Using Apache::Session
is easy: simply tie a hash to the session object, stick any data structure
into the hash, and the data you put in automatically persists until the
next invocation. Here is a quick example which uses cookies to track the
user's session.
  # Pull in the required packages
use Apache::Session::DBI;
use Apache;
use strict;

  # database credentials (placeholders - use your real ones here)
my $db_user = 'user';
my $db_pass = 'password';

  # Read in the cookie if this is an old session
my $r = Apache->request;
my $cookie = $r->header_in('Cookie');
$cookie =~ s/SESSION_ID=(\w*)/$1/;

  # Create a session object based on the cookie we got from the
  # browser, or a new session if we got no cookie
my %session;
tie %session, 'Apache::Session::DBI', $cookie,
    {DataSource => 'dbi:mysql:sessions',
     UserName   => $db_user,
     Password   => $db_pass
    };

  # Might be a new session, so lets give them their cookie back
my $session_cookie = "SESSION_ID=$session{_session_id};";
$r->header_out("Set-Cookie" => $session_cookie);
After setting this up, you can stick anything you want into
%session
(except file handles), and it will still be there
when the user invokes the next page.
It is possible to write an Apache authen handler using
Apache::Session
. You can put your authentication token into the session. When a user
invokes a page, you open their session, check to see if they have a valid
token, and approve or deny their authorization based on that.
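Here is a minimal sketch of such a handler. The package name My::Authen, the auth_token session key and the data source are all made up, and error checking is reduced to the bare minimum:

  package My::Authen;
  use strict;
  use Apache::Constants qw(OK FORBIDDEN);
  use Apache::Session::DBI ();

  sub handler {
      my $r = shift;

        # pull the session ID out of the cookie (if any)
      my ($id) = ($r->header_in('Cookie') || '') =~ /SESSION_ID=(\w+)/;

      my %session;
      eval {
            # this die()s if the ID is unknown to the data store
          tie %session, 'Apache::Session::DBI', $id,
              { DataSource => 'dbi:mysql:sessions' };
      };
      return FORBIDDEN if $@ or not $session{auth_token};

      return OK;
  }
  1;

You would then install it with a PerlAuthenHandler My::Authen directive for the location you want to protect.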
As for IIS, let's compare. IIS's sessions are only valid on the same web
server as the one that issued the session.
Apache::Session
's session objects can be shared amongst a farm of many machines running
different operating systems, including even Win32. IIS stores session
information in RAM. Apache::Session
stores sessions in databases, file systems, or RAM. IIS's sessions are only
good for storing scalars or arrays. Apache::Session
's sessions allow you to store arbitrarily complex objects. IIS sets up the
session and automatically tracks it for you. With
Apache::Session
, you setup and track the session yourself. IIS is proprietary. Apache::Session
is open-source.
Apache::Session::DBI can issue 400+ session requests per second on a modest Celeron 300A running Linux. IIS?
An alternative to Apache::Session
is Apache::ASP, which has session tracking abilities. HTML::Embperl hooks
into Apache::Session
for you.
See mod_perl and relational Databases
This package contains modules for manipulating client request data via the Apache API with Perl and C. Functionality includes:
- parsing of application/x-www-form-urlencoded data
- parsing of multipart/form-data
- parsing of HTTP Cookies
The Perl modules are simply a thin xs layer on top of libapreq, making them
a lighter and faster alternative to CGI.pm and CGI::Cookie. See the Apache::Request
and Apache::Cookie
documentation for more details and eg/perl/ for examples.
Apache::Request and libapreq are tied tightly to the Apache API, to which there is no access in a process running under mod_cgi.
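For completeness, here is a minimal sketch of the Perl side of a content handler using these modules; the parameter name 'username' and the SESSION_ID cookie name are made up:

  use Apache::Request ();
  use Apache::Cookie ();
  use Apache::Constants qw(OK);

  sub handler {
      my $r = shift;

      my $apr  = Apache::Request->new($r);
      my $user = $apr->param('username');   # GET or POSTed form data

      my $cookies = Apache::Cookie->fetch;  # hash ref of Apache::Cookie objects
      my $id = $cookies->{'SESSION_ID'} ? $cookies->{'SESSION_ID'}->value : '';

      $r->send_http_header('text/plain');
      $r->print("user=$user session=$id\n");
      return OK;
  }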
See Apache::PerlRun - a closer look.
Have you ever served a huge HTML file (e.g. a file bloated with JavaScript code) and wondered how you could send it compressed, thus dramatically cutting down the download time? After all, Java applets can be compressed into a jar and benefit from faster download times. Why can't we do the same with plain ASCII (HTML, JS and the like)? It is a known fact that ASCII text can be compressed by a factor of 10.
Apache::GzipChain comes to help you with this task. If a client (browser) understands gzip encoding, this module compresses the output and sends it downstream. The client decompresses the data upon receipt and renders the HTML as if it were a plain HTML fetch.
For example to compress all html files on the fly, do:
<Files *.html>
  SetHandler perl-script
  PerlHandler Apache::OutputChain Apache::GzipChain Apache::PassFile
</Files>
Remember that it will work only if the browser claims to accept compressed input, via the Accept-Encoding header. Apache::GzipChain also keeps a list of user agents, looking at the User-Agent header for browsers known to accept compressed output.
For example, if you want to return compressed files which should additionally be passed through the Embperl module, you would write:
<Location /test>
  SetHandler perl-script
  PerlHandler Apache::OutputChain Apache::GzipChain Apache::EmbperlChain Apache::PassFile
</Location>
Hint: watch the access_log file to see how many bytes were actually sent, and compare that with what a regular configuration sends.
(See perldoc Apache::GzipChain
).
With this module you can configure @INC and have modules reloaded for a given Location. Say two versions of Apache::Status are being hacked on the same server: this fixup handler simply deletes $INC{$filename}, unshifts the preferred PerlINC path onto @INC, and reloads the file with require():
PerlModule Apache::PerlVINC
<Location /dougm-status>
  SetHandler perl-script
  PerlHandler Apache::Status
  PerlINC /home/dougm/dev/modperl/lib
  PerlVersionINC On
  PerlFixupHandler Apache::PerlVINC
  PerlRequire Apache/Status.pm
</Location>
<Location /other-status>
  SetHandler perl-script
  PerlHandler Apache::Status
  PerlINC /home/other/current/modperl/lib
  PerlVersionINC On
  PerlFixupHandler Apache::PerlVINC
  PerlRequire Apache/Status.pm
</Location>
To address possible issues of namespace clashes during reload, the handler could call $r->child_terminate() so the next server to load the different versions will have a fresh namespace. (not a good idea in a high load environment, of course.)
If it is still absent from CPAN get it at: http://perl.apache.org/~dougm/Apache-PerlVINC-0.01.tar.gz
It works just like Apache::Registry, but does not test the x bit, only compiles the file once, and does not chdir() into the script's parent directory.
Configuration:
PerlModule Apache::RegistryBB

<Location /perl>
  SetHandler perl-script
  PerlHandler Apache::RegistryBB->handler
</Location>
When Apache's built-in syslog support is used, the stderr stream is redirected to /dev/null. This means that Perl warnings and any messages from die(), croak(), etc. will also end up in the black hole. The HookStderr directive will hook the stderr stream to a file of your choice; the default is shown in this example:
PerlModule Apache::LogSTDERR
HookStderr logs/stderr_log
This new document was born because some users are reluctant to learn Perl before jumping into mod_perl. I will try to cover some of the most frequent pure Perl questions asked on the list.
To find out what functions Perl has, you would execute:
perldoc perlfunc
To learn the syntax and find examples of a specific function, you would execute (e.g. for open()):
perldoc -f open
There is a bug in this option: it doesn't call pod2man, so the section is displayed in raw POD. But it's still readable and very useful.
To search the Perl FAQ (perlfaq) sections, you would do (e.g. for the open keyword):
perldoc -q open
which will return all the matching Q&A sections, still in POD.
When you first wrote $x in your code you created a global variable. It is visible everywhere in the file you used it in; if you defined it inside a package, it is visible inside that package. But this works only if you do not use the strict pragma - and you HAVE to use this pragma if you want to run your scripts under mod_perl. Read Strict pragma to find out why.
First you use:
use strict;
Then you use:
use vars qw($scalar %hash @array);
From this moment on, the variables are global within the package in which you defined them. If you want to share global variables between packages, here is what you can do.
Assume that you want to share CGI.pm's object (I will use $q) between your modules. For example, you create it in script.pl but want it to be visible in My::HTML. First, you make $q global:
script.pl:
----------------
use vars qw($q);
use CGI;
use lib qw(.);
use My::HTML qw($q); # My/HTML.pm in the same dir as script.pl

$q = new CGI;

My::HTML::printmyheader();
----------------
Note that we have imported $q from My::HTML. And here is My::HTML, which does the export of $q:
My/HTML.pm
----------------
package My::HTML;
use strict;

BEGIN {
  use Exporter ();

  @My::HTML::ISA       = qw(Exporter);
  @My::HTML::EXPORT    = qw();
  @My::HTML::EXPORT_OK = qw($q);
}

use vars qw($q);

sub printmyheader{
  # Whatever you want to do with $q... e.g.
  print $q->header();
}

1;
-------------------
So $q is shared between the My::HTML package and script.pl. It works vice versa as well, if you create the object in My::HTML and use it in script.pl. You have true sharing, since if you change $q in script.pl, it will be changed in My::HTML as well.
What if you need to share $q between more than two packages? For example, you want My::Doc to share $q as well.
You leave My::HTML untouched, and modify script.pl to include:
use My::Doc qw($q);
And write My::Doc exactly like My::HTML - only the content is different of course :).
One possible pitfall is when you want to use My::Doc in both My::HTML and script.pl. $q will be shared only if you also add:
use My::Doc qw($q);
to My::HTML. Otherwise My::Doc will not share $q any more. To make things clear, here is the code:
script.pl:
----------------
use vars qw($q);
use CGI;
use lib qw(.);
use My::HTML qw($q); # My/HTML.pm in the same dir as script.pl
use My::Doc  qw($q); # Ditto

$q = new CGI;

My::HTML::printmyheader();
----------------
My/HTML.pm
----------------
package My::HTML;
use strict;

BEGIN {
  use Exporter ();

  @My::HTML::ISA       = qw(Exporter);
  @My::HTML::EXPORT    = qw();
  @My::HTML::EXPORT_OK = qw($q);
}

use vars qw($q);
use My::Doc qw($q);

sub printmyheader{
  # Whatever you want to do with $q... e.g.
  print $q->header();

  My::Doc::printtitle('Guide');
}

1;
-------------------
My/Doc.pm
----------------
package My::Doc;
use strict;

BEGIN {
  use Exporter ();

  @My::Doc::ISA       = qw(Exporter);
  @My::Doc::EXPORT    = qw();
  @My::Doc::EXPORT_OK = qw($q);
}

use vars qw($q);

sub printtitle{
  my $title = shift || 'None';

  print $q->h1($title);
}

1;
-------------------
As the title says, you can import a variable into a script or module without using Exporter.pm. I have found it useful to keep all the configuration variables in one module, My::Config. But then I have to export all the variables in order to use them in other modules, which is bad for two reasons: it pollutes other packages' namespaces with extra tags (which raises the memory requirements), and it adds the overhead of keeping track of what variables should be exported from the configuration module and imported by each particular package. I solve this problem by keeping all the variables in one hash %c and exporting only that. Here is an example of My::Config:
package My::Config;
use strict;
use vars qw(%c);
%c = (
   # All the configs go here
  scalar_var => 5,

  array_var  => [
                 'foo',
                 'bar',
                ],

  hash_var   => {
                 foo => 'Foo',
                 bar => 'BARRR',
                },
);
1;
Now in packages that want to use the configuration variables I have either to use fully qualified names like $My::Config::test, which I dislike, or import them as described in the previous section. But hey, since we have only one variable to handle, we can make things even simpler and save loading the Exporter.pm package. We will use Perl's aliasing feature for exporting and saving the keystrokes:
package My::HTML;
use strict;
use lib qw(.);
  # Global Configuration now aliased to global %c
use My::Config (); # My/Config.pm in the same dir as script.pl
use vars qw(%c);
*c = \%My::Config::c;

  # Now you can access the variables from My::Config
print $c{scalar_var};
print $c{array_var}[0];
print $c{hash_var}{foo};
Of course %c is global everywhere you use it as described above, and if you change it somewhere it will affect any other packages that have aliased %c to %My::Config::c.
Note that aliases work only with global or local() variables - you cannot write:
my *c = \%My::Config::c;
Which is an error. But you can:
local *c = \%My::Config::c;
Many people use relative paths for require, use, etc., or open files in the current directory or relative to the current directory. This will fail if you don't chdir() into the correct directory first (e.g. when you call the script by its full path). This code works:
/home/httpd/perl/test.pl:
-------------------------
#!/usr/bin/perl
open IN, "./foo.txt";
-------------------------
if we call the script by:
% chdir /home/httpd/perl
% ./test.pl
since foo.txt is located in the same directory the script is called from. If we call the script with:
% /home/httpd/perl/test.pl
without having chdir'd to /home/httpd/perl, the script will fail to find foo.txt. If you don't want to hardcode directories in your scripts, the FindBin.pm package comes to the rescue.
use FindBin qw($Bin);
use lib $Bin;
open IN, "./foo.txt";
or
use FindBin qw($Bin);
open IN, "$Bin/foo.txt";
Now $Bin contains the path of the directory the script resides in, so you can move the script from one directory to another and call it from anywhere else; the paths will always be correct.
This is different from using "./foo", where you first have to chdir to the directory in which the script is located. (Think about crontabs!!!)
I wrote this script a long time ago, when I had to debug my CGI scripts but didn't have access to the error_log file. I asked the admin to install this script and have used it happily ever since.
If your scripts are running on one of those 'get-a-free-site' servers, and you cannot debug your script because you can't telnet to the server or see the error_log, you can ask your sysadmin to install this script.
Ok, here is the code:
#!/usr/bin/perl -Tw

use strict;
$|=1;

my $default   = 10;
my $error_log = "/usr/local/apache/var/logs/error_log.1";

use CGI;

  # untaint $ENV{PATH}
$ENV{'PATH'} = '/bin:/usr/bin';
delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};

my $q = new CGI;

  # untaint the 'count' parameter: accept only a plain number,
  # since it is used in a piped open below
my $counts = $default;
$counts = $1 if defined $q->param('count') and $q->param('count') =~ /^(\d+)$/;

print $q->header,
  $q->start_html(-bgcolor => "white",
                 -title   => "Error logs"),
  $q->start_form,
  $q->center(
             $q->b('How many lines to fetch? '),
             $q->textfield('count',10,3,3),
             $q->submit('', 'Fetch'),
             $q->reset,
            ),
  $q->end_form,
  $q->hr;

print($q->b("$error_log doesn't exist!!!")), exit
  unless -e $error_log;

open LOG, "tail -$counts $error_log|"
  or die "Can't open tail on $error_log :$!\n";
my @logs = <LOG>;

print $q->b('Note: Latest logs on the top'), $q->br;
print "<UL>\n";

  # format and colorize each line nicely
foreach (reverse @logs) {
  s{
    \[(.*?)\]\s*   # date
    \[(.*?)\]\s*   # type of error
    \[(.*?)\]\s*   # client
    (.*)           # the message
   }
   {
     "[$1] <BR> [".
     colorize($2,$2).
     "] <BR> [$3] <PRE>".
     colorize($2,$4).
     "</PRE>"
   }ex;
  print "<BR><LI>$_<BR>";
}
print "</UL>\n";

close LOG;

#############
sub colorize{
  my ($type, $context) = @_;

  my %colors = (
                error  => 'red',
                crit   => 'black',
                notice => 'green',
                warn   => 'brown',
               );

  return exists $colors{$type}
    ? qq{<B><FONT COLOR="$colors{$type}">$context</FONT></B>}
    : $context;
}
Sometimes you want to access variables from the caller's package. One way is to do:
my $caller = caller;
print qq[$caller --- ${"${caller}::var"}];
If you do not use some well-known module like CGI.pm, you can handle the cookies yourself.
Cookies come in the $ENV{HTTP_COOKIE} variable. You can print the raw cookie string with print $ENV{HTTP_COOKIE}.
Here is a fairly well-known bit of code to take cookie values and put them into a hash:
sub getCookies {
    # cookies are separated by a semicolon and a space, this will
    # split them and return a hash of cookies
  local(@rawCookies) = split (/; /,$ENV{'HTTP_COOKIE'});
  local(%cookies);

  foreach(@rawCookies){
    ($key, $val) = split (/=/,$_);
    $cookies{$key} = $val;
  }

  return %cookies;
}
You can invest a lot of time and money in server tuning and code rewriting according to the guidelines you have just learned, but your performance will still be bad if you do not take into account the hardware demands and do not wisely choose an operating system suited to your needs. While the tips below apply to any webserver, they are written for the administrator of a mod_perl-enabled webserver.
First let's talk about Operating Systems (OSes). While I am personally a Linux devotee, I do not want to start yet another OS war. Instead, I will try to define what you should be looking for; once you know what you want from your OS, go and find it. Visit the Web sites of the operating systems you are interested in. You can gauge users' opinions by searching relevant discussions in newsgroup and mailing list archives such as Deja - http://deja.com and eGroups - http://egroups.com . I will leave this fan research up to you. But I would use Linux or something from the *BSD family.
Probably the most desired features in an OS are stability and robustness. You are in an Internet business which does not have the normal working hours of many conventional businesses (9am to 5pm); you are open 24 hours a day. You cannot afford to be off-line, because your customers will go shopping at another service like yours, unless you have a monopoly :) . If the OS of your choice crashes every day or so, I would throw it away - after doing a little investigation, for there might be another reason for the system crash, like a runaway server that eats up all the memory and disk, in which case you cannot blame the OS. Generally, people who have used an OS for some time can tell you a lot about its stability.
You want an OS with good memory management; some OSes are well known as memory hogs. The same code can use twice as much memory on one OS as on another. If the size of a mod_perl process is 10MB and you have tens of these running, it definitely adds up!
Some OSes and/or their libraries (e.g. the C runtime libraries) suffer from memory leaks. You cannot afford such a system, for you already know that a single mod_perl process sometimes serves thousands of requests before it terminates. So if a leak occurs on every request, your memory demands will become huge. Of course your code can be the cause of the memory leaks as well (check out the Apache::Leak module). Certainly, you can lower the number of requests to be served over the process' life, but that can degrade performance.
You want an OS with good memory-sharing capabilities. As you have learned, if you preload the modules and scripts at server startup they are shared between the spawned children, at least for part of a process' life span, since memory pages become ``dirty'' and cease to be shared. This feature can save you a lot of memory!
If you are in a big business you probably do not mind paying another $1000 for some fancy OS with bundled support.
But if your resources are low, you will look for a cheaper or free OS. Free does not mean bad; it can be quite the opposite, as we all either know from our own experience or read about in the news. Free OSes can have, and do have, the best support you can find. It is very easy to understand - most people are not rich and will try a cheaper or free OS first if it does the work for them. Since it really fits their needs, many people keep using it and eventually know it well enough to be able to provide support for others in trouble. Why would they do this for free? For the spirit of the first days of the Internet, when there was no commercial Internet and people helped each other, because someone had helped them in the first place. I was there, I was touched by that spirit and I will do anything to keep that spirit alive.
But let's get back to our world. We live in a material world, and our bosses pay us to keep the systems running. So if you feel that you cannot provide the support yourself and you do not trust the available free resources, you must pay for an OS backed by a company, which you can then blame for any problem. Your boss wants to be able to sue someone if the project has a problem caused by an external product used in the project. If you buy a product and the company selling it claims to provide support, you have someone to sue. If you go with Open Source and it fails, you have no one to sue - other than getting yourself fired.
Also remember that if you spend less, or zero, money on the OS and software, you will be able to buy better and stronger hardware.
Suppose you have invested a lot of time and money into developing some proprietary software that is bundled with the OS you developed it on, like a mod_perl handler that takes advantage of some proprietary features of the OS and will not run on any other OS. Things are under control, the performance is great and you sing with happiness. But... one day the company that wrote your beloved OS goes bankrupt, which is not unlikely to happen nowadays. You are stuck with their last masterpiece and no support! What are you going to do then? Invest more money into porting the software to another OS...
Everyone can be hit by this mini-disaster, so it is better to check the background of the company when making your choice, but you still never know what will happen tomorrow. The OSes in this hazard group are those developed entirely by single companies. Free OSes are probably less susceptible to this, since development is distributed between many companies and developers, so if a person who developed a really important part of the kernel loses interest in continuing, someone else will pick up the fallen flag and carry on. Of course if some better project shows up tomorrow, developers might migrate there and eventually drop the development - but we are here to keep that from happening.
In the final analysis, the decision is yours.
Actively developed OSes generally try to keep pace with the latest technology developments and continually optimize the kernel and other parts of the OS to become better and faster. Nowadays, the Internet and networking in general are the hottest targets for system developers. Sometimes a simple OS upgrade to the latest stable version can save you an expensive hardware upgrade. Also, remember that when you buy new hardware, chances are that the latest software will make the most of it. Existing software (drivers) might support a brand new product only through its backwards compatibility with previous products of the same family, and thus might not reap all the benefits of the new features. That means you could spend much less money for almost the same functionality if you were to buy the previous model of the same product.
Since I am not fond of the idea of updating this section every time a new processor or memory type comes out, I will only hint at what you should look for, and point out that sometimes the most expensive machine is not the one which provides the best performance.
Your demands are based on many aspects and components. Let's discuss some of them.
In the course of the discussion you might meet some unfamiliar terms; here are some of them:
Clustering - a bunch of machines connected together to perform one big or many small computational tasks in a reasonable time.
Load balancing - users can remember only the name of one of your machines, namely your server, but it cannot stand the heavy load, so you use a clustering approach and distribute the load over a number of machines. The central server, the one users access when they type the name of the service, works as a dispatcher by redirecting requests to the rest of the machines; sometimes it also collects the results and returns them to the users. One of the advantages is that you can take one of the machines down for repair or upgrade and your service will still work - the main server will not dispatch requests to the machine that was taken down. There are many load balancing techniques. (See the High-Availability Linux Project for more info.)
NIC - Network Interface Card.
RAM - Random Access Memory
RAID - META
If you are building a fan site and just want to amaze your friends with a mod_perl guest book, an old 486 machine will do. But if you are into serious business, it is very important to build a scalable server, so that if your service proves successful and popular, and your server's traffic doubles every few days, you are ready to add more resources dynamically. While we could define webserver scalability more precisely, the important thing is to make sure that you can add more power to your webserver(s) without investing additional money in software development (well, almost - you will need some software to connect your servers if you add more of them). This means you should choose hardware and an OS that can talk to other machines and become part of a cluster.
On the other hand, if you prepare for big traffic and buy a monster to do the work for you, what happens if your service does not prove to be as successful as you thought it would be? Then you have spent too much money, and meanwhile new, faster processors and other hardware components have been released, so you lose again.
Wisdom and prophecy, that's all it takes :)
Everybody knows that the Internet is a cash hole: what you throw in hardly ever comes back. This is not always true, but there is a lot of wisdom in these words. While you have to invest money to build a decent service, it can be done more cheaply! You can spend as much as 10 times more money on a strong new machine and get only a 10% improvement in performance. Remember that a four year old processor is still very powerful.
If you really need a lot of power, do not think about a single strong machine (unless you have money to throw away); think about clustering and load balancing. For the price of a single new machine you can probably buy 10 older but very cheap machines and get 8 times more power. Why is that? Because, as I mentioned before, the performance improvement of the newest machine is generally marginal while its price is much higher, and because 10 machines will do faster disk I/O than one single machine, even if its disk is much faster. Yes, you have more administration overhead, but there is a chance you will have it anyway, for in a short time the machine you have just invested in will not stand the load and you will have to purchase more machines and think about how to implement load balancing and file system distribution anyway.
Why am I so convinced? Facts! Look at the most used services on the Internet: search engines, email servers and the like - most of them use a clustering approach. You may not always notice it, because they hide the real implementation behind proxy servers.
You have the best hardware you can get, but the service is still crawling. Make sure you have a fast Internet connection - not as fast as your ISP claims it to be, but as fast as it should be. The ISP might have a very good connection to the Internet, but put many clients on the same line. If these are heavy clients, your traffic will have to share the same line and your throughput will suffer. Think about a dedicated connection and make sure it is truly dedicated. Trust the ISP, but check it!
The idea of having a connection to The Internet is a little misleading. Many Web hosting and co-location companies have large amounts of bandwidth, but still have poor connectivity. The public exchanges, such as MAE-East and MAE-West, frequently become overloaded, yet many ISPs depend on these exchanges.
Private peering means that providers can exchange traffic much quicker.
Also, if your Web site is of global interest, check that the ISP has good global connectivity. If the Web site is going to be visited mostly by people in a certain country or region, your server should probably be located there.
If your service is I/O bound (does a lot of read/write operations to disk - remember that relational databases sit on disk as well) you need a very fast disk. So you should not spend money on a video card and monitor (a monochrome card and a 14'' B&W monitor are perfectly adequate for a server - you will probably telnet or ssh into it most of the time), but rather look for disks with the best price/performance ratio. Of course, ask around and avoid disks that have a reputation for head-crashes and other disasters.
With money in hand you should think about getting a RAID system. RAID is generally a box with many HDs. It is capable of reading and writing data much faster, and is protected against disk failures. It does this by duplicating the same data over a number of disks, so if one fails, the RAID controller detects it and the data is still correct on the duplicated disks. You must think about RAID or similar systems if you have an enormous data set to serve. (What is an enormous data set nowadays? Gigabytes, terabytes?).
Ok, we have a fast disk - what's next? You need a fast disk controller. So either use the one embedded on your motherboard, or plug in a controller card if the onboard one is not good enough.
How much RAM do you need? Nowadays, chances are that you will hear: ``Memory is cheap, the more you buy the better''. But how much is enough? The answer is pretty straightforward: ``You do not want your machine to swap''. When the CPU needs to write something into memory, but notices that memory is already full, it takes the least frequently used memory pages and swaps them out. Swapping out means writing the data to disk. Another process then references some of its own data, which happens to be on one of the pages that were just swapped out. The CPU, ever obliging, swaps it back in again, probably swapping out some other data that will be needed very shortly by another process. Carried to the extreme, the CPU and disk start to thrash hopelessly in circles, without getting any real work done. The less RAM there is, the more often this scenario arises. Worse, you can exhaust swap space as well, and then the real trouble starts...
How do you make a decision? You know the highest rate at which your server expects to serve pages and how long it takes to serve each one. From that you can calculate how many server processes you need. Knowing the maximum size any of your servers can grow to, you know how much memory you need. You probably need less memory than you have calculated if your OS supports memory sharing and you know how to make the best use of this feature (preloading the modules and scripts at server startup). Do not forget that other essential system processes need memory as well, so you should not plan only for the web server, but take the other players into account too. Remember that requests can be queued, so you can afford to let a client wait for a few moments until a server is available to serve it; your numbers will then be more realistic, since you generally do not run at the highest load, but you should be ready to handle the peaks, so reserve at least 20% free memory for peak situations. Many sites have crashed a few moments after a big scoop about them was posted and an unexpected number of requests suddenly came in. (This is called the Slashdot effect, which was born at http://slashdot.org .) If you are about to announce something cool, be aware of the possible consequences.
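As a back-of-the-envelope sketch of that calculation (all the numbers here are assumptions):

  my $peak_rate    = 10;    # requests per second at peak
  my $request_time = 0.5;   # seconds to serve one request
  my $servers      = $peak_rate * $request_time;   # concurrent children needed
  my $child_size   = 10;    # MB per httpd child
  my $headroom     = 1.2;   # keep ~20% free for the peaks
  printf "Plan for about %d MB for the web server alone\n",
         $servers * $child_size * $headroom;        # => 60 MB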
The most important thing to understand is that you might use the most expensive components, but still get bad performance. Why? Let me introduce an annoying word: A bottleneck.
A machine is an aggregate of many big and small components. Each one of them may be a bottleneck. If you have a fast processor but a small amount of RAM (memory), the processor will be under-utilized waiting for the kernel to swap the memory pages in and out, because memory is too small to hold the most used ones. If you have a lot of memory and a fast processor and a fast disk, but a slow controller - the performance will be bad, and you have wasted money.
Use a fast NIC (Network Interface Card) that does not create a bottleneck. If it is slow, the whole service is slow. This is the most important component, since webservers are much more network-bound than disk-bound!
To use your money optimally you have to understand the hardware very well, so you will know what to pick. Otherwise, you should hire a knowledgeable hardware consultant and employ them on a regular basis, since your demands will probably change as time goes by and your hardware will likewise be forced to adapt.
Your need for scalability and flexibility depends on what you need from the web. If you want only a simple guest book or database gateway with no feature headroom, you can get away with any EASY_AND_FAST_TO_DEVELOP_TOOL (Exchange, MS IIS, Lotus Notes, etc).
Experience shows that you will soon want more functionality, at which point you'll discover the limitations of these ``easy'' tools. Gradually, your boss will ask for increasing functionality and at some point you'll realize that the tool lacks flexibility and/or scalability. Then your boss will either buy another EASY_AND_FAST_TO_DEVELOP_TOOL and repeat the process (with different unforeseen problems), or you'll start investing time in learning how to use a powerful, flexible tool that makes the long-term development cycle easier.
If you and your company are serious about delivering flexible Internet functionality, do your homework. Then urge your boss to invest a little extra time and resources in choosing the right tool for the job. Your long-term Internet site will show the results.
Each developer has a boss who participates in the decision-making process. Remember that the boss considers input from sales people, developers, the media and associates before handing down large decisions. Of course, results count! A sales brochure makes very little impact compared to a working demonstration, and demonstrations of company-specific and developer-specific results count big!
Personally, when I discovered mod_perl I did a lot of testing and coding at home and at work. Once I had a working heavy application, I came to my boss with two URLs - one for the plain CGI server and the other for the mod_perl-enabled server. It took about 30 seconds for my boss to say: ``Go with it''. Of course, the moment I did it I had to provide all the support for the other developers - that is why I took the time to learn it in the first place (and that is how this guide was born!).
Chances are that if you've done your homework, learned the tools and can deliver results, you'll have a successful project. If you convince your boss to try a tool that you don't know very well, your results may suffer. If your boss follows your development process closely and sees much worse progress than expected, he might say ``forget it'' and never give mod_perl a second chance.
Advocacy is a great thing for the open-source software movement, but it's best done quietly until you have confidence that you can show productivity. If you can demonstrate to your boss a heavy CGI which is running much faster under mod_perl, that may be a strong argument for further evaluation. Your company may even sponsor a portion of your learning process.
Learn the technology by working on sample projects. Learn how to support yourself and learn how to get support from the community; then advocate your ideas to your boss. Then you'll have the knowledge; your company will have the benefit; and mod_perl will have the reputation it deserves.
If, after reading this guide and the other documents listed in this section, you feel that your question is not yet answered, please ask the apache/mod_perl mailing list to help you. But first try to browse the mailing list archive. Most of the time you will find the answer to your question by searching the archive, since there is a big chance someone else has already encountered the same problem and found a solution for it. If you ignore this advice, do not be surprised if your question is left unanswered - it bores people to answer the same question more than once. This does not mean that you should avoid asking questions, just do not abuse the available help and RTFM before you call for HELP. (You have certainly heard the famous fable of the shepherd boy and the wolves.)
For more information See Get helped with mod_perl.
Hi, I wrote this document to help you with mod_perl. That does not mean that you should send any question regarding mod_perl, Perl or whatever else you think I might know directly to me. Please see the Get helped with mod_perl section and follow the guidelines described there.
However, you are welcome to submit corrections and suggestions directly to me at sbekman@iname.com?subject=mod_perl%20guide%20corrections. If you are going to submit heavy corrections of the text (I love those!), please help me by downloading the source pages in POD (from the main page under the index) and editing them directly. I will use Emacs Ediff to perform an easy merge of your changes. Thank you! But PLEASE, NO PERSONAL QUESTIONS - I did not invite those by writing a guide. They will all be immediately deleted.
http://www.modperl.com is the home site of The Apache Modules Book, a book about creating Web server modules using the Apache API, written by Lincoln Stein and Doug MacEachern.
Now you can purchase the book at your local bookstore or from the online dealer. O'Reilly lists this book as:
Writing Apache Modules with Perl and C
By Lincoln Stein & Doug MacEachern
1st Edition March 1999
1-56592-567-X, Order Number: 567X
746 pages, $34.95
by Frank Cringle at http://perl.apache.org/faq/ .
by Vivek Khera at http://perl.apache.org/tuning/ .
by Doug MacEachern at http://perl.apache.org/src/mod_perl.html .
http://www.refcards.com (Apache and other refcards are available from this link)
The Apache/Perl mailing list (modperl@apache.org) is available for
mod_perl users and developers to share ideas, solve problems and
discuss things related to mod_perl and the Apache::* modules. To subscribe to this list, send mail to majordomo@apache.org with empty Subject
and with Body
:
subscribe modperl
A searchable mod_perl mailing list archive is available at http://forum.swarthmore.edu/epigone/modperl . We owe it to Ken Williams.
More archives available:
http://world.std.com/~swmcd/steven/perl/module_mechanics.html - This page describes the mechanics of creating, compiling, releasing and maintaining Perl modules.
http://www.singlesheaven.com/stas/TULARC/webmaster/myfaq.html
http://www.gunther.web66.com/FAQS/taintmode.html (by Gunther Birznieks)
http://www.refcards.com (Apache and other refcards are available from this link)
http://www.saturn5.com/~jwb/dbi-examples.html (by Jeffrey William Baker).
http://perl.apache.org/src/mod_perl.html#PERSISTENT_DATABASE_CONNECTIONS
Home page - http://squid.nlanr.net/
Users Guide - http://squid.nlanr.net/Squid/Users-Guide/
Mailing lists - http://squid.nlanr.net/Squid/mailing-lists.html
Here you will find instructions for downloading the software and the related documentation.
Perl is most likely already installed on your machine, but you should at least check the version you are using. It is highly recommended that you have at least Perl version 5.004. You can get the latest perl version from http://www.perl.com/ . Try the direct download link http://www.perl.com/pace/pub/perldocs/latest.html . You can get Perl documentation from the same location.
Get the latest Apache webserver and documentation from http://www.apache.org . Try the direct download link http://www.apache.org/dist/ .
Get the latest mod_perl sources and documentation from http://perl.apache.org . Try the direct download link http://perl.apache.org/dist/ .
Squid Linux 2.x Redhat RPMs : http://home.earthlink.net/~intrep/linux/
http://www.acme.com/software/thttpd/
Ask Bjoern Hansen has written a mod_proxy_add_forward.c
module for Apache that sets the X-Forwarded-For
field when doing a ProxyPass, similar to what Squid can do. His patch is
at: http://modules.apache.org/search?id=124
or at ftp://ftp.netcetera.dk/pub/apache/
http://www.hpl.hp.com/personal/David_Mosberger/httperf.html
Comes with the Apache distribution.
You will find the definite guide to load balancing techniques at the High-Availability Linux Project site -- http://www.henge.com/~alanr/ha/
Get it from CPAN at $CPAN/authors/id/DOUGM/libapreq-x.xx.tar.gz or from http://perl.apache.org/dist/libapreq-x.xx.tar.gz . (replace x.xx with the current version)