This document describes "special" traps you may encounter when running your plain CGIs under Apache::Registry and Apache::PerlRun.
In a non-mod_perl script (standalone or CGI), there is no problem writing code like this:
  use CGI qw/param/;
  my $x = param('x');
  sub printit {
      print "$x\n";
  }
However, when the script is run under Apache::Registry, it will in fact be repackaged into something like this:
  package $mangled_package_name;
  sub handler {
      #line 1 $original_filename
      use CGI qw/param/;
      my $x = param('x');
      sub printit {
          print "$x\n";
      }
  }
Now printit() is an inner named subroutine. Because it references a lexical variable from an enclosing scope, a closure is created. The first time the script is run, the correct value of $x will be printed. However, on subsequent runs printit() will retain the initial value of $x -- not what you want.
Always use -w (and/or PerlWarn On)! Perl will then emit a warning like:

  Value of $x will not stay shared at - line 5.
NOTE: Subroutines defined inside BEGIN{} and END{} blocks cannot trigger this message, since each BEGIN{} and END{} block is called exactly once. (To understand why, read about closures in perlref or perlfaq 13.12.)
The perldiag manpage says:
An inner (nested) named subroutine is referencing a lexical variable defined in an outer subroutine.
When the inner subroutine is called, it will probably see the value of the outer subroutine's variable as it was before and during the *first* call to the outer subroutine; in this case, after the first call to the outer subroutine is complete, the inner and outer subroutines will no longer share a common value for the variable. In other words, the variable will no longer be shared.
Check your code by running Apache in single-child mode (httpd -X). Since a my() variable retains its value across requests within each child process, the closure problem can be difficult to track down in multi-user mode: the script will appear to work fine until you have cycled through all the httpd children.
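The effect is easy to reproduce outside Apache. Here is a minimal, hypothetical sketch that imitates what Apache::Registry does: fake_registry_run() (an invented name) plays the role of the generated handler() wrapper, turning printit() into an inner named sub that keeps the first call's $x:

```perl
use strict;
use warnings;

my @output;

# Stand-in for the handler() wrapper that Apache::Registry generates.
sub fake_registry_run {
    my ($param) = @_;
    my $x = $param;                    # the script's lexical variable
    sub printit { push @output, $x }   # inner named sub: closes over the
                                       # FIRST call's $x only; Perl warns
                                       # "Variable $x will not stay shared"
    printit();
}

fake_registry_run('first');
fake_registry_run('second');           # printit() still sees 'first'
print "@output\n";                     # first first
```

Running this prints "first first": the second "request" never reaches the inner sub, just as in a cached Registry script.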
If a variable needs file scope, use a global variable:
  use vars qw/$x/;
  use CGI qw/param/;
  $x = param('x');
  sub printit {
      print "$x\n";
  }
You can safely use a my() scoped variable if its value is constant:
  use vars qw/$x/;
  use CGI qw/param/;
  $x = param('x');
  my $y = 5;
  sub printit {
      print "$x, $y\n";
  }
Also see the clarification of my() vs. use vars -- Ken Williams writes:
Yes, there is quite a bit of difference! With use vars(), you are making an entry in the symbol table, and you are telling the compiler that you are going to be referencing that entry without an explicit package name. With my(), NO ENTRY IS PUT IN THE SYMBOL TABLE. The compiler figures out _at_ _compile_time_ which my() variables (i.e. lexical variables) are the same as each other, and once you hit execute time you can not go looking those variables up in the symbol table.
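The symbol-table difference Ken describes can be observed directly by peeking into the package stash. A small sketch (the package name Demo is invented for the illustration):

```perl
use strict;
use warnings;

package Demo;
use vars qw($global);   # creates a *Demo::global entry in the symbol table
$global = 42;
my $lexical = 7;        # lexical variable: no symbol-table entry at all

# %Demo:: (the "stash") holds the package's symbol-table entries.
print exists $Demo::{global}  ? "global: in stash\n"  : "global: absent\n";
print exists $Demo::{lexical} ? "lexical: in stash\n" : "lexical: absent\n";
```

This prints that $global has a stash entry while $lexical does not, which is exactly why you cannot look lexicals up by name at runtime.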
And my() vs. local() -- Randal Schwartz writes:
local() creates a temporal-limited package-based scalar, array, hash, or glob -- when the scope of definition is exited at runtime, the previous value (if any) is restored. References to such a variable are *also* global... only the value changes. (Aside: that is what causes variable suicide. :) my() creates a lexically-limited non-package-based scalar, array, or hash -- when the scope of definition is exited at compile-time, the variable ceases to be accessible. Any references to such a variable at runtime turn into unique anonymous variables on each scope exit.
For more information see: Using global variables and sharing them between modules/packages, and an article by Mark-Jason Dominus about how Perl handles variables and namespaces, and the difference between use vars() and my(): http://www.plover.com/~mjd/perl/FAQs/Namespaces.html
When using a regular expression that contains an interpolated Perl
variable, if it is known that the variable (or variables) will not vary
during the execution of the program, a standard optimization technique
consists of adding the /o
modifier to the regexp pattern. This directs the compiler to build the
internal table once, for the entire lifetime of the script, rather than
every time the pattern is executed. Consider:
  my $pat = '^foo$';   # likely to be input from an HTML form field
  foreach( @list ) {
      print if /$pat/o;
  }
This is usually a big win in loops over lists, or when using the grep() or map() operators.
In long-lived mod_perl scripts, however, this can pose a problem if the variable changes according to the invocation. The first invocation of a fresh httpd child will compile the regex and perform the search correctly. However, all subsequent uses by the httpd child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern is dependent on. Your script will appear broken.
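The misbehavior is easy to demonstrate even in a plain Perl script, because /o caches the compiled pattern per match operator for the life of the interpreter, just as it does across requests inside a persistent mod_perl child. A minimal sketch (match_with_o() is an invented helper):

```perl
use strict;
use warnings;

my @matched;

sub match_with_o {
    my ($pat, @list) = @_;
    # /o compiles $pat into this match operator once -- and never again
    push @matched, grep { /$pat/o } @list;
}

match_with_o('^foo', qw(foo bar foobar));   # compiles /^foo/
match_with_o('^bar', qw(foo bar foobar));   # silently still uses /^foo/ !
print "@matched\n";                         # foo foobar foo foobar
```

The second call appears to search for '^bar' but actually reruns the frozen '^foo' pattern -- the same way a second request "breaks" the script in a long-lived child.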
There are two solutions to this problem:
The first is to use eval q//, to force the code to be evaluated each time it runs. Just make sure that the eval block covers the entire processing loop, and not just the pattern match itself.
The above code fragment would be rewritten as:
  my $pat = '^foo$';
  eval q{
      foreach( @list ) {
          print if /$pat/o;
      }
  }
Just saying:
  foreach( @list ) {
      eval q{ print if /$pat/o; };
  }
is going to be a horribly expensive proposition.
You can use this approach if you require more than one pattern match
operator in a given section of code. If the section contains only one
operator (be it an m//
or s///
), you can rely on the property of the null pattern, that reuses the last
pattern seen. This leads to the second solution, which also eliminates the
use of eval.
The above code fragment becomes:
  my $pat = '^foo$';
  "something" =~ /$pat/;   # dummy match (MUST NOT FAIL!)
  foreach( @list ) {
      print if //;
  }
The only gotcha is that the dummy match that boots the regular expression
engine must absolutely, positively succeed, otherwise the pattern will not
be cached, and the //
will match everything. If you can't count on fixed text to ensure the match
succeeds, you have two possibilities.
If you can guarantee that the pattern variable contains no meta-characters (things like *, +, ^, $...), you can use the dummy match:
"$pat" =~ /\Q$pat\E/; # guaranteed if no meta-characters present
If there is a possibility that the pattern can contain meta-characters, you should search for the pattern or the unsearchable \377 character as follows:
"\377" =~ /$pat|^[\377]$/; # guaranteed if meta-characters present
Another approach depends on the complexity of the regexp you apply this technique to. One common usage where a compiled regexp is usually more efficient is to "match any one of a group of patterns" over and over again. With a helper routine it's easier to remember. Here is one slightly modified from Jeffrey Friedl's example in his book "Mastering Regular Expressions".
  #####################################################
  # Build_MatchMany_Function
  # -- Input:  list of patterns
  # -- Output: A code ref which matches its $_[0]
  #            against ANY of the patterns given in the
  #            "Input", efficiently.
  #
  sub Build_MatchMany_Function {
      my @R = @_;
      my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
      my $matchsub = eval "sub { $expr }";
      die "Failed in building regex @R: $@" if $@;
      $matchsub;
  }
Example usage:
  @some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
  $Known_Browser = Build_MatchMany_Function(@some_browsers);

  while (<ACCESS_LOG>) {
      # ...
      $browser = get_browser_field($_);
      if ( ! &$Known_Browser($browser) ) {
          print STDERR "Unknown Browser: $browser\n";
      }
      # ...
  }
Running in httpd -X mode (good only for testing during the development phase).
You want to test that your application correctly handles global variables (if you have any -- the fewer of them the better, but sometimes you just can't do without them). It's hard to test with multiple servers serving your CGI, since each child has a different value for its global variables.
Imagine that you have a random() sub that returns a random number, and you have the following script:
  use vars qw($num);
  $num ||= random();
  print ++$num;
This script initializes the variable $num
with a random value, then increments it on each request and prints it out.
Running this script in a multiple server environment will result in something like 1, 9, 4, 19 (one number per reload), since each time your script will be served by a different child. (On some OSes, e.g. AIX, the parent httpd process will assign all of the requests to the same child process if all of the children are idle.) But if you run in httpd -X single server mode you will get 2, 3, 4, 5... (assuming that random() returned 1 on the first call).
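The jumpy sequence can be simulated without Apache at all. In this hypothetical sketch each "child" is keyed by an invented id and keeps its own private copy of the counter, just as real httpd children keep separate copies of $num (the seed is fixed instead of random, so the output is deterministic):

```perl
use strict;
use warnings;

my %child_num;    # one private counter per simulated child

sub handle_request {
    my ($child) = @_;
    $child_num{$child} ||= 1;   # stands in for random(); fixed for clarity
    return ++$child_num{$child};
}

# round-robin over three "children", the way a busy server might
my @responses = map { handle_request($_) } qw(a b a c a);
print "@responses\n";           # 2 2 3 2 4
```

Five consecutive "reloads" print 2 2 3 2 4: each child increments only its own copy, so the numbers seen by the browser jump around.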
But do not get too obsessive with this mode, since working only in single server mode sometimes hides problems that show up when you switch to a normal (multi) server mode. Consider an application that allows you to change the configuration at run time.
Let's say the script produces a form to change the background color of the page. It's not a good design, but for the sake of demonstrating the potential problem, we will assume that our script doesn't write the changed background color to the disk, but simply changes it in memory, like:
  use vars qw($bgcolor);
  # assign default value at first invocation
  $bgcolor ||= "white";
  # modify the color if requested to
  $bgcolor = $q->param('bgcolor') || $bgcolor;
So you have typed in a new color, and in response your script prints back the HTML with the new color -- you think that's it! It was so simple. And if you keep running in single server mode you will never notice that you have a problem...
If you run the same code in normal (multi-server) mode, after you submit the color change you will get the result as expected, but when you call the same URL again (not reload!), chances are that you will get back the original default color (white in our case), since no child except the one that processed the color change request knows about the change to its global variable. Just remember that children cannot share information, other than what they inherited from their parent when they were spawned. Of course you should use a hidden form field for the color to be remembered, or store it on the server side (database, shared memory, etc.).
Also note that since the server is running in single-process mode, if the returned output is HTML with <IMG> tags, loading these images will take a long time. If you use Netscape as a client while your server runs in single-process mode, the KeepAlive feature gets in the way: Netscape tries to open multiple connections and keep them open, and because there is only one server process listening, each connection has to time out before the next succeeds. Turn off KeepAlive in httpd.conf to avoid this effect while developing, or press STOP after a few seconds (assuming you use the image size parameters, so Netscape will be able to render the rest of the page).
In addition you should know that when running with -X you will not see any control messages that the parent server normally writes to the error_log (like "server started", "server stopped", etc.). Since httpd -X causes the server to handle all requests itself, without forking any children, there is no controlling parent to write status messages.
Under mod_perl, files that have been created after the server's startup are reported as having a negative age by the -M (and -C, -A) file tests. This is obvious if you remember that these tests compute the age relative to the time the process started: you will get a negative result whenever the server was started before the file was created, and that is normal behavior in any long-running perl program.
If you want the -M test to count the time relative to the current request, you should reset the $^T variable, just as in any other perl script: add $^T = time; at the beginning of your scripts.
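A short sketch of the fix, using a temporary file; the one-hour offset is invented to play the role of a long-running server whose $^T predates the file:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);   # a freshly created file

$^T = time - 3600;          # pretend the "server" started an hour ago
my $stale_age = -M $file;   # negative: the file is newer than $^T

$^T = time;                 # the remedy: reset at the top of the script
my $fresh_age = -M $file;   # now relative to the current "request"

print $stale_age < 0  ? "stale age is negative\n"     : "unexpected\n";
print $fresh_age >= 0 ? "fresh age is non-negative\n" : "unexpected\n";
```

Before the reset, -M reports a negative age; after $^T = time; the age is measured from now and comes out non-negative, as a fresh mod_cgi process would report it.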
When a user presses the STOP button, Apache detects this via $SIG{PIPE} and ceases the script's execution. With mod_cgi there is generally no problem, since all open files will be closed and all resources will be freed (well, almost all: if you happen to use external lock files, the resources they protect will most likely be left locked and unusable by anyone else using the same advisory locking scheme).
It's important to notice that when the user hits the browser's STOP button, the mod_perl script is blissfully unaware until it tries to send some data to the browser. At that point, Apache realizes that the browser is gone, and all the good cleanup stuff happens.
Starting from Apache 1.3.6, Apache will not catch SIGPIPE anymore and mod_perl will handle it much better. Here is the relevant entry from Apache 1.3.6's CHANGES file:
  *) SIGPIPE is now ignored by the server core. The request write
     routines (ap_rputc, ap_rputs, ap_rvputs, ap_rwrite, ap_rprintf,
     ap_rflush) now correctly check for output errors and mark the
     connection as aborted. Replaced many direct (unchecked) calls to
     ap_b* routines with the analogous ap_r* calls. [Roy Fielding]
What happens if your mod_perl script has some global variables, that are being used for resource locking?
It's possible not to notice the pitfall if the critical section between lock and unlock is very short and finishes quickly, so you may never see it happen (you aren't fast enough to stop the code in the middle). But look at the following scenario:
  1. lock resource
     <critical section starts>
  2. sleep 20   (== do some time-consuming processing)
     <critical section ends>
  3. unlock resource
If the user presses STOP and Apache sends SIGPIPE before step 3, then, since we are running under mod_perl and the lock variable is cached, it will not be unlocked. A kind of deadlock exists.
Here is a working example. Run the server with -X, and press STOP before the count-up to 10 has finished. Then rerun the script: it will hang in while(1)! The resource is no longer available to this child.
  use vars qw(%CACHE);
  use CGI;
  $|=1;
  my $q = new CGI;
  print $q->header, $q->start_html;
  print $q->p("$$ Going to lock!\n");
  # actually the while loop below is not needed, since it's an
  # internal lock, accessible only by the same process -- and if
  # it's locked, it's locked for the whole child's life
  while (1) {
      unless (defined $CACHE{LOCK} and $CACHE{LOCK} == 1) {
          $CACHE{LOCK} = 1;
          print $q->p("Got the lock!\n");
          last;
      }
  }
  print $q->p("Going to sleep (I mean working)!");
  my $c = 0;
  foreach (1..10) {
      sleep 1;
      print $c++, "\n<BR>";
  }
  print $q->p("Going to unlock!");
  $CACHE{LOCK} = 0;
  print $q->p("Unlock!\n");
You may ask, what is the solution to this problem? As noted in END Blocks, any END blocks that are encountered during compilation of Apache::Registry scripts are called after the script is done running, including subsequent invocations when the script is cached in memory. So if you are running under Apache::Registry, the following is your remedy:
END { $CACHE{LOCK} = 0; }
Notice that the END block will be run after Apache::Registry::handler has finished (not during the cleanup phase, though).
If you use the Perl API, use the register_cleanup() method of Apache:
$r->register_cleanup(sub {$CACHE{LOCK} = 0;});
If you use the Apache API, the Apache->request->connection->aborted() construct can be used to test for an aborted connection.
I hope you noticed that this example is somewhat misleading, since there is a separate instance of %CACHE in every child; if you modify it, the change is known only inside that same child, and none of the %CACHE variables in the other children is affected. But if you work with code that lets you control variables visible to every child (some external shared memory or another approach), the hazard this example shows still applies. Make sure you unlock resources either when you stop using them, or when the script is aborted in the middle before the actual unlocking happens.
A situation similar to the Pressed Stop button disease happens when the client (browser) times out the connection (after about 2 minutes). There are cases when your script is about to perform a very long operation and there is a chance that its duration will be longer than the client's timeout. One case I can think of is database interaction, where the DB engine hangs or needs a long time to return results. If this is the case, use $SIG{ALRM} to prevent the timeouts:
  $timeout = 10; # seconds
  eval {
      local $SIG{ALRM} = sub { die "Sorry, timed out. Please try again\n" };
      alarm $timeout;
      ... db stuff ...
      alarm 0;
  };
  die $@ if $@;
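Here is a runnable sketch of the same eval/alarm pattern, with a sleep standing in for the slow database call (the short one-second timeout is chosen only so the demonstration finishes quickly):

```perl
use strict;
use warnings;

my $timeout = 1;    # seconds; deliberately short for the demonstration
my $result = eval {
    local $SIG{ALRM} = sub { die "Sorry, timed out. Please try again\n" };
    alarm $timeout;
    sleep 3;        # stands in for the slow "db stuff"
    alarm 0;        # cancel the alarm if we got here in time
    "finished";
};
my $outcome = $@ ? $@ : "$result\n";
print $outcome;     # Sorry, timed out. Please try again
```

The alarm fires in the middle of the sleep, the handler dies, and eval traps the die, so the script regains control instead of letting the client time out.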
But, as was recently discovered, local $SIG{'ALRM'} does not restore the original underlying C handler. This was fixed in mod_perl 1.19_01 (CVS version). As a matter of fact, none of the local $SIG{FOO} assignments restore the original C handler -- read Debugging Signal Handlers ($SIG{FOO}) for a debugging technique and a possible workaround.
Your CGI does not work and you want to see what the problem is. The best idea is to check out any errors that the server may be reporting. Where can I find these errors?
Generally all errors are logged into an error_log file. The exact file location and name are defined in the httpd.conf file. Look for the ErrorLog parameter. My httpd.conf says:
ErrorLog var/logs/error_log
Hey, where is the beginning of the path? There is another Apache parameter called ServerRoot. Every time Apache sees a parameter value with a relative path (e.g. my.txt) instead of an absolute path (e.g. /tmp/my.txt), it prepends the value of ServerRoot to that value. I have:
ServerRoot /usr/local/apache
So I will look for the error_log file at /usr/local/apache/var/logs/error_log. Of course you can also use an absolute path to define the file's location in the filesystem.
But there are cases when errors don't go to the error_log file. For example, some errors are printed to the console (tty) from which you executed httpd (unless you redirected httpd's stderr). This happens when the server has not yet opened the error_log file for writing.
For example, if you have mistakenly entered a non-existent directory path in your ErrorLog directive, the error message will be printed to the controlling tty. Also, if an error happens while the server executes a PerlRequire or PerlModule directive, you might see the errors there as well.
You are probably wondering where all the errors go when you are running the server in single-process mode (httpd -X). They go to the console. That is because when running in single-process mode there is no parent httpd process to perform all the logging, including all the status messages that generally show up in the error_log file.
Perl uses sh for its interactions with system() and piped open() calls. So when you want to set a temporary environment variable for a script you call from your CGI, you do:
open UTIL, "USER=stas ; script.pl | " or die "...: $!\n";
or
system "USER=stas ; script.pl";
This is useful, for example, if you need to invoke a script that uses CGI.pm from within a mod_perl script. We trick the Perl script into thinking it's a simple CGI which is not running under mod_perl:
  open(PUBLISH, "GATEWAY_INTERFACE=CGI/1.1 script.cgi \"param1=value1&param2=value2\" |") or die "...: $!\n";
Make sure that the parameters you pass are shell-safe (all "unsafe" characters like single quotes should be properly escaped).
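The mechanism can be checked from the command line. In this sketch an invented variable GREETING is placed in the child's environment by prefixing the command; note that the assignment must directly precede the command, with no semicolon in between, or the shell treats it as an ordinary (unexported) shell variable:

```perl
use strict;
use warnings;

# /bin/sh runs: GREETING=hello sh -c 'echo $GREETING'
# The prefix assignment puts GREETING into the inner command's environment.
my $out = `GREETING=hello sh -c 'echo \$GREETING'`;
chomp $out;
print "$out\n";    # hello
```

The inner shell sees GREETING in its environment and echoes "hello", confirming that the variable reached the child process.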
However, you are forking to run a Perl script, so you have thrown the hard-won performance gain out the window. Whatever script.cgi is, it should be moved into a module with a subroutine you can call directly from your script, to avoid the fork.
Written by Stas Bekman.
Last Modified at 09/26/1999
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission.