There is no such thing as a single RIGHT strategy in the web server business, though there are many wrong ones. Never believe a person who says: "Do it this way, this is the best!". As the old saying goes: "Trust but verify". There are too many technologies out there to choose from, and it would take an enormous investment of time and money to try to validate each one before deciding which is the best choice for your situation. Keeping this idea in mind, I will present some different combinations of mod_perl and other technologies, or just standalone mod_perl. I'll describe how these things work together, offer my opinions on the pros and cons of each, the relative difficulty of installing and maintaining them, and some hints on approaches that should be used and things to avoid.
To be clear, I will not address all technologies and tools, but limit this discussion to those complementing mod_perl.
Please let me stress it again: DO NOT blindly copy someone's setup and hope for a good result. Choose what is best for your situation -- it might take some effort to find out what that is.
There are several different ways to build, configure and deploy your mod_perl enabled server. Some of them are:
Having one binary and one config file (one big binary for mod_perl).
Having two binaries and two config files (one big binary for mod_perl and one small for static objects like images.)
Having one DSO-style binary, a mod_perl loadable object and two config files (Dynamic linking lets you compile once and have both a big and a small binary in memory, BUT you have to deal with a relatively new solution that has weak documentation, is still subject to change, and is rather more complex.)
Any of the above plus a reverse proxy server in http accelerator mode.
If you are a newbie, I would recommend that you start with the first option and work on getting your feet wet with apache and mod_perl. Later, you can decide whether to move to the second one which allows better tuning at the expense of more complicated administration, or to the third option -- the more state-of-the-art-yet-suspiciously-new DSO system, or to the fourth option which gives you even more power.
The first option will kill your production site if you serve a lot of static data with ~2-12 MB webserver processes. On the other hand, while testing you will have no other server interaction to mask or add to your errors.
The second option allows you to seriously tune the two servers for maximum performance. On the other hand you have to deal with proxying or fancy site design to keep the two servers in synchronization. In this configuration, you also need to choose between running the two servers on multiple ports, multiple IPs, etc... This adds the burden of administrating more than one server.
The third option (DSO) -- as mentioned above -- means playing with the
bleeding edge. In addition mod_so
(the DSO module) adds size and complexity to your binaries. With DSO,
modules can be added and removed without recompiling the server, and
modules are even shared among multiple servers. Again, it is bleeding edge
and still somewhat platform specific, but your mileage may vary. See mod_perl server as DSO.
The fourth option (proxy in http accelerator mode), once correctly configured and tuned, improves the performance of any of the above three options by caching and buffering page results.
The rest of this chapter discusses the pros and the cons of each of these configurations. Real World Scenarios Implementation describes the implementation techniques of these schemes.
The first approach is to implement a straightforward mod_perl server. Just take your plain apache server and add mod_perl, like you add any other apache module. You continue to run it at the port it was running before. You probably want to try this before you proceed to more sophisticated and complex techniques.
The advantages:
Simplicity. You just follow the installation instructions, configure it, restart the server and you are done.
No network changes. You do not have to worry about using additional ports as we will see later.
Speed. You get a very fast server, you see an enormous speedup from the first moment you start to use it.
The disadvantages:
The process size of a mod_perl-enabled Apache server is huge (starting from 4Mb at startup and growing to 10Mb and more, depending on how you use it) compared to a typical plain Apache. Of course, if memory sharing is in place, the RAM requirements will be smaller.
You probably have a few tens of children processes. The additional memory requirements add up in direct relation to the number of children processes. Your memory demands are growing by an order of magnitude, but this is the price you pay for the additional performance boost of mod_perl. With memory prices so cheap nowadays, the additional cost is low -- especially when you consider the dramatic performance boost mod_perl gives to your services with every 100Mb of RAM you add.
While you will be happy to have these monster processes serving your scripts with monster speed, you should be very worried about having them serve static objects such as images and html files. Each static request served by a mod_perl-enabled server means another large process running, competing for system resources such as memory and CPU cycles. The real overhead depends on the static object request rate. Remember that if your mod_perl code produces HTML which includes images, each one will turn into another static object request. Having another plain webserver to serve the static objects solves this unpleasant problem. Having a proxy server as a front end, caching the static objects and freeing the mod_perl processes from this burden, is another solution. We will discuss both below.
Another drawback of this approach is that when serving output to a client with a slow connection, the huge mod_perl-enabled server process (with all of its system resources) will be tied up until the response is completely written to the client. While it might take a few milliseconds for your script to complete the request, the process may still be busy for some number of seconds or even minutes if the request comes from a client with a slow connection. As with the previous drawback, a proxy solution can solve this problem. More on proxies later.
Proxying dynamic content is not going to help much if all the clients are on a fast local net (for example, if you are administering an Intranet.) On the contrary, it can decrease performance. Still, remember that some of your Intranet users might work from home through the slow modem links.
If you are new to mod_perl, this is probably the best way to get yourself started.
And of course, if your site is serving only mod_perl scripts (close to zero static objects, like images), this might be the perfect choice for you!
For implementation notes see: Standalone mod_perl Enabled Apache Server
As I have mentioned before, when running scripts under mod_perl, you will notice that the httpd processes consume a huge amount of virtual memory, from 5Mb to 15Mb and even more. That is the price you pay for the enormous speed improvements under mod_perl. (Again -- shared memory keeps the real memory that is being used much smaller :)
Using these large processes to serve static objects like images and html documents is overkill. A better approach is to run two servers: a very light, plain apache server to serve static objects and a heavier mod_perl-enabled apache server to serve requests for dynamic (generated) objects (aka CGI).
From here on, I will refer to these two servers as httpd_docs (vanilla apache) and httpd_perl (mod_perl enabled apache).
The advantages:
The heavy mod_perl processes serve only dynamic requests, which allows the deployment of fewer of these large servers.
MaxClients
, MaxRequestsPerChild
and related parameters can now be optimally tuned for both httpd_docs
and httpd_perl
servers, something we could not do before. This allows us to fine tune the
memory usage and get better server performance.
Now we can run many lightweight httpd_docs
servers and just a few heavy httpd_perl
servers.
An important note: when a user browses static pages and the base URL in the Location window points to the static server, for example http://www.nowhere.com/index.html -- all relative URLs (e.g. <A HREF="/main/download.html">) are served by the light plain apache server. But this is not the case with dynamically generated pages. For example, when the base URL in the Location window points to the dynamic server (e.g. http://www.nowhere.com:8080/perl/index.pl), all relative URLs in the dynamically generated HTML will be served by the heavy mod_perl processes. You must use fully qualified URLs, not relative ones! http://www.nowhere.com/icons/arrow.gif is a full URL, while /icons/arrow.gif is a relative one. Using <BASE HREF="http://www.nowhere.com/"> in the generated HTML is another way to handle this problem. Alternatively, the httpd_perl server could rewrite the requests back to httpd_docs (much slower), and that still demands the attention of the heavy servers. This is not an issue if you hide the internal port implementation, so that the client sees only one server running on port 80. (See Publishing port numbers different from 80)
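For instance, a hedged sketch of what the generated HTML might look like with the <BASE HREF> workaround (www.nowhere.com and the file names are placeholders):

```html
<HTML>
  <HEAD>
    <!-- relative URLs below now resolve against the light httpd_docs server -->
    <BASE HREF="http://www.nowhere.com/">
  </HEAD>
  <BODY>
    <!-- fetched as http://www.nowhere.com/icons/arrow.gif, not from port 8080 -->
    <IMG SRC="/icons/arrow.gif">
    <A HREF="/main/download.html">Download</A>
  </BODY>
</HTML>
```

This keeps the dynamically generated page itself on the heavy server while pushing every embedded static object to the light one.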
The disadvantages:
Additional administration overhead.
A need for two different sets of configuration, log and other files. We
need a special directory layout to manage these. While some directories can
be shared between the two servers (like the include
directory, containing the apache include files -- assuming that both are
built from the same source distribution), most of them should be separated
and the configuration files updated to reflect the changes.
A need for two sets of controlling scripts (startup/shutdown) and watchdogs.
If you are processing log files, you will now probably have to merge the two separate log files into one before processing them.
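As a sketch of what such merging involves, the following hypothetical Python script merge-sorts two access logs by their timestamp field. It assumes the default Common Log Format (host ident user [date] "request" status bytes) and that both servers run in the same time zone; the file names are placeholders:

```python
# merge_logs.py -- merge two Common Log Format access logs chronologically (a sketch)
import heapq
import re
from datetime import datetime

# the bracketed %t field, e.g. [17/Aug/1999:12:00:00 +0300]
TS_RE = re.compile(r'\[([^\]]+)\]')

def timestamp(line):
    """Parse the CLF timestamp so lines can be ordered chronologically."""
    m = TS_RE.search(line)
    # drop the timezone offset for simplicity; fine when both servers share one zone
    return datetime.strptime(m.group(1).split()[0], '%d/%b/%Y:%H:%M:%S')

def merge(docs_log, perl_log, out):
    """Merge-sort two already-sorted access logs into a single stream."""
    with open(docs_log) as a, open(perl_log) as b, open(out, 'w') as o:
        for line in heapq.merge(a, b, key=timestamp):
            o.write(line)
```

Since each server writes its own log in chronological order, a merge sort (rather than a full sort) is enough, and the logs never need to fit in memory at once.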
We still have the problem of a mod_perl process spending its precious time serving slow clients, even when the processing portion of the request was completed a long time ago, exactly as in the one server approach. Deploying a proxy solves this, and will be covered in the next sections.
As with the single server approach, this is not a major disadvantage if you are on a fast local Intranet. It is likely that you do not want a buffering server in this case.
Before you go on with this solution you really want to look at the Adding a Proxy Server in http Accelerator Mode section.
For implementation notes see : One Plain and One mod_perl enabled Apache Servers
If the only requirement from the light server is for it to serve static
objects, then you can get away with non-apache servers having an even
smaller memory footprint. thttpd
has been reported to be about 5 times faster than apache (especially under
a heavy load), since it is very simple and uses almost no memory (260k) and
does not spawn child processes.
Meta: Hey, No personal experience here, only rumours. Please let me know if I have missed some pros/cons here. Thanks!
The Advantages:
All the advantages of the 2 servers scenario.
More memory saving. Apache is about 4 times bigger than thttpd. If you spawn 30 children you use about 30Mb of memory, while thttpd uses only 260k -- more than 100 times less! You could use the saved 30Mb to run more mod_perl servers.
Note that this is not true if your OS supports memory sharing and you configured apache to use it (this is the DSO approach; there is no memory sharing if apache modules are statically compiled into httpd). If you do allow memory sharing, 30 light apache servers ought to use only about 3-4Mb, because most of it will be shared. If this is the case, the savings over thttpd are much smaller.
Reported to be about 5 times faster than plain apache at serving static objects.
The Disadvantages:
Lacks some of apache's features, like access control, error redirection, customizable log file formats, and so on.
In the beginning there were two servers: one a plain apache server, which was very light and configured to serve static objects; the other mod_perl enabled, which was very heavy and aimed at serving mod_perl scripts. We named them httpd_docs and httpd_perl appropriately. The two servers coexisted at the same IP (DNS) by listening on different ports: 80 for httpd_docs (e.g. http://www.nowhere.com/images/test.gif) and 8080 for httpd_perl (e.g. http://www.nowhere.com:8080/perl/test.pl). Note that I did not write http://www.nowhere.com:80 for the first example, since port 80 is the default http port. (Later on, I will be moving the httpd_docs server to port 81.)
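A minimal sketch of what the two configuration files might contain (apache 1.3 syntax; the paths, user and DocumentRoot values are hypothetical):

```apache
# httpd_docs.conf -- the light server for static objects
Port 80
User nobody
DocumentRoot /home/httpd/docs

# httpd_perl.conf -- the heavy mod_perl enabled server (a separate file)
Port 8080
User nobody
DocumentRoot /home/httpd/perl
# hand everything under /perl to mod_perl via Apache::Registry
<Location /perl>
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
</Location>
```

Each server is then started from its own binary with its own config file, so the two can be tuned completely independently.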
Now I am going to convince you that you want to use a proxy server (in the http accelerator mode). The advantages are:
Allow serving of static objects from the proxy's cache (objects that
previously were entirely served by the httpd_docs
server).
You get less I/O activity reading static objects from the disk (the proxy serves the most ``popular'' objects from RAM -- of course you benefit more if you allow the proxy server to consume more RAM). Since you do not wait for the I/O to be completed, you are able to serve the static objects much faster.
The proxy server acts as a sort of output buffer for the dynamic content. The mod_perl server sends the entire response to the proxy and is then free to deal with other requests. The proxy server is responsible for sending the response to the browser. So if the transfer is over a slow link, the mod_perl server is not waiting around for the data to move.
Using numbers is always more convincing :) Let's take a user connected to your site with a 28.8 kbps (bps == bits/sec) modem. It means that the speed of the user's link is 28.8/8 = 3.6 kbytes/sec. I assume an average generated HTML page to be 20kb (kb == kilobytes) and an average script to generate this output in 0.5 secs. How long will the server be tied up before the user gets the whole response? A simple calculation reveals pretty scary numbers: it will have to wait for another 6 secs (20kb/3.6kb), when it could have served another 12 (6/0.5) dynamic requests in that time. This very simple example shows us that with a front end buffering the output we need only one twelfth the number of children running, which means we will need only about one twelfth of the memory (not quite true, because some parts of the code are shared). But you know that nowadays scripts return pages which are sometimes blown up with javascript code and the like, which makes them 100kb in size, and the download time becomes... (This calculation is left to you as an exercise :)
To make your download time estimates even worse, let me remind you that many users like to open multiple browser windows and do many things at once (download files and browse heavy sites). So the speed of 3.6kb/sec we assumed before may often be 5-10 times slower.
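The back-of-the-envelope arithmetic above is easy to check mechanically; a small sketch, where the 20kb page size and 0.5 sec generation time are the assumptions from the text:

```python
import math

# link speed of a 28.8 kbps modem, in kbytes/sec
link_speed = 28.8 / 8            # == 3.6
page_size = 20                   # kbytes of generated HTML (assumed)
gen_time = 0.5                   # secs the script needs to generate the page (assumed)

# secs the child is tied up pushing the page to the client, rounded up
transfer_time = math.ceil(page_size / link_speed)
# dynamic requests the child could have served in that time instead
missed_requests = int(transfer_time / gen_time)

print(transfer_time)    # 6
print(missed_requests)  # 12
```

Rerunning it with page_size = 100 shows how quickly javascript-bloated pages make the picture worse.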
Also we are going to hide the details of the server's implementation. Users will never see ports in the URLs (more on that topic later). And you can have a few boxes serving the requests, with only one serving as the front end, which spreads the load between the servers in the way you configured it to. So you can actually take one server down for an upgrade, and the end user will never notice, because the front end server will dispatch the jobs to the other servers. (Of course this is a pretty big issue, and it will not be discussed within the scope of this document.)
For security reasons, using any httpd accelerator (or a proxy in httpd accelerator mode) is essential because you do not let your internal server get directly attacked by arbitrary packets from whomever. The httpd accelerator and internal server communicate in expected HTTP requests. This allows for only your public ``bastion'' accelerating www server to get hosed in a successful attack, while leaving your internal data safe.
The disadvantages are:
Of course there are drawbacks. Luckily, these are not functionality drawbacks, but more of an administration hassle. You add another daemon to worry about, and while proxies are generally stable, you have to make sure to prepare proper startup and shutdown scripts, which are run at boot and reboot appropriately. You may also want a watchdog script run from the crontab.
Proxy servers can be configured to be light or heavy; the admin must decide what gives the highest performance for his application. A proxy server like squid is light in the sense of having only one process serving all requests, but it can appear pretty heavy when it loads objects into memory for faster service.
Have I succeeded in convincing you that you want the proxy server?
If you are on a local area network (LAN), then the big benefit of the proxy buffering the output and feeding a slow client is gone. You are probably better off sticking with a straight mod_perl server in this case.
As of this writing, two proxy implementations are known to be used in tandem with mod_perl: the squid proxy server, and mod_proxy, which is part of the apache server. Let's compare the two of them.
The Advantages:
Caching of static objects, so that these are served much faster, assuming that your cache size is big enough to keep the most frequently requested objects in the cache.
Buffering of dynamic content: it takes on the burden of returning the content generated by the mod_perl servers to slow clients, thus freeing the mod_perl servers from waiting for slow clients to download the data. Freed servers immediately switch to serving other requests, so the number of servers you require goes down dramatically.
Non-linear URL space / server setup. You can use Squid to play some tricks with the URL space and/or domain based virtual server support.
The Disadvantages:
Proxying dynamic content is not going to help much if all the clients are on a fast local net. Also, a message on the squid mailing list implied that squid only buffers in 16k chunks, so it would not allow a mod_perl process to complete immediately if the output is larger.
Speed. Squid is not very fast today when compared to the plain file based web servers available. Speed is a reason to use Squid only if you are using a lot of dynamic features such as mod_perl, and then only if the application and server are designed with caching in mind.
Memory usage. Squid uses quite a bit of memory.
HTTP protocol level. Squid is pretty much an HTTP/1.0
server, which seriously limits the deployment of HTTP/1.1
features.
HTTP headers, dates and freshness. The squid server might give out ``old'' pages, confusing downstream/client caches. Also, chances are that you will be giving out stale pages: you update some documents on the site, but squid will still serve the old ones.
Stability. Compared to plain web servers, Squid is not the most stable.
The pros and cons presented above suggest that you probably want squid mainly for its dynamic content buffering features, and only if your server serves mostly dynamic requests. In this situation it is better to have a plain apache server serving the static objects, and squid proxying only the mod_perl enabled server. At least when performance is the goal.
For implementation details see: Running 1 webserver and squid in httpd accelerator mode and Running 2 webservers and squid in httpd accelerator mode
I do not think the difference in speed between apache's mod_proxy and squid is relevant for most sites, since the real value of what they do is buffering for slow client connections. However, squid runs as a single process and probably consumes fewer system resources. The trade-off is that mod_rewrite is easy to use if you want to spread parts of the site across different back end servers, and mod_proxy knows how to fix up redirects containing the back-end server's idea of the location. With squid you can run a redirector process to proxy to more than one back end, but there is a problem in fixing redirects in a way that keeps the client's view of both server names and port numbers in all cases. The difficult case is where you have DNS aliases that map to the same IP address, and you want the redirect to use port 80 (when the server is really on a different port), but you want it to keep the specific name the browser sent so that it does not change in the client's Location window.
The Advantages:
No additional server is needed. We keep the one plain plus one mod_perl enabled apache server setup. All you need is to enable mod_proxy in the httpd_docs server and add a few lines to its httpd.conf file.
The ProxyPass and ProxyPassReverse directives allow you to hide the internal redirects, so if http://nowhere.com/modperl/ is actually http://localhost:81/modperl/, it will be absolutely transparent to the user. ProxyPass redirects the request to the mod_perl server, and when it gets the response, ProxyPassReverse rewrites the URL back to the original one, e.g:
  ProxyPass        /modperl/ http://localhost:81/modperl/
  ProxyPassReverse /modperl/ http://localhost:81/modperl/
It does mod_perl output buffering like squid does. See the Using mod_proxy notes for more details.
It even does caching. You have to produce correct Content-Length, Last-Modified and Expires http headers for it to work. If some of your dynamic content does not change constantly, you can dramatically increase performance by caching it with ProxyPass.
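A hedged sketch of switching on mod_proxy's cache in the front end server (apache 1.3 directive names; the path and sizes below are arbitrary -- CacheSize is in kbytes, the other two values are in hours):

```apache
# enable mod_proxy's disk cache on the httpd_docs front end
CacheRoot       /var/cache/httpd_proxy
CacheSize       51200
CacheGcInterval 4
CacheMaxExpire  24
```

Without CacheRoot no caching happens at all, so proxying and caching can be enabled independently.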
ProxyPass
happens before the authentication phase, so you do not have to worry about
authenticating twice.
Apache is able to accelerate https (secure) requests completely, while also doing http acceleration (with squid you have to use an external redirection program for that).
The Apache proxy accelerator mode is reported to be very stable as of apache 1.3.6.
The Disadvantages:
Users reported that it might be a bit slow, but the latest version is fast enough. (How fast is enough? :)
For implementation see Using mod_proxy.
Written by Stas Bekman.
Last Modified at 08/17/1999
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission.