Protecting sensitive data in memory

 

The goal is simple, but the pitfalls are numerous -- here's how to avoid exposing your goods

John Viega
Author
February 2001

Security-conscious programmers often need to protect sensitive data in memory, such as passwords and cryptographic keys. In order to do this effectively, a programmer should keep sensitive data in memory for as short a time as possible, and should try to ensure that the data never gets written to disk.

Introduction
Often, a programmer may not find it worthwhile to pay attention to security when it comes to protecting sensitive data -- such as passwords and cryptographic keys -- from the user. The programmer may feel that security measures are unwarranted because the data will only be used locally. However, programmers should consider the value of the data in use by their programs.

What if an attacker breaks into the machine through some means and captures, say, a password? In such a case, the attacker would have gained some kind of access to an account in your application. Additionally, people tend to reuse passwords on different kinds of accounts, just to keep things simple.

Some might say that those who use the same password for multiple accounts deserve what they get. This theory offloads too much of the security burden onto users, who are probably more concerned about other things; it is the responsibility of the application to be reasonably diligent with regard to security. Additionally, such a lax approach isn't fair to those people who don't reuse passwords, because it is still fairly easy for an attacker to compromise the account at hand.

Therefore, you should use care with any data that a user might consider sensitive. Your primary high-level goal should be to keep sensitive data in memory for as short a time as possible. And when it is necessary to store such data, you should do your best to prevent that data from ever being recovered by unauthorized parties.

These goals are simple, but they are often tough to realize in practice. In this article, we'll explore the best methods for protecting data in real applications.

What to protect against
Let's say that an attacker breaks into a system running your software. You can't stop the attacker from getting at your data while it is in memory, if that person is determined and resourceful enough. However, most people breaking into a computer are unlikely to exert a lot of energy extracting passwords from running programs if it is difficult to do. There is plenty of ripe fruit hanging far lower.

The most important thing to avoid is putting sensitive data in a file on the file system. If you absolutely must store sensitive data there for any extended length of time (that is, if you cannot use a cryptographic checksum), you should use encryption to protect the data. We'll discuss this in more detail below.

Additionally, you should prevent your program from leaving memory dumps around when the program crashes. Memory dumps are stored in regular old files, and it's very easy to get ASCII strings that were in memory at the time of a crash out of such files. You can forbid core dumps by using the setrlimit call. (rlimit is an abbreviation for "resource limit.")



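#include <sys/time.h>
#include <sys/resource.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {
  struct rlimit rlim;

  /* Fetch the current limits so the struct starts with sane values. */
  if (getrlimit(RLIMIT_CORE, &rlim)) {
    exit(-1);
  }
  /* Forbid core dumps by zeroing both the soft and the hard limit. */
  rlim.rlim_max = rlim.rlim_cur = 0;
  if (setrlimit(RLIMIT_CORE, &rlim)) {
    exit(-1);
  }
  ...
  return 0;
}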

Generally, the best way to use setrlimit is to first call getrlimit to get current resource limits, change those values, then call setrlimit to set the new values. This ensures that all resource limits will always have sane values.
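The same pattern is available from Python through the standard resource module. The following is only a minimal sketch, assuming a Unix-like system; the function name disable_core_dumps is ours, not part of any library.

```python
import resource


def disable_core_dumps():
    """Zero the soft and hard core-dump limits; return the previous limits."""
    # Read the current limits first, mirroring the getrlimit/setrlimit pattern.
    previous = resource.getrlimit(resource.RLIMIT_CORE)
    # Lowering a limit does not require special privileges.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
    return previous
```

Because the hard limit is lowered as well, an unprivileged process cannot raise it again afterward, so this should be called early, before any sensitive data is read.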

Another way your data can make it to disk is by being swapped out. The operating system can decide to take parts of your running program in memory and save them to disk. In C, you may be able to "lock" your data to keep it from swapping out; your program will generally need administrative privileges to do this successfully, but it never hurts to try. Here's a simple way to lock memory when possible:



#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

void *locking_alloc(size_t numbytes) {
     static short have_warned = 0;
     void *mem = malloc(numbytes);

     if(mem == NULL) {
       return NULL;  /* Allocation failed; there is nothing to lock. */
     }
     if(mlock(mem, numbytes) && !have_warned) {
       /* We probably do not have permission.
        * Sometimes, it might not be possible to lock enough memory.
        */
       fprintf(stderr, "Warning: Using insecure memory!\n");
       have_warned = 1;
     }
     return mem;
}



The mlock() call generally locks more memory than you want. Locking is done on a per-page basis. All of the pages the memory spans will be locked in RAM, and will not be swapped out under any circumstances, until the process unlocks something in the same page by using munlock().

There are some potentially negative consequences here. First, if your process locks two buffers that happen to live on the same page, then unlocking either one will unlock the entire page, causing both buffers to unlock. Second, when locking lots of data, it is easy to lock more pages than necessary (the operating system doesn't move data around once it has been allocated), which can slow down machine performance significantly.
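To make the page arithmetic concrete, here is a small sketch that works out which pages a buffer touches. The helper name pages_spanned is hypothetical, and the addresses in the usage note are made up for illustration.

```python
import mmap


def pages_spanned(addr, numbytes, page_size=mmap.PAGESIZE):
    """Return (first_page_start, page_count) for the range [addr, addr + numbytes)."""
    first = addr - addr % page_size
    if numbytes <= 0:
        return (first, 0)
    last_byte = addr + numbytes - 1
    last = last_byte - last_byte % page_size
    return (first, (last - first) // page_size + 1)
```

With 4096-byte pages, a 2-byte buffer at address 4095 straddles two pages, so locking it pins 8192 bytes. Two 16-byte buffers at addresses 100 and 200 both report the same first page, which is why unlocking one of them unlocks the other.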

Therefore, you should allocate all memory that might need to contain sensitive data at the same time, preferably in one big chunk. Assuming all the data fits onto a single page, you should lock the entire chunk when you need secure memory. As soon as you have no need for secure memory at a particular moment in time, unlock the entire chunk (there's no need to risk hampering performance when there is no data to secure).

Unlocking a chunk of memory looks exactly the same as locking it, except that you call munlock():



munlock(mem, numbytes);
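The single-chunk strategy can also be sketched from Python by calling mlock() and munlock() through ctypes. This is an illustration under stated assumptions, not a vetted implementation: it assumes a Unix-like system whose C library exports both calls, and the class name SecretChunk is our own. As in the C version, a failed lock is reported rather than treated as fatal.

```python
import ctypes
import mmap

# Assumption: a Unix-like system whose C library exports mlock() and munlock().
_libc = ctypes.CDLL(None, use_errno=True)
for _fn in (_libc.mlock, _libc.munlock):
    _fn.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    _fn.restype = ctypes.c_int


class SecretChunk:
    """One page-aligned chunk that holds all sensitive data (hypothetical helper)."""

    def __init__(self, numbytes=mmap.PAGESIZE):
        self.buf = mmap.mmap(-1, numbytes)  # anonymous mappings start on a page boundary
        self._anchor = ctypes.c_char.from_buffer(self.buf)
        self._addr = ctypes.addressof(self._anchor)
        self.locked = False

    def lock(self):
        """Try to pin the chunk in RAM; return False if the system refuses."""
        self.locked = _libc.mlock(self._addr, len(self.buf)) == 0
        return self.locked

    def unlock(self):
        if self.locked:
            _libc.munlock(self._addr, len(self.buf))
            self.locked = False

    def wipe(self):
        """Overwrite the whole chunk with zeros in place."""
        ctypes.memset(self._addr, 0, len(self.buf))

    def close(self):
        self.wipe()
        self.unlock()
        self._anchor = None  # drop the buffer export so the mapping can be closed
        self.buf.close()
```

A caller would create one chunk, call lock() just before placing secrets in chunk.buf, and call close() as soon as they are no longer needed.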



If you require lots of sensitive data, it is possible to lock all memory in the address space using the mlockall() call. You should probably avoid this call, though, due to the potential performance hit.

In most cases, these calls are not available: often, programs are unable to run with administrative permission; other times, the language being used does not support page locking at all. In such cases, your best bet is to use as small a chunk of memory as possible, and to use it and erase it as quickly as possible, thus minimizing your window of vulnerability. If you are constantly accessing the buffer from the time you place the data in until the time you erase it, then you minimize your risk; paging rules will likely (but not definitely) keep the page in question from swapping.
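One way to keep that window small is to tie the buffer's lifetime to a block of code, so that it is erased even if an error occurs. The sketch below uses a Python context manager; the name short_lived_secret is hypothetical, and the approach only controls this one buffer, not any copies the interpreter or libraries may make elsewhere.

```python
from contextlib import contextmanager


@contextmanager
def short_lived_secret(size):
    """Yield a small mutable buffer and zero it when the block exits."""
    buf = bytearray(size)
    try:
        yield buf
    finally:
        # Overwrite in place; same-size writes do not reallocate the buffer.
        for i in range(len(buf)):
            buf[i] = 0
```

Inside the with block, fill the buffer with same-length slice assignments (for example, buf[:3] = b"abc") rather than appending, since growing a bytearray may copy its contents to a new location.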

Another way to get a block of memory that will not swap is to use a RAM disk. That is, the operating system will provide you with a "disk drive" that is really part of the system memory. It's much easier to manage than mlock(), but requires a very non-standard environment. If you think this might be a viable option for you, see Resources for a link on RAM disks. Other operating systems may provide encrypted swap space, or encrypted file systems that you can use to store sensitive data (see Resources). However, these are also rare.

In addition, you may have problems actually erasing data from memory. We'll discuss this in the next section.

Erasing data from memory
When possible, try to avoid saving sensitive data altogether. For example, when keeping a login database, many people choose not to store passwords at all. Instead, they keep a cryptographic hash, which is simply a high-quality checksum of the original password. To validate a user, you just need to recompute the checksum on the entered password, and see if it matches the stored checksum. The primary advantage to this approach is that an application never has to store the actual password -- which improves the end user's security and privacy.

Of course, at some point you may have no choice but to handle a password in its raw format. For example, the user doesn't directly enter a checksum; this needs to be computed from input text. In such a case, you should erase the password immediately after computing the checksum.
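A rough sketch of this check-then-erase flow follows. The name compute_checksum echoes the examples in this article, but the details are our assumptions: the sketch substitutes a salted PBKDF2 hash from Python's hashlib for the plain checksum described above, the iteration count is arbitrary, and the password is expected in a mutable bytearray so that it can be overwritten afterward.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration


def compute_checksum(password, salt):
    """Derive a salted hash of the password (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)


def erase(buf):
    """Overwrite a mutable buffer with zeros in place."""
    for i in range(len(buf)):
        buf[i] = 0


def validate(password, salt, stored_checksum):
    """Compare the entered password against the stored checksum, then erase it."""
    try:
        candidate = compute_checksum(password, salt)
        return hmac.compare_digest(candidate, stored_checksum)
    finally:
        erase(password)  # the raw password is wiped whether or not it matched
```

Only the salt and the checksum would be kept in the login database; the bytearray holding the raw password is zeroed as soon as the comparison finishes.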

To erase data in memory, write over the data itself. The following is not sufficient in a C program:



int validate(char *username) {
  char *password;
  char *checksum;

  password = read_password();
  checksum = compute_checksum(password);
  password = 0;  /* Clears only the pointer; the password bytes stay in memory. */
  return !strcmp(checksum, get_stored_checksum(username));
}

This is insufficient because we haven't erased the actual memory that stores the password; we've only erased a pointer to the password. If we know the password is null-terminated, we can erase it as such:



/* Overwrite a string with 0's, until we get to the null terminator. */
void erase_string(char *s) {
     while(*s) { *s++ = 0; }
}

Otherwise, we will need to know the length of the data:



memset(buf, 0, numbytes); /* set numbytes to 0, starting at buf. */

In other languages, we might have a more difficult time erasing sensitive data. High-level languages often have data types that are immutable. The program can only write to an immutable object once, at creation time. For example, consider the following Python code:



pw = input()              # Returns an ASCII string
pw = compute_checksum(pw) # Computes the cryptographic hash, overwriting pw

Even after this code has run, the unencrypted password may still exist in memory. This is because assigning pw to the result of compute_checksum(pw) will probably not overwrite the actual memory where pw was previously stored. Instead, it will create a new string that lives elsewhere in memory.

You might think to fix the problem by directly overwriting each character of the string. You could try the following:



for i in range(len(pw)):
  pw[i] = 0

Unfortunately, this approach will not work, because Python does not allow the user to overwrite any part of a string. You have no way of knowing when the language will decide to actually write over the stored memory. Strings in Java, Perl, Tcl, and most other high-level languages have the exact same problem.

The only solution to this is to use mutable data structures. That is, you must only use data structures that allow you to dynamically replace elements. For example, in Python you can use lists to store an array of characters. However, every time you add or remove an element from a list, the language might copy the entire list behind your back, depending on the implementation details. To be safe, if you have to dynamically resize a data structure, you should create a new one, copy data, and then write over the old one. For example:



def paranoid_add_character_to_list(ch, l):
  """Copy l, adding a new character, ch.  Erase l.  Return the result."""
  new_list = []
  for i in range(len(l)):
    new_list.append(0)
  new_list.append(ch)
  for i in range(len(l)):
    new_list[i] = l[i]
    l[i] = 0
  return new_list
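The difference between immutable and mutable types is easy to observe directly. The following sketch uses a hypothetical helper, try_erase, that attempts the same in-place overwrite on whatever it is given and reports whether the language allowed it.

```python
def try_erase(obj):
    """Attempt to zero obj in place; return True only if the writes were allowed."""
    try:
        for i in range(len(obj)):
            obj[i] = 0
    except TypeError:
        # Immutable types such as str and bytes reject item assignment.
        return False
    return True
```

Calling try_erase on a string or a bytes object returns False and leaves the data untouched, while a list or a bytearray is zeroed in place and the call returns True.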

The inability to use immutable data types for sensitive data is quite inconvenient. For example, in most high-level languages, you can no longer use standard input functions to read in a string; you must read characters in one at a time. This can be quite a task in and of itself, depending on the language.
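As an illustration of what reading one character at a time can look like, here is a sketch that fills a preallocated bytearray from a binary stream by using readinto(), so that no intermediate string is built by this code. The function name read_secret_line is hypothetical. Note the limits of the sketch: a buffered stream may still hold its own copy of the input, and the io.BytesIO object in the usage example exists only to simulate input.

```python
import io


def read_secret_line(stream, buf):
    """Read bytes into buf until a newline, end of input, or a full buffer.

    Returns the number of bytes stored; the newline itself is not kept.
    """
    n = 0
    with memoryview(buf) as view:
        while n < len(buf):
            got = stream.readinto(view[n:n + 1])  # writes straight into buf
            if not got:
                break                # end of input
            if buf[n] == 0x0A:       # newline: discard it and stop
                buf[n] = 0
                break
            n += 1
    return n


# Usage example with simulated input.
secret = bytearray(16)
length = read_secret_line(io.BytesIO(b"hunter2\nrest"), secret)
```

Because the buffer is never resized, it can later be zeroed in place once the secret has been used.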

A similar problem exists in some languages that support garbage collection, even if they provide mutable data types. The garbage collector may copy sensitive memory while it is in use (for efficiency purposes). Languages that support reference-counting only, such as Python, will not have this problem. Java is less clear-cut: even if most Java implementations do not have this problem, some of them might.

Erasing the disk
You may need to store sensitive data on the hard disk without protecting it. Perhaps your application needs to view a sensitive document that is much too big to fit into memory all at once. Encryption might be an option for protecting the document in some environments, but others might have performance considerations that forbid it. The best solution is to try to protect the file while it is in use, and delete it as quickly as possible. But when we delete the file, does it really go away?

Usually, "deleting" a file means simply removing a file system entry that points to a file. The file will still exist somewhere, at least until it gets overwritten. Unfortunately, the file will also exist even after it gets overwritten. Disk technology is such that even files that have been overwritten can be recovered, given the right equipment and know-how. Some people claim that if you want to securely delete a file, you should overwrite it seven times. The first time, overwrite it with all ones; the second time, with all zeros. Then, overwrite it with an alternating pattern of ones and zeros. Finally, overwrite the file four times with random data, such as that generated from /dev/urandom or a similar source.
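For reference, the seven-pass scheme just described can be sketched as follows. The function name overwrite_file and the chunk size are our own choices, and os.urandom stands in for /dev/urandom. The sketch only shows the mechanics: on journaling file systems or flash storage, the new data may not even land on the same physical blocks as the old.

```python
import os


def overwrite_file(path, chunk_size=65536):
    """Overwrite a file in place: ones, zeros, alternating bits, then 4 random passes."""
    size = os.path.getsize(path)
    # 0xFF is all ones, 0x00 all zeros, 0xAA alternates ones and zeros.
    patterns = [b"\xff", b"\x00", b"\xaa", None, None, None, None]
    with open(path, "r+b") as f:
        for pattern in patterns:
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n) if pattern is None else pattern * n)
                remaining -= n
            # Ask the operating system to push this pass to the device.
            f.flush()
            os.fsync(f.fileno())
    return len(patterns)
```

The caller would remove the file (for example, with os.remove) only after the final pass has completed.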

Unfortunately, this technique probably isn't sufficient. It is widely believed that the United States government has disk recovery technology that can thwart such a scheme. If you are really concerned about this, then we recommend implementing Peter Gutmann's 35-pass scheme as a bare minimum (see Resources).

Of course, anyone who gives you a maximum number of times to write over data is misleading you. No one knows how many times will be sufficient. If you want to take no chances at all, then you need to ensure that the bits of interest are never written to disk unencrypted: keep them encrypted on disk, and decrypt them directly into locked memory. There is no other alternative.

Conclusion
When it comes to dealing with sensitive data in software applications, it is easy to ignore the problem, especially since it is so difficult to do things right. Nowhere is this more true than in today's high-level languages, which provide no good mechanism for securely holding data, and often make it difficult or impossible to completely erase sensitive data.

Future programming languages may effectively address this security requirement. Until then, developers will be forced to make tough trade-offs when working with sensitive data.

Resources

About the author
John Viega is co-author of Building Secure Software (Addison-Wesley, 2001) and Java Enterprise Architecture (O'Reilly and Associates, 2001). John has authored more than 50 technical publications, primarily in the area of software security. He also wrote Mailman, the GNU Mailing List Manager, and ITS4, a tool for finding security vulnerabilities in C and C++ code.