Protecting sensitive data in memory
The goal is simple, but the pitfalls are numerous
-- here's how to avoid exposing your goods
John Viega ([email protected])
Author
February 2001
Security-conscious
programmers often need to protect sensitive data in
memory, such as passwords and cryptographic keys. In
order to do this effectively, a programmer should
keep sensitive data in memory for as short a time as
possible, and should try to ensure that the data
never gets written to disk.
Introduction
Often, a programmer may not find it worthwhile to pay
attention to security when it comes to protecting
sensitive data -- such as passwords and cryptographic
keys -- from the user. The programmer may feel that
security measures are unwarranted because the data will
only be used locally. However, programmers should
consider the value of the data in use by their programs.
What
if an attacker breaks into the machine through some means
and captures, say, a password? In such a case, the
attacker would have gained some kind of access to an
account in your application. Additionally, people tend to
reuse passwords on different kinds of accounts, just to
keep things simple.
Some
might say that those who use the same password for
multiple accounts deserve what they get. This theory
offloads too much of the security burden onto users, who
are probably more concerned about other things; it is the
responsibility of the application to be reasonably
diligent with regard to security. Additionally, such a
lax approach isn't fair to those people who don't reuse
passwords, because it is still fairly easy for an
attacker to compromise the account at hand.
Therefore,
you should use care with any data that a user might
consider sensitive. Your primary high-level goal should
be to keep sensitive data in memory for as short a time
as possible. And when it is necessary to store such data,
you should do your best to prevent that data from ever
being recovered by unauthorized parties.
These
goals are simple, but they are often tough to realize in
practice. In this article, we'll explore the best methods
for protecting data in real applications.
What
to protect against
Let's say that an attacker breaks into a system running
your software. You can't stop the attacker from getting
at your data while it is in memory, if that person is
determined and resourceful enough. However, most people
breaking into a computer are unlikely to exert a lot of
energy extracting passwords from running programs if it
is difficult to do. There is plenty of ripe fruit hanging
far lower.
The
most important thing to avoid is putting sensitive data
in a file on the file system. If you absolutely must
store sensitive data here for any extended length of time
(that is, if you cannot get away with storing only a cryptographic checksum),
you should use encryption to protect the data. We'll
discuss this in more detail below.
Additionally,
you should prevent your program from leaving memory dumps
around when the program crashes. Memory dumps are stored
in regular old files, and it's very easy to get ASCII
strings that were in memory at the time of a crash out of
such files. You can forbid core dumps by using the setrlimit
call. (rlimit is an abbreviation for
"resource limit.")
Generally,
the best way to use setrlimit is to first
call getrlimit to get current resource
limits, change those values, then call setrlimit
to set the new values. This ensures that all resource
limits will always have sane values.
Another
way your data can make it to disk is by being swapped
out. The operating system can decide to take parts of
your running program in memory and save them to disk. In
C, you may be able to "lock" your data to keep
it from swapping out; your program will generally need
administrative privileges to do this successfully, but it
never hurts to try. Here's a simple way to lock memory
when possible:
The
mlock() call generally locks more memory
than you want. Locking is done on a per-page basis. All
of the pages the memory spans will be locked in RAM, and
will not be swapped out under any circumstances, until
the process unlocks something in the same page by using munlock().
There
are some potentially negative consequences here. First,
If your process locks two buffers that happen to live on
the same page, then unlocking either one will unlock the
entire page, causing both buffers to unlock. Second, when
locking lots of data, it is easy to lock more pages than
necessary (the operating system doesn't move data around
once it has been allocated), which can slow down machine
performance significantly.
Therefore,
you should allocate all memory that might need to contain
sensitive data at the same time, preferably in one big
chunk. Assuming all the data fits onto a single page, you
should lock the entire chunk when you need secure memory.
As soon as you have no need for secure memory at a
particular moment in time, unlock the entire chunk
(there's no need to risk hampering performance when there
is no data to secure).
Unlocking
a chunk of memory looks exactly the same as locking it,
except that you call munlock():
If
you require lots of sensitive data, it is possible to
lock all memory in the address space using the mlockall()
call. You should probably avoid this call, though, due to
the potential performance hit.
In
most cases, these calls are not available: Often,
programs are unable to run with admistrative permission;
other times the language being used does not support page
locking at all. In such cases, your best bet is to use as
small a chunk of memory as possible, and to use it and
erase it as quickly as possible, thus minimizing your
window of vulnerability. If you are constantly accessing
the buffer from the time you place the data in until the
time you erase it, then you minimize your risk; paging
rules will likely (but not definitely) keep the page in
question from swapping.
Another
way to get a block of memory that will not swap is to use
a RAM disk. That is, the operating system will provide
you with a "disk drive" that is really part of
the system memory. It's much easier to manage than mlock(),
but requires a very non-standard environment. If you
think this might be a viable option for you, see Resources for a link on RAM
disks. Other operating systems may provide encrypted swap
space, or encrypted file systems that you can use to
store sensitive data (see Resources). However, these are
also rare.
In addition,
you may also have problems actually erasing data from
memory. We'll discuss this in the next section.
Erasing
data from memory
When possible, try to avoid saving sensitive data
altogether. For example, when keeping a login database,
many people choose not to store passwords at all.
Instead, they keep a cryptographic hash, which is
simply a high-quality checksum of the original password.
To validate a user, you just need to recompute the
checksum on the entered password, and see if it matches
the stored checksum. The primary advantage to this
approach is that an application never has to store the
actual password -- which improves the end user's security
and privacy.
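The scheme can be sketched in a few lines of Python. The function names here are illustrative, and a production system would use a deliberately slow, salted key-derivation function rather than a single SHA-256; this only shows the store-a-checksum, recompute-and-compare idea:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, checksum); only these need to be stored."""
    if salt is None:
        salt = os.urandom(16)          # random salt, stored alongside the hash
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    return salt, digest

def check_password(candidate, salt, stored_digest):
    """Recompute the checksum on the entered password and compare."""
    _, digest = hash_password(candidate, salt)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_password("hunter2")    # at enrollment time
```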
Of
course, at some point you may have no choice but to
handle a password in its raw format. For example, the
user doesn't directly enter a checksum; this needs to be
computed from input text. In such a case, you should
erase the password immediately after computing the
checksum.
To
erase data in memory, write over the data itself. The
following is not sufficient in a C program:
This
is insufficient because we haven't erased the actual
memory that stores the password; we've only erased a
pointer to the password. If we know the password is
null-terminated, we can erase it as such:
Otherwise,
we will need to know the length of the data:
In
other languages, we might have a more difficult time
erasing sensitive data. High-level languages often have
data types that are immutable. The program can
only write to an immutable object once, at creation time.
For example, consider the following Python code:
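A sketch of such code (compute_checksum is illustrative; here it is simply a SHA-256 over the text):

```python
import hashlib

def compute_checksum(s):
    # illustrative checksum function
    return hashlib.sha256(s.encode()).hexdigest()

pw = "hunter2"              # stand-in for text the user just typed
pw = compute_checksum(pw)   # rebinds the name; the old string object is not erased
```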
Even
after this code has run, the unencrypted password may
still exist in memory. This is because assigning pw
to the result of compute_checksum(pw) will
probably not overwrite the actual memory where pw
was previously stored. Instead, it will create a new
string that lives elsewhere in memory.
You
might think to fix the problem by directly overwriting
each character of the string. You could try the
following:
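The attempt might look like this (wrapped in try/except here so the sketch runs; the assignment already fails on the first iteration):

```python
pw = "hunter2"
failure = None
try:
    for i in range(len(pw)):
        pw[i] = "x"          # str item assignment is not allowed
except TypeError as e:
    failure = e              # every iteration would fail this way
```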
Unfortunately,
this approach will not work, because Python does not
allow the user to overwrite any part of a string. You
have no way of knowing when the language will decide to
actually write over the stored memory. Strings in Java,
Perl, Tcl, and most other high-level languages have the
exact same problem.
The
only solution to this is to use mutable data structures.
That is, you must only use data structures that allow you
to dynamically replace elements. For example, in Python
you can use lists to store an array of characters.
However, every time you add or remove an element from a
list, the language might copy the entire list behind your
back, depending on the implementation details. To be
safe, if you have to dynamically resize a data structure,
you should create a new one, copy data, and then write
over the old one. For example:
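A sketch of that build-copy-scrub sequence with a character list (names illustrative):

```python
pw = list("hunter2")         # mutable buffer: one character per element

# Need a larger buffer: build the new one with the data copied in...
new_pw = pw + ["\0"] * 8

# ...then write over every element of the old list before dropping it.
old = pw
for i in range(len(old)):
    old[i] = "\0"
pw = new_pw
```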
The
inability to use immutable data types for sensitive data
is quite inconvenient. For example, in most high-level
languages, you can no longer use standard input functions
to read in a string; you must read characters in one at a
time. This can be quite a task in and of itself,
depending on the language.
A
similar problem exists in some languages that support
garbage collection, even if they provide mutable data
types. The garbage collector may copy sensitive memory
while it is in use (for efficiency purposes). Languages
that support reference-counting only, such as Python,
will not have this problem. However, languages whose
garbage collectors may copy or compact live objects, Java
among them, can have this problem, depending on the
implementation.
Erasing
the disk
You may need to store sensitive data on the hard disk
without protecting it. Perhaps your application needs to
view a sensitive document that is much too big to fit
into memory all at once. Encryption might be an option
for protecting the document in some environments, but
others might have performance considerations that forbid
it. The best solution is to try to protect the file while
it is in use, and delete it as quickly as possible. But
when we delete the file, does it really go away?
Usually,
"deleting" a file means simply removing a file
system entry that points to a file. The file will still
exist somewhere, at least until it gets overwritten.
Unfortunately, the file will also exist even after it
gets overwritten. Disk technology is such that even files
that have been overwritten can be recovered, given the
right equipment and know-how. Some people claim that if
you want to securely delete a file, you should overwrite
it seven times: first with all ones, then with all
zeroes, then with an alternating pattern of ones and
zeros. Finally,
overwrite the file four times with random data, such as
that generated from /dev/urandom or a similar source.
Unfortunately,
this technique probably isn't sufficient. It is widely
believed that the United States government has disk
recovery technology that can thwart such a scheme. If you
are really concerned about this, then we recommend
implementing Peter Gutmann's 35-pass scheme as a bare
minimum (see Resources).
Of course,
anyone who gives you a definitive number of overwrite
passes is misleading you; no one knows how many times
will be sufficient. If you want to take no chances at
all, then you need to ensure that the bits of interest
are never written to disk unencrypted: encrypt the data
before it reaches the disk, and decrypt it directly into
locked memory. There is no other alternative.
Conclusion
When it comes to dealing with sensitive data in software
applications, it is easy to ignore the problem,
especially since it is so difficult to do things right.
Nowhere is this more true than in today's high-level
languages, which provide no good mechanism for securely
holding data, and often make it difficult or impossible
to completely erase sensitive data.
Future
programming languages may effectively address this
security requirement. Until then, developers will be
forced to make tough trade-offs when working with
sensitive data.
Resources
About the author
John Viega ([email protected]) is co-author of Building
Secure Software (Addison-Wesley, 2001) and Java
Enterprise Architecture (O'Reilly and Associates, 2001).
John has authored more than 50 technical publications,
primarily in the area of software security. He also wrote
Mailman, the GNU Mailing List Manager and ITS4, a tool
for finding security vulnerabilities in C and C++ code.