Chapter 10. Data Validation

Table of Contents

Validation Strategies
Accept Only Known Valid Data
Reject Known Bad Data
Sanitize All Data
Never Rely on Client-Side Data Validation

Most common attacks on systems (described in the sections that follow this one) can be prevented, or their likelihood significantly reduced, through appropriate data validation. Data validation is one of the most important aspects of designing a secure web application. When we refer to data validation, we mean validation of both the input to and the output from a web application.

Validation Strategies

Data validation strategies are often heavily influenced by the architecture of the application. If the application is already in production, it will be significantly harder to build the optimal architecture than if the application is still in the design stage. If a system takes the typical architectural approach of providing common services, then a single common component can filter all input and output, thus optimizing the rules and minimizing effort.
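As a rough illustration of the common-component approach, here is a minimal Python sketch in which every request parameter passes through one validation function driven by a single rule table. The names `FIELD_RULES`, `validate_request` and `ValidationError`, the field names, and the patterns are all hypothetical; a real application would take its rules from its own framework or configuration.

```python
import re
from typing import Any, Callable

# Hypothetical central rule table: the one place that defines what each
# input field may look like. Field names and patterns are illustrative.
FIELD_RULES: dict[str, Callable[[Any], bool]] = {
    "username": lambda v: isinstance(v, str)
    and re.fullmatch(r"[A-Z0-9]{1,32}", v) is not None,
    "page": lambda v: isinstance(v, str)
    and re.fullmatch(r"[0-9]{1,4}", v) is not None,
}


class ValidationError(ValueError):
    """Raised when any request field fails validation."""


def validate_request(params: dict[str, Any]) -> dict[str, Any]:
    """Check every incoming parameter against the central rule table.

    Fields with no rule are rejected rather than passed through.
    """
    for name, value in params.items():
        rule = FIELD_RULES.get(name)
        if rule is None:
            raise ValidationError(f"unexpected field: {name!r}")
        if not rule(value):
            raise ValidationError(f"invalid value for field: {name!r}")
    return params
```

Because unknown fields are rejected, adding a new input to the application forces someone to define a rule for it in the central table.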

There are three main models to think about when designing a data validation strategy.

  • Accept Only Known Valid Data

  • Reject Known Bad Data

  • Sanitize All Data

We cannot emphasize strongly enough that "Accept Only Known Valid Data" is the best strategy. We do, however, recognize that this isn't always feasible for political, financial or technical reasons, and so we describe the other strategies as well.

All three methods must check:

  • Data Type

  • Syntax

  • Length

Data type checking is extremely important. For instance, the application should check that a string is being submitted rather than an object.
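The risk is concrete with formats such as JSON, where the parser chooses the type based on attacker-supplied input. A minimal Python sketch, assuming a JSON request body; the function name `get_string_field` is hypothetical:

```python
import json
from typing import Any


def get_string_field(body: str, field: str) -> str:
    """Parse a JSON request body and return `field` only if it is a string.

    json.loads can turn attacker-controlled input into dicts, lists,
    numbers, booleans or None, so the type must be checked explicitly.
    """
    data: Any = json.loads(body)
    if not isinstance(data, dict):
        raise TypeError("request body must be a JSON object")
    value = data.get(field)
    if not isinstance(value, str):
        raise TypeError(f"{field!r} must be a string, got {type(value).__name__}")
    return value
```

A body such as `{"username": {"$ne": null}}` parses `username` into a dictionary; if passed on unchecked, some database drivers could interpret that object as a query operator rather than a literal name.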

Accept Only Known Valid Data

As we mentioned, this is the preferred way to validate data. Applications should accept only input that is known to be safe and expected. As an example, let's assume a password reset system takes usernames as input, and valid usernames are defined as consisting of the ASCII characters A-Z and 0-9. The application should check that the input is of type string, consists only of A-Z and 0-9 (performing canonicalization checks as appropriate), and is of a valid length.
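The username rule above might be sketched in Python as follows. The length bounds `MIN_LEN` and `MAX_LEN` are assumptions, since the text only requires "a valid length", and the function assumes canonicalization (decoding the input to a single standard form) has already happened before it is called.

```python
import re

# Assumed length bounds; the policy only says "a valid length".
MIN_LEN = 1
MAX_LEN = 32
# Exactly the documented alphabet: ASCII A-Z and 0-9.
USERNAME_RE = re.compile(r"[A-Z0-9]+")


def is_valid_username(value: object) -> bool:
    # 1. Data type: must be a string, not some other object.
    if not isinstance(value, str):
        return False
    # 2. Length: checked first so oversized input is rejected cheaply.
    if not MIN_LEN <= len(value) <= MAX_LEN:
        return False
    # 3. Syntax: fullmatch requires the whole string to match, so a
    #    trailing newline or any non-ASCII character causes rejection.
    return USERNAME_RE.fullmatch(value) is not None
```

Using `fullmatch` rather than `match` with a `$` anchor matters here: in Python, `$` also matches just before a trailing newline.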

Reject Known Bad Data

The reject-known-bad-data strategy relies on the application knowing about specific malicious payloads. While this strategy can limit exposure, it is very difficult for any application to maintain an up-to-date database of web application attack signatures, and any attack that is not yet in that database will get through.
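To illustrate the weakness, here is a hypothetical two-entry signature list in Python. Each signature catches the payload it was written for, but trivially different payloads with the same effect are not matched:

```python
import re

# Hypothetical signature list of the kind this strategy depends on.
BAD_SIGNATURES = [
    re.compile(r"<script", re.IGNORECASE),
    re.compile(r"'\s*or\s*'1'\s*=\s*'1", re.IGNORECASE),
]


def looks_malicious(value: str) -> bool:
    """Return True if value matches any known attack signature."""
    return any(sig.search(value) for sig in BAD_SIGNATURES)


if __name__ == "__main__":
    print(looks_malicious("<SCRIPT>alert(1)</SCRIPT>"))     # True
    print(looks_malicious("<img src=x onerror=alert(1)>"))  # False: same effect, no signature
    print(looks_malicious("' OR '1'='1"))                   # True
    print(looks_malicious("' OR 2>1 --"))                   # False: same effect, no signature
```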

Sanitize All Data

Attempting to make bad data harmless is an effective second line of defense, especially when combined with rejecting known bad input. However, as described in the canonicalization section of this document, the task is extremely hard, and sanitization should not be relied upon as a primary defense technique.
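The difficulty shows up even in simple cases. In this Python sketch, `naive_strip` is a hypothetical filter that deletes the string `<script>` in a single pass; nesting the tag inside itself rebuilds it after removal. For comparison, the standard library's `html.escape` encodes characters instead of deleting patterns. That is more robust, but it only makes the value safe for HTML text content, not for SQL, shell or URL contexts.

```python
import html


def naive_strip(value: str) -> str:
    """Hypothetical sanitizer: delete every literal "<script>" in one pass."""
    return value.replace("<script>", "")


if __name__ == "__main__":
    payload = "<scr<script>ipt>alert(1)"
    # Removing the inner tag splices the outer fragments back together.
    print(naive_strip(payload))  # <script>alert(1)
    # Escaping keeps every character but makes it inert in HTML text.
    print(html.escape(payload))  # &lt;scr&lt;script&gt;ipt&gt;alert(1)
```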