How We Built an Error Validation Framework for Our API
Shutterstock developers pay a lot of attention to the user experience of our website. We have a fleet of User Experience experts who help make sure the error states our web application shows to customers are useful and actionable.
But when we’re building backend APIs instead of HTML forms, that experience doesn’t translate. What’s the equivalent of a clear, actionable form error in an API?
The Shutterstock Contributor Team has been building our next-generation content-review system, so that we can scale our image-review operation. We’re building it in a service-oriented fashion, in Ruby, with DataMapper as an ORM.
As developers building backend APIs, it’s solely our responsibility to provide useful information to the developers who will use our services. A good error validation framework preserves the integrity of our applications’ data and empowers developers to integrate with a new API.
Rather than write custom validation for each API endpoint, we took a systematic approach to add validation to all of them. Now we can avoid many application crashes, while providing useful information to developers.
One of the first things the review system needs is to learn about new items needing review:
This call puts the photo with item id 3709 and owner id 81 into the main review queue. The expected result is HTTP 201 Created with a Location: header giving the URL of the created item.
There are several other Shutterstock teams that will eventually integrate with this review service. Sometimes, when developers are still writing the software, they will post invalid data:
Whoops! This POST left out the queue name, so the review system doesn’t know who’s supposed to review it. Without data validation, our application will throw a 500 error:
It would be better if we told the programmer what went wrong. Also, we’d like to return HTTP 400 Bad Request instead of an internal server error.
Our team realized that there’s a tool to help us do this sort of thing: the json-schema Ruby gem, an implementation of the IETF JSON Schema spec. To use this, we’ll need to build up a schema. For the items route, it would look like this:
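A minimal sketch of such a schema, written as a Ruby hash in JSON Schema draft-04 style. The property names (item, owner, queue) and their types are assumptions inferred from the examples in this post, not the review service’s actual definition.

```ruby
require 'json'

# Hypothetical schema for the body of POST /items.
# Property names and types are assumptions for illustration.
ITEM_SCHEMA = {
  'type'       => 'object',
  'required'   => %w[item owner queue],
  'properties' => {
    'item'  => { 'type' => 'string' }, # logical ID of the item to review
    'owner' => { 'type' => 'string' }, # ID of the item's owner
    'queue' => { 'type' => 'string' }  # name of the review queue
  }
}.freeze

# Print the schema as the JSON document a consumer would see.
puts JSON.pretty_generate(ITEM_SCHEMA)
```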
Now we will make our review service pass the incoming POST data through json-schema’s JSON::Validator before doing anything else:
If there are any errors, the response looks like this instead:
This message tells us that there’s a property missing in the JSON document root (#/). If there’s more than one item missing, the validator will identify them all. The validator does more than check for the existence of the required fields; it also checks the types of each field. If someone passes in a Hash instead of a string, like so:
Then they’ll get an error message about item. (Previously the application would have returned another Internal Server Error about a TypeError as soon as it tried to treat item as a string.)
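The validate-first flow described above can be sketched without any gems. The validation_errors method below is a simplified stand-in for what json-schema’s JSON::Validator does in the real service: it only checks required properties and basic types. The schema fields, the handle_post helper, and the wording of the error messages are all assumptions for illustration.

```ruby
require 'json'

# Hypothetical schema for POST /items; field names are assumptions.
SCHEMA = {
  'type'       => 'object',
  'required'   => %w[item owner queue],
  'properties' => {
    'item'  => { 'type' => 'string' },
    'owner' => { 'type' => 'string' },
    'queue' => { 'type' => 'string' }
  }
}.freeze

# Ruby classes accepted for each JSON Schema type name.
TYPE_CLASSES = {
  'string'  => [String],
  'integer' => [Integer],
  'object'  => [Hash]
}.freeze

# Simplified stand-in for a schema validator. Returns a list of error
# messages, which is empty when the document is valid.
def validation_errors(schema, doc)
  return ["The document root '#/' is not an object"] unless doc.is_a?(Hash)

  errors = []
  # Report every missing required property, not just the first one.
  schema.fetch('required', []).each do |name|
    errors << "The property '#/' is missing required property '#{name}'" unless doc.key?(name)
  end
  # Check the type of each property that is present.
  schema.fetch('properties', {}).each do |name, rules|
    next unless doc.key?(name)
    allowed = TYPE_CLASSES.fetch(rules['type'])
    next if allowed.any? { |klass| doc[name].is_a?(klass) }
    errors << "The property '#/#{name}' is not of type #{rules['type']}"
  end
  errors
end

# Validate before doing anything else: 400 with every error, or 201.
def handle_post(raw_body)
  errors = validation_errors(SCHEMA, JSON.parse(raw_body))
  return [400, JSON.generate('errors' => errors)] unless errors.empty?

  [201, ''] # a real service would create the item and set Location here
end

p handle_post('{"item": "3709", "owner": "81"}')
p handle_post('{"item": {"id": 3709}, "owner": "81", "queue": "main"}')
```

Collecting all errors before responding means a consumer can fix every problem in one round trip instead of discovering them one at a time.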
There’s just one problem. We have a variety of resource types to manage. It would be really great if we didn’t have to write a custom schema for all of them. It’s a fair amount of text to write; it’s easy to get wrong; the hand-written schema can fall out of sync with the actual code; and above all, it’s redundant! Most of that validation information is already encoded in our ORM layer, where it looks like this:
It turns out that we can use this class definition to build our schema:
- figure out the class of the resource in question (we’ll call it resource_class)
- ask the resource_class for a list of its properties
- ignore properties that our application can automatically populate (like the internal database ID)
- figure out the data type for the remaining properties
- ask the properties whether they’re required
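The steps above can be sketched in plain Ruby. The Property struct and the Item class below are hypothetical stand-ins for ORM metadata; they do not reproduce DataMapper’s actual API, and the property names are assumptions.

```ruby
# Stand-in for an ORM property: name, Ruby type, whether it is required,
# and whether the application fills it in automatically.
Property = Struct.new(:name, :ruby_type, :required, :auto_populated)

# Hypothetical resource class exposing its property metadata.
class Item
  def self.properties
    [
      Property.new('id',          Integer, true,  true),
      Property.new('external_id', String,  true,  false),
      Property.new('queue',       String,  true,  false),
      Property.new('notes',       String,  false, false)
    ]
  end
end

# Mapping from Ruby types to JSON Schema type names.
JSON_TYPES = { String => 'string', Integer => 'integer', Float => 'number' }.freeze

# Build a JSON Schema (as a Ruby hash) from a resource class.
def build_schema(resource_class)
  # Skip properties the application populates on its own.
  writable = resource_class.properties.reject(&:auto_populated)

  # Map each remaining property to its JSON Schema type.
  properties = writable.each_with_object({}) do |prop, result|
    result[prop.name] = { 'type' => JSON_TYPES.fetch(prop.ruby_type) }
  end

  {
    'type'       => 'object',
    'required'   => writable.select(&:required).map(&:name),
    'properties' => properties
  }
end

p build_schema(Item)
```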
Once we’ve done that, we almost have enough information to build a schema. There are a few other wrinkles: our properties include things like domain_id as an integer instead of a string, and we want our consumers to specify shutterstock-photo instead of the internal database ID. So for those we:
- ask for the
- figure out the
- replace the property matching that key with a string instead
Finally, we present all this data in the JSON Schema format.
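The foreign-key substitution can be sketched as a small transformation over a generated schema. The Relationship struct is a hypothetical stand-in for whatever the ORM reports, and renaming domain_id to domain is an assumption; the post only says that the matching property is replaced with a string.

```ruby
# Hypothetical description of a relationship: the foreign-key property it
# uses internally and the name consumers send instead.
Relationship = Struct.new(:foreign_key, :public_name)

# Return a copy of the schema in which each foreign-key property is
# replaced by a string property. The input schema is left unchanged.
def replace_foreign_keys(schema, relationships)
  properties = schema['properties'].dup
  required   = schema.fetch('required', []).dup

  relationships.each do |rel|
    next unless properties.key?(rel.foreign_key)

    properties.delete(rel.foreign_key)
    properties[rel.public_name] = { 'type' => 'string' }
    required.map! { |name| name == rel.foreign_key ? rel.public_name : name }
  end

  schema.merge('properties' => properties, 'required' => required)
end

generated = {
  'type'       => 'object',
  'required'   => %w[external_id domain_id],
  'properties' => {
    'external_id' => { 'type' => 'string' },
    'domain_id'   => { 'type' => 'integer' }
  }
}

final = replace_foreign_keys(generated, [Relationship.new('domain_id', 'domain')])
p final
```

With this in place a consumer would send a readable value such as shutterstock-photo, and the service would resolve it to the internal ID.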
That’s all the information we need to build schemas for all of our resource types. By computing and caching this at application load time, we can provide a basic schema for all
We may need to customize a generated schema for certain routes that are special cases. For instance, we’ve decided that the POST /items route calls its logical ID field item in the POST and external_id in the database. Such customization is straightforward to accomplish.
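One way such a customization might look: compute the generated schemas once at load time, then derive a per-route variant that renames a property. The GENERATED_SCHEMAS and ROUTE_SCHEMAS constants and the rename_property helper are hypothetical names, and the schema contents are assumptions.

```ruby
# Hypothetical cache of generated schemas, built once at load time.
GENERATED_SCHEMAS = {
  'items' => {
    'type'       => 'object',
    'required'   => %w[external_id queue],
    'properties' => {
      'external_id' => { 'type' => 'string' },
      'queue'       => { 'type' => 'string' }
    }
  }
}.freeze

# Return a copy of the schema with one property renamed, both in the
# property list and in the list of required names.
def rename_property(schema, from, to)
  properties = schema['properties'].each_with_object({}) do |(name, rules), renamed|
    renamed[name == from ? to : name] = rules
  end
  required = schema.fetch('required', []).map { |name| name == from ? to : name }
  schema.merge('properties' => properties, 'required' => required)
end

# Route-specific schemas layered on top of the generated ones.
ROUTE_SCHEMAS = {
  'POST /items' => rename_property(GENERATED_SCHEMAS['items'], 'external_id', 'item')
}.freeze

p ROUTE_SCHEMAS['POST /items']
```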
Our final realization was that once we had all the information about how a schema ought to look, we could make the schema available to our users. So now they can issue a request against a schema URL (such as owners.schema) and see for themselves exactly what fields the system expects when creating a new resource. By providing a URL to the schema in the error message, we end up with a self-documenting API!
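A minimal sketch of both halves of that idea: serving a cached schema at a .schema path, and pointing to it from a validation error. The schema contents, the helper names, and the exact shape of the error body are assumptions for illustration.

```ruby
require 'json'

# Hypothetical cached schemas keyed by resource name.
SCHEMAS = {
  'owners' => {
    'type'       => 'object',
    'required'   => ['name'],
    'properties' => { 'name' => { 'type' => 'string' } }
  }
}.freeze

# Serve "/<resource>.schema" from the cache; 404 for unknown resources.
def schema_response(path)
  match  = %r{\A/(\w+)\.schema\z}.match(path)
  schema = match && SCHEMAS[match[1]]
  return [404, JSON.generate('error' => 'unknown schema')] unless schema

  [200, JSON.generate(schema)]
end

# Build a 400 response that lists the errors and links to the schema.
def error_response(resource, errors)
  [400, JSON.generate('errors' => errors, 'schema' => "/#{resource}.schema")]
end

p schema_response('/owners.schema')
p error_response('owners', ["The property '#/' is missing required property 'name'"])
```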