A resource is identified by URI and may emit representations. There’s no way to tell from the representations what the resource “is”; I tend to believe a resource is what its publisher says it is as a good rule of thumb. But it doesn’t affect the software very much.
–Tim Bray on the xml-dev mailing list, Wednesday, 23 Jul 2003
REST is like Quantum Mechanics; or, more specifically, resources are like atoms (and I do mean atoms, not ATOMs). In quantum mechanics you cannot actually say what an atom is, where it is, or how fast it is moving. You can only predict how it will respond to certain experiments, and then only in a probabilistic fashion. As soon as you try to figure out what the wave function actually represents, you fall down a sinkhole of bad physics and worse philosophy.
Resources are the same. When designing RESTful systems, you never see the resources themselves. All you see is the URL and the representation of the resource the URL provides. What resource does the URL identify? Who knows? The only way to reason about it is through the Copenhagen interpretation.
On the server side things are a little different. You may be able to look behind the curtain and in fact know quite a bit more about the real resource than the client can. (This is simply not possible in quantum mechanics.) However, it pays not to do this. The more indirection you have between the resource and its actual physical implementation, the more flexible your design will be. If you tie your URLs to a particular file system layout or to particular database queries, then you’re going to have a really hard time porting that system to a new operating system or database.
It’s best to design your URLs without respect to how they will actually be implemented. Design them so they make sense to human clients and search engines. Then worry later about how you’ll actually implement the backend that serves representations of those URLs.
Sadly, server support for such schemes is sorely lacking in 2006. Doing it on top of Apache requires serious mod_rewrite voodoo. (That’s how WordPress creates the very nice URLs you see on this site.) PHP doesn’t help here, since it’s still tied to a one-URL, one-.phtml-file model.
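For the record, the rewrite rules WordPress installs in .htaccess look roughly like this: requests for files and directories that physically exist are served directly, and everything else is handed to index.php, which parses the request URL itself.

```apache
RewriteEngine On
RewriteBase /
# If the requested name is a real file or directory, serve it as-is
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Otherwise, let index.php figure out what the URL means
RewriteRule . /index.php [L]
```

Five lines isn’t so bad, but anything more ambitious than “send everything to one script” quickly turns into the voodoo I mentioned.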
Rails improves matters a little, but doesn’t really take the plunge of completely decoupling the file system from the URL and database structure. <troll>In fact, its whole convention-over-configuration attitude is exactly contrary to what’s needed. You can configure Rails to do the right thing, but by default it does the wrong thing. This makes it faster and easier to build small, simple systems, but quite a bit harder to build more complex, more flexible, more scalable ones. That is, Rails makes the easy things easy and the hard things harder.</troll> On the flip side, the easy things are easy enough that even with the faster-than-linear growth in complexity as you move to a more decoupled URL structure, it’s still possible in Rails. It’s just not as easy as it would be if flexibility were as big a design goal for Rails as a cool 15-minute demo.
I have not yet seen a web server system that implements the full decoupling of URL space from implementation details that I want. I have seen sites do this. Yahoo and Amazon come to mind. However, they either use custom servers or a lot of mod_rewrite rules sitting on top of Apache. I’m still imagining what I want, but I think it would look something like this:
A centralized configuration database accessed through the web server itself that allows:
- An individual URL to be mapped to an arbitrary file or script
- A URL subtree to be mapped to a filesystem subtree or script
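To make that concrete, here’s a rough sketch of the kind of mapping I have in mind (all the names here are made up for illustration): exact URLs resolve first, then subtree prefixes, with no reference to where anything lives on disk.

```python
# A rough sketch, with hypothetical names: individual URLs are checked
# first, then URL subtrees, mirroring the two kinds of mapping above.
def make_router(exact, subtrees):
    def route(path):
        if path in exact:                 # individual URL -> file or script
            return exact[path]
        for prefix, handler in subtrees:  # URL subtree -> tree or script
            if path.startswith(prefix):
                return handler
        return None                       # nothing matched: a 404
    return route

# Example configuration (entirely hypothetical)
route = make_router(
    exact={"/books/2006": "books_script"},
    subtrees=[("/articles/", "articles_handler")],
)
```

The point is that the configuration is data, queryable and editable through the server itself, rather than regular-expression incantations scattered through .htaccess files.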
Of course, the devil’s in the details. If you aren’t careful, you’ll just end up with something as ugly and complex as mod_rewrite. To that end, I’d prefer not to have full regular expressions in the language, but perhaps they’re necessary.
Doing this right means we need full and complete support for HTTP 1.1, not just the popular subset of POST and GET you see in a lot of products today. This includes:
- Each script has access to the full and complete URL and the full HTTP request by which it was invoked, including headers and body.
- Scripts support all HTTP methods including POST, PUT, and DELETE; not just GET.
- All four operations can (potentially) be applied to the same URL. You don’t necessarily have separate URLs for POST and GET.
- HTTP authentication, both DIGEST and BASIC. Different authentication requirements can be applied to different operations. For instance, authentication might be necessary for PUT or DELETE but not for GET. URLs can be added to realms based on criteria more complex than just the parent directory. For instance, all URLs on the site containing the string “edit” might require authorization.
- Pluggable authentication to allow experiments with other, hopefully better authentication systems going forward.
- No dependence on cookies.
- Content and language negotiation. It should be possible, for instance, to send one client HTML and another XML+XSLT; or one client French and another English.
- Full support for the various HTTP response codes.
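To illustrate the one-URL, four-methods point: here’s a minimal sketch (hypothetical, not any real server’s API) of a single resource that answers GET, PUT, POST, and DELETE at the same URL, returning appropriate status codes rather than shoehorning everything into GET and POST.

```python
# Hypothetical sketch: one URL, four methods, dispatched by method
# rather than by separate URLs.
class Resource:
    def __init__(self):
        self.state = None  # the stored representation, if any

    def handle(self, method, body=None):
        """Return an (HTTP status code, body) pair."""
        if method == "GET":
            return (200, self.state) if self.state is not None else (404, None)
        if method == "PUT":
            created = self.state is None
            self.state = body
            return (201 if created else 200, body)
        if method == "POST":
            return (200, body)   # e.g. process a submission
        if method == "DELETE":
            self.state = None
            return (204, None)   # No Content
        return (405, None)       # Method Not Allowed
```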
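The realm criteria I have in mind could be as simple as an ordered list of predicates over the method and the URL; a hypothetical sketch:

```python
# Hypothetical sketch: authorization requirements expressed as
# predicates over (method, URL), not just parent directories.
RULES = [
    # PUT and DELETE anywhere require the "editors" realm
    (lambda method, url: method in ("PUT", "DELETE"), "editors"),
    # any URL containing "edit" requires it too, regardless of method
    (lambda method, url: "edit" in url, "editors"),
]

def required_realm(method, url):
    """Return the realm a request must authenticate to, or None."""
    for predicate, realm in RULES:
        if predicate(method, url):
            return realm
    return None
```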
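And a toy illustration of negotiation: pick the best available variant given an Accept or Accept-Language header. (Real HTTP/1.1 negotiation handles wildcards, specificity, and more; this sketch only honors q-values.)

```python
def negotiate(header_value, available, default):
    """Simplified negotiation: choose the client's most-preferred
    variant we can actually serve, honoring q-values only."""
    prefs = []
    for part in header_value.split(","):
        pieces = part.strip().split(";")
        name, q = pieces[0].strip(), 1.0  # q defaults to 1.0
        for param in pieces[1:]:
            if param.strip().startswith("q="):
                q = float(param.strip()[2:])
        prefs.append((q, name))
    prefs.sort(key=lambda p: -p[0])       # highest q first
    for _, name in prefs:
        if name in available:
            return name
    return default
```

So a client sending `Accept: text/html,application/xml;q=0.9` gets HTML if we have it, XML otherwise; the same function works for `Accept-Language`.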
I don’t know of any content management system that meets these requirements. Do you? Is there a system out there that can do this today, or is it a subject for current research? I suspect one could build this on top of the Java Servlet API or Apache. Any takers?