FirstClown

firstclown at firstclown.us

Archive for January 17th, 2006

On Not Validating Your XML

There's an issue with XML that doesn't seem to be getting much attention. I've always assumed that it was common knowledge or at least common sense, but I've seen quite a few problems at work related to it, so I thought I'd throw it out here and see what the geeks think.

What I assumed to be a golden rule of XML appears to not be so. That rule is:

Never validate your XML against the schema in production level code.

Am I wrong here? Does anyone else see this as an important rule?

Here's my reasoning:

The XML schema is there to lay down some rules on what the XML document should contain. It's there to tell you what to expect from an XML document and, more importantly, it's there to tell the other guy how to format his XML. Using the schema during development of your XML app is very important because you need to make sure you're getting valid XML to work with. If you didn't verify the XML during development, you might inadvertently think that your program was broken when it was really the XML. The schema is also helpful in writing the app, since it will explain every possible permutation of XML you'll be getting.

But if it's good for development, isn't it good for production? I say nay. If I'm getting bad XML in production and my production code can't handle it, it's not production code.

  1. Checking the XML against the schema is costly. It's not too costly for a single document, but if you're reading a lot of XML docs, it adds up fast.
  2. If the schema or DTD is unreachable, your application (usually) bombs. There are quite a few parsers out there that die on you if you tell them to validate your XML and the schema isn't there. That's not good for production.
  3. If the XML's bad according to the schema, that doesn't mean the whole document is bad. If you have one value that's supposed to be a number and it's a string instead, that doesn't mean that every other value in the document is bad. If the document is atomic, fine, kill the whole thing, but most of the XML documents I've worked with aren't.
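
Point 3 can be sketched roughly as follows, assuming Python's standard xml.etree.ElementTree parser. The orders/order/quantity names and the read_orders function are made up for illustration; the idea is only that one bad value gets set aside while the rest of the document is still used.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the second order has a string where a number belongs.
SAMPLE = """<orders>
  <order id="1"><quantity>3</quantity></order>
  <order id="2"><quantity>three</quantity></order>
  <order id="3"><quantity>7</quantity></order>
</orders>"""


def read_orders(xml_text):
    """Return (good, rejected) lists instead of failing the whole document."""
    good, rejected = [], []
    root = ET.fromstring(xml_text)
    for order in root.findall("order"):
        order_id = order.get("id")
        raw = order.findtext("quantity")  # None if the element is missing
        try:
            good.append((order_id, int(raw)))
        except (TypeError, ValueError):
            # Bad or missing value: set this record aside, keep the others.
            rejected.append((order_id, raw))
    return good, rejected


good, rejected = read_orders(SAMPLE)
```

Here the rejected list can be logged or reported back to the sender, and the two usable orders still get processed.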

Basically, my thinking is that the XML creator should be checking the XML against the schema before it's sent, but the benefits aren't there for the consumer to do the same. If the XML is really bad, meaning malformed, it's obviously unusable to the consumer, but a parser will catch that anyway. I should be doing data validation in my program already, at least at the database level, so I don't need the schema for that. If a tag is named wrong, I'll be able to tell that in my parsing code and act accordingly without killing the whole app. The usefulness of validating in production escapes me, and I've seen plenty of cases where it shouldn't be happening.
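
As a small illustration of the last few sentences, again assuming Python's xml.etree.ElementTree (a non-validating parser): malformed XML is reported by the parser itself, and a misnamed tag is something the parsing code can notice and skip. The feed/item names and the parse_feed helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_feed(xml_text):
    """Return (items, problems); items is None only if the XML is malformed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Not well-formed: nothing usable, but the app itself keeps running.
        return None, [f"malformed XML: {err}"]

    items, problems = [], []
    for child in root:
        if child.tag == "item":
            items.append((child.text or "").strip())
        else:
            # Wrongly named tag: note it and carry on with the rest.
            problems.append(f"unexpected tag <{child.tag}>")
    return items, problems
```

No schema is involved here: well-formedness is checked by the parser, and everything else is decided by the consuming code.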

Anyway, I've always thought this was common practice, but I still see a lot of code out there that dies on "DTD not found" errors. If Apache.org or Sun goes down, or if I delete my DTD by accident, it shouldn't kill my program. My program should not depend on another site being up.
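
For what it's worth, a parser that isn't asked to validate doesn't have to go looking for the DTD at all. A minimal sketch, assuming Python's xml.etree.ElementTree, which by default does not fetch the external DTD named in a DOCTYPE; the note document and the example.invalid URL are placeholders, not a real DTD location.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE points at a host that can never resolve (hypothetical URL).
DOC = (
    '<!DOCTYPE note SYSTEM "http://example.invalid/note.dtd">'
    "<note><body>hello</body></note>"
)

# No validation is requested, so the missing DTD is never fetched
# and the document parses on well-formedness alone.
root = ET.fromstring(DOC)
body = root.findtext("body")
```

This only holds as long as the document doesn't rely on entities defined in that external DTD; those would still fail to resolve.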

The thing is, I searched for this on Google and didn't find anyone else talking about it. Is this the "right way to do it", or is it just me?
