[New-ITS] Java validation (was: RE: Schemas for Validation??)

ann.wrightson at bt.com
Thu Jul 20 16:42:21 BST 2006


Here is a response from Inigo Surguy (inigo.surguy at csw.co.uk):

I agree that many approaches can provide better validation than W3C XML
Schema, and if the choice were only between W3C XML Schema and Java, I
would agree that Java validation is a reasonable option.

However, I think it makes more sense to use Schematron as the validation
language rather than Java. Every advantage of the Java approach is also
an advantage of Schematron (except, probably, speed), and Schematron
rules are easier to write and more widely supported.

With the Java approach, you still need some way of specifying the
constraints to be applied to the document, and the most obvious choice
is XPath. It then makes sense to hold the XPath expressions in a
properties file, so they can be changed without recompilation, and to
associate an error message with each one. Go any distance down this
route and you end up reinventing Schematron!
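A minimal sketch of where that route leads, using only the JDK's javax.xml.xpath API. The rule layout (rule.N.test holding an XPath boolean expression, rule.N.message holding its error text) and the element names are hypothetical, invented for illustration. Each rule is in effect a Schematron assert with its message, minus Schematron's rule contexts, phases and standard report format. The XPath goes in the property value rather than the key because keys cannot contain unescaped spaces or '='.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathRuleValidator {

    // Hypothetical rule file: rule.N.test is an XPath boolean expression that must hold,
    // rule.N.message is the error reported when it does not. In practice this text
    // would live in a .properties file so rules can change without recompilation.
    static final String RULES =
        "rule.1.test=count(/message/id) = 1\n"
        + "rule.1.message=A message must have exactly one id\n"
        + "rule.2.test=count(/message/patient/name) = 1\n"
        + "rule.2.message=A patient must have a name\n";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates every rule.N.test against the DOM and collects the messages of failed rules. */
    static List<String> validate(Document doc, Properties rules) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> errors = new ArrayList<>();
        // TreeSet gives a stable (lexical) rule order, so errors are reported deterministically.
        for (String key : new TreeSet<>(rules.stringPropertyNames())) {
            if (!key.endsWith(".test")) {
                continue;
            }
            String prefix = key.substring(0, key.length() - ".test".length());
            Boolean ok = (Boolean) xpath.evaluate(rules.getProperty(key), doc, XPathConstants.BOOLEAN);
            if (!ok) {
                errors.add(rules.getProperty(prefix + ".message", "Rule failed: " + prefix));
            }
        }
        return errors;
    }

    public static void main(String[] args) throws Exception {
        Properties rules = new Properties();
        rules.load(new StringReader(RULES));
        Document doc = parse("<message><id>1</id><id>2</id><patient/></message>");
        // Both rules fail here: there are two ids and the patient has no name.
        validate(doc, rules).forEach(System.out::println);
    }
}
```

Note that this evaluates XPath against a DOM, so the whole document is held in memory.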

The only significant reason I can see for using Java over Schematron is
memory and performance efficiency. Constraints can be checked by a
streaming Java SAX implementation without reading the entire HL7
document into memory, so very large documents can be handled without
problems. Empirically, I found during PSIS development that reading HL7
via SAX and validating it against some simple constraints written in
Java was about twice as fast as doing the same in Java using XPath. A
Schematron implementation would be slower still than Java+XPath.
However, I don't consider this performance benefit a sufficient reason
for choosing Java over another validator.
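A minimal sketch of the streaming style described above, assuming a hypothetical rule that the root message element must have exactly one direct id child. It uses only the JDK's SAX API. The handler keeps just the stack of currently open element names, so memory use grows with nesting depth rather than with document size. The element names are illustrative, not real HL7 structure.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingIdCheck {

    /** Checks the hypothetical rule "the root message has exactly one direct id child" in one pass. */
    static class IdCountHandler extends DefaultHandler {
        private final Deque<String> openElements = new ArrayDeque<>();  // only the current path is held
        private int idCount = 0;
        final List<String> errors = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            openElements.push(qName);
            // Depth 2 under a root named "message" corresponds to the XPath /message/id.
            if (openElements.size() == 2 && qName.equals("id") && "message".equals(openElements.peekLast())) {
                idCount++;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            openElements.pop();
        }

        @Override
        public void endDocument() {
            if (idCount != 1) {
                errors.add("A message must have exactly one id (found " + idCount + ")");
            }
        }
    }

    static List<String> validate(String xml) throws Exception {
        IdCountHandler handler = new IdCountHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        // The nested patient/id is not at depth 2, so only one id counts: prints []
        System.out.println(validate("<message><id>1</id><patient><id>p</id></patient></message>"));
        // Prints [A message must have exactly one id (found 2)]
        System.out.println(validate("<message><id>1</id><id>2</id></message>"));
    }
}
```

The cost is visible even in this tiny case: every further constraint needs its own hand-written state in the handler, where XPath or Schematron would need one expression.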

When I was working on PSIS, I did write a simple Java HL7 validator.
However, it worked by applying a W3C XML Schema and a Schematron schema.
The Java code made it easy to apply the validation to multiple messages
simultaneously, and Schematron made it easy to write the validation
rules. The rules used an XSLT 2.0 implementation of Schematron (and
hence had regex support, etc.), with some XSLT 2.0 functions to abstract
common checks. This worked well, and non-Java programmers were able to
extend and improve it.
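A minimal sketch of such a driver, using only JDK APIs (javax.xml.validation, javax.xml.transform and java.util.concurrent). It assumes the Schematron schema has already been compiled to an XSLT stylesheet. The small hand-written XSLT 1.0 stylesheet below is a hypothetical stand-in that prints one line per failed assertion. A real compiled Schematron would typically produce SVRL, and an XSLT 2.0 version needs an XSLT 2.0 processor such as Saxon, since the JDK's built-in processor only supports XSLT 1.0. The schema, rule and element names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.XMLConstants;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class BatchValidator {

    // Illustrative W3C XML Schema: a message holds one or more id elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='message'><xs:complexType><xs:sequence>"
        + "<xs:element name='id' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Hypothetical stand-in for a Schematron schema compiled to XSLT:
    // emits one text line per failed assertion (a real compilation would usually emit SVRL).
    static final String RULES_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/message'>"
        + "<xsl:if test='count(id) != 1'>A message must have exactly one id&#10;</xsl:if>"
        + "</xsl:template></xsl:stylesheet>";

    private final Schema schema;     // immutable and thread-safe: compile once, share
    private final Templates rules;   // thread-safe compiled stylesheet: compile once, share

    BatchValidator(String xsd, String xsl) throws Exception {
        schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        rules = TransformerFactory.newInstance().newTemplates(new StreamSource(new StringReader(xsl)));
    }

    /** Validates one message against the XSD, then the rules; returns problems (empty if valid). */
    List<String> check(String xml) throws Exception {
        List<String> problems = new ArrayList<>();
        try {
            // Validator is not thread-safe, so each call creates its own.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add("XSD: " + e.getMessage());
            return problems;  // skip rule checks on structurally invalid input
        }
        StringWriter out = new StringWriter();
        // Transformer is not thread-safe either; Templates hands out a fresh one per call.
        rules.newTransformer().transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        for (String line : out.toString().split("\n")) {
            String msg = line.trim();  // trim also drops any '\r' from platform line endings
            if (!msg.isEmpty()) {
                problems.add("Rule: " + msg);
            }
        }
        return problems;
    }

    /** Checks many messages concurrently; results come back in input order. */
    List<List<String>> checkAll(List<String> messages, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String m : messages) {
                futures.add(pool.submit(() -> check(m)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        BatchValidator v = new BatchValidator(XSD, RULES_XSL);
        List<String> msgs = List.of("<message><id>1</id></message>",
                "<message><id>1</id><id>2</id></message>", "<message/>");
        List<List<String>> results = v.checkAll(msgs, 4);
        for (int i = 0; i < msgs.size(); i++) {
            System.out.println(msgs.get(i) + " -> " + results.get(i));
        }
    }
}
```

The design choice here is to compile the XSD and the stylesheet once (Schema and Templates are documented as thread-safe) and to create a Validator and Transformer per message, since those are not.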

This is the approach I recommend: use Schematron to write the
validation rules, and Java (.NET, whatever) to apply them. I think this
plays to the strengths of both languages. In addition, for constraints
that are difficult to express in Schematron, you can call Java
extension functions from within the XSLT Schematron implementation. As
long as these extensions are clearly defined, they could equally well
be implemented in .NET, in pure XSLT, or in any other language; see the
EXSLT project for an example of this approach.
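As an illustration of the extension-function idea, here is a minimal sketch using the JDK's XPathFunctionResolver hook, which makes a Java method callable from an XPath expression such as a Schematron test. The namespace URI, the ext prefix and the function name mod11 are hypothetical. The check itself is the modulus-11 check-digit test used for 10-digit NHS numbers, chosen only as an example of logic that is arguably awkward to write in XSLT 1.0. Registering the same function with a particular XSLT processor (Saxon, Xalan, etc.) goes through that processor's own extension mechanism, which is not shown.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ExtensionFunctionDemo {

    // Hypothetical namespace for project-specific validation extension functions.
    static final String EXT_NS = "urn:example:validation-extensions";

    /** Modulus-11 check-digit test for a 10-digit number (the scheme used by NHS numbers). */
    static boolean mod11Valid(String s) {
        if (s == null || !s.matches("\\d{10}")) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < 9; i++) {
            sum += (s.charAt(i) - '0') * (10 - i);  // weights 10 down to 2
        }
        int check = 11 - (sum % 11);
        if (check == 11) {
            check = 0;
        }
        return check != 10 && check == s.charAt(9) - '0';  // 10 means no valid check digit exists
    }

    /** XPath may pass a string, or a node-set as a NodeList; use the first node's text then. */
    static String asString(Object arg) {
        if (arg instanceof NodeList) {
            NodeList nodes = (NodeList) arg;
            return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
        }
        return String.valueOf(arg);
    }

    /** Builds an XPath in which ext:mod11(...) resolves to the Java method above. */
    static XPath newXPath() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "ext".equals(prefix) ? EXT_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return EXT_NS.equals(uri) ? "ext" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return EXT_NS.equals(uri) ? Collections.singletonList("ext").iterator()
                                          : Collections.<String>emptyList().iterator();
            }
        });
        // Note: extension calls are rejected if secure processing is enabled on the factory.
        xpath.setXPathFunctionResolver((name, arity) -> {
            if (EXT_NS.equals(name.getNamespaceURI()) && "mod11".equals(name.getLocalPart()) && arity == 1) {
                return args -> mod11Valid(asString(args.get(0)));
            }
            return null;  // null means "no such function"
        });
        return xpath;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<patient nhsNumber='9434765919'/>")));
        Object ok = newXPath().evaluate("ext:mod11(string(/patient/@nhsNumber))", doc, XPathConstants.BOOLEAN);
        System.out.println(ok);  // true
    }
}
```

Because the function is defined only by its name, arity and behaviour, the same contract could be implemented for another processor or language, which is the point made above about EXSLT.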

Cheers

Inigo
