DITA Document Types, Configuration, and Specialization

In DITA, a "DITA document type" is nothing more or less than a unique set of vocabulary and constraint modules used together in a document. For example, a <concept> document that uses the highlight and indexing domain modules (and no others) reflects the DITA document type consisting of the concept topic type module and the highlight and indexing domain modules. This combination can be expressed by the string "topic concept hi-d indexing-d," read as "the 'concept' topic type, which extends the 'topic' topic type, integrated with the highlight and indexing domains."

This simple list of module names tells you everything you need to know about how to process the document, and how to tell whether or not elements from another DITA document are consistent with the elements in this document.

In DITA documents, the set of modules used is declared by the required @domains attribute, which lists each module, preceded by its ancestry, in its own parenthesized group. For example:
<concept id="topic-id"
  domains="(topic concept) (topic hi-d) (topic indexing-d)"
>
 <title>My Concept</title>
</concept>

Note that you don't need the actual DTD or XSD declarations for the modules; you only need to know the module names.
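
As a rough illustration of working from module names alone, the following Python sketch reads the @domains attribute from a topic and collects the names it lists. The function name declared_modules is hypothetical, and the sketch makes a simplifying assumption: every name inside any pair of parentheses counts as a module in use, so the way the names are grouped does not affect the result.

```python
import re
import xml.etree.ElementTree as ET


def declared_modules(xml_text):
    """Return the set of module names listed in the root element's @domains.

    Assumption (a simplification, not a rule taken from the DITA standard):
    every name that appears inside parentheses is treated as a module in use,
    however the names are grouped.
    """
    root = ET.fromstring(xml_text)
    domains = root.get("domains", "")
    modules = set()
    # Each parenthesized group holds space-separated module names.
    for group in re.findall(r"\(([^()]*)\)", domains):
        modules.update(group.split())
    return modules


doc = """<concept id="topic-id"
  domains="(topic concept) (topic hi-d) (topic indexing-d)">
 <title>My Concept</title>
</concept>"""

print(sorted(declared_modules(doc)))
# ['concept', 'hi-d', 'indexing-d', 'topic']
```

No DTD or XSD is consulted here, so the attribute has to be present in the document instance itself rather than supplied as a default by a DTD or schema.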

One implication of this is that DITA documents do not need to have DOCTYPE declarations or XSD schema associations, as long as they specify the set of vocabulary modules they use. Likewise, when a document does have a DOCTYPE or schema association, it doesn't matter what DTD file or XSD document it uses as long as that DTD or XSD accurately reflects the set of modules the document declares it uses.

This means that DITA processors should never depend on a specific DTD or XSD file, because the use of a specific file means nothing. Two DTD or XSD document type shells that reflect the same set of modules define identical DITA document types. This is a fundamental difference between DITA and traditional XML and SGML applications, where the only thing you could know for sure was the specific DTD or XSD file a document used.
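
To make the point concrete, here is a minimal Python sketch that compares documents by their declared modules and never looks at the DTD they name. The helper names (module_set, same_document_type) and the file names shell-a.dtd and shell-b.dtd are hypothetical, and the sketch assumes @domains is written out in each instance, since no DTD is read to supply a default.

```python
import re
import xml.etree.ElementTree as ET


def module_set(xml_text):
    """Collect every module name listed in the root element's @domains."""
    root = ET.fromstring(xml_text)
    # Names are runs of characters other than whitespace and parentheses.
    return frozenset(re.findall(r"[^\s()]+", root.get("domains", "")))


def same_document_type(doc_a, doc_b):
    """Two documents share a DITA document type if their module sets match."""
    return module_set(doc_a) == module_set(doc_b)


# Two documents that name different (hypothetical) DTD files.
doc_a = """<!DOCTYPE concept SYSTEM "shell-a.dtd">
<concept id="a" domains="(topic concept) (topic hi-d) (topic indexing-d)">
 <title>A</title>
</concept>"""

doc_b = """<!DOCTYPE concept SYSTEM "shell-b.dtd">
<concept id="b" domains="(topic indexing-d) (topic hi-d) (topic concept)">
 <title>B</title>
</concept>"""

# No DOCTYPE at all, and one extra domain module.
doc_c = """<concept id="c" domains="(topic concept) (topic hi-d) (topic pr-d)">
 <title>C</title>
</concept>"""

print(same_document_type(doc_a, doc_b))  # True: same modules, different DTD files
print(same_document_type(doc_a, doc_c))  # False: the module sets differ
```

The DTD file names play no part in the comparison; only the declared modules do.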

For this reason, any system that claims to be a general DITA-aware processor but requires or expects the use of specific DTD or XSD files is fundamentally broken: it demonstrates a lack of understanding of how DITA document types work.

(Keep in mind that the DITA view of document types is so different from traditional XML practice that it's no surprise that tools and many practitioners get it wrong, especially tools with an SGML heritage, where the DTD was everything. Unfortunately, some of these tools reflect architectural decisions made decades ago that are difficult or impossible to undo. That doesn't mean those tools are not useful or even compelling, just that they will be harder to adapt to locally defined document types and to vocabulary modules not defined by the standard.)

There are three types of module that can be used to define a DITA document type:
  • Structural modules, which define map types or topic types ("map," "bookmap," "topic," "concept," etc.)
  • Domain modules, which define sets of elements usable across map or topic types (the highlighting domain, the programming domain, etc.)
  • Constraint modules, which restrict the content models or attribute lists of specific element types within a specific structural or domain module. For example, the strict task constraint module takes the DITA 1.2 general task topic type and restricts it to match the rules of the DITA 1.1 task topic type.
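
The three kinds of module can often be told apart by name alone. The Python sketch below assumes the common short-name convention in which domain modules end in "-d" (as in hi-d and indexing-d) and constraint modules end in "-c". The function names and the strictTaskbody-c name are used here for illustration, and a module that ignores the convention would be misclassified.

```python
def classify_module(name):
    """Guess a module's kind from its short name.

    Assumption: domain modules end in "-d" and constraint modules end in
    "-c"; anything else is taken to be a structural (map or topic) module.
    """
    if name.endswith("-d"):
        return "domain"
    if name.endswith("-c"):
        return "constraint"
    return "structural"


def group_by_kind(names):
    """Sort a list of module names into the three kinds of module."""
    kinds = {"structural": [], "domain": [], "constraint": []}
    for name in names:
        kinds[classify_module(name)].append(name)
    return kinds


print(group_by_kind(["topic", "task", "hi-d", "strictTaskbody-c"]))
# {'structural': ['topic', 'task'], 'domain': ['hi-d'], 'constraint': ['strictTaskbody-c']}
```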

In this module-based approach to vocabulary management, there are two things you can do to create DITA document types: configuration and specialization.

The DITA standard defines specific structural, naming, and coding requirements for document type shells and modules that help ensure consistency of design and implementation and make it easy to combine modules into new document types. While these patterns are not technically needed (they have no bearing on the syntactic validity or processability of DITA documents), they make it easier to use and re-use modules and generally keep things consistent. Once you understand the patterns and how the pieces fit together, you will see that creating new specializations and configurations is remarkably easy.

DITA is about interchange, and that includes the interchange of knowledge and of implementation components as well as the interchange of content. DITA's modular vocabulary approach is designed in part to make the interchange of vocabulary as reliable as the interchange of content. A large part of this is simply standardizing implementation details so that, having learned how DITA vocabulary implementation works, you should be able to quickly apply that knowledge to any conforming DITA vocabulary, no matter how specialized.