Topic Specialization Step 1: Design The Topic Element Types

When designing a topic specialization, you have to answer several questions:
  • What should the new topic element type name be?
  • What should be allowed within the topic body?
  • Should the topic allow any nested topics?
  • Does the topic require specialized topic-level metadata?

Of course, to answer these questions, you have to first understand the requirements, both for the information content and the information presentation and processing.

For this tutorial, our task is to create a specialization of <concept> that supports the requirements of FAQ information, that is, questions and answers.

I chose FAQs as the subject of the tutorial because they are familiar to most Web users (and now even non-Web users), they are relatively simple (at least on their face) but not so trivial as to be boring, and they have some potential sophistications that could make for interesting exploration beyond the immediate task of "what are the mechanics of defining and implementing a new topic type?" In addition, there are any number of useful and reasonable ways that FAQs could be constructed using DITA; this tutorial presents only one, and not necessarily the best one.

For this tutorial I have decided that each question should be a separate topic, rather than having one topic that contains multiple question/answer pairs. I have found that making <topic> the primary unit of organization and granularity works well, even if it leads to topics that some people might initially or intuitively think are too small. But I wouldn't go to the mat to defend this design decision, and I don't claim it's necessarily the best. It has a logic I can defend, but that's as far as I'll go.

As you work through the tutorial, take the time to ask yourself how you would have done it and why a different way would or wouldn't be better for some reason. This type of analytical thought is all part of understanding your requirements and mapping them to implementations to create the best possible solution.

For this tutorial, let's define an FAQ as a set of one or more question and answer pairs, where the question is a relatively short statement and the answer may be as short or long as needed. We would like the markup to reflect this essential nature, that is, there should be an element named something like "question" and an element named something like "answer." There are no particular requirements for the contents of answers themselves. We would like the rendering to clearly identify the question and answer, for example, "Q. Question statement," "A. Answer response." The default topic presentation would not be sufficient in this case.

Note that these requirements are pretty simple. In any sort of engineering activity, it is best to start off as simply as you can and use iterative refinement to satisfy new requirements as you discover them. In the world of agile methods this is known as "the simplest thing that could possibly work."

This approach does several things: it lets you get something working quickly, it gives you immediate practical experience that will feed back into the design and implementation, and it avoids designing and implementing things you don't need. When designing XML markup it is quite easy to over-design and build complex markup structures that nobody actually wants, needs, or, perhaps, can understand how to use. I've certainly done my share of this in the past.

I now find it much more effective to start small and build up as needed. Often this refinement process happens over the course of a few hours as I implement a new document type or specialization and start testing it with real data, but sometimes it happens over weeks or months as the new markup design is tested by its target users. In any case, for this tutorial, we will start small and, once we have something working that minimally meets our requirements, we can start thinking about other things we might need.

Another characteristic of agile development methods is "test-driven development," that is, using test cases to drive the implementation, rather than implementing first and testing later. The basic idea is that you write the test case first, which will of course initially fail (because there's no code yet), and then you do the implementation until the test case passes, at which point you know you're done. The test cases reflect the requirements as you understand them, and if the requirements change, you update the test case to reflect your new understanding.

For markup design, this means you first create document instances and then implement the DTD or schema that will validate those instances. When the instances are valid, you know you're done (as long as your instances reflect all the important cases the schema needs to support). This is in contrast to simply going from requirements straight to markup declarations, and then only creating instances after the fact, which is the way we had to do it back in SGML days. One of the nice things about XML is that you can have documents with no document type, so you can start with instances and add DTDs or schemas later.

Names are always important, and in this case there is a slight problem with the obvious choice of name for the topic element itself, "faq." The problem is that the term "FAQ" can be read as either singular (a single question) or plural (a set of questions), but here we want a single topic to reflect a single question/answer pair. The name "question-and-answer" might work, except that it could also imply a testing environment rather than an FAQ environment. Thus, I have arrived at the name "faq-question": it's technically redundant but fairly clear and not too long:
<faq-question id="q1">
</faq-question>
The topic title will be the question statement, and it's probably useful to specialize <title> to <faq-question-statement> to make that clear:
<faq-question id="q1">
  <faq-question-statement>Can I add attributes to specific element types?</faq-question-statement>
</faq-question> 

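In this case we don't have any particular requirements for the topic body content, so we could leave the body unspecialized, but since the body will be the FAQ answer, it makes sense to specialize it as <faq-answer>. (Because we are specializing from <concept>, the immediate base type is <conbody>, which is itself a specialization of <body>.)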

Note that just "answer" is probably too generic. One challenge with DITA 1.x is that because you cannot use namespaces, all element type names, including the names of all specialized element types, must be unique. While you can't guarantee that your specialized types won't conflict with somebody else's, it helps if you use names that are specific to your content. This can sometimes lead to names that are longer or more cumbersome than they would need to be if we could use namespaces in DITA 1.x. A good example is the set of element types defined by the DITA 1.2 Learning and Training vocabulary modules, many of which start with "lc" (for "learning content"). The prefix functions as a sort of "namespace prefix" and helps ensure that no other vocabularies will have names that collide with the Learning and Training types.
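
As an illustration of the prefix convention, here is a small sketch of a Learning and Training overview topic. The element type names come from the DITA 1.2 Learning and Training vocabulary; the id value and the text content are invented for this example:
<!-- Illustration only: the id and text are made up; the lc* names are from DITA 1.2 -->
<learningOverview id="course-overview">
  <title>Course overview</title>
  <learningOverviewbody>
    <lcIntro>This course introduces topic specialization.</lcIntro>
    <lcObjectives>
      <lcObjectivesStem>After this course you will be able to:</lcObjectivesStem>
      <lcObjectivesGroup>
        <lcObjective>Design a specialized topic type.</lcObjective>
      </lcObjectivesGroup>
    </lcObjectives>
  </learningOverviewbody>
</learningOverview>
A name like "lcIntro" is much less likely to collide with another vocabulary than a bare "intro" would be, which is the same reasoning that leads to "faq-answer" rather than "answer."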

So our complete topic, including the body, should look like this:
<faq-question id="question-id">
  <faq-question-statement>Can I add attributes to specific element types?</faq-question-statement>
  <faq-answer>
    <p>No, you can only define global attributes, specialized from either @base
    or @props.</p>
  </faq-answer>
</faq-question>
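
To make the specialization ancestry of this design concrete, the following sketch shows the same kind of topic with the DITA @class attribute values written out explicitly. Normally these values are supplied as defaults by the DTD or schema rather than typed into instances. The module name "faq-question" used in the values is an assumption made for this sketch, not something settled by the design above:
<!-- Sketch only: assumes the specialization module is named "faq-question" -->
<faq-question id="q1"
    class="- topic/topic concept/concept faq-question/faq-question ">
  <faq-question-statement
      class="- topic/title faq-question/faq-question-statement ">Can I add attributes to specific element types?</faq-question-statement>
  <faq-answer
      class="- topic/body concept/conbody faq-question/faq-answer ">
    <p class="- topic/p ">No, you can only define global attributes.</p>
  </faq-answer>
</faq-question>
Each @class value lists the element's ancestry from most general to most specific, which is what lets a processor that knows nothing about FAQs still treat <faq-question> as a <concept> or a generic <topic>.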

This design should be sufficient to get us going. Note that we are deferring the question of how to organize individual FAQ questions into complete FAQs. That will be done with maps, which already give us everything we need to create sets of questions and to do things like group questions into titled groups.
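
As a rough sketch of where that is headed, an ordinary, unspecialized DITA map could group question topics under titled headings something like this. The file names, map title, and group titles here are hypothetical:
<!-- Sketch only: file names and titles are invented for illustration -->
<map>
  <title>DITA Specialization FAQ</title>
  <topichead navtitle="Attribute specialization">
    <topicref href="faq/q1.dita"/>
    <topicref href="faq/q2.dita"/>
  </topichead>
  <topichead navtitle="Element specialization">
    <topicref href="faq/q3.dita"/>
  </topichead>
</map>
Each <topicref> points to one <faq-question> topic, and each <topichead> provides a titled group, so nothing in the topic design itself needs to know about sets or groups of questions.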