To make index construction more efficient, we represent terms as termIDs
(instead of strings as we did in Figure 1.4), where each termID is a unique
serial number. We can build the mapping from terms to termIDs on the fly
while we are processing the collection; or, in a two-pass approach, we compile
the vocabulary in the first pass and construct the inverted index in the
second pass. The index construction algorithms described in this chapter all
do a single pass through the data. Section 4.7 gives references to multipass
algorithms that are preferable in certain applications, for example, when disk
space is scarce.
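
The two ways of building the term-to-termID mapping described above can be sketched as follows. This is only an illustrative sketch under simplifying assumptions: the collection is a small in-memory list of strings, tokens are obtained by lowercasing and splitting on whitespace, and docIDs are assigned serially starting from 1. The function names (build_index_single_pass, build_index_two_pass) are hypothetical and do not come from the text.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    return text.lower().split()


def build_index_single_pass(docs):
    """Assign termIDs on the fly while the postings lists are being built."""
    term_to_id = {}
    postings = defaultdict(list)  # termID -> sorted list of docIDs
    for doc_id, text in enumerate(docs, start=1):
        for term in tokenize(text):
            # A term seen for the first time receives the next serial number.
            term_id = term_to_id.setdefault(term, len(term_to_id))
            plist = postings[term_id]
            # docIDs arrive in increasing order, so checking the tail
            # is enough to avoid duplicate entries.
            if not plist or plist[-1] != doc_id:
                plist.append(doc_id)
    return term_to_id, dict(postings)


def build_index_two_pass(docs):
    """Compile the vocabulary first, then build the index in a second pass."""
    # Pass 1: collect the vocabulary and number the terms.
    # Assumption: terms are numbered in sorted order; any fixed order would do.
    vocabulary = sorted({term for text in docs for term in tokenize(text)})
    term_to_id = {term: i for i, term in enumerate(vocabulary)}

    # Pass 2: build postings lists using the now-fixed termIDs.
    postings = {i: [] for i in range(len(vocabulary))}
    for doc_id, text in enumerate(docs, start=1):
        for term in tokenize(text):
            plist = postings[term_to_id[term]]
            if not plist or plist[-1] != doc_id:
                plist.append(doc_id)
    return term_to_id, postings
```

The two functions assign different termIDs to the same term (first-seen order versus sorted order in this sketch), but the postings list reached through each term is the same. The two-pass variant reads the collection twice, which is the price it pays for knowing the full vocabulary before any postings are written.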