Collecting all values (here: docIDs) for a given key (here: termID) into one
list is the task of the inverters in the reduce phase. The master assigns each
term partition to a different inverter and, as in the case of parsers, reassigns
term partitions when inverters fail or run slowly. Each term partition
(corresponding to r segment files, one on each parser) is processed by one inverter.
We assume here that segment files are of a size that a single machine
can handle (Exercise 4.9). Finally, the list of values is sorted for each key and
written to the final sorted postings list (“postings” in the figure). (Note that
postings in Figure 4.6 include term frequencies, whereas each posting in the
other sections of this chapter is simply a docID without term frequency information.)
Figure 4.5 shows the data flow for the term partition a–f. This completes the
construction of the inverted index.
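The reduce step described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch rather than the book's implementation: the function name `invert`, the modeling of each segment file as an in-memory list of (termID, docID) pairs, and the representation of a posting as a (docID, term frequency) tuple in the style of Figure 4.6 are all assumptions for illustration. Reading segment files from the parsers' disks and writing the final postings are omitted.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, Iterable, List, Tuple

# Hypothetical model: one segment file = the (termID, docID) pairs that one
# parser wrote for this term partition, held in memory instead of on disk.
Segment = List[Tuple[int, int]]
# A posting here is (docID, term frequency), as in the postings of Figure 4.6.
Posting = Tuple[int, int]


def invert(segments: Iterable[Segment]) -> Dict[int, List[Posting]]:
    """Reduce phase for one term partition (a sketch, not the book's code)."""
    # Collect all values (docIDs) for each key (termID) across the r segment files.
    values: Dict[int, List[int]] = defaultdict(list)
    for segment in segments:
        for term_id, doc_id in segment:
            values[term_id].append(doc_id)

    # Sort each value list; adjacent equal docIDs collapse into one posting
    # whose second field counts the occurrences (the term frequency).
    postings: Dict[int, List[Posting]] = {}
    for term_id in sorted(values):
        postings[term_id] = [
            (doc_id, len(list(group)))
            for doc_id, group in groupby(sorted(values[term_id]))
        ]
    return postings


if __name__ == "__main__":
    # Hypothetical segment files from two parsers (r = 2) for the same partition.
    segment_from_parser_1 = [(7, 2), (3, 1), (7, 1)]
    segment_from_parser_2 = [(7, 1), (3, 5)]
    index = invert([segment_from_parser_1, segment_from_parser_2])
    print(index)  # {3: [(1, 1), (5, 1)], 7: [(1, 2), (2, 1)]}
```

Keeping only the first field of each posting, for example `[d for d, _ in index[7]]`, which gives `[1, 2]`, yields the docID-only postings used in the other sections of this chapter. Because each inverter handles one term partition independently, the master can run several calls like this in parallel, one per partition.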