The SPIMI method in Section 4.3 is from Heinz and Zobel (2003). We have
simplified several aspects of the algorithm, including compression and the
fact that each term’s data structure also contains, in addition to the postings
list, its document frequency and housekeeping information. We recommend
Heinz and Zobel (2003) and Zobel and Moffat (2006) as up-to-date, in-depth
treatments of index construction. Other algorithms with good scaling properties with
respect to vocabulary size require several passes through the data,
e.g., FAST-INV (Fox and Lee 1991, Harman et al. 1992).