We can do better than Θ(T
2/n) by introducing log2
(T/n) indexes I0, I1
,
I2, . . . of size 20 × n, 21 × n, 22 × n . . . . Postings percolate up this sequence of indexes and are processed only once on each level. This scheme is called log-arithmic merging (Figure 4.7). As before, up to n postings are accumulated in
an in-memory auxiliary index, which we call Z0. When the limit n is reached,
the 20 × n postings in Z0 are transferred to a new index I0 that is created on
disk. The next time Z0 is full, it is merged with I0 to create an index Z1 of size
2
1× n. Then Z1
is either stored as I1
(if there isn’t already an I1
) or merged with I1 into Z2 (if I1 exists); and so on. We service search requests by query-ing in-memory Z0 and all currently valid indexes Ii on disk and merging the
results. Readers familiar with the binomial heap data structure2 will recog-nize its similarity with the structure of the inverted indexes in logarithmic
merging