Exercise 4.1 If we need Tlog2 T comparisons (where T is the number of termID–docID pairs) and two disk seeks for each comparison, how much time would index construction for Reuters-RCV1 take if we used disk instead of memory for storage and an unopti- mized sorting algorithm (i.e., not an external sorting algorithm)? Use the system parameters in Table 4.1.
Exercise 4.2
How would you create the dictionary in blocked sort-based indexing on the fly to avoid an extra pass through the data?