Index construction interacts with several topics covered in other chapters.
The indexer needs raw text, but documents are encoded in many ways (see Chapter 2). Indexers compress and decompress intermediate files and the
final index (see Chapter 5). In web search, documents are not on a local file system, but have to be spidered or crawled (see Chapter 20). In enter prise search, most documents are encapsulated in varied content manage ment systems, email applications, and databases. We give some examples
in Section 4.7. Although most of these applications can be accessed via http,
native Application Programming Interfaces (APIs) are usually more efficient.
The reader should be aware that building the subsystem that feeds raw text
to the indexing process can in itself be a challenging problem.