Assessing the performance of expert finding tools should take a multidimensional approach. Of
course, it is important that the system actually be able to find experts. Accordingly,
technical performance measures such as those described in the previous section (e.g., the
precision and recall of a returned expert list) are important. One key to comparing and
contrasting systems is a common data set – lists of experts and sources from which that
expertise can be inferred. Unfortunately, very few organizations assess the performance of
their expert finders, much less benchmark them against a standard data set or expert
finding task. Fortunately, the Text REtrieval Conference (TREC)
Enterprise track evaluated both email search and expert search (Craswell et al. 2005). In
the latter, nine groups participated in the first expert search task, which sought to find
experts in a collection of 331,037 documents retrieved from the World Wide Web Consortium (W3C)
(*.w3.org) site in June 2004. For each of 50 topical queries, systems had to identify which of
1,092 candidate W3C people were experts in that topic area; ten training queries
were provided. The best system achieved a Mean Average Precision (MAP) of 0.275.
MAP is the mean, over a set of queries, of each query's average precision, where average
precision averages the precision values obtained at each rank where a relevant document is
retrieved. The measure therefore gives better scores to techniques that rank relevant
documents earlier in the returned list. U.S., European, and Chinese organizations participated. Results from
TREC are displayed in Figure 4. Unfortunately, the commercial solutions described in the
next section have not yet been assessed against this benchmark.
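To make the metric concrete, below is a minimal sketch of how MAP might be computed for an expert search run. The candidate identifiers and relevance judgments are invented for illustration and are not drawn from the TREC data.

    # Minimal sketch of Mean Average Precision (MAP) for expert search.
    # Assumes each query yields a ranked list of candidate experts and a
    # set of candidates judged relevant for that query (hypothetical data).

    def average_precision(ranked_candidates, relevant):
        """Average the precision values obtained at each rank where a relevant
        expert appears; 0.0 if no relevant expert is returned."""
        hits = 0
        precision_sum = 0.0
        for rank, candidate in enumerate(ranked_candidates, start=1):
            if candidate in relevant:
                hits += 1
                precision_sum += hits / rank  # precision after this relevant hit
        return precision_sum / len(relevant) if relevant else 0.0

    def mean_average_precision(runs):
        """Mean of the per-query average precisions.
        `runs` is a list of (ranked_candidates, relevant_set) pairs, one per query."""
        return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

    # Illustrative data: two queries over a small candidate pool.
    runs = [
        (["expert_17", "expert_03", "expert_42"], {"expert_03", "expert_42"}),
        (["expert_08", "expert_17"], {"expert_17"}),
    ]
    print(mean_average_precision(runs))  # higher when relevant experts rank earlier

Because each query's average precision depends on the ranks at which relevant experts appear, a system that places relevant candidates near the top of its list receives the better score.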