18.4.3 Evaluation
The current proxy-based architecture has been used as the basis of a long-term trial of the CWS approach in a corporate search scenario. In this section we describe some recent results drawn from this trial, which speak to the value of the community-based promotions offered by CWS. The trial participants were the 70+ employees of a local Dublin software company, where the CWS architecture was configured to work with the standard Google search engine so that all Google requests were redirected through the CWS system. The search experience was based on the standard Google interface, with a maximum of 3 results promoted (and annotated with explanations) in any session; if more than 3 promotions were available, the remaining, non-promoted results were annotated with explanation icons but left in their default Google positions. The results presented here are drawn from just over 10 weeks of usage and cover a total of 12,621 individual search sessions.

One of the challenges in evaluating new search technologies in a natural setting is how to assess the quality of individual search sessions. Ideally we would like to capture direct relevance feedback from users as they search. While it would be relatively straightforward to ask users to provide such feedback during each session, or as they selected specific results, this was not feasible in the current trial: participants were eager to ensure that their search experience did not deviate from the norm, and were unwilling to accept pop-ups, form-filling or any other type of additional feedback.

As an alternative, in this evaluation we used a less direct measure of relevance based on the concept of a successful session (see also [92, 91]). We define a successful session to be one where at least one search result has been selected, indicating that the searcher has found at least one (partially) relevant result. In contrast, search sessions where the user does not select any results are considered to be unsuccessful, in the sense that the searcher has found no relevant results. While this is a relatively crude measure of overall search performance, it at least allows us to compare search sessions in a systematic way.
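To make the successful-session measure concrete, the following is a minimal Python sketch of how it could be computed from a session log. The `Session` record, its field names and the example sessions are hypothetical and introduced only for illustration; they are not the trial's actual log format or data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """One search session. Hypothetical record; not the trial's log schema."""
    query: str
    # URLs of the results the searcher clicked during this session.
    selected_urls: List[str] = field(default_factory=list)


def is_successful(session: Session) -> bool:
    """A session is successful if at least one result was selected."""
    return len(session.selected_urls) > 0


def success_rate(sessions: List[Session]) -> float:
    """Fraction of sessions that are successful (0.0 for an empty log)."""
    if not sessions:
        return 0.0
    successes = sum(1 for s in sessions if is_successful(s))
    return successes / len(sessions)


# Made-up sessions for illustration only; these are not trial data.
example_sessions = [
    Session("jdbc connection pool", ["http://example.com/a"]),
    Session("release checklist"),  # no selections: unsuccessful
    Session("vpn setup", ["http://example.com/b", "http://example.com/c"]),
]

if __name__ == "__main__":
    print(f"success rate: {success_rate(example_sessions):.2f}")
```

Because the measure depends only on whether any result was selected, it can be derived from ordinary click logs without asking searchers for explicit feedback, which is what made it workable under the constraints of this trial.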