First, the system splits every sentence, each of which is the title
of an article in the expert publication database, into sub-sentences
at punctuation marks.
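The splitting step can be sketched as follows. This is a minimal illustration, not the system's actual code: the helper name `split_title`, the punctuation set, and the sample title are all assumptions made for the example. Hyphens are deliberately left out of the punctuation set so that hyphenated terms stay intact.

```python
import re

# Assumed punctuation set; the paper does not list the exact characters.
_PUNCT = re.compile(r"[,.;:!?()\[\]]+")

def split_title(title):
    """Split one title into sub-sentences at punctuation marks."""
    parts = _PUNCT.split(title)
    # Drop empty fragments and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]

# Made-up title, used only to show the behaviour.
titles = ["Query expansion for expert finding: a C-value approach"]
sub_sentences = [s for t in titles for s in split_title(t)]
print(sub_sentences)
```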
Second, the system parses clauses from the split sentences with
the Stanford NLP toolkit [11]. The parser yields several clause
types, but the system keeps only two of them, “Noun+Noun” and
“(Adj | Noun)+Noun”, for two reasons. One is that queries are
domain terms, and domain terms are nouns. The other is that
they are generally short noun phrases.
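The clause filter can be sketched over part-of-speech tagged tokens. The sketch assumes the tokens already carry Penn Treebank tags (NN* for nouns, JJ* for adjectives) and reads the "+" in the two patterns as "one or more", in the regular-expression sense; both are assumptions, as are the names `coarse_tag` and `extract_candidates` and the sample input. Under that reading, "Noun+Noun" is a special case of "(Adj | Noun)+Noun", so a single pattern covers both.

```python
import re

def coarse_tag(tag):
    """Map a Penn Treebank tag to N (noun), A (adjective) or O (other)."""
    if tag.startswith("NN"):
        return "N"
    if tag.startswith("JJ"):
        return "A"
    return "O"

# One or more adjectives/nouns followed by a noun.
PATTERN = re.compile(r"[AN]+N")

def extract_candidates(tagged):
    """Return the maximal token spans whose tags match the pattern."""
    codes = "".join(coarse_tag(t) for _, t in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in PATTERN.finditer(codes)
    ]

# Hand-tagged sample input, made up for illustration.
tagged = [
    ("efficient", "JJ"), ("query", "NN"), ("expansion", "NN"),
    ("for", "IN"), ("expert", "NN"), ("search", "NN"),
]
candidates = extract_candidates(tagged)
print(candidates)
```

Encoding each tag as a single character lets an ordinary regular expression do the pattern matching, with match offsets mapping directly back to token positions.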
Third, the system generates candidate extended queries with the
C-value method. C-value is an Automatic Term Recognition
(ATR) measure. It suits settings in which the input is a large
corpus, the output is the set of domain terms, and the domain is
highly specific [9]. The C-value method ranks all clauses based on
the frequency of each clause and the number of times it occurs
nested within other clauses [10]. The equation is given in (1).