First, the system takes every sentence, i.e., every article title in the expert publication database, and splits it into sub-sentences at punctuation marks.

Second, the system parses clauses from those sub-sentences with the Stanford NLP toolkit [11]. The parser produces several types of clauses, but the system keeps only two of them, "Noun+Noun" and "(Adj|Noun)+Noun", for two reasons. First, queries are domain terms, and domain terms are nouns. Second, such terms are generally short noun phrases.

Third, the system generates candidate expansion queries with the C-value method. C-value is an Automatic Term Recognition (ATR) measure that suits settings where the input is a large corpus, the output is the terms of a domain, and the domain is very specific [9]. The C-value method ranks all clauses based on the frequency of each clause and on how often it occurs nested inside longer clauses [10]. The measure is given in equation (1):

$$
\text{C-value}(c) =
\begin{cases}
\log_2 |c| \cdot f(c), & \text{if } c \text{ is not nested},\\[4pt]
\log_2 |c| \cdot \left( f(c) - \dfrac{1}{|T_c|} \displaystyle\sum_{b \in T_c} f(b) \right), & \text{otherwise}.
\end{cases}
\tag{1}
$$

In equation (1), f(c) is the frequency of clause c, and f(b) is the frequency of a longer clause b in which c is nested. |c| is the length of clause c in words. T_c is the set of clauses that contain c, and |T_c| is the number of clauses in that set.

Fourth, the system selects the expansion query terms according to the C-values of the candidate clauses. It sets the average C-value as the threshold and then picks out every clause whose C-value is greater than this average. Each selected clause must be distinct. After this phase, the system obtains a set of expanded queries $Q = \{q_1, \ldots, q_n\}$.
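The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the punctuation set is assumed, the Stanford NLP parsing step is replaced by already POS-tagged tokens using hypothetical simplified labels `"ADJ"` and `"NOUN"`, only maximal "(Adj|Noun)+Noun" runs are taken as candidates, and equation (1) is implemented in its standard C-value form with T_c taken as the candidate clauses that contain c. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Step 1: split a title into sub-sentences at punctuation marks.
# The punctuation set is an assumption; the text does not list it.
PUNCTUATION = r"[,.;:!?()\[\]]+"

def split_sub_sentences(title):
    return [part.strip() for part in re.split(PUNCTUATION, title) if part.strip()]

# Step 2 (simplified): pick "(Adj|Noun)+Noun" runs from POS-tagged tokens.
# Tags are assumed to come from an external tagger, mapped to the
# hypothetical simplified labels "ADJ" and "NOUN".
def extract_candidates(tagged):
    candidates, run = [], []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last run
        if tag in ("ADJ", "NOUN"):
            run.append((word, tag))
            continue
        # Drop trailing adjectives so every clause ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if len(run) >= 2:  # at least two words, e.g. "Noun+Noun"
            candidates.append(tuple(w.lower() for w, _ in run))
        run = []
    return candidates

def count_candidates(tagged_sub_sentences):
    # f(c): how often each candidate clause occurs across all sub-sentences.
    return Counter(c for tagged in tagged_sub_sentences
                   for c in extract_candidates(tagged))

def is_nested(inner, outer):
    # True if `inner` occurs as a contiguous word sequence in a longer `outer`.
    n = len(inner)
    return len(outer) > n and any(
        outer[i:i + n] == inner for i in range(len(outer) - n + 1))

# Step 3: C-value of every candidate clause, following equation (1).
def c_values(freq):
    scores = {}
    for c, f_c in freq.items():
        t_c = [b for b in freq if is_nested(c, b)]  # T_c: clauses containing c
        weight = math.log2(len(c))                   # log2 |c|
        if not t_c:
            scores[c] = weight * f_c
        else:
            scores[c] = weight * (f_c - sum(freq[b] for b in t_c) / len(t_c))
    return scores

# Step 4: keep each distinct clause whose C-value exceeds the average,
# ordered from highest to lowest C-value.
def select_queries(scores):
    threshold = sum(scores.values()) / len(scores)
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [c for c, s in ranked if s > threshold]

# Usage with made-up frequencies (illustrative data, not from the paper).
freq = {("neural", "network"): 10, ("deep", "neural", "network"): 4,
        ("expert", "system"): 3, ("query", "expansion"): 6}
print(select_queries(c_values(freq)))
# prints [('deep', 'neural', 'network'), ('neural', 'network'), ('query', 'expansion')]
```

Because log2|c| is zero for a one-word clause, the weighting only rewards multi-word terms, which is consistent with keeping "Noun+Noun" and "(Adj|Noun)+Noun" patterns. Storing scores in a dictionary keyed by clause also guarantees that each selected clause is distinct.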