A. Document Frequency (DF)
Document frequency is the number of documents in a dataset in
which a term occurs. It is the simplest criterion for term
selection and scales easily to large datasets, since its
computational complexity is linear. The basic assumption of this
method is that terms appearing in only a small minority of documents
are either unimportant or do not influence clustering efficiency. It is
a simple but effective feature selection method for text categorization [9].
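
As a minimal sketch of DF-based selection, the snippet below counts how many documents contain each term and keeps the terms above a threshold. The function names, the toy documents, and the `min_df` threshold are illustrative assumptions, not part of the method as published.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        # set() ensures a term is counted once per document,
        # no matter how often it occurs inside that document.
        df.update(set(tokens))
    return df


def select_by_df(docs, min_df=2):
    """Keep terms occurring in at least `min_df` documents (hypothetical threshold)."""
    df = document_frequency(docs)
    return {term for term, count in df.items() if count >= min_df}


# Toy corpus: each document is a list of already tokenized terms.
docs = [
    ["text", "clustering", "term"],
    ["text", "term", "term"],
    ["image", "clustering"],
]
print(document_frequency(docs))
print(sorted(select_by_df(docs, min_df=2)))
```

Each document is visited once, which is where the linear cost mentioned above comes from.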
B. Term Contribution (TC)
Because a simple method like DF assumes that each term
is of the same importance in every document, it is easily
biased by common terms that have a high document
frequency but a uniform distribution over the different classes. TC
was proposed to deal with this problem [10].
We first introduce TF.IDF (Term Frequency Inverse
Document Frequency) [11]. TF.IDF jointly
considers the frequency of a term in a document and the
document frequency of that term. The idea is that a term which
appears in too many documents is too common to be
important for clustering, which is why the inverse document
frequency is taken into account. That is, if the frequency of a term in
a document is high and it does not appear in many documents, the term is
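important. A common form of TF.IDF is

$$f(t, D_i) = TF(t, D_i) \times \log\frac{N}{DF(t)}$$

where TF(t, D_i) is the frequency of term t in document D_i, N is the
total number of documents, and DF(t) is the document frequency of t.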
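
The weighting can be sketched in a few lines of Python. This assumes the widely used form TF × log(N / DF) with raw term counts and a natural logarithm; the function name `tfidf` and the toy corpus are illustrative, and other variants (smoothed or normalized) exist.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = TF * log(N / DF)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency of each term
    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


docs = [
    ["text", "text", "term"],
    ["text", "image"],
    ["term", "text"],
]
for w in tfidf(docs):
    print(w)
```

In this toy corpus "text" appears in every document, so log(N / DF) is zero and the term receives zero weight regardless of how often it occurs, which is the behaviour the paragraph above describes.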



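The result of text clustering depends heavily on
document similarity, so the contribution of a term can be
viewed as its contribution to the similarity between documents. The
similarity between documents D_i and D_j is computed as the dot
product of their term weights:

$$sim(D_i, D_j) = \sum_{t} f(t, D_i) \times f(t, D_j)$$

where f(t, D) is the TF.IDF weight of term t in document D.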
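
The dot-product similarity and a term's contribution to it can be sketched as below. The TC formula itself is not present in the text above, so the sketch assumes the definition commonly given in the cited literature: TC(t) is the sum of f(t, D_i) × f(t, D_j) over all ordered pairs of distinct documents. The function names and the hand-picked weights are illustrative only.

```python
from itertools import combinations


def similarity(w_i, w_j):
    """Dot product of two sparse weight vectors given as {term: weight} dicts."""
    if len(w_i) > len(w_j):
        w_i, w_j = w_j, w_i  # iterate over the shorter vector
    return sum(w * w_j.get(t, 0.0) for t, w in w_i.items())


def term_contribution(weights):
    """Assumed definition: TC(t) = sum over i != j of f(t, D_i) * f(t, D_j)."""
    tc = {}
    for w_i, w_j in combinations(weights, 2):
        for t in w_i.keys() & w_j.keys():
            # The factor 2 accounts for both ordered pairs (i, j) and (j, i).
            tc[t] = tc.get(t, 0.0) + 2.0 * w_i[t] * w_j[t]
    return tc


# Hypothetical term weights for three documents.
weights = [
    {"a": 1.0, "b": 2.0},
    {"a": 3.0},
    {"b": 0.5, "c": 1.0},
]
print(similarity(weights[0], weights[1]))
print(term_contribution(weights))
```

A term that occurs in only one document, such as "c" here, never appears in a shared pair and therefore contributes nothing. The pairwise loop is quadratic in the number of documents, so this is a sketch for small corpora rather than a scalable implementation.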




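C. Term Variance Quality
The term variance quality method was introduced by Inderjit
Dhillon, Jacob Kogan and Charles Nicholas [12]. It follows
the ideas of Salton and McGill [13]. The quality of the term t_i
is measured as follows:

$$q(t_i) = \sum_{j=1}^{n} f_{ij}^2 - \frac{1}{n} \left[ \sum_{j=1}^{n} f_{ij} \right]^2$$

where n is the number of documents in which t_i occurs at
least once, and f_{ij} >= 1, j = 1, ..., n, is the frequency of t_i in
the j-th such document.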
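
A short sketch of this measure, assuming the form q(t) = Σ f_j² − (1/n)(Σ f_j)², taken over only the n documents in which the term occurs. The function name and the sample frequencies are illustrative.

```python
def term_variance_quality(freqs):
    """q(t) = sum(f_j^2) - (1/n) * (sum(f_j))^2 over documents containing t.

    `freqs` holds the term's frequency in every document of the dataset;
    documents where the term is absent (frequency 0) are ignored.
    """
    nonzero = [f for f in freqs if f >= 1]
    n = len(nonzero)
    if n == 0:
        return 0.0  # the term never occurs
    total = sum(nonzero)
    return sum(f * f for f in nonzero) - total * total / n


print(term_variance_quality([3, 0, 1, 0, 2]))  # uneven frequencies
print(term_variance_quality([2, 2, 2, 0]))     # same frequency wherever present
```

Under this form the score is n times the variance of the term's nonzero frequencies, so a term that occurs equally often in every document containing it scores zero.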



D. Term Variance (TV)
We introduce a new method called Term Variance (TV) to
evaluate the quality of terms by computing the variance
of every term over the whole dataset. Methods like DF assume that each
term is of the same importance in every document, so they are easily
biased by common terms that have a high document
frequency but a uniform distribution over the different classes. TV
follows the idea of DF that terms with a low document
frequency are not important, and solves the problem above at
the same time: a term that appears in very few documents or has a
uniform distribution over the documents will have a low TV value.
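The quality of the term t_i is measured as follows:

$$v(t_i) = \sum_{j=1}^{N} \left[ f_{ij} - \bar{f_i} \right]^2$$

where N is the total number of documents, f_{ij} is the frequency of
t_i in the j-th document, and \bar{f_i} is the mean frequency of t_i
over all N documents.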
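
A minimal sketch of the measure, assuming it is the unnormalized variance of a term's frequency over all N documents, including those where the term is absent. The function name and the sample frequency vectors are illustrative assumptions.

```python
def term_variance(freqs):
    """v(t) = sum over all N documents of (f_j - mean)^2.

    `freqs` holds the term's frequency in every document, zeros included.
    """
    n_docs = len(freqs)
    if n_docs == 0:
        return 0.0
    mean = sum(freqs) / n_docs
    return sum((f - mean) ** 2 for f in freqs)


print(term_variance([3, 0, 1, 0, 2]))  # uneven distribution
print(term_variance([1, 0, 0, 0, 0]))  # appears in very few documents
print(term_variance([1, 1, 1, 1, 1]))  # uniform over all documents
```

With these sample vectors the unevenly distributed term scores highest, while the rare term and the uniformly distributed term both score low, matching the behaviour described above. Dividing by N would give the usual variance without changing the ranking of terms.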