B. Evaluation of cluster validity criteria

As we discussed in Section III, there are many cluster validity criteria that can be used to evaluate the performance of clustering algorithms. But the cluster validity criteria themselves perform differently. In this section, we first evaluate these validity criteria by applying a single feature selection method, DF, to different datasets, since the behavior of DF has reached a consistent consensus in this research field.

DF is a simple but effective feature selection method. When applying DF to a text dataset, removing a minority of the terms improves the clustering performance, or at least causes no loss; when more terms are removed, the clustering performance drops quickly.

The values of the different validity criteria when applying DF to the different datasets are shown in Fig. 1. The FM, AA, and RS results range from 0.5714 to 0.7201, from 0.7370 to 0.8928, and from 0.1422 to 0.5157, respectively. As can be seen in Fig. 1, the four RS curves on the four different datasets roughly follow the rule mentioned above, but the curves are very flat, so the trend is not distinct. The four AA curves all follow the rule well, except for the curve on TR45. The FM curves on the FBIS and RE1 datasets follow the DF rule well, while the curves on TR45 and TR41 wave randomly.

From the results of this first experiment, AA is therefore the best validity criterion. We can also see from the results that clustering performance varies greatly across datasets: the performance on FBIS and RE1 is much better than on the others. If we consider only the results on FBIS and RE1, the AA and FM validity criteria are both good, and FM may be even better. So in the experiments below, we will mainly use the FBIS and RE1 datasets, together with the AA and FM validity criteria.
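Pair-counting criteria such as the FM and RS values above can be computed by tallying, over all document pairs, whether the true classes and the produced clusters agree on putting the pair together. The following is a minimal sketch under the assumption that RS is the standard Rand statistic and FM the Fowlkes-Mallows index; the paper's own definitions are given in Section III and are not reproduced here.

```python
from itertools import combinations

def pair_counts(labels_true, labels_pred):
    """Count document pairs by whether the true classes and the
    produced clusters agree on grouping the pair together."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_class = labels_true[i] == labels_true[j]
        same_cluster = labels_pred[i] == labels_pred[j]
        if same_class and same_cluster:
            tp += 1          # together in both
        elif same_cluster:
            fp += 1          # clustered together, different classes
        elif same_class:
            fn += 1          # same class, split apart
        else:
            tn += 1          # apart in both
    return tp, fp, fn, tn

def rand_statistic(labels_true, labels_pred):
    """RS: fraction of pairs on which clusters and classes agree."""
    tp, fp, fn, tn = pair_counts(labels_true, labels_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def fowlkes_mallows(labels_true, labels_pred):
    """FM: geometric mean of pairwise precision and recall."""
    tp, fp, fn, _ = pair_counts(labels_true, labels_pred)
    return tp / ((tp + fp) * (tp + fn)) ** 0.5
```

For a perfect clustering both scores are 1.0; for labels_true = [0, 0, 1, 1] and labels_pred = [0, 1, 0, 1] there are no true-positive pairs, so FM is 0 while RS is 1/3.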
V. EVALUATION OF FEATURE SELECTION METHODS

The following experiments compare the unsupervised feature selection methods DF, TC, TVQ, and TV. We chose K-means as the clustering algorithm. Since K-means is easily influenced by the selection of the initial centroids, we randomly produced 5 sets of initial centroids for each dataset and averaged the performance of the 5 runs as the final clustering performance.

The AA and FM results on FBIS and RE1 are shown in Fig. 2 to Fig. 5. From these figures we can see, first, that the unsupervised feature selection methods can improve the clustering performance when certain terms are removed. For all methods in our experiments, at least 70% of the terms can be removed with no loss in clustering performance on both datasets. For most feature selection methods, the clustering performance can even be improved when certain features are removed; for instance, when 20% of the terms of FBIS are removed by the TC method, it achieves a 9.4% improvement in the FM value.

Second, TC is the steadiest method of all: the clustering performance does not descend distinctly when terms are removed. The results of the TC method are shown in Fig. 6.

Third, the TV method is a little worse than TC, but much better than DF and TVQ. DF drops quickly when more than 60% of the terms are removed, and the performance of TVQ is very bad when more than 70% of the terms are removed from the RE1 dataset. The results of the TV method are shown in Fig. 7. When no more than 80% of the terms are removed from the datasets by the TV method, there is no loss in clustering performance.

VI. CONCLUSION

Clustering is one of the most important tasks in the data mining process for discovering groups and identifying interesting distributions and patterns in the underlying data. Feature selection methods are used to address the high dimensionality and inherent data sparsity of the feature space.
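The multiple-initialization protocol of Section V — running K-means from several random centroid sets and averaging the validity score — can be sketched as follows. This is an illustrative toy, not the paper's implementation: `averaged_score` and the score function passed to it are names introduced here, and a plain K-means on small dense vectors stands in for clustering the text datasets.

```python
import random

def kmeans(points, k, seed, iters=100):
    """Plain K-means: sample k initial centroids at random, then
    alternate assignment and centroid updates until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2
                                  for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        if new_assign == assign:      # assignments stable: converged
            break
        assign = new_assign
        for c in range(k):
            members = [pt for pt, a in zip(points, assign) if a == c]
            if members:               # keep old centroid if cluster empties
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assign

def averaged_score(points, k, score_fn, seeds=(0, 1, 2, 3, 4)):
    """Average a validity score over several random initializations,
    as done for the final clustering performance in Section V."""
    return sum(score_fn(kmeans(points, k, s)) for s in seeds) / len(seeds)
```

In the experiments the averaged quantity would be a validity criterion value such as AA or FM computed against the class labels; any function mapping an assignment to a score can be plugged in as `score_fn`.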
In real cases, the class information is unknown, so only unsupervised feature selection methods can be exploited. In this paper, we evaluate several unsupervised feature selection methods, including DF, TC, TVQ, and a newly proposed method, TV. TC and TV are better than DF and TVQ. We also show that the performances of different cluster validity criteria are not the same, and that the AA and FM criteria are better for evaluating the clustering results.
