
The algorithm parses documents into termID–docID pairs and accumulates the pairs in memory until a block of a fixed size is full (PARSENEXTBLOCK in Figure 4.2). We choose the block size to fit comfortably into memory to permit a fast in-memory sort. The block is then inverted and written to disk. Inversion involves two steps. First, we sort the termID–docID pairs. Next, we collect all termID–docID pairs with the same termID into a postings list, where a posting is simply a docID. The result, an inverted index for the block we have just read, is then written to disk. Applying this to Reuters-RCV1 and assuming we can fit 10 million termID–docID pairs into memory, we end up with ten blocks, each an inverted index of one part of the collection.
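
The two inversion steps can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the book's implementation: the function names `parse_next_block`, `invert_block`, and `build_block_indexes` are hypothetical, the pair stream is assumed to be already parsed into integer (termID, docID) tuples, each block index is kept in memory as a dictionary instead of being written to disk, and repeated docIDs for the same term are collapsed into a single posting.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator

Pair = tuple[int, int]  # (termID, docID)


def parse_next_block(pairs: Iterator[Pair], block_size: int) -> list[Pair]:
    """Accumulate pairs until the block is full or the stream ends.

    Hypothetical stand-in for PARSENEXTBLOCK.
    """
    block: list[Pair] = []
    for pair in pairs:
        block.append(pair)
        if len(block) == block_size:
            break
    return block


def invert_block(block: list[Pair]) -> dict[int, list[int]]:
    """Invert one block: sort the pairs, then group them by termID."""
    # Step 1: sort the termID-docID pairs (by termID, then docID).
    sorted_pairs = sorted(block)
    # Step 2: collect all pairs with the same termID into a postings list.
    index: dict[int, list[int]] = {}
    for term_id, group in groupby(sorted_pairs, key=itemgetter(0)):
        postings: list[int] = []
        for _, doc_id in group:
            # Assumption: a docID appears at most once per postings list.
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)
        index[term_id] = postings
    return index


def build_block_indexes(
    pairs: Iterable[Pair], block_size: int
) -> list[dict[int, list[int]]]:
    """Produce one inverted index per block of at most block_size pairs."""
    stream = iter(pairs)
    indexes = []
    while True:
        block = parse_next_block(stream, block_size)
        if not block:
            break
        # A real implementation would write each block index to disk here.
        indexes.append(invert_block(block))
    return indexes


if __name__ == "__main__":
    toy_pairs = [(2, 1), (1, 1), (2, 2), (1, 3), (3, 3), (1, 3)]
    for block_index in build_block_indexes(toy_pairs, block_size=3):
        print(block_index)
```

With a block size of three, the six toy pairs yield two block indexes, each covering one part of the stream. In the same way, a block size of 10 million pairs yields the ten blocks mentioned above if the collection produces about 100 million pairs.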
