Reuters-RCV1 has 100 million tokens

Reuters-RCV1 has 100 million tokens. Collecting all termID–docID pairs of
the collection using 4 bytes each for termID and docID therefore requires 0.8
GB of storage. Typical collections today are often one or two orders of magnitude larger than Reuters-RCV1. You can easily see how such collections
overwhelm even large computers if we try to sort their termID–docID pairs
in memory. If the size of the intermediate files during index construction is
within a small factor of available memory, then the compression techniques
introduced in Chapter 5 can help; however, the postings file of many large
collections cannot fit into memory even after compression.
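The storage estimate above can be checked with a short calculation. This is only a sketch: it assumes one termID–docID pair per token and decimal gigabytes (10^9 bytes), and the function name `pair_storage_bytes` is made up for illustration.

```python
# Back-of-the-envelope estimate of raw termID-docID pair storage.
TOKENS = 100_000_000   # tokens in Reuters-RCV1, as stated in the text
BYTES_PER_ID = 4       # 4 bytes each for termID and docID


def pair_storage_bytes(tokens, bytes_per_id=BYTES_PER_ID):
    """Bytes needed to hold one (termID, docID) pair per token (assumption)."""
    return tokens * 2 * bytes_per_id


# Reuters-RCV1, then collections one and two orders of magnitude larger.
for scale in (1, 10, 100):
    gigabytes = pair_storage_bytes(TOKENS * scale) / 10**9
    print(f"{scale:>3}x Reuters-RCV1: {gigabytes} GB")
```

At one hundred times the size of Reuters-RCV1, the uncompressed pairs alone come to 80 GB under these assumptions, which is why sorting them in memory quickly becomes impractical.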
With main memory insufficient, we need to use an external sorting algorithm, that is, one that uses disk. For acceptable speed, the central require
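As a rough illustration of what "a sorting algorithm that uses disk" can look like, the sketch below sorts termID–docID pairs in fixed-size blocks, spills each sorted block to a temporary file, and then merges the files. It is a generic external merge sort, not necessarily the algorithm the text goes on to describe; the names `write_run`, `read_run`, `external_sort`, and the `block_size` parameter are assumptions chosen for this example.

```python
import heapq
import pickle
import tempfile


def write_run(pairs):
    """Sort one in-memory block and spill it to a temporary file."""
    f = tempfile.TemporaryFile()
    for pair in sorted(pairs):
        pickle.dump(pair, f)
    f.seek(0)
    return f


def read_run(f):
    """Stream pairs back from a run file, one at a time."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def external_sort(pairs, block_size):
    """Yield (term_id, doc_id) pairs in sorted order.

    At most block_size pairs are held in memory while building runs.
    """
    runs, block = [], []
    for pair in pairs:
        block.append(pair)
        if len(block) == block_size:
            runs.append(write_run(block))
            block = []
    if block:
        runs.append(write_run(block))
    try:
        # heapq.merge reads the sorted runs lazily and merges them.
        yield from heapq.merge(*(read_run(f) for f in runs))
    finally:
        for f in runs:
            f.close()


pairs = [(3, 1), (1, 2), (2, 1), (1, 1)]
print(list(external_sort(pairs, block_size=2)))
# [(1, 1), (1, 2), (2, 1), (3, 1)]
```

Each run is written and read sequentially, so the sketch avoids random access to disk. A real indexer would likely use a compact binary record format rather than `pickle`.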
