Information Retrieval
Index compression
Hamid Beigy
Sharif University of Technology
October 19, 2018
Information Retrieval | Characterization of an index
Size of the dictionary, the non-positional index, and the positional index after each preprocessing step:

preprocessing     dictionary      non-positional index         positional index
                  (word types)    (non-positional postings)    (positional postings, word tokens)
unfiltered        484,494         109,971,179                  197,879,290
no numbers        473,723         100,680,242                  179,158,204
case folding      391,523          96,969,056                  179,158,204
30 stop words     391,493          83,390,443                  121,857,825
150 stop words    391,373          67,001,847                   94,516,599
stemming          322,383          63,812,300                   94,516,599
Information Retrieval | Compressing the dictionary
Information Retrieval | Compressing the posting lists
term              encoding    postings list
the               docIDs      . . .  283042  283043  283044  283045  . . .
                  gaps                    1       1       1  . . .
computer          docIDs      . . .  283047  283154  283159  283202  . . .
                  gaps                  107       5      43  . . .
arachnocentric    docIDs      252000  500100
                  gaps        252000  248100
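
The docID-to-gap conversion shown in the table above can be sketched in a few lines of Python. This is a minimal illustration, assuming each postings list is sorted in increasing docID order and that the first entry is stored as the docID itself (as in the arachnocentric row). The function names `to_gaps` and `from_gaps` are hypothetical and do not come from the slides.

```python
from itertools import accumulate


def to_gaps(doc_ids):
    """Keep the first docID as-is; replace every later docID
    by its difference from the previous docID."""
    if not doc_ids:
        return []
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]


def from_gaps(gaps):
    """Recover the docIDs as the running sum of the gaps."""
    return list(accumulate(gaps))


# Values taken from the table: the visible part of the list for "computer"
computer = [283047, 283154, 283159, 283202]
computer_gaps = to_gaps(computer)
print(computer_gaps[1:])         # gaps between consecutive docIDs

# The rare term keeps large gaps, so it gains little from this step
arachnocentric = [252000, 500100]
print(to_gaps(arachnocentric))

# Decoding is lossless
assert from_gaps(computer_gaps) == computer
```

Frequent terms such as "the" yield gaps that are mostly 1, which is what makes a variable-length code for the gaps worthwhile; rare terms keep gaps of roughly the same magnitude as the docIDs.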
Information Retrieval | Compressing the posting lists | Using variable-length byte-codes
Information Retrieval | Compressing the posting lists | Using γ-codes
Information Retrieval | Conclusion