Fast Dictionary-Based Compression for Inverted Indexes

Giulio Ermanno Pibiri1, Matthias Petri2, Alistair Moffat3

1 The University of Pisa, Italy; giulio.pibiri@di.unipi.it 

2 The University of Melbourne, Australia; matthias.petri@gmail.com

3 The University of Melbourne, Australia; ammoffat@unimelb.edu.au

Dictionary-based compression schemes provide fast decoding operation, typically at the expense of reduced compression effectiveness compared to statistical or probability-based approaches. In this work, we apply dictionary-based techniques to the compression of inverted lists, showing that the high degree of regularity that these integer sequences exhibit is a good match for certain types of dictionary methods, and that an important new trade-off balance between compression effectiveness and compression efficiency can be achieved. Our observations are supported by experiments using the document-level inverted index data for two large text collections, and a wide range of other index compression implementations as reference points. Those experiments demonstrate that the gap between efficiency and effectiveness can be substantially narrowed.

This talk will provide an overview of work that will be published in the Proceedings of the 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 11-15 February 2019.

 

 

ABOUT CORE

The Computing Research and Education Association of Australasia, CORE, is an association of university departments of computer science in Australia and New Zealand. Prior to 2004 it was known as the Computer Science Association, CSA.

Conference Managers

Please contact the team at Conference Design with any questions regarding the conference.
© 2018 Conference Design Pty Ltd