Where academic tradition
meets the exciting future

Interpolative Coding of Integer Sequences Supporting Log-Time Random Access

Jukka Teuhola, Interpolative Coding of Integer Sequences Supporting Log-Time Random Access. Information Processing and Management 47(5), 742–761, 2011.

http://dx.doi.org/10.1016/j.ipm.2010.11.006

Abstract:

Sequences of integers are common data types, occurring either as primary data or ancillary structures. The sizes of sequences can be large, making compression an interesting option. Effective compression presupposes variable-length coding, which destroys the regular alignment of values. Yet it would often be desirable to access only a small subset of the entries, either by position (ordinal number) or by content (element value), without having to decode most of the sequence from the start. Here such a random access technique for compressed integers is described, with the special feature that no auxiliary index is needed. The solution applies a method called interpolative coding, which is one of the most efficient non-statistical codes for integers. Indexing is avoided by address calculation guaranteeing sufficient space for codes even in the worst case. The additional redundancy, compared to regular interpolative coding, is only about one bit per source integer for uniform distribution. The time complexity of random access is logarithmic with respect to the source size for both position-based and content-based retrieval. According to experiments, random access is faster than full decoding when the number of accessed integers is not more than approximately 0.75n/log2 n for sequence length n. The tests also confirm that the method is quite competitive with other approaches to random access coding, suggested in the literature.

BibTeX entry:

@ARTICLE{jTeuhola_Jukka11b,
  title = {Interpolative Coding of Integer Sequences Supporting Log-Time Random Access},
  author = {Teuhola, Jukka},
  journal = {Information Processing and Management},
  volume = {47},
  number = {5},
  publisher = {Elsevier},
  pages = {742–761},
  year = {2011},
  keywords = {source coding, data compression, random access, interpolative coding, inverted index},
}

Belongs to TUCS Research Unit(s): Algorithmics and Computational Intelligence Group (ACI)

Publication Forum rating of this publication: level 3

Edit publication