NAME: Compressing sets and multisets of sequences
AUTH: Christian Steinruecken
YEAR: 2015
MNTH: 03
FROM: itrit
XDOI: 10.1109/TIT.2015.2392093
VOLM: 61
NMBR: 3
PAGE: 1485-1490
ABST: This article describes lossless compression algorithms for multisets
    : of sequences, taking advantage of the multiset's unordered structure.
    : Multisets are a generalisation of sets where members are allowed to
    : occur multiple times. A multiset can be encoded naïvely by simply
    : storing its elements in some sequential order, but then information is
    : wasted on the ordering. We propose a technique that transforms the
    : multiset into an order-invariant tree representation, and derive an
    : arithmetic code that optimally compresses the tree. Our method
    : achieves compression even if the sequences in the multiset are
    : individually incompressible (such as cryptographic hash sums). The
    : algorithm is demonstrated practically by compressing collections of
    : SHA-1 hash sums, and multisets of arbitrary, individually encodable
    : objects.
ANN0: Journal version.
    : ArXiv preprint: steinruecken2014a,
    : DCC proceedings: steinruecken2014b