Data compression using zstd
1 Introduction
Zstd (or Zstandard) is a data compression library developed by Facebook. It is especially optimized for high compression and decompression speeds. Another interesting feature of this library is that it offers "dictionary compression," wherein you can train the algorithm with some files, producing a dictionary which needs to be fed to the compressor and decompressor. This gives improved compression for small files. The main use case for dictionary compression is when you have a lot of small files of the same type (same statistics) that need to be compressed separately.
This documentation is particularly helpful for getting started with zstd, and some example source code is provided here.
This tecmint article might also be helpful. The official documentation can be found here.
1.1 Installing zstd
The easiest way is to build from source. You can simply clone the repo and install the libraries as root. Run the following commands in the directory where you want to clone the repo.
$ git clone https://github.com/facebook/zstd.git
$ cd zstd
$ sudo make install
The include files are generally in /usr/local/include, while the library files are in /usr/local/lib.
1.2 Basic compression and decompression
Download common.h, simple_compression.c, and simple_decompression.c to your local folder. Also create an empty file called emptydict in this directory (we won't be doing dictionary compression for now, so this will serve as our dictionary).
To compile simple_compression.c, run
$ gcc -Wall -I/usr/local/include/ -c simple_compression.c -lm
For linking, use
$ gcc -L/usr/local/lib/ simple_compression.o -o compressor -lzstd
This will create an executable file called compressor. To run this, use
$ ./compressor raw_filename emptydict
You can also use terminal commands to compress using zstd:
$ zstd -T4 -10 raw_filename
This compresses the file raw_filename at compression level 10 using up to 4 threads, and writes the result to raw_filename.zst.
Usage and options for compression and decompression can be found using man zstd.
To train a dictionary, use
$ zstd --train -B1024 --maxdict=10240 training_file
This splits training_file into chunks of 1024 bytes each and creates a dictionary of at most 10240 bytes.
The resulting dictionary file (named dictionary by default) can then be used for dictionary compression, for example with zstd -D dictionary raw_filename.