Project: Performance Evaluation of Distributed Systems 15/16

Organization

Please find the slides of our kick-off meeting last Friday here. If you decide to participate in the project, you can register via email by next Friday, the 22nd of January. Further instructions will follow via email just before the project starts in February.

Instructors

M.Sc. Daniel Berger
M.Sc. Matthias Schäfer

Follow #PEDSproject and @DISCO_Teaching.

Literature Research

Please find the results of the first phase here:

Maximum Compression Ratio

Our goal is to find a trade-off between compression ratio and timeliness. Information older than 1 second is considered outdated and must be discarded, so the first requirement is that each message is transmitted at most 1 s after its reception. As a first step, however, we look for the algorithm(s) that best suit our data.

To this end, use this trace of binary sensor data and find the algorithm with the maximum compression ratio. The data is formatted as follows:

<esc> "2" : 6 byte MLAT timestamp, 1 byte signal level, 7 byte Mode-S short frame
<esc> "3" : 6 byte MLAT timestamp, 1 byte signal level, 14 byte Mode-S long frame

where

<esc><esc>: a literal 0x1a byte (i.e., every 0x1a within a packet is escaped by doubling it)
<esc> is 0x1a, and "1", "2" and "3" are 0x31, 0x32 and 0x33
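To make the framing concrete, here is a minimal Python sketch of a reader for this format. It assumes frames are concatenated back to back, that only types "2" and "3" need to be decoded (the length of type "1" frames is not given above, so they are skipped by resynchronizing on the next unescaped <esc>), and that a truncated frame is simply dropped. `parse_frames` and `PAYLOAD_LEN` are hypothetical names, not part of any existing tool.

```python
from typing import Iterator, Tuple

ESC = 0x1A
# Unescaped payload length per frame type: 6-byte MLAT timestamp,
# 1-byte signal level, and a 7- or 14-byte Mode-S frame.
PAYLOAD_LEN = {ord("2"): 6 + 1 + 7, ord("3"): 6 + 1 + 14}


def parse_frames(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (frame_type, payload) for every complete type-2/type-3 frame."""
    i, n = 0, len(stream)
    while i + 1 < n:
        if stream[i] != ESC:
            i += 1                      # not at a frame start: resynchronize
            continue
        ftype = stream[i + 1]
        i += 2
        want = PAYLOAD_LEN.get(ftype)
        if want is None:                # <esc><esc> or an unhandled type:
            continue                    # keep scanning for the next frame start
        payload = bytearray()
        while len(payload) < want and i < n:
            if stream[i] == ESC:
                if i + 1 < n and stream[i + 1] == ESC:
                    payload.append(ESC)  # <esc><esc> encodes a literal 0x1a
                    i += 2
                    continue
                break                   # lone <esc>: a new frame starts early
            payload.append(stream[i])
            i += 1
        if len(payload) == want:
            yield ftype, bytes(payload)
```

Each yielded payload can then be sliced into timestamp (`payload[:6]`), signal level (`payload[6]`), and Mode-S frame (`payload[7:]`).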

The original format description can be found here. The MLAT timestamp is a 48-bit value: its upper 18 bits are the seconds of the day, and its lower 30 bits are the nanoseconds within the current second. The Mode-S frames are encoded according to ICAO Annex 10 Volume IV. Find the Mode-S message encoding in this file.
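A sketch of splitting the timestamp into its two fields, assuming the 6 bytes are stored big-endian (most significant byte first); the byte order is not stated here and should be checked against the original format description. `decode_mlat` is a hypothetical helper.

```python
from typing import Tuple

NANOS_BITS = 30  # lower 30 bits: nanoseconds within the second


def decode_mlat(ts: bytes) -> Tuple[int, int]:
    """Split a 6-byte MLAT timestamp into (seconds of day, nanoseconds).

    Assumes big-endian byte order (most significant byte first).
    """
    if len(ts) != 6:
        raise ValueError("MLAT timestamp must be exactly 6 bytes")
    value = int.from_bytes(ts, "big")           # 48-bit integer
    seconds = value >> NANOS_BITS               # upper 18 bits
    nanos = value & ((1 << NANOS_BITS) - 1)     # lower 30 bits
    return seconds, nanos
```

Two decoded timestamps can then be compared as `seconds + nanos * 1e-9` to check the 1 s timeliness requirement (ignoring the wrap-around at midnight).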

Results

The maximum compression ratio, about 2.2, was achieved with LZMA. Another 10% improvement can be achieved with two optimizations: reducing the entropy by XORing the CRC, and removing the escape characters. However, this high compression ratio comes at the cost of a long computation time of up to 4 minutes. Find the presentation with the details below:
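A minimal sketch of how such a comparison could be scripted with Python's standard-library bindings for LZMA (`lzma`), deflate (`zlib`), and bzip2 (`bz2`, a Burrows-Wheeler-based compressor). The compression ratio is taken as original size divided by compressed size; the codec settings and the function name are illustrative assumptions, not the ones used in the project.

```python
import bz2
import lzma
import zlib
from typing import Dict


def compression_ratios(data: bytes) -> Dict[str, float]:
    """Return original size / compressed size for three stdlib codecs,
    each at its strongest standard setting."""
    codecs = {
        "lzma": lambda d: lzma.compress(d, preset=9 | lzma.PRESET_EXTREME),
        "deflate": lambda d: zlib.compress(d, 9),
        "bzip2": lambda d: bz2.compress(d, 9),
    }
    return {name: len(data) / len(fn(data)) for name, fn in codecs.items()}
```

For example, `compression_ratios(open("trace.bin", "rb").read())` returns one ratio per codec; the file name is hypothetical.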

Stripped Data

Group 3 provided a clean dataset without the escape characters, so each message starts directly with its type: 2 for a short message (15 bytes) or 3 for a long message (22 bytes). They also removed unnecessary messages and XORed the CRC to lower the entropy. In addition, they provided the chunks needed for the next task. Both the complete cleaned and optimized dataset and the chunks can be found here.
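The CRC trick can be sketched as follows, assuming "XORing the CRC" means replacing the 24-bit parity field at the end of each Mode-S frame with the XOR of that field and the CRC computed over the rest of the frame (the exact transformation Group 3 applied is not documented here). For error-free DF17 frames, whose parity field is the plain CRC, the field becomes all zeros; for address/parity formats it becomes the aircraft address. Both values repeat often, which lowers the entropy. `modes_crc` and `xor_parity` are hypothetical names; the generator polynomial is the Mode-S one (0xFFF409 with the x^24 term implicit).

```python
MODES_POLY = 0xFFF409  # Mode-S CRC-24 generator polynomial, x^24 term implicit


def modes_crc(data: bytes) -> int:
    """Bitwise Mode-S CRC-24: MSB first, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 16               # bring the next byte into the top bits
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:         # bit shifted out of the 24-bit register
                crc ^= MODES_POLY
            crc &= 0xFFFFFF
    return crc


def xor_parity(frame: bytes) -> bytes:
    """Replace the trailing 24-bit parity field of a 7- or 14-byte Mode-S
    frame with (parity XOR CRC of the preceding bytes)."""
    body, parity = frame[:-3], int.from_bytes(frame[-3:], "big")
    return body + (parity ^ modes_crc(body)).to_bytes(3, "big")
```

Because the frame body is left unchanged, the transformation is its own inverse: applying `xor_parity` again restores the original parity field.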

Chunk Size vs. Compression Ratio & Profiling

In the third task, we investigated the effect of the data size on the compression ratio and compression time. Since we aim at compressing very small chunks of data, the investigated chunk sizes are 1, 2, 4, ..., 1024 Radarcape messages. Three compression algorithms were tested: LZMA, deflate (gzip, zlib), and Burrows-Wheeler. Interestingly, while LZMA was the clear winner with the best compression ratio for the complete dataset in the previous task, the results of this experiment show that for these very small chunks, all algorithms and compression levels perform more or less equally well.
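The experiment can be sketched as follows, assuming the stripped dataset has already been split into a list of individual messages and that each chunk is compressed independently with default settings. The names `CODECS`, `chunk_ratios`, and `sweep` are hypothetical, and Python's `bz2` stands in for the Burrows-Wheeler compressor.

```python
import bz2
import lzma
import statistics
import zlib
from typing import Dict, List

# Python stdlib codecs standing in for the three algorithm families.
CODECS = {
    "lzma": lzma.compress,
    "deflate": zlib.compress,
    "bwt (bz2)": bz2.compress,
}


def chunk_ratios(messages: List[bytes], chunk_size: int) -> Dict[str, float]:
    """Mean compression ratio (original / compressed size) when every chunk of
    `chunk_size` consecutive messages is compressed on its own.
    An incomplete final chunk is ignored."""
    chunks = [
        b"".join(messages[i:i + chunk_size])
        for i in range(0, len(messages) - chunk_size + 1, chunk_size)
    ]
    return {
        name: statistics.mean(len(c) / len(compress(c)) for c in chunks)
        for name, compress in CODECS.items()
    }


def sweep(messages: List[bytes]) -> Dict[int, Dict[str, float]]:
    """Chunk sizes 1, 2, 4, ..., 1024 messages (only those that fit the data)."""
    return {
        2 ** k: chunk_ratios(messages, 2 ** k)
        for k in range(11)
        if 2 ** k <= len(messages)
    }
```

Note that each codec's fixed container overhead (for example, the xz headers written by `lzma.compress`) is counted in every chunk, so it weighs heavily for chunks of only a few messages.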

In addition to the compression ratio for different chunk sizes, we profiled the execution time of the individual steps of the compression algorithms. The results clearly show that finding the longest matching symbol sequences is the most expensive task for dictionary-based compression schemes.
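To illustrate why match finding dominates, here is a deliberately naive LZ77-style longest-match search: for every position, each candidate in the sliding window is compared byte by byte, so the cost grows with the window size times the match length. Real implementations speed this up with hash chains (zlib) or hash chains and binary trees (the LZMA match finders), but the search remains the core cost. The function name and the window and length limits are illustrative assumptions.

```python
from typing import Tuple


def longest_match(data: bytes, pos: int, window: int = 4096,
                  max_len: int = 258) -> Tuple[int, int]:
    """Brute-force search for the longest earlier occurrence of the bytes
    starting at `pos`; returns (distance, length), or (0, 0) if none."""
    best_dist, best_len = 0, 0
    for cand in range(max(0, pos - window), pos):
        length = 0
        # Matches may run past `pos` (overlap), as LZ77 allows.
        while (length < max_len and pos + length < len(data)
               and data[cand + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_dist, best_len = pos - cand, length
    return best_dist, best_len
```

For example, `longest_match(b"abcabcabcx", 3)` returns `(3, 6)`: the bytes from position 3 repeat those starting 3 bytes earlier, and the match overlaps the current position.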

Find the presentation with the details below:

Stream-Compression Implementation

Two-week task: March 7 until March 18.
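One possible design for the stream-compression step, sketched under stated assumptions: messages are buffered, and a compressed block is emitted as soon as the buffer reaches a size limit or its oldest message reaches `max_age`. The 0.5 s default leaves headroom within the 1 s timeliness budget for transmission; both limits, the class name `DeadlineCompressor`, and the choice of zlib are hypothetical. A single zlib stream is shared across blocks, and `Z_SYNC_FLUSH` makes each block decodable immediately while later blocks can still refer back to earlier data.

```python
import time
import zlib
from typing import Callable, List, Optional


class DeadlineCompressor:
    """Buffer messages and emit zlib-compressed blocks in time.

    A block is emitted when the buffered bytes reach `max_bytes` or the
    oldest buffered message has waited `max_age` seconds.
    """

    def __init__(self, max_age: float = 0.5, max_bytes: int = 4096,
                 clock: Callable[[], float] = time.monotonic):
        self.max_age = max_age
        self.max_bytes = max_bytes
        self.clock = clock
        self._comp = zlib.compressobj(9)  # one stream shared by all blocks
        self._buf: List[bytes] = []
        self._size = 0
        self._oldest = 0.0

    def push(self, msg: bytes) -> Optional[bytes]:
        """Add one message; return a compressed block if a flush is due."""
        if not self._buf:
            self._oldest = self.clock()  # arrival time of the oldest message
        self._buf.append(msg)
        self._size += len(msg)
        return self.poll()

    def poll(self) -> Optional[bytes]:
        """Call periodically (e.g. from a timer) to enforce the deadline."""
        if self._buf and (self._size >= self.max_bytes
                          or self.clock() - self._oldest >= self.max_age):
            return self.flush()
        return None

    def flush(self) -> bytes:
        """Compress the buffer; Z_SYNC_FLUSH byte-aligns the output so the
        receiver can decode everything sent so far."""
        data = b"".join(self._buf)
        self._buf, self._size = [], 0
        return self._comp.compress(data) + self._comp.flush(zlib.Z_SYNC_FLUSH)
```

The receiver feeds the blocks, in order, into one `zlib.decompressobj()`. Since later blocks depend on earlier ones, a lost block breaks decoding of all following blocks, so the trade-off between a shared dictionary and loss resilience has to be evaluated.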

PEDS Project Report

Use LaTeX and this template (DiscoReport.zip).

Intro and Problem Statement (2 pages)
Literature Research + Table of Compression Algorithms (2 pages)
Performance Evaluation of Classical Algorithms
1. Max Compression Ratio + plot (1 page)
2. Compression Ratio with Message Chunks + plot (1 page)
3. Profiling of Execution Time + plot (1 page)
Stream Compression System Design (2 pages)
Performance Evaluation of Stream Compression System + plots (3 pages)
Conclusions and Future Work (1 page)