vfzip

We had a deployment issue with Logship. The size of the deployment directory had ballooned with the addition of each new .net core service. At around 5GB, that size was so prohibatively large that build artifact movement between the build servers and artifact storage + build artifact download to production nodes was taking many minutes.

Logship build output

The Problem

.Net Core self-contained executables are big. The publish process packages up a lot of the framework, so that it is available for the executable at runtime. There are a couple of options here:

  • Assembly Trimming (SLOW)
  • Install the .Net Core Runtime (ANNOYING)

So what's the issue. As the number of microservices increases, the number of self-contained executables increases. We end up with the same libraries published into the execution directory of each microservice.

Zip file

First attempt at shrinking the size was to zip up the entire folder for deployment. This worked pretty well at the beginning, but zip performance was bad, and final file size was large. The Zip algorithem also doesn't deduplicate at the file level, just the blocks of content. In the end, the 5GB directory compresses down to about: 1.8GB.

Solution

The issue here is the number of duplicate, but idential, files. vfzip is designed to fix the duplicate file compression problem specifically, acheiving much higher compression ratios.

Process - Compression

  1. First Phase: We hash every file in the target directory, so we can make sure we only store each file one time.
  2. Second Phase: We compress unique files via the Deflate algorithm into a container, indexed by the SHA1 file hash, encoding the directory layout at the end.

Process - Decompression

  1. Inflate Phase: All unique files are extracted from the Deflated file onto disk.
  2. Rebuild Phase: The directory tree is reconstructed from the extracted files. There are two options here, either the extracted files can be copied into their target folders with their new names, or the directory tree can be recreated via symlinks, saving space.

vfzip

vfzip is able to reduce the size of our build artifacts from 5GB to 104MB. Since we only compress each file once, the vfzip container creation process is significantly faster than a zip for directories which contain duplicate files.

Source code

https://github.com/petersulucz/vfzip