Working around a bug in Microsoft.Bcl.Compression
Published on Thursday, September 12, 2013 1:00:00 PM UTC in Programming
A short while ago the .NET team released a portal library that supports compression streams as well as zip archives on a various set of platforms, including .NET, Windows Phone 8 and .NET for Windows Store apps. The library is named Microsoft.Bcl.Compression and if you're not familiar with it you can find more information about it in the official blog. I was eager to try it out, and today I finally came around doing so – and instantly hit a bug that I think is worth talking about so you don't have to spend hours working around it, like I did.
What I wanted to do
First of all let me give you some context. This may be important as the library uses different implementations based on the target platform. For example, if you use this on Windows Phone devices you have to target ARM, for other situations this has to be x86 however. Behind the scenes, depending on your selection different native dependencies are packaged, so you might get deviating results in different situations. For me, what I used was:
- Windows Phone 8
- In the emulator (i.e. target platform x86)
The actual code I was trying to use did not involve any magic. I simply wanted to take a string and compress it to a byte array using GZip. For the following sample code I left out all the unrelated additional stuff of the original implementation for clarity:
using (var targetStream = new MemoryStream())
using (var compressedStream = new GZipStream(targetStream, CompressionMode.Compress))
var textAsBytes = Encoding.UTF8.GetBytes(myText);
compressedStream.Write(textAsBytes, 0, textAsBytes.Length);
var compressedBytes = targetStream.ToArray();
// do something with the compressed bytes...
Simple as that.
By the way, this style of implementation is in line with e.g. the example in the official documentation on MSDN. The difference is that the official example works with a file stream whereas my code uses a memory stream.
What I got
To my surprise, this resulted in a 10 byte compressed byte array for my sample string. For a second I was amazed by the brutal compression ratio :). Until of course I realized that nothing had been compressed at all. The 10 bytes of data were only the gzip header information.
From there I spent almost two hours fiddling with the details, trying different alternatives to write to the stream in all sorts of ways I could think of, inspecting all involved objects in the debugger for minutes, googling the heck out of Bing – you get the idea.
Finally I realized that the library passes a value of "NoFlush" all the way down to the involved native ZLib implementation for each write you do. I had anticipated that and tried to explicitly flush the stream in my attempts multiple times, but some decompilation ninja skills (provided by the folks at JetBrains :)) revealed that the flush method does exactly nothing except throwing an exception when the stream is disposed – well thank you.
Digging a bit deeper I realized that there is some finalization logic for the underlying implementation that actually forces writing all pending bytes to the target stream, but only when you dispose the compression stream… yay?
The workaround to make this work is that you need to dispose the compression stream and at the same time tell it to not dispose the target stream (which is the default behavior). The latter would prevent you from actually using the now correct result because you wouldn't be able to access the disposed target stream's content anymore. Luckily, there's an overload for the constructor that lets you control exactly that behavior. This slight change in the code brings the intended results:
The constructor argument of "true" forces the compressed stream to not close the target stream when it is disposed, and of course the actual conversion to a byte array additionally needs to be moved outside of the using statement of the compression stream to complete this logic – done!
Again it looks simple, but as I mentioned it really took me a while to figure it out. I hope that this post will save you from falling into the same pit, and that it will also help me remember the issue whenever I use the library the next time.
Update: Initially I wanted to create a connect bug report for this, until I stumbled upon this existing entry. Apparently the behavior is similar on .NET 4.5 and a breaking change from 4.0. Unfortunately the team decided not to fix the issue, so you have to be really careful not to corrupt your data when using any of the involved libraries in this way :-/. The subtle problem here is that it works for example for typical usages of file streams, but not for the typical use of memory streams. This inconsistency, in my eyes, is quite problematic.