Compression

There are two distinct features that Windows Server 2008 outshines Linux on; and both are centric on compression.

For a very long time Microsoft has supported transparent compression as a part of NTFS; you can designate on a file-by-file or directory level what parts of the file system are compressed by the operating system (applications need do nothing to use compressed files).  This feature was probably originally intended to save the disk foot print of seldom used files; however, with the explosive growth in computing power what’s happened is that compressed files can often be read and decompressed much faster from a disk than a uncompressed file can.  Of course, if you’re modifying say a byte or two in the middle of a compressed file over and over, it might not be a good idea to mark it as compressed — but if you’re basically reading the file sequentially then compression may dramatically increase the overall performance of the system.

The reason for this increase is easy to understand; many files can be compressed ten to one (or better), that means each disk read is reading effectively ten times the information, and for a modern, multi-core, single-instruction/multiple-data capable processor to decompress this stream of data put no appreciable burden on the processing unit(s).

Recently, with SMBv2, Microsoft has expanded the file sharing protocol to be able to transport a compressed data stream, or even a differential data stream (Remote Differential Compression – RDC) rather than necessarily having to send every byte of the file.  This also has the effect of often greatly enhancing the effect data rate, since once again a modern, multi-core, single-instruction/multiple-data capable processor can compress (and decompress) a data stream at a much higher rate than most any network fabric can transmit the data (the exception would be 10G).  In cases of highly constrained networks, or networks with extremely high error rates the increase in effect through put could be staggering.

Unfortunately, Linux lags behind in both areas.

Ext4 does not include transparent compression; and currently no implementation of SMBv2 is available for Linux servers (or clients).

While there’s no question, what-so-ever, that the initial cost of a high performance server is less if Linux is chosen as the operating system, the “hidden” costs of lacking compression may make the total cost of ownership harder to determine.

Supporting transparent compression in a file system is merely a design criteria for a new file system (say Ext5 or Ext4.1); however, supporting SMBv2 will be much more difficult since (unlike SMBv1) it is a closed/proprietary file sharing protocol.

Originally posted 2010-07-11 02:00:49.