Wednesday, April 15, 2015

MongoDB v3.0 Brings Pluggable Storage Engine, and More!

MongoDB v3.0 brings a set of new features. The major ones include:
  • Pluggable Storage Engine
  • Document-level Locking
  • Compression
For a complete list of features, refer to MongoDB's release notes.

Pluggable Storage Engine

Prior to v3.0, MongoDB ran only on the MMAPv1 storage engine. After acquiring WiredTiger, MongoDB developed a pluggable storage engine API, which lets it run on different storage engines.

List of storage engines:

Storage Engine   Status           Developed By
MMAPv1           Supported        MongoDB
WiredTiger       Supported        MongoDB
In-Memory        In development   MongoDB
RocksDB          In development   RocksDB
InnoDB           In development   InnoDB
FusionIO         In development   FusionIO
HDFS             In development   Hadoop
...              ...              ...
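
As a minimal sketch of what choosing an engine looks like in practice, the helper below builds a mongod command line using the --storageEngine option. The function name, the dbpath and the port are placeholders made up for illustration, and only the two engines listed as supported above are accepted.

```python
# Engines listed as "Supported" in the table above.
SUPPORTED_ENGINES = ("mmapv1", "wiredTiger")


def mongod_command(engine, dbpath, port=27017):
    """Build the argument list for starting mongod on a given storage engine.

    dbpath and port are placeholder values, not recommendations.
    """
    if engine not in SUPPORTED_ENGINES:
        raise ValueError("unsupported storage engine: %r" % engine)
    return [
        "mongod",
        "--storageEngine", engine,
        "--dbpath", dbpath,
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(mongod_command("wiredTiger", "/data/wt")))
```

The list can be handed to subprocess.Popen as-is if you actually want to launch the server.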

Pluggable storage engines open up new possibilities for replica sets. Each member of a replica set can run on a different storage engine while sharing the same JSON data model. For example, different members of one replica set could run on:
  • WiredTiger for write-heavy workloads
  • In-Memory for extremely high throughput
  • HDFS for integration with a Hadoop cluster
  • FusionIO, backup engine, etc.
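
To make the mixed-engine idea concrete, here is a rough sketch that generates start-up commands for three members of one replica set, each with its own engine. The replica set name "rs0", the ports and the data paths are all hypothetical, and the sketch sticks to the two engines that are supported in v3.0, since the others are still in development.

```python
# Hypothetical layout: two WiredTiger members and one MMAPv1 member.
MEMBERS = [
    {"engine": "wiredTiger", "port": 27017, "dbpath": "/data/rs0-0"},
    {"engine": "wiredTiger", "port": 27018, "dbpath": "/data/rs0-1"},
    {"engine": "mmapv1", "port": 27019, "dbpath": "/data/rs0-2"},
]


def replica_set_commands(repl_set, members):
    """Return one mongod argument list per member of the replica set."""
    commands = []
    for member in members:
        commands.append([
            "mongod",
            "--replSet", repl_set,
            "--storageEngine", member["engine"],
            "--port", str(member["port"]),
            "--dbpath", member["dbpath"],
        ])
    return commands


if __name__ == "__main__":
    for command in replica_set_commands("rs0", MEMBERS):
        print(" ".join(command))
```

Every member gets the same --replSet name; only the --storageEngine value differs, which is the whole point of the pluggable API.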

Document-level Locking

MongoDB was notorious for locking at the database level for all write activity, so throughput suffered under write-heavy workloads. The usual workarounds were to distribute writes across multiple databases or across a sharded cluster. With the WiredTiger engine in v3.0, MongoDB can handle concurrent writes at the document level, which improves performance for write-heavy applications.

In addition, MongoDB v3.0 with the default MMAPv1 engine locks at the collection level, which is also an improvement over the previous database-level lock.
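
The effect of lock granularity can be shown with a toy model. This is not how MongoDB implements concurrency internally; it only counts how many writes could proceed at the same time if each write needed one exclusive lock at the given level. The database, collection and document names are invented for the example.

```python
def lock_key(write, granularity):
    """Return the resource a write must lock at the given granularity."""
    database, collection, doc_id = write
    if granularity == "database":
        return (database,)
    if granularity == "collection":
        return (database, collection)
    if granularity == "document":
        return (database, collection, doc_id)
    raise ValueError("unknown granularity: %r" % granularity)


def concurrent_writes(writes, granularity):
    """Writes holding different locks do not block each other,
    so the number of distinct locks is the achievable parallelism."""
    return len({lock_key(write, granularity) for write in writes})


# Four writes to one database, spread over two collections.
WRITES = [
    ("shop", "orders", 1),
    ("shop", "orders", 2),
    ("shop", "users", 1),
    ("shop", "users", 2),
]

if __name__ == "__main__":
    for level in ("database", "collection", "document"):
        print(level, concurrent_writes(WRITES, level))
    # database 1
    # collection 2
    # document 4
```

With the same four writes, the database-level lock serializes everything, the collection-level lock lets two writes run side by side, and document-level concurrency lets all four proceed.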

WiredTiger ships with the Btree algorithm as its default; the LSM algorithm is available as a configurable option.

  • Read-heavy use case: Btree > LSM
  • Write-heavy use case: LSM > Btree
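
One way to request LSM for a single collection is to pass a WiredTiger configuration string when the collection is created. The sketch below only builds the "create" command document; the helper name is made up, and treating "type=lsm" as the right setting for your MongoDB build is an assumption you should check against your version before relying on it.

```python
def create_collection_command(name, tree_type="btree"):
    """Build a 'create' command document for a WiredTiger collection.

    For "btree" (the default) no extra options are needed. For "lsm",
    the WiredTiger config string "type=lsm" is passed through the
    storageEngine option (assumed to be honored by the server).
    """
    if tree_type not in ("btree", "lsm"):
        raise ValueError("tree_type must be 'btree' or 'lsm'")
    command = {"create": name}
    if tree_type == "lsm":
        command["storageEngine"] = {
            "wiredTiger": {"configString": "type=lsm"}
        }
    return command


if __name__ == "__main__":
    print(create_collection_command("events", "lsm"))
```

The resulting document can be sent with a driver's generic run-command helper, for example pymongo's Database.command.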

Compression

Compression did not exist prior to v3.0. MongoDB v3.0 with the WiredTiger engine can compress data in two flavors: Snappy or Zlib.

  • Snappy - 70% compression ratio, low CPU overhead, default option
  • Zlib - 80% compression ratio, higher CPU overhead, non-default option

Zlib is suitable for archival purposes, as it compresses at a higher ratio at the cost of higher CPU overhead.

Snappy and Zlib compression apply to documents and the journal file, while indexes use prefix compression, which compresses them at a ~50% ratio.

Note: the compression ratio may vary depending on the use case; on average, a 70% compression ratio is observed.
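
The three compression settings described above map onto the storage section of the mongod YAML config file. Here is a small sketch that assembles that section and renders it as YAML text. The helper names are my own and the renderer is deliberately tiny (nested dicts, strings and booleans only), so treat it as an illustration rather than a general YAML writer.

```python
ALLOWED_COMPRESSORS = ("none", "snappy", "zlib")


def wiredtiger_storage_config(block_compressor="snappy",
                              journal_compressor="snappy",
                              prefix_compression=True):
    """Build the 'storage' section of a mongod config for WiredTiger."""
    for compressor in (block_compressor, journal_compressor):
        if compressor not in ALLOWED_COMPRESSORS:
            raise ValueError("unknown compressor: %r" % compressor)
    return {
        "storage": {
            "engine": "wiredTiger",
            "wiredTiger": {
                "engineConfig": {"journalCompressor": journal_compressor},
                "collectionConfig": {"blockCompressor": block_compressor},
                "indexConfig": {"prefixCompression": prefix_compression},
            },
        }
    }


def to_yaml_lines(node, indent=0):
    """Render nested dicts as indented YAML lines."""
    lines = []
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(" " * indent + key + ":")
            lines.extend(to_yaml_lines(value, indent + 2))
        else:
            if isinstance(value, bool):
                value = "true" if value else "false"
            lines.append(" " * indent + "%s: %s" % (key, value))
    return lines


if __name__ == "__main__":
    # Archival-style setup: zlib for documents, snappy for the journal.
    config = wiredtiger_storage_config(block_compressor="zlib")
    print("\n".join(to_yaml_lines(config)))
```

Calling wiredtiger_storage_config() with no arguments reproduces the defaults described above: Snappy for documents and the journal, prefix compression for indexes.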

Here's a comparison of data size across different storage configurations:

Test load: 1 collection, 500,000 docs, 20KB/doc
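
For a sense of scale, the back-of-the-envelope calculation below works out the raw size of this test load and what the nominal ratios quoted earlier would predict. It assumes that "compression ratio" means the fraction of space saved and that 1 GB = 1,000,000 KB; the numbers it prints are estimates from those nominal ratios, not measurements.

```python
DOCS = 500000
DOC_SIZE_KB = 20

# Nominal ratios quoted earlier in this post (fraction of space saved).
NOMINAL_RATIOS = {"snappy": 0.70, "zlib": 0.80}


def raw_size_gb(docs, kb_per_doc):
    """Uncompressed data size in GB, using 1 GB = 1,000,000 KB."""
    return docs * kb_per_doc / 1000000.0


def estimated_size_gb(raw_gb, ratio):
    """Size left on disk if 'ratio' of the raw size is saved."""
    return raw_gb * (1.0 - ratio)


if __name__ == "__main__":
    raw = raw_size_gb(DOCS, DOC_SIZE_KB)
    print("raw: %.1f GB" % raw)
    for name in sorted(NOMINAL_RATIOS):
        estimate = estimated_size_gb(raw, NOMINAL_RATIOS[name])
        print("%s: about %.1f GB" % (name, estimate))
```

So the test load is roughly 10 GB of raw documents, and the nominal ratios would put Snappy near 3 GB and Zlib near 2 GB before any measurement is taken.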

Compared to MMAPv1, which has no compression option, WiredTiger with snappy or zlib compression does a good job, compressing the data at a ratio of about 84%. But does compression affect performance? See the next post, where we'll benchmark MongoDB with these storage configurations.