Journey with MongoDB: April 2015

MongoDB v3.0 brings a set of new features. The major additions include:

Pluggable Storage Engine
Document-level Locking
Compression

For a complete list of features, refer to MongoDB's release notes.

Pluggable Storage Engine

Prior to v3.0, MongoDB ran only on the MMAPv1 storage engine. Since acquiring WiredTiger, MongoDB has developed a pluggable storage engine API, which enables it to run on different storage engines.

List of storage engines:

Storage Engines	Status	Developed By
MMAPv1	Supported	MongoDB
WiredTiger	Supported	MongoDB
In-Memory	In development	MongoDB
RocksDB	In development	RocksDB
InnoDB	In development	InnoDB
FusionIO	In development	FusionIO
HDFS	In development	Hadoop
...	...	...

Pluggable storage engines open up new possibilities in replica set distribution. Each member of a replica set can run on a different storage engine while sharing the same JSON data model. In an example replica set, different members can run on:

WiredTiger for write-heavy workloads
In-memory for extremely high throughput
HDFS for integration with a Hadoop cluster
FusionIO, backup engine, etc.
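As a rough illustration of a mixed-engine replica set, the sketch below builds one `mongod` command line per member using the `--storageEngine` option. Only the two engines listed as supported above are used. The replica set name `rs0`, the ports, and the data paths are made-up values for the example, not settings from this post.

```python
def mongod_command(engine, port, dbpath, repl_set="rs0"):
    """Build the argument list for one replica set member.

    engine must be one of the engines supported in v3.0.
    repl_set, port and dbpath are hypothetical example values.
    """
    if engine not in ("mmapv1", "wiredTiger"):
        raise ValueError("unsupported storage engine: %s" % engine)
    return [
        "mongod",
        "--replSet", repl_set,
        "--port", str(port),
        "--dbpath", dbpath,
        "--storageEngine", engine,
    ]


# Hypothetical layout: two WiredTiger members and one MMAPv1 member.
members = [
    ("wiredTiger", 27017, "/data/rs0-0"),
    ("wiredTiger", 27018, "/data/rs0-1"),
    ("mmapv1", 27019, "/data/rs0-2"),
]

commands = [mongod_command(engine, port, path) for engine, port, path in members]

for cmd in commands:
    print(" ".join(cmd))
```

Each member needs its own data directory, since the on-disk format differs between engines.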

Document-level Locking

MongoDB was notorious for locking at the database level for all write activity, so throughput suffered under write-heavy workloads. The usual workarounds were to distribute writes across multiple databases or across a sharded cluster. With the WiredTiger engine in v3.0, MongoDB is able to lock at the document level, which improves write-heavy applications.

In addition, MongoDB v3.0 with the default MMAPv1 engine is able to lock at the collection level, which is also an improvement over the previous database-level lock.
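To make the difference between the three lock granularities concrete, here is a toy model in Python. It is not how MongoDB implements locking. It only counts how many concurrent writes in a batch would contend for the same lock at each level. The database, collection, and document names are invented for the example.

```python
from collections import Counter


def lock_key(write, granularity):
    """Return the resource a write would lock at the given granularity."""
    db, coll, doc_id = write
    if granularity == "database":
        return (db,)
    if granularity == "collection":
        return (db, coll)
    if granularity == "document":
        return (db, coll, doc_id)
    raise ValueError("unknown granularity: %s" % granularity)


def contended_writes(writes, granularity):
    """Count writes that share a lock with an earlier write in the batch."""
    counts = Counter(lock_key(w, granularity) for w in writes)
    return sum(n - 1 for n in counts.values())


# Hypothetical batch of concurrent writes: (database, collection, document id).
writes = [
    ("shop", "orders", 1),
    ("shop", "orders", 2),
    ("shop", "users", 1),
    ("shop", "orders", 1),
]

for level in ("database", "collection", "document"):
    print(level, contended_writes(writes, level))
```

In this batch, contention shrinks as the lock gets finer: only the two writes to the same document still collide at document level.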

WiredTiger ships with the Btree algorithm by default; the LSM algorithm is available as a configurable option.

Read heavy use case: Btree > LSM
Write heavy use case: LSM > Btree
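One way to pick the tree type per collection is sketched below. It assumes the WiredTiger setting `type=lsm` is passed through the `storageEngine.wiredTiger.configString` option when the collection is created. The helper name `collection_options` is hypothetical, and the post itself does not show how the option is configured.

```python
def collection_options(tree_type="btree"):
    """Build create-collection options for the chosen tree type.

    Assumption: WiredTiger reads "type=lsm" from the collection's
    configString. Btree is the default, so it needs no options.
    """
    if tree_type not in ("btree", "lsm"):
        raise ValueError("unknown tree type: %s" % tree_type)
    if tree_type == "btree":
        return {}
    return {"storageEngine": {"wiredTiger": {"configString": "type=lsm"}}}


# Read-heavy collection keeps the default; write-heavy collection asks for LSM.
read_heavy = collection_options("btree")
write_heavy = collection_options("lsm")
print(read_heavy)
print(write_heavy)
```

The resulting dictionary is meant to be handed to the create-collection call of whichever driver or shell is in use.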

Compression

Compression did not exist prior to v3.0. MongoDB v3.0 with the WiredTiger engine can compress data in two flavors: Snappy or Zlib.

Snappy - 70% compression ratio, low CPU overhead, default option
Zlib - 80% compression ratio, higher CPU overhead, non-default option

Zlib is suitable for archival purposes, as it trades higher CPU overhead for a higher compression ratio.

Snappy and Zlib compression work on documents and the journal file, while indexes use prefix compression, which compresses them at a ~50% ratio.

Note: the compression ratio varies depending on the use case; on average, a 70% compression ratio is observed.
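The compression choices above map onto `mongod` startup options. The sketch below assembles them in Python. The helper name and its defaults are my own, while the flags are the WiredTiger options that ship with v3.0.

```python
def wiredtiger_compression_args(block_compressor="snappy",
                                journal_compressor="snappy",
                                index_prefix_compression=True):
    """Build mongod arguments selecting WiredTiger compression settings."""
    allowed = ("none", "snappy", "zlib")
    for name in (block_compressor, journal_compressor):
        if name not in allowed:
            raise ValueError("unknown compressor: %s" % name)
    return [
        "--storageEngine", "wiredTiger",
        # Compression for collection data (documents).
        "--wiredTigerCollectionBlockCompressor", block_compressor,
        # Compression for the journal file.
        "--wiredTigerJournalCompressor", journal_compressor,
        # Prefix compression for indexes.
        "--wiredTigerIndexPrefixCompression",
        "true" if index_prefix_compression else "false",
    ]


# Default (Snappy) setup versus an archival setup that uses Zlib for documents.
default_args = wiredtiger_compression_args()
archival_args = wiredtiger_compression_args(block_compressor="zlib")
print("mongod " + " ".join(archival_args))
```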

Here's a comparison of data size across different storage configurations:

Test load: 1 collection, 500,000 docs, 20KB/doc
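For a sense of scale, the arithmetic behind this test load can be sketched as follows. It assumes 1 KB = 1024 bytes and reads "compression ratio" as the percentage of space saved. Both are my assumptions, and the estimates use the nominal Snappy and Zlib ratios quoted earlier, not measured results.

```python
def raw_size_bytes(doc_count, doc_size_kb):
    """Uncompressed data size, assuming 1 KB = 1024 bytes."""
    return doc_count * doc_size_kb * 1024


def compressed_size_bytes(raw_bytes, saved_percent):
    """Size left after saving saved_percent of the space (integer math)."""
    return raw_bytes * (100 - saved_percent) // 100


GIB = 1024 ** 3

# Test load from the post: 500,000 documents at 20 KB each.
raw = raw_size_bytes(500_000, 20)
print("raw: %.2f GiB" % (raw / GIB))

# Nominal ratios quoted earlier in the post (assumed to mean space saved).
for name, saved in (("Snappy", 70), ("Zlib", 80)):
    size = compressed_size_bytes(raw, saved)
    print("%s: %.2f GiB" % (name, size / GIB))
```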

Compared to MMAPv1, which has no compression option, WiredTiger with Snappy or Zlib compression does a good job, compressing the data at a ratio of about 84%. However, my question is: does compression affect performance? The next post will benchmark MongoDB with these storage configurations.

Wednesday, April 15, 2015

MongoDB v3.0 Brings Pluggable Storage Engine, and More!