Craigslist uses MongoDB for archiving purposes
- Data in the archive is accessed differently than data in production
- Updating the schema in the archive takes a month, not including slaves.
- MySQL concepts carry over to MongoDB (see the sketch after these bullets)
- Indexes
- Master-slave replication
- Binary log = oplog
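The mapping is direct enough to show in a short pymongo sketch; the `archive.posts` namespace here is a hypothetical stand-in for the migrated data:

```python
from pymongo import MongoClient, ASCENDING

# A minimal sketch, assuming a local mongod and a hypothetical
# archive.posts collection migrated from MySQL.
client = MongoClient("mongodb://localhost:27017")
posts = client["archive"]["posts"]

# The equivalent of MySQL's CREATE INDEX: a secondary index on postID.
posts.create_index([("postID", ASCENDING)])

# The oplog plays the role of MySQL's binary log; on a replica set
# member it can be read from the local database.
oplog = client["local"]["oplog.rs"]
print(oplog.find_one(sort=[("$natural", -1)]))  # most recent entry
```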
- Shard key selection is easy due to the unique postID
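A minimal sketch of sharding on that key, again using the hypothetical `archive.posts` namespace (run against a mongos, not a plain mongod):

```python
from pymongo import MongoClient

# A minimal sketch; mongos-host and archive.posts are hypothetical.
client = MongoClient("mongodb://mongos-host:27017")

client.admin.command("enableSharding", "archive")
client.admin.command("shardCollection", "archive.posts",
                     key={"postID": 1})
```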
- Data stored on spinning disks takes a long time to access, especially once it grows larger than what fits in RAM.
- It is wise to test on the same machine spec you will be deploying on.
- Automatic failover in replica sets works great. Instead of manually going into the MySQL database and resetting configuration files, system admins can simply watch MongoDB elect a new primary.
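A sketch of what that looks like from the driver side, assuming a hypothetical three-member replica set named `rs0`:

```python
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

# A minimal sketch; the hostnames and replica set name are made up.
# The driver follows the election and routes writes to whichever
# member becomes the new primary.
client = MongoClient(
    "mongodb://db1:27017,db2:27017,db3:27017/?replicaSet=rs0")

try:
    client["archive"]["posts"].insert_one({"postID": 12345})
except AutoReconnect:
    # Raised mid-election; retrying shortly after usually succeeds.
    pass
```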
- Know your data
- Migrating from a relational model to a document model might cause sizing issues. What happens when data is larger than you think? There are always outliers (first sketch below).
- String vs. integer: MongoDB is sensitive to the stored data type for indexing purposes (second sketch below).
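On sizing, a minimal sketch of catching outliers before they hit MongoDB's 16 MB BSON document limit; the document below is a hypothetical migrated row:

```python
import bson  # installed with pymongo

# A minimal sketch of guarding a migration against outlier rows.
MAX_BSON = 16 * 1024 * 1024  # MongoDB's per-document limit

doc = {"postID": 12345, "body": "...", "replies": []}
size = len(bson.encode(doc))
if size > MAX_BSON:
    print(f"outlier: {size} bytes, over the 16 MB document limit")
```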
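And on types, a sketch of the string-vs-integer pitfall; an index on `postID` won't help a query whose value has the wrong BSON type:

```python
from pymongo import MongoClient

# A minimal sketch, assuming a local mongod and the hypothetical
# archive.posts collection.
client = MongoClient("mongodb://localhost:27017")
posts = client["archive"]["posts"]

posts.insert_one({"postID": "12345"})       # stored as a string

print(posts.find_one({"postID": 12345}))    # None: int never matches string
print(posts.find_one({"postID": "12345"}))  # matches
```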
- The balancer can be your friend, but also your enemy
- Insert rate can drop by 40x if too much I/O is going on
- Turn off the balancer and use pre-splitting if possible (sketch below).
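A sketch of both steps via pymongo, with the hypothetical `archive.posts` collection already sharded on `postID` (the `balancerStop` command needs MongoDB 3.4+; older versions flip the flag in `config.settings` instead):

```python
from pymongo import MongoClient

# A minimal sketch, run against a mongos; hostname is hypothetical.
client = MongoClient("mongodb://mongos-host:27017")

# Stop the balancer so chunk migrations don't compete with the load.
client.admin.command("balancerStop")

# Pre-split on postID boundaries so a bulk load spreads across shards
# without triggering migrations mid-insert. The boundaries here are
# made-up round numbers.
for boundary in range(1_000_000, 10_000_000, 1_000_000):
    client.admin.command("split", "archive.posts",
                         middle={"postID": boundary})
```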
- If a slave is down too long and can't catch up using the oplog, it has to resync with the master and copy all the data over. The most painful part is the index rebuild, which can take days.
- Possible solution: a larger oplog? (sketch below)
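On MongoDB 3.6+ with WiredTiger the oplog can be resized live; older versions set it at mongod startup with `--oplogSize`. A sketch of the live resize, run on each replica set member:

```python
from pymongo import MongoClient

# A minimal sketch, assuming a replica set member on localhost.
client = MongoClient("mongodb://localhost:27017")

# Grow the oplog to 16 GB; the size is given in megabytes.
client.admin.command({"replSetResizeOplog": 1, "size": 16384.0})
```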