Tuesday, August 27, 2013

Migration, notes from Craigslist


Craigslist uses MongoDB for archiving purposes
  • Data in the archive is accessed differently than data in production
    • Updating the schema in the archive takes a month, not including slaves.
  • MySQL concepts carry over to MongoDB
    • Indexes
    • Master - Slave
    • Binary log = oplog
  • Shard key selection is easy due to unique postID
  • Data stored on spinning disks takes a long time to access, especially once it grows larger than what fits in RAM.
  • It is wise to test on the same machine spec you will be deploying on.
  • Automatic failover in replica sets works great. Instead of manually going into the MySQL database and resetting configuration files, sysadmins can simply watch MongoDB elect a new primary (a small status-check sketch follows these notes).
  • Know your data
    • Migrating from a relational model to a document model might cause sizing issues. What happens when data is larger than you think? There are always outliers.
    • String vs. integer: MongoDB is sensitive to the data type stored, which matters for indexing and querying (illustrated in a sketch below).
  • The balancer can be your friend, but also your enemy
    • Insert rate can drop by 40x if too much I/O is going on
    • Turn off the balancer and use pre-splitting if possible (see the pre-splitting sketch after these notes).
  • If a slave is down too long and can't catch up using the oplog, it needs to resync with the master and copy all the data over. The most painful part is that the index rebuild might take days.
    • Solution: a larger oplog? (A sketch for measuring the oplog window follows these notes.)
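
A minimal sketch of the "watch the election" idea above, using pymongo; the host name is an assumption, not something from the talk. The replSetGetStatus command reports every member's state, so a failover shows up as one member flipping to PRIMARY without anyone touching configuration files by hand.

    from pymongo import MongoClient

    # Hypothetical host name; any member of the replica set will answer.
    member = MongoClient("mongodb://replica-member:27017")

    # replSetGetStatus reports the state of every member, so an election
    # is visible as a new member showing up as PRIMARY.
    status = member.admin.command("replSetGetStatus")
    for m in status["members"]:
        print(m["name"], m["stateStr"])   # PRIMARY / SECONDARY / RECOVERING / ...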
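To make the string-vs-integer point concrete, here is a small pymongo sketch (the database, collection, and field names are hypothetical): the same digits stored as a string and as an integer are different BSON values, so they match different queries and sort into different index ranges.

    from pymongo import MongoClient

    posts = MongoClient().archive.posts      # hypothetical database/collection

    posts.insert_one({"postID": "12345"})    # digits stored as a string
    posts.insert_one({"postID": 12345})      # the same digits stored as an integer

    # Each query matches only the document whose BSON type agrees with it.
    print(posts.count_documents({"postID": 12345}))     # -> 1
    print(posts.count_documents({"postID": "12345"}))   # -> 1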
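A hedged sketch of the pre-splitting recipe mentioned above; the "archive.posts" namespace, the postID boundaries, and the mongos host are assumptions for illustration only. The idea is to create empty chunks up front and keep the balancer quiet during the bulk load, so inserts are not competing with chunk migrations for I/O.

    from pymongo import MongoClient

    mongos = MongoClient("mongodb://mongos-host:27017")   # hypothetical mongos

    # Shard the archive collection on the unique postID.
    mongos.admin.command("enableSharding", "archive")
    mongos.admin.command("shardCollection", "archive.posts", key={"postID": 1})

    # Pre-split the key range into empty chunks; empty chunks are cheap to
    # move, so they can be spread across shards before the data arrives.
    for boundary in range(100_000_000, 1_000_000_000, 100_000_000):
        mongos.admin.command("split", "archive.posts", middle={"postID": boundary})

    # Keep the balancer quiet during the bulk load (newer servers expose
    # balancerStop/balancerStart admin commands for the same purpose).
    mongos.config.settings.update_one(
        {"_id": "balancer"}, {"$set": {"stopped": True}}, upsert=True
    )
    # ... run the bulk insert here ...
    mongos.config.settings.update_one(
        {"_id": "balancer"}, {"$set": {"stopped": False}}, upsert=True
    )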
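On the oplog point: the oplog is a capped collection in the local database, so comparing its first and last entries shows how long a slave can be down before it falls off the end and needs a full resync. A small pymongo sketch with a hypothetical host; growing the window comes down to the mongod --oplogSize setting.

    from pymongo import MongoClient

    member = MongoClient("mongodb://replica-member:27017")   # hypothetical host

    # The gap between the oldest and newest oplog timestamps is the window
    # a lagging slave has to catch up in before it must resync (and rebuild
    # its indexes) from scratch.
    oplog = member.local["oplog.rs"]
    first = oplog.find().sort("$natural", 1).limit(1).next()
    last = oplog.find().sort("$natural", -1).limit(1).next()
    window = last["ts"].time - first["ts"].time
    print("oplog window: %.1f hours" % (window / 3600.0))
    # A bigger window comes from a bigger oplog, e.g. mongod --oplogSize <MB>.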

