Tuesday, March 3, 2009

Numbers Everyone Should Know

Google AppEngine Numbers

This group of numbers is from Brett Slatkin in Building Scalable Web Apps with Google App Engine.

Writes are expensive!

  • Datastore is transactional: writes require disk access
  • Disk access means disk seeks
  • Rule of thumb: 10ms for a disk seek
  • Simple math: 1s / 10ms = 100 seeks/sec maximum
  • Depends on:
    * The size and shape of your data
    * Doing work in batches (batch puts and gets)

    Reads are cheap!

  • Reads do not need to be transactional, just consistent
  • Data is read from disk once, then it's easily cached
  • All subsequent reads come straight from memory
  • Rule of thumb: 250usec for 1MB of data from memory
  • Simple math: 1s / 250usec = 4GB/sec maximum
    * For a 1MB entity, that's 4000 fetches/sec

    Numbers Miscellaneous

    This group of numbers is from a presentation Jeff Dean gave at a Engineering All-Hands Meeting at Google.

  • L1 cache reference 0.5 ns
  • Branch mispredict 5 ns
  • L2 cache reference 7 ns
  • Mutex lock/unlock 100 ns
  • Main memory reference 100 ns
  • Compress 1K bytes with Zippy 10,000 ns
  • Send 2K bytes over 1 Gbps network 20,000 ns
  • Read 1 MB sequentially from memory 250,000 ns
  • Round trip within same datacenter 500,000 ns
  • Disk seek 10,000,000 ns
  • Read 1 MB sequentially from network 10,000,000 ns
  • Read 1 MB sequentially from disk 30,000,000 ns
  • Send packet CA->Netherlands->CA 150,000,000 ns

    The Lessons

  • Writes are 40 times more expensive than reads.
  • Global shared data is expensive. This is a fundamental limitation of distributed systems. The lock contention in shared heavily written objects kills performance as transactions become serialized and slow.
  • Architect for scaling writes.
  • Optimize for low write contention.
  • Optimize wide. Make writes as parallel as you can.


  • No comments: