From Google Pro Tips: Numbers Everyone Should Know
To evaluate design alternatives you first need a good sense of how long typical operations will take. Dr. Dean gives this list:
- L1 cache reference 0.5 ns
- Branch mispredict 5 ns
- L2 cache reference 7 ns
- Mutex lock/unlock 100 ns
- Main memory reference 100 ns
- Compress 1K bytes with Zippy 10,000 ns
- Send 2K bytes over 1 Gbps network 20,000 ns
- Read 1 MB sequentially from memory 250,000 ns
- Round trip within same datacenter 500,000 ns
- Disk seek 10,000,000 ns
- Read 1 MB sequentially from network 10,000,000 ns
- Read 1 MB sequentially from disk 30,000,000 ns
- Send packet CA->Netherlands->CA 150,000,000 ns
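To make the list easier to compare, here is a minimal sketch that stores the numbers above in a table and computes ratios between them. The names `LATENCY_NS`, `humanize`, and `ratio` are made up for this illustration; the values are copied directly from the list.

```python
# Latencies from the list above, in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 100,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 10_000,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from network": 10_000_000,
    "Read 1 MB sequentially from disk": 30_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}


def humanize(ns):
    """Format a nanosecond count in ms, us, or ns, whichever fits best."""
    for unit, scale in (("ms", 1_000_000), ("us", 1_000)):
        if ns >= scale:
            return f"{ns / scale:g} {unit}"
    return f"{ns:g} ns"


def ratio(slow, fast):
    """How many times longer `slow` takes than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


if __name__ == "__main__":
    for name, ns in LATENCY_NS.items():
        print(f"{name:<40} {humanize(ns):>10}")
    print("Disk seek vs. main memory reference:",
          f"{ratio('Disk seek', 'Main memory reference'):,.0f}x")
```

By these numbers a disk seek costs as much as 100,000 main memory references, and a main memory reference costs as much as 200 L1 cache references.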
Some things to notice:
- The options differ in performance by orders of magnitude.
- Datacenters are far away so it takes a long time to send anything between them.
- Memory is fast and disks are slow.
- A cheap compression algorithm can save a lot of network bandwidth (a factor of 2).
- Writes are 40 times more expensive than reads.
- Global shared data is expensive. This is a fundamental limitation of distributed systems. Lock contention on shared, heavily written objects kills performance as transactions become serialized and slow.
- Architect for scaling writes.
- Optimize for low write contention.
- Optimize wide. Make writes as parallel as you can.
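The compression point can be checked against the list with a little arithmetic. The sketch below is a rough estimate under assumptions that are not from the talk: costs scale linearly with size, the 2x compression ratio from the bullet above holds for the payload, and decompression cost (which the list does not give) is ignored. The function name `send_cost` is hypothetical.

```python
# Per-kilobyte costs derived from the list (linear scaling is an assumption).
COMPRESS_NS_PER_KB = 10_000 / 1  # "Compress 1K bytes with Zippy 10,000 ns"
SEND_NS_PER_KB = 20_000 / 2      # "Send 2K bytes over 1 Gbps network 20,000 ns"
ASSUMED_RATIO = 2.0              # "factor of 2" from the notes; real ratios vary


def send_cost(kb, compress):
    """Return (cpu_ns, wire_ns) for sending `kb` kilobytes."""
    if not compress:
        return 0.0, kb * SEND_NS_PER_KB
    cpu_ns = kb * COMPRESS_NS_PER_KB
    wire_ns = (kb / ASSUMED_RATIO) * SEND_NS_PER_KB
    return cpu_ns, wire_ns


if __name__ == "__main__":
    print("2K raw:       ", send_cost(2, compress=False))
    print("2K compressed:", send_cost(2, compress=True))
```

Under these assumptions, compressing a 2K payload spends 20,000 ns of CPU time to cut its time on the wire from 20,000 ns to 10,000 ns. The saving is in network bandwidth, not necessarily in end-to-end latency for a single message.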
Example: Generate Image Results Page Of 30 Thumbnails
This is the example given in the video. Two design alternatives serve as thought experiments.
Design 1 - Serial
- Read images serially. Do a disk seek. Read a 256K image and then go on to the next image.
- Performance: 30 seeks * 10 ms/seek + 30 * 256K / 30 MB/s = 560 ms
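The serial estimate can be reproduced with a few lines of arithmetic. This is a sketch, assuming 1 MB = 1000 KB (the post does not say which convention it uses); the constant and function names are made up for illustration.

```python
SEEK_MS = 10.0        # "Disk seek 10,000,000 ns"
DISK_MB_PER_S = 30.0  # 1 MB in 30 ms, i.e. roughly 30 MB/s sequential
IMAGE_KB = 256
NUM_IMAGES = 30
KB_PER_MB = 1000      # assumption; using 1024 gives 550 ms instead


def serial_ms():
    """Total time when each image needs its own seek and reads happen one by one."""
    seek_total = NUM_IMAGES * SEEK_MS
    transfer_total = NUM_IMAGES * IMAGE_KB / KB_PER_MB / DISK_MB_PER_S * 1000
    return seek_total + transfer_total


if __name__ == "__main__":
    print(f"serial: {serial_ms():.0f} ms")
```

The result is about 556 ms, which the post rounds to 560 ms. Seeks account for 300 ms of it and data transfer for the remaining 256 ms.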
Design 2 - Parallel
- Issue reads in parallel.
- Performance: 10 ms/seek + 256K read / 30 MB/s = 18 ms
- There will be variance from the disk reads, so the more likely time is 30-60 ms
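The parallel estimate follows from the fact that a batch of parallel reads finishes when its slowest read finishes. The sketch below assumes the 30 reads do not contend with each other (for example, each image sits on a different disk) and that 1 MB = 1000 KB. The names are hypothetical, and the 45 ms straggler is an invented value used only to show the effect of variance.

```python
SEEK_MS = 10.0        # "Disk seek 10,000,000 ns"
DISK_MB_PER_S = 30.0  # 1 MB in 30 ms, i.e. roughly 30 MB/s sequential
IMAGE_KB = 256
NUM_IMAGES = 30
KB_PER_MB = 1000      # assumption; the post does not say 1000 or 1024


def one_read_ms():
    """One seek plus the transfer time for a single image."""
    transfer_ms = IMAGE_KB / KB_PER_MB / DISK_MB_PER_S * 1000
    return SEEK_MS + transfer_ms


def parallel_ms(per_read_ms):
    """Parallel reads complete when the slowest one completes."""
    return max(per_read_ms)


if __name__ == "__main__":
    ideal = parallel_ms([one_read_ms()] * NUM_IMAGES)
    print(f"ideal parallel: {ideal:.1f} ms")

    # One slow read (invented 45 ms) sets the time for the whole page.
    with_straggler = [one_read_ms()] * (NUM_IMAGES - 1) + [45.0]
    print(f"with one straggler: {parallel_ms(with_straggler):.1f} ms")
```

The ideal figure comes out to about 18.5 ms. Because the page waits for the slowest of 30 reads, a single slow disk is enough to push the total into the 30-60 ms range mentioned above.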