Java frameworks and tools overview: 2013

Friday, January 25, 2013

Numbers Everyone Should Know

From Google Pro Tips: Numbers Everyone Should Know

To evaluate design alternatives, you first need a good sense of how long typical operations take. Dr. Jeff Dean gives this list:

L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns
Mutex lock/unlock 100 ns
Main memory reference 100 ns
Compress 1K bytes with Zippy 10,000 ns
Send 2K bytes over 1 Gbps network 20,000 ns
Read 1 MB sequentially from memory 250,000 ns
Round trip within same datacenter 500,000 ns
Disk seek 10,000,000 ns
Read 1 MB sequentially from network 10,000,000 ns
Read 1 MB sequentially from disk 30,000,000 ns
Send packet CA->Netherlands->CA 150,000,000 ns
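
To get a feel for the gaps between these numbers, it can help to put them in a small lookup table and divide one by another. The sketch below copies the values from the list above; the `ratio` helper and the choice of pairs are only for illustration.

```python
# Latencies in nanoseconds, copied from the list above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 100,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 10_000,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from network": 10_000_000,
    "Read 1 MB sequentially from disk": 30_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}


def ratio(slower, faster):
    """How many times longer `slower` takes than `faster` (illustrative helper)."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


# A few pairs picked for illustration; any two entries can be compared.
PAIRS = [
    ("Disk seek", "Main memory reference"),
    ("Read 1 MB sequentially from disk", "Read 1 MB sequentially from memory"),
    ("Send packet CA->Netherlands->CA", "Round trip within same datacenter"),
]

for slower, faster in PAIRS:
    print(f"{slower} = {ratio(slower, faster):,.0f} x {faster}")
```

With these values, one disk seek costs as much as 100,000 main memory references, and a 1 MB sequential read is 120 times slower from disk than from memory.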

Some things to notice:

The options differ in performance by orders of magnitude.
Datacenters are far away so it takes a long time to send anything between them.
Memory is fast and disks are slow.
A cheap compression algorithm can save a lot of network bandwidth (about a factor of 2).
Writes are 40 times more expensive than reads.
Global shared data is expensive. This is a fundamental limitation of distributed systems. Lock contention on shared, heavily written objects kills performance as transactions become serialized and slow.
Architect for scaling writes.
Optimize for low write contention.
Optimize wide. Make writes as parallel as you can.
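
One common way to lower write contention is to split a single hot object into several independently locked shards, so that writers rarely wait on each other. The sketch below is not from the talk; it is a minimal illustration, and the `ShardedCounter` class, its `key` parameter, and the shard count are all hypothetical.

```python
import threading


class ShardedCounter:
    """Counter split into shards so writers contend on different locks.

    Hypothetical illustration: writes touch one shard, reads sum all shards.
    """

    def __init__(self, shards=8):
        self._locks = [threading.Lock() for _ in range(shards)]
        self._counts = [0] * shards

    def increment(self, key, amount=1):
        # `key` (for example a worker id) picks the shard, so different
        # writers usually take different locks.
        i = hash(key) % len(self._locks)
        with self._locks[i]:
            self._counts[i] += amount

    def value(self):
        # Reads are the expensive side: visit every shard in turn.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._counts[i]
        return total


def run_workers(counter, workers=8, increments=1000):
    """Start `workers` threads that each increment the counter `increments` times."""
    def work(worker_id):
        for _ in range(increments):
            counter.increment(key=worker_id)

    threads = [threading.Thread(target=work, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value()


print(run_workers(ShardedCounter(shards=4)))
```

The trade-off is deliberate: writes become cheap and parallel, while a read has to gather every shard. In CPython the global interpreter lock limits true parallelism, so this shows the structure rather than a measurable speedup.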

Example: Generate Image Results Page Of 30 Thumbnails

This is the example given in the video. Two design alternatives serve as thought experiments.

Design 1 - Serial

Read images serially. Do a disk seek. Read a 256K image and then go on to the next image.
Performance: 30 seeks * 10 ms/seek + 30 * 256K / 30 MB/s = 560 ms
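
The serial estimate can be checked with a few lines of arithmetic. This sketch uses the seek time and disk rate from the formula above and assumes 1 MB = 1000 KB; the function names are made up for the example.

```python
SEEK_MS = 10.0         # one disk seek, as in the formula above
DISK_MB_PER_S = 30.0   # sequential disk read rate used in the formula above
IMAGE_KB = 256.0       # size of one thumbnail source image
IMAGES = 30            # thumbnails on the results page


def read_ms(kilobytes, mb_per_s=DISK_MB_PER_S):
    """Transfer time in milliseconds, assuming 1 MB = 1000 KB."""
    seconds = kilobytes / (mb_per_s * 1000.0)
    return seconds * 1000.0


def serial_page_ms(images=IMAGES, image_kb=IMAGE_KB):
    """Each image costs one seek plus one read, and the images are read one after another."""
    return images * (SEEK_MS + read_ms(image_kb))


print(f"serial: {serial_page_ms():.0f} ms")
```

The result is about 556 ms, which rounds to the 560 ms quoted above. Seeks account for 300 ms of it.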

Design 2 - Parallel

Issue reads in parallel.
Performance: 10 ms/seek + 256K read / 30 MB/s = 18ms
There will be variance from the disk reads, so a more likely time is 30-60 ms.
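
Why does variance matter so much here? With parallel reads, the page is ready only when the slowest of the 30 reads finishes. The sketch below makes that visible under loud assumptions: every image sits on its own disk, each seek time is drawn uniformly between 0 and 20 ms (mean 10 ms), and 1 MB = 1000 KB. The distribution is hypothetical and not from the talk, so treat the output as an illustration, not a measurement.

```python
import random

SEEK_MS = 10.0                          # average seek, as in the formula above
READ_MS = 256.0 / 30_000.0 * 1000.0     # 256 KB at 30 MB/s, about 8.5 ms


def ideal_parallel_ms():
    """All reads overlap perfectly: one seek plus one read."""
    return SEEK_MS + READ_MS


def simulated_parallel_ms(rng, images=30, max_seek_ms=20.0):
    """Page time when seeks vary: the slowest of `images` independent reads.

    ASSUMPTION: seek time is uniform on [0, max_seek_ms]; real disks differ.
    """
    return max(rng.uniform(0.0, max_seek_ms) + READ_MS for _ in range(images))


def mean_simulated_ms(trials=1000, seed=0):
    """Average page time over many simulated page loads."""
    rng = random.Random(seed)
    return sum(simulated_parallel_ms(rng) for _ in range(trials)) / trials


print(f"ideal:     {ideal_parallel_ms():.1f} ms")
print(f"simulated: {mean_simulated_ms():.1f} ms (hypothetical seek distribution)")
```

Even with this mild spread, the simulated average lands well above the ideal figure, because a single slow seek out of 30 delays the whole page.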
