Monday, 4 June 2012

Delays and Rates

From a recent post on, vertically aligned for ease of comparison, with corresponding rates to better understand the implications:

(Edit: Expanded with numbers from a recent Ars Technica article on SSDs)

Register                                << 1 ns
L1 cache reference (lower bound)         < 1 ns 2,000,000,000 Hz
L1 cache reference (upper bound)           3 ns   333,333,333 Hz
Branch mispredict                          5 ns   200,000,000 Hz
L2 cache reference                         7 ns   142,857,143 Hz
L3 cache reference                        20 ns    50,000,000 Hz
Mutex lock/unlock                         25 ns    40,000,000 Hz
Main memory reference                    100 ns    10,000,000 Hz
Compress 1K bytes with Zippy           3,000 ns       333,333 Hz
Send 2K bytes over 1 Gbps network     20,000 ns        50,000 Hz
Read 1 MB sequentially from memory   250,000 ns         4,000 Hz
Round trip within same datacenter    500,000 ns         2,000 Hz
Disk seek (lower bound)            3,000,000 ns           333 Hz

Disk seek (upper bound)           10,000,000 ns           100 Hz
Read 1 MB sequentially from disk  20,000,000 ns            50 Hz
Send packet CA->Netherlands->CA  150,000,000 ns           < 7 Hz

Surprises and take home lesson(s):

1. Data intensive (I/O bound) systems are REALLY slow compared to the raw CPU grunt that is available.
2. Within-datacenter Network I/O is faster than disk I/O.
3. It makes sense to think about network I/O in the same way as we used to think about the SIMD/AltiVec/CUDA tradeoff. The payoff has to be worth while, because the packaging/transfer operations are expensive.
4. Branch mis-prediction is actually pretty expensive compared to L1 cache. For CPU bound inner-loop code, it makes sense to spend a bit more time trying to avoid branching. 

Here is the table from Ars Technica:

Level                Access time    Typical size
Registers        "instantaneous"    under 1 KB
Level 1 Cache                1-3 ns      64 KB per core
Level 2 Cache               3-10 ns     256 KB per core
Level 3 Cache              10-20 ns    2-20 MB per chip
Main Memory                30-60 ns    4-32 GB per system
Hard Disk   3,000,000-10,000,000 ns