10 Commandments for Code Optimization
0x01 Know Your Data
- organize data on few cachelines with related items close to each other
- lay data out sequentially so today's aggressive hardware prefetchers can stream it into cache
- align your data properly - have values on boundaries of their size
- use the proper access size - loading a byte directly may be faster than loading a word and shifting/masking it
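To make the alignment point concrete, here is a minimal sketch in C11. The two structs and their member names are hypothetical; exact sizes depend on the ABI, so the code only checks relations that hold wherever members are naturally aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with members in careless order: the compiler
 * inserts padding so that each member sits on its natural boundary. */
struct loose {
    uint8_t  flag;
    uint64_t id;
    uint8_t  kind;
    uint32_t count;
};

/* Same members sorted by decreasing size: less padding is needed. */
struct tight {
    uint64_t id;
    uint32_t count;
    uint8_t  flag;
    uint8_t  kind;
};

static_assert(sizeof(struct tight) <= sizeof(struct loose),
              "reordering must not grow the struct");
static_assert(offsetof(struct tight, count) % sizeof(uint32_t) == 0,
              "count sits on a boundary of its own size");
```

On a typical 64-bit ABI the reordered struct is likely to be noticeably smaller, so more records fit on one cacheline.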
0x02 Synchronize
- do NOT put more than one lock object per cacheline where feasible
- sometimes it's still good to use lock-bitmaps
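- avoid atomic read-modify-write operations in tight loops - spin on a plain read of the lock word (a volatile or relaxed atomic load) and attempt the atomic operation only when the lock looks free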
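The two lock rules above can be sketched with C11 atomics as a "test and test-and-set" spinlock. This is an illustration, not a production lock: the 64-byte line size and the names `padded_lock`, `lock_acquire`, `lock_try_acquire` and `lock_release` are assumptions. A relaxed atomic load stands in for the volatile poll, since `volatile` alone gives no ordering guarantees in C.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CACHELINE 64 /* assumption: 64-byte cachelines */

/* One lock object per cacheline, so unrelated locks never share a line. */
struct padded_lock {
    alignas(CACHELINE) atomic_bool held;
};

static_assert(sizeof(struct padded_lock) == CACHELINE,
              "lock must occupy exactly one cacheline");

/* Single attempt: one atomic exchange, no spinning. */
static bool lock_try_acquire(struct padded_lock *l)
{
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void lock_acquire(struct padded_lock *l)
{
    for (;;) {
        /* Poll with plain loads; no read-modify-write in the spin loop. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;
        /* The lock looks free: only now pay for the atomic operation. */
        if (lock_try_acquire(l))
            return;
    }
}

static void lock_release(struct padded_lock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```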
0x03 Know Your Memory
- virtual memory works on pages
- caches operate in lines and/or strides
- aligned access is the way to go
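As a rough sketch of page-aware allocation, the C11 code below rounds a request up to whole pages and asks for page-aligned memory. The 4096-byte page size is an assumption (on POSIX systems the real value can be queried with `sysconf(_SC_PAGESIZE)`), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ASSUMED_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

static_assert((ASSUMED_PAGE_SIZE & (ASSUMED_PAGE_SIZE - 1)) == 0,
              "round_up() below relies on a power-of-two alignment");

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Allocate whole pages; C11 aligned_alloc wants size % alignment == 0. */
static void *alloc_pages(size_t nbytes)
{
    return aligned_alloc(ASSUMED_PAGE_SIZE, round_up(nbytes, ASSUMED_PAGE_SIZE));
}

static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Memory obtained this way is released with the ordinary `free()`.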
0x04 Prefetch
- utilize cache-coloring to hopefully reduce cacheline collisions - randomize cacheline addresses of accessed data
- leverage massive cache prefetches
- e.g. hash table chains may be implemented as flat tables
  - no more pointer chase/dependency/fetch for every item
  - tables are iterated over from cache
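One possible shape for such a flat chain, sketched in C11: each bucket stores its entries inline in a single cacheline-sized array instead of a linked list. The slot count, table size, hash function and all names are assumptions chosen for illustration, and a real table would need an overflow or resize strategy when a bucket fills up.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOTS    7   /* assumption: 4 + 7*4 + 7*4 = 60 bytes, fits 64 */
#define NBUCKETS 256

/* A whole "chain" lives in one cacheline: no per-item pointer chase. */
struct bucket {
    alignas(64) uint32_t n;  /* number of used slots */
    uint32_t keys[SLOTS];
    uint32_t vals[SLOTS];
};

static_assert(sizeof(struct bucket) == 64, "bucket should fill one line");

static struct bucket table[NBUCKETS];

/* Multiplicative hash; the top 8 bits select one of 256 buckets. */
static uint32_t hash_u32(uint32_t k)
{
    return (k * 2654435761u) >> 24;
}

static bool flat_insert(uint32_t key, uint32_t val)
{
    struct bucket *b = &table[hash_u32(key)];
    for (uint32_t i = 0; i < b->n; i++) {
        if (b->keys[i] == key) {   /* existing key: update in place */
            b->vals[i] = val;
            return true;
        }
    }
    if (b->n == SLOTS)
        return false;              /* bucket full; sketch gives up here */
    b->keys[b->n] = key;
    b->vals[b->n] = val;
    b->n++;
    return true;
}

static bool flat_lookup(uint32_t key, uint32_t *out)
{
    const struct bucket *b = &table[hash_u32(key)];
    for (uint32_t i = 0; i < b->n; i++) {  /* linear scan, from cache */
        if (b->keys[i] == key) {
            *out = b->vals[i];
            return true;
        }
    }
    return false;
}
```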
0x05 Know Your Cache
- keep related data close and compact
- align structures on cacheline boundaries
- pad structures to cacheline boundaries where beneficial
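Padding by hand can look like the following C11 sketch, where each hypothetical per-thread counter slot is padded out to one assumed 64-byte line so that neighbouring slots in the array never share a cacheline.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64 /* assumption: 64-byte cachelines */

/* Related counters stay together; padding fills the rest of the line. */
struct counter_slot {
    uint64_t hits;
    uint64_t misses;
    char     pad[CACHELINE - 2 * sizeof(uint64_t)];
};

static_assert(sizeof(struct counter_slot) == CACHELINE,
              "each slot must cover exactly one cacheline");

/* Align the array itself so slot boundaries coincide with line boundaries. */
static alignas(CACHELINE) struct counter_slot slots[4];
```

The padding costs memory, so it tends to pay off only for data that several threads write concurrently.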
0x06 Unroll
- more linear pipeline execution with fewer branches taken
- sometimes Duff's devices are in order
- pay attention to memory access pattern
- process cachelines
- optionally prefetch the next cacheline
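A minimal unrolling sketch in C: a sum loop unrolled four times with independent accumulators and an optional software prefetch. `__builtin_prefetch` is a GCC/Clang builtin, so it is wrapped in a macro that compiles to nothing elsewhere; the prefetch distance and function name are assumptions, and whether any of this beats the compiler's own unrolling should be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch(p)
#else
#define PREFETCH(p) ((void)0)
#endif

#define PREFETCH_AHEAD 16 /* assumption: one 64-byte line of uint32_t ahead */

static uint64_t sum_unrolled(const uint32_t *a, size_t n)
{
    assert(a != NULL || n == 0);

    /* Four independent accumulators shorten the dependency chain. */
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Hint the next line; a tuned version would do this once per line. */
        if (i + PREFETCH_AHEAD < n)
            PREFETCH(&a[i + PREFETCH_AHEAD]);
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++) /* remainder: fewer than four elements left */
        s0 += a[i];

    return s0 + s1 + s2 + s3;
}
```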
0x07 Know Your Threads
- synchronize access to critical data structures
- utilize event/message queues for IPC
- shared memory for localhost connections
- use thread-local storage (TLS) for per-thread data and buffers
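For the TLS point, a small C11 sketch: `_Thread_local` gives every thread its own copy of a scratch buffer, so no locking is needed around it. The buffer size and the `format_id` helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Each thread gets a private copy; no synchronization required. */
static _Thread_local char scratch[64];

/* Formats into the calling thread's buffer and returns it. The result
 * is valid until the same thread calls format_id() again. */
static const char *format_id(unsigned id)
{
    int n = snprintf(scratch, sizeof scratch, "item-%u", id);
    assert(n >= 0 && (size_t)n < sizeof scratch); /* no truncation */
    (void)n;
    return scratch;
}
```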
0x08 Inline
- do NOT overdo it; consider instruction cache locality/small functions
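As an illustration of where inlining tends to be cheap, the C sketch below marks a tiny, frequently called helper `static inline`. The function is hypothetical; larger or rarely executed code is usually better left out of line so the hot path stays small in the instruction cache.

```c
#include <assert.h>
#include <stdint.h>

/* A few instructions: call overhead would dominate, so inlining helps. */
static inline uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}
```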
0x09 Know Your System
0x0a Hack!