10 Commandments for Code Optimization

0x01 Know Your Data


  • organize data on as few cachelines as possible, with related items close to each other
  • take advantage of the aggressive hardware prefetching in today's CPUs - lay data out for sequential access
  • align your data properly - have values on boundaries of their size
  • use proper access sizes - reading a byte directly might be faster than shifting and masking it out of a word
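
The alignment and compactness points can be illustrated with a small layout experiment. This is a minimal sketch, not a rule: the two structs and their member names are hypothetical, and exact sizes depend on the ABI (on common 64-bit ABIs the first is typically 24 bytes and the second 16).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with carelessly ordered members: the compiler has to
 * insert padding so that each value sits on a boundary of its own size. */
struct item_padded {
    uint8_t  flags;
    uint64_t key;
    uint8_t  type;
    uint32_t count;
};

/* Same members, sorted from largest to smallest: less padding is needed,
 * so more items fit on one cacheline. */
struct item_compact {
    uint64_t key;
    uint32_t count;
    uint8_t  flags;
    uint8_t  type;
};

/* Compile-time checks of the layout assumptions made above. */
static_assert(offsetof(struct item_compact, key) % _Alignof(uint64_t) == 0,
              "key is naturally aligned");
static_assert(sizeof(struct item_compact) <= sizeof(struct item_padded),
              "reordering must not grow the struct");
```

Checking layouts with `sizeof` and `offsetof` at compile time keeps such assumptions from silently breaking when a member is added later.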

0x02 Synchronize


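  • do NOT put more than one lock object per cacheline where feasible - locks sharing a line contend with each other even when they protect unrelated data (false sharing)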
  • sometimes it's still good to use lock-bitmaps
  • avoid atomic operations in tight loops - poll the lock word with plain (volatile) reads and attempt the atomic operation only once it looks free

0x03 Know Your Memory


  • virtual memory works on pages
  • caches operate in lines and/or strides
  • aligned access is the way to go - an unaligned value may straddle a cacheline or page boundary
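
To make the page granularity concrete, here is a minimal sketch that asks the OS for its page size and allocates a page-aligned buffer. It assumes a POSIX system (`sysconf`) and C11 (`aligned_alloc`); the helper names `page_size`, `round_up` and `alloc_pages`, and the 4096-byte fallback, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns the virtual memory page size reported by the OS. */
static size_t page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    return sz > 0 ? (size_t)sz : 4096; /* fallback value is an assumption */
}

/* Rounds n up to the next multiple of align; align must be a power of two. */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Allocates at least n bytes starting on a page boundary.
 * The size is rounded up because aligned_alloc expects a multiple
 * of the alignment. Release the buffer with free(). */
static void *alloc_pages(size_t n)
{
    size_t page = page_size();
    return aligned_alloc(page, round_up(n, page));
}
```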

0x04 Prefetch


  • utilize cache-coloring to reduce cacheline collisions - spread (randomize) the cacheline addresses of accessed data so they do not all map to the same cache sets
  • leverage massive cache prefetches
  • e.g. hash table chains may be implemented as flat tables - no more pointer chase/dependency/fetch for every item, and the tables are iterated over from cache
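
The flat-table idea for hash chains might look like the following. This is a sketch under my own assumptions, not a complete hash table: 32-bit keys and values, a 64-byte cacheline, a fixed number of slots per bucket, and a multiplicative hash. All names (`struct bucket`, `flat_put`, `flat_get`) are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKET_BITS 8
#define NBUCKET      (1u << NBUCKET_BITS)
#define BUCKET_SLOTS 7 /* 4-byte count + 7 * 8-byte slots fit in 64 bytes */

struct slot { uint32_t key, val; };

/* One bucket is one flat, cacheline-sized table instead of a linked chain. */
struct bucket {
    alignas(64) uint32_t nused;
    struct slot slots[BUCKET_SLOTS];
};

static_assert(sizeof(struct bucket) == 64, "bucket fills one cacheline");

static uint32_t bucket_index(uint32_t key)
{
    return (key * 2654435761u) >> (32 - NBUCKET_BITS);
}

/* Inserts or updates key; returns false if the bucket is full
 * (a real table would grow or spill over at this point). */
static bool flat_put(struct bucket *tab, uint32_t key, uint32_t val)
{
    struct bucket *b = &tab[bucket_index(key)];
    for (uint32_t i = 0; i < b->nused; i++) {
        if (b->slots[i].key == key) {
            b->slots[i].val = val;
            return true;
        }
    }
    if (b->nused == BUCKET_SLOTS)
        return false;
    b->slots[b->nused].key = key;
    b->slots[b->nused].val = val;
    b->nused++;
    return true;
}

/* Linear scan over contiguous slots: no pointer to chase per item. */
static bool flat_get(const struct bucket *tab, uint32_t key, uint32_t *val)
{
    const struct bucket *b = &tab[bucket_index(key)];
    for (uint32_t i = 0; i < b->nused; i++) {
        if (b->slots[i].key == key) {
            *val = b->slots[i].val;
            return true;
        }
    }
    return false;
}
```

A lookup touches exactly one cacheline in this layout, whereas a linked chain may touch one line per node.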

0x05 Know Your Cache


  • keep related data close and compact
  • align structures on cacheline boundaries
  • pad structures to cacheline boundaries where beneficial
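
Padding pays off mostly for data written by different threads. The sketch below gives each worker its own cacheline of counters so that updates do not invalidate a neighbour's line. The 64-byte line size, the worker count and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64 /* assumption: 64-byte cachelines */
#define NWORKER        4  /* hypothetical number of worker threads */

/* Related counters stay together; the explicit pad documents that the
 * struct is meant to occupy exactly one cacheline. */
struct worker_stats {
    alignas(CACHELINE_SIZE) uint64_t hits;
    uint64_t misses;
    uint8_t  pad[CACHELINE_SIZE - 2 * sizeof(uint64_t)];
};

static_assert(sizeof(struct worker_stats) == CACHELINE_SIZE,
              "one worker per cacheline");

/* Each worker writes only its own element. */
static struct worker_stats stats[NWORKER];

/* A reader sums the per-worker counters when it needs a total. */
static uint64_t total_hits(void)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < NWORKER; i++)
        sum += stats[i].hits;
    return sum;
}
```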

0x06 Unroll


  • more linear pipeline execution with fewer branches taken
  • sometimes a Duff's device is in order
  • pay attention to memory access patterns
  • process data a cacheline at a time
  • optionally prefetch the next cacheline
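
As a rough sketch of cacheline-wise unrolling, the function below sums 32-bit words sixteen at a time (one 64-byte line, which is my assumption about the line size) and hints the next line to the prefetcher. `__builtin_prefetch` is a GCC/Clang builtin, so it is guarded; `sum_unrolled` is a hypothetical name. Whether this beats the compiler's own unrolling needs to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums n words, processing 16 words (64 bytes) per loop iteration. */
static uint32_t sum_unrolled(const uint32_t *p, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    assert(p != NULL || n == 0);
    for (; i + 16 <= n; i += 16) {
#if defined(__GNUC__)
        /* Optional hint for the next line; prefetching never faults. */
        __builtin_prefetch(&p[i + 16]);
#endif
        /* Four independent accumulators shorten the dependency chain. */
        s0 += p[i + 0] + p[i + 4] + p[i + 8]  + p[i + 12];
        s1 += p[i + 1] + p[i + 5] + p[i + 9]  + p[i + 13];
        s2 += p[i + 2] + p[i + 6] + p[i + 10] + p[i + 14];
        s3 += p[i + 3] + p[i + 7] + p[i + 11] + p[i + 15];
    }
    /* Leftover words that do not fill a whole line. */
    for (; i < n; i++)
        s0 += p[i];
    return s0 + s1 + s2 + s3;
}
```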

0x07 Know Your Threads


  • synchronize access to critical data structures
  • utilize event/message queues for IPC
  • use shared memory for connections between processes on the same host
  • use thread-local storage (TLS) to buffer per-thread data
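
One way to read the TLS point: let each thread count into a private variable and publish to the shared total only occasionally. A minimal sketch using C11 `_Thread_local` and `<stdatomic.h>`; the batch size and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FLUSH_EVERY 64 /* hypothetical batch size */

static_assert(FLUSH_EVERY > 0, "batch size must be positive");

static atomic_uint_fast64_t total_events;   /* shared by all threads */
static _Thread_local uint32_t local_events; /* private to each thread */

/* Publishes this thread's buffered count with one atomic add. */
static void flush_events(void)
{
    atomic_fetch_add_explicit(&total_events, local_events,
                              memory_order_relaxed);
    local_events = 0;
}

/* Hot path: a plain increment; the shared line is touched once per batch. */
static void count_event(void)
{
    if (++local_events >= FLUSH_EVERY)
        flush_events();
}

static uint64_t read_total(void)
{
    return atomic_load_explicit(&total_events, memory_order_relaxed);
}
```

The total lags behind by up to one batch per thread, so a thread should call `flush_events()` before it exits or whenever an exact count is needed.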

0x08 Inline


  • do NOT overdo it - inlining grows code size, so consider instruction cache locality and inline only small functions
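
A sketch of that trade-off: the tiny hot path is a `static inline` function, while the rare path is kept out of line so it does not bloat every call site. The `noinline` and `cold` attributes are GCC/Clang extensions and are guarded accordingly; `struct bytebuf` and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__)
#define COLD_NOINLINE __attribute__((noinline, cold))
#else
#define COLD_NOINLINE
#endif

struct bytebuf {
    uint8_t  data[256];
    size_t   len;
    unsigned overflows;
};

/* Rare path: kept as a real function, away from the hot code. */
static COLD_NOINLINE int bytebuf_overflow(struct bytebuf *b)
{
    b->overflows++;
    return -1;
}

/* Hot path: a compare and a store, small enough to inline everywhere. */
static inline int bytebuf_push(struct bytebuf *b, uint8_t byte)
{
    assert(b != NULL);
    if (b->len == sizeof b->data)
        return bytebuf_overflow(b);
    b->data[b->len++] = byte;
    return 0;
}
```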

0x09 Know Your System



0x0a Hack!


