Guide to InnoDB Semaphores and Long Semaphore Waits
InnoDB uses internal semaphores — mutexes and reader-writer latches — to protect shared data structures such as buffer pool pages, the adaptive hash index, and the redo log buffer. A semaphore wait occurs when a thread cannot immediately acquire one of these internal locks and must spin or yield to the OS scheduler. The SEMAPHORES section of SHOW ENGINE INNODB STATUS reports both OS waits (kernel-level context switches) and spin rounds (busy-wait CPU cycles), providing a window into InnoDB's internal concurrency health. High semaphore wait counts indicate internal hot spots that can degrade throughput more broadly than row-level lock waits, because threads from unrelated transactions contend for the same internal structure. This guide explains how to read semaphore output, diagnose the underlying causes, and tune spin parameters for your hardware.
Anatomy of the SEMAPHORES Section
OS Wait Array: Reservation Count vs Signal Count
The OS wait array is a fixed-size pool of OS-level synchronization objects (event slots) that InnoDB threads reserve when they give up spinning and yield to the operating system. The output line "OS WAIT ARRAY INFO: reservation count N, signal count M" reports two cumulative, server-wide counters: the reservation count is the number of times threads have reserved a slot (taken the expensive OS-wait path) since startup, and the signal count is the number of wake-up events delivered to waiting threads. A healthy server with sufficient CPU capacity shows very few OS waits relative to spin rounds, because most latches are held for microseconds and the spinning thread acquires the latch before an OS scheduling delay is incurred. A rapidly growing reservation count indicates either very high concurrency or a genuine internal hot spot requiring architectural changes.
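Because both counters are cumulative since startup, the useful signal is their rate of change, not their absolute value. A minimal sketch, assuming the line format shown above, that parses the counters and derives an OS-wait rate from two samples:

```python
import re

def os_wait_counts(semaphores_text: str) -> tuple:
    """Extract (reservation_count, signal_count) from the OS WAIT ARRAY line."""
    m = re.search(
        r"OS WAIT ARRAY INFO: reservation count (\d+), signal count (\d+)",
        semaphores_text,
    )
    if not m:
        raise ValueError("OS WAIT ARRAY INFO line not found")
    return int(m.group(1)), int(m.group(2))

def os_waits_per_second(sample1, sample2, interval_s):
    """Rate of new OS-level waits between two samples taken interval_s apart."""
    return (sample2[0] - sample1[0]) / interval_s
```

For example, sampling once a minute and alerting when the per-second rate jumps well above its baseline catches emerging hot spots long before any error-log warning appears.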
Spin Rounds and RW-Shared vs RW-Exclusive Lock Stats
The RW-shared spins, rounds, OS waits and RW-excl spins, rounds, OS waits statistics measure the cost of acquiring reader-writer latches specifically. A shared (S) latch permits concurrent readers, while an exclusive (X) latch requires all readers and writers to drain. A high rounds figure on the RW-excl line indicates that threads are spinning heavily before acquiring exclusive access, which typically points to a buffer pool page being modified by many concurrent writers — a classic "hot page" scenario. Compare the ratio of spin rounds to OS waits: a ratio above 100:1 means the spin overhead is high but context switches are avoided; a ratio below 10:1 means threads are yielding frequently, consuming kernel scheduler time and adding latency.
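The ratio check is mechanical enough to script. A sketch that computes a per-latch-class rounds-to-OS-waits ratio, assuming the typical 5.7/8.0 line format for the RW-* statistics:

```python
import re

def spin_ratios(semaphores_text: str) -> dict:
    """Map each RW latch class (RW-shared, RW-excl, RW-sx) to its
    spin-rounds-to-OS-waits ratio."""
    ratios = {}
    for name, rounds, waits in re.findall(
        r"(RW-\w+) spins \d+, rounds (\d+), OS waits (\d+)", semaphores_text
    ):
        waits = int(waits)
        # No OS waits at all is the ideal case; report it as infinity.
        ratios[name] = int(rounds) / waits if waits else float("inf")
    return ratios
```

Under the heuristics above, an RW-shared ratio of 50:1 is unremarkable, while an RW-excl ratio of 8:1 means writers are yielding to the scheduler frequently and deserves investigation.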
Diagnosing Long Semaphore Waits
Threshold for "Semaphore Wait" Alerts in MySQL Error Log
InnoDB prints a warning to the error log and records the semaphore waiter detail in SHOW ENGINE INNODB STATUS when a thread has waited more than 240 seconds for a semaphore. This threshold is hard-coded in MySQL and signals a severe internal stall — the affected thread and all threads waiting for locks it holds are completely frozen. The most common cause is a single buffer pool page becoming a hot spot under extreme write concurrency, such as a secondary index root page on a high-INSERT table. If any single semaphore wait exceeds 600 seconds, InnoDB's watchdog intentionally crashes the server on the assumption that it is hung, logging a message of the form "Semaphore wait has lasted > 600 seconds". Monitor the error log for "a long semaphore wait" warnings as a critical alert signal, well before the fatal threshold is reached.
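Rather than waiting for the fatal path, the waiter detail lines can be scanned continuously. The sketch below assumes the common waiter format ("--Thread N has waited at <file> line <n> for <s> seconds the semaphore:"); verify the exact wording on your version before relying on it:

```python
import re

# Waiter detail line, e.g.:
# --Thread 139954041423616 has waited at buf0flu.cc line 1209 for 241.00 seconds the semaphore:
WAITER = re.compile(r"has waited at (\S+) line (\d+) for ([\d.]+) seconds")

def long_waiters(lines, threshold_s=240.0):
    """Return (source_file, source_line, seconds) for every waiter at or
    above the threshold, from error-log or monitor-output lines."""
    hits = []
    for line in lines:
        m = WAITER.search(line)
        if m and float(m.group(3)) >= threshold_s:
            hits.append((m.group(1), int(m.group(2)), float(m.group(3))))
    return hits
```

The source file in each hit is itself diagnostic: buffer pool files (buf0*.cc) point at page contention, while btr0sea.cc points at the adaptive hash index discussed later in this guide.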
Thread Starvation and CPU-Bound Workloads
Thread starvation behind semaphores typically manifests on CPU-bound servers where spinning threads consume so many CPU cycles that the latch holder cannot get scheduled to release the latch. This creates a self-reinforcing feedback loop: more spinning → less CPU for the holder → longer waits → more spinning. NUMA architectures exacerbate this because spinning threads on a remote NUMA node generate cross-node memory traffic for each cache-line check of the latch state. Solutions include: setting innodb_thread_concurrency to roughly the number of physical CPU cores (the default of 0 imposes no limit), enabling a thread pool (thread_handling=pool-of-threads in Percona Server and MariaDB) to cap the total number of OS threads, and partitioning the buffer pool with innodb_buffer_pool_instances to distribute latch pressure across multiple independent sub-pools.
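As one illustration of the first remedy, here is a rough, unofficial heuristic in Python for deriving an innodb_thread_concurrency cap from the host's CPU count; the SMT halving is an assumption made for this sketch, not a MySQL rule:

```python
import os

def suggested_thread_concurrency(logical_cpus=None, smt=True):
    """Rough heuristic: cap InnoDB's concurrent-thread tickets near the
    number of physical cores so spinners cannot starve the latch holder.

    os.cpu_count() reports logical CPUs; when SMT/Hyper-Threading is on,
    halving it approximates physical cores (an assumption, not exact).
    """
    logical = logical_cpus or os.cpu_count() or 1
    return max(1, logical // 2) if smt else logical

# Apply the result manually, e.g.:
#   SET GLOBAL innodb_thread_concurrency = 16;
```

Treat the output as a starting point for benchmarking, not a final value: workloads dominated by short latch holds often run best with no limit at all.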
Reducing Semaphore Contention
Tuning innodb_spin_wait_delay and innodb_sync_spin_loops
innodb_spin_wait_delay controls the maximum length of the random busy-wait inserted between spin-lock polls; it is a unitless iteration count, not a time in microseconds. Randomizing the polling interval reduces cache-coherency traffic when many threads compete for the same latch. The default is 6; lowering it makes waiting threads poll the latch more aggressively, while raising it eases pressure on the lock word's cache line at the cost of slightly later acquisition. innodb_sync_spin_loops (default 30) sets the maximum number of spin iterations before yielding to the OS — higher values reduce context switches but burn more CPU. Tune these together by monitoring the spin-rounds-to-OS-wait ratio: target a ratio of 50:1 or higher. In MySQL 8.0.16+, innodb_spin_wait_pause_multiplier (default 50) supplements, rather than replaces, innodb_spin_wait_delay: it sets how many PAUSE instructions make up one delay unit, compensating for the large differences in PAUSE instruction latency across x86 microarchitectures.
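The interaction between the two variables is easier to see as code. This toy Python model is not InnoDB's implementation (which uses x86 PAUSE instructions and the OS wait array), but it mirrors the spin-then-yield shape the variables control:

```python
import random
import threading

SYNC_SPIN_LOOPS = 30   # analogue of innodb_sync_spin_loops
SPIN_WAIT_DELAY = 6    # analogue of innodb_spin_wait_delay (iterations, not time)

def acquire_with_spin(lock):
    """Spin up to SYNC_SPIN_LOOPS times, pausing a random 0..SPIN_WAIT_DELAY
    dummy iterations between polls to reduce cache-line ping-pong, then fall
    back to a blocking wait (the analogue of reserving an OS wait array slot).
    Returns which path acquired the lock."""
    for _ in range(SYNC_SPIN_LOOPS):
        if lock.acquire(blocking=False):
            return "spin"                      # fast path: no OS wait
        for _ in range(random.randint(0, SPIN_WAIT_DELAY)):
            pass                               # stand-in for the PAUSE loop
    lock.acquire()                             # slow path: OS-level wait
    return "os_wait"
```

Raising SYNC_SPIN_LOOPS keeps threads on the fast path longer at the cost of CPU; raising SPIN_WAIT_DELAY spaces out the polls, which is exactly the cache-coherency trade-off described above.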
Adaptive Hash Index Contention on High-Concurrency Servers
The Adaptive Hash Index (AHI) is a frequent but overlooked source of long semaphore wait entries. When many threads concurrently read or insert rows, InnoDB serialises AHI updates behind a latch, which can become a bottleneck on multi-core servers. If you observe long semaphore wait lines referencing btr0sea.cc in the SEMAPHORES section, the AHI latch is the likely culprit. Test disabling it dynamically with SET GLOBAL innodb_adaptive_hash_index = OFF and monitor throughput: on highly concurrent workloads the latch contention cost often outweighs the hash-lookup benefit, resulting in a net performance gain. As a middle ground, MySQL 5.7+ partitions the AHI latch via innodb_adaptive_hash_index_parts (default 8, set at startup), which spreads the contention without giving up hash lookups entirely.
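Attributing waiters to source files makes btr0sea.cc contention stand out at a glance. A sketch, assuming waiter lines of the form "has waited at <file> line <n>":

```python
import re
from collections import Counter

def hot_wait_sites(semaphores_text: str) -> Counter:
    """Tally which InnoDB source files the semaphore waiters are blocked in.

    btr0sea.cc implements the adaptive hash index search system, so a tally
    dominated by it implicates the AHI latch specifically."""
    return Counter(re.findall(r"has waited at (\S+) line \d+", semaphores_text))

# If btr0sea.cc dominates the tally, test:
#   SET GLOBAL innodb_adaptive_hash_index = OFF;
```

The same tally distinguishes other hot spots too: buf0buf.cc or buf0flu.cc dominance points back at buffer pool page contention rather than the AHI.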
Buffer Pool Instances and Reducing Internal Contention
Increasing innodb_buffer_pool_instances from 1 to 8 or 16 (when innodb_buffer_pool_size is at least 1 GB) partitions the buffer pool into independent segments, each with its own mutex, LRU list, and flush list, substantially reducing buffer pool latch contention on workloads with high page request rates. For best results size the pool so each instance gets at least 1 GB; if the total buffer pool is smaller than 1 GB, MySQL forces the instance count back to 1. The BUFFER POOL AND MEMORY section of SHOW ENGINE INNODB STATUS reports per-instance statistics when multiple instances are configured. Pair buffer pool partitioning with innodb_page_cleaners set to match the instance count, ensuring that dirty page flushing is also parallelised across all buffer pool instances for write-intensive workloads.
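Why partitioning helps can be shown in a few lines: each page belongs to exactly one instance, so the instance mutex a thread needs depends on which page it touches, and threads working on different pages mostly take different mutexes. The hash below is an illustrative stand-in for this sketch, not InnoDB's actual mapping function:

```python
def buffer_pool_instance(space_id: int, page_no: int, n_instances: int) -> int:
    """Map a page (tablespace id, page number) to one buffer pool instance.

    Toy hash for illustration only; InnoDB derives the instance from the
    page identifier internally, but not with this exact formula."""
    return (space_id * 31 + page_no) % n_instances
```

With a single instance every page maps to instance 0 and all threads funnel through one mutex; with 8 or 16 instances the same page traffic fans out, which is the contention reduction the section above describes.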