Understanding and Limiting "Queries inside InnoDB"

The "queries inside InnoDB" counter in SHOW ENGINE INNODB STATUS shows how many threads are currently executing inside the InnoDB storage engine, while "queries in queue" shows threads waiting for admission. InnoDB uses a concurrency throttle controlled by innodb_thread_concurrency (default 0 = unlimited) to prevent too many simultaneous threads from degrading throughput through excessive context switching and internal latch contention. When the concurrency limit is reached, new threads sleep for innodb_thread_sleep_delay microseconds before retrying admission, creating a queue. Understanding this throttle is essential for tuning MySQL on servers that handle hundreds of simultaneous connections, because an incorrect innodb_thread_concurrency setting can dramatically degrade throughput even when CPU and I/O resources appear underutilised.

What "Queries inside InnoDB" Measures

Thread Concurrency Ticket System in InnoDB

innodb_concurrency_tickets (default 5000) controls how many "tickets" a thread consumes per InnoDB re-entry before being required to check the concurrency gate again. Once a thread is admitted inside InnoDB, it can execute up to innodb_concurrency_tickets row operations before going back to the wait queue. This prevents a single long-running transaction from monopolising concurrency slots. Each row read or write consumes one ticket; a complex JOIN touching 50,000 rows will re-enter the concurrency gate approximately 10 times with the default ticket value. Reducing innodb_concurrency_tickets increases fairness among competing threads but adds admission overhead; increasing it reduces gate overhead but may cause long queries to starve short ones.

Queries Queued vs Queries Admitted

The SHOW ENGINE INNODB STATUS line "N queries inside InnoDB, M queries in queue" provides an instantaneous snapshot of InnoDB thread concurrency. A non-zero "queries in queue" value means InnoDB is actively throttling threads and admission wait is contributing to query latency. If the queue is consistently non-zero while CPU and disk utilisation are low, innodb_thread_concurrency is set too aggressively and should be relaxed. Conversely, if CPU is at 100% and the queue is empty, innodb_thread_concurrency is either unset (0) or set too high, and reducing it may improve throughput by limiting the number of threads competing for CPU time. Use the Performance Schema event innodb_thread_concurrency_timer (if enabled) to measure actual admission wait time per query.

Configuring innodb_thread_concurrency

Zero Value (Unlimited) vs Bounded Concurrency

Setting innodb_thread_concurrency=0 (the default in MySQL 5.7+) disables the concurrency throttle entirely, relying on the OS scheduler to manage thread contention. This works well on servers with fewer than 64 CPU threads and moderate connection counts (< 200 simultaneous active queries). Above that scale, setting innodb_thread_concurrency=2×(number of CPU cores) is a common starting point, based on the observation that InnoDB threads alternate between CPU-bound execution and I/O waits, allowing approximately 2× oversubscription before context switching overhead becomes detrimental. Benchmark your specific workload with innodb_thread_concurrency values of 0, 16, 32, and 64 using sysbench or a production replay tool to find the optimal setting for your CPU count and query mix.

innodb_thread_sleep_delay and Concurrency Tickets

innodb_thread_sleep_delay (default 10,000 microseconds = 10 ms) controls how long a thread sleeps between admission retry attempts when the concurrency gate is full. A 10 ms sleep adds significant latency to already-throttled queries; reducing it to 1,000–2,000 microseconds on low-latency NVMe-backed servers makes the throttle more responsive at the cost of slightly higher CPU usage from more frequent retry attempts. In MySQL 8.0.3+, innodb_adaptive_max_sleep_delay can limit the maximum value that adaptive admission logic can choose for the sleep delay, preventing excessive back-off under sustained overload. Set innodb_adaptive_max_sleep_delay=0 to disable adaptive sleep adjustment and rely entirely on the static innodb_thread_sleep_delay value.

Optimizing Query Concurrency at Scale

Connection Pooling to Reduce InnoDB Thread Pressure

Connection pooling at the application layer is the most effective way to reduce thread pressure inside InnoDB. Most web applications create one database connection per request, generating connection bursts that overwhelm InnoDB concurrency management at peak traffic. Replacing per-request connections with a pool of 20–100 long-lived connections (e.g., HikariCP for Java, pgBouncer-style for MySQL via ProxySQL) reduces the number of simultaneous threads competing inside InnoDB from hundreds to tens. This directly reduces semaphore contention, CPU scheduler pressure, and innodb_thread_concurrency queue depth. Monitor the "queries in queue" metric alongside application 99th-percentile latency to confirm that reducing the connection pool size correlates with improved latency under load.

ProxySQL and MaxScale for Query Rate Limiting

ProxySQL and MaxScale both provide middleware-level query rate limiting and connection multiplexing that transparently reduce InnoDB thread pressure without application changes. ProxySQL's max_connections_per_hostgroup parameter caps total connections to a backend MySQL server, automatically queuing new connections at the proxy layer rather than inside InnoDB. MaxScale's Throttle filter can rate-limit queries per user or schema, preventing a single analytics query burst from saturating InnoDB concurrency slots needed by OLTP queries. Both tools also support read-write splitting, routing SELECT queries to replicas and thereby halving the InnoDB thread load on the primary server without any application code changes.