Add --chunk-concurrent-size for parallel row-copy by dnovitski · Pull Request #1688 · github/gh-ost

dnovitski · 2026-05-22T09:01:52Z

Summary

Port of #1398 by @shaohk to current master, with correctness improvements.

Adds --chunk-concurrent-size flag that allows multiple row-copy chunks to execute in parallel within each iteration using errgroup. Default is 1 (no behavior change).

Motivation

On large tables with fast storage (NVMe/SSD), the single-threaded row-copy loop can become a bottleneck. This flag enables parallel chunk copying to improve migration throughput.

Performance Results

1M rows, ADD COLUMN extra_col INT DEFAULT 0, Docker MySQL 8.0, chunk-size=1000:

Concurrency	Wall-clock	Speedup
1 (default)	30.6s	baseline
4	23.9s	22% faster
8	20.9s	32% faster

Benefits scale with table size and storage throughput.

Key Design Decisions

Two execution paths: concurrency=1 matches master's retry semantics exactly (range calc inside retry loop for hook-based chunk size reduction); concurrency>1 pre-calculates ranges under mutex for safe parallel execution
Thread-safe range calculation: CalculateNextIterationRangeEndValues(advanceCursor bool) protected by mutex, returns *IterationRangeValues struct with isolated Min/Max per goroutine
No shared mutable state in hot path: SQL warnings returned as function value (eliminates data race on shared MigrationLastInsertSQLWarnings field)
errgroup with real migration context: Proper cancellation propagation when any chunk fails
DB pool auto-sizing: Connection pool increased when --chunk-concurrent-size exceeds default pool size
Backward compatible: Default concurrency=1 preserves existing single-threaded behavior exactly

Changes from original #1398

Adapted to current master API (builder pattern, receiver names, retryBatchCopyWithHooks)
Fixed thread-safety: ApplyIterationInsertQuery returns SQL warnings instead of writing to shared field
Correct retry behavior: single-threaded path recalculates range on retry (matches master); concurrent path retries same range (INSERT IGNORE is idempotent)
Proper IncludeMinValues handling for first iteration
Uses real migration context (not context.Background())

Testing

All CI checks pass (build, lint, CodeQL, migration tests on MySQL 5.7/8.0/8.4/Percona)
Performance benchmarked (22-32% improvement with concurrency 4-8)
Data integrity verified (1M rows, checksum match)
TestRetryBatchCopyWithHooks passes (hook-based chunk size reduction works correctly)

Checklist

Tests pass
Documentation updated (doc/command-line-flags.md)
Backward compatible (default=1)

Based on work by @shaohk in #1398.

@shaohk

Port of PR github#1398 by @shaohk: allows multiple row-copy chunks to execute in parallel within each iteration using errgroup. Key changes: - Add IterationRangeValues struct for thread-safe range passing - Serialize range calculation with CalculateNextIterationRangeEndValuesLock - Rewrite iterateChunks to spawn N goroutines per queue item via errgroup - Return SQL warnings from ApplyIterationInsertQuery (eliminates race on shared MigrationLastInsertSQLWarnings field) - Increase DB connection pool when concurrency > default pool size - Add --chunk-concurrent-size CLI flag (default 1, no behavior change) Co-authored-by: shaohk <shaohk@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

shaohk · 2026-06-07T00:08:00Z

There's a problem with concurrent chunk copying: during the chunk operation, executing SELECT ... INSERT will hold the auto-increment lock on the target table, which puts a ceiling on the concurrency gains you can achieve.

dnovitski · 2026-06-08T09:10:09Z

Thanks @shaohk — you're right to flag this, and since most tables do have an AUTO_INCREMENT column it's worth being precise about when it bites.

The AUTO-INC ceiling depends on innodb_autoinc_lock_mode:

Mode 0/1: INSERT ... SELECT is classified as a "bulk insert", so InnoDB holds the table-level AUTO-INC lock for the whole statement — and that's decided by statement type, not by whether we supply values (we do copy the source PK values, but it doesn't exempt us). Concurrent chunks serialize, so the ceiling is real here. Mode 1 was the default through MySQL 5.7.
Mode 2 (interleaved): no table-level AUTO-INC lock — just a lightweight per-allocation mutex. This is the default on MySQL 8.0+, and it's safe for gh-ost specifically because gh-ost mandates binlog_format=ROW (statement-based replication is the only reason mode 2 is otherwise flagged "unsafe"). So on a typical 8.0 deployment the AUTO-INC ceiling you describe is largely gone by default.

So I'd frame it as: the hard serialization you're describing applies under mode 0/1; under mode 2 it becomes soft contention.

That said — even under mode 2, parallel copy doesn't scale linearly, and the dominant cap is actually in gh-ost's own loop rather than MySQL. The next-chunk boundary calculation (CalculateNextIterationRangeEndValues) runs under a global mutex and includes a round-trip + indexed scan of the source, because each chunk's start depends on the previous chunk's max cursor. So boundary computation is fully serialized and only the INSERT...SELECT runs in parallel — classic Amdahl. On top of that there's secondary-index/redo contention on the ghost table and gh-ost's replication-lag/load throttling, which usually paces the migration well before insert concurrency does.

I think the right things to do here are (1) document that --chunk-concurrent-size > 1 wants innodb_autoinc_lock_mode = 2, and ideally detect mode 0/1 on an auto-increment table at startup and warn that the speedup won't materialize; (2) we need to fix the gh-ost loop such that it's not capped internally

Does that match what you were seeing?

dnovitski requested review from meiji163, rashiq and timvaillancourt as code owners May 22, 2026 09:01

dnovitski force-pushed the pr-1398-rebased branch 4 times, most recently from 4e3c02d to 18f91e8 Compare May 22, 2026 11:15

dnovitski force-pushed the pr-1398-rebased branch from 18f91e8 to de32943 Compare May 22, 2026 11:23

dnovitski mentioned this pull request May 23, 2026

Multithreaded replication, parallel row-copy with DML merge, frontier filter, and heartbeat lag throttle #1665

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add --chunk-concurrent-size for parallel row-copy#1688

Add --chunk-concurrent-size for parallel row-copy#1688
dnovitski wants to merge 1 commit into
github:masterfrom
dnovitski:pr-1398-rebased

dnovitski commented May 22, 2026 •

edited

Loading

Uh oh!

shaohk commented Jun 7, 2026

Uh oh!

dnovitski commented Jun 8, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dnovitski commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Performance Results

Key Design Decisions

Changes from original #1398

Testing

Checklist

Uh oh!

shaohk commented Jun 7, 2026

Uh oh!

dnovitski commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dnovitski commented May 22, 2026 •

edited

Loading

dnovitski commented Jun 8, 2026 •

edited

Loading