Add opt-in MI_SINGLE_THREADED specialization for the free fast path #1322
Open
Alan-S-Andrade wants to merge 1 commit into
In a program that only ever uses mimalloc from a single thread, every freed block is guaranteed to be thread-local, so the per-free thread ownership check in mi_free_ex (a thread-id TLS read, a relaxed atomic load of segment->thread_id, and a compare) is redundant.

This adds an opt-in, off-by-default compile-time switch (-DMI_SINGLE_THREADED=1) that forces is_local=true and skips that check. The default build is unchanged (the #else path is identical to before and the full multi-threaded test suite still passes).

Measured on a single-threaded, allocation-heavy workload (mimalloc-bench alloc-test, 1 thread, pinned, interleaved median-of-11): ~0.4% faster wall time, with perf showing ~2.5% fewer branches and ~1.1% fewer instructions (the eliminated per-free ownership branch). cfrac/espresso output is byte-identical to baseline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
In a program that only ever uses mimalloc from a single thread, every freed block is guaranteed to be thread-local, so the per-free thread-ownership check in mi_free_ex (a thread-id TLS read, a relaxed atomic load of segment->thread_id, and a compare) is redundant.

This PR adds an opt-in, off-by-default compile-time switch -DMI_SINGLE_THREADED=1 that forces is_local = true and skips that check on the free fast path. It is motivated by recent literature on specializing general-purpose allocators for single-threaded use (e.g. ExGen-Malloc, "Old is Gold", IEEE CAL'25).

Safety / default build
The default build is unchanged: the #else path is identical to the previous code. The flag is a single-thread-only promise by the embedder (multi-threaded use is out of contract, exactly like other single-thread allocator builds). With the flag off, the full test suite (including the multi-threaded test-stress and test-stress-dynamic) passes.

Measurements
mimalloc-bench, single-threaded, pinned to one core, interleaved median-of-11 (baseline = current dev):

perf stat on alloc-test confirms the mechanism: ~2.5% fewer branches (~50M, the eliminated per-free ownership branch) and ~1.1% fewer instructions.

cfrac/espresso stdout is byte-identical to baseline.

The gain is modest but consistent on allocation-heavy single-threaded workloads, with zero cost or risk to the default multi-threaded build.
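For readers who want to see the shape of the change without opening the diff, the compile-time switch described in the summary can be sketched roughly as follows. This is an illustrative stand-in, not the code from this PR or from mimalloc: demo_segment_t, demo_thread_id, and demo_is_local are hypothetical names, and the thread id here is simply the address of a thread-local variable. Only the MI_SINGLE_THREADED macro name comes from the PR.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-in for a segment header; not mimalloc's real layout.
typedef struct demo_segment_s {
  _Atomic(uintptr_t) thread_id;  // id of the thread that owns this segment
} demo_segment_t;

// Hypothetical thread id: the address of a thread-local variable,
// which differs between threads that are alive at the same time.
static _Thread_local char demo_tls_anchor;

static inline uintptr_t demo_thread_id(void) {
  return (uintptr_t)&demo_tls_anchor;  // this is the TLS read
}

// Decide whether a block in `segment` is being freed by its owning thread.
static inline bool demo_is_local(demo_segment_t* segment) {
#if defined(MI_SINGLE_THREADED) && MI_SINGLE_THREADED
  // Opt-in build: the embedder promises single-threaded use,
  // so every free is local and the check is skipped entirely.
  (void)segment;
  return true;
#else
  // Default build: TLS read, relaxed atomic load, and a compare.
  uintptr_t owner = atomic_load_explicit(&segment->thread_id,
                                         memory_order_relaxed);
  return demo_thread_id() == owner;
#endif
}
```

Compiled without the flag, demo_is_local performs the three steps named in the summary; compiled with -DMI_SINGLE_THREADED=1, it folds to a constant, which is where the removed per-free branch would come from under these assumptions.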