_mi_heap_delayed_free_all can spin forever when a page is stuck in MI_DELAYED_FREEING (priority-inversion livelock on a core-pinned, priority-scheduled platform)

### Summary

`_mi_heap_delayed_free_all()` busy-spins (`while (!_mi_heap_delayed_free_partial(heap)) mi_atomic_yield();`) until *every* thread-delayed block can be freed. A block cannot be freed while its page is transiently in `MI_DELAYED_FREEING` (set by a foreign thread in `mi_free_block_delayed_mt`). On a platform with **strict priority-based, per-core preemptive scheduling and threads pinned to cores**, if the spinning thread is *higher* priority than the foreign thread that parked the page and they share a core, the foreign thread **never gets the core**, the flag is never cleared, and the spin never ends. `mi_atomic_yield()` cannot break this — it does not make the higher-priority spinner non-runnable.

This reproduces as an intermittent full freeze in a long-running test; the spinning thread is `mi_heap_collect` → `_mi_heap_delayed_free_all` reached from the periodic collect inside `mi_malloc_generic`.

### Environment

- **mimalloc:** 2.2.7 (`MI_MALLOC_VERSION 227`).
- **Build:** compiled as C++. On our AArch64 target, `mi_atomic_yield()` resolves to the aarch64 branch `__asm__ volatile("wfe")` (`include/mimalloc/atomic.h`). The point below holds for the C++ `std::this_thread::yield()` branch as well — see "Why yield can't fix it."
- **Platform:** an AArch64 game console with a strict priority-based preemptive scheduler. Relevant scheduler semantics:
  - Per core, the highest-priority *runnable* thread always runs; a lower-priority thread **cannot run on that core while a higher-priority thread is runnable there**.
  - `yield` hands the core only to another runnable thread of the **same** priority.
  - Round-robin time-slicing applies only within the **lowest** priority band.
  - A lower-priority thread may still run on a **different** core.
  - Threads are affinitized to specific cores.

### Observed stack (the spinning thread)

```
_mi_page_try_use_delayed_free   page.c:152    <- spinning (load/CAS retry; page is MI_DELAYED_FREEING)
_mi_free_delayed_block          free.c:199
_mi_heap_delayed_free_partial   page.c:331
_mi_heap_delayed_free_all       page.c:315    <- unbounded `while (!partial) mi_atomic_yield()`
mi_heap_collect_ex              heap.c:157    (collect == MI_NORMAL)
mi_heap_collect                 heap.c:187
mi_malloc_generic               page.c:1013   (periodic collect, every `generic_collect` allocs)
... application allocation (zlib deflateInit during a normal allocation) ...
```

`collect == MI_NORMAL`, i.e. this is the best-effort periodic collect, not teardown.

### Root cause

Page delayed-free state (`xthread_free` low bits, `types.h`):

```
MI_USE_DELAYED_FREE   = 0   // full page: first foreign free goes to the owning heap's thread_delayed_free
MI_DELAYED_FREEING    = 1   // transient: a foreign thread is mid-push onto the owning heap
MI_NO_DELAYED_FREE    = 2   // a delayed free is already queued; further foreign frees go page-local
MI_NEVER_DELAYED_FREE = 3   // sticky (abandoned / force-abandon)
```

**Producer** — foreign thread B frees a block in a full page owned by thread A. `mi_free_block_delayed_mt` (`free.c:218`):
1. CAS `MI_USE_DELAYED_FREE → MI_DELAYED_FREEING` (`free.c:229`)
2. push the block onto A's `heap->thread_delayed_free` (`free.c:245-248`)
3. CAS `MI_DELAYED_FREEING → MI_NO_DELAYED_FREE` (`free.c:252-257`)

Between (1) and (3) the page is `MI_DELAYED_FREEING`. B holds no lock here, so the window is short **only while B keeps getting CPU**.

**Consumer** — owner A drains its list: `_mi_heap_delayed_free_all` (`page.c:314`) → `_mi_heap_delayed_free_partial` (`page.c:321`) → `_mi_free_delayed_block` (`free.c:186`) → `_mi_page_try_use_delayed_free(page, MI_USE_DELAYED_FREE, false)` (`page.c:146`). If the page is `MI_DELAYED_FREEING`, that inner call yields up to 4 times and returns **false** (`page.c:155-156`); `_mi_free_delayed_block` returns false; `_mi_heap_delayed_free_partial` re-queues the block and returns false; **`_mi_heap_delayed_free_all` yields and loops again — with no bound** (`page.c:315`).

So the wait ends only when *every* referenced page has left `MI_DELAYED_FREEING`, i.e. when every producer that touched A's full pages has reached step (3). If any such producer is not making forward progress, A spins indefinitely.

### Why `mi_atomic_yield()` cannot fix it here

Under the scheduler above, a higher-priority *runnable* thread keeps the core; a lower-priority thread on that core does not run. `_mi_heap_delayed_free_all` keeps A **runnable** the whole time:

- aarch64 branch — `wfe`: a CPU wait, not an OS call. It never enters the scheduler, never requeues the thread, and (with no paired `LDXR`/`SEV`) just naps until the next interrupt/event and resumes the **same** thread. It cannot let B run.
- C++ branch — `std::this_thread::yield()`: hands off only to **equal-or-higher** priority threads; it will not run the lower-priority B, and never makes A non-runnable.

Either way, if B is lower priority and pinned to A's core, B never reaches step (3) and the loop never terminates. If B happens to be on a different core it completes in parallel and A recovers — which is exactly why this is **intermittent** (it depends on where B is scheduled). Consistent with this: pausing the spinning thread in the debugger and resuming lets it proceed (the debugger perturbs scheduling so B runs).

The inner bounded retry does not help: `_mi_page_try_use_delayed_free` gives up after 4 yields (`page.c:156`), but `_mi_heap_delayed_free_all` re-arms it unconditionally. Forcing `_mi_page_try_use_delayed_free` to return `false` makes `_mi_heap_delayed_free_all` loop forever deterministically — showing the outer loop has no escape.

Note `mi_malloc_generic` already avoids this on the hot path: `page.c:1007` uses the non-looping `_mi_heap_delayed_free_partial` with the comment *"free delayed frees from other threads (but skip contended ones)"*. It is the `mi_heap_collect_ex` path (`heap.c:157`) that still loops.

### Questions

1. **Is `_mi_heap_delayed_free_all`'s unbounded spin meant to be safe when a producer may be unscheduled for an unbounded time?** On a priority-scheduled, core-pinned platform, the `(1)→(3)` window in `mi_free_block_delayed_mt` is no longer bounded by "B is briefly busy," and the outer loop has no give-up.

2. **Would you accept making the non-teardown collect use the partial (non-blocking) form,** reserving the full drain for the paths that actually require `thread_delayed_free == NULL` (`MI_ABANDON` — assert `heap.c:164`; `mi_heap_absorb` — assert `heap.c:463`; `_mi_page_force_abandon` — `page.c:415`)? This mirrors what `mi_malloc_generic` already does. Proposed patch:

   ```c
   // heap.c, mi_heap_collect_ex(), replacing the unconditional _mi_heap_delayed_free_all(heap):
   if (collect == MI_ABANDON) {
     _mi_heap_delayed_free_all(heap);      // teardown: must fully drain (assert below)
   } else {
     _mi_heap_delayed_free_partial(heap);  // best-effort; contended blocks are retried next time
   }
   ```

3. **Should the remaining must-drain spin loops escalate `mi_atomic_yield()` to a real descheduling wait** (e.g. a short sleep) after a few spins, so a lower-priority producer on the same core can complete step (3)? On platforms where `yield` cannot deschedule a higher-priority waiter, a pure spin (whether `wfe`, `std::this_thread::yield`, `YieldProcessor`, or `pause`) cannot break this inversion.

4. **Is the aarch64 `mi_atomic_yield = __asm__ volatile("wfe")` intended as a scheduling yield?** As a bare `WFE` (no `LDXR`-armed monitor, no `SEV` from the releaser) it is a low-power CPU pause that never reaches the OS scheduler, so it provides no scheduling fairness at all.

### What we did locally

We shipped the patch in Q2 (partial drain on non-`MI_ABANDON` collect) and added a spin counter in `_mi_heap_delayed_free_all` that warns every ~2048 iterations to surface the remaining must-drain paths. Happy to open a PR for the collect change and/or a `yield`-escalation if you're open to either approach.

### Possibly related

- microsoft/mimalloc#204 — assert `from->thread_delayed_free == NULL` during heap deletion (the full-drain post-condition the looping form exists to satisfy; relevant to Q2).
- microsoft/mimalloc#1213 — heap thread-safety / arena usage for a console port (same platform class).
- microsoft/mimalloc#541, #1055 — other reports in the cross-thread delayed-free path.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

_mi_heap_delayed_free_all can spin forever when a page is stuck in MI_DELAYED_FREEING (priority-inversion livelock on a core-pinned, priority-scheduled platform) #1317

Summary

Environment

Observed stack (the spinning thread)

Root cause

Why `mi_atomic_yield()` cannot fix it here

Questions

What we did locally

Possibly related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

_mi_heap_delayed_free_all can spin forever when a page is stuck in MI_DELAYED_FREEING (priority-inversion livelock on a core-pinned, priority-scheduled platform) #1317

Description

Summary

Environment

Observed stack (the spinning thread)

Root cause

Why mi_atomic_yield() cannot fix it here

Questions

What we did locally

Possibly related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Why `mi_atomic_yield()` cannot fix it here