Skip to content

Isolate plugins in an out-of-process COM host#40120

Closed
benhillis wants to merge 3 commits into
masterfrom
user/benhill/plugin_host
Closed

Isolate plugins in an out-of-process COM host#40120
benhillis wants to merge 3 commits into
masterfrom
user/benhill/plugin_host

Conversation

@benhillis

@benhillis benhillis commented Apr 6, 2026

Copy link
Copy Markdown
Member

Summary

WSL plugin DLLs are moved out of wslservice.exe into a separate wslpluginhost.exe COM server so plugin code can no longer crash or destabilize the service. Each plugin is activated in its own host process (CLSCTX_LOCAL_SERVER, SYSTEM-only via AppID) and reached through a versioned COM interface defined in WslPluginHost.idl. All hosts are tied to a service-owned job object and terminate when wslservice exits.

The plugin API is unchanged and existing plugins run unmodified. Proxy/stub is consolidated into wslserviceproxystub.dll — one new exe, no new DLLs.

Detailed Description

Host-crash classification

A crashing or disconnected host is classified by IsHostCrash and surfaced as "host died, log and continue" rather than a fatal plugin error:

RPC_E_DISCONNECTED, RPC_E_SERVER_DIED, RPC_E_SERVER_DIED_DNE, CO_E_OBJNOTCONNECTED, RPC_S_SERVER_UNAVAILABLE, RPC_S_CALL_FAILED, RPC_S_CALL_FAILED_DNE.

RPC_E_CALL_REJECTED is intentionally not classified as a host crash: it is a transient COM busy state (an STA message filter rejecting a call), not a "server process died" signal. The plugin host is MTA with no message filter, so it should not occur here; treating it as a crash would silently skip future legitimate calls.

Callback re-entrancy

Plugin→service callbacks (MountFolder, ExecuteBinary, and the WSLC session APIs) arrive on a different COM thread than the outbound hook, so they cannot re-enter the lock held during the hook:

  • VM pathLxssUserSessionImpl guards callbacks with a shared_mutex: shared for callbacks, exclusive in _VmTerminate after OnVmStopping drains in-flight callbacks before the utility VM is destroyed.
  • WSLC pathPluginManager resolves sessions through its own reference map under a dedicated lock, and WSLCSessionManager releases its session lock before any plugin notification fires, so callbacks never re-enter the session lock.

New tests

  • HostCrashIsolation — kills wslpluginhost.exe mid-OnVmStarted and verifies the service survives and m_initOnce stays sticky.
  • ConcurrentCallbacks — four plugin threads behind a start-gate hammer MountFolder + ExecuteBinary, exercising the shared callback lock.
  • AsyncApiCallFromWorker — a plugin worker thread calls into the service post-hook (cross-apartment, non-COM-initialized thread).
  • CallbacksDuringTerminationDoNotCrash — detached workers race _VmTerminate's exclusive lock acquire / VM teardown, exiting deterministically via an OnVmStopping-set stop signal.

Existing WSL1 plugin tests were broadened alongside the refactor.

Validation

  • Full Windows build (x64 debug) clean.
  • bin\x64\debug\test.bat -f /name:*Plugin* — all plugin tests pass.
  • FormatSource.ps1 clean on changed files.

Copilot AI review requested due to automatic review settings April 6, 2026 16:26

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR moves WSL plugins from in-process LoadLibrary inside wslservice.exe to isolated out-of-process wslpluginhost.exe COM local servers, aiming to keep the service alive even if a plugin crashes.

Changes:

  • Introduces wslpluginhost.exe (COM local server) that loads one plugin DLL and forwards lifecycle events, while proxying plugin API callbacks back to the service.
  • Adds new COM contracts (IWslPluginHost, IWslPluginHostCallback) and consolidates proxy/stub generation into wslserviceproxystub.dll.
  • Updates service-side plugin management and adds a new shared_mutex path intended to avoid re-entrancy/deadlock on COM RPC callback threads.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/windows/wslpluginhost/exe/resource.h Adds resource IDs for new host executable.
src/windows/wslpluginhost/exe/PluginHost.h Declares COM class implementing IWslPluginHost plus API callback stubs.
src/windows/wslpluginhost/exe/PluginHost.cpp Implements plugin DLL loading, lifecycle dispatch, and callback forwarding to service.
src/windows/wslpluginhost/exe/main.rc Adds version/icon resources for wslpluginhost.exe.
src/windows/wslpluginhost/exe/main.cpp Implements COM local server entrypoint and class factory registration.
src/windows/wslpluginhost/exe/CMakeLists.txt Adds build target for wslpluginhost.exe.
src/windows/wslpluginhost/CMakeLists.txt Wires new subdirectory into build.
src/windows/service/stub/CMakeLists.txt Adds MIDL proxy/stub sources for WslPluginHost.idl into wslserviceproxystub.
src/windows/service/inc/WslPluginHost.idl Defines out-of-proc COM interfaces for plugin hosting + callbacks.
src/windows/service/inc/CMakeLists.txt Adds new wslpluginhostidl MIDL generation target.
src/windows/service/exe/PluginManager.h Refactors plugin manager to track out-of-proc hosts and adds callback implementation type.
src/windows/service/exe/PluginManager.cpp Implements COM activation of hosts, job object assignment, and service-side callback handlers.
src/windows/service/exe/LxssUserSession.h Adds shared_mutex and makes plugin-callback methods private/friend-only.
src/windows/service/exe/LxssUserSession.cpp Switches plugin callback locking from m_instanceLock to m_callbackLock and gates VM teardown.
src/windows/service/exe/CMakeLists.txt Adds dependency on wslpluginhostidl.
src/windows/common/precomp.h Adds <shared_mutex> include for new locking.
msipackage/package.wix.in Installs wslpluginhost.exe and registers COM AppID/CLSID/interfaces for activation and proxy/stub.
msipackage/CMakeLists.txt Adds wslpluginhost.exe to packaged binaries and build dependencies.
CMakeLists.txt Adds subdirectory for wslpluginhost and adjusts global include directories.
Comments suppressed due to low confidence (2)

src/windows/service/exe/LxssUserSession.cpp:3644

  • CreateLinuxProcess now only takes m_callbackLock, but it calls _RunningInstance(), which is annotated Requires_lock_held(m_instanceLock) and reads m_runningInstances. This is both a locking-contract violation and can race with writers that still use m_instanceLock only. Refactor so callback code can safely read the running-instance map (e.g., provide a callback-safe lookup guarded by m_callbackLock, and ensure all writes to m_runningInstances/m_utilityVm also take m_callbackLock exclusively after m_instanceLock per the stated lock ordering).
    // Shared lock prevents _VmTerminate from destroying the VM or instances
    // while we use them. See MountRootNamespaceFolder for rationale.
    std::shared_lock lock(m_callbackLock);
    RETURN_HR_IF(E_NOT_VALID_STATE, !m_utilityVm);

    if (Distro == nullptr)
    {
        *Socket = m_utilityVm->CreateRootNamespaceProcess(Path, Arguments).release();
    }
    else
    {
        const auto distro = _RunningInstance(Distro);
        THROW_HR_IF(WSL_E_VM_MODE_INVALID_STATE, !distro);

        const auto wsl2Distro = dynamic_cast<WslCoreInstance*>(distro.get());

src/windows/service/exe/LxssUserSession.cpp:2614

  • m_runningInstances is updated here without taking m_callbackLock, but plugin callbacks now read m_runningInstances under m_callbackLock (and intentionally do not take m_instanceLock). Because these are different locks, this doesn’t provide synchronization and can lead to data races/UB when a callback runs concurrently with instance creation/termination. Writers that mutate m_runningInstances (and m_utilityVm if accessed by callbacks) need to also take m_callbackLock (exclusive) in the documented order m_instanceLock → m_callbackLock.
                // This needs to be done before plugins are notified because they might try to run a command inside the distribution.
                m_runningInstances[registration.Id()] = instance;

                if (version == LXSS_WSL_VERSION_2)
                {
                    auto cleanupOnFailure =
                        wil::scope_exit_log(WI_DIAGNOSTICS_INFO, [&]() { m_runningInstances.erase(registration.Id()); });
                    m_pluginManager.OnDistributionStarted(&m_session, instance->DistributionInformation());

Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp
Comment thread src/windows/wslpluginhost/exe/PluginHost.h Outdated
Comment thread src/windows/service/exe/PluginManager.cpp Outdated
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp Outdated
Comment thread src/windows/wslpluginhost/exe/CMakeLists.txt
Comment thread CMakeLists.txt Outdated
Comment thread src/windows/wslpluginhost/exe/main.cpp Outdated
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp Outdated
Copilot AI review requested due to automatic review settings April 6, 2026 17:35

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/windows/service/exe/LxssUserSession.cpp:3643

  • CreateLinuxProcess now only takes m_callbackLock, but it reads m_runningInstances via _RunningInstance(), which is annotated Requires_lock_held(m_instanceLock) and the map itself is Guarded_by(m_instanceLock). This is a real race/contract violation (and can also break static analysis). You’ll need a consistent locking strategy for callback threads (e.g., make all accesses/mutations of m_runningInstances + m_utilityVm also take m_callbackLock in the documented order, or refactor callbacks to avoid touching m_runningInstances without m_instanceLock).
    // Shared lock prevents _VmTerminate from destroying the VM or instances
    // while we use them. See MountRootNamespaceFolder for rationale.
    std::shared_lock lock(m_callbackLock);
    RETURN_HR_IF(E_NOT_VALID_STATE, !m_utilityVm);

    if (Distro == nullptr)
    {
        *Socket = m_utilityVm->CreateRootNamespaceProcess(Path, Arguments).release();
    }
    else
    {
        const auto distro = _RunningInstance(Distro);
        THROW_HR_IF(WSL_E_VM_MODE_INVALID_STATE, !distro);

Comment thread src/windows/service/exe/LxssUserSession.h
Comment thread src/windows/wslpluginhost/exe/main.cpp
@benhillis benhillis force-pushed the user/benhill/plugin_host branch from 3418f7e to 3f03d4f Compare April 6, 2026 17:48
Copilot AI review requested due to automatic review settings April 6, 2026 18:16

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 9 comments.

Comments suppressed due to low confidence (1)

src/windows/service/exe/LxssUserSession.cpp:3653

  • CreateLinuxProcess now only takes m_callbackLock but calls _RunningInstance(Distro) (which is annotated _Requires_lock_held_(m_instanceLock) and touches m_lockedDistributions). This violates the locking contract and can race with instance state changes or deadlock if someone later adds the required lock. Consider adding a callback-safe lookup that is guarded by m_callbackLock only (and does not call _EnsureNotLocked), or refactoring _RunningInstance / the guarded annotations so callback code never needs m_instanceLock.
    if (Distro == nullptr)
    {
        *Socket = m_utilityVm->CreateRootNamespaceProcess(Path, Arguments).release();
    }
    else
    {
        const auto distro = _RunningInstance(Distro);
        THROW_HR_IF(WSL_E_VM_MODE_INVALID_STATE, !distro);

        const auto wsl2Distro = dynamic_cast<WslCoreInstance*>(distro.get());
        THROW_HR_IF(WSL_E_WSL2_NEEDED, !wsl2Distro);

Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp
Comment thread src/windows/service/exe/PluginManager.cpp
Comment thread src/windows/service/exe/PluginManager.cpp
Comment thread src/windows/service/exe/PluginManager.cpp Outdated
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp
Comment thread src/windows/wslpluginhost/exe/main.cpp
@benhillis

Copy link
Copy Markdown
Member Author

Hey @benhillis 👋 — Following up on this PR. It currently has merge conflicts that need to be resolved, and the CI build didn't run (shows action_required). There are also 20 unresolved review threads, including some security-relevant findings (TOCTOU on DLL signature validation, PROCESS_ALL_ACCESS over-privilege, potential UAF on g_pluginHost). Is this change still actively being worked on? If so, could you rebase to resolve the conflicts and address the outstanding review feedback?

Copilot AI review requested due to automatic review settings April 7, 2026 04:30

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.

Comment thread src/windows/wslpluginhost/exe/main.cpp Outdated
Comment thread src/windows/wslpluginhost/exe/CMakeLists.txt
Comment thread src/windows/service/exe/LxssUserSession.cpp
Comment thread src/windows/service/exe/LxssUserSession.cpp
Comment thread src/windows/service/exe/LxssUserSession.h
@benhillis benhillis force-pushed the user/benhill/plugin_host branch from f32e629 to 5641289 Compare April 7, 2026 04:48
Copilot AI review requested due to automatic review settings April 7, 2026 05:00

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 4 comments.

Comment thread src/windows/service/exe/PluginManager.cpp
Comment thread src/windows/service/exe/LxssUserSession.cpp
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp Outdated
Comment thread src/windows/wslpluginhost/exe/main.cpp Outdated
@benhillis

Copy link
Copy Markdown
Member Author

CI is all green now and conflicts are resolved. Would love to get some eyes on this when someone has a chance — the main design change is moving plugin loading out of wslservice into separate COM hosts so a bad plugin can't take down the service.

Copilot AI review requested due to automatic review settings April 10, 2026 04:07
@benhillis benhillis force-pushed the user/benhill/plugin_host branch from 3492d6b to 57501d9 Compare April 10, 2026 04:07

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 8 comments.

Comment thread src/windows/service/exe/PluginManager.cpp
Comment thread src/windows/service/exe/PluginManager.cpp
Comment thread src/windows/service/exe/PluginManager.cpp
Comment thread src/windows/service/exe/LxssUserSession.h
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp Outdated
Comment thread test/windows/PluginTests.cpp Outdated
Comment thread test/windows/PluginTests.cpp
Comment thread src/windows/service/exe/LxssUserSession.cpp
@benhillis benhillis marked this pull request as ready for review April 10, 2026 15:29
@benhillis benhillis requested a review from a team as a code owner April 10, 2026 15:29
@benhillis benhillis force-pushed the user/benhill/plugin_host branch 2 times, most recently from ed98200 to c66d3d7 Compare April 13, 2026 16:15
Copilot AI review requested due to automatic review settings April 13, 2026 16:15

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 8 comments.

Comment thread src/windows/service/exe/PluginManager.cpp
Comment thread src/windows/service/exe/PluginManager.cpp
Comment thread src/windows/service/exe/PluginManager.cpp Outdated
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp Outdated
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp Outdated
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp Outdated
Comment thread test/windows/PluginTests.cpp
Comment thread test/windows/PluginTests.cpp

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Comment thread src/windows/service/exe/PluginManager.cpp Outdated
Comment thread src/windows/service/exe/PluginManager.cpp Outdated
Comment thread src/windows/service/exe/PluginManager.cpp Outdated
Comment thread src/windows/service/exe/PluginManager.cpp Outdated
Comment thread src/windows/service/exe/PluginManager.cpp
Comment thread src/windows/service/exe/PluginManager.cpp Outdated
Comment thread src/windows/service/exe/PluginManager.cpp Outdated
@benhillis benhillis force-pushed the user/benhill/plugin_host branch from c13fe64 to e59fe72 Compare June 3, 2026 19:21
Copilot AI review requested due to automatic review settings June 5, 2026 15:32
@benhillis benhillis force-pushed the user/benhill/plugin_host branch from e59fe72 to 40a7e2d Compare June 5, 2026 15:32

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 3 comments.

Comment thread src/windows/service/exe/WSLCSessionManager.cpp
Comment thread src/windows/service/exe/WSLCSessionManager.h
Comment thread src/windows/wslpluginhost/exe/PluginHost.cpp Outdated
@benhillis benhillis force-pushed the user/benhill/plugin_host branch from 40a7e2d to 8861ece Compare June 5, 2026 15:44

@OneBlue OneBlue left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through this, I think the COM interface & sub-process creation logic look good.

I think it would be worth exploring trying to run plugin callbacks in separate threads and do something like this:

class ThreadedPluginAPI
{
    struct PluginCall
    {
        std::function<HRESULT()> Call;
        std::future<HRESULT> Result;
    }

    HRESULT CreateProcess(...)
    {
         // Add plugin call to the queue
         auto future = create_future();
         auto it = m_calls.emplace_back(PluginCall([&](){ g_pluginManager.Createprocess(...), future);
    
         // Wait for the result and return it to the caller
         return it->second->get();
    }

     std::vector<PluginCall> m_calls;
}

Then, when we call a plugin callback, we could do something like:


void PluginManager::OnSessionCreated(...)
{
    // Create a new thread to call the plugin callbacks
    ThreadedPluginAPI api;

    std::thread callbackThread([&]() { plugin->OnSessionCreated(&api, ...);});
  
     // Process plugin API calls in the current thread
    while (...)
    {
        WaitForSingleObject(api.CallAvailable(),  callbackThread);
        
        if (callAvailable) // Run plugin call
        {
               api.RunPluginCall();
        }
        else
        { 
              break; // Callback thread has exited,
        }
    }
}

This approach would allow us to merge this without changes to LxssUserSession / WSLCSessionManager, because this would allow us to keep using recursive locks.

I'm proposing this because this change is quite big, and brings a lot of synchronization changes, which I'm a bit worried about, especially long term given that we're adding new mutexes, and code paths where resources are protected by mutex A or B

Copilot AI review requested due to automatic review settings June 8, 2026 17:42
@benhillis benhillis force-pushed the user/benhill/plugin_host branch from 8861ece to 542de0f Compare June 8, 2026 17:42

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 4 comments.

Comment on lines +469 to +474
std::vector<BYTE> PluginManager::SerializeSid(PSID Sid)
{
const DWORD sidLength = GetLengthSid(Sid);
std::vector<BYTE> buffer(sidLength);
CopySid(sidLength, buffer.data(), Sid);
return buffer;
Comment on lines +156 to +158
// Build a WSLSessionInformation struct from the marshaled parameters.
// The returned struct and its SID allocation are valid for the lifetime of the wil::unique_handle.
struct SessionContext
Comment on lines 805 to 810
/// <summary>
/// Contains the currently running utility VM's.
/// Contains the currently running instances.
/// Reads guarded by m_instanceLock OR m_callbackLock (shared).
/// Mutations require BOTH m_instanceLock AND m_callbackLock (exclusive).
/// </summary>
_Guarded_by_(m_instanceLock) std::map<GUID, std::shared_ptr<LxssRunningInstance>, wsl::windows::common::helpers::GuidLess> m_runningInstances;
Comment on lines 823 to 828
/// <summary>
/// The running utility vm for WSL2 distributions.
///
/// Reads guarded by m_instanceLock OR m_callbackLock (shared).
/// Mutations require BOTH m_instanceLock AND m_callbackLock (exclusive).
/// </summary>
_Guarded_by_(m_instanceLock) std::unique_ptr<WslCoreVm> m_utilityVm;
@benhillis benhillis force-pushed the user/benhill/plugin_host branch from 542de0f to 2cae48e Compare June 8, 2026 20:50
benhillis pushed a commit that referenced this pull request Jun 11, 2026
…rocess plugin host

Alternative to the m_callbackLock design in PR #40120, per @OneBlue's review.

Instead of guarding plugin callbacks with a new shared_mutex (m_callbackLock) and
a session-ref registry, this runs each outbound notification (host->On...) on a
worker thread and pumps the plugin's service-side API calls back onto the original
notifying thread, which already holds the session's recursive m_instanceLock. This
reproduces in-process re-entrancy and lets us delete the synchronization the PR added.

- New PluginCallPump primitive (Run/Invoke): the worker makes the COM notification
  call; the notifying thread pumps queued callback work and runs it under m_instanceLock.
- PluginManager: WSL notifications (OnVm*/OnDistribution*) route via RunHostNotification;
  WSL callbacks (MountFolder/ExecuteBinary[InDistribution]) route via a hybrid
  InvokeOnWslPump - pump when a hook is in flight, direct (own m_instanceLock) when not,
  preserving async out-of-hook callbacks.
- Revert m_callbackLock in LxssUserSessionImpl; MountRootNamespaceFolder/CreateLinuxProcess
  go back to the recursive m_instanceLock. Add LxssUserSessionImpl::TryInvokeUnderInstanceLock
  (timed try_lock_for on m_instanceLock) used by the direct path.
- Deadlock fix: a callback racing the pre-notification window (lock held, pump not yet
  registered) must not block on m_instanceLock - InvokeOnWslPump uses a timed acquire and
  re-checks for a pump, so the racing callback is routed onto the pump instead of dead-blocking.
- Hardening (from multi-model review): PluginCallPump::Invoke reports ran-vs-stopped
  out-of-band (no RPC_E_DISCONNECTED sentinel overload, avoids double-execution); the pump
  loop polls workerDone each iteration so callback traffic cannot starve the stop path;
  Run is exception-safe so std::thread creation failure never escapes into teardown hooks.
- WSLC session-ref registry left as-is (documented follow-up): its lock is non-recursive,
  notifications already fire outside it, and it is entangled with the create-veto protocol.
- Update test/plugin comments to describe the pump model (assertions unchanged and still pass).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
benhillis pushed a commit that referenced this pull request Jun 11, 2026
…rocess plugin host

Alternative to the m_callbackLock design in PR #40120, per @OneBlue's review.

Instead of guarding plugin callbacks with a new shared_mutex (m_callbackLock) and
a session-ref registry, this runs each outbound notification (host->On...) on a
worker thread and pumps the plugin's service-side API calls back onto the original
notifying thread, which already holds the session's recursive m_instanceLock. This
reproduces in-process re-entrancy and lets us delete the synchronization the PR added.

- New PluginCallPump primitive (Run/Invoke): the worker makes the COM notification
  call; the notifying thread pumps queued callback work and runs it under m_instanceLock.
- PluginManager: WSL notifications (OnVm*/OnDistribution*) route via RunHostNotification;
  WSL callbacks (MountFolder/ExecuteBinary[InDistribution]) route via a hybrid
  InvokeOnWslPump - pump when a hook is in flight, direct (own m_instanceLock) when not,
  preserving async out-of-hook callbacks.
- Revert m_callbackLock in LxssUserSessionImpl; MountRootNamespaceFolder/CreateLinuxProcess
  go back to the recursive m_instanceLock. Add LxssUserSessionImpl::TryInvokeUnderInstanceLock
  (timed try_lock_for on m_instanceLock) used by the direct path.
- Deadlock fix: a callback racing the pre-notification window (lock held, pump not yet
  registered) must not block on m_instanceLock - InvokeOnWslPump uses a timed acquire and
  re-checks for a pump, so the racing callback is routed onto the pump instead of dead-blocking.
- Hardening (from multi-model review): PluginCallPump::Invoke reports ran-vs-stopped
  out-of-band (no RPC_E_DISCONNECTED sentinel overload, avoids double-execution); the pump
  loop polls workerDone each iteration so callback traffic cannot starve the stop path;
  Run is exception-safe so std::thread creation failure never escapes into teardown hooks.
- WSLC session-ref registry left as-is (documented follow-up): its lock is non-recursive,
  notifications already fire outside it, and it is entangled with the create-veto protocol.
- Update test/plugin comments to describe the pump model (assertions unchanged and still pass).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
WSL plugin DLLs are moved out of wslservice.exe into a separate
wslpluginhost.exe COM server so plugin code can no longer crash or
destabilize the service. Each plugin is activated in its own host
process (CLSCTX_LOCAL_SERVER, SYSTEM-only via AppID) and reached
through a versioned COM interface defined in WslPluginHost.idl. All
hosts are tied to a service-owned job object and terminate when
wslservice exits. The plugin API is unchanged; existing plugins run
unmodified.

A crashing or disconnected host is classified by IsHostCrash
(RPC_E_DISCONNECTED, RPC_E_SERVER_DIED[_DNE], CO_E_OBJNOTCONNECTED,
RPC_S_SERVER_UNAVAILABLE, RPC_S_CALL_FAILED[_DNE]); the service logs
it and continues instead of treating it as a fatal plugin error.
RPC_E_CALL_REJECTED is intentionally excluded as a transient busy
state rather than a dead host.

Plugin->service callbacks (MountFolder, ExecuteBinary, and the WSLC
session APIs) arrive on a different COM thread than the outbound hook,
so they cannot re-enter the lock held during the hook:
- VM path: LxssUserSessionImpl guards callbacks with a shared_mutex
  (shared for callbacks, exclusive in _VmTerminate after OnVmStopping
  drains in-flight callbacks before the utility VM is destroyed).
- WSLC path: PluginManager resolves sessions through its own
  reference map under a dedicated lock, and WSLCSessionManager
  releases its session lock before any plugin notification fires, so
  callbacks never re-enter the session lock. A session is registered
  in the reference map but not published until OnWslcSessionCreated
  succeeds, so a vetoed or race-lost session is never handed out.

Proxy/stub is consolidated into wslserviceproxystub.dll. One new exe,
no new DLLs.

Tests
- HostCrashIsolation: kills wslpluginhost.exe mid-OnVmStarted and
  verifies the service survives and m_initOnce stays sticky.
- ConcurrentCallbacks: four plugin threads hammer MountFolder and
  ExecuteBinary, exercising the shared callback lock.
- AsyncApiCallFromWorker: a plugin worker thread calls into the
  service post-hook (cross-apartment, non-COM-initialized).
- CallbacksDuringTerminationDoNotCrash: worker threads race
  _VmTerminate's exclusive lock and VM teardown, then wind down
  deterministically after OnVmStopping signals them and are joined on
  the next session start.
- Existing WSL1 plugin tests broadened alongside the refactor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 11, 2026 06:13
@benhillis benhillis force-pushed the user/benhill/plugin_host branch from 2cae48e to f8e4df2 Compare June 11, 2026 06:13

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

Comment on lines +62 to +65
HRESULT Result() const
{
return (initHr == RPC_E_CHANGED_MODE) ? S_OK : initHr;
}
benhillis pushed a commit that referenced this pull request Jun 11, 2026
…rocess plugin host

Alternative to the m_callbackLock design in PR #40120, per @OneBlue's review.

Instead of guarding plugin callbacks with a new shared_mutex (m_callbackLock) and
a session-ref registry, this runs each outbound notification (host->On...) on a
worker thread and pumps the plugin's service-side API calls back onto the original
notifying thread, which already holds the session's recursive m_instanceLock. This
reproduces in-process re-entrancy and lets us delete the synchronization the PR added.

- New PluginCallPump primitive (Run/Invoke): the worker makes the COM notification
  call; the notifying thread pumps queued callback work and runs it under m_instanceLock.
- PluginManager: WSL notifications (OnVm*/OnDistribution*) route via RunHostNotification;
  WSL callbacks (MountFolder/ExecuteBinary[InDistribution]) route via a hybrid
  InvokeOnWslPump - pump when a hook is in flight, direct (own m_instanceLock) when not,
  preserving async out-of-hook callbacks.
- Revert m_callbackLock in LxssUserSessionImpl; MountRootNamespaceFolder/CreateLinuxProcess
  go back to the recursive m_instanceLock. Add LxssUserSessionImpl::TryInvokeUnderInstanceLock
  (timed try_lock_for on m_instanceLock) used by the direct path.
- Deadlock fix: a callback racing the pre-notification window (lock held, pump not yet
  registered) must not block on m_instanceLock - InvokeOnWslPump uses a timed acquire and
  re-checks for a pump, so the racing callback is routed onto the pump instead of dead-blocking.
- Hardening (from multi-model review): PluginCallPump::Invoke reports ran-vs-stopped
  out-of-band (no RPC_E_DISCONNECTED sentinel overload, avoids double-execution); the pump
  loop polls workerDone each iteration so callback traffic cannot starve the stop path;
  Run is exception-safe so std::thread creation failure never escapes into teardown hooks.
- WSLC session-ref registry left as-is (documented follow-up): its lock is non-recursive,
  notifications already fire outside it, and it is entangled with the create-veto protocol.
- Update test/plugin comments to describe the pump model (assertions unchanged and still pass).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@benhillis benhillis force-pushed the user/benhill/plugin_host branch from f8e4df2 to 13b4be8 Compare June 11, 2026 06:44
- Pass the kill-on-close job object into IWslPluginHost::Initialize and drop
  GetProcessHandle so the host assigns itself to the job before running any
  plugin code.
- Return an IWSLCProcess reference from WslcCreateProcess instead of opaque
  process cookies; remove the cookie bookkeeping methods.
- Make plugin host crashes fatal: a crash during a start/veto hook (or at
  load) blocks the WSL operation with a recorded, plugin-named error, matching
  pre-refactor behavior. Latch the first crash so subsequent operations fail
  fast with one consistent error instead of repeatedly driving a dead host
  (m_pluginError guarded by m_pluginErrorLock). Teardown hooks latch but stay
  non-fatal. Re-activation/non-fatal resilience is a follow-up.
- Drop the sidData empty-guards so WSL and WSLC hooks pass the serialized SID
  consistently.
- Rework the HostCrash test to assert the new fatal behavior.
- Fix a use-after-free in the plugin host API stubs: a plugin worker thread
  could load g_pluginHost and clear the null-check just as ~PluginHost cleared
  the pointer and freed m_callback / unloaded the plugin DLL, then dereference
  freed memory. Guard host lifetime with a process-global wil::srwlock: stubs
  hold it shared for the call (via a ScopedHostRef helper) and the destructor
  takes it exclusive before clearing g_pluginHost, so teardown drains in-flight
  stubs and any later stub observes null and returns E_UNEXPECTED.
- Drop the misleading [unique] from the WSLC hooks' scalar HANDLE UserToken in
  the IDL. [unique] is a no-op on a scalar (MIDL ignores it, so it compiled
  byte-identically), and it falsely implied an optional-token contract that
  was never on the wire. The WSL hooks already declare the token without it;
  this makes the two consistent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 2 comments.

Comment on lines +140 to +152
DeadSessions.reserve(DeadSessions.size() + 1);
try
{
DeadSessions.push_back(std::move(entry));
}
catch (...)
{
// Defensive: if queuing for the deferred stopping dispatch
// still fails, keep the session tracked and reap it on a
// later pass rather than dropping it without unregistering.
LOG_CAUGHT_EXCEPTION();
return false;
}
Comment on lines +469 to 475
std::vector<BYTE> PluginManager::SerializeSid(PSID Sid)
{
const DWORD sidLength = GetLengthSid(Sid);
std::vector<BYTE> buffer(sidLength);
CopySid(sidLength, buffer.data(), Sid);
return buffer;
}
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@benhillis

Copy link
Copy Markdown
Member Author

Superseded by #40769, which uses the same out-of-process plugin host but replaces this PR's dual m_callbackLock + session-ref registry with a single-lock threaded callback pump (per review feedback). Closing in favor of #40769.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

msix Installer issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants