Skip to content

fix(schedules): grant legacy auth for run schedules#344

Merged
jromualdez-scale merged 2 commits into
mainfrom
jerome/fix-schedule-legacy-auth
Jul 1, 2026
Merged

fix(schedules): grant legacy auth for run schedules#344
jromualdez-scale merged 2 commits into
mainfrom
jerome/fix-schedule-legacy-auth

Conversation

@jromualdez-scale

@jromualdez-scale jromualdez-scale commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Preserve Spark-style schedule resource registration while also granting the schedule directly for legacy SGP auth.
  • Ensure schedule create rollback and delete cleanup revoke legacy permissions as well as deregister Spark resources.
  • Cover create, rollback, grant-failure compensation, no-creator, and delete cleanup behavior in focused unit tests.

Context

Legacy agentex-auth currently runs with AUTH_PROVIDER=sgp, where register_resource is a no-op. Run schedules were only writing Spark-style registration, so schedule rows and Temporal clocks could exist while legacy schedule.read search/check returned no permission, causing list endpoints to return 200 [].

This bridges current legacy SGP auth while keeping register_resource / deregister_resource in place for the eventual Spark migration.

Test plan

  • uv run --group test pytest tests/unit/services/test_agent_run_schedule_service.py
  • uv run ruff check src/domain/services/agent_run_schedule_service.py tests/unit/services/test_agent_run_schedule_service.py

Notes

Existing schedules created before this change may need a one-time legacy grant backfill or recreation to become visible through schedule list/get under legacy SGP auth.

Made with Cursor

Greptile Summary

This PR bridges the gap where register_resource is a no-op under the legacy SGP auth provider (AUTH_PROVIDER=sgp), causing schedule.read to return no permission and list endpoints to return 200 []. It adds a grant call after register_resource in the creation path and a revoke call before deregister_resource in the cleanup path, keeping both Spark-style and SGP-style auth in sync.

  • _register_schedule_in_auth now calls grant after register_resource, with a compensation block that calls deregister_resource on grant failure before re-raising; the registered flag in create_schedule remains False on any failure so the outer rollback doesn't double-deregister.
  • _deregister_schedule_from_auth now prepends a best-effort revoke call (in its own try/except) before the existing deregister_resource, so both auth paths are cleaned up on delete or rollback.
  • Tests cover the happy path, grant-failure compensation, registration-failure abort, Temporal-failure rollback, no-creator skip, and delete cleanup — with precise assert_called_once_with assertions on the new grant and revoke calls.

Confidence Score: 5/5

Safe to merge — the dual-auth bridge is correctly wired, the registered flag prevents double-cleanup, and the new test cases verify all compensation paths.

The rollback semantics are sound: registered stays False if _register_schedule_in_auth raises (whether from register_resource or grant), so the outer cleanup never calls _deregister_schedule_from_auth when the inner compensation already handled it. The grant-failure path correctly skips revoke (nothing was granted). The delete path calls revoke then deregister_resource in independent try/except blocks, so a failure in one doesn't prevent the other. Tests are precise and cover the newly introduced failure modes.

No files require special attention.

Important Files Changed

Filename Overview
agentex/src/domain/services/agent_run_schedule_service.py Adds legacy SGP grant/revoke calls alongside the existing Spark register/deregister calls; compensation logic and rollback flag semantics are correct — grant failure triggers deregister (not revoke), and the registered flag stays False so the outer rollback never double-cleans.
agentex/tests/unit/services/test_agent_run_schedule_service.py New tests cover grant-failure compensation, no-creator skip, and upgraded assertions on existing rollback/delete tests to use assert_called_once_with for both grant and revoke; all scenarios are consistent with implementation behavior.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant CS as create_schedule
    participant RA as _register_schedule_in_auth
    participant AS as authorization_service
    participant TA as temporal_adapter
    participant DR as _deregister_schedule_from_auth

    CS->>RA: _register_schedule_in_auth(authz_selector, agent_id)
    RA->>AS: "register_resource(schedule_resource, parent=agent_resource)"
    alt register_resource fails
        AS-->>RA: raises Exception
        RA-->>CS: "raises (registered=False)"
        CS->>CS: _best_effort_delete_row()
    else register_resource succeeds
        RA->>AS: grant(schedule_resource)
        alt grant fails
            AS-->>RA: raises Exception
            RA->>AS: deregister_resource(schedule_resource) [compensation]
            RA-->>CS: "raises (registered=False)"
            CS->>CS: _best_effort_delete_row()
        else grant succeeds
            RA-->>CS: "returns True (registered=True)"
            CS->>TA: create_schedule(...)
            alt Temporal fails
                TA-->>CS: raises Exception
                CS->>DR: _deregister_schedule_from_auth(authz_selector)
                DR->>AS: revoke(schedule_resource)
                DR->>AS: deregister_resource(schedule_resource)
                CS->>CS: _best_effort_delete_row()
            else Temporal succeeds
                TA-->>CS: success
                CS-->>CS: return response
            end
        end
    end
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant CS as create_schedule
    participant RA as _register_schedule_in_auth
    participant AS as authorization_service
    participant TA as temporal_adapter
    participant DR as _deregister_schedule_from_auth

    CS->>RA: _register_schedule_in_auth(authz_selector, agent_id)
    RA->>AS: "register_resource(schedule_resource, parent=agent_resource)"
    alt register_resource fails
        AS-->>RA: raises Exception
        RA-->>CS: "raises (registered=False)"
        CS->>CS: _best_effort_delete_row()
    else register_resource succeeds
        RA->>AS: grant(schedule_resource)
        alt grant fails
            AS-->>RA: raises Exception
            RA->>AS: deregister_resource(schedule_resource) [compensation]
            RA-->>CS: "raises (registered=False)"
            CS->>CS: _best_effort_delete_row()
        else grant succeeds
            RA-->>CS: "returns True (registered=True)"
            CS->>TA: create_schedule(...)
            alt Temporal fails
                TA-->>CS: raises Exception
                CS->>DR: _deregister_schedule_from_auth(authz_selector)
                DR->>AS: revoke(schedule_resource)
                DR->>AS: deregister_resource(schedule_resource)
                CS->>CS: _best_effort_delete_row()
            else Temporal succeeds
                TA-->>CS: success
                CS-->>CS: return response
            end
        end
    end
Loading

Reviews (2): Last reviewed commit: "test(schedules): tighten legacy auth cle..." | Re-trigger Greptile

Co-authored-by: Cursor <cursoragent@cursor.com>
@jromualdez-scale

Copy link
Copy Markdown
Contributor Author

Implementation note: this is intentionally a bridge for the current legacy SGP auth path. register_resource(schedule, parent=agent) stays in place for Spark resource registration, but legacy SGP treats that lifecycle API as a no-op, so this PR also writes a direct grant(schedule) and cleans it up with revoke(schedule).

Once Spark resource registration is the active read/write path for AgentEx resources, we should revisit this dual-write and remove the legacy grant/revoke fallback if it is no longer needed. Existing schedules created before this change may still need a one-time legacy grant backfill or recreation.

@jromualdez-scale jromualdez-scale changed the title Fix run schedule authz for legacy SGP via grant dual-write fix(schedules): grant legacy auth for run schedules Jul 1, 2026
)
return False
schedule_resource = AgentexResource.schedule(authz_selector)
await self.authorization_service.register_resource(

@rpatel-scale rpatel-scale Jul 1, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt we be checking spark authz feature flag for this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I checked and those Spark rollout flags seem to live in agentex-auth, not scale-agentex.

My understanding is:

  • fgac-agentex-resources controls read/check/search routing.
  • fgac-agentex-resources-dual-write controls whether register_resource writes to Spark.

Since scale-agentex doesn’t currently evaluate those flags, I kept this as a local compatibility bridge for legacy SGP. Do you recommend changing anything here, or does this seem reasonable while legacy SGP is still active?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha no seems reasonable

@jromualdez-scale jromualdez-scale marked this pull request as ready for review July 1, 2026 19:39
@jromualdez-scale jromualdez-scale requested a review from a team as a code owner July 1, 2026 19:39
Comment thread agentex/src/domain/services/agent_run_schedule_service.py
Comment thread agentex/tests/unit/services/test_agent_run_schedule_service.py
Co-authored-by: Cursor <cursoragent@cursor.com>
@jromualdez-scale jromualdez-scale merged commit 648e81c into main Jul 1, 2026
31 checks passed
@jromualdez-scale jromualdez-scale deleted the jerome/fix-schedule-legacy-auth branch July 1, 2026 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants