Skip to content

GCSS-1132: detect manually-created (unmanaged) repositories#52

Merged
dev-milos merged 10 commits into
mainfrom
feature/gcss-1132-detect-unmanaged-repos
Jun 22, 2026
Merged

GCSS-1132: detect manually-created (unmanaged) repositories#52
dev-milos merged 10 commits into
mainfrom
feature/gcss-1132-detect-unmanaged-repos

Conversation

@dev-milos

@dev-milos dev-milos commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

What

Extends the existing periodic drift-check workflow to detect all drift between the desired config and the actual GitHub state - both repos created outside GCSS (invisible to terraform plan) and manual edits to already-managed repos - and surface everything in a single, continuously-updated PR assigned to reviewers. Merge to adopt the change into config, or revert it manually.

It's folded into the existing drift-check job, so there's no new schedule or consumer workflow - the config repo's existing scheduled caller just gains a few args.

Closes #1132. Builds on the compare fix in #51.

Why

The Terraform-plan drift check only sees resources already in Terraform state, so a repo created by hand is invisible to it and never flagged. Importing the org before the plan makes those repos visible (their YAML is generated, and the module's import {} blocks make the plan report them), so one plan now covers new repos and managed-repo drift.

How it works

The single drift-check job runs:

Checkout GCSS → GCSS config setup
  → import-repos            # full org import; new repos get YAML, managed repos' current state captured
  → compare                # drop everything unchanged, keep only new + edited repos
  → terraform plan (refresh)  # new repos via import blocks, managed drift via -refresh → exit 2 if drift
  → Inspect drift          # logs + emits a ::warning:: when terraform sees drift (no longer fails the job)
  → drift-pr               # open / update / close the PR, gated on the plan result

The PR is gated on the terraform plan (plan-exitcode == 2), not on the raw compare output. Terraform is the authority on whether drift is real; compare only builds the PR content. This avoids false positives from config differences terraform treats as no-ops.

Merging the drift PR feeds the existing import-on-merge machinery (promote-imported-configs): new repos are Terraform-imported into state (not recreated); managed-repo edits update repos/.

Changes

.github/workflows/drift-check.yaml

  • The drift-check job now: generate app token → config setup → import-reposcompare → terraform plan → drift-pr.
  • Inspect drift no longer exit 1; it logs and emits a ::warning:: so terraform-level drift that can't be represented as a config change (e.g. a deleted/archived managed repo) stays visible even when no PR is raised.
  • New input reviewers; new secret app_private_key. Workflow-level concurrency so overlapping scheduled runs don't race on the shared branch/PR.

.github/actions/drift-pr/action.yaml (new)

  • Manages a single stable PR idempotently: opens it only when terraform confirms drift and there's config content to show; updates in place; skips the push when nothing changed (no re-notification noise); closes the PR + deletes the branch when drift is resolved. Requests reviewers (non-fatal).

feature/github-repo-importer/pkg/github/github.go

  • nilIfEmpty: normalises empty optional free-text fields (description, homepage_url) to nil. GitHub returns these inconsistently as "" or null over a repo's lifetime (Terraform treats them as equivalent); collapsing "" to nil keeps a fresh import byte-stable against the committed config, so compare doesn't report spurious changes.

feature/github-repo-importer/Justfile

  • import-repos mkdir -ps the per-owner configs dir so it tolerates a zero-repo import (the steady-state "no drift" case) instead of failing on find.

Consumer setup (config repo)

The existing scheduled drift-check caller just gains a few args — no second workflow:

on:
  schedule:
    - cron: '0 * * * *'   # existing trigger
jobs:
  drift-check:
    uses: G-Research/github-terraformer/.github/workflows/drift-check.yaml@<tag>
    with:
      gcss_ref: <tag>
      tfc_org: ${{ vars.TFC_ORG }}
      reviewers: <org/team-or-users>
    secrets:
      tfc_token: ${{ secrets.TFC_TOKEN }}
      app_private_key: ${{ secrets.APP_PRIVATE_KEY }}

Requirements — the job runs in the schedule environment, which must now provide all of:

  • APP_ID + app_private_key — the management GitHub App (already used by import/plan/promote); it must have org-wide ("All repositories") access, or new repos are invisible to the importer.
  • WORKSPACE + tfc_token — for the plan (already present).
  • The schedule environment must have no required-reviewer/wait-timer protection rules (it runs unattended on a schedule, so any approval gate stalls every run).

Testing

Validated end-to-end in a test org (milos-org) against real CI:

Scenario Result
Manually-created repo plan exit 2 → PR opened with that repo, reviewer requested
Manual edit to a managed repo terraform "update in-place" → PR opened with the edited config
Drift PR merged Terraform imported/updated the repo, config promoted to repos/
Re-run, no change compare clean, plan "No changes" → no PR
Re-run with an open PR, still drifting PR updated in place (no duplicates)
Drift resolved (revert / ignore) PR auto-closed + branch deleted
Empty/default field noise (homepage_url "" vs null) importer normalisation → fresh import round-trips → no false positive
Deleted/archived managed repo terraform flags it → ::warning:: raised, no spurious/empty PR

Known considerations

  • PR semantics: the YAML reflects the current GitHub state, so merging a drift PR for a manually-edited repo adopts that change into config; to reject it, don't merge and revert the change manually.
  • Scale: each run does a full org import + plan. On large orgs choose a cron interval that exceeds run duration — overlapping runs are queued (cancel-in-progress: false).
  • Legacy configs: configs imported before the nilIfEmpty change that contain description: ""/homepage_url: "" will be flagged by compare until re-adopted, but plan-gating suppresses spurious PRs (terraform sees no real drift), so they self-heal.
  • GitHub disables scheduled workflows after long repo inactivity - a disabled drift-check means no detection.

@dev-milos dev-milos requested a review from pavlovic-ivan June 18, 2026 13:09
@dev-milos dev-milos marked this pull request as ready for review June 18, 2026 14:16
@dev-milos dev-milos marked this pull request as draft June 19, 2026 11:41
@dev-milos dev-milos marked this pull request as ready for review June 19, 2026 14:20
Comment thread feature/github-repo-importer/Justfile Outdated
import-repos:
go run main.go bulk-import -c import-config.yaml
mkdir -p "../../feature/github-repo-provisioning/gcss_config/importer_tmp_dir/"
mkdir -p "configs/$OWNER" "../../feature/github-repo-provisioning/gcss_config/importer_tmp_dir/"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is not needed. The configs/$OWNER dir is primarily used for debugging locally, and should not be visible in the PR, or the repo. Am i missing something, why is this change present here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It was a leftover from an earlier version of this PR that used an optimization (exclude already-managed repos, import only the unmanaged ones)

@dev-milos dev-milos merged commit f6454cb into main Jun 22, 2026
1 check passed
@dev-milos dev-milos deleted the feature/gcss-1132-detect-unmanaged-repos branch June 22, 2026 11:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants