Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 41 additions & 0 deletions docs/adr/37706-quote-env-scalars-containing-colon-space.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
# ADR-37706: Quote env scalars containing `: ` via post-hoc YAML text patching

**Date**: 2026-06-08
**Status**: Draft

## Context

The compiler emits GitHub Actions lock files (`*.lock.yml`) by assembling YAML largely through string building rather than a structured marshaller. Env values that contain the token sequence `": "` — for example `ANTHROPIC_CUSTOM_HEADERS: "x-aw-gw-github-repo: ${{ github.repository }}"` — were rendered as plain (unquoted) YAML scalars. YAML parsers interpret an unquoted `: ` inside a scalar as a nested mapping separator, producing invalid YAML that fails to parse. The affected values flow through several independent rendering paths: top-level `env` section extraction, step/job env rendering for engine-generated steps, and the shared `yamlStringValue` helper. A fix needed to make these values valid scalars without altering output for values that do not require quoting.

## Decision

We will guarantee that env values containing `": "` are emitted as quoted YAML scalars by centralizing the quoting rule in a shared helper, `quoteYAMLValueContainingColonSpace`, in `pkg/workflow/yaml_env_helpers.go`, and applying it across all env rendering paths. For text-assembled output we use a companion function, `quoteEnvValuesContainingColonSpace`, that post-processes already-generated YAML text line-by-line, detecting direct children of `env:` (and `- env:`) maps by indentation and quoting only their direct values. This in-place text-patching approach is applied to top-level `env` extraction (`frontmatter_extraction_yaml.go`), step/job env rendering (`compiler_yaml_step_conversion.go`), and the `yamlStringValue` helper (`engine_helpers.go`). Values that already begin with a quote, block, or flow indicator, or that do not contain `": "`, are left unchanged.

## Alternatives Considered

### Alternative 1: Serialize env blocks with a structured YAML marshaller
Instead of patching generated text, the env maps could be marshalled through a YAML library (e.g. `goccy/go-yaml`), which would quote scalars correctly by construction. This was not chosen because the compiler's rendering paths assemble YAML as strings with bespoke indentation, comment preservation (e.g. `uses:` version comments), and expression passthrough (`${{ ... }}`); routing env through a marshaller would risk reflowing or re-quoting output that other code and golden tests depend on, a far larger and riskier change than the targeted scalar fix. *[TODO: verify whether a localized marshaller for just the env sub-tree was evaluated.]*

### Alternative 2: Quote every env value unconditionally
Every env value could be wrapped in quotes regardless of content, sidestepping detection logic. This was not chosen because it would change the output for the large set of values that are already valid unquoted, churning existing lock files and golden-file tests and reducing the readability of generated YAML. The chosen approach deliberately preserves output for values that do not require quoting.

## Consequences

### Positive
- Compiled lock files with `: `-containing env values (notably custom header values embedding expressions) are now valid, parseable YAML.
- The quoting rule lives in a single shared helper, so the three rendering paths apply identical behavior and future call sites can reuse it.
- Output is unchanged for env values that do not require quoting, minimizing churn to existing lock files and golden tests.

### Negative
- `quoteEnvValuesContainingColonSpace` parses YAML by string/indentation heuristics rather than a real parser; it assumes compiler-generated, consistently indented input and only rewrites direct `env:` children, so it is brittle to future changes in how env blocks are emitted (nested mappings or reindentation could silently bypass it).
- The same quoting concern is now addressed in multiple places (text post-processing plus `yamlStringValue`), so the invariant "env scalars with `: ` are quoted" is enforced by convention across paths rather than at a single serialization boundary.
- Only the `": "` token is targeted; other YAML-significant scalar contents (leading `@`, `#`, `*`, `&`, trailing spaces) remain the responsibility of separate logic and are not covered by this change.

### Neutral
- Both `env:` and inline `- env:` list forms are handled, reflecting that env can appear as a mapping key or on a list item.
- A focused regression test (`TestGenerateYAMLWithEnvironmentValueContainingColonSpace`) compiles a workflow with an `engine.env` colon-space value and asserts the output stays quoted and unmarshals as valid YAML.
- Non-string env values fall back to default `%v` formatting and are unaffected by the quoting helper.

---

*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/27112537413) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*
21 changes: 19 additions & 2 deletions pkg/workflow/compiler_yaml_step_conversion.go
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ func ConvertStepToYAML(stepMap map[string]any) (string, error) {
// Post-process to move version comments outside of quoted uses values
// This handles cases like: uses: "slug@sha # v1" -> uses: slug@sha # v1
yamlStr = unquoteUsesWithComments(yamlStr)
yamlStr = quoteEnvValuesContainingColonSpace(yamlStr)

// Add 6 spaces to the beginning of each line to match GitHub Actions step indentation
lines := strings.Split(strings.TrimSpace(yamlStr), "\n")
Expand Down Expand Up @@ -153,7 +154,11 @@ func (c *Compiler) renderStepFromMap(out *strings.Builder, step map[string]any,
// For complex fields like "with" or "env" — sort keys for stable output.
fmt.Fprintf(out, "%s:\n", field)
for _, key := range slices.Sorted(maps.Keys(v)) {
fmt.Fprintf(out, "%s %s: %v\n", indent, key, v[key])
if field == "env" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] The if field == "env" quoting branch is duplicated here and again around line 198 in the "remaining fields" loop. If the quoting strategy changes (e.g., to also handle with: values), both branches need to be updated in sync — a likely source of future drift.

💡 Extract a shared helper
// writeEnvMapEntry writes a single key: value line, quoting the value when needed for env fields.
func (c *Compiler) writeMapEntry(out *strings.Builder, indent, field, key string, val any) {
    if field == "env" {
        fmt.Fprintf(out, "%s    %s: %s\n", indent, key, formatStepEnvValueForYAML(val))
    } else {
        fmt.Fprintf(out, "%s    %s: %v\n", indent, key, val)
    }
}

Then both loops call c.writeMapEntry(out, indent, field, key, v[key]) — one update point for any future changes.

fmt.Fprintf(out, "%s %s: %s\n", indent, key, formatStepEnvValueForYAML(v[key]))
} else {
fmt.Fprintf(out, "%s %s: %v\n", indent, key, v[key])
}
}
default:
fmt.Fprintf(out, "%s: %v\n", field, v)
Expand Down Expand Up @@ -190,10 +195,22 @@ func (c *Compiler) renderStepFromMap(out *strings.Builder, step map[string]any,
// Sort keys for stable output.
fmt.Fprintf(out, "%s:\n", field)
for _, key := range slices.Sorted(maps.Keys(v)) {
fmt.Fprintf(out, "%s %s: %v\n", indent, key, v[key])
if field == "env" {
fmt.Fprintf(out, "%s %s: %s\n", indent, key, formatStepEnvValueForYAML(v[key]))
} else {
fmt.Fprintf(out, "%s %s: %v\n", indent, key, v[key])
}
Comment on lines +198 to +202
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated if field == "env" block — identical logic appears at lines 157–161 in the ordered-field switch and again here in the remaining-field switch.

💡 Details

Both switch-case arms inside renderStepFromMap now contain:

if field == "env" {
    fmt.Fprintf(out, "%s    %s: %s\n", indent, key, formatStepEnvValueForYAML(v[key]))
} else {
    fmt.Fprintf(out, "%s    %s: %v\n", indent, key, v[key])
}

If formatStepEnvValueForYAML is updated (e.g., to fix the %q gap noted in the adjacent comment), the fix must be applied in both places. Extract a helper or consolidate the map[string]any handling to a single shared function to eliminate the duplication.

}
default:
fmt.Fprintf(out, "%s: %v\n", field, v)
}
}
}

func formatStepEnvValueForYAML(value any) string {
strValue, ok := value.(string)
if !ok {
return fmt.Sprint(value)
}
return yamlStringValue(strValue)
}
Comment on lines +210 to +216
47 changes: 47 additions & 0 deletions pkg/workflow/compiler_yaml_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ import (
"testing"

"github.com/github/gh-aw/pkg/testutil"
"github.com/goccy/go-yaml"
)

func TestCompileWorkflowWithInvalidYAML(t *testing.T) {
Expand Down Expand Up @@ -791,6 +792,52 @@ Test content.`
}
}

func TestGenerateYAMLWithEnvironmentValueContainingColonSpace(t *testing.T) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] This test validates the engine.env path well, but two other paths added in this PR go untested: the top-level env: section (via frontmatter_extraction_yaml.go) and the step-level env: (via renderStepFromMap). Each passes through different code, so a future regression in one path would not be caught by the other test.

💡 Additional test cases to add

Top-level env: section — add this alongside the existing test or as a separate function:

frontmatter := `---
name: Test Workflow
on: push
env:
  ANTHROPIC_CUSTOM_HEADERS: "x-aw-gw-github-repo: ${{ github.repository }}"
---
# Test`
// Assert lock file contains quoted ANTHROPIC_CUSTOM_HEADERS and is valid YAML.

Step-level env: via a run step with env — verifies the renderStepFromMap path. If the engine emits step-level env with : values, a test fixture that triggers that path would lock it in.

Covering all three injection sites in the test suite ensures the fix is verified end-to-end across every code path that this PR modifies.

tmpDir := testutil.TempDir(t, "yaml-env-colon-space-test")

frontmatter := `---
name: Test Workflow
on: push
permissions:
contents: read
engine:
id: copilot
env:
ANTHROPIC_CUSTOM_HEADERS: "x-aw-gw-github-repo: ${{ github.repository }}"
strict: false
---

# Test Workflow

Test content.`

testFile := filepath.Join(tmpDir, "test.md")
if err := os.WriteFile(testFile, []byte(frontmatter), 0644); err != nil {
t.Fatal(err)
}

compiler := NewCompiler()
if err := compiler.CompileWorkflow(testFile); err != nil {
t.Fatalf("CompileWorkflow() error: %v", err)
}

lockFile := filepath.Join(tmpDir, "test.lock.yml")
content, err := os.ReadFile(lockFile)
if err != nil {
t.Fatalf("Failed to read lock file: %v", err)
}
yamlStr := string(content)

if !strings.Contains(yamlStr, `ANTHROPIC_CUSTOM_HEADERS: "x-aw-gw-github-repo: ${{ github.repository }}"`) {
t.Fatalf("Expected quoted env value in generated YAML, got:\n%s", yamlStr)
}

var parsed map[string]any
if err := yaml.Unmarshal(content, &parsed); err != nil {
t.Fatalf("Generated lock file should be valid YAML: %v\nYAML:\n%s", err, yamlStr)
}
}

// TestGenerateYAMLWithConcurrency tests that concurrency is properly set
func TestGenerateYAMLWithConcurrency(t *testing.T) {
tmpDir := testutil.TempDir(t, "yaml-concurrency-test")
Expand Down
3 changes: 3 additions & 0 deletions pkg/workflow/engine_helpers.go
Original file line number Diff line number Diff line change
Expand Up @@ -276,6 +276,9 @@ func yamlStringValue(value string) string {
if len(value) == 0 {
return value
}
if quoted := quoteYAMLValueContainingColonSpace(value); quoted != value {
return quoted
}
// Values starting with YAML flow indicators need quoting to be treated as strings.
// '{' would be parsed as a YAML flow mapping, '[' as a YAML flow sequence.
first := value[0]
Expand Down
6 changes: 6 additions & 0 deletions pkg/workflow/frontmatter_extraction_yaml.go
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,12 @@ func (c *Compiler) extractTopLevelYAMLSection(frontmatter map[string]any, key st
// which causes validation errors since they start with numbers but contain spaces
yamlStr = parser.QuoteCronExpressions(yamlStr)

// For top-level env values, quote plain scalars containing ": " because YAML
// treats this token sequence as a mapping separator in plain style.
if key == "env" {
yamlStr = quoteEnvValuesContainingColonSpace(yamlStr)
}

// Clean up null values - replace `: null` with `:` for cleaner output
// GitHub Actions treats `workflow_dispatch:` and `workflow_dispatch: null` identically
yamlStr = CleanYAMLNullValues(yamlStr)
Expand Down
72 changes: 72 additions & 0 deletions pkg/workflow/yaml_env_helpers.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ package workflow
import (
"fmt"
"sort"
"strconv"
"strings"

"github.com/github/gh-aw/pkg/logger"
Expand All @@ -24,6 +25,77 @@ func formatYAMLEnv(indent, key, value string) string {
return fmt.Sprintf("%s%s: %q\n", indent, key, value)
}

func quoteYAMLValueContainingColonSpace(value string) string {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] quoteYAMLValueContainingColonSpace is the core of this fix but has no isolated unit tests — edge cases are only exercised indirectly through the end-to-end integration test.

💡 Suggested table-driven unit test
func TestQuoteYAMLValueContainingColonSpace(t *testing.T) {
    cases := []struct{ in, want string }{
        {"plain", "plain"},
        {"x-gw: ${{ repo }}", `"x-gw: ${{ repo }}"`},
        {`"already-quoted: val"`, `"already-quoted: val"`}, // no double-quoting
        {"", ""},
        {"no-colon-space", "no-colon-space"},
        {"{flow: map}", "{flow: map}"},   // leading { is left alone
        {"trailing: ", `"trailing: "`},
    }
    for _, c := range cases {
        got := quoteYAMLValueContainingColonSpace(c.in)
        if got != c.want {
            t.Errorf("input %q: got %q, want %q", c.in, got, c.want)
        }
    }
}

The "already-quoted" and prefix-guarded cases are especially important: a silent regression here would produce double-quoted output that is invalid YAML but hard to trace back to this function.

if value == "" ||
strings.HasPrefix(value, "\"") ||
strings.HasPrefix(value, "'") ||
strings.HasPrefix(value, "|") ||
strings.HasPrefix(value, ">") ||
strings.HasPrefix(value, "{") ||
strings.HasPrefix(value, "[") {
return value
}
if strings.Contains(value, ": ") {
return strconv.Quote(value)
}
return value
}

// quoteEnvValuesContainingColonSpace patches YAML text in-place for env blocks,
// quoting direct env values that contain ": " so they remain valid scalars.
//
// Assumptions:
// - Input YAML is compiler-generated and consistently indented.
// - We only rewrite direct children of env: maps (not nested mappings).
// - Both "env:" and "- env:" are handled because env can appear either as a
// regular mapping key or inline on a list item in YAML syntax.
func quoteEnvValuesContainingColonSpace(yamlStr string) string {
lines := strings.Split(yamlStr, "\n")
inEnv := false
envIndent := 0
envChildIndent := -1

for i, line := range lines {
trimmed := strings.TrimSpace(line)
if trimmed == "" {
continue
}

indent := len(line) - len(strings.TrimLeft(line, " "))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] strings.TrimLeft(line, " ") only strips spaces, so tab-indented input would compute indent = 0 for every line, causing the env-detection state machine to silently skip all quoting. The doc comment says "consistently indented" — worth hardening anyway so the function does not silently no-op when that assumption breaks.

💡 One-line fix
indent := len(line) - len(strings.TrimLeft(line, " \t"))

Or, for full Unicode whitespace correctness:

trimmedLeft := strings.TrimLeftFunc(line, unicode.IsSpace)
indent := len(line) - len(trimmedLeft)

if inEnv && indent <= envIndent {
inEnv = false
envChildIndent = -1
}

if trimmed == "env:" || trimmed == "- env:" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] trimmed == "env:" is an exact match, so a line like env: # production headers — valid YAML and plausible in hand-edited generated output — would silently fall through, leaving its child values unquoted. Matching with trimmed == "env:" || strings.HasPrefix(trimmed, "env: #") (or a general HasPrefix(trimmed, "env:")) would be more resilient.

💡 Alternative guard
// Matches bare "env:" and "env: # comment" while rejecting
// "env: some_inline_value" (which the compiler never emits for env blocks).
if trimmed == "env:" || (strings.HasPrefix(trimmed, "env:") && (len(trimmed) == 4 || trimmed[4] == 

inEnv = true
envIndent = indent
envChildIndent = -1
continue
}
if !inEnv || indent <= envIndent {
continue
}
if envChildIndent == -1 {
envChildIndent = indent
}
if indent != envChildIndent {
continue
}

idx := strings.Index(line, ": ")
if idx < 0 {
continue
}
quotedValue := quoteYAMLValueContainingColonSpace(line[idx+2:])
if quotedValue != line[idx+2:] {
lines[i] = line[:idx+2] + quotedValue
}
}

return strings.Join(lines, "\n")
}

// writeHeadersToYAML writes a map of headers to YAML format with proper comma placement
// indent is the indentation string to use for each header line (e.g., " ")
func writeHeadersToYAML(yaml *strings.Builder, headers map[string]string, indent string) {
Expand Down
Loading