Skip to content

[FR-126] Rollback#274

Open
eniko-dif wants to merge 7 commits into
temporalio:mainfrom
eniko-dif:eniko/feature-rollback
Open

[FR-126] Rollback#274
eniko-dif wants to merge 7 commits into
temporalio:mainfrom
eniko-dif:eniko/feature-rollback

Conversation

@eniko-dif

@eniko-dif eniko-dif commented Apr 14, 2026

Copy link
Copy Markdown

What was changed

Add a separate rollback config to the deployment specification that can be used to have different rollback behavior from the rollout one. It supports AllAtOnce or Progressive and does not have a gate. Uses the LastCurrentTime of a deployment to decide if the target version supposed to be rolled out or rolled back.

Why?

Open feature issue: #126

Checklist

  1. Closes [Feature Request] Rollback mode #126

  2. How was this tested:

Unit and integration tests (ongoing actual test with local setup).

  1. Any docs updates needed?

Yes, but needs to be decided where.

@eniko-dif eniko-dif requested review from a team and jlegrone as code owners April 14, 2026 15:35
@CLAassistant

CLAassistant commented Apr 14, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

Comment thread api/v1alpha1/workerdeployment_types.go
Comment thread api/v1alpha1/worker_types.go Outdated
Comment thread api/v1alpha1/worker_types.go Outdated
Comment thread internal/planner/planner.go Outdated
Comment thread api/v1alpha1/workerdeployment_types.go
Comment thread api/v1alpha1/workerdeployment_webhook.go Outdated
Comment thread api/v1alpha1/workerdeployment_types.go
Comment thread internal/planner/planner.go Outdated
}

if config.RollbackStrategy.MaxVersionAge != nil && time.Since(*targetVersionInfo.LastCurrentTime) > config.RollbackStrategy.MaxVersionAge.Duration {
l.Info("Skipping rollback: the version's last current time exceeds MaxVersionAge",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
l.Info("Skipping rollback: the version's last current time exceeds MaxVersionAge",
l.Debug("Skipping rollback: the version's last current time exceeds MaxVersionAge",

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logger has no Debug level, only Info and Error. (If I remember correctly, it was related to K8s events?)

Comment thread internal/planner/planner.go Outdated
Comment thread api/v1alpha1/temporalworker_webhook.go Outdated
}

allErrs = append(allErrs, validateRolloutStrategy(new.Spec.RolloutStrategy)...)
allErrs = append(allErrs, validateRollbackStrategy(*new.Spec.RollbackStrategy)...)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could new.Spec.RollbackStrategy ever be nil? Should probably add a nil check here or update the type signature for validateRollbackStrategy instead of dereferencing.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rollback strategy is a pointer, because it's not mandatory in the spec. At the point of the validation it should not be nil, because of the default we set, but just in case, I added a nil check.

Comment thread api/v1alpha1/workerdeployment_types.go Outdated
Comment on lines +60 to +64
if s.RollbackStrategy == nil {
s.RollbackStrategy = &RollbackStrategy{Strategy: RollbackAllAtOnce}
} else if s.RollbackStrategy.Strategy == "" {
s.RollbackStrategy.Strategy = RollbackAllAtOnce
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this is entirely a matter of taste but I think two if statements is easier to read than the if/else flow (more clear that the strategy is always all at once if left unset).

Suggested change
if s.RollbackStrategy == nil {
s.RollbackStrategy = &RollbackStrategy{Strategy: RollbackAllAtOnce}
} else if s.RollbackStrategy.Strategy == "" {
s.RollbackStrategy.Strategy = RollbackAllAtOnce
}
if s.RollbackStrategy == nil {
s.RollbackStrategy = &RollbackStrategy{}
}
if s.RollbackStrategy.Strategy == "" {
s.RollbackStrategy.Strategy = RollbackAllAtOnce
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done the change, but I remember the times, where I had to optimize even the number of if calls... :P

Comment thread internal/planner/planner.go Outdated
Comment on lines +902 to +903
default:
strategy = temporaliov1alpha1.UpdateAllAtOnce

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I think this ought to be an error condition; if the controller sees an unknown strategy it should fall back to manual mode and stop attempting updates to the default version.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this function is used, there is a check to see if the rollout strategy is set to manual, and if it is, rollback is disabled. In any case, I updated the default to be manual.

},
}

testCases := []struct {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a test cases for

1.RollbackStrategy == nil (even though the webhook should set a default, it's still possible to run the controller without the webhook enabled). When nil I would assert that the behavior matches the default rollback strategy configured by the webhook.
2. RollbackStrategy.MaxVersionAge == nil (assert default max version age applied, again to match the default webhook behavior)
3. RollbackStrategy.MaxVersionAge == 0 (assert false, even when the rest of the current status would normally have resulted in a rollback)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the Config to have rollback strategy at all times, so the first two are no longer needed and the third existed already.

@@ -23,8 +24,9 @@ func TestTemporalWorkerDeployment_ValidateCreate(t *testing.T) {
tests := map[string]struct {
obj runtime.Object
errorMsg string
warnMsg string

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a warning emitted for a combination of Manual rollout + Progressive rollback? That seems in the same vein as rollout steps being faster than rollback steps. Please add a test case for that if so.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When rollback is Manual, rollback is disabled in the planner, so I don't think we need a warning.

@eniko-dif eniko-dif force-pushed the eniko/feature-rollback branch from 371578c to a5d1e33 Compare May 27, 2026 15:52
@eniko-dif eniko-dif force-pushed the eniko/feature-rollback branch from a5d1e33 to 4b0bd82 Compare June 10, 2026 15:41
@eniko-dif eniko-dif requested a review from jlegrone June 10, 2026 18:38

@jaypipes jaypipes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eniko-dif thanks very much for this! I think the UX can be simplified quite a bit by just making the rollback strategy match the rollout strategy. Also, I recommend removing the MaxVersionAge configuration option. Instead, simply roll back to the previous WDV's build ID. I really don't see a use case for configuring a rollback target version of anything other than the previous version...

Comment on lines +95 to +96
// not affected by the rollback. Only new workflow executions will be routed to the rollback
// target version.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// not affected by the rollback. Only new workflow executions will be routed to the rollback
// target version.
// not affected by the rollback. Only new workflow executions that are not pinned
// to a specific build ID will be routed to the rollback target version.

// not affected by the rollback. Only new workflow executions will be routed to the rollback
// target version.
//
// Rollback is suppressed when the rollout strategy is Manual.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just curious, but why is rollback suppressed when the rollout stategy is Manual?

// +optional
Steps []RolloutStep `json:"steps,omitempty"`

// MaxVersionAge limits which versions are eligible as rollback targets.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a suspicion this is overly complicated configuration for most users. Would a simpler "just roll back to the previous working (i.e. passes gates) version" behaviour be easier for users to grok? Where the controller simply would attempt rolling back to N-1, then N-2, etc... until landing on a working version?

type RollbackStrategy struct {
// Strategy for rollback. Valid values are "AllAtOnce" or "Progressive".
// Defaults to "AllAtOnce" for fast recovery.
Strategy VersionRollbackStrategy `json:"strategy"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there really a valid reason or use case for having a rollback strategy not match the rollout strategy? Why not just have the rollback strategy always be the same as rollout?

Comment on lines +148 to +172
func warnRollbackSlowerThanRollout(rollout RolloutStrategy, rollback RollbackStrategy) admission.Warnings {
switch rollout.Strategy {
case UpdateAllAtOnce:
if rollback.Strategy != RollbackAllAtOnce {
return admission.Warnings{"rollback strategy is slower than rollout: rollout is AllAtOnce, but rollback is Progressive — is that intended?"}
}
case UpdateProgressive:
if rollback.Strategy == RollbackProgressive {
var rolloutTotal, rollbackTotal time.Duration
for _, s := range rollout.Steps {
rolloutTotal += s.PauseDuration.Duration
}
for _, s := range rollback.Steps {
rollbackTotal += s.PauseDuration.Duration
}
if rollbackTotal > rolloutTotal {
return admission.Warnings{fmt.Sprintf(
"rollback strategy is slower than rollout: progressive rollback total duration (%s) exceeds progressive rollout total duration (%s) — is that intended?",
rollbackTotal, rolloutTotal,
)}
}
}
}
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment above... I suspect there's not really a valid use case for having a rollback strategy not match the rollout strategy... if we can just say the rollback == rollout, we can simplify things quite a bit and remove the need for these warnings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Rollback mode

4 participants