Skip to content

fix: Optimize cleanup jobs, eliminate raw ES client usage, add test coverage#2267

Open
niemyjski wants to merge 6 commits into
mainfrom
niemyjski/verbose-guacamole
Open

fix: Optimize cleanup jobs, eliminate raw ES client usage, add test coverage#2267
niemyjski wants to merge 6 commits into
mainfrom
niemyjski/verbose-guacamole

Conversation

@niemyjski

@niemyjski niemyjski commented May 29, 2026

Copy link
Copy Markdown
Member

Summary

Comprehensive optimization of cleanup jobs: eliminate all direct Elasticsearch client usage, fix critical bugs, restore correct loop semantics, and add full integration test coverage.

Critical Bug Fixes

@min_count:2@min:2 in GetDuplicateSignaturesAsync

The @min_count syntax is silently ignored by Foundatio Parsers, causing the aggregation to return ALL signature hashes (not just duplicates). This would have caused FixDuplicateStacks to treat every stack as a duplicate on the next run.

FixDuplicateStacks only ran one batch (regression from refactor)

The original code looped until GetDuplicateSignaturesAsync returned empty. The refactor broke this. Restored the while loop with:

  • Batch tracking and per-batch logging
  • Infinite-loop guard: exits if an entire batch fails to process
  • ImmediateConsistency() on CountAsync inside GetDuplicateSignaturesAsync (one refresh per batch, matching original Indices.RefreshAsync call pattern — NOT per item)

ReassignStackAsync data-loss hazard on empty sequence

If sourceStackIds was empty, PatchAllAsync would have no stack filter and would reassign ALL events to the target stack. Added materialization + early return guard.

FixDuplicateStacks event-first ordering

Moved ReassignStackAsync before SaveAsync (soft-delete). If event reassignment fails, duplicate stacks remain visible and no data is lost. Previously: soft-delete first → event reassignment failure → orphan cleanup deletes the events.

GetDistinctFieldValuesAsync afterKey cursor leak

afterKey was only populated when composite.AfterKey != null, never cleared. Added: always clear cursor first, then repopulate. Callers checking afterKey.AfterKey.Count > 0 now correctly detect end-of-pagination.

Architecture (eliminate raw ES client from jobs)

  • Refactored CleanupOrphanedDataJob to use repository methods exclusively
  • Added GetDistinctFieldValuesAsync using composite aggregation (encapsulated in repository — composite aggregation is not in Foundatio's DSL, so raw client use is justified and documented)
  • Added RemoveAllByProjectIds/RemoveAllByOrganizationIds, RemoveAllByStackIdsAsync to IEventRepository
  • Added ReassignStackAsync using parameterized Painless script
  • Added GetDuplicateSignaturesAsync to IStackRepository

Other Fixes

  • Added OperationCanceledException filter before generic catch
  • Added lock renewal at page boundaries in CleanupDataJob
  • Consolidated two SaveAsync calls into one using spread syntax
  • Removed redundant is_deleted:false filter (repository applies soft-delete filter by default)
  • Reverted page size to 5 for org/project cleanup (2.5s sleep/item makes larger pages impractical)
  • CompositeKeyResult converted from class to record
  • Pattern matching throughout (is null/is not null, is [])
  • XML documentation on composite aggregation rationale and Painless script safety
  • Added :{Message} to error log format strings

Rebased onto main

Resolved merge conflict with #2268 (Deleted counter tests) — preserved all tests from both branches.

Test Coverage (39 job tests + 278 repo tests, all passing)

New job tests (named Method_Given_Expected):

  • RunAsync_SuspendedOrganization_SuspendsRelatedTokens
  • RunAsync_SoftDeletedOrganization_RemovesAllRelatedData
  • RunAsync_SoftDeletedProject_RemovesProjectAndEvents
  • RunAsync_SoftDeletedStack_RemovesStackAndEvents
  • RunAsync_EventsOutsideRetentionPeriod_RemovesExpiredEvents
  • DeleteOrphanedEventsByStack_WithLargeDataset_DeletesAllOrphanedEvents
  • CleanupSoftDeletedOrganizations_WithMultiplePages_RemovesAllData
  • CleanupSoftDeletedStacks_WithMultiplePages_RemovesAllStacks
  • EnforceRetention_WithMultipleOrganizations_RespectsPerOrgRetention
  • EnforceRetention_WithEventsOutsideRetention_DeletesOnlyExpiredEvents
  • Plus 5 deleted-counter tests from #2268 (merged)

New repository tests:

  • GetDistinctStackIds_WithMultipleStacks_ReturnsAllUniqueIds
  • GetDistinctStackIds_WithPagination_ReturnsAllIds
  • ReassignStack_WithSourceEvents_MovesAllEventsToTarget
  • RemoveAllByProjectIds_WithMatchingEvents_RemovesAll
  • RemoveAllByOrganizationIds_WithMatchingEvents_RemovesAll
  • GetDuplicateSignatures_WithDuplicates_ReturnsSignatures
  • GetDuplicateSignatures_WithNoDuplicates_ReturnsEmpty
  • GetDuplicateSignatures_WithSoftDeletedStacks_ExcludesThem

Comment thread src/Exceptionless.Core/Repositories/EventRepository.cs Fixed
Comment thread src/Exceptionless.Core/Jobs/CleanupOrphanedDataJob.cs Fixed
@niemyjski niemyjski self-assigned this May 30, 2026
Comment thread tests/Exceptionless.Tests/Jobs/CleanupOrphanedDataJobTests.cs
Comment thread tests/Exceptionless.Tests/Jobs/CleanupDataJobTests.cs Outdated
Comment thread src/Exceptionless.Core/Repositories/StackRepository.cs
Comment thread src/Exceptionless.Core/Repositories/StackRepository.cs Outdated
Comment thread src/Exceptionless.Core/Repositories/StackRepository.cs Outdated
Comment thread src/Exceptionless.Core/Repositories/EventRepository.cs Outdated
Comment thread src/Exceptionless.Core/Repositories/EventRepository.cs
Comment thread src/Exceptionless.Core/Repositories/EventRepository.cs Outdated
Comment thread src/Exceptionless.Core/Repositories/EventRepository.cs Outdated
Comment thread src/Exceptionless.Core/Repositories/Interfaces/IEventRepository.cs Outdated
Comment thread src/Exceptionless.Core/Jobs/CleanupOrphanedDataJob.cs Outdated
Comment thread src/Exceptionless.Core/Jobs/CleanupOrphanedDataJob.cs Outdated
Comment thread src/Exceptionless.Core/Jobs/CleanupOrphanedDataJob.cs
Comment thread src/Exceptionless.Core/Jobs/CleanupDataJob.cs Outdated
Comment thread src/Exceptionless.Core/Jobs/CleanupDataJob.cs
@niemyjski niemyjski force-pushed the niemyjski/verbose-guacamole branch from 154223e to 7788c75 Compare May 30, 2026 02:45
Comment thread src/Exceptionless.Core/Jobs/CleanupOrphanedDataJob.cs Fixed

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors cleanup jobs to rely on repository abstractions, improves duplicate stack cleanup behavior, and adds integration coverage for cleanup/repository operations.

Changes:

  • Reworked orphaned-data and cleanup jobs with lock renewal, cancellation checks, and repository-based deletes/updates.
  • Added event repository helpers for bulk deletion, stack reassignment, and distinct-id aggregation.
  • Added integration tests for cleanup pagination, retention, duplicate signatures, and event repository operations.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/Exceptionless.Core/Jobs/CleanupDataJob.cs Extends lock duration and renews locks during paged cleanup/retention loops.
src/Exceptionless.Core/Jobs/CleanupOrphanedDataJob.cs Replaces direct Elasticsearch calls with repository methods for orphan cleanup and duplicate-stack fixing.
src/Exceptionless.Core/Repositories/EventRepository.cs Adds bulk delete, stack reassignment, and composite aggregation helpers.
src/Exceptionless.Core/Repositories/Interfaces/IEventRepository.cs Exposes new event repository cleanup/query APIs and composite cursor type.
src/Exceptionless.Core/Repositories/StackRepository.cs Adds duplicate signature aggregation lookup.
src/Exceptionless.Core/Repositories/Interfaces/IStackRepository.cs Exposes duplicate signature lookup API.
tests/Exceptionless.Tests/Jobs/CleanupDataJobTests.cs Adds cleanup pagination and retention integration coverage.
tests/Exceptionless.Tests/Jobs/CleanupOrphanedDataJobTests.cs Adds integration coverage for orphan cleanup and duplicate stack merging.
tests/Exceptionless.Tests/Repositories/EventRepositoryTests.cs Adds coverage for distinct ids, stack reassignment, and bulk delete helpers.
tests/Exceptionless.Tests/Repositories/StackRepositoryTests.cs Adds duplicate signature repository coverage.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/Exceptionless.Core/Repositories/EventRepository.cs
Comment thread src/Exceptionless.Core/Jobs/CleanupOrphanedDataJob.cs Outdated
…overage

- Refactor CleanupOrphanedDataJob to use repository methods exclusively
  (eliminates all direct IElasticClient usage)
- Fix critical bug: @min_count:2 → @min:2 in GetDuplicateSignaturesAsync
  (invalid syntax was silently ignored, returning ALL signatures as duplicates)
- Add lock renewal at page boundaries in CleanupDataJob
- Add OperationCanceledException filter before generic catch
- Convert CompositeKeyResult class to record
- Use pattern matching (is null/is not null, is []) throughout
- Remove redundant is_deleted:false filter (repository handles soft deletes)
- Revert page size to 5 (2.5s sleep/item makes large pages impractical)
- Consolidate duplicate SaveAsync calls using spread syntax
- Add XML documentation for composite aggregation and script safety
- Add 8 new integration tests for EventRepository and StackRepository
- Rename all tests to Method_Given_Expected convention

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@niemyjski niemyjski force-pushed the niemyjski/verbose-guacamole branch from 7788c75 to 3c5880b Compare May 30, 2026 03:27
niemyjski and others added 4 commits May 30, 2026 07:34
…cleanup

GetByIdsAsync with Include(s => s.Id) only fetched the id field, causing
is_deleted to default to false for all entities. ShouldReturnDocument in
Foundatio checked this field to filter soft-deletes, but always saw false
and treated all entities (including soft-deleted ones) as existing. As a
result, events pointing to soft-deleted stacks/projects/organizations were
never cleaned up by CleanupOrphanedDataJob.

Fix by including IsDeleted in the projection for all three GetByIdsAsync
calls so the soft-delete filter correctly excludes deleted entities and
their orphaned events get cleaned up.

Add integration tests proving that events pointing to soft-deleted
stacks, projects, and organizations are properly cleaned up.

Add a comment to RenewLockAsync in CleanupDataJob explaining that it is
called at each page boundary to prevent the distributed lock from
expiring during long-running bulk cleanup operations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…erage

Four tests were passing for the wrong reason: orphaned events in the
project and org cleanup tests had random stack IDs, so DeleteOrphanedEventsByStackAsync
ran first and deleted them — project/org cleanup was never exercised.
Fixed by using the valid stack ID, ensuring only the intended cleanup
phase handles those events.

Also added assertions to FixDuplicateStacks merge test verifying that
all 110 events land on the target stack (not just total count = 110).

New tests added:
- RemoveAllByStackIds_WithMatchingEvents_RemovesAll
- ReassignStack_WithEmptySourceIds_ReturnsZeroWithoutModification (guards
  against the catastrophic empty-.Stack()-filter-patches-all-events bug)
- GetDistinctProjectIds_WithMultipleProjects_ReturnsAllUniqueIds
- GetDistinctOrganizationIds_WithMultipleOrganizations_ReturnsAllUniqueIds

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The cache invalidation for Stack prefix was called in three places:
1. Inside the progress-logging timer block (every 30s during batch)
2. After each batch completes (end-of-batch)
3. After the entire while loop finishes

Places 1 and 3 were redundant with place 2. The end-of-batch call already
ensures the cache is invalidated after each page of duplicates is processed.
Calling it on a timer and again after the loop added no correctness benefit
while adding unnecessary cache churn.

Resolves PR feedback: 'Why do we call this twice?'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add // Arrange / // Act / // Assert comments to all tests in
  CleanupOrphanedDataJobTests and new tests in EventRepositoryTests
  (PR feedback: 'tests need 3 name part with act, assert, arrange')
- Add explanatory comment to broad catch (Exception) in FixDuplicateStacks
  clarifying it is intentional: log-and-continue per-group so a single
  corrupt signature does not abort the entire dedup job
- Inline pre-condition assertions and simplify intermediate variables in
  tests for clarity

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@niemyjski niemyjski requested a review from Copilot May 31, 2026 23:20

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

@github-actions

Copy link
Copy Markdown

Code Coverage

Package Line Rate Branch Rate Complexity Health
Exceptionless.AppHost 39% 40% 134
Exceptionless.Core 70% 63% 7897
Exceptionless.Insulation 24% 23% 203
Exceptionless.Web 72% 62% 4135
Summary 69% (14171 / 20543) 61% (7305 / 11900) 12369

@niemyjski

Copy link
Copy Markdown
Member Author

Merged origin/main and addressed the outstanding review feedback in 92fa434.

Summary:

  • Removed the accidental packages/ diff from the PR; the final diff against origin/main is back to the intended cleanup/repository/test files.
  • Added/kept integration coverage for the new repository methods, duplicate signature lookup, soft-delete filtering, empty ReassignStackAsync safety, orphan cleanup, and cleanup usage accounting.
  • Updated duplicate-stack cleanup to loop until no duplicate signatures remain and to renew locks during long-running cleanup paths.
  • Documented the composite aggregation cursor path and aligned it with the current Elastic client API.
  • Treated missing event indexes as zero matching events during event cleanup, which fixes empty soft-deleted project cleanup.

Verification:

  • dotnet test -- --filter-class Exceptionless.Tests.Jobs.CleanupDataJobTests: 26 passed
  • dotnet test -- --filter-class Exceptionless.Tests.Jobs.CleanupOrphanedDataJobTests: 24 passed
  • dotnet test -- --filter-class Exceptionless.Tests.Repositories.EventRepositoryTests: 17 passed, 1 skipped performance test
  • dotnet test -- --filter-class Exceptionless.Tests.Repositories.StackRepositoryTests: 13 passed

All review threads are resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants