TAP test updates#515
Conversation
pg_stat_wal lost wal_sync/wal_write in PG18 (moved to pg_stat_io), the logging collector writes server logs to TESTLOGDIR rather than the data directory, and Spock disables its failover-slots worker on PG18+ in favor of native sync_replication_slots. - 026: read WAL fsyncs from pg_stat_io on PG18+, pg_stat_wal.wal_sync below - 029: grep the harness log dir (TESTLOGDIR) instead of the data directory - 800: skip on PG18+, matching the #if PG_VERSION_NUM >= 180000 source gate
All TAP tests share fixed datadirs (/tmp/tmp_spock_node_*) and ports (544x), so a leaked postmaster from one test poisons the next. When a test died with node_drop failing (e.g. a node that was never fully set up), system_or_bail aborted destroy_cluster before the pg_ctl stop and rm -rf ran, orphaning servers on those fixed ports. Make logical cleanup best-effort (node_drop via system_maybe) and ensure the server stop and datadir removal always run (plain sleep/system), so one test's failure can no longer cascade into the next.
📝 WalkthroughWalkthroughThis PR updates four TAP test files: one selects WAL fsync counters dynamically by PostgreSQL version, one updates log directory resolution, one skips a test suite on PostgreSQL 18+, and one makes cluster teardown more resilient to failures. ChangesTAP Test Fixes
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/tap/t/026_no_double_wal_flush.pl`:
- Around line 61-63: Add context = 'normal' to the PG18 branch of the wal_sync
query in the 026_no_double_wal_flush.pl test so the pg_stat_io lookup matches
pg_stat_wal semantics. Update the $wal_sync_query logic in the conditional
around $pg_version_num to filter pg_stat_io by object = 'wal' and context =
'normal', preventing IOCONTEXT_INIT fsyncs from affecting $before, $after, and
$delta.
In `@tests/tap/t/800_standby_hot_standby_feedback.pl`:
- Around line 3-17: The test file is mixing two test-ending strategies, so keep
only one of `plan tests => 9` or `done_testing()` in the top-level script.
Update the `Test::More` setup near the existing `plan skip_all`/`plan tests`
block so the file uses a single completion style, and ensure any final test
count matches the actual assertions in this script.
In `@tests/tap/t/SpockTest.pm`:
- Around line 387-397: The cleanup logic in SpockTest’s shutdown path
backgrounds pg_ctl stop and then uses a fixed sleep before rm -rf, which can
race with still-running PostgreSQL processes. Update the stop-and-cleanup block
to run pg_ctl stop synchronously using the existing $PG_BIN/pg_ctl invocation so
it waits for each node to fully exit, or replace the fixed sleep with a real
wait/poll for shutdown completion before deleting each datadir. Keep the cleanup
flow in the loop over `@node_datadirs` and preserve the “must always run”
behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 73c47e71-fb65-4de8-ab39-04235647217c
📒 Files selected for processing (4)
tests/tap/t/026_no_double_wal_flush.pltests/tap/t/029_remote_commit_ts_recovery_clean_shutdown.pltests/tap/t/800_standby_hot_standby_feedback.pltests/tap/t/SpockTest.pm
Fix non-regularly scheduled TAP tests + harden teardown
Some TAP tests not in the standard schedule failed on PG18. This fixes the genuine test bugs and hardens the shared teardown path so one test's failure can't cascade into others.
Test fixes (26d9df5)
Teardown resilience (83eb14c)
All TAP tests share fixed datadirs (/tmp/tmp_spock_node_*) and ports (544x), so a leaked postmaster from one test poisons the next. Previously a failed node_drop in destroy_cluster (via system_or_bail) could die before pg_ctl stop/rm -rf ran, orphaning servers. Logical cleanup is now best-effort (system_maybe) and the server-stop + datadir-removal always run.