DAOS-19028 test: REBUILD29 use more precise timing by kccain · Pull Request #18546 · daos-stack/daos

kccain · 2026-06-25T20:22:43Z

Before this change, rebuild_kill_PS_leader_during_rebuild() killed
a non-leader engine and immediately tried to inject the fault
DAOS_REBUILD_TGT_SCAN_HANG on "all engines". The fault injection
itself suffered RPC timeouts because of the killed engine. Those
timeouts skewed the test's overall timing, defeating the goal of
killing the PS leader engine during the first rebuild.

With this change, the test no longer uses fault injection. Instead, it
waits for the first rebuild to start and to show evidence of
scanning activity (rs_toberb_obj_nr > 0). This is done by a
new common function, test_rebuild_wait_to_scanning_next(), which waits
for both rs_version (the pool map version) to increment and the
number of to-be-rebuilt objects to become nonzero.
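
The wait condition described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the struct rebuild_status_sketch, the query callback type, the fake_pool helper, and the return codes are all hypothetical stand-ins. Only the two field names rs_version and rs_toberb_obj_nr come from the description. The real test_rebuild_wait_to_scanning_next() presumably queries the pool and sleeps between polls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the two status fields named above. */
struct rebuild_status_sketch {
	uint32_t rs_version;       /* pool map version of the rebuild */
	uint64_t rs_toberb_obj_nr; /* number of objects still to be rebuilt */
};

/* Hypothetical query callback: fills *st and returns 0 on success. */
typedef int (*query_fn_t)(void *arg, struct rebuild_status_sketch *st);

/*
 * Poll until rs_version has advanced past prev_version AND
 * rs_toberb_obj_nr is nonzero (evidence that scanning has begun).
 * Returns 0 when both conditions hold, -1 if a query fails,
 * -2 if max_polls is exhausted first.
 */
static int
wait_to_scanning_next(query_fn_t query, void *arg, uint32_t prev_version,
		      int max_polls, struct rebuild_status_sketch *out)
{
	assert(query != NULL && out != NULL);

	for (int i = 0; i < max_polls; i++) {
		struct rebuild_status_sketch st = {0};

		if (query(arg, &st) != 0)
			return -1;
		if (st.rs_version > prev_version && st.rs_toberb_obj_nr > 0) {
			*out = st;
			return 0;
		}
		/* A real test would sleep here before polling again. */
	}
	return -2;
}

/* Fake pool for demonstration: replays a scripted sequence of statuses. */
struct fake_pool {
	const struct rebuild_status_sketch *seq;
	int                                 len;
	int                                 pos;
};

static int
fake_query(void *arg, struct rebuild_status_sketch *st)
{
	struct fake_pool *p   = arg;
	int               idx = p->pos < p->len ? p->pos : p->len - 1;

	*st = p->seq[idx]; /* once the script runs out, repeat the last entry */
	p->pos++;
	return 0;
}
```

Requiring both conditions matters: a version bump alone only indicates that a new rebuild was scheduled, while a nonzero to-be-rebuilt count indicates that scanning has actually found work, which is the window in which the test wants to kill the PS leader.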

Test-repeat: 10
Test-tag: test_rebuild_29
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-test-rpms: true

Steps for the author:

Commit message follows the guidelines.
Appropriate Features or Test-tag pragmas were used.
Appropriate Functional Test Stages were run.
At least two positive code reviews including at least one code owner from each category referenced in the PR.
Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Kenneth Cain <kenneth.cain@hpe.com>

github-actions · 2026-06-25T20:52:37Z

Ticket title is 'daos_test/rebuild.py:DaosCoreTestRebuild.test_rebuild_29 - pool reintegrate failed'
Status is 'In Progress'
Labels: '2.6.5rc3,pr_test,scrubbed_2.8,tcp_provider,test_2.6.5rc1'
https://daosio.atlassian.net/browse/DAOS-19028

kccain · 2026-06-28T09:53:43Z

New version of test passed 10x repeats
https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-18546/1/testReport/FTEST_daos_test/DaosCoreTestRebuild-DAOS_Rebuild/

Execution time appears roughly comparable to the existing test, so there is probably no need to adjust the overall test timeout. The worst execution time across the 10 repeats with this PR's change was 4m 17s, versus a worst time of 4m 11s (in build 354) across a few recent passing REBUILD29 runs in the master daily builds, a difference of 6 seconds.

kccain added the forced-landing label ("The PR has known failures or has intentionally reduced testing, but should still be landed.") Jun 25, 2026

kccain marked this pull request as ready for review June 28, 2026 09:53

kccain requested review from liuxuezhao and wangshilong June 28, 2026 09:54

kccain wants to merge 1 commit into master from kccain/daos_19028_testfix
