DAOS-19028 test: REBUILD29 use more precise timing #18546
Open
kccain wants to merge 1 commit into
Before this change, rebuild_kill_PS_leader_during_rebuild() killed a non-leader engine and immediately (tried to) inject fault DAOS_REBUILD_TGT_SCAN_HANG on "all engines". This fault injection itself suffered RPC timeouts due to the killed engine. This further affected the test's overall timing, contradicting the goal of killing the PS leader engine during the first rebuild.

With this change, the test no longer uses fault injection. Instead it waits for the first rebuild to start *and* demonstrate evidence of scanning activity (rs_toberb_obj_nr > 0). This is accomplished using a new common function test_rebuild_wait_to_scanning_next() that waits for both the rs_version (pool map version) to increment and the to-be-rebuilt number of objects to become nonzero.

Test-repeat: 10
Test-tag: test_rebuild_29
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-test-rpms: true

Signed-off-by: Kenneth Cain <kenneth.cain@hpe.com>
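For readers unfamiliar with the wait condition, here is a minimal sketch of the polling logic described above. It is not the code from this PR: the struct, the callback type, and the names wait_to_scanning_next() and fake_query() are hypothetical stand-ins, and only the field names rs_version and rs_toberb_obj_nr come from the description. A real test would presumably query the pool and sleep between polls; this sketch replaces the query with a scripted fake and bounds the loop by a poll count instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two rebuild-status fields named in the description. */
struct rebuild_status_sketch {
	uint32_t rs_version;       /* pool map version the rebuild is working on */
	uint64_t rs_toberb_obj_nr; /* number of objects found so far to be rebuilt */
};

/* Hypothetical query callback: fills *st and returns 0 on success. */
typedef int (*query_fn_t)(void *arg, struct rebuild_status_sketch *st);

/*
 * Poll until BOTH conditions hold:
 *   1. rs_version has advanced past prev_version (a new rebuild started), and
 *   2. rs_toberb_obj_nr > 0 (scanning has found objects to rebuild).
 * Returns 0 on success, -1 on query failure or after max_polls attempts.
 * A real test would sleep between polls; that is omitted here.
 */
static int
wait_to_scanning_next(query_fn_t query, void *arg, uint32_t prev_version,
		      unsigned int max_polls, struct rebuild_status_sketch *out)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		struct rebuild_status_sketch st;

		if (query(arg, &st) != 0)
			return -1;

		bool started  = st.rs_version > prev_version;
		bool scanning = st.rs_toberb_obj_nr > 0;

		if (started && scanning) {
			*out = st;
			return 0;
		}
	}
	return -1;
}

/* Scripted fake pool: returns each step once, then repeats the last one. */
struct fake_pool {
	const struct rebuild_status_sketch *steps;
	size_t                              nr;
	size_t                              next;
};

static int
fake_query(void *arg, struct rebuild_status_sketch *st)
{
	struct fake_pool *p = arg;
	size_t            i = p->next < p->nr ? p->next : p->nr - 1;

	*st = p->steps[i];
	if (p->next < p->nr)
		p->next++;
	return 0;
}
```

Requiring both conditions is the point of the change: a version bump alone only shows that a rebuild was scheduled, while a nonzero to-be-rebuilt count shows that scanning is under way, which is the window in which the test wants to kill the PS leader.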
Ticket title is 'daos_test/rebuild.py:DaosCoreTestRebuild.test_rebuild_29 - pool reintegrate failed'
The new version of the test passed 10x repeats. Its execution time seems roughly comparable to that of the existing test, so there is probably no need to adjust the overall test timeout. The worst execution time across the 10 repeats with the PR change was 4m 17s, versus a worst time of 4m 11s (in build 354) across a few recent passing REBUILD29 runs from the master daily builds.