@daltonbohning
Contributor

Some rank exclusions were hardcoded to the wrong rank.

Test-tag: OSAOfflineReintegration
Test-repeat: 3
Skip-unit-tests: true
Skip-fault-injection-test: true

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@daltonbohning daltonbohning self-assigned this Feb 9, 2026
@github-actions

github-actions bot commented Feb 9, 2026

Ticket title is 'osa/offline_reintegration.py:OSAOfflineReintegration.test_osa_offline_reintegrate_during_rebuild - False is not true : Pool Version Error: After exclude'
Status is 'In Review'
Labels: 'ci_master_weekly,weekly_test'
https://daosio.atlassian.net/browse/DAOS-18483

Signed-off-by: Dalton Bohning <dalton.bohning@hpe.com>
@daltonbohning daltonbohning marked this pull request as ready for review February 10, 2026 01:28
@daltonbohning daltonbohning requested review from a team as code owners February 10, 2026 01:28
phender previously approved these changes Feb 10, 2026
Contributor

@rpadma2 rpadma2 left a comment

If you have any questions, please let me know.

if (self.test_during_rebuild is True and index == 0):
    # Exclude rank 5
    output = self.pool.exclude("5")
# Exclude the rank
Contributor

It looks like this change breaks the intent of the code here, which is to exclude a rank while a rebuild is in progress.

The first exclusion starts the rebuild. The previous code picked rank "5"; picking any rank on line 99 is fine, but line 113 should use a different rank.

As written, the change excludes the same rank twice, so the second exclude is just a no-op.

Based on my current data corruption testing, we should add a pool query after line 99, wait for the "Rebuild Busy" state, and only then exclude another rank on line 113.
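
A rough sketch of the ordering suggested above, not the actual test code. The helper names, the `query_state` callable, and the `"busy"` state string are assumptions for illustration; only `pool.exclude(<rank string>)` comes from the snippet under review. The real test would need to read the rebuild state from whatever pool query it already uses.

```python
import time


def wait_for_rebuild_state(query_state, expected="busy", timeout=60.0, interval=1.0):
    """Poll query_state() until it returns `expected` or the timeout elapses.

    query_state is a hypothetical zero-argument callable that returns the
    pool's current rebuild state as a string. Returns True if the expected
    state was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if query_state() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def exclude_during_rebuild(pool, first_rank, second_rank, query_state,
                           timeout=60.0, interval=1.0):
    """Exclude one rank to start a rebuild, then exclude a different rank
    once the rebuild is reported as in progress."""
    if first_rank == second_rank:
        # Excluding the same rank twice would make the second call a no-op.
        raise ValueError("second_rank must differ from first_rank")

    # The first exclusion is what triggers the rebuild.
    pool.exclude(str(first_rank))

    # Wait until the rebuild is actually running before the second exclusion.
    if not wait_for_rebuild_state(query_state, timeout=timeout, interval=interval):
        raise TimeoutError("rebuild did not reach the expected state in time")

    pool.exclude(str(second_rank))
```

Checking that the two ranks differ up front makes the "same rank twice" mistake fail loudly instead of silently turning the second exclusion into a no-op.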

Contributor Author

I updated it to exclude a different rank, but TBH I don't really follow the logic of this function.

3 participants