3 b training prep by le1nux · Pull Request #452 · Modalities/modalities

le1nux · 2026-06-15T09:51:02Z

What does this PR do?

This PR prepares the 3B training path by tightening weight-tying behavior for parallel training and by making DCP checkpoint restores more flexible.

General Changes

Add selective AppState component loading so checkpoint restore can load only the model, optimizer, and/or LR scheduler as needed.
Thread components_to_load and allow_partial_load through the app-state factory and DCP checkpoint loading path.
Add has_tied_word_embeddings model capability checks and centralize tied-embedding validation helpers.
Reject tied word embeddings for Tensor Parallelism and Pipeline Parallelism configs.
Update the Llama3-like initializer so lm_head is only initialized separately when weight tying is disabled.
Expose has_tied_word_embeddings on GPT-2 models and add a default implementation on the base model class.
Add tests covering selective checkpoint component loading and tied-embedding validation behavior.
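As a rough illustration of the selective loading described above, the sketch below restores only the components named in components_to_load. It is not the PR's implementation: SketchAppState, the enum values, and the use of plain dicts as stand-ins for the model, optimizer, and LR scheduler are all assumptions made for the example.

```python
from enum import Enum


class StatefulComponents(Enum):
    """Checkpointable parts of the app state (string values are assumed)."""

    MODEL = "model"
    OPTIMIZER = "optimizer"
    LR_SCHEDULER = "lr_scheduler"


class SketchAppState:
    """Hypothetical stand-in for AppState; components are plain dicts here."""

    def __init__(self, model, optimizer=None, lr_scheduler=None, components_to_load=None):
        self._components = {
            StatefulComponents.MODEL: model,
            StatefulComponents.OPTIMIZER: optimizer,
            StatefulComponents.LR_SCHEDULER: lr_scheduler,
        }
        if components_to_load is None:
            # Assumed default: restore every component.
            components_to_load = list(StatefulComponents)
        self._components_to_load = set(components_to_load)

    def load_state_dict(self, state_dict):
        for component, target in self._components.items():
            if target is None or component not in self._components_to_load:
                continue  # component absent or deliberately skipped
            target.update(state_dict[component.value])


# Restore only the model; the optimizer entry in the checkpoint is ignored.
model, optimizer = {}, {}
app_state = SketchAppState(model, optimizer, components_to_load=[StatefulComponents.MODEL])
app_state.load_state_dict({"model": {"w": 1.0}, "optimizer": {"lr": 0.1}})
```

After the call, model holds the checkpointed values while optimizer is left untouched, which is the behavior one would want when warm-starting a run from model weights only.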

Breaking Changes

Tensor Parallelism and Pipeline Parallelism configs now fail validation when tied word embeddings are enabled.
The DCP app-state loading config now includes an allow_partial_load option, which lets partial checkpoint restores be configured explicitly.
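The first breaking change can be pictured with a minimal sketch of such a validation step. The class names, the helper name validate_no_tied_embeddings, the choice of ValueError, and the modelling of has_tied_word_embeddings as a property are assumptions for illustration; the PR only states that the capability exists on GPT-2 models with a default on the base class.

```python
class SketchBaseModel:
    """Hypothetical base class: models report untied embeddings by default."""

    @property
    def has_tied_word_embeddings(self) -> bool:
        return False


class SketchGPT2(SketchBaseModel):
    """Hypothetical GPT-2-like model that reports its weight-tying setting."""

    def __init__(self, use_weight_tying: bool):
        self.use_weight_tying = use_weight_tying

    @property
    def has_tied_word_embeddings(self) -> bool:
        return self.use_weight_tying


def validate_no_tied_embeddings(model: SketchBaseModel, parallelism: str) -> None:
    """Raise if a model with tied embeddings is combined with TP or PP."""
    if model.has_tied_word_embeddings:
        raise ValueError(f"Tied word embeddings are not supported with {parallelism}.")


# An untied model passes validation; a tied one would raise ValueError.
validate_no_tied_embeddings(SketchGPT2(use_weight_tying=False), "Tensor Parallelism")
```

Keeping the check in one helper means the Tensor Parallelism and Pipeline Parallelism configs can share the same validation and error message.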

Checklist before submitting final PR

My PR is minimal and addresses one issue in isolation
I have merged the latest version of the target branch into this feature branch
I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
I have run a sample config for model training
I have checked that all tests run through (python tests/tests.py)
I have updated the internal changelog (CHANGELOG_DEV.md)

… weights separately from the input embedding weights, since they will be tied together and should share the same initialization. The lm head weights will be initialized as part of the input embedding weights initialization, so we can remove the separate initialization for the lm head weights when weight tying is enabled.
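The reasoning in this commit message can be sketched in plain Python. This is not the Llama3-like initializer itself: the function name, the parameter names other than lm_head, the use of lists in place of tensors, and the default standard deviation are assumptions chosen to keep the example self-contained.

```python
import random


def init_weights(params: dict[str, list[float]], use_weight_tying: bool,
                 std: float = 0.02, seed: int = 0) -> list[str]:
    """Initialize parameters in place and return the names that were initialized."""
    rng = random.Random(seed)
    initialized = []
    for name, weight in params.items():
        if use_weight_tying and name == "lm_head.weight":
            # Tied case: lm_head shares storage with the input embedding,
            # so it receives its values through the embedding's initialization.
            continue
        for i in range(len(weight)):
            weight[i] = rng.gauss(0.0, std)
        initialized.append(name)
    return initialized


# Tied: both names point at the same storage, only the embedding is initialized.
shared = [0.0] * 4
tied_params = {"wte.weight": shared, "lm_head.weight": shared}
tied_names = init_weights(tied_params, use_weight_tying=True)

# Untied: lm_head has its own storage and gets its own initialization.
untied_params = {"wte.weight": [0.0] * 4, "lm_head.weight": [0.0] * 4}
untied_names = init_weights(untied_params, use_weight_tying=False)
```

In the tied case a separate lm_head pass would simply overwrite the embedding's values, which is why skipping it loses nothing.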

rrutmann · 2026-06-19T13:21:13Z

+                app_state=self,
+                state_dict=state_dict[StatefulComponents.OPTIMIZER.value],
+            )
+        if self._lr_scheduler is not None and StatefulComponents.LR_SCHEDULER in self._components_to_load:


Should we raise an error if self._components_to_load contains something unexpected?

added a check and also a test case for this.
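A check of the kind discussed in this thread might look like the sketch below. The function name, the enum values, and the choice of ValueError are assumptions; the actual check added in the PR may differ.

```python
from enum import Enum


class StatefulComponents(Enum):
    """Checkpointable parts of the app state (string values are assumed)."""

    MODEL = "model"
    OPTIMIZER = "optimizer"
    LR_SCHEDULER = "lr_scheduler"


def validate_components_to_load(components_to_load) -> set:
    """Return the requested components as a set, rejecting unknown entries."""
    unexpected = [c for c in components_to_load if not isinstance(c, StatefulComponents)]
    if unexpected:
        raise ValueError(f"Unexpected entries in components_to_load: {unexpected!r}")
    return set(components_to_load)


# Valid input passes through; a plain string such as "model" would be rejected.
requested = validate_components_to_load([StatefulComponents.MODEL, StatefulComponents.OPTIMIZER])
```

Failing early here avoids a restore that silently skips a component because of a misspelled or mistyped entry.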

rrutmann · 2026-06-19T13:56:48Z

@@ -0,0 +1,137 @@
+from unittest.mock import MagicMock


Maybe add a test for invalid combinations of allow_partial_load and components_to_load

le1nux added 4 commits May 27, 2026 17:01

feat: implemented selective component loading in AppState and AppStateFactory

290c6c5

test: added test for partial checkpoint loading

f08c937

feat: added allow_partial_load option to DCP checkpoint loading

ec1ac4f

feat: hardened weight tying against misconfigurations

33c55a4

le1nux marked this pull request as draft June 15, 2026 09:51

le1nux marked this pull request as ready for review June 19, 2026 13:49

rrutmann approved these changes Jun 19, 2026

le1nux added 3 commits June 19, 2026 16:05

chore: added use_weight_tying to Llama3InitializerConfig

62bc8c4

feat: added check for invalid components to AppState's components_to_load

32c8146

chore: minor cleanup

e88b8aa

le1nux merged commit 8db7d24 into main Jun 19, 2026
3 checks passed

le1nux deleted the 3B_training_prep branch June 19, 2026 15:17

le1nux merged 8 commits into main from 3B_training_prep