autotagging: introduce Source class for matching#6681
Conversation
|
Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #6681 +/- ##
==========================================
+ Coverage 72.60% 72.64% +0.03%
==========================================
Files 162 163 +1
Lines 20810 20819 +9
Branches 3294 3290 -4
==========================================
+ Hits 15109 15123 +14
+ Misses 4977 4973 -4
+ Partials 724 723 -1
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
big change — grug nervous but also see good idea. PR introduce Source value object so importer, matcher, distance, and UI all share one current-metadata shape instead of each layer re-cooking common tags from raw items. also move AttrDict, Match/AlbumMatch/TrackMatch, and many exports around so beets.autotag becomes flatter public surface.
Changes:
- new
beets/autotag/source.pywithSource.from_items/Source.from_item;tag_albumandtag_itemnow takeSourceand return onlyProposal distance()now takeoriginal: AttrDict+unmatched_count(no longer recompute common tags);get_most_common_tags()returns onlylikeliesAttrDictmoved tobeets.util;Match/AlbumMatch/TrackMatchmoved tobeets/autotag/match.py;cur_artist/cur_albumremoved fromImportTask; many import paths flattened tobeets.autotag
Reviewed changes
Copilot reviewed 42 out of 42 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| beets/autotag/source.py | new Source NamedTuple, central current-metadata object |
| beets/autotag/match.py | tag_album/tag_item take Source; Match/AlbumMatch/TrackMatch moved here |
| beets/autotag/distance.py | distance() takes original AttrDict + unmatched_count |
| beets/autotag/hooks.py | AttrDict and Match classes removed; uses beets.util.AttrDict |
| beets/autotag/init.py | flatter re-exports incl. Source, Distance, Match, etc. |
| beets/util/init.py | AttrDict lives here; get_most_common_tags returns one dict |
| beets/importer/tasks.py | task.source cached_property; removes cur_artist/cur_album; new lookup_candidates signatures |
| beets/ui/commands/import_/session.py | choose_candidate/manual flows use task.source |
| beets/ui/commands/import_/display.py | show_change/show_item_change take artist/name strings directly |
| beetsplug/mbpseudo.py | use Source.from_items(...).data + unmatched_count for distance() |
| beetsplug/bench.py | only updates tag_album import (call site still passes items — broken) |
| many beetsplug/* + tests | mechanical import path updates to from beets.autotag import ... |
| .git-blame-ignore-revs | adds refactor commit hashes |
other notable findings: beetsplug/ihate.py still references removed task.cur_artist/cur_album; beetsplug/titlecase.py TYPE_CHECKING imports Info from beets.autotag where it is not exported; several docstrings (tag_album, get_most_common_tags, choose_candidate, Source.from_item, importer/stages.py) still describe pre-refactor return values or parameters.
Comments suppressed due to low confidence (1)
beets/ui/commands/import_/session.py:371
- docstring still describe old parameters
cur_artist,cur_album,itemcount,itemandsingleton. new signature only takecandidates,rec,source,choices. update docstring so future grug not chase ghost args.
def choose_candidate(candidates, rec, source: Source, choices=[]):
"""Given a sorted list of candidates, ask the user for a selection
of which candidate to use. Applies to both full albums and
singletons (tracks). Candidates are either AlbumMatch or TrackMatch
objects depending on `singleton`. for albums, `cur_artist`,
`cur_album`, and `itemcount` must be provided. For singletons,
`item` must be provided.
7322755 to
95e0a81
Compare
|
Am having a bit of a hard time understanding the changes here since you also moved functions around quite a bit. For reviewing purpose, would you mind splitting the refactor into another PR? |
Of course - I will split the first two commits into a separate PR. |
|
See #6687 |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
test/ui/commands/test_import.py:94
- grug spy bug here.
show_change()signature now(source: Source, match: AlbumMatch)perbeets/ui/commands/import_/display.py, but this test still pass old three-arg shape(artist_str, album_str, match). Test will fail at call time. Need aSourcewhoseartist/namematch the expected "another artist with ..." / "another album" strings used in the assertions below, e.g. build one viaSource.from_itemsover items carrying those values, or constructSource(...)directly. grug skip fix because rightSourceshape is judgment call that affect expected snapshot text.
show_change(
f"another artist with {long_name}",
"another album",
AlbumMatch(change_dist, info, dict(item_info_pairs)),
)
da82f8f to
067663c
Compare
ab82b5c to
b435443
Compare
ecf83ab to
3e67a14
Compare
238a028 to
fb9de7a
Compare
|
|
||
| def distance( | ||
| items: Sequence[Item], | ||
| original: AttrDict[Any], |
There was a problem hiding this comment.
I would assume this should take a Source object now instead of an AttrDict[Any]?
The function arguments should only be these three in my opinion:
- Candidate (AlbumInfo)
- Source
- Assignment (item_info_pairs)
See here for more information and my understanding of it.
There was a problem hiding this comment.
See part of the docstring:
`original` is an AttrDict with the original album metadata.
It previously needed items in order to calculate likelies:
likelies, _ = get_most_common_tags(items)Now, this calculation is done by Source and the data is assigned to an attribute Source.data. It is then fed into distance, since it doesn't need any other data.
I made it an AttrDict so that keys can be accessed as attribute, to make these calls consistent:
dist.add_string("album", original.album, album_info.album)There was a problem hiding this comment.
How about we make Source.data a cached property and give Source as parameter here than. Seems like a slightly more clear abstraction for my mental model.
There was a problem hiding this comment.
Seems a bit weird that we need to precompute the get_most_common_tags in the from_item methods anyways.
There was a problem hiding this comment.
How about we make
Source.dataa cached property and giveSourceas parameter here than. Seems like a slightly more clear abstraction for my mental model.
Two issues with this:
datais different for eachSourcetype so we compute it either infrom_itemorfrom_itemsdatais needed in two places: for distance and inImportTask.chosen_infoso it does indeed need to be pre-computed
There was a problem hiding this comment.
Both of that shouldn't be an issue anymore once the ImportTask and SentienlImportTask combine. This is one of the main goals here, right?
| ) | ||
|
|
||
| @classmethod | ||
| def from_items(cls, items: Sequence[Item]) -> Source: |
There was a problem hiding this comment.
Can we move the items creation into here? Maybe with a from_path alternate constructor?
I can see this being a bit difficulties with integrating this into current task factory but might simplify some things further and would align with my understanding of source.
There was a problem hiding this comment.
I think I kind of see the direction here. Are you saying we could get Source within ImportTaskFactory.singleton and ImportTaskFactory.album? And then replace ImportTask.items with ImportTask.source in the construction?
There was a problem hiding this comment.
Yes that is roughtly my idea. If we dont work with items but work with the source it should allow us to abstract most operations regardless of if we look at an album or singleton.
Introduce Source class to encapsulate metadata extraction logic, replacing scattered `get_most_common_tags` calls throughout codebase. - Add Source class in importer.tasks with artist, name, and metadata - Simplify `distance()` to accept Source.data instead of items list - Update display functions to use Source objects - Cache Source creation with @cached_property on ImportTask This centralizes metadata handling and reduces coupling between autotag and library modules.
beets.autotagSource-Centric RefactorFixes: #6684
Core change
This PR restructures autotagging around a new
beets.autotag.source.Sourceobject.Before this change, album and singleton matching pulled "current metadata" from several places:
After this change, those flows all receive the same input shape: a single
Sourcevalue that carries:itemsartist,name)dataid,id_consensus)That makes
Sourcethe new "current thing being matched" abstraction for both albums and tracks.Architecture
1. New single input object:
SourceSourcecentralizes metadata extraction that used to be scattered across the autotag pipeline.This is the biggest architectural shift in the PR.
2.
tag_album()andtag_item()now operate onSourceThe matching entry points now take a
Sourceinstead of a loose mix of:itemsitemThat gives album and track matching the same calling convention and reduces special-case plumbing.
3.
distance()is now narrower and more explicitdistance()no longer derives "current metadata" internally from a list of items.Instead, it receives:
original: the already-extracted metadata containerunmatched_count: the number of unmatched local tracksThis is a good separation of responsibilities:
Sourceprepares current metadatadistance()only compares metadata4. Importer tasks now cache the current source
ImportTaskandSingletonImportTasknow expose a cachedsourceproperty:Source.from_items(...)Source.from_item(...)That means importer state no longer needs separate
cur_artistandcur_albumfields, and downstream code can read fromtask.sourceinstead.5. UI/display code now depends on the same source model
Import session and display functions now use
task.source.artistandtask.source.namewhen showing candidate changes or prompting for manual search/ID entry.This removes another copy of "current metadata" logic from the UI layer.
6.
AttrDictmoves tobeets.utilAttrDictwas previously defined insidebeets.autotag.hooks. It now lives inbeets.util.High-level effect:
7.
beets.autotagpublic API is flatterRelated cleanup continues by exporting more autotag types directly from
beets.autotag, includingSource.This reduces import-path churn for callers and makes the package boundary easier to understand.
High-level impact
What this improves
1. One place defines "the current metadata we are matching from"
The main benefit is conceptual clarity.
Reviewers can now think about autotagging as:
instead of following several separate paths that each rediscover the same metadata.
2. Album and singleton flows now look alike
Albums and single tracks now use the same structure and similar call patterns.
That should make future autotag changes easier because new behavior can usually be added once at the
Sourceboundary rather than twice in parallel code paths.3. Less coupling between importer, matcher, distance logic, and UI
Previously, each layer needed to know how to reconstruct "current artist/album/common tags".
Now:
SourceSourceSourcedistance()receives prepared metadataEach layer does less guessing about the others.
4. Simpler data flow for debugging and future refactors
When a match looks wrong, there is now a clearer place to inspect the original input:
Source.That should help with debugging, logging, and future changes to metadata extraction rules.
Data flow at a glance
Notable follow-on cleanup included here
get_most_common_tags()now returns only the extracted metadata dictionary, not a(likelies, consensus)pairid_consensusorva_likelySource-based flow.git-blame-ignore-revswas updated for mechanical refactor commitsReviewer guide
If reviewing this PR, the easiest way to read it is:
beets/autotag/source.pytag_album()andtag_item()inbeets/autotag/match.pydistance()signature changetask.sourceThat path shows the architectural intent without getting lost in the mechanical updates.