PICARD-2750: Handle invalid LINK frames when saving MP3 files #2630

francescouteiro · 2025-04-06T17:15:11Z

Summary

This is a…
- Bug fix
- Feature addition
- Refactoring
- Minor / simple change (like a typo)
- Other
Description: MP3 files with LINK frames that do not follow the ID3 specification cause mutagen exceptions when saving with ID3 v2.3 format. This fix handles invalid LINK frames to prevent errors during save operations.

Problem

When processing MP3 files containing LINK frames that don't follow the ID3 specification, Picard returns an error of "Invalid frame ID" exception when saving with ID3 v2.3 format enabled.

Files from archive.org (like https://archive.org/details/yallcomesoundrec00page) store a null byte followed by a URL in the LINK frame instead of the required frame ID followed by URL. This invalid format causes mutagen to raise an exception during the ID3 v2.3 downgrade process.

Current workarounds include setting ID3 format to v2.4 or enabling "Clear existing tags" to remove the problematic frames. This fix proposes a handle for malformed LINK frames, preventing exceptions when saving with ID3 v2.3.

JIRA ticket: PICARD-2750

Solution

This fix addresses the exception by detecting and handling malformed LINK frames in MP3 files during the ID3 v2.3 saving process. When an invalid LINK frame is encountered (containing a null byte followed by URL instead of proper frame ID), it's transformed into a WXXX frame (user-defined website metadata/label).

This approach preserves the URL data from the malformed LINK frames by converting them to the appropriate frame type for URLs, rather than simply discarding the information. The conversion only happens when saving in ID3 v2.3 format, which would otherwise cause errors with these malformed frames.

While not the most comprehensive solution, this approach specifically targets the common issue with archive.org files without requiring UI changes or affecting performance. A more robust solution as suggested in the community discussion could involve user prompts to review and decide on invalid tag data, but that would be a larger feature enhancement rather than a focused bug fix.

This implementation prioritizes stability for users of archive.org files while intelligently preserving the URL data in a proper ID3 frame format.

Action

Additional actions required:

Update Picard documentation (please include a reference to this PR)
Other (please specify below)

picard/formats/id3.py

zas

Overall it should work, but I think the code can be simplified (too much code redundancy, too long methods).

Also please fix linting errors.

picard/formats/id3.py

MP3 files with LINK frames that do not follow the ID3 specification cause mutagen exceptions when saving with ID3 v2.3 format. This fix handles invalid LINK frames to prevent errors during save operations.

francescouteiro · 2025-04-07T14:41:29Z

Thanks for your insights, @zas! I think that I've addressed your concerns for redundancy, as well fixed linting errors!

zas

Code looks much better after recent changes, few more comments though.

zas · 2025-04-07T15:39:27Z

picard/formats/id3.py

+        for frame_id, frame in malformed_link_frames:
+            for url in self._extract_and_convert_link_frame(frame, frame_id):
+                metadata.add('user_website', url)
+


Perhaps I miss something but can't this method be simplified even more:

def _process_link_frames_on_load(self, tags, metadata): """Process malformed LINK frames during initial file load and add them as website URLs.""" for frame_id, frame in list(tags.items()): if self._is_malformed_frame(frame_id, frame): for url in self._extract_and_convert_link_frame(frame, frame_id): metadata.add('user_website', url)

zas · 2025-04-07T15:41:03Z

picard/formats/id3.py

+
+        return confirmed_problematic_frames
+
+    def _extract_and_convert_link_frame(self, frame, frame_id):


Just a minor thing, but everywhere we have frame_id, frame but here, perhaps swap arguments.

zas · 2025-04-07T15:43:14Z

picard/formats/id3.py

+        to_add = []
+        confirmed_problematic_frames = []
+        extracted_urls = []
+        for frame_id, frame in list(tags.items()):


Is there a reason to convert tags.items() to list here?
Elsewhere tags are modified during the loop, but not here.

zas · 2025-04-07T15:47:37Z

picard/formats/id3.py

@@ -1035,6 +1162,53 @@ def _remove_other_supported_tag(self, tags, real_name):
        """Remove other supported tag from ID3 frames."""
        del tags[real_name]

+    def _contains_url_pattern(self, text):


Since it doesn't depend on self it could be made a simple utility function at module level:

def _contains_url_pattern(text): """Check if the given text contains URL-like patterns.""" return any(term in text for term in ('http', 'www.', '://'))

It would shorten calls:

if part.strip() and self._contains_url_pattern(part):

would become:

if part.strip() and _contains_url_pattern(part):

If you prefer to keep it at class level, make it a static method at least (using @staticmethod decoration), but I think the code would be more readable with shorter calls.

zas · 2025-04-07T16:06:43Z

picard/formats/id3.py

+                for frame_id in list(tags.keys()):
+                    if not (frame_id.isalnum() and frame_id.isupper()):
+                        log.warning("Removing invalid frame ID: %r", frame_id)
+                        del tags[frame_id]


@phw Perhaps we should ask user about what do to in this case? Or is it ok to just delete invalid frames and move on?
We could also have an option for that.

An option seems a bit cleaner to me so that the user has some measure of control, but I don't have a feel for how common this situation is encountered. It may not be worth having another setting (that most users won't understand).

I agree that this could be an option. The question is however what we do, if the user disables the removal? We can't keep it as it is, as it will fail saving.

I actually also don't quite see why this code is part of the PR, it seems to try to solve an entirely different problem. Are there cases where the frame IDs are invalid like this?

rdswift · 2025-04-08T19:44:24Z

picard/formats/id3.py

+
+            log.warning("Could not extract URL from malformed LINK frame %r", frame_id)
+
+        except Exception as e:


This is a pretty broad exception capture. What exceptions do you anticipate could occur (and where), and could they be specifically captured?

I agree fully - this is way way way way way way way too broad.

Overall I would really like to see all exception handling like this removed, and instead for all things that would throw exceptions to be handled properly in normal code, leaving exceptions to be exactly that i.e. exceptions!!

rdswift · 2025-04-08T19:44:33Z

picard/formats/id3.py

+            combined_url = ''.join(url_parts).strip(':;,. \t\n\r')
+
+            if combined_url:
+                url_match = re.search(r'(?:https?:?(?:/+|\\+)|www\.)[a-zA-Z0-9][-a-zA-Z0-9\.]+\.[a-zA-Z]{2,}(?:/[^\s:;,]*)?', combined_url)


Just a question because I'm really not sure, but would it be more efficient if the search/match string was pre-compiled as a constant?

rdswift · 2025-04-08T19:44:37Z

picard/formats/id3.py

+                        parts = frame_data.split('\x00')
+                        for part in parts:


Minor comment: What you have currently is likely clearer to read and understand, but since you only use parts in the for line, it could be shortened to for part in frame_data.split('\x00'):.

rdswift · 2025-04-08T19:44:43Z

picard/formats/id3.py

+        """Extract URL from a malformed LINK frame and convert it to a WXXX frame. Handles cases where URLs
+        are split between frame ID and data or separated by null bytes (common for archive.org)"""
+
+        try:


Does everything need to be included within the try: block?

rdswift · 2025-04-08T19:44:47Z

picard/formats/id3.py

+                if self._is_malformed_frame(frame_id, frame):
+                    invalid_id = True
+                    for url in self._extract_and_convert_link_frame(frame, frame_id):
+                        log.debug("Successfully converted malformed LINK frame %r to URL: %r", frame_id, url)


Is this an unusual enough situation that it might warrant logging at one level above debug (i.e. info)?

rdswift · 2025-04-08T19:44:49Z

picard/formats/id3.py

+                invalid_id = False
+                if '\x00' in frame_id:
+                    invalid_id = True


I think this could be shortened to:

Suggested change

invalid_id = False

if '\x00' in frame_id:

invalid_id = True

invalid_id = '\x00' in frame_id

rdswift · 2025-04-08T19:44:55Z

picard/formats/id3.py

+                for frame_id in list(tags.keys()):
+                    if not (frame_id.isalnum() and frame_id.isupper()):
+                        log.warning("Removing invalid frame ID: %r", frame_id)
+                        del tags[frame_id]


An option seems a bit cleaner to me so that the user has some measure of control, but I don't have a feel for how common this situation is encountered. It may not be worth having another setting (that most users won't understand).

rdswift · 2025-04-08T19:44:59Z

picard/formats/id3.py

@@ -170,6 +170,7 @@ class ID3File(File):
        'TSOT': 'titlesort',
        'WCOP': 'license',
        'WOAR': 'website',
+        'WXXX': 'user_website',


If we're going to standardize on the use of this tag, is it something that should be documented in https://picard-docs.musicbrainz.org/en/appendices/tag_mapping.html and/or https://picard-docs.musicbrainz.org/en/variables/variables_other_information.html or should we remain silent about it?

I'm not really happy with the name here. Also not sure if this should be written to other file formats just like this. Maybe just use url? And we should consider how this can be mapped to other formats or prevent writing it there.

@zas @rdswift What are your thoughts?

Do we really need this? I mean it creates complications and the best would to avoid it totally.

francescouteiro · 2025-04-09T10:54:45Z

Thanks a lot for the insightful observations, @zas & @rdswift ! I’ll take some time to work through these suggestions and update the PR accordingly as soon as I can!

Sophist-UK

This PR is problematic for me for several reasons:

Perhaps I don't understand exactly what cases this is supposed to fix. Is it just the LINK tag, or all URL tags (W000-WZZZ + LINK according to the ID3 v2.4 spec)? Why does it process every tag and not just these?
Is this simply an issue over the misplacing of a single 0x00 byte or something more complicated. I cannot understand how it needs 200 lines of new code (which is c. 15% of the size of the existing file) just to handle a simple issue over a mis-placed 0x00 byte.
I have a real problem about the use of exceptions to handle failures in the new code rather than avoiding them by checks in the code, leaving exceptions to indicate that something unexpected has happened as a genuine exception to what is expected to happen. But if you are going to use exception handling, it needs to be confined to as few lines as possible and catch THE specific exception that is expected and not just any old exception that might happen.

I cannot really review the bulk of the code until points 1. and 2. are addressed - because this seems to be a massive sledgehammer for a very tiny nut.

Sophist-UK · 2025-04-09T12:54:55Z

picard/formats/id3.py

+
+        try:
+            self._sanitize_id3_frames(tags)
+        except Exception as e:


I have never liked the idea of using global exceptions as it can hide all sorts of unexpected issues.

Personally I would prefer that the _sanitize_id3_frames code to check for things that would cause exceptions and logs them.

I think the methods should be _sanitize_link_frames since this is in id3.py (so id3 is implied) and it doesn't santitize any frames other than link frames.

Sophist-UK · 2025-04-09T13:05:53Z

picard/formats/id3.py

-            tags.save(filename, v2_version=3, v1=v1, v23_sep=separator)
+            try:
+                tags.save(filename, v2_version=3, v1=v1, v23_sep=separator)
+            except ValueError:


I dislike the idea of this try/except too.

Surely this is simply another form of sanitization that should be done before attempting to save the file.

Sophist-UK · 2025-04-09T13:08:56Z

picard/formats/id3.py

+
+            log.warning("Could not extract URL from malformed LINK frame %r", frame_id)
+
+        except Exception as e:


I agree fully - this is way way way way way way way too broad.

Overall I would really like to see all exception handling like this removed, and instead for all things that would throw exceptions to be handled properly in normal code, leaving exceptions to be exactly that i.e. exceptions!!

Sophist-UK · 2025-04-09T13:10:36Z

picard/util/tags.py

@@ -136,6 +136,7 @@
    'totaltracks': N_('Total Tracks'),
    'tracknumber': N_('Track Number'),
    'website': N_('Artist Website'),
+    'user_website': N_('User Defined Website'),


What about all the other non-ID3 formats - how will they handle this completely new tag?????

Yes, this needs to be defined. We shouldn't just write USER_WEBSITE to Vorbis tags, this seems odd. I actually wouldn't expect a fix for the LINK frame issue to introduce a new tag. If we add a new tag it should be well handled from the start. If in doubt I would rather not write a new tag to some formats initially until we have figured out a proper mapping.

Sophist-UK · 2025-04-09T13:25:12Z

picard/formats/id3.py

        else:
            tags.update_to_v24()
            tags.save(filename, v2_version=4, v1=v1)

+    def _sanitize_id3_frames(self, tags):
+        """This method attempts to fix various issues with ID3 frames:
+        1. Handle invalid LINK frames (convert to WXXX if possible)


Perhaps I don't understand exactly what cases this is supposed to fix - so it might be helpful if it was spelled out in the comments here.

Perhaps the most obvious confusion is that this PR talks specifically about issues with the LINK tag (see section 4.20 of the ID3 2.4 spec but appears to be looking at all tags and not just LINK tags (and not even just tags starting with W (see section 4.3 of the ID3 2.4 spec which might potentially have this issue).

I would like to see the specific tags to be sanitized listed here and the specific issues and fixes spelled out.

Yes, I agree. I'd also prefer if this would fix the specific case we have.

Also something is strange here: For some reason this issue seems only to happen when saving ID3 v2.3, but not witth v2.4 (at least if what I wrote on the ticket is still true). Looking at the specs for "Linked information" in both ID3 v2.3 and v2.4 I don't really understand why it should be. The specs are very similar. According to the problem description (files containing a LINK frame without frame ID, only the null byte) the LINK frame would be invalid for both.

So maybe mutagen does something different here, but for some reason can successfully save it to ID3 v2.4.

phw · 2025-04-12T09:58:07Z

picard/formats/id3.py

@@ -170,6 +170,7 @@ class ID3File(File):
        'TSOT': 'titlesort',
        'WCOP': 'license',
        'WOAR': 'website',
+        'WXXX': 'user_website',


I'm not really happy with the name here. Also not sure if this should be written to other file formats just like this. Maybe just use url? And we should consider how this can be mapped to other formats or prevent writing it there.

@zas @rdswift What are your thoughts?

phw · 2025-04-12T10:02:59Z

picard/formats/id3.py

+                for frame_id in list(tags.keys()):
+                    if not (frame_id.isalnum() and frame_id.isupper()):
+                        log.warning("Removing invalid frame ID: %r", frame_id)
+                        del tags[frame_id]


I agree that this could be an option. The question is however what we do, if the user disables the removal? We can't keep it as it is, as it will fail saving.

I actually also don't quite see why this code is part of the PR, it seems to try to solve an entirely different problem. Are there cases where the frame IDs are invalid like this?

phw · 2025-04-12T10:12:55Z

picard/formats/id3.py

        else:
            tags.update_to_v24()
            tags.save(filename, v2_version=4, v1=v1)

+    def _sanitize_id3_frames(self, tags):
+        """This method attempts to fix various issues with ID3 frames:
+        1. Handle invalid LINK frames (convert to WXXX if possible)


Yes, I agree. I'd also prefer if this would fix the specific case we have.

Also something is strange here: For some reason this issue seems only to happen when saving ID3 v2.3, but not witth v2.4 (at least if what I wrote on the ticket is still true). Looking at the specs for "Linked information" in both ID3 v2.3 and v2.4 I don't really understand why it should be. The specs are very similar. According to the problem description (files containing a LINK frame without frame ID, only the null byte) the LINK frame would be invalid for both.

So maybe mutagen does something different here, but for some reason can successfully save it to ID3 v2.4.

phw · 2025-04-12T10:15:19Z

picard/util/tags.py

@@ -136,6 +136,7 @@
    'totaltracks': N_('Total Tracks'),
    'tracknumber': N_('Track Number'),
    'website': N_('Artist Website'),
+    'user_website': N_('User Defined Website'),


Yes, this needs to be defined. We shouldn't just write USER_WEBSITE to Vorbis tags, this seems odd. I actually wouldn't expect a fix for the LINK frame issue to introduce a new tag. If we add a new tag it should be well handled from the start. If in doubt I would rather not write a new tag to some formats initially until we have figured out a proper mapping.

zas · 2025-04-23T08:10:15Z

@francescouteiro any update on this?

francescouteiro · 2025-04-23T16:28:13Z

@francescouteiro any update on this?

Very sorry for the delay! Last week was my exams' week, so I just started looking into all the things mentioned here recently! I'll address them between today and this Friday!

francescouteiro · 2025-05-01T14:24:12Z

I'm again very sorry for the delay! I've been out of Wi-Fi at my address since Apr. 24 (plus no mobile data remaining), and with the recent power outage in Portugal it's still not solved. Thankfully, my data plan renewed today, so I can get back to this at 100%! TL;DR: This issue has not been forgotten or abandoned; I am still working on it!

Sophist-UK · 2025-05-01T14:29:05Z

@francescouteiro

@Sophist-UK waves from the Algarve.

As a fellow Portuguese resident, I can confirm just how big an impact the power outage has had. We also had an internet outage of over a week after a huge storm when multiple lightening strikes took out the radio broadband tower, and high winds took out multiple telegraph poles carrying the fibre optics.

zas reviewed Apr 6, 2025

View reviewed changes

picard/formats/id3.py Outdated Show resolved Hide resolved

zas requested changes Apr 7, 2025

View reviewed changes

picard/formats/id3.py Outdated Show resolved Hide resolved

picard/formats/id3.py Outdated Show resolved Hide resolved

picard/formats/id3.py Outdated Show resolved Hide resolved

picard/formats/id3.py Outdated Show resolved Hide resolved

zas requested a review from phw April 7, 2025 14:04

Fix #2750: Handle invalid LINK frames when saving MP3 files

ee68982

MP3 files with LINK frames that do not follow the ID3 specification cause mutagen exceptions when saving with ID3 v2.3 format. This fix handles invalid LINK frames to prevent errors during save operations.

francescouteiro force-pushed the picard-2750-fix-invalid-link-frame branch from 6eee65a to ee68982 Compare April 7, 2025 14:37

francescouteiro requested a review from zas April 7, 2025 14:43

zas requested changes Apr 7, 2025

View reviewed changes

zas reviewed Apr 7, 2025

View reviewed changes

zas requested a review from rdswift April 8, 2025 07:43

rdswift reviewed Apr 8, 2025

View reviewed changes

Sophist-UK suggested changes Apr 9, 2025

View reviewed changes

phw requested changes Apr 12, 2025

View reviewed changes

phw added the Needs more work label Jun 18, 2025


		return confirmed_problematic_frames

		def _extract_and_convert_link_frame(self, frame, frame_id):


		log.warning("Could not extract URL from malformed LINK frame %r", frame_id)

		except Exception as e:

Uh oh!

PICARD-2750: Handle invalid LINK frames when saving MP3 files #2630

Are you sure you want to change the base?

PICARD-2750: Handle invalid LINK frames when saving MP3 files #2630

Uh oh!

Conversation

francescouteiro commented Apr 6, 2025

Summary

Problem

Solution

Action

Uh oh!

Uh oh!

zas left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

francescouteiro commented Apr 7, 2025

Uh oh!

zas left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rdswift Apr 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

francescouteiro commented Apr 9, 2025

Uh oh!

Sophist-UK left a comment

Choose a reason for hiding this comment

Uh oh!

Sophist-UK Apr 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rdswift Apr 8, 2025 •

edited

Loading

Sophist-UK Apr 9, 2025 •

edited

Loading