
Tweak understanding for 1.2.3 and 1.2.5 #1790


Open: wants to merge 16 commits into main

patrickhlauke
Member

@patrickhlauke patrickhlauke commented May 10, 2021

  • turn the "During existing pauses..." sentence around, so as not to suggest that an exception exists
  • add a note for 1.2.5 that explicitly says that a lack of gaps is not an excuse/exemption
  • expanded the note to also touch on cases where AD is insufficient/partial
  • in both 1.2.3 and 1.2.5, wrap up example transcript in a <blockquote> for clarity
  • general code cleanup (removing unnecessary empty lines, mixed spaced+tabs indentation)

Closes #1768



* turn the "During existing pauses..." sentence around, so as not to suggest that an exception exists
* add mention of audio ducking (quack)
* add a note for 1.2.3 that explicitly says that a lack of gaps is not an excuse/exemption
@patrickhlauke
Member Author

Any news on this @alastc ?

@patrickhlauke
Member Author

any chance this could be considered/discussed at some point?

Contributor

@detlevhfischer detlevhfischer left a comment

Suggestion to replace the slightly odd "redone": "as it would either require the audio to be edited to have sufficient pauses for audio description"

@mbgower
Contributor

mbgower commented Apr 29, 2024

IMO, this change to include audio ducking risks over-reaching the current guidance and requirements of 1.2.5.
I have created a draft response to the original issue #1768

I am moving this PR back to the Drafted project column, until we have time to address that Response.

Copy link
Contributor

@bruce-usab bruce-usab left a comment

In standard audio description, narration is added during existing pauses in dialogue.

I don't know of a "standard" to cite. Maybe "traditional"?

This may require lowering the volume of background music...

Any concerns for use of "require" here?

@patrickhlauke
Member Author

I will revisit this, as I may have run head-first into changes - particularly for the 1.2.3 case. Will reconsider.

@patrickhlauke
Member Author

After discussion in the WCAG 2.x TF call, and mulling this over further, I'd suggest reworking my PR here to:

  • leave guidelines/terms/20/audio-description.html as it is (the little bit of information about ducking is not essential here, and not worth the pain of going over a normative change for the glossary definition)
  • for both understanding/20/audio-description-or-media-alternative-prerecorded.html and understanding/20/audio-description-prerecorded.html remove the suggestion of ducking actual dialog if AD is deemed more important - that only confuses matters and introduces ambiguity about how authors should decide when one is more important than the other
  • for understanding/20/audio-description-prerecorded.html leave in the note that absence of breaks in the dialogue does not absolve authors from needing to do AD
  • for understanding/20/audio-description-or-media-alternative-prerecorded.html create a new note similar to the one proposed for the AA one, clarifying again that absence of breaks in dialogue does not exempt authors from providing either AD (in a separate version of the video) or a media alternative

@patrickhlauke patrickhlauke force-pushed the patrickhlauke-understanding-audiodescription branch from 2616bd9 to 3103888 on June 10, 2024 22:45
@TestPartners

There is a technique for something very close to that situation: G203, Using a static text alternative to describe a talking head video.

This technique only applies if there is a single talking head and if there is no other important visual information. Neither of those is the case in most of the videos we test.

@TestPartners

@mbgower Video without audio needs to meet 1.2.1. And you are right that there are plenty of pauses so AD is achievable, but since the media is not "synchronized" with audio it may be preferable to just provide the text alternative for time-based media.

WCAG does not have an adequate definition for "synchronized media". I raised the question a couple of years ago (possibly on WebAIM rather than here) and opinion was split. The consensus was that a video with a music track, but no spoken content, was still synchronized media. The rationale was that a single control is used to start and stop the visual and audio content, and this constitutes them being synchronized. This means that AD is required at level AA. I can't say I am happy with that interpretation, but it was the majority view.

Even if you reject that argument and say that that video is video-only, there are other difficulties. What if the video lasts several minutes and there are a few seconds of spoken content. Does that mean it's now synchronised media?

What if the audio track contains all the visual information but they are substantially unsynchronised? That would be really confusing for assistive technology users who have some sight. We really need a solid definition because the meaning of "synchronized media" isn't as obvious as might first appear.

@awkawk
Member

awkawk commented Apr 17, 2025

WCAG does not have an adequate definition for "synchronized media". I raised the question a couple of years ago (possibly on WebAIM rather than here) and opinion was split. The consensus was that a video with a music track, but no spoken content, was still synchronized media. The rationale was that a single control is used to start and stop the visual and audio content, and this constitutes them being synchronized. This means that AD is required at level AA. I can't say I am happy with that interpretation, but it was the majority view.

I'd agree with that. The goal when we replaced "multimedia" (which is even worse to try to define) with "synchronized media" was to highlight that we are talking about the combination of at least one time-based media (audio or video) with other content (more audio or video, images, etc) where the additional content and the time-based media are synchronized to each other. Video with a music track is definitely synchronized media. Not all of the content that is synchronized with the original media is carrying needed information, but it may be.

Even if you reject that argument and say that that video is video-only, there are other difficulties. What if the video lasts several minutes and there are a few seconds of spoken content. Does that mean it's now synchronised media?

Yes.

What if the audio track contains all the visual information but they are substantially unsynchronised? That would be really confusing for assistive technology users who have some sight. We really need a solid definition because the meaning of "synchronized media" isn't as obvious as might first appear.

What does "substantially unsynchronized" mean? If both are playing at the same time then it may be poorly edited and confusing content, but it is still synchronized media.

@TestPartners

What does "substantially unsynchronized" mean? If both are playing at the same time then it may be poorly edited and confusing content, but it is still synchronized media.

I mean that audio and the visual content it relates to occur at different times in the video. Some of the videos we test comprise a sequence of numerous short video clips. Sometimes the audio for a clip needs to be significantly longer than the visual content, which leads to the audio and video becoming out of sync with each other.

From your comments it appears that there is a clear and straightforward definition for synchronized media, but it is not specified in WCAG. It would be good if that can be done.

@mbgower
Contributor

mbgower commented Apr 17, 2025

@TestPartners

From your comments it appears that there is a clear and straightforward definition for synchronized media, but it is not specified in WCAG. It would be good if that can be done.

In what way do you not find this definition clear?

synchronized media
audio or video synchronized with another format for presenting information and/or with time-based interactive components, unless the media is a media alternative for text that is clearly labeled as such

It feels like you are implying that because audio is out of synch with the video (due to technology gaffes, poor sound editing or even intentional techniques) it is no longer synchronized media. But that is confusing the medium with the message. We are defining the medium. Whether we're talking about Jean-Luc Godard's unconventional combinations of asynchronous images and sound or the latest novel approaches, it is still a combination of image and sound designed to be played in concert over time, which is what is defined as synchronized media. This is to differentiate it from an audio-only or video-only delivery.

Would a note something like this help?

Note: "Synchronized" refers to the fact that the audio and video are intended to be played according to the same time base. It is not intended to imply that audio and video that are out of synch with each other (intentionally or unintentionally) in the same experience are excluded by the definition. Asynchronous sound, for example, is still being experienced in the context of time-based, synchronized media.

@mraccess77

I don't think technique G203 should be sufficient for SC 1.2.5. It seems sufficient for SC 1.2.3 and perhaps SC 1.1.1. Audio description needs to be synchronized when used and a static text alternative is not. It may be possible that a talking head video already passes 1.2.5 because of no pauses - but this technique confuses the situation and implies a video with audio can be treated like video-only content for 1.2.5 and I don't think that is the case.

@patrickhlauke
Member Author

patrickhlauke commented Apr 18, 2025

@awkawk

@patrickhlauke I don't think that we can do that given how 1.2.3 is written. The "alternative for time-based media"/media alternative is the first option and on the other side of the OR is audio description for the prerecorded video content. Given that the second part of 1.2.3 is equal to the entirety of 1.2.5 I don't think that we can say that doing something for 1.2.5 passes but doing the same thing fails 1.2.3. And yes, I understand that people may think that this situation is doing nothing, but I would say that to pass 1.2.5 a site owner needs to assess a video to determine if there is any information that is only provided via video visuals, assess the ability to incorporate descriptions, incorporate the descriptions (if any are possible), and assume any risk associated with their decisions. That isn't nothing.

I was trying to find an awkward but workable way out of the impasse, but I'm not clear why you think that isn't acceptable either. So in effect you're saying: yes, by design, if a video has NO media alternative and NO sufficient gaps in the audio that would have allowed AD, it passes both 1.2.3 and 1.2.5 in your view? and then you say that this is not nothing? the end result is that you have a video that has visual information not currently conveyed in the existing audio, but because there's no sufficient audio gaps for AD you're handwaving it through as a pass for both 1.2.3 and 1.2.5? that, to me, IS nothing, sorry. the end result is still that a blind user won't get any of the visual information, but this conforms happily to WCAG? I really fail to see the logic here, other than "it's easier for content providers to pass WCAG" instead of "we're trying to provide a baseline access to information for blind/VI users with this"

yes, the weird interconnection between 1.2.3 and 1.2.5 (unless I'm mistaken, the only case in WCAG where this happens? where the aspect of one SC is actually a separate SC as well) is tricky. but as we're trying to at least mitigate the problems this has caused without outright changing the normative wording, I would have thought my proposal was the least worst option. then at the very least if we accept (begrudgingly) that no sufficient audio gaps gives you a free pass for AD, then at least the need for media alternative remains as a last resort. otherwise we may as well just auto-pass these SCs and be done with them....

@awkawk
Member

awkawk commented Apr 18, 2025

So in effect you're saying: yes, by design, if a video has NO media alternative and NO sufficient gaps in the audio that would have allowed AD, it passes both 1.2.3 and 1.2.5 in your view?

Yes.

and then you say that this is not nothing?

To recap what I said, I wrote: "I understand that people may think that this situation is doing nothing, but I would say that to pass 1.2.5 a site owner needs to assess a video to determine if there is any information that is only provided via video visuals, assess the ability to incorporate descriptions, incorporate the descriptions (if any are possible), and assume any risk associated with their decisions. That isn't nothing."

the end result is that you have a video that has visual information not currently conveyed in the existing audio, but because there's no sufficient audio gaps for AD you're handwaving it through as a pass for both 1.2.3 and 1.2.5? that, to me, IS nothing, sorry. the end result is still that a blind user won't get any of the visual information, but this conforms happily to WCAG? I really fail to see the logic here, other than "it's easier for content providers to pass WCAG" instead of "we're trying to provide a baseline access to information for blind/VI users with this"

I do object to "happily" in your characterization. When talking about conformance there is no happy or sad, good or bad, just whether something meets the success criteria. I don't think that this is enough to provide a good experience for B/VI users either, but do think that it conforms, yes. And I'm not happy about that, either, but want to follow the wording in the standard and the original intent.

yes, the weird interconnection between 1.2.3 and 1.2.5 (unless I'm mistaken, the only case in WCAG where this happens? where the aspect of one SC is actually a separate SC as well) is tricky.

1.2.3 and 1.2.8 share the same relationship, as do 3.3.4 and 3.3.6. Possibly 1.4.3 and 1.4.6 also. But this seems to be the only pairing where neither SC is AAA.

We should make sure that this is on the list of items to work on in WCAG 3.

@awkawk
Member

awkawk commented Apr 18, 2025

I don't think technique G203 should be sufficient for SC 1.2.5. It seems sufficient for SC 1.2.3 and perhaps SC 1.1.1. Audio description needs to be synchronized when used and a static text alternative is not. It may be possible that a talking head video already passes 1.2.5 because of no pauses - but this technique confuses the situation and implies a video with audio can be treated like video-only content for 1.2.5 and I don't think that is the case.

Totally agree, @mraccess77.

@patrickhlauke
Member Author

So in effect you're saying: yes, by design, if a video has NO media alternative and NO sufficient gaps in the audio that would have allowed AD, it passes both 1.2.3 and 1.2.5 in your view?

Yes.

and then you say that this is not nothing?

To recap what I said, I wrote: "I understand that people may think that this situation is doing nothing, but I would say that to pass 1.2.5 a site owner needs to assess a video to determine if there is any information that is only provided via video visuals, assess the ability to incorporate descriptions, incorporate the descriptions (if any are possible), and assume any risk associated with their decisions. That isn't nothing."

So, the author "assumes the risk". When challenged, they can say "I'm fully conformant with WCAG SCs 1.2.3 and 1.2.5, so I don't see what the problem is".

Again then, if people are convinced that this is indeed the intention, let's add explicit, clearly written out notes in the understanding documents for 1.2.3 and 1.2.5, where this interpretation is explicitly stated: "if there aren't sufficient gaps in the audio for AD, you can automatically pass 1.2.3 and 1.2.5 (but you then need to assume any risk associated with your decision)".

@scottaohara
Member

scottaohara commented Apr 18, 2025

Either way this goes, I agree with Patrick. Be more explicit, even if that means gaps are pointed out rather than glossed over / left to differing interpretations. State that what is required is not necessarily the best user experience. And then strive for addressing that gap with WCAG 3

@mbgower
Contributor

mbgower commented Apr 21, 2025

Either way this goes, I agree with Patrick. Be more explicit, even if that means gaps are pointed out rather than glossed over / left to differing interpretations. State that what is required is not necessarily the best user experience. And then strive for addressing that gap with WCAG 3

I agree we want to strive for some clarity here. I guess I just feel like we have a fairly profound disagreement about interpretation in this discussion.

All the incremental PRs for 1.2 I've created, and which are currently working through the TF, are an attempt to chip away at stuff that I believe we can agree on.

Once the dust has settled from those, maybe we'll have a bit more clarity/alignment on audio description.

@bruce-usab
Contributor

bruce-usab commented Apr 21, 2025

From another recent discussion thread about correcting glossary formatting:

The word “pause” never appears in normative text of the SC or definition; it's in a note, yet it has become the focus of a considerable amount of commentary.

Should we consider moving the substance of Notes 1 to 3 into the definition for audio description? That might look something like:

narration added (where there are gaps in dialog of the soundtrack) to describe important visual details that cannot be understood from the main soundtrack alone (where that narration provides information about actions, characters, scene changes, on-screen text, and other visual content) and where all of the video information is not already provided in existing audio

Parentheses added to help with parsing. Also, this approach implies it might be better to move the “not already provided in existing audio” condition into the 1.2.3/5/7 SC phrasing.

@mbgower
Contributor

mbgower commented Apr 21, 2025

From #4122 (comment) thread about correcting glossary formatting:

@bruce-usab Normative changes that could cause folks to reinterpret the standard go in a different bucket. I'm not saying we can't go that route; just that it cannot be incorporated without a new WCAG 2.x release. It would end up in our "Future version updates" column of the TF project board.

My point in the other discussion you quoted from is that folks are using notes in 1.2.5 to justify an interpretation which cannot be arrived at from the normative text alone.

@stevefaulkner

Had some late night thoughts on this which may or may not be helpful.

Assumption: Audio description is primarily for those who cannot see the video
Pausing the video to make time for additional audio description is a possible method to resolve the issue of audio with "no gaps". By doing this the 'no gaps' argument goes out of the window as any length of audio description can be added to the video for those who require audio description. What would appear on screen for AD versions is a static image while the description is announced, then the audio and video can continue.

@patrickhlauke
Member Author

patrickhlauke commented Apr 22, 2025

Had some late night thoughts on this which may or may not be helpful.

Assumption: Audio description is primarily for those who cannot see the video. Pausing the video to make time for additional audio description is a possible method to resolve the issue of audio with "no gaps". By doing this the 'no gaps' argument goes out of the window as any length of audio description can be added to the video for those who require audio description. What would appear on screen for AD versions is a static image while the description is announced, then the audio and video can continue.

what you're describing here is 1.2.7 Extended Audio Description (AAA) https://www.w3.org/WAI/WCAG22/Understanding/extended-audio-description-prerecorded.html

yes, that's how you'd solve it (assuming you can't do anything better), but that doesn't help with the 1.2.3/1.2.5 determination of whether or not absence of sufficient gaps is a get-out-of-jail pass or fail
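To make the distinction concrete, here is a minimal Python sketch of why extended audio description sidesteps the gap question: the video is held on a frame while each description plays, so the runtime of the extended version simply grows by the total narration length, whatever the soundtrack looks like. The function names, the `(video_time, narration_seconds)` input shape, and the example timings are all hypothetical, for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    kind: str            # "play" (video runs) or "hold" (video frozen while narration plays)
    video_start: float   # position in the original video, in seconds
    video_end: float     # equal to video_start for "hold" segments
    wall_seconds: float  # how long the segment takes for the viewer


def extended_ad_segments(duration: float,
                         descriptions: List[Tuple[float, float]]) -> List[Segment]:
    """Build a playback schedule for a hypothetical extended-AD version.

    descriptions: (video_time, narration_seconds) pairs; the video is held
    at video_time for narration_seconds while the description is voiced.
    """
    segments: List[Segment] = []
    cursor = 0.0
    for at, length in sorted(descriptions):
        if not 0.0 <= at <= duration:
            raise ValueError(f"description at {at}s is outside the video")
        if at > cursor:
            segments.append(Segment("play", cursor, at, at - cursor))
        segments.append(Segment("hold", at, at, length))
        cursor = at
    if duration > cursor:
        segments.append(Segment("play", cursor, duration, duration - cursor))
    return segments


def total_runtime(segments: List[Segment]) -> float:
    """Wall-clock time the viewer spends on the extended version."""
    return sum(s.wall_seconds for s in segments)


# A 60-second clip with two descriptions of 4 s and 6 s.
schedule = extended_ad_segments(60.0, [(10.0, 4.0), (30.0, 6.0)])
print(total_runtime(schedule))  # 70.0
```

Nothing in this schedule depends on where the dialogue pauses fall, which is why it answers the 1.2.7 case but not the 1.2.3/1.2.5 question of whether missing gaps excuse standard AD.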

@TestPartners

TestPartners commented Apr 22, 2025

In what way do you not find this definition clear?

synchronized media
audio or video synchronized with another format for presenting information and/or with time-based interactive components, unless the media is a media alternative for text that is clearly labeled as such

Bad synchronisation between the audio and visual content isn't the main problem, although I do think the definition should explicitly state that it is permitted, because definitions should be as unambiguous as possible. The bigger issue is that the definition does not address whether a video with no spoken content but some music or other noises (which may have a shorter duration than the visual content) is synchronised media. Andrew has confirmed that it is, but it is far from obvious.

Your proposed definition works for me.

@bruce-usab
Contributor

bruce-usab commented Apr 25, 2025

The bigger issue is that the definition does not address whether a video with no spoken content but some music or other noises ... is synchronized media.

I think Understanding (somewhere) includes an example of classic silent movies — which were traditionally paired with dramatically timed music — as being good candidates for both audio description (including ducking of the music) and static alternatives (i.e., a screenplay-like document). That example could address this question.

@awkawk
Member

awkawk commented Apr 25, 2025

I think Understanding (somewhere) includes an example of classic silent movies — which were traditionally paired with dramatically timed music — as being good candidates for both audio description (including ducking of the music) and static alternatives (i.e., a screenplay-like document). That example could address this question.

In a quick review of the examples in Understanding 1.2.x SC the closest I could find was this in 1.2.1:

A video-only file with an audio track
A silent movie includes an audio track which includes a description of the action in the video.

@GreggVan

GreggVan commented Apr 25, 2025 via email

@TestPartners
Copy link

While the silent movie example is fine, as a general rule I don't find such positive examples particularly useful because they generally describe situations where the developer has done the right thing. As a tester who is invariably testing something that has not been designed optimally, it's useful to have examples that demonstrate non-conformances or clarify non-obvious situations.

For example, a video with no spoken content in which the music bears no relation at all to the visual content - we are agreed that this is synchronized media even though that's rather counter-intuitive to people who are not intimately familiar with the SC and the relevant definitions.

@mbgower
Contributor

mbgower commented Apr 28, 2025

In what way do you not find this definition clear?

synchronized media
audio or video synchronized with another format for presenting information and/or with time-based interactive components, unless the media is a media alternative for text that is clearly labeled as such

Bad synchronisation between the audio and visual content isn't the main problem, although I do think the definition should explicitly state that it is permitted, because definitions should be as unambiguous as possible. The bigger issue is that the definition does not address whether a video with no spoken content but some music or other noises (which may have a shorter duration than the visual content) is synchronised media. Andrew has confirmed that it is, but it is far from obvious.

Your proposed definition works for me.

@TestPartners I've tweaked the definition some more and put it in PR #4371

@GreggVan

GreggVan commented May 1, 2025

My thoughts/ opinion on this

  1. RE captions. For a silent movie - the music is not necessary to understand the movie. In fact - the movies were always SILENT and an organist played music to set the tone but you can understand the movie just fine and there were no sound effects. So captions not needed for silent movie. And having a note appear in caption area to indicate music is being played is not of any real use to anyone. Descriptions of the type of music or its tempo could be, and could add to the enjoyment of the movie. And would be great to have. I'm not sure it is important to understanding the movie much though. I could see this one called both ways by different people.

  2. RE Audio description -- Silent movies in theaters do not have a sound track - but on the web they do. And if there is a way to have audio description then you need to provide it - at least as an option. And since there is no speech - you have the entire movie available to do the description -- which would include reading the text on screen when it appears.

  3. for a movie where there are no gaps in the dialog (that is there are zero gaps) then THAT movie with NO gaps would pass without AD because it would meet the requirement -- all gaps (all zero gaps) in the movie were used for AD. However, this is really a fringe case. There are almost no movies or plays where there is absolutely no gap -- including during the lead in to the movie.

@patrickhlauke
Member Author

coming back in late, but as I see some folks really getting hung up on "zero gaps is an edge case" ... we're talking about zero usable gaps. there may be a few half-second gaps in narration here and there, but I'd still count those as not being "gaps" that are in any way usable to cram in AD. think the majority of TikTok videos with voiceovers. those are the ones I take issue with claiming that they are magically exempt and PASS the SC
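One way to make "zero usable gaps" precise is to measure the silences between speech segments against a minimum length. The following is a hypothetical Python sketch, not anything WCAG defines: it assumes speech timings are already known (for example, from caption cue times), and the `min_gap` threshold is an arbitrary parameter, since no normative minimum pause length exists. It treats silence before the first and after the last speech segment as candidate gaps; whether those should count is itself an open question in this discussion.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def usable_gaps(speech: List[Interval], duration: float, min_gap: float) -> List[Interval]:
    """Return silent stretches of at least min_gap seconds.

    speech: intervals where dialogue or narration is audible.
    Overlapping intervals are tolerated because the cursor only moves forward.
    """
    gaps: List[Interval] = []
    cursor = 0.0  # end of the latest speech seen so far
    for start, end in sorted(speech):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    # Trailing silence after the last speech segment.
    if duration - cursor >= min_gap:
        gaps.append((cursor, duration))
    return gaps


# A 60-second voiceover broken only by half-second breaths.
voiceover = [(0.0, 14.5), (15.0, 29.5), (30.0, 44.5), (45.0, 60.0)]
print(usable_gaps(voiceover, 60.0, min_gap=2.0))       # []
print(len(usable_gaps(voiceover, 60.0, min_gap=0.5)))  # 3
```

With a 2-second threshold this clip has no usable gaps, even though a check for "any silence at all" finds three; the disagreement in the thread is essentially about which of those two answers 1.2.5 should rest on.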

@mraccess77

It is my understanding that music did enhance silent films and helped to convey the situations occurring in the movie. Almost all silent films had accompanying music - so it would seem to be synchronized in my opinion.

We should poll whether having room at the front or end of the video counts as a pause as it seems like some of us think it could.

@mbgower
Contributor

mbgower commented May 2, 2025

for a movie where there are no gaps in the dialog (that is there are zero gaps) then THAT movie with NO gaps would pass without AD because it would meet the requirement -- all gaps (all zero gaps) in the movie were used for AD. However, this is really a fringe case. There are almost no movies or plays where there is absolutely no gap -- including during the lead in to the movie.

@GreggVan I agree that where there are any audio descriptions and no available gaps to add more, the video meets 1.2.5. But I disagree fundamentally that a movie with zero audio descriptions can pass 1.2.5. That is not supported by the normative language. As I've mentioned elsewhere, you are citing a non-normative note to support that argument, not the normative text, which states clearly that audio descriptions must be provided unless it is a media alternative. If there is no audio description, how can we say we've met the normative wording:

Audio description is provided for all prerecorded video content in synchronized media.

Here is that wording with all relevant normative definitions (in bold) incorporated, with minor editorial tweaks and swizzling to make it comprehensible:

narration added to the soundtrack to describe important visual details that cannot be understood from the main soundtrack alone is provided for all moving or sequenced pictures or images content that is not live and is synchronized with another format for presenting information and/or with time-based interactive components

I do not understand how a video that has ZERO narration added can pass that language.

@mbgower
Contributor

mbgower commented May 2, 2025

@mraccess77

It is my understanding that music did enhance silent films and helped to convey the situations occurring in the movie. Almost all silent films had accompanying music - so it would seem to be synchronized in my opinion.

The original film of the silent movie obviously had no sound. If someone takes that source material and creates a video-only recording, that is not synchronized media: it could meet 1.2.3 through a media alternative, and pass 1.2.5 (i.e., N/A) because it is not synchronized media.

However, if someone attempts to replicate the original movie theatre experience of the time by adding sound to the video, they are now providing synchronized media (it has images and audio) -- at which point it is entirely possible to add audio descriptions and it needs to pass 1.2.5 with audio descriptions.

@awkawk
Member

awkawk commented May 2, 2025

@mbgower You are right that the note clarifying that audio description is added during existing pauses is not normative. So there is some ambiguity there.

I do think that the definition for extended audio description helps clarify this.

extended audio description
audio description that is added to an audiovisual presentation by pausing the video so that there is time to add additional description

Note: This technique is only used when the sense of the video would be lost without the additional audio description and the pauses between dialogue/narration are too short.

Of course, the note there is also clarifying, and also non-normative. But I believe that the presence of both of these notes signals the intent of the WG at the time (and the memories of @GreggVan, @bruce-usab, and myself are in agreement on this point also).

You say that you are ok with only a single audio description if there is no additional space for other audio description. Why just one? Would you fail a video with a single audio description if there was space for additional description and content that would be helpful to clarify in a description? Are you pointing out that there is no actual normative language that says that audio description needs to include information about all visual information necessary for comprehension of the video?

Successfully merging this pull request may close these issues.

How to add audio descriptions to videos?