Description
The alternative for time-based media needs to be provided in the same language as the original content.
Most spoken languages have an orthography that defines how the spoken language is written down. Sign languages do not, nor do they have an official spoken form. The definitions correctly state that sign languages are separate from their "corresponding" spoken languages. This means that moving between a spoken language and a sign language is always an interpretation.
So if there is a sign-language video (video-only content), it needs to meet criterion 1.2.1 by providing an alternative for time-based media or an audio track. By definition, this should be done in the same language as the original, but for sign-language content that is not possible. It would always require translation into some spoken language (and it is not clear which one).
WCAG should define whether this exempts sign-language videos from providing transcripts, or whether the requirement should be met by translating the content into a suitable written language.
If a video has no audio track and contains only signing, it is not accessible to users with limited vision, so this is more than a theoretical question. The requirement is in a sense the counterpart of the transcript requirement, which also applies to podcasts from visually impaired communities.