Skip to content

An error occurred during the speech processing process #5

@jiTest921

Description

@jiTest921

(m3_agent_env_311) (base) ubuntu@VM-0-13-ubuntu:~/m3-agent-master$ python m3_agent/memorization_intermediate_outputs.py \

--data_file data/data.jsonl
/home/ubuntu/m3-agent-master/m3_agent_env_311/lib/python3.11/site-packages/albumentations/check_version.py:147: UserWarning: Error fetching version info The read operation timed out
data = fetch_version_info()
/home/ubuntu/m3-agent-master/m3_agent_env_311/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:121: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
warnings.warn(
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/ubuntu/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/ubuntu/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/ubuntu/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/ubuntu/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/ubuntu/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
/home/ubuntu/m3-agent-master/mmagent/voice_processing.py:35: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
pretrained_state = torch.load("models/pretrained_eres2netv2.ckpt", map_location='cpu')
{'video_found': True, 'audio_found': True, 'metadata': {'major_brand': 'isom', 'minor_version': '512', 'compatible_brands': 'isomiso2mp41', 'encoder': 'Lavf58.29.100'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [1578, 720], 'bitrate': 7249, 'fps': 30.0, 'codec_name': 'hevc', 'profile': '(Main)', 'metadata': {'Metadata': '', 'handler_name': 'VideoHandler', 'vendor_id': '[0][0][0][0]'}}, {'input_number': 0, 'stream_number': 1, 'stream_type': 'audio', 'language': None, 'default': True, 'fps': 44100, 'bitrate': 129, 'metadata': {'Metadata': '', 'handler_name': 'SoundHandler', 'vendor_id': '[0][0][0][0]'}}], 'input_number': 0}], 'duration': 30.17, 'bitrate': 7397, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'hevc', 'video_profile': '(Main)', 'video_size': [1578, 720], 'video_bitrate': 7249, 'video_fps': 30.0, 'default_audio_input_number': 0, 'default_audio_stream_number': 1, 'audio_fps': 44100, 'audio_bitrate': 129, 'video_duration': 30.17, 'video_n_frames': 905}
/home/ubuntu/m3-agent-master/m3_agent_env_311/lib/python3.11/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i data/clips/robot/bedroom_01/48.mp4 -loglevel error -f image2pipe -vf scale=1578:720 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
MoviePy - Writing audio in /tmp/tmp_1wfsi4j.wav
MoviePy - Done.
Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'}
You are attempting to use Flash Attention 2 without specifying a torch dtype. This might lead to unexpected behaviour
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00, 1.27it/s]
The image processor of type Qwen2VLImageProcessor is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with use_fast=False. Note that this behavior will be extended to all models in a future release.
You have video processor config saved in preprocessor.json file which is deprecated. Video processor configs should be saved in their own video_preprocessor.json file. You can rename the file or load and save the processor back which renames it automatically. Loading from preprocessor.json will be removed in v5.0.
Traceback (most recent call last):
File "/home/ubuntu/m3-agent-master/mmagent/voice_processing.py", line 234, in process_voices
with open(save_path, "r") as f:
^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'data/intermediate_outputs/robot/bedroom_01/clip_8_voices.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/m3-agent-master/m3_agent/memorization_intermediate_outputs.py", line 93, in
streaming_process_video(json.loads(line))
File "/home/ubuntu/m3-agent-master/m3_agent/memorization_intermediate_outputs.py", line 77, in streaming_process_video
process_segment(
File "/home/ubuntu/m3-agent-master/m3_agent/memorization_intermediate_outputs.py", line 40, in process_segment
process_voices(
File "/home/ubuntu/m3-agent-master/mmagent/voice_processing.py", line 239, in process_voices
asrs = diarize_audio(base64_video, filter=filter_duration_based)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/m3-agent-master/mmagent/voice_processing.py", line 158, in diarize_audio
response, _ = qwen_get_response(messages)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/m3-agent-master/mmagent/utils/chat_qwen.py", line 48, in get_response
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/m3-agent-master/m3_agent_env_311/lib/python3.11/site-packages/transformers/models/qwen2_5_omni/processing_qwen2_5_omni.py", line 343, in apply_chat_template
or conversation[0]["content"][0]["text"]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
TypeError: string indices must be integers, not 'str'
(m3_agent_env_311) (base) ubuntu@VM-0-13-ubuntu:~/m3-agent-master$

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions