An error occurred during the speech processing process

(m3_agent_env_311) (base) ubuntu@VM-0-13-ubuntu:~/m3-agent-master$ python m3_agent/memorization_intermediate_outputs.py \
>    --data_file data/data.jsonl
/home/ubuntu/m3-agent-master/m3_agent_env_311/lib/python3.11/site-packages/albumentations/check_version.py:147: UserWarning: Error fetching version info The read operation timed out
  data = fetch_version_info()
/home/ubuntu/m3-agent-master/m3_agent_env_311/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:121: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
  warnings.warn(
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/ubuntu/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/ubuntu/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/ubuntu/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/ubuntu/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/ubuntu/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
/home/ubuntu/m3-agent-master/mmagent/voice_processing.py:35: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  pretrained_state = torch.load("models/pretrained_eres2netv2.ckpt", map_location='cpu')
{'video_found': True, 'audio_found': True, 'metadata': {'major_brand': 'isom', 'minor_version': '512', 'compatible_brands': 'isomiso2mp41', 'encoder': 'Lavf58.29.100'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [1578, 720], 'bitrate': 7249, 'fps': 30.0, 'codec_name': 'hevc', 'profile': '(Main)', 'metadata': {'Metadata': '', 'handler_name': 'VideoHandler', 'vendor_id': '[0][0][0][0]'}}, {'input_number': 0, 'stream_number': 1, 'stream_type': 'audio', 'language': None, 'default': True, 'fps': 44100, 'bitrate': 129, 'metadata': {'Metadata': '', 'handler_name': 'SoundHandler', 'vendor_id': '[0][0][0][0]'}}], 'input_number': 0}], 'duration': 30.17, 'bitrate': 7397, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'hevc', 'video_profile': '(Main)', 'video_size': [1578, 720], 'video_bitrate': 7249, 'video_fps': 30.0, 'default_audio_input_number': 0, 'default_audio_stream_number': 1, 'audio_fps': 44100, 'audio_bitrate': 129, 'video_duration': 30.17, 'video_n_frames': 905}
/home/ubuntu/m3-agent-master/m3_agent_env_311/lib/python3.11/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i data/clips/robot/bedroom_01/48.mp4 -loglevel error -f image2pipe -vf scale=1578:720 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
MoviePy - Writing audio in /tmp/tmp_1wfsi4j.wav
MoviePy - Done.                                                                                                                                                                      
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
You are attempting to use Flash Attention 2 without specifying a torch dtype. This might lead to unexpected behaviour
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00,  1.27it/s]
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
Traceback (most recent call last):
  File "/home/ubuntu/m3-agent-master/mmagent/voice_processing.py", line 234, in process_voices
    with open(save_path, "r") as f:
         ^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'data/intermediate_outputs/robot/bedroom_01/clip_8_voices.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/m3-agent-master/m3_agent/memorization_intermediate_outputs.py", line 93, in <module>
    streaming_process_video(json.loads(line))
  File "/home/ubuntu/m3-agent-master/m3_agent/memorization_intermediate_outputs.py", line 77, in streaming_process_video
    process_segment(
  File "/home/ubuntu/m3-agent-master/m3_agent/memorization_intermediate_outputs.py", line 40, in process_segment
    process_voices(
  File "/home/ubuntu/m3-agent-master/mmagent/voice_processing.py", line 239, in process_voices
    asrs = diarize_audio(base64_video, filter=filter_duration_based)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/m3-agent-master/mmagent/voice_processing.py", line 158, in diarize_audio
    response, _ = qwen_get_response(messages)  
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/m3-agent-master/mmagent/utils/chat_qwen.py", line 48, in get_response
    text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/m3-agent-master/m3_agent_env_311/lib/python3.11/site-packages/transformers/models/qwen2_5_omni/processing_qwen2_5_omni.py", line 343, in apply_chat_template
    or conversation[0]["content"][0]["text"]
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
TypeError: string indices must be integers, not 'str'
(m3_agent_env_311) (base) ubuntu@VM-0-13-ubuntu:~/m3-agent-master$ 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

An error occurred during the speech processing process #5

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

An error occurred during the speech processing process #5

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions