[kdenlive] [Bug 502530] New: When trying to use the transcribe feature I get an error and no subtitles

Mon Apr 7 18:39:25 BST 2025

https://bugs.kde.org/show_bug.cgi?id=502530

            Bug ID: 502530
           Summary: When trying to use the transcribe feature I get an
                    error and no subtitles
    Classification: Applications
           Product: kdenlive
           Version: 24.12.3
          Platform: Homebrew (macOS)
                OS: macOS
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: Title Clips & Subtitles
          Assignee: jb at kdenlive.org
          Reporter: roy432002.rd at gmail.com
  Target Milestone: ---

SUMMARY
I wanted to use the transcribe feature to add subtitles to my home movie, but I
keep getting "No speech detected", and pressing "Show log" reveals an error.
I had to download the Whisper model myself because Kdenlive's built-in
downloader seems to be stuck; I don't know whether this is relevant.

STEPS TO REPRODUCE
1. Go to a sequence
2. Select a clip in it
3. Press "Transcribe" in the "Speech Editor" menu

OBSERVED RESULT
"No speech detected" and the following error:

/Applications/kdenlive.app/Contents/Resources/scripts/whisper/whispertotext.py:75:
FutureWarning: You are using `torch.load` with `weights_only=False` (the
current default value), which uses the default pickle module implicitly. It is
possible to construct malicious pickle data which will execute arbitrary code
during unpickling (See
https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for
more details). In a future release, the default value for `weights_only` will
be flipped to `True`. This limits the functions that could be executed during
unpickling. Arbitrary objects will no longer be allowed to be loaded via this
mode unless they are explicitly allowlisted by the user via
`torch.serialization.add_safe_globals`. We recommend you start setting
`weights_only=True` for any use case where you don't have full control of the
loaded file. Please open an issue on GitHub for any issues related to this
experimental feature.
  checkpoint = torch.load(fp, map_location=device)
Traceback (most recent call last):
  File "/Users/<My Username>/Library/Application
Support/kdenlive/venv/lib/python3.9/site-packages/whisper/audio.py", line 58,
in load_audio
    out = run(cmd, capture_output=True, check=True).stdout
  File
"/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/subprocess.py",
line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffmpeg', '-nostdin', '-threads', '0',
'-i',
'/private/var/folders/_t/3_t8tnnx3cb0j7bdgw3hsdd40000gn/T/kdenlive-ZcKKVn.wav',
'-f', 's16le', '-ac', '1', '-acodec', 'pcm_s16le', '-ar', '16000', '-']'
returned non-zero exit status 183.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File
"/Applications/kdenlive.app/Contents/Resources/scripts/whisper/whispertotext.py",
line 176, in <module>
    sys.exit(main())
  File
"/Applications/kdenlive.app/Contents/Resources/scripts/whisper/whispertotext.py",
line 158, in main
    result = run_whisper(source, model, device, task, language)
  File
"/Applications/kdenlive.app/Contents/Resources/scripts/whisper/whispertotext.py",
line 140, in run_whisper
    result = loadedModel.transcribe(source, **transcribe_kwargs)
  File "/Users/<My Username>/Library/Application
Support/kdenlive/venv/lib/python3.9/site-packages/whisper/transcribe.py", line
133, in transcribe
    mel = log_mel_spectrogram(audio, model.dims.n_mels, padding=N_SAMPLES)
  File "/Users/<My Username>/Library/Application
Support/kdenlive/venv/lib/python3.9/site-packages/whisper/audio.py", line 140,
in log_mel_spectrogram
    audio = load_audio(audio)
  File "/Users/<My Username>/Library/Application
Support/kdenlive/venv/lib/python3.9/site-packages/whisper/audio.py", line 60,
in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 7.1 Copyright (c) 2000-2024
the FFmpeg developers
  built with Apple clang version 15.0.0 (clang-1500.3.9.4)
  configuration: --enable-libmp3lame --cc=/usr/bin/clang --cxx=/usr/bin/clang++
--enable-libopus --enable-libvorbis --enable-libvpx --enable-libass
--enable-libaom --enable-libdav1d --enable-libzimg --arch=arm64 --disable-debug
--disable-doc --enable-gpl --enable-version3 --enable-nonfree --enable-openssl
--disable-xlib --disable-libxcb --enable-libx264 --enable-libx265
--enable-rpath --install-name-dir='@rpath'
--prefix=/Users/gitlab/ws/builds/GZwHuM5x/0/sysadmin/ci-management/macos-arm-clang
--libdir=/Users/gitlab/ws/builds/GZwHuM5x/0/sysadmin/ci-management/macos-arm-clang/lib
--disable-static --enable-shared
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.100 / 61. 19.100
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
  libpostproc    58.  3.100 / 58.  3.100
[in#0 @ 0x60000313c200] Error opening input: Invalid data found when processing
input
Error opening input file
/private/var/folders/_t/3_t8tnnx3cb0j7bdgw3hsdd40000gn/T/kdenlive-ZcKKVn.wav.
Error opening input files: Invalid data found when processing input
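
To narrow this down, the failing step can be re-run outside Kdenlive. The
sketch below rebuilds the ffmpeg argument list exactly as it appears in the
traceback above and reports the exit status and the last line of stderr. The
helper names `build_decode_command` and `try_decode` are hypothetical (they are
not part of Kdenlive or Whisper), and it assumes `ffmpeg` is on the PATH and
the .wav file still exists.

```python
import subprocess


def build_decode_command(wav_path):
    """Rebuild the ffmpeg argument list shown in the traceback for one file."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-i", wav_path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le", "-ar", "16000", "-",
    ]


def try_decode(wav_path):
    """Run the command; return (exit status, last line of ffmpeg's stderr)."""
    # Assumes ffmpeg is installed and on PATH; raises FileNotFoundError otherwise.
    proc = subprocess.run(build_decode_command(wav_path), capture_output=True)
    lines = proc.stderr.decode(errors="replace").strip().splitlines()
    return proc.returncode, (lines[-1] if lines else "")
```

Calling `try_decode()` on a known-good .wav and on the temporary file would
show whether the problem is the file itself or the ffmpeg build.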

EXPECTED RESULT
Some subtitles for my clip

ADDITIONAL INFORMATION
Although I downloaded the model manually, it passes the "Check model
integrity" test in the "Manage models" menu. I have also run "Check
Configuration" and updated the dependencies, all from within Kdenlive.
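
Because ffmpeg reports "Invalid data found when processing input" for the
temporary .wav, one thing worth checking is whether that file is empty or
lacks a WAV header. A minimal sketch, assuming the file still exists when the
check runs (Kdenlive may remove it); `inspect_wav_header` is a hypothetical
helper, not part of Kdenlive or Whisper:

```python
def inspect_wav_header(path):
    """Classify a file by its first 12 bytes (the RIFF/WAVE container header)."""
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) == 0:
        return "empty file"
    if len(header) < 12:
        return "truncated header"
    # A WAV file starts with b"RIFF", a 4-byte size, then b"WAVE".
    if header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        return "not a RIFF/WAVE file"
    return "looks like WAV"
```

This only inspects the container header; a file that passes can still hold
unusable audio data.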
