
Releases: NullMagic2/SoftWhisper

August 2025 Release

04 Aug 22:41
737d81c


Hello, everyone,

SoftWhisper August 2025 is out!

This release fixes a bug where custom time ranges were being ignored and the transcription would get stuck at a given point. Hopefully, both issues are addressed now.
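For anyone scripting around this fix, a custom time range can also be enforced upstream by trimming the audio with FFmpeg before it reaches the transcriber. A minimal sketch, assuming a hypothetical helper and illustrative file names (this is not SoftWhisper's actual code):

```python
# Build an ffmpeg command that trims audio to a custom time range.
# -ss / -to are standard ffmpeg options for start and end positions;
# -c copy cuts without re-encoding (fast, but only at keyframes).
def build_trim_command(src, dst, start="00:01:30", end="00:04:00"):
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", start,   # start of the custom range
        "-to", end,     # end of the custom range
        "-c", "copy",
        dst,
    ]

cmd = build_trim_command("talk.mp3", "talk_clip.mp3")
print(" ".join(cmd))
```

You would then hand the trimmed file to the transcriber, guaranteeing nothing outside the range is processed.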

SoftWhisper May 2025: minor bugfixes!

02 Jun 18:28
163aa58


Hello, everyone,

SoftWhisper May 2025 is out – now with minor bugfixes.

  • This release adds support for basic Japanese documentation (a big thank you to Sunwood), and adds checks for a few libraries that were missing from detection. Otherwise, it is identical to the April release.

  • Updated the source code in the main tree to properly use inaSpeechSegmenter, and hopefully fixed the "too many values to unpack" error.

SoftWhisper April 2025 – Now with speaker identification!

05 Apr 16:15


Hello, my dear SoftWhisper users!

It is with great joy that I announce that SoftWhisper April 2025 is out – now with speaker identification!


A tricky feature

Originally, I wanted to implement diarization with Pyannote, but such APIs are usually not widely documented,
so learning how to use them, and gauging how effective they would be for the project, proved difficult.

Identifying speakers is still somewhat primitive even with state-of-the-art solutions. Usually, the best results are achieved with fine-tuned models and controlled conditions (for example, two speakers in studio recordings).

The crux of the matter is: not only do those specialized models require a lot of money to create, but they are also incredibly hard to use. That does not align with my vision of something that works reasonably well and is easy to set up, so I ran tests with 3-4 different approaches.

A balanced compromise

After careful testing, I believe inaSpeechSegmenter will provide our users the best balance between usability and accuracy: it's fast, identifies speakers to a more or less consistent degree out of the box, and does not require a complicated setup. Give it a try!
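For the curious, inaSpeechSegmenter returns its segmentation as a plain list of (label, start, stop) tuples, which makes the output easy to post-process. A minimal sketch, with a hard-coded sample segmentation standing in for what Segmenter()(audio_path) would return (the sample values are illustrative):

```python
# inaSpeechSegmenter's Segmenter yields tuples of
# (label, start_seconds, stop_seconds); labels include
# 'male', 'female', 'music', and 'noEnergy'.
# Hard-coded sample output so this sketch is self-contained:
segmentation = [
    ("male",     0.0,  4.2),
    ("noEnergy", 4.2,  4.9),
    ("female",   4.9, 10.5),
    ("music",   10.5, 15.0),
]

def speech_segments(segments):
    """Keep only speech segments and tag each span with its label."""
    return [
        f"[{label.upper()}] {start:.1f}s-{stop:.1f}s"
        for label, start, stop in segments
        if label in ("male", "female")
    ]

for line in speech_segments(segmentation):
    print(line)
```

This is also roughly the shape of post-processing a frontend needs before merging the labels back into a transcript.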

Known issues

Please note: while speaker identification is more or less consistent, the current approach is still not perfect; it will sometimes miss overlapping speech or report more speakers than are present in the audio, so manual review is still needed. This feature is offered in the hope of making diarization easier, not as a solved problem.

Increased loading times

Also keep in mind that the current diarization solution will slightly increase loading times, and selecting diarization will also increase computation time. Please be patient.

Other bugfixes

This release also fixes a few other bugs, namely one where the exported content sometimes did not match the content in the textbox.

SoftWhisper March 2025 v.2 is out!

22 Mar 22:51


Small big changes around

Well, unfortunately, not everything is perfect.

When I thought that everyone would be able to enjoy a nice, faster program, we ran into a nasty bug: SoftWhisper would silently fail when one of the settings exceeded the maximum beam size defined by WHISPER_MAX_DECODERS.

I also took the opportunity to fix a few other bugs:

  • Deselecting subtitles no longer shows timestamps in the text.
  • Transcription progress now works properly.
  • The console output textbox was broken. It is now restored to normal.

But no more mystery failures: I've applied two experimental patches to Whisper.cpp.

  • The first patch improves path handling and tries to modernize it under Windows.
  • The second patch allows our backend to continue running if the user sets a beam size greater than whisper-cli supports. Now, instead of silently failing, the program simply defaults to the maximum size this variable allows.
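The clamping behaviour the second patch describes boils down to something like this sketch (the limit value below is an assumption taken from whisper.cpp's whisper.h header; the helper name is illustrative):

```python
# Mirror of the patch's behaviour: instead of failing silently when
# the requested beam size exceeds the compiled-in limit, clamp it.
WHISPER_MAX_DECODERS = 8  # assumed value; defined in whisper.h

def effective_beam_size(requested):
    """Return a beam size whisper-cli can actually honour."""
    if requested > WHISPER_MAX_DECODERS:
        return WHISPER_MAX_DECODERS  # clamp instead of silently failing
    return max(1, requested)         # guard against zero/negative input

print(effective_beam_size(16))
print(effective_beam_size(5))
```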

This also means that our CUDA build is probably not needed, so I will go back to providing a Vulkan-only build for now.

How to run this software

First, download ffmpeg. The instructions vary according to whether you run Windows or Linux. More on that below.

Windows

  • First, download FFMPEG from here.
  • Now, download the Microsoft Visual C++ runtime. The 2017 version will do.

Aligning with my design philosophy that software should be as uncomplicated as possible to use, I'm making available a 64-bit pre-compiled version of Whisper.cpp that supports Vulkan. All you need to do this time around, if you use Windows, is download our repository and run the main script by clicking the batch file SoftWhisper.bat.

You will also be prompted to install any missing piece of software.

You can also run the software manually by typing the following command:

python SoftWhisper.py

Linux

For now, convenience scripts are not available.
Just like on Windows, you will need to install FFMPEG.
This can be done by invoking your package manager. For example, if you are using Ubuntu, you can type:

sudo apt install ffmpeg

Or use your favorite package manager.

Then, install the dependencies with:

pip install -r requirements.txt

and then run SoftWhisper with:

python SoftWhisper.py

Please note that you will also need to build whisper.cpp or install a pre-compiled build.
However, I have designed the application so that replacing the backend is a simple point-and-click operation.
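As a sketch of what that swap involves, the application only needs to resolve a path to whatever whisper.cpp binary the user supplies. The helper below is illustrative, not SoftWhisper's actual code; it assumes Whisper_lin-x64 as the default binary name, which is the one these release notes mention for Linux:

```python
# Resolve the whisper.cpp binary to call: prefer an explicitly
# chosen path, otherwise fall back to a default name next to
# the project. Returns None if nothing usable is found.
from pathlib import Path

def resolve_whisper_binary(explicit_path, project_dir):
    candidates = []
    if explicit_path:
        candidates.append(Path(explicit_path))
    candidates.append(Path(project_dir) / "Whisper_lin-x64")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

With a scheme like this, "replacing the backend" is just dropping a different binary into the folder or pointing the setting at it.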

Known bugs

  • Despite being very performant, this software still has many more lines of code than it should, which I will probably address in the future.
  • I couldn't get speaker identification to work properly on this release, so it was disabled and removed from the interface.
  • When you select a new video, it won't load the video right away. You will need to press play.

March-2025

14 Mar 11:42
c09c0c9


SoftWhisper March 2025 is out!

Big Changes Around

In the previous release, I was unhappy with the performance and accessibility of our application. Our previous implementation was too heavily reliant on CUDA.
AMD users would have to install a specific PyTorch package, but it was too difficult to install and did not provide much of a benefit anyway.

After some research, I created a ZLUDA branch to mimic CUDA; unfortunately, none of the ZLUDA implementations support PyTorch.

But not all hope was lost.

After more research and frustration, I heard of Whisper.cpp. It is a reimplementation of OpenAI's Whisper in pure C/C++, and has minimal dependencies.
Since it can easily use Vulkan, combines CPU + GPU acceleration and can be easily compiled on Linux, it would be worth a shot.

The results are very surprising: Whisper.cpp can transcribe 2 hours of audio in around 2-3 minutes on my current hardware.
By comparison, with multiprocessing and the regular Whisper API, 20-30 minutes of audio will take you around 40 minutes.
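To put those numbers side by side (taking the quoted figures at face value):

```python
# Rough speed comparison using the figures quoted above.
# whisper.cpp: ~120 minutes of audio in ~2.5 minutes of wall time.
whisper_cpp_speedup = 120 / 2.5   # times faster than realtime
# Regular Whisper with multiprocessing: ~25 min of audio in ~40 min.
regular_speedup = 25 / 40         # below 1.0 means slower than realtime
print(round(whisper_cpp_speedup))
print(round(regular_speedup, 2))
```

In other words, roughly a 48x realtime factor versus one well below realtime, which is why the switch was worth it.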

Aligning with my design philosophy that software should be as uncomplicated as possible to use, I'm making available a 64-bit pre-compiled version of Whisper.cpp
that supports Vulkan. All you need to do this time around, if you use Windows, is download our repository and run the main script with Python:

python SoftWhisper.py

And that's it! The models will also be downloaded for you if you don't have them.

Please note that I haven't tested this application under Linux; however, just placing a compiled Whisper.cpp of your choice under the same folder
as the project should work. The default name the application will look for is Whisper_lin-x64; however, you can also select the directory of your choice
by simply starting the application and changing the directory under the option "Whisper.cpp executable."

Installation steps

Windows

Download the binary package. It has been compiled with Vulkan support, so it should work on any GPU. However, if you are using an NVIDIA card and notice it doesn't work properly (i.e., the program silently fails), please download an NVIDIA-optimized build here instead.

DO NOT try to run the CUDA version on an AMD or Intel system: the program will silently fail!

Just click on SoftWhisper.bat. If any dependency is missing, you will be prompted to install it.
If that fails, install the dependencies manually with the command:
pip install -r requirements.txt

Linux

For now, convenience scripts are not available.
Install the dependencies with:
pip install -r requirements.txt
and then run SoftWhisper with:
python SoftWhisper.py

Known bugs

  • Despite being very performant, this software still has many more lines of code than it should, which I will probably address in the future.
  • I couldn't get speaker identification to work properly on this release, so it was disabled and removed from the interface.
  • When you select a new video, it won't load the video right away. You will need to press play.