bitt-ar/SubFlow

SubFlow

SubFlow is an advanced MKV subtitle extractor, translator, and multiplexer. It allows you to seamlessly extract English subtitles from MKV video files, translate them to Arabic using a local Large Language Model (via Ollama or LM Studio) or cloud providers (such as Google Gemini and OpenRouter), and embed the translated subtitle back into the video file without manual intervention.

Key Features

  • Automated Workflow: Extracts, translates, and re-muxes subtitles in a single pipeline.
  • Single File & Directory Support: Process a single video or an entire folder. When given a folder, SubFlow recursively scans all subdirectories and translates every MKV file it finds.
  • AI-Powered Translation: Integrates with local AI models and cloud APIs to deliver high-quality, context-aware translations. For best results, use a capable model. The tool has been tested with gemma4:26b and produced excellent results.
  • Interactive CLI Setup: Complete control over your workflow through simple subcommands and an interactive wrapper.
  • Standalone Modes: Extract a subtitle track from a video without translating it, or mux an existing subtitle file into a video without translation.

Prerequisites

Before using SubFlow, ensure you have the following requirements ready:

  1. Python 3.10+: Required to run the main application logic.
  2. FFmpeg: Essential for extracting and multiplexing subtitle tracks. You must have FFmpeg binaries available on your system.
  3. An LLM Provider (You need at least one of the following):
    • Local Providers (Privacy-focused & Free):
      • Ollama: Install Ollama and download a capable model from the command line (e.g., ollama run gemma4:26b). Make sure the Ollama server is running in the background.
      • LM Studio: Load your preferred GGUF model and start the Local Inference Server.
    • Cloud APIs:
      • Google Gemini: Get an API key from Google AI Studio.
      • OpenRouter: Get an API key from OpenRouter to access a wide variety of open-weight and proprietary models.

Installation

Ensure you have Python 3.10+ installed. Install the dependencies using the following command:

pip install -r requirements.txt

Additionally, FFmpeg must be present. You can place the FFmpeg binaries inside a folder named ffmpeg-master-latest-win64-gpl-shared/bin/ or configure the path dynamically inside your config.yaml.

Usage & Commands

Video Tutorial

For a complete visual guide on how to install and use SubFlow, watch the tutorial on YouTube:

SubFlow Video Tutorial

To start the interactive CLI quickly, use the provided wrapper scripts:

  • Windows: Run start.bat
  • Linux/macOS: Run start.sh

Alternatively, you can manually run the script via python main.py.

Batch Processing & Directory Structure

When you select an entire directory as the input, the tool will automatically discover and process all MKV files inside it, including subdirectories.

For example, given the following folder structure:

series/
├── S01/
│   ├── episode1.mkv
│   └── episode2.mkv
└── S02/
    └── episode1.mkv

When you select the series directory, the tool enters S01 and S02 and processes each file (episode1.mkv, episode2.mkv, and so on) one after another, each independently of the others.
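The recursive discovery described above can be sketched with the standard library. This is a minimal illustration, not SubFlow's actual code: the function name find_mkv_files is hypothetical, and it assumes files are matched by a case-insensitive .mkv extension.

```python
from pathlib import Path


def find_mkv_files(root: Path) -> list[Path]:
    """Return every .mkv file under root (or root itself if it is a file), sorted."""
    if root.is_file():
        return [root] if root.suffix.lower() == ".mkv" else []
    # rglob("*") walks all subdirectories; sorting gives a stable processing order.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".mkv"
    )


if __name__ == "__main__":
    for video in find_mkv_files(Path("series")):
        print(video)
```

Sorting the results keeps episodes in a predictable order, which makes progress output and resumed runs easier to follow.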

1. Translate Mode

The primary pipeline mode. It extracts subtitles, translates them, and optionally re-muxes them.

Syntax:

python main.py translate <input_path> [options]

Options:

Flag                 Description
--output-dir, -o     Set a specific directory for output files.
--chunk-size, -c     Number of subtitle lines to send per LLM request.
--no-mux             Save the translated SRT file only and skip injecting it into the MKV.
--provider           Override the translation provider (ollama, lmstudio, google, openrouter).
--prompt, -p         Path to a custom system prompt file.
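To show how these flags fit together, here is a hypothetical argparse definition that mirrors the documented options. It is a sketch only: SubFlow's real parser may use a different library, different defaults, or different value types (for example, the integer type for --chunk-size is an assumption).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented translate options."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    translate = sub.add_parser("translate", help="extract, translate, and optionally re-mux")
    translate.add_argument("input_path", help="an MKV file or a directory of MKV files")
    translate.add_argument("-o", "--output-dir", help="directory for output files")
    translate.add_argument("-c", "--chunk-size", type=int, help="subtitle lines per LLM request")
    translate.add_argument("--no-mux", action="store_true", help="write the SRT only")
    translate.add_argument(
        "--provider",
        choices=["ollama", "lmstudio", "google", "openrouter"],
        help="override the provider set in config.yaml",
    )
    translate.add_argument("-p", "--prompt", help="path to a custom system prompt file")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```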

Examples:

Process a single file:

python main.py translate movie.mkv

Process an entire directory (recursively scans subdirectories):

python main.py translate /series/S01/ --provider openrouter

2. Extract Mode

A standalone utility that lets you interactively pick a subtitle track from an MKV file and extract it to a standalone SRT file.

Syntax:

python main.py extract <input_file> -o <output.srt>
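For readers curious about what an extraction step involves, the sketch below builds an FFmpeg command that pulls one subtitle stream out of a container and writes it as SRT. It is an illustration under stated assumptions, not SubFlow's internal command: the helper name build_extract_command and its parameters are hypothetical. The conversion to SRT only works for text-based subtitle tracks, not image-based ones.

```python
import subprocess


def build_extract_command(
    video: str,
    output_srt: str,
    subtitle_index: int = 0,   # 0-based position among the file's subtitle streams
    ffmpeg: str = "ffmpeg",    # assumed to be on PATH unless a full path is given
) -> list[str]:
    """Build an FFmpeg argument list that extracts one subtitle stream as SRT."""
    return [
        ffmpeg,
        "-i", video,
        "-map", f"0:s:{subtitle_index}",  # select the chosen subtitle stream of input 0
        "-c:s", "srt",                    # convert text subtitles to SubRip
        output_srt,
    ]


if __name__ == "__main__":
    cmd = build_extract_command("movie.mkv", "movie.en.srt")
    subprocess.run(cmd, check=True)  # raises CalledProcessError if FFmpeg fails
```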

3. Mux Mode

A standalone utility to inject a local SRT file into an MKV video.

Syntax:

python main.py mux <video.mkv> <subtitle.srt> -o <output.mkv>
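A mux step of this kind can be expressed as a single FFmpeg invocation that copies all existing streams and adds the SRT as one more subtitle track. The sketch below is an assumption about how such a command could look, not a description of SubFlow's implementation: build_mux_command and existing_subtitle_count are hypothetical names, and the language and title defaults are borrowed from the examples in the configuration section.

```python
import subprocess


def build_mux_command(
    video: str,
    subtitle: str,
    output: str,
    language: str = "ara",
    title: str = "Arabic",
    existing_subtitle_count: int = 0,  # subtitle streams already present in the video
    ffmpeg: str = "ffmpeg",
) -> list[str]:
    """Build an FFmpeg argument list that adds an SRT file as a new subtitle track."""
    # With "-map 0" first, the new track lands after the existing subtitle streams,
    # so its index among output subtitle streams equals the existing count.
    new_index = existing_subtitle_count
    return [
        ffmpeg,
        "-i", video,
        "-i", subtitle,
        "-map", "0",      # keep every stream from the original video
        "-map", "1:0",    # add the first stream of the subtitle file
        "-c", "copy",     # no re-encoding of video, audio, or subtitles
        f"-metadata:s:s:{new_index}", f"language={language}",
        f"-metadata:s:s:{new_index}", f"title={title}",
        output,
    ]


if __name__ == "__main__":
    cmd = build_mux_command("video.mkv", "subtitle.srt", "output.mkv")
    subprocess.run(cmd, check=True)
```

Stream copy keeps the operation fast and lossless, since only the container is rewritten.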

Configuration (config.yaml)

The config.yaml file is the central configuration file for SubFlow. It lets you customize LLM providers, translation behavior, binary paths, and output naming. The available settings are described below:

1. General & Providers

  • provider: Specifies the default LLM provider to be used (e.g., "ollama", "lmstudio", "google", "openrouter"). Can be overridden via CLI.
  • Provider Blocks (ollama, lmstudio, openrouter, google): Each block defines the parameters for its service. You can set the endpoint url (for local instances), the target model name, and the api_key (if the cloud API requires one).

2. Translation Settings (translation)

  • chunk_size: Determines how many subtitle lines are grouped together and sent in a single request to the LLM. Default is 20.
  • temperature: Controls the randomness of the LLM responses. A low value such as 0.2 is recommended so that translations stay consistent and accurate.
  • max_tokens_per_line: The maximum number of tokens allowed per translated line, which keeps lines short enough to fit their display time.
  • timeout: API response timeout limit in seconds.
  • glossary_file: Path to a JSON file that maps specific terms to the exact translations that must be used for them.
  • parallel: Enables parallel execution (enable: true) and sets the number of concurrent workers, which can significantly speed up translation.
  • context: Improves contextual awareness. When enable is true, the tool searches the MKV for subtitle tracks in the languages listed in priority_languages (such as French, Spanish, or Russian). These tracks are extracted and passed to the LLM as background context, which helps it translate more accurately and resolve pronouns correctly.
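The interaction between chunk_size and parallel workers can be illustrated with a short sketch. This is not SubFlow's implementation: chunk_lines, translate_all, and the translate_chunk callback are hypothetical names, and the callback stands in for whatever function sends one chunk to the LLM.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def chunk_lines(lines: list[str], chunk_size: int = 20) -> list[list[str]]:
    """Split subtitle lines into consecutive groups of at most chunk_size lines."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def translate_all(
    lines: list[str],
    translate_chunk: Callable[[list[str]], list[str]],  # hypothetical per-request function
    chunk_size: int = 20,
    workers: int = 1,
) -> list[str]:
    """Translate all lines chunk by chunk, optionally with several workers."""
    chunks = chunk_lines(lines, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so subtitle order is preserved
        # even when requests finish out of order.
        translated = list(pool.map(translate_chunk, chunks))
    return [line for chunk in translated for line in chunk]


if __name__ == "__main__":
    fake = lambda chunk: [text.upper() for text in chunk]  # stand-in for an LLM call
    print(translate_all(["hello", "world", "again"], fake, chunk_size=2, workers=2))
```

Larger chunks give the model more surrounding dialogue per request, while more workers shorten total run time at the cost of heavier load on the provider.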

3. FFmpeg Configuration (ffmpeg)

  • Defines the location of the FFmpeg executables. Set dir to the bin folder that contains them. If FFmpeg is installed system-wide (available on your PATH environment variable), you can instead set the paths to the executables directly.

4. Output Preferences (output)

  • directory: Where to save the resulting files. Using "same" keeps the output files next to the original video.
  • mux_video: Set to true to instruct the application to automatically inject the newly translated subtitle into a new MKV container.
  • srt_suffix / mkv_suffix: The suffixes appended to the output filenames (e.g., .ar.srt).
  • subtitle_language_code / subtitle_title: Determines the MKV metadata tags representing your newly generated subtitle track (e.g. ISO 639 code "ara", title "Arabic").
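How directory and the suffix settings combine into output filenames can be sketched like this. The function name output_paths is hypothetical, the .ar.mkv default is an assumed example (only .ar.srt appears above), and the sketch assumes the suffix replaces the original .mkv extension.

```python
from pathlib import Path


def output_paths(
    video: Path,
    directory: str = "same",
    srt_suffix: str = ".ar.srt",
    mkv_suffix: str = ".ar.mkv",  # assumed example value
) -> tuple[Path, Path]:
    """Return the (subtitle, video) output paths for one input video."""
    # "same" keeps outputs beside the source; anything else is treated as a folder path.
    out_dir = video.parent if directory == "same" else Path(directory)
    stem = video.stem  # filename without the final extension
    return out_dir / f"{stem}{srt_suffix}", out_dir / f"{stem}{mkv_suffix}"


if __name__ == "__main__":
    srt, mkv = output_paths(Path("series/S01/episode1.mkv"))
    print(srt)
    print(mkv)
```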

License & Usage Restrictions

The use of this software is subject to the following strict conditions:

  • No Commercial Use: It is strictly prohibited to sell this project or use it for any commercial or profitable purposes.
  • Ethical & Religious Compliance: It is strictly prohibited to use this project for any unethical purposes, immoral content, or anything that contradicts the Islamic religion.
