
GPU Acceleration

BallonTranslator supports GPU acceleration for the text detection, OCR, and inpainting modules. This guide covers setup for NVIDIA CUDA, AMD ROCm, and ZLUDA (a CUDA compatibility layer for AMD GPUs).

Overview

GPU acceleration can significantly speed up:
  • Text Detection: 3-5x faster than CPU
  • OCR: 2-4x faster than CPU
  • Inpainting: 5-10x faster than CPU
Translation speed depends on the translator service (online API or local LLM), not GPU acceleration.

Requirements

  • NVIDIA GPU with CUDA support (GTX 900 series or newer)
  • CUDA Compute Capability 3.5 or higher
  • Windows or Linux (CUDA is not available on macOS; Apple Silicon Macs use MPS instead, covered below)

Automatic Installation

BallonTranslator automatically installs CUDA-enabled PyTorch on first run:
python launch.py
The installer will:
  1. Detect your system (Windows/Linux)
  2. Install PyTorch with CUDA 11.8 support
  3. Install torchvision and torchaudio
From launch.py:374-377:
torch_command = os.environ.get('TORCH_COMMAND', 
    "pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 "
    "--index-url https://download.pytorch.org/whl/cu118 --disable-pip-version-check"
)
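The default can be overridden by setting the TORCH_COMMAND environment variable before launching, following the same os.environ.get pattern shown above. A minimal sketch of that lookup (the CPU-only index URL below is an illustrative example, not something this guide prescribes):

```python
import os

# Same override pattern launch.py uses: an explicit TORCH_COMMAND
# environment variable takes precedence over the CUDA 11.8 default.
DEFAULT_TORCH_COMMAND = (
    "pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 "
    "--index-url https://download.pytorch.org/whl/cu118"
)

def resolve_torch_command() -> str:
    return os.environ.get("TORCH_COMMAND", DEFAULT_TORCH_COMMAND)

# No override set: the CUDA 11.8 default is used.
print(resolve_torch_command())

# Override with, e.g., a CPU-only build (illustrative URL):
os.environ["TORCH_COMMAND"] = (
    "pip install torch --index-url https://download.pytorch.org/whl/cpu"
)
print(resolve_torch_command())
```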

Manual Installation

If you prefer to install PyTorch manually:
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
  --index-url https://download.pytorch.org/whl/cu118
Or for CUDA 12.1:
pip install torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu121

Reinstall PyTorch

To force reinstallation of PyTorch (useful after GPU driver updates):
python launch.py --reinstall-torch

Verify CUDA

Check if CUDA is working:
import torch
print(f"CUDA Available: {torch.cuda.is_available()}")
print(f"CUDA Version: {torch.version.cuda}")
print(f"Device Name: {torch.cuda.get_device_name(0)}")
Expected output:
CUDA Available: True
CUDA Version: 11.8
Device Name: NVIDIA GeForce RTX 3080

Configure Modules for CUDA

In Settings → Module, set device to CUDA for:
  • Text Detector
  • OCR
  • Inpainter

AMD GPU Support

BallonTranslator supports AMD GPUs through two methods:
  1. ZLUDA - Works with more AMD GPUs (RDNA 2/3/4), easier setup
  2. Native ROCm - Official AMD solution, better performance, limited GPU support

Option 1: ZLUDA for AMD

ZLUDA is a CUDA compatibility layer that allows CUDA applications to run on AMD GPUs using ROCm.

Advantages

  • Works with RDNA 2, RDNA 3, and RDNA 4 GPUs
  • Easier setup than native ROCm
  • Faster than CPU processing

Disadvantages

  • Slower than native ROCm
  • First run requires 5-10 minutes of compilation
  • Requires driver warmup after updates

Requirements

  • AMD GPU: RX 6000 series (RDNA 2), RX 7000 series (RDNA 3), or RX 9000 series (RDNA 4)
  • AMD Adrenalin Driver 24.12.1 or newer
  • AMD HIP SDK (see version table below)
  • Windows 10 or 11

Version Compatibility

Windows Version    HIP SDK Version    ZLUDA Version
Windows 11         7.1.1              3.9.6
Windows 10/11      6.4.2              3.9.5
Windows 10/11      6.2.4              3.9.5
Windows 10/11      6.1.2              3.9.5
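For scripted setups, the compatibility table can be encoded as a small lookup (the helper name is hypothetical, not part of BallonTranslator):

```python
# Hypothetical helper encoding the HIP SDK -> ZLUDA compatibility table above.
ZLUDA_FOR_HIP_SDK = {
    "7.1.1": "3.9.6",  # Windows 11 only
    "6.4.2": "3.9.5",
    "6.2.4": "3.9.5",
    "6.1.2": "3.9.5",
}

def zluda_version_for(hip_sdk: str) -> str:
    try:
        return ZLUDA_FOR_HIP_SDK[hip_sdk]
    except KeyError:
        raise ValueError(f"No known ZLUDA release for HIP SDK {hip_sdk}")

print(zluda_version_for("6.4.2"))  # 3.9.5
```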

Installation Steps

1. Update GPU Driver
Download and install the latest AMD Adrenalin driver (24.12.1 or newer).
2. Install HIP SDK
Download and install the appropriate HIP SDK version. Recommended: HIP SDK 6.4.2 for best compatibility.
3. Download ZLUDA
Download ZLUDA from the releases page, choosing the version that matches your HIP SDK (e.g., 3.9.5 for HIP SDK 6.4.2).
4. Extract ZLUDA
Extract ZLUDA to C:\zluda:
C:\zluda\
├── cublas.dll
├── cusparse.dll
├── nvrtc.dll
└── ... other files
5. Set Environment Variables
Add to Windows System Environment Variables:
  1. Open: Settings → System → About → Advanced system settings → Environment Variables
  2. Under “System variables”, find Path
  3. Click “Edit”
  4. Add two new entries:
    • C:\zluda
    • %HIP_PATH%bin
6. Replace CUDA DLLs
Copy these files from C:\zluda and rename them:
cublas.dll    → cublas64_11.dll
cusparse.dll  → cusparse64_11.dll
nvrtc.dll     → nvrtc64_112_0.dll
Replace these files in:
BallonsTranslator\ballontrans_pylibs_win\Lib\site-packages\torch\lib\
7. Configure BallonTranslator
  1. Launch BallonTranslator
  2. Go to Settings → Module
  3. Set Text Detector device to CUDA
  4. Set OCR device to CUDA
  5. Keep Inpainter on CPU (ZLUDA doesn’t support all inpainting operations)
8. First Run Compilation
Run OCR or text detection for the first time:
  • ZLUDA will compile PTX files
  • This takes 5-10 minutes depending on CPU
  • Subsequent runs will be fast (compilation is cached)
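The DLL replacement in step 6 can also be scripted. A hedged sketch using the paths and rename map from this guide (the helper itself is not part of BallonTranslator):

```python
import shutil
from pathlib import Path

# Rename map from step 6: ZLUDA DLL -> name the CUDA 11 torch wheels expect.
ZLUDA_RENAMES = {
    "cublas.dll": "cublas64_11.dll",
    "cusparse.dll": "cusparse64_11.dll",
    "nvrtc.dll": "nvrtc64_112_0.dll",
}

def install_zluda_dlls(zluda_dir: str, torch_lib_dir: str) -> None:
    """Copy the three ZLUDA DLLs into torch's lib folder under CUDA names."""
    src_root, dst_root = Path(zluda_dir), Path(torch_lib_dir)
    for src_name, dst_name in ZLUDA_RENAMES.items():
        src = src_root / src_name
        if not src.is_file():
            raise FileNotFoundError(f"{src} not found -- is ZLUDA extracted there?")
        shutil.copy2(src, dst_root / dst_name)

# install_zluda_dlls(
#     r"C:\zluda",
#     r"BallonsTranslator\ballontrans_pylibs_win\Lib\site-packages\torch\lib",
# )
```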

ZLUDA Configuration

BallonTranslator automatically detects and configures ZLUDA. From utils/zluda_config.py:
import torch

def enable_zluda_config():
    if hasattr(torch, 'cuda') and torch.cuda.is_available():
        device_name = torch.cuda.get_device_name(0)
        
        if "[ZLUDA]" in device_name:
            # Disable cuDNN for ZLUDA compatibility
            torch.backends.cudnn.enabled = False
            
            # Configure CUDA backends
            if hasattr(torch.backends.cuda, 'enable_flash_sdp'):
                torch.backends.cuda.enable_flash_sdp(False)
            if hasattr(torch.backends.cuda, 'enable_math_sdp'):
                torch.backends.cuda.enable_math_sdp(True)
            if hasattr(torch.backends.cuda, 'enable_mem_efficient_sdp'):
                torch.backends.cuda.enable_mem_efficient_sdp(False)
            if hasattr(torch.backends.cuda, 'enable_cudnn_sdp'):
                torch.backends.cuda.enable_cudnn_sdp(False)
This configuration is automatically applied on startup (from launch.py:150-151):
from utils.zluda_config import enable_zluda_config
enable_zluda_config()

Option 2: Native ROCm for AMD (Advanced)

Warning

Native ROCm requires Python 3.12, HIP SDK 6.4+, and specific AMD GPUs. This is an advanced setup.

Advantages

  • Better performance than ZLUDA
  • Inpainting works with GPU acceleration
  • Official AMD support

Disadvantages

  • Limited GPU support (RDNA 3/4 only)
  • Requires Python 3.12
  • Complex setup (need to reinstall dependencies)
  • Windows only (as of 2026)

Requirements

  • AMD GPU: RX 7000 series (RDNA 3) or RX 9000 series (RDNA 4)
  • AMD Adrenalin Driver 2026.1.1 or newer
  • HIP SDK 6.4.x
  • Python 3.12

Supported GPUs

RDNA 3 (RX 7000 series):
  • RX 7900 XT/XTX
  • RX 7800 XT
  • RX 7700 XT
  • RX 7600
  • PRO W7900/W7800/W7700
RDNA 4 (RX 9000 series):
  • RX 9070
  • RX 9060

Installation Steps

1. Install Python 3.12
Download and install Python 3.12.
2. Uninstall Old Dependencies
If you previously used Python 3.10/3.11:
rm -rf ballontrans_pylibs_win
pip uninstall torch torchvision torchaudio -y
(In plain Command Prompt, use rmdir /s /q ballontrans_pylibs_win instead of rm -rf.)
3. Install AMD ROCm PyTorch
Use the AMD-provided launcher script:
launch_win_amd_nightly.bat
This script automatically installs the ROCm 6.4 PyTorch wheels. From launch.py:366-370:
if amd_nightly_gpu == "RDNA3" or amd_nightly_gpu == "RDNA4":
    torch_command = os.environ.get('TORCH_COMMAND',
        "pip install "
        "https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torch-2.8.0a0%2Bgitfc14c65-cp312-cp312-win_amd64.whl "
        "https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torchvision-0.24.0a0%2Bc85f008-cp312-cp312-win_amd64.whl "
        "https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torchaudio-2.6.0a0%2B1a8f621-cp312-cp312-win_amd64.whl")
4. Configure All Modules for CUDA
  1. Launch BallonTranslator
  2. Go to Settings → Module
  3. Set Text Detector to CUDA
  4. Set OCR to CUDA
  5. Set Inpainter to CUDA (works with native ROCm!)
5. Verify ROCm
import torch
print(f"CUDA Available: {torch.cuda.is_available()}")
print(f"CUDA Version: {torch.version.cuda}")
print(f"Device Name: {torch.cuda.get_device_name(0)}")
print(f"HIP Version: {torch.version.hip}")

ROCm 7 (Optional)

For ROCm 7, you need to:
  1. Manually install ROCm 7 SDK libraries
  2. Update PyTorch to ROCm 7 version:
    pip install <rocm7-pytorch-wheel-url>
    
ROCm 7 support is experimental. ROCm 6.4 is recommended for stability.

Apple Silicon (macOS)

Apple Silicon Macs use Metal Performance Shaders (MPS) for GPU acceleration.

Requirements

  • Mac with Apple Silicon (M1, M2, M3, M4)
  • macOS 12.3 or later

Installation

PyTorch with MPS support is installed automatically:
python3 launch.py

Verify MPS

import torch
print(f"MPS Available: {torch.backends.mps.is_available()}")
print(f"MPS Built: {torch.backends.mps.is_built()}")

Configure for MPS

MPS is automatically used when available. No manual configuration needed.

Performance Optimization

Model Selection

Choose faster models for better GPU utilization:
Module          Fast Model    Accurate Model
Text Detector   ctd           YSGDetector
OCR             mit32px       manga_ocr
Inpainter       lama_mpe      lama_large_512px

Batch Size

For headless mode, process multiple images in parallel:
# Process 4 directories simultaneously (if you have enough VRAM)
python launch.py --headless --exec_dirs "/manga/ch1" &
python launch.py --headless --exec_dirs "/manga/ch2" &
python launch.py --headless --exec_dirs "/manga/ch3" &
python launch.py --headless --exec_dirs "/manga/ch4" &
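Backgrounding four processes at once can exhaust VRAM. A bounded-concurrency variant is sketched below; MAX_PARALLEL is an assumption to tune for your GPU's memory, and the command list mirrors the shell example above:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Cap how many headless runs execute at once so VRAM is not oversubscribed.
MAX_PARALLEL = 2  # assumption: tune to your GPU's memory

def run_all(commands, max_parallel=MAX_PARALLEL):
    """Run each command, at most max_parallel at a time; return exit codes."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(
            lambda cmd: subprocess.run(cmd, check=False).returncode, commands))

chapters = ["/manga/ch1", "/manga/ch2", "/manga/ch3", "/manga/ch4"]
commands = [["python", "launch.py", "--headless", "--exec_dirs", d]
            for d in chapters]
# run_all(commands)
```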

Memory Management

Enable on-demand loading to reduce VRAM usage. In config/config.json:
{
  "module": {
    "load_model_on_demand": true
  }
}
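To flip this setting programmatically without hand-editing JSON, a small sketch (the config path is the one from this guide; the helper is not part of BallonTranslator):

```python
import json
from pathlib import Path

def enable_on_demand(config_path: str = "config/config.json") -> dict:
    """Set module.load_model_on_demand without clobbering other settings."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("module", {})["load_model_on_demand"] = True
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config

# enable_on_demand()
```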
Or use the flag:
python launch.py --headless --exec_dirs "/manga"
Headless mode automatically enables this (from launch.py:189-191).

Low VRAM Mode

For local LLM translators (like Sakura), enable low VRAM mode:
{
  "module": {
    "translator_params": {
      "Sakura-13B-Galgame": {
        "low_vram_mode": true
      }
    }
  }
}

Troubleshooting

CUDA Not Detected

Check PyTorch installation:
import torch
print(torch.__version__)
print(torch.version.cuda)
Reinstall PyTorch:
pip uninstall torch torchvision torchaudio
python launch.py --reinstall-torch
Also make sure your GPU drivers are up to date.

ZLUDA Compilation Hangs

  • Wait at least 10 minutes on first run
  • Check CPU usage (compilation is CPU-intensive)
  • Ensure you have at least 8GB RAM
  • Close other applications

Out of Memory Errors

Reduce model size:
  • Use lama_mpe instead of lama_large_512px
  • Use mit32px instead of mit48px
Enable on-demand loading:
"load_model_on_demand": true
Process one directory at a time:
python launch.py --headless --exec_dirs "/manga/ch1"

AMD GPU Not Recognized (ZLUDA)

Check HIP SDK installation:
echo %HIP_PATH%
Should output something like:
C:\Program Files\AMD\ROCm\6.4\
Verify ZLUDA DLLs: Check that renamed DLLs are in:
BallonsTranslator\ballontrans_pylibs_win\Lib\site-packages\torch\lib\
Check environment variables:
  • Open Command Prompt
  • Run: echo %PATH%
  • Verify C:\zluda and %HIP_PATH%bin are present

Inpainting Fails with ZLUDA

Inpainting doesn’t work well with ZLUDA. Use CPU for inpainting:
  1. Settings → Module
  2. Set Inpainter device to CPU
  3. Keep Text Detector and OCR on CUDA

Benchmarks

Typical performance improvements with GPU acceleration:

Text Detection

  • CPU: ~2 seconds per page
  • CUDA (NVIDIA): ~0.5 seconds per page
  • ZLUDA (AMD): ~0.8 seconds per page
  • ROCm (AMD): ~0.6 seconds per page

OCR

  • CPU: ~1.5 seconds per page
  • CUDA (NVIDIA): ~0.4 seconds per page
  • ZLUDA (AMD): ~0.6 seconds per page
  • ROCm (AMD): ~0.5 seconds per page

Inpainting

  • CPU: ~5 seconds per page
  • CUDA (NVIDIA): ~0.8 seconds per page
  • ROCm (AMD): ~1.0 seconds per page
  • ZLUDA (AMD): Not recommended
Benchmarks vary based on GPU model, image resolution, and complexity.
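To measure your own numbers, a minimal timing pattern (a sketch; the lambda stands in for whichever module call you are profiling):

```python
import time

def seconds_per_page(run_page, pages) -> float:
    """Average wall-clock seconds per page for any per-page callable."""
    start = time.perf_counter()
    for page in pages:
        run_page(page)
    return (time.perf_counter() - start) / len(pages)

# Example with a stand-in workload:
avg = seconds_per_page(lambda page: sum(range(10_000)), range(20))
print(f"{avg:.4f} s/page")
```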

Next Steps