
GPU Acceleration

BallonTranslator supports GPU acceleration for the text detection, OCR, and inpainting modules. This guide covers setup for NVIDIA CUDA, AMD ROCm, and ZLUDA (a CUDA compatibility layer for AMD GPUs).

Overview

GPU acceleration can significantly speed up:
  • Text Detection: 3-5x faster than CPU
  • OCR: 2-4x faster than CPU
  • Inpainting: 5-10x faster than CPU
Translation speed depends on the translator service (online API or local LLM), not GPU acceleration.

Requirements

  • NVIDIA GPU with CUDA support (GTX 900 series or newer)
  • CUDA Compute Capability 3.5 or higher
  • Windows or Linux (CUDA is not available on macOS; Apple Silicon Macs use MPS instead, covered below)

Automatic Installation

BallonTranslator automatically installs CUDA-enabled PyTorch on first run:
python launch.py
The installer will:
  1. Detect your system (Windows/Linux)
  2. Install PyTorch with CUDA 11.8 support
  3. Install torchvision and torchaudio
From launch.py:374-377:
torch_command = os.environ.get('TORCH_COMMAND', 
    "pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 "
    "--index-url https://download.pytorch.org/whl/cu118 --disable-pip-version-check"
)
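The default can be overridden by setting the TORCH_COMMAND environment variable before launching, following the same os.environ.get pattern shown above. A minimal sketch of that lookup (the CPU-only index URL below is an illustrative example, not something this guide prescribes):

```python
import os

# Same override pattern launch.py uses: an explicit TORCH_COMMAND
# environment variable takes precedence over the CUDA 11.8 default.
DEFAULT_TORCH_COMMAND = (
    "pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 "
    "--index-url https://download.pytorch.org/whl/cu118"
)

def resolve_torch_command() -> str:
    return os.environ.get("TORCH_COMMAND", DEFAULT_TORCH_COMMAND)

# No override set: the CUDA 11.8 default is used.
print(resolve_torch_command())

# Override with, e.g., a CPU-only build (illustrative URL):
os.environ["TORCH_COMMAND"] = (
    "pip install torch --index-url https://download.pytorch.org/whl/cpu"
)
print(resolve_torch_command())
```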

Manual Installation

If you prefer to install PyTorch manually:
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
  --index-url https://download.pytorch.org/whl/cu118
Or for CUDA 12.1:
pip install torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu121

Reinstall PyTorch

To force reinstallation of PyTorch (useful after GPU driver updates):
python launch.py --reinstall-torch

Verify CUDA

Check if CUDA is working:
import torch
print(f"CUDA Available: {torch.cuda.is_available()}")
print(f"CUDA Version: {torch.version.cuda}")
print(f"Device Name: {torch.cuda.get_device_name(0)}")
Expected output:
CUDA Available: True
CUDA Version: 11.8
Device Name: NVIDIA GeForce RTX 3080

Configure Modules for CUDA

In Settings → Module, set device to CUDA for:
  • Text Detector
  • OCR
  • Inpainter

AMD GPU Support

BallonTranslator supports AMD GPUs through two methods:
  1. ZLUDA - Works with more AMD GPUs (RDNA 2/3/4), easier setup
  2. Native ROCm - Official AMD solution, better performance, limited GPU support

Option 1: ZLUDA for AMD

ZLUDA is a CUDA compatibility layer that allows CUDA applications to run on AMD GPUs using ROCm.

Advantages

  • Works with RDNA 2, RDNA 3, and RDNA 4 GPUs
  • Easier setup than native ROCm
  • Faster than CPU processing

Disadvantages

  • Slower than native ROCm
  • First run requires 5-10 minutes of compilation
  • Requires driver warmup after updates

Requirements

  • AMD GPU: RX 6000 series (RDNA 2), RX 7000 series (RDNA 3), or RX 9000 series (RDNA 4)
  • AMD Adrenalin Driver 24.12.1 or newer
  • AMD HIP SDK (see version table below)
  • Windows 10 or 11

Version Compatibility

Windows Version    HIP SDK Version    ZLUDA Version
Windows 11         7.1.1              3.9.6
Windows 10/11      6.4.2              3.9.5
Windows 10/11      6.2.4              3.9.5
Windows 10/11      6.1.2              3.9.5
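For scripted setups, the compatibility table can be encoded as a small lookup (the helper name is hypothetical, not part of BallonTranslator):

```python
# Hypothetical helper encoding the HIP SDK -> ZLUDA compatibility table above.
ZLUDA_FOR_HIP_SDK = {
    "7.1.1": "3.9.6",  # Windows 11 only
    "6.4.2": "3.9.5",
    "6.2.4": "3.9.5",
    "6.1.2": "3.9.5",
}

def zluda_version_for(hip_sdk: str) -> str:
    try:
        return ZLUDA_FOR_HIP_SDK[hip_sdk]
    except KeyError:
        raise ValueError(f"No known ZLUDA release for HIP SDK {hip_sdk}")

print(zluda_version_for("6.4.2"))  # 3.9.5
```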

Installation Steps

1. Update GPU Driver
Download and install the latest AMD Adrenalin driver (24.12.1 or newer).
2. Install HIP SDK
Download and install the appropriate HIP SDK version. Recommended: HIP SDK 6.4.2 for best compatibility.
3. Download ZLUDA
Download ZLUDA from the releases page, choosing the version that matches your HIP SDK (e.g., 3.9.5 for HIP SDK 6.4.2).
4. Extract ZLUDA
Extract ZLUDA to C:\zluda:
C:\zluda\
├── cublas.dll
├── cusparse.dll
├── nvrtc.dll
└── ... other files
5. Set Environment Variables
Add to Windows System Environment Variables:
  1. Open: Settings → System → About → Advanced system settings → Environment Variables
  2. Under “System variables”, find Path
  3. Click “Edit”
  4. Add two new entries:
    • C:\zluda
    • %HIP_PATH%bin
6. Replace CUDA DLLs
Copy these files from C:\zluda and rename them:
cublas.dll    → cublas64_11.dll
cusparse.dll  → cusparse64_11.dll
nvrtc.dll     → nvrtc64_112_0.dll
Replace these files in:
BallonsTranslator\ballontrans_pylibs_win\Lib\site-packages\torch\lib\
7. Configure BallonTranslator
  1. Launch BallonTranslator
  2. Go to Settings → Module
  3. Set Text Detector device to CUDA
  4. Set OCR device to CUDA
  5. Keep Inpainter on CPU (ZLUDA doesn’t support all inpainting operations)
8. First Run Compilation
Run OCR or text detection for the first time:
  • ZLUDA will compile PTX files
  • This takes 5-10 minutes depending on CPU
  • Subsequent runs will be fast (compilation is cached)
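The DLL replacement in step 6 can also be scripted. A hedged sketch using the paths and rename map from this guide (the helper itself is not part of BallonTranslator):

```python
import shutil
from pathlib import Path

# Rename map from step 6: ZLUDA DLL -> name the CUDA 11 torch wheels expect.
ZLUDA_RENAMES = {
    "cublas.dll": "cublas64_11.dll",
    "cusparse.dll": "cusparse64_11.dll",
    "nvrtc.dll": "nvrtc64_112_0.dll",
}

def install_zluda_dlls(zluda_dir: str, torch_lib_dir: str) -> None:
    """Copy the three ZLUDA DLLs into torch's lib folder under CUDA names."""
    src_root, dst_root = Path(zluda_dir), Path(torch_lib_dir)
    for src_name, dst_name in ZLUDA_RENAMES.items():
        src = src_root / src_name
        if not src.is_file():
            raise FileNotFoundError(f"{src} not found -- is ZLUDA extracted there?")
        shutil.copy2(src, dst_root / dst_name)

# install_zluda_dlls(
#     r"C:\zluda",
#     r"BallonsTranslator\ballontrans_pylibs_win\Lib\site-packages\torch\lib",
# )
```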

ZLUDA Configuration

BallonTranslator automatically detects and configures ZLUDA. From utils/zluda_config.py:
import torch

def enable_zluda_config():
    if hasattr(torch, 'cuda') and torch.cuda.is_available():
        device_name = torch.cuda.get_device_name(0)
        
        if "[ZLUDA]" in device_name:
            # Disable cuDNN for ZLUDA compatibility
            torch.backends.cudnn.enabled = False
            
            # Configure CUDA backends
            if hasattr(torch.backends.cuda, 'enable_flash_sdp'):
                torch.backends.cuda.enable_flash_sdp(False)
            if hasattr(torch.backends.cuda, 'enable_math_sdp'):
                torch.backends.cuda.enable_math_sdp(True)
            if hasattr(torch.backends.cuda, 'enable_mem_efficient_sdp'):
                torch.backends.cuda.enable_mem_efficient_sdp(False)
            if hasattr(torch.backends.cuda, 'enable_cudnn_sdp'):
                torch.backends.cuda.enable_cudnn_sdp(False)
This configuration is automatically applied on startup (from launch.py:150-151):
from utils.zluda_config import enable_zluda_config
enable_zluda_config()

Option 2: Native ROCm for AMD (Advanced)

Warning

Native ROCm requires Python 3.12, HIP SDK 6.4+, and specific AMD GPUs. This is an advanced setup.

Advantages

  • Better performance than ZLUDA
  • Inpainting works with GPU acceleration
  • Official AMD support

Disadvantages

  • Limited GPU support (RDNA 3/4 only)
  • Requires Python 3.12
  • Complex setup (need to reinstall dependencies)
  • Windows only (as of 2026)

Requirements

  • AMD GPU: RX 7000 series (RDNA 3) or RX 9000 series (RDNA 4)
  • AMD Adrenalin Driver 2026.1.1 or newer
  • HIP SDK 6.4.x
  • Python 3.12

Supported GPUs

RDNA 3 (RX 7000 series):
  • RX 7900 XT/XTX
  • RX 7800 XT
  • RX 7700 XT
  • RX 7600
  • PRO W7900/W7800/W7700
RDNA 4 (RX 9000 series):
  • RX 9070
  • RX 9060

Installation Steps

1. Install Python 3.12
Download and install Python 3.12.
2. Uninstall Old Dependencies
If you previously used Python 3.10/3.11:
rm -rf ballontrans_pylibs_win
pip uninstall torch torchvision torchaudio -y
(In plain Command Prompt, use rmdir /s /q ballontrans_pylibs_win instead of rm -rf.)
3. Install AMD ROCm PyTorch
Use the AMD-provided launcher script:
launch_win_amd_nightly.bat
This script automatically installs the ROCm 6.4 PyTorch wheels. From launch.py:366-370:
if amd_nightly_gpu == "RDNA3" or amd_nightly_gpu == "RDNA4":
    torch_command = os.environ.get('TORCH_COMMAND',
        "pip install "
        "https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torch-2.8.0a0%2Bgitfc14c65-cp312-cp312-win_amd64.whl "
        "https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torchvision-0.24.0a0%2Bc85f008-cp312-cp312-win_amd64.whl "
        "https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torchaudio-2.6.0a0%2B1a8f621-cp312-cp312-win_amd64.whl")
4. Configure All Modules for CUDA
  1. Launch BallonTranslator
  2. Go to Settings → Module
  3. Set Text Detector to CUDA
  4. Set OCR to CUDA
  5. Set Inpainter to CUDA (works with native ROCm!)
5. Verify ROCm
import torch
print(f"CUDA Available: {torch.cuda.is_available()}")
print(f"CUDA Version: {torch.version.cuda}")
print(f"Device Name: {torch.cuda.get_device_name(0)}")
print(f"HIP Version: {torch.version.hip}")

ROCm 7 (Optional)

For ROCm 7, you need to:
  1. Manually install ROCm 7 SDK libraries
  2. Update PyTorch to ROCm 7 version:
    pip install <rocm7-pytorch-wheel-url>
    
ROCm 7 support is experimental. ROCm 6.4 is recommended for stability.

Apple Silicon (macOS)

Apple Silicon Macs use Metal Performance Shaders (MPS) for GPU acceleration.

Requirements

  • Mac with Apple Silicon (M1, M2, M3, M4)
  • macOS 12.3 or later

Installation

PyTorch with MPS support is installed automatically:
python3 launch.py

Verify MPS

import torch
print(f"MPS Available: {torch.backends.mps.is_available()}")
print(f"MPS Built: {torch.backends.mps.is_built()}")

Configure for MPS

MPS is automatically used when available. No manual configuration needed.

Performance Optimization

Model Selection

Choose faster models for better GPU utilization:
Module          Fast Model    Accurate Model
Text Detector   ctd           YSGDetector
OCR             mit32px       manga_ocr
Inpainter       lama_mpe      lama_large_512px

Batch Size

For headless mode, process multiple images in parallel:
# Process 4 directories simultaneously (if you have enough VRAM)
python launch.py --headless --exec_dirs "/manga/ch1" &
python launch.py --headless --exec_dirs "/manga/ch2" &
python launch.py --headless --exec_dirs "/manga/ch3" &
python launch.py --headless --exec_dirs "/manga/ch4" &
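Backgrounding four processes at once can exhaust VRAM. A bounded-concurrency variant is sketched below; MAX_PARALLEL is an assumption to tune for your GPU's memory, and the command list mirrors the shell example above:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Cap how many headless runs execute at once so VRAM is not oversubscribed.
MAX_PARALLEL = 2  # assumption: tune to your GPU's memory

def run_all(commands, max_parallel=MAX_PARALLEL):
    """Run each command, at most max_parallel at a time; return exit codes."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(
            lambda cmd: subprocess.run(cmd, check=False).returncode, commands))

chapters = ["/manga/ch1", "/manga/ch2", "/manga/ch3", "/manga/ch4"]
commands = [["python", "launch.py", "--headless", "--exec_dirs", d]
            for d in chapters]
# run_all(commands)
```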

Memory Management

Enable on-demand loading to reduce VRAM usage. In config/config.json:
{
  "module": {
    "load_model_on_demand": true
  }
}
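To flip this setting programmatically without hand-editing JSON, a small sketch (the config path is the one from this guide; the helper is not part of BallonTranslator):

```python
import json
from pathlib import Path

def enable_on_demand(config_path: str = "config/config.json") -> dict:
    """Set module.load_model_on_demand without clobbering other settings."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("module", {})["load_model_on_demand"] = True
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config

# enable_on_demand()
```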
Or use the flag:
python launch.py --headless --exec_dirs "/manga"
Headless mode automatically enables this (from launch.py:189-191).

Low VRAM Mode

For local LLM translators (like Sakura), enable low VRAM mode:
{
  "module": {
    "translator_params": {
      "Sakura-13B-Galgame": {
        "low_vram_mode": true
      }
    }
  }
}

Troubleshooting

CUDA Not Detected

Check PyTorch installation:
import torch
print(torch.__version__)
print(torch.version.cuda)
Reinstall PyTorch:
pip uninstall torch torchvision torchaudio
python launch.py --reinstall-torch
Also make sure your GPU drivers are up to date.

ZLUDA Compilation Hangs

  • Wait at least 10 minutes on first run
  • Check CPU usage (compilation is CPU-intensive)
  • Ensure you have at least 8GB RAM
  • Close other applications

Out of Memory Errors

Reduce model size:
  • Use lama_mpe instead of lama_large_512px
  • Use mit32px instead of mit48px
Enable on-demand loading:
"load_model_on_demand": true
Process one directory at a time:
python launch.py --headless --exec_dirs "/manga/ch1"

AMD GPU Not Recognized (ZLUDA)

Check HIP SDK installation:
echo %HIP_PATH%
Should output something like:
C:\Program Files\AMD\ROCm\6.4\
Verify ZLUDA DLLs: Check that renamed DLLs are in:
BallonsTranslator\ballontrans_pylibs_win\Lib\site-packages\torch\lib\
Check environment variables:
  • Open Command Prompt
  • Run: echo %PATH%
  • Verify C:\zluda and %HIP_PATH%bin are present

Inpainting Fails with ZLUDA

Inpainting doesn’t work well with ZLUDA. Use CPU for inpainting:
  1. Settings → Module
  2. Set Inpainter device to CPU
  3. Keep Text Detector and OCR on CUDA

Benchmarks

Typical performance improvements with GPU acceleration:

Text Detection

  • CPU: ~2 seconds per page
  • CUDA (NVIDIA): ~0.5 seconds per page
  • ZLUDA (AMD): ~0.8 seconds per page
  • ROCm (AMD): ~0.6 seconds per page

OCR

  • CPU: ~1.5 seconds per page
  • CUDA (NVIDIA): ~0.4 seconds per page
  • ZLUDA (AMD): ~0.6 seconds per page
  • ROCm (AMD): ~0.5 seconds per page

Inpainting

  • CPU: ~5 seconds per page
  • CUDA (NVIDIA): ~0.8 seconds per page
  • ROCm (AMD): ~1.0 seconds per page
  • ZLUDA (AMD): Not recommended
Benchmarks vary based on GPU model, image resolution, and complexity.
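To measure your own numbers, a minimal timing pattern (a sketch; the lambda stands in for whichever module call you are profiling):

```python
import time

def seconds_per_page(run_page, pages) -> float:
    """Average wall-clock seconds per page for any per-page callable."""
    start = time.perf_counter()
    for page in pages:
        run_page(page)
    return (time.perf_counter() - start) / len(pages)

# Example with a stand-in workload:
avg = seconds_per_page(lambda page: sum(range(10_000)), range(20))
print(f"{avg:.4f} s/page")
```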

Next Steps