GPU Acceleration
BallonTranslator supports GPU acceleration for text detection, OCR, and inpainting modules. This guide covers setup for NVIDIA CUDA, AMD ROCm, and AMD ZLUDA.
Overview
GPU acceleration can significantly speed up:
- Text Detection: 3-5x faster than CPU
- OCR: 2-4x faster than CPU
- Inpainting: 5-10x faster than CPU
Translation speed depends on the translator service (online API or local LLM), not GPU acceleration.
NVIDIA CUDA Setup (Recommended)
Requirements
- NVIDIA GPU with CUDA support (GTX 900 series or newer)
- CUDA Compute Capability 3.5 or higher
- Windows or Linux (Macs with Apple Silicon use MPS instead of CUDA)
Automatic Installation
On first run, BallonTranslator automatically installs CUDA-enabled PyTorch. The launcher:
- Detects your system (Windows/Linux)
- Installs PyTorch with CUDA 11.8 support
- Installs torchvision and torchaudio
launch.py:374-377:
Manual Installation
If you prefer to install PyTorch manually:
Reinstall PyTorch
To force reinstallation of PyTorch (useful after GPU driver updates):
Verify CUDA
Check if CUDA is working:
Configure Modules for CUDA
In Settings → Module, set device to CUDA for:
- Text Detector
- OCR
- Inpainter
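Before setting these modules to CUDA, it can help to confirm that PyTorch actually sees the GPU. The following is a minimal sketch and not part of BallonTranslator: the helper name `cuda_status` is hypothetical, and it relies only on the standard PyTorch calls `torch.cuda.is_available` and `torch.cuda.get_device_name`. It falls back to default values when PyTorch is not installed.

```python
def cuda_status() -> dict:
    """Report whether PyTorch is installed and whether it can see a CUDA device.

    Hypothetical helper for illustration; not part of BallonTranslator.
    """
    status = {"torch_installed": False, "cuda_available": False, "device_name": None}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return status

    status["torch_installed"] = True
    status["cuda_available"] = bool(torch.cuda.is_available())
    if status["cuda_available"]:
        # Name of the first visible CUDA device.
        status["device_name"] = torch.cuda.get_device_name(0)
    return status


if __name__ == "__main__":
    print(cuda_status())
```

If `cuda_available` is `False` on a machine with an NVIDIA GPU, the installed PyTorch build is probably CPU-only or the GPU driver needs updating.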
AMD GPU Support
BallonTranslator supports AMD GPUs through two methods:
- ZLUDA - Works with more AMD GPUs (RDNA 2/3/4), easier setup
- Native ROCm - Official AMD solution, better performance, limited GPU support
Option 1: ZLUDA Setup (Recommended for Most Users)
What is ZLUDA?
ZLUDA is a CUDA compatibility layer that allows CUDA applications to run on AMD GPUs using ROCm.
Advantages
- Works with RDNA 2, RDNA 3, and RDNA 4 GPUs
- Easier setup than native ROCm
- Faster than CPU processing
Disadvantages
- Slower than native ROCm
- First run requires 5-10 minutes of compilation
- Requires driver warmup after updates
Requirements
- AMD GPU: RX 6000 series (RDNA 2), RX 7000 series (RDNA 3), or RX 9000 series (RDNA 4)
- AMD Adrenalin Driver 24.12.1 or newer
- AMD HIP SDK (see version table below)
- Windows 10 or 11
Version Compatibility
| Windows Version | HIP SDK Version | ZLUDA Version |
|---|---|---|
| Windows 11 | 7.1.1 | 3.9.6 |
| Windows 10/11 | 6.4.2 | 3.9.5 |
| Windows 10/11 | 6.2.4 | 3.9.5 |
| Windows 10/11 | 6.1.2 | 3.9.5 |
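The compatibility table can also be expressed as a small lookup, which may be convenient when scripting a setup check. This is a sketch only: the names `ZLUDA_COMPAT` and `zluda_version_for` are hypothetical, and the data is copied from the table above, not read from BallonTranslator.

```python
from typing import Optional

# HIP SDK version -> (ZLUDA version, Windows versions listed for that row).
# Values are copied from the compatibility table in this guide.
ZLUDA_COMPAT = {
    "7.1.1": ("3.9.6", {"11"}),
    "6.4.2": ("3.9.5", {"10", "11"}),
    "6.2.4": ("3.9.5", {"10", "11"}),
    "6.1.2": ("3.9.5", {"10", "11"}),
}


def zluda_version_for(hip_sdk: str, windows: str) -> Optional[str]:
    """Return the ZLUDA version paired with a HIP SDK version.

    Returns None when the combination does not appear in the table.
    """
    row = ZLUDA_COMPAT.get(hip_sdk)
    if row is None:
        return None
    zluda_version, windows_versions = row
    return zluda_version if windows in windows_versions else None


if __name__ == "__main__":
    print(zluda_version_for("6.4.2", "10"))
    print(zluda_version_for("7.1.1", "10"))
```

A `None` result means the pairing is not listed, for example HIP SDK 7.1.1 on Windows 10.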
Installation Steps
1. Update GPU Driver
Download and install the latest AMD Adrenalin driver (24.12.1 or newer):
2. Install HIP SDK
Download and install the appropriate HIP SDK version:
Recommended: HIP SDK 6.4.2 for best compatibility
3. Download ZLUDA
Download ZLUDA from the releases page:
Download the version matching your HIP SDK (e.g., 3.9.5 for HIP SDK 6.4.2)
4. Extract ZLUDA
Extract ZLUDA to C:\zluda:
- Open: Settings → System → About → Advanced system settings → Environment Variables
- Under “System variables”, find Path
- Click “Edit”
- Add two new entries:
  - C:\zluda
  - %HIP_PATH%bin
C:\zluda to your desktop and rename:
- Launch BallonTranslator
- Go to Settings → Module
- Set Text Detector device to CUDA
- Set OCR device to CUDA
- Keep Inpainter on CPU (ZLUDA doesn’t support all inpainting operations)
- ZLUDA will compile PTX files
- This takes 5-10 minutes depending on CPU
- Subsequent runs will be fast (compilation is cached)
ZLUDA Configuration
BallonTranslator automatically detects and configures ZLUDA. From utils/zluda_config.py:
launch.py:150-151):
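For readers who want to check from Python whether PyTorch is running on top of ZLUDA, one approach is to inspect the reported device name. This is a sketch under a stated assumption: ZLUDA builds commonly append a `[ZLUDA]` suffix to the device name, and that convention is assumed here. The helper names are hypothetical, and this is not necessarily how utils/zluda_config.py performs its detection.

```python
def is_zluda_device(device_name: str) -> bool:
    """Return True if a device name carries the "[ZLUDA]" suffix.

    Assumption: ZLUDA reports names such as "AMD Radeon RX 7900 XTX [ZLUDA]".
    """
    return device_name.strip().endswith("[ZLUDA]")


def running_on_zluda() -> bool:
    """Best-effort check of the first CUDA device visible to PyTorch."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so nothing can be running on ZLUDA.
        return False
    if not torch.cuda.is_available():
        return False
    return is_zluda_device(torch.cuda.get_device_name(0))


if __name__ == "__main__":
    print(running_on_zluda())
```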
Option 2: Native ROCm for AMD (Advanced)
Warning
Advantages
- Better performance than ZLUDA
- Inpainting works with GPU acceleration
- Official AMD support
Disadvantages
- Limited GPU support (RDNA 3/4 only)
- Requires Python 3.12
- Complex setup (need to reinstall dependencies)
- Windows only (as of 2026)
Requirements
- AMD GPU: RX 7000 series (RDNA 3) or RX 9000 series (RDNA 4)
- AMD Adrenalin Driver 2026.1.1 or newer
- HIP SDK 6.4.x
- Python 3.12
Supported GPUs
RDNA 3 (RX 7000 series):
- RX 7900 XT/XTX
- RX 7800 XT
- RX 7700 XT
- RX 7600
- PRO W7900/W7800/W7700
RDNA 4 (RX 9000 series):
- RX 9070
- RX 9060
Installation Steps
1. Install Python 3.12
Download and install Python 3.12:
2. Uninstall Old Dependencies
If you previously used Python 3.10/3.11:
launch.py:366-370:
- Launch BallonTranslator
- Go to Settings → Module
- Set Text Detector to CUDA
- Set OCR to CUDA
- Set Inpainter to CUDA (works with native ROCm!)
ROCm 7 (Optional)
For ROCm 7, you need to:
- Manually install ROCm 7 SDK libraries
- Update PyTorch to ROCm 7 version:
ROCm 7 support is experimental. ROCm 6.4 is recommended for stability.
Apple Silicon (macOS)
Apple Silicon Macs use Metal Performance Shaders (MPS) for GPU acceleration.
Requirements
- Mac with Apple Silicon (M1, M2, M3, M4)
- macOS 12.3 or later
Installation
PyTorch with MPS support is installed automatically:
Verify MPS
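One way to check MPS support is to query PyTorch's MPS backend directly. The sketch below is illustrative, not BallonTranslator code: the helper name `mps_status` is hypothetical, and it uses the standard calls `torch.backends.mps.is_built` and `torch.backends.mps.is_available`.

```python
def mps_status() -> dict:
    """Report whether this PyTorch build includes MPS and whether it is usable.

    Hypothetical helper for illustration; not part of BallonTranslator.
    """
    status = {"torch_installed": False, "mps_built": False, "mps_available": False}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return status

    status["torch_installed"] = True
    # Older PyTorch releases do not ship the MPS backend module at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None:
        status["mps_built"] = bool(mps.is_built())
        status["mps_available"] = bool(mps.is_available())
    return status


if __name__ == "__main__":
    print(mps_status())
```

`mps_built` describes the installed PyTorch binary, while `mps_available` also depends on the hardware and macOS version, so the two can differ.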
Configure for MPS
MPS is automatically used when available. No manual configuration needed.
Performance Optimization
Model Selection
Choose faster models for better GPU utilization:
| Module | Fast Model | Accurate Model |
|---|---|---|
| Text Detector | ctd | YSGDetector |
| OCR | mit32px | manga_ocr |
| Inpainter | lama_mpe | lama_large_512px |
Batch Size
For headless mode, process multiple images in parallel:
Memory Management
Enable on-demand loading to reduce VRAM usage. In config/config.json:
launch.py:189-191).
Low VRAM Mode
For local LLM translators (like Sakura), enable low VRAM mode:
Troubleshooting
CUDA Not Detected
Check PyTorch installation:
- NVIDIA: NVIDIA Drivers
- AMD: AMD Drivers
ZLUDA Compilation Hangs
- Wait at least 10 minutes on first run
- Check CPU usage (compilation is CPU-intensive)
- Ensure you have at least 8GB RAM
- Close other applications
Out of Memory Errors
Reduce model size:
- Use lama_mpe instead of lama_large_512px
- Use mit32px instead of mit48px
AMD GPU Not Recognized (ZLUDA)
Check HIP SDK installation:
- Open Command Prompt
- Run: echo %PATH%
- Verify C:\zluda and %HIP_PATH%bin are present
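The same PATH check can be scripted. The sketch below is a hypothetical helper, not part of BallonTranslator: it assumes the two entries named in this guide (C:\zluda and %HIP_PATH%bin), splits the PATH value on `;` as Windows does, and compares entries case-insensitively while ignoring trailing backslashes.

```python
import ntpath
import os


def _norm(path: str) -> str:
    """Normalize a Windows path for comparison (case, slashes, trailing separators)."""
    return ntpath.normcase(ntpath.normpath(path.strip()))


def missing_zluda_entries(path_value: str, hip_path: str, zluda_dir: str = r"C:\zluda") -> list:
    """Return the required entries that are absent from a Windows PATH string.

    hip_path is the value of the HIP_PATH variable; "bin" is appended to it
    exactly as the %HIP_PATH%bin entry in this guide does.
    """
    required = [zluda_dir, hip_path + "bin"]
    present = {_norm(entry) for entry in path_value.split(";") if entry.strip()}
    return [entry for entry in required if _norm(entry) not in present]


if __name__ == "__main__":
    # An empty list means both entries were found.
    print(missing_zluda_entries(os.environ.get("PATH", ""), os.environ.get("HIP_PATH", "")))
```

If HIP_PATH is not set, the second entry is reported as missing, which usually points to the HIP SDK not being installed.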
Inpainting Fails with ZLUDA
Inpainting doesn’t work well with ZLUDA. Use CPU for inpainting:
- Settings → Module
- Set Inpainter device to CPU
- Keep Text Detector and OCR on CUDA
Benchmarks
Typical performance improvements with GPU acceleration:
Text Detection
- CPU: ~2 seconds per page
- CUDA (NVIDIA): ~0.5 seconds per page
- ZLUDA (AMD): ~0.8 seconds per page
- ROCm (AMD): ~0.6 seconds per page
OCR
- CPU: ~1.5 seconds per page
- CUDA (NVIDIA): ~0.4 seconds per page
- ZLUDA (AMD): ~0.6 seconds per page
- ROCm (AMD): ~0.5 seconds per page
Inpainting
- CPU: ~5 seconds per page
- CUDA (NVIDIA): ~0.8 seconds per page
- ROCm (AMD): ~1.0 seconds per page
- ZLUDA (AMD): Not recommended
Benchmarks vary based on GPU model, image resolution, and complexity.
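To turn these per-page timings into speedup factors, divide the CPU time by the accelerated time. The sketch below only does that arithmetic on the figures listed above; the names `TIMINGS` and `speedups` are hypothetical.

```python
# Approximate seconds per page, copied from the benchmark lists in this guide.
TIMINGS = {
    "Text Detection": {"CPU": 2.0, "CUDA": 0.5, "ZLUDA": 0.8, "ROCm": 0.6},
    "OCR": {"CPU": 1.5, "CUDA": 0.4, "ZLUDA": 0.6, "ROCm": 0.5},
    "Inpainting": {"CPU": 5.0, "CUDA": 0.8, "ROCm": 1.0},
}


def speedups(timings: dict) -> dict:
    """Return CPU time divided by backend time for every non-CPU backend."""
    result = {}
    for module, by_backend in timings.items():
        cpu_seconds = by_backend["CPU"]
        result[module] = {
            backend: round(cpu_seconds / seconds, 2)
            for backend, seconds in by_backend.items()
            if backend != "CPU"
        }
    return result


if __name__ == "__main__":
    for module, factors in speedups(TIMINGS).items():
        print(module, factors)
```

With these figures, CUDA comes out at about 4x for text detection and about 6x for inpainting, which sits inside the ranges quoted in the overview.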
Next Steps
- Configure Settings for optimal performance
- Use Headless Mode for batch processing
- Create Custom Translators with GPU-accelerated LLMs
