OCR (Optical Character Recognition)

The OCR module recognizes text within detected regions. BallonTranslator supports multiple OCR engines optimized for different languages and use cases.

Available OCR Engines

MIT OCR Models (Recommended)

Overview

OCR models from the manga-image-translator project. They support Japanese, English, and Chinese, and extract text colors automatically.

Available Variants

MIT 32px OCR

from modules.ocr import OCRMIT32px

ocr = OCRMIT32px(
    chunk_size=16,
    device='cuda'
)

text_blocks = ocr.run_ocr(img, text_blocks)

MIT 48px OCR (Better Accuracy)

from modules.ocr import OCRMIT48px

ocr = OCRMIT48px(
    chunk_size=16,
    device='cuda'
)

MIT 48px CTC OCR

from modules.ocr import OCRMIT48pxCTC

ocr = OCRMIT48pxCTC(
    chunk_size=16,
    device='cuda'
)

Parameters

Parameter	Type	Default	Description
`chunk_size`	selector	16	Batch size for processing (8, 16, 24, 32)
`device`	selector	auto	Processing device (cuda/cpu/mps)
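
The `auto` default for `device` implies that a backend is chosen at startup. How BallonTranslator resolves it is not documented here, so the following is only a minimal sketch of one way to do it, assuming PyTorch is the backend. The helper name `pick_device` is hypothetical and not part of the project's API.

```python
def pick_device(preferred: str = "auto") -> str:
    """Return 'cuda', 'mps', or 'cpu'. Hypothetical helper for illustration."""
    # An explicit choice is passed through unchanged.
    if preferred != "auto":
        return preferred
    try:
        import torch  # assumption: the OCR backend runs on PyTorch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is only present in newer PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()  # one of 'cuda', 'mps', 'cpu'
```

The returned string can then be passed as the `device` argument shown in the examples above.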

Features

Multi-language: Japanese, English, Chinese
Color Extraction: Automatically extracts foreground/background colors
GPU Accelerated: Fast processing with CUDA support
Line-based: Processes text line by line

Model Files

MIT 32px: data/models/mit32px_ocr.ckpt
MIT 48px: data/models/ocr_ar_48px.ckpt + data/alphabet-all-v7.txt
MIT 48px CTC: data/models/mit48pxctc_ocr.ckpt + data/alphabet-all-v5.txt

Manga OCR

Overview

manga-ocr by kha-white. Optimized for Japanese manga text.

Does not extract text colors. Use MIT OCR if you need color information.

Configuration

from modules.ocr import MangaOCR

ocr = MangaOCR(device='cuda')
text_blocks = ocr.run_ocr(img, text_blocks)

Parameters

Parameter	Type	Default	Description
`device`	selector	auto	Processing device (cuda/cpu/mps)

Features

Japanese Focused: Excellent for Japanese manga
Transformer-based: Uses Vision Encoder-Decoder architecture
No Color Detection: Returns text only

Model Files

data/models/manga-ocr-base/ (multiple files from HuggingFace)

PaddleOCR

Overview

Multi-language OCR supporting 80+ languages. Highly configurable with multiple model versions.

Configuration

from modules.ocr import PaddleOCRModule

ocr = PaddleOCRModule(
    language='English',
    device='cuda',
    ocr_version='PP-OCRv4',
    use_angle_cls=False
)

Parameters

Parameter	Type	Default	Description
`language`	selector	English	Target language (80+ options)
`device`	selector	auto	Processing device
`use_angle_cls`	checkbox	False	Enable rotation detection
`ocr_version`	selector	PP-OCRv4	Model version (v2-v4)
`enable_mkldnn`	checkbox	False	CPU acceleration via MKL-DNN
`det_limit_side_len`	int	960	Max detection side length
`rec_batch_num`	int	6	Recognition batch size
`drop_score`	float	0.5	Confidence threshold
`text_case`	selector	Capitalize	Output text case
`output_format`	selector	As Recognized	Format output text
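
To illustrate what `drop_score` does, here is a minimal sketch of confidence filtering. It assumes recognition results arrive as `(text, score)` pairs and that a score equal to the threshold is kept. The function name `filter_by_score` is hypothetical and only mirrors the parameter described in the table.

```python
from typing import List, Tuple


def filter_by_score(results: List[Tuple[str, float]],
                    drop_score: float = 0.5) -> List[str]:
    """Keep only the texts whose confidence is at least drop_score."""
    return [text for text, score in results if score >= drop_score]


# Example input: (text, confidence) pairs, made up for illustration.
lines = filter_by_score([("Hello", 0.93), ("w0r1d", 0.31), ("there", 0.5)])
```

Raising `drop_score` trades recall for precision: fewer garbled lines survive, but faint or stylized text is more likely to be discarded.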

Supported Languages

Over 80 languages including:

Asian: Chinese (Simplified/Traditional), Japanese, Korean, Thai, Vietnamese
European: English, French, German, Spanish, Italian, Russian, etc.
Middle Eastern: Arabic, Persian, Urdu, Hebrew
Indic: Hindi, Tamil, Telugu, Bengali, etc.

See the full list in the language selector.

Text Case Options

Uppercase: ALL CAPS
Capitalize Sentences: First word of each sentence capitalized
Lowercase: all lowercase

Output Format

Single Line: Concatenate all text to one line
As Recognized: Preserve detected line breaks
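
The text case and output format options can be pictured as a small post-processing step applied to the recognized lines. The sketch below is an assumption about the behavior described above, not the module's actual code. `format_text` is a hypothetical helper, and it assumes "Single Line" joins lines with a space.

```python
import re


def format_text(lines, text_case="Capitalize Sentences",
                output_format="As Recognized"):
    """Join recognized lines and apply a case option (illustrative only)."""
    # "Single Line" joins with a space; "As Recognized" keeps line breaks.
    sep = " " if output_format == "Single Line" else "\n"
    text = sep.join(lines)
    if text_case == "Uppercase":
        return text.upper()
    if text_case == "Lowercase":
        return text.lower()
    if text_case == "Capitalize Sentences":
        # Lowercase everything, then uppercase the first letter of the text
        # and the first letter after sentence-ending punctuation.
        text = text.lower()
        return re.sub(r"(^|[.!?]\s+)([a-z])",
                      lambda m: m.group(1) + m.group(2).upper(), text)
    return text


single = format_text(["HELLO THERE.", "HOW ARE YOU?"],
                     "Capitalize Sentences", "Single Line")
```

This simple regex only handles ASCII letters, so it is a fit for English output and would need adjusting for other scripts.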

Features

Multi-language: 80+ languages
Version Selection: PP-OCRv2, v3, v4
Angle Classification: Detect and correct rotated text
Text Formatting: Built-in case conversion
CPU Acceleration: MKL-DNN support

PaddleOCR VL Manga

Overview

PaddleOCR-VL-For-Manga - Vision-Language model for manga.

Does not extract text colors.

Features

Manga-Optimized: Trained specifically for manga text
Japanese Support: Excellent Japanese recognition
VL Architecture: Vision-Language model approach

Stariver OCR

Overview

Cloud-based OCR from Stariver Cloud.

Requires a Stariver Cloud account. When using Stariver Detector, set OCR to none_ocr to use the detector’s text extraction.

Setup

from modules.ocr import StariverOCR

ocr = StariverOCR(
    User='your_username',
    Password='your_password'
)

Features

Cloud-Based: No local model required
Combined with Detector: Best used with Stariver Detector
High Accuracy: Professional OCR service

Recommended Usage

For optimal performance with Stariver:

Use Stariver Detector for text detection
Set OCR to none_ocr
Text is extracted directly from the detector
This saves API credits and processing time

Google Vision OCR

Overview

Google Cloud Vision API for OCR. Requires an API key.

Setup

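# Assumes GoogleVisionOCR is exported from modules.ocr like the other engines
from modules.ocr import GoogleVisionOCR

ocr = GoogleVisionOCR(
    api_key='your_google_api_key'
)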

Features

High Accuracy: Google’s production OCR
Multi-language: Excellent language support
Cloud-Based: Requires API key and internet

Bing Lens OCR

Overview

Microsoft Bing Lens OCR integration.

Features

Free: No API key required
Multi-language: Good language coverage
Cloud-Based: Requires internet connection

macOS/Windows Built-in OCR

Overview

Use system-native OCR capabilities.

macOS OCR

from modules.ocr import MacOCR

ocr = MacOCR()

Windows OCR

from modules.ocr import WindowsOCR

ocr = WindowsOCR()

Features

Native: Uses OS-provided OCR
No Setup: Works out of the box
Fast: Optimized for the platform
Limited Languages: Depends on OS language packs

LLM-based OCR

Overview

Experimental OCR using Large Language Models (GPT-4V, etc.).

Configuration

from modules.ocr import LLMOCR

ocr = LLMOCR(
    api_key='your_openai_key',
    model='gpt-4-vision-preview'
)

Features

Experimental: Uses vision-capable LLMs
Context-Aware: Can understand image context
Expensive: Higher API costs
Slower: Not optimized for speed

OneOCR

Overview

Universal OCR model supporting multiple languages.

Features

Multi-language: Broad language support
Single Model: One model for all languages

None OCR

Overview

Skip OCR processing. Use when text is already extracted (e.g., from Stariver Detector).

from modules.ocr import OCRBase, register_OCR

@register_OCR('none_ocr')
class NoneOCR(OCRBase):
    def _ocr_blk_list(self, img, blk_list):
        # Intentionally a no-op: the text is already in blk.text
        pass

Usage

Set OCR to none_ocr when:

Using Stariver Detector (includes text extraction)
Processing pre-OCR’d data
Testing without OCR
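
The `@register_OCR('none_ocr')` decorator and the `OCR.module_dict[...]` lookup used on this page follow a common registry pattern. The project's real implementation is not shown here, so the sketch below is a self-contained stand-in. The `Registry` class and its internals are assumptions, and the `OCR`, `register_OCR`, and `NoneOCR` names are local to this example.

```python
class Registry:
    """Minimal name -> class registry (illustrative stand-in)."""

    def __init__(self):
        self.module_dict = {}

    def register(self, name):
        def decorator(cls):
            if name in self.module_dict:
                raise KeyError(f"{name!r} is already registered")
            self.module_dict[name] = cls
            return cls  # return the class unchanged so it stays usable
        return decorator


# Local stand-ins for the names used in this page.
OCR = Registry()
register_OCR = OCR.register


@register_OCR("none_ocr")
class NoneOCR:
    def run_ocr(self, img, blk_list):
        # No recognition: blocks are returned exactly as received.
        return blk_list


engine = OCR.module_dict["none_ocr"]()
```

Looking engines up by string key is what lets a setting such as none_ocr select a class without the caller importing it directly.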

Usage Example

import cv2
from modules.ocr import OCR
from modules.textdetector import TEXTDETECTORS

# Load image
img = cv2.imread('manga_page.jpg')

# Detect text regions
detector = TEXTDETECTORS.module_dict['ctd']()
mask, text_blocks = detector.detect(img)

# Run OCR
ocr_class = OCR.module_dict['mit48px']
ocr = ocr_class(device='cuda', chunk_size=16)
ocr.run_ocr(img, text_blocks)

# Access results
for blk in text_blocks:
    print(f"Detected text: {blk.text}")
    print(f"Foreground color: {blk.fg_color}")
    print(f"Background color: {blk.bg_color}")

Choosing an OCR Engine

Use Case	Recommended Engine	Notes
Japanese manga	MIT 48px or Manga OCR	Best accuracy for Japanese
English comics	MIT 48px or PaddleOCR	Good general performance
Chinese manga	MIT 48px or PaddleOCR	Supports both variants
Multi-language	PaddleOCR	80+ languages
Need colors	MIT models	Only MIT extracts colors
Cloud-based	Stariver or Google Vision	High accuracy, costs credits
Offline/Free	MIT or PaddleOCR	Best offline options

Performance Tips

Chunk Size: Larger chunks (24-32) are faster but use more VRAM
Device: Use CUDA whenever it is available
Model Selection: MIT 48px offers the best accuracy/speed balance
Batch Processing: Process multiple pages in batches
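
To make the chunk size trade-off concrete, the sketch below splits a list of text-line crops into batches of `chunk_size`. It is an assumption about how chunked processing works in general, not the MIT OCR code. `chunked` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], chunk_size: int = 16) -> Iterator[List[T]]:
    """Yield consecutive batches of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


# 40 placeholder "line crops" split with the default chunk size.
batches = list(chunked(list(range(40)), chunk_size=16))
sizes = [len(batch) for batch in batches]
```

With 40 lines, a chunk size of 16 needs three forward passes while 32 needs two, at the cost of holding twice as many crops in VRAM at once.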

Color Extraction

Only MIT OCR models extract text colors:

for blk in text_blocks:
    fg = blk.fg_color  # Foreground color (RGB)
    bg = blk.bg_color  # Background color (RGB)
    # Use colors for rendering translated text

Other OCR engines return text only without color information.
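
When rendering translated text, it can help to normalize the extracted colors and fall back to a default for engines that provide none. The sketch below assumes a color is a 3-item RGB sequence and that a missing color shows up as `None`. Both are assumptions, and `color_to_hex` is a hypothetical helper.

```python
def color_to_hex(color, default=(0, 0, 0)):
    """Convert an RGB triple to '#rrggbb', using default if none is given."""
    if color is None or len(color) != 3:
        color = default
    # Round and clamp each channel into the 0-255 range.
    r, g, b = (max(0, min(255, int(round(c)))) for c in color)
    return f"#{r:02x}{g:02x}{b:02x}"


fg_hex = color_to_hex((255, 0, 128))
fallback_hex = color_to_hex(None)
```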

Advanced Configuration

PaddleOCR Custom Models

Models are stored in data/models/paddle-ocr/{lang}/{version}/:

det/: Detection model
rec/: Recognition model
cls/: Angle classification model (optional)
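
The directory layout above can be expressed as a small path helper. This is only a sketch: `paddle_model_dirs` is hypothetical, and the `lang` and `version` values used in the example (`en`, `PP-OCRv4`) are assumed directory names, since the exact naming is not specified here.

```python
from pathlib import PurePosixPath


def paddle_model_dirs(lang: str, version: str,
                      root: str = "data/models/paddle-ocr") -> dict:
    """Return the det/rec/cls directories for one language and version."""
    base = PurePosixPath(root) / lang / version
    return {
        "det": base / "det",  # detection model
        "rec": base / "rec",  # recognition model
        "cls": base / "cls",  # angle classification model (optional)
    }


dirs = paddle_model_dirs("en", "PP-OCRv4")  # example values are assumptions
```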

Custom OCR Integration

To add custom OCR engines:

from modules.ocr import OCRBase, register_OCR

@register_OCR('my_custom_ocr')
class CustomOCR(OCRBase):
    def _ocr_blk_list(self, img, blk_list):
        for blk in blk_list:
            # Crop the block's bounding box out of the page image
            x1, y1, x2, y2 = blk.xyxy
            region = img[y1:y2, x1:x2]
            # my_ocr_function is a placeholder for your own recognition routine
            blk.text = self.my_ocr_function(region)
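
One detail worth handling in a custom engine is a bounding box that extends past the image edge. With NumPy arrays, a negative start index in a slice counts from the end of the axis, so an unclamped crop can silently return the wrong region. The sketch below shows one way to clamp coordinates before cropping. `clamp_xyxy` and `is_empty` are hypothetical helpers, not part of `OCRBase`.

```python
def clamp_xyxy(xyxy, width, height):
    """Clamp an (x1, y1, x2, y2) box to the image bounds."""
    x1, y1, x2, y2 = (int(v) for v in xyxy)
    x1 = max(0, min(x1, width))
    x2 = max(0, min(x2, width))
    y1 = max(0, min(y1, height))
    y2 = max(0, min(y2, height))
    return x1, y1, x2, y2


def is_empty(box):
    """True if the box has no area and should be skipped."""
    x1, y1, x2, y2 = box
    return x2 <= x1 or y2 <= y1


# A box that overshoots a 100x80 image on the left, right, and bottom.
box = clamp_xyxy((-5, 10, 120, 90), width=100, height=80)
```

Inside `_ocr_blk_list`, the width and height would come from `img.shape[1]` and `img.shape[0]`, and blocks for which `is_empty` returns true can be skipped.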
