Skip to main content
The OCR module recognizes text within detected regions. BallonTranslator supports multiple OCR engines optimized for different languages and use cases.

Available OCR Engines

Overview

manga-ocr by kha-white. Optimized for Japanese manga text.
Does not extract text colors. Use MIT OCR if you need color information.

Configuration

from modules.ocr import MangaOCR

ocr = MangaOCR(device='cuda')
text_blocks = ocr.run_ocr(img, text_blocks)

Parameters

ParameterTypeDefaultDescription
deviceselectorautoProcessing device (cuda/cpu/mps)

Features

  • Japanese Focused: Excellent for Japanese manga
  • Transformer-based: Uses Vision Encoder-Decoder architecture
  • No Color Detection: Returns text only

Model Files

  • data/models/manga-ocr-base/ (multiple files from HuggingFace)

Overview

Multi-language OCR supporting 80+ languages. Highly configurable with multiple model versions.

Configuration

from modules.ocr import PaddleOCRModule

ocr = PaddleOCRModule(
    language='English',
    device='cuda',
    ocr_version='PP-OCRv4',
    use_angle_cls=False
)

Parameters

ParameterTypeDefaultDescription
languageselectorEnglishTarget language (80+ options)
deviceselectorautoProcessing device
use_angle_clscheckboxFalseEnable rotation detection
ocr_versionselectorPP-OCRv4Model version (v2-v4)
enable_mkldnncheckboxFalseCPU acceleration via MKL-DNN
det_limit_side_lenint960Max detection side length
rec_batch_numint6Recognition batch size
drop_scorefloat0.5Confidence threshold
text_caseselectorCapitalizeOutput text case
output_formatselectorAs RecognizedFormat output text

Supported Languages

Over 80 languages including:
  • Asian: Chinese (Simplified/Traditional), Japanese, Korean, Thai, Vietnamese
  • European: English, French, German, Spanish, Italian, Russian, etc.
  • Middle Eastern: Arabic, Persian, Urdu, Hebrew
  • Indic: Hindi, Tamil, Telugu, Bengali, etc.
See full list in language selector.

Text Case Options

  • Uppercase: ALL CAPS
  • Capitalize Sentences: First word capitalized
  • Lowercase: all lowercase

Output Format

  • Single Line: Concatenate all text to one line
  • As Recognized: Preserve detected line breaks

Features

  • Multi-language: 80+ languages
  • Version Selection: PP-OCRv2, v3, v4
  • Angle Classification: Detect and correct rotated text
  • Text Formatting: Built-in case conversion
  • CPU Acceleration: MKL-DNN support

Overview

PaddleOCR-VL-For-Manga - Vision-Language model for manga.
Does not extract text colors.

Features

  • Manga-Optimized: Trained specifically for manga text
  • Japanese Support: Excellent Japanese recognition
  • VL Architecture: Vision-Language model approach

Overview

Cloud-based OCR from Stariver Cloud.
Requires Stariver Cloud account. When using Stariver Detector, set OCR to none_ocr to use detector’s text extraction.

Setup

from modules.ocr import StariverOCR

ocr = StariverOCR(
    User='your_username',
    Password='your_password'
)

Features

  • Cloud-Based: No local model required
  • Combined with Detector: Best used with Stariver Detector
  • High Accuracy: Professional OCR service
For optimal performance with Stariver:
  1. Use Stariver Detector for text detection
  2. Set OCR to none_ocr
  3. Text is extracted directly from detector
  4. Saves API credits and processing time

Overview

Google Cloud Vision API for OCR. Requires API key.

Setup

ocr = GoogleVisionOCR(
    api_key='your_google_api_key'
)

Features

  • High Accuracy: Google’s production OCR
  • Multi-language: Excellent language support
  • Cloud-Based: Requires API key and internet

Overview

Microsoft Bing Lens OCR integration.

Features

  • Free: No API key required
  • Multi-language: Good language coverage
  • Cloud-Based: Requires internet connection

Overview

Use system-native OCR capabilities.

macOS OCR

from modules.ocr import MacOCR

ocr = MacOCR()

Windows OCR

from modules.ocr import WindowsOCR

ocr = WindowsOCR()

Features

  • Native: Uses OS-provided OCR
  • No Setup: Works out of the box
  • Fast: Optimized for the platform
  • Limited Languages: Depends on OS language packs

Overview

Experimental OCR using Large Language Models (GPT-4V, etc.).

Configuration

from modules.ocr import LLMOCR

ocr = LLMOCR(
    api_key='your_openai_key',
    model='gpt-4-vision-preview'
)

Features

  • Experimental: Uses vision-capable LLMs
  • Context-Aware: Can understand image context
  • Expensive: Higher API costs
  • Slower: Not optimized for speed

Overview

Universal OCR model supporting multiple languages.

Features

  • Multi-language: Broad language support
  • Single Model: One model for all languages

Overview

Skip OCR processing. Use when text is already extracted (e.g., from Stariver Detector).
from modules.ocr import register_OCR

@register_OCR('none_ocr')
class NoneOCR(OCRBase):
    def _ocr_blk_list(self, img, blk_list):
        pass  # Text already in blk.text

Usage

Set OCR to none_ocr when:
  • Using Stariver Detector (includes text extraction)
  • Processing pre-OCR’d data
  • Testing without OCR

Usage Example

import cv2
from modules.ocr import OCR
from modules.textdetector import TEXTDETECTORS

# Load image
img = cv2.imread('manga_page.jpg')

# Detect text regions
detector = TEXTDETECTORS.module_dict['ctd']()
mask, text_blocks = detector.detect(img)

# Run OCR
ocr_class = OCR.module_dict['mit48px']
ocr = ocr_class(device='cuda', chunk_size=16)
ocr.run_ocr(img, text_blocks)

# Access results
for blk in text_blocks:
    print(f"Detected text: {blk.text}")
    print(f"Foreground color: {blk.fg_color}")
    print(f"Background color: {blk.bg_color}")

Choosing an OCR Engine

Use CaseRecommended EngineNotes
Japanese mangaMIT 48px or Manga OCRBest accuracy for Japanese
English comicsMIT 48px or PaddleOCRGood general performance
Chinese mangaMIT 48px or PaddleOCRSupports both variants
Multi-languagePaddleOCR80+ languages
Need colorsMIT modelsOnly MIT extracts colors
Cloud-basedStariver or Google VisionHigh accuracy, costs credits
Offline/FreeMIT or PaddleOCRBest offline options

Performance Tips

  1. Chunk Size: Larger chunks (24-32) faster but use more VRAM
  2. Device: Always use CUDA if available
  3. Model Selection: MIT 48px offers best accuracy/speed balance
  4. Batch Processing: Process multiple pages in batches

Color Extraction

Only MIT OCR models extract text colors:
for blk in text_blocks:
    fg = blk.fg_color  # Foreground color (RGB)
    bg = blk.bg_color  # Background color (RGB)
    # Use colors for rendering translated text
Other OCR engines return text only without color information.

Advanced Configuration

PaddleOCR Custom Models

Models are stored in data/models/paddle-ocr/{lang}/{version}/:
  • det/: Detection model
  • rec/: Recognition model
  • cls/: Angle classification model (optional)

Custom OCR Integration

To add custom OCR engines:
from modules.ocr import OCRBase, register_OCR

@register_OCR('my_custom_ocr')
class CustomOCR(OCRBase):
    def _ocr_blk_list(self, img, blk_list):
        for blk in blk_list:
            x1, y1, x2, y2 = blk.xyxy
            region = img[y1:y2, x1:x2]
            blk.text = self.my_ocr_function(region)