The text detection module identifies text regions in manga and comic images. BallonTranslator supports multiple detection engines optimized for different languages and content types.

Available Detectors

ComicTextDetector (ctd)

Overview

The default text detector trained on manga and comic images. Supports both Japanese and English text detection.

Configuration

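import cv2
from modules.textdetector import ComicTextDetector

# Load the page to analyze (the path is a placeholder)
img = cv2.imread('manga_page.jpg')

detector = ComicTextDetector(
    detect_size=1280,             # detection resolution
    device='cuda',                # or 'cpu' / 'mps'
    det_rearrange_max_batches=4   # max batches for processing
)

# Returns a text mask and the list of detected text blocks
mask, text_blocks = detector.detect(img)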

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| detect_size | selector | 1280 | Detection resolution (896, 1024, 1152, 1280) |
| det_rearrange_max_batches | selector | 4 | Max batches for processing (1-32) |
| device | selector | auto | Processing device (cuda/cpu/mps) |
| font size multiplier | float | 1.0 | Adjust detected font sizes |
| font size max | int | -1 | Maximum font size limit (-1 to disable) |
| font size min | int | -1 | Minimum font size limit (-1 to disable) |
| mask dilate size | int | 2 | Mask dilation kernel size |
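
The three font-size parameters are easiest to read together. The sketch below shows one plausible way they could combine: scale by the multiplier, then clamp, treating -1 as "no limit". `adjust_font_size` is a hypothetical helper for illustration, not a function from BallonTranslator, and the order of operations is an assumption.

```python
def adjust_font_size(detected: float, multiplier: float = 1.0,
                     max_size: int = -1, min_size: int = -1) -> float:
    """Scale a detected font size, then clamp it (hypothetical helper).

    A limit of -1 means "disabled", matching the convention in the table.
    """
    size = detected * multiplier
    if max_size > 0:
        size = min(size, max_size)
    if min_size > 0:
        size = max(size, min_size)
    return size


print(adjust_font_size(40, multiplier=1.5))               # no limits applied
print(adjust_font_size(40, multiplier=1.5, max_size=48))  # capped at 48
print(adjust_font_size(10, min_size=14))                  # raised to 14
```

Leaving both limits at -1 keeps the scaled size untouched, which is the default behavior described in the table.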

Model Files

  • data/models/comictextdetector.pt (PyTorch - for CUDA)
  • data/models/comictextdetector.pt.onnx (ONNX - for CPU)

Features

  • Language Support: Japanese, Chinese, English
  • GPU Acceleration: CUDA support with automatic fallback to ONNX on CPU
  • Font Size Detection: Automatically estimates font sizes for detected text
  • Configurable Resolution: Higher detection sizes improve accuracy but require more VRAM

YSGYoloDetector

Overview

Community-trained YOLO-based detector by lhj5426. Better than the default detector at filtering out onomatopoeia in Japanese manga and CG.

Setup

Download the model manually from YSGYoloDetector, then place the model files in the data/models/ directory.

Configuration

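import cv2
from modules.textdetector import YSGYoloDetector

# Load the page to analyze (the path is a placeholder)
img = cv2.imread('manga_page.jpg')

detector = YSGYoloDetector(
    model_path='data/models/ysgyolo_1.2_OS1.0.pt',  # manually downloaded checkpoint
    detect_size=1024,                               # input image size for detection
    confidence_threshold=0.3,                       # discard detections below this score
    device='cuda'
)

# Returns a text mask and the list of detected text blocks
mask, text_blocks = detector.detect(img)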

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model path | selector | ysgyolo_1.2_OS1.0.pt | YOLO model checkpoint path |
| merge text lines | checkbox | True | Merge detected lines into blocks |
| confidence threshold | float | 0.3 | Detection confidence threshold |
| IoU threshold | float | 0.5 | NMS IoU threshold |
| font size multiplier | float | 1.0 | Font size adjustment multiplier |
| font size max | int | -1 | Maximum font size |
| font size min | int | -1 | Minimum font size |
| detect size | int | 1024 | Input image size for detection |
| device | selector | auto | Processing device |
| source text is vertical | checkbox | True | Handle vertical text |
| mask dilate size | int | 2 | Mask dilation size |
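
The IoU threshold controls non-maximum suppression (NMS): when two boxes overlap by more than the threshold, the lower-confidence one is dropped. As a rough illustration of how the confidence and IoU thresholds interact, here is a minimal greedy NMS for axis-aligned boxes in pure Python. It is a sketch of the general technique, not the detector's own implementation (which also supports oriented boxes); `iou` and `nms` are illustrative names.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        conf_thresh: float = 0.3, iou_thresh: float = 0.5) -> List[int]:
    """Return indices of boxes kept after confidence filtering and greedy NMS."""
    # Drop low-confidence boxes, then visit the rest from highest score down.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 200, 260, 260), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
print(nms(boxes, scores))
```

Raising the IoU threshold keeps more overlapping boxes; raising the confidence threshold discards more weak detections before NMS runs.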

Label Filtering

Control which text types to detect:
  • balloon: Speech bubbles
  • qipao: Chinese-style speech bubbles
  • shuqing: Narrative text boxes
  • changfangtiao: Rectangular text boxes
  • hengxie: Horizontal slanted text
  • other: Other text types
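
Label filtering amounts to keeping only the detections whose class label is enabled. A minimal sketch, assuming each detection is a (label, confidence) pair and the enabled labels are stored as a name-to-bool mapping; both shapes are assumptions made for illustration, not the detector's actual data structures.

```python
from typing import Dict, List, Tuple

# Enabled state per label. The names come from the list above;
# the mapping itself is hypothetical.
enabled_labels: Dict[str, bool] = {
    "balloon": True,
    "qipao": True,
    "shuqing": True,
    "changfangtiao": True,
    "hengxie": True,
    "other": False,  # e.g. skip miscellaneous text
}


def filter_by_label(detections: List[Tuple[str, float]],
                    enabled: Dict[str, bool]) -> List[Tuple[str, float]]:
    """Keep detections whose label is enabled; unknown labels are dropped."""
    return [d for d in detections if enabled.get(d[0], False)]


detections = [("balloon", 0.91), ("other", 0.75), ("shuqing", 0.42)]
print(filter_by_label(detections, enabled_labels))
```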

Features

  • Onomatopoeia Filtering: Better at distinguishing text from sound effects
  • Oriented Bounding Boxes: Supports rotated text detection
  • Customizable Labels: Filter detection by text type
  • Multiple Model Support: Compatible with YOLOv5 and RT-DETR models

Stariver Detector

Overview

Cloud-based detector using Stariver Cloud (团子翻译器). Requires an active Stariver Cloud account (username and password). Credits are consumed per request.

Setup

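  1. Create an account at https://cloud.stariver.org.cn/
  2. Configure your credentials in the settings:

# Import path assumed to match the other detectors on this page
from modules.textdetector import StariverDetector

detector = StariverDetector(
    User='your_username',
    Password='your_password'
)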

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| User | string | - | Stariver username |
| Password | string | - | Stariver password (stored in plaintext) |
| expand_ratio | float | 0.01 | Mask expansion ratio |
| refine | checkbox | True | Refine detection results |
| filtrate | checkbox | True | Filter low-quality detections |
| disable_skip_area | checkbox | True | Disable skip area detection |
| detect_scale | int | 3 | Detection scale factor |
| merge_threshold | float | 2.0 | Text line merge threshold |
| low_accuracy_mode | checkbox | False | Use lower resolution (faster) |
| force_expand | checkbox | False | Force mask expansion |
| font_size_offset | int | 0 | Font size adjustment offset |
| font_size_min | int | -1 | Minimum font size |
| font_size_max | int | -1 | Maximum font size |
| font_size_multiplier | float | 1.0 | Font size multiplier |

Features

  • Cloud-Based: No local model required
  • High Accuracy: Professional OCR service
  • Automatic Login: Token management handled automatically
  • Combined Detection + OCR: Can extract text directly

Security Note: The password is stored in plaintext. Do not use this detector on public computers.

Usage Notes

  • When using the Stariver Detector, set OCR to none_ocr to use the extracted text directly; this saves time and API request credits
  • Auto-detects language and text orientation
  • Returns foreground/background colors

Usage Example

import cv2
from modules.textdetector import TEXTDETECTORS

# Load image
img = cv2.imread('manga_page.jpg')

# Initialize detector
detector_class = TEXTDETECTORS.module_dict['ctd']
detector = detector_class()

# Detect text regions
mask, text_blocks = detector.detect(img)

# Process results
for blk in text_blocks:
    print(f"Text block at {blk.xyxy}")
    print(f"Font size: {blk.font_size}")
    print(f"Vertical: {blk.vertical}")

Choosing a Detector

  • ComicTextDetector (ctd): Best all-around choice for manga and comics
  • YSGYoloDetector: Better for Japanese manga with lots of sound effects
  • Stariver: Best accuracy but requires cloud service account

Performance Tips

  1. Resolution: Higher detect_size improves accuracy but uses more VRAM
  2. Batching: Adjust det_rearrange_max_batches based on available memory
  3. Device Selection: Use CUDA for faster processing on NVIDIA GPUs
  4. Mask Dilation: Increase for better coverage of text regions
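
To see why a larger mask dilate size covers more of each text region, here is a small pure-Python sketch of binary dilation with a square window. It illustrates the general operation only; the detectors presumably rely on an optimized image-library routine, and how the kernel size here maps onto the mask dilate size setting is an assumption.

```python
from typing import List

Mask = List[List[int]]


def dilate(mask: Mask, kernel_size: int) -> Mask:
    """Binary dilation: every set pixel also sets all pixels within
    kernel_size // 2 steps of it (a square neighbourhood)."""
    h, w = len(mask), len(mask[0])
    r = kernel_size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Spread this pixel over its neighbourhood, clipped to the image.
                for ny in range(max(0, y - r), min(h, y + r + 1)):
                    for nx in range(max(0, x - r), min(w, x + r + 1)):
                        out[ny][nx] = 1
    return out


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
for row in dilate(mask, 3):
    print(row)
```

A single text pixel grows into a 3x3 patch with a kernel of 3 and fills the whole 5x5 grid with a kernel of 5, which is why larger values give inpainting more margin around the lettering at the cost of erasing more of the artwork.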

Font Detection

Overview

Optional font detection module from YuzuMarker.FontDetection.

Setup

  1. Download the model files and place them at:
    • data/models/YuzuMarker.FontDetection/font_dataset
    • data/models/YuzuMarker.FontDetection/name=4x-epoch=18-step=368676.ckpt
    • data/font_demo_cache.bin
  2. Enable font detection in the OCR settings

Features

  • Reports a font name only when its confidence exceeds 60%
  • Stores results in the JSON field _detected_font_name
  • Useful when exporting to Photoshop (PS) or InDesign (ID) for manual typesetting
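
The confidence cutoff and the JSON field can be illustrated together. In the sketch below, the prediction format (font name mapped to probability), the example font names, and the `record_font` helper are all hypothetical; only the 60% threshold and the `_detected_font_name` field name come from the feature list above.

```python
import json
from typing import Dict

CONFIDENCE_THRESHOLD = 0.6  # "confidence > 60%" from the feature list


def record_font(block: dict, predictions: Dict[str, float]) -> dict:
    """Store the top-scoring font name on the block only if it is confident enough."""
    if predictions:
        name, prob = max(predictions.items(), key=lambda kv: kv[1])
        if prob > CONFIDENCE_THRESHOLD:
            block["_detected_font_name"] = name
    return block


block = {"text": "こんにちは"}
record_font(block, {"Noto Sans JP": 0.82, "Yu Gothic": 0.11})
print(json.dumps(block, ensure_ascii=False))

# No prediction clears the threshold, so the field is left unset.
uncertain = record_font({"text": "?"}, {"Noto Sans JP": 0.45, "Yu Gothic": 0.40})
print("_detected_font_name" in uncertain)
```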

Training Custom Detectors

For training custom text detection models, see comic-text-detector.