Text Detection

The text detection module identifies text regions in manga and comic images. BallonTranslator supports multiple detection engines optimized for different languages and content types.

Available Detectors

ComicTextDetector (ctd)

Overview

The default text detector trained on manga and comic images. Supports both Japanese and English text detection.

Configuration

from modules.textdetector import ComicTextDetector

detector = ComicTextDetector(
    detect_size=1280,
    device='cuda',
    det_rearrange_max_batches=4
)

mask, text_blocks = detector.detect(img)

Parameters

Parameter	Type	Default	Description
`detect_size`	selector	1280	Detection resolution (896, 1024, 1152, 1280)
`det_rearrange_max_batches`	selector	4	Max batches for processing (1-32)
`device`	selector	auto	Processing device (cuda/cpu/mps)
`font size multiplier`	float	1.0	Adjust detected font sizes
`font size max`	int	-1	Maximum font size limit (-1 to disable)
`font size min`	int	-1	Minimum font size limit (-1 to disable)
`mask dilate size`	int	2	Mask dilation kernel size
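
The table does not say how the three font-size settings combine. The following is a minimal sketch, assuming the multiplier is applied first and the limits afterwards, with -1 disabling a limit. The function name `adjust_font_size` is hypothetical and is not part of BallonTranslator's API.

```python
def adjust_font_size(detected, multiplier=1.0, size_min=-1, size_max=-1):
    """Scale a detected font size, then clamp it to optional limits.

    Assumption: the multiplier is applied before the limits, and a limit
    of -1 (the default in the table) disables that bound.
    """
    size = detected * multiplier
    if size_min != -1:
        size = max(size, size_min)  # enforce the lower bound
    if size_max != -1:
        size = min(size, size_max)  # enforce the upper bound
    return size


print(adjust_font_size(20, multiplier=1.5, size_max=24))
```

With the defaults from the table (multiplier 1.0, both limits -1), the detected size passes through unchanged.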

Model Files

data/models/comictextdetector.pt (PyTorch - for CUDA)
data/models/comictextdetector.pt.onnx (ONNX - for CPU)

Features

Language Support: Japanese, Chinese, English
GPU Acceleration: CUDA support with automatic fallback to ONNX on CPU
Font Size Detection: Automatically estimates font sizes for detected text
Configurable Resolution: Higher detection sizes improve accuracy but require more VRAM
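
The fallback described above can be pictured as a choice between the two files listed under Model Files. This is a minimal sketch, assuming the PyTorch checkpoint is used only when CUDA is both requested and available. The helper `select_ctd_model` and its `cuda_available` argument are hypothetical, the sketch ignores `mps`, and the real loader may decide differently.

```python
# Paths taken from the Model Files list above.
CTD_TORCH_MODEL = "data/models/comictextdetector.pt"
CTD_ONNX_MODEL = "data/models/comictextdetector.pt.onnx"


def select_ctd_model(device, cuda_available):
    """Return (model_path, effective_device) for the requested device.

    Assumption: anything other than a working CUDA setup falls back to
    the ONNX model on CPU.
    """
    if device == "cuda" and cuda_available:
        return CTD_TORCH_MODEL, "cuda"
    return CTD_ONNX_MODEL, "cpu"


print(select_ctd_model("cuda", cuda_available=False))
```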

YSGYoloDetector (ysgyolo)

Overview

Community-trained YOLO-based detector by lhj5426. Better at filtering onomatopoeia in Japanese manga/CG.

Setup

Requires a manual model download from the YSGYoloDetector project. Place the model files in the data/models/ directory.

Configuration

from modules.textdetector import YSGYoloDetector

detector = YSGYoloDetector(
    model_path='data/models/ysgyolo_1.2_OS1.0.pt',
    detect_size=1024,
    confidence_threshold=0.3,
    device='cuda'
)

mask, text_blocks = detector.detect(img)

Parameters

Parameter	Type	Default	Description
`model path`	selector	ysgyolo_1.2_OS1.0.pt	YOLO model checkpoint path
`merge text lines`	checkbox	True	Merge detected lines into blocks
`confidence threshold`	float	0.3	Detection confidence threshold
`IoU threshold`	float	0.5	NMS IoU threshold
`font size multiplier`	float	1.0	Font size adjustment multiplier
`font size max`	int	-1	Maximum font size
`font size min`	int	-1	Minimum font size
`detect size`	int	1024	Input image size for detection
`device`	selector	auto	Processing device
`source text is vertical`	checkbox	True	Handle vertical text
`mask dilate size`	int	2	Mask dilation size
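
To make the `confidence threshold` and `IoU threshold` rows concrete, here is a minimal pure-Python sketch of confidence filtering followed by greedy non-maximum suppression (NMS) on axis-aligned boxes. It illustrates the general technique only. The detector's own post-processing, which also handles oriented boxes, may differ, and the names `iou` and `nms` are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, conf_thresh=0.3, iou_thresh=0.5):
    """Filter (box, score) pairs by score, then suppress overlaps greedily."""
    # Drop low-confidence detections and visit the rest best-first.
    candidates = sorted(
        (d for d in detections if d[1] >= conf_thresh),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a kept box too much.
        if all(iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```

Raising the confidence threshold discards more weak detections. Lowering the IoU threshold suppresses overlapping boxes more aggressively.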

Label Filtering

Control which text types to detect:

balloon: Speech bubbles
qipao: Chinese-style speech bubbles
shuqing: Narrative text boxes
changfangtiao: Rectangular text boxes
hengxie: Horizontal slanted text
other: Other text types
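
The following is a minimal sketch of what label filtering amounts to, assuming each detection carries one of the label names listed above. The `detections` structure and the `filter_by_label` helper are hypothetical and exist only for illustration; in the application, labels are toggled in the detector settings.

```python
# Label names as listed above.
KNOWN_LABELS = {"balloon", "qipao", "shuqing", "changfangtiao", "hengxie", "other"}


def filter_by_label(detections, enabled):
    """Keep only detections whose label is in the enabled set."""
    unknown = set(enabled) - KNOWN_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [d for d in detections if d["label"] in enabled]


# Hypothetical detections: a label plus an (x1, y1, x2, y2) box.
detections = [
    {"label": "balloon", "box": (10, 10, 120, 80)},
    {"label": "other", "box": (200, 40, 260, 300)},
    {"label": "shuqing", "box": (0, 400, 300, 460)},
]
print(filter_by_label(detections, {"balloon", "shuqing"}))
```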

Features

Onomatopoeia Filtering: Better at distinguishing text from sound effects
Oriented Bounding Boxes: Supports rotated text detection
Customizable Labels: Filter detection by text type
Multiple Model Support: Compatible with YOLOv5 and RT-DETR models

Stariver OCR Detector (stariver_ocr)

Overview

Cloud-based detector using Stariver Cloud (团子翻译器). Requires username and password.

Requires an active Stariver Cloud account. Credits are consumed per request.

Setup

Create an account at https://cloud.stariver.org.cn/
Configure your credentials in the settings:

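# Import path assumed to mirror the other detectors on this page
from modules.textdetector import StariverDetector

detector = StariverDetector(
    User='your_username',
    Password='your_password'
)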

Configuration

Parameter	Type	Default	Description
`User`	string	-	Stariver username
`Password`	string	-	Stariver password (stored in plaintext)
`expand_ratio`	float	0.01	Mask expansion ratio
`refine`	checkbox	True	Refine detection results
`filtrate`	checkbox	True	Filter low-quality detections
`disable_skip_area`	checkbox	True	Disable skip area detection
`detect_scale`	int	3	Detection scale factor
`merge_threshold`	float	2.0	Text line merge threshold
`low_accuracy_mode`	checkbox	False	Use lower resolution (faster)
`force_expand`	checkbox	False	Force mask expansion
`font_size_offset`	int	0	Font size adjustment offset
`font_size_min`	int	-1	Minimum font size
`font_size_max`	int	-1	Maximum font size
`font_size_multiplier`	float	1.0	Font size multiplier

Features

Cloud-Based: No local model required
High Accuracy: Professional OCR service
Automatic Login: Token management handled automatically
Combined Detection + OCR: Can extract text directly

Security Note: Password is stored in plaintext. Do not use on public computers.

Usage Notes

When using the Stariver detector, set OCR to none_ocr to use the extracted text directly; this saves time and API request credits
Auto-detects language and text orientation
Returns foreground/background colors

Usage Example

import cv2
from modules.textdetector import TEXTDETECTORS

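# Load image (cv2.imread returns None if the file cannot be read)
img = cv2.imread('manga_page.jpg')
if img is None:
    raise FileNotFoundError('manga_page.jpg could not be read')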

# Initialize detector
detector_class = TEXTDETECTORS.module_dict['ctd']
detector = detector_class()

# Detect text regions
mask, text_blocks = detector.detect(img)

# Process results
for blk in text_blocks:
    print(f"Text block at {blk.xyxy}")
    print(f"Font size: {blk.font_size}")
    print(f"Vertical: {blk.vertical}")
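
As a follow-up, the returned blocks can be post-processed with plain Python. The sketch below uses a stand-in `Block` dataclass that exposes only the three attributes printed above (`xyxy`, `font_size`, `vertical`). It is not the real text block class, and `block_area` and `drop_small_blocks` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Stand-in for a detected text block (not the real class)."""
    xyxy: tuple       # (x1, y1, x2, y2) in pixels
    font_size: float
    vertical: bool


def block_area(blk):
    """Area of the block's bounding box in square pixels."""
    x1, y1, x2, y2 = blk.xyxy
    return max(0, x2 - x1) * max(0, y2 - y1)


def drop_small_blocks(blocks, min_area):
    """Discard blocks whose bounding box is smaller than min_area."""
    return [b for b in blocks if block_area(b) >= min_area]


blocks = [Block((0, 0, 100, 50), 24, True), Block((0, 0, 5, 5), 8, False)]
print(drop_small_blocks(blocks, min_area=100))
```

A filter like this can remove tiny spurious detections before OCR; a suitable `min_area` depends on the page resolution.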

Choosing a Detector

ComicTextDetector (ctd): Best all-around choice for manga and comics
YSGYoloDetector (ysgyolo): Better for Japanese manga with lots of sound effects
Stariver OCR Detector (stariver_ocr): Best accuracy, but requires a cloud service account

Performance Tips

Resolution: Higher detect_size improves accuracy but uses more VRAM
Batching: Adjust det_rearrange_max_batches based on available memory
Device Selection: Use CUDA for faster processing on NVIDIA GPUs
Mask Dilation: Increase for better coverage of text regions
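
To illustrate the mask dilation tip, here is a dependency-free sketch of binary dilation with a square kernel on a nested-list mask. It follows the common convention of anchoring the kernel at index `kernel_size // 2`. Whether the detectors apply `mask dilate size` in exactly this way is an assumption, and in practice this step is done with an image library rather than Python loops.

```python
def dilate(mask, kernel_size):
    """Binary dilation of a 2D 0/1 mask with a square kernel."""
    h, w = len(mask), len(mask[0])
    anchor = kernel_size // 2
    offsets = range(-anchor, kernel_size - anchor)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel is set if any in-bounds pixel under the kernel is set.
            out[y][x] = int(any(
                mask[y + dy][x + dx]
                for dy in offsets
                for dx in offsets
                if 0 <= y + dy < h and 0 <= x + dx < w
            ))
    return out


mask = [[0] * 5 for _ in range(5)]
mask[2][2] = 1
for row in dilate(mask, 3):
    print(row)
```

A larger kernel grows every text region outward, so the mask covers more of the text region at the cost of including more of the surrounding artwork.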

Font Detection

YuzuMarker Font Detection (Optional)

Overview

Optional font detection module from YuzuMarker.FontDetection.

Setup

Download model files:
- data/models/YuzuMarker.FontDetection/font_dataset
- data/models/YuzuMarker.FontDetection/name=4x-epoch=18-step=368676.ckpt
- data/font_demo_cache.bin
Enable in OCR settings

Features

Detects font names with confidence > 60%
Stores results in JSON _detected_font_name field
Useful for exporting to PS/ID for manual typesetting
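
The following is a minimal sketch for collecting detected font names from a saved project file before exporting. It assumes only that the project JSON contains `_detected_font_name` keys somewhere in its structure and searches recursively, since the exact layout of the file is not described here. `collect_font_names` is a hypothetical helper, and the sample data is made up.

```python
import json


def collect_font_names(node, key="_detected_font_name"):
    """Recursively gather non-empty string values stored under `key`."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str) and v:
                found.append(v)
            else:
                found.extend(collect_font_names(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_font_names(item, key))
    return found


# Made-up project layout for illustration; the real layout may differ.
sample = json.loads(
    '{"pages": {"001.jpg": [{"_detected_font_name": "ExampleFont"}, {"text": "..."}]}}'
)
print(collect_font_names(sample))
```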

Training Custom Detectors

For training custom text detection models, see comic-text-detector.
