Available Detectors
ComicTextDetector (ctd)
Overview
The default text detector, trained on manga and comic images. Supports both Japanese and English text detection.
Configuration
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| detect_size | selector | 1280 | Detection resolution (896, 1024, 1152, 1280) |
| det_rearrange_max_batches | selector | 4 | Max batches for processing (1-32) |
| device | selector | auto | Processing device (cuda/cpu/mps) |
| font size multiplier | float | 1.0 | Adjust detected font sizes |
| font size max | int | -1 | Maximum font size limit (-1 to disable) |
| font size min | int | -1 | Minimum font size limit (-1 to disable) |
| mask dilate size | int | 2 | Mask dilation kernel size |
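The three font size parameters in the table can be read as a scale followed by a clamp. The sketch below illustrates that reading only. The function name `adjust_font_size` and the order of operations (multiply first, then clamp) are assumptions, not the detector's confirmed implementation.

```python
def adjust_font_size(detected: float, multiplier: float = 1.0,
                     min_size: int = -1, max_size: int = -1) -> float:
    """Scale a detected font size, then clamp it. A bound of -1 is disabled.

    Hypothetical helper: the name and the multiply-then-clamp order are
    assumptions made for illustration.
    """
    size = detected * multiplier
    if min_size != -1:
        size = max(size, min_size)   # enforce the lower limit
    if max_size != -1:
        size = min(size, max_size)   # enforce the upper limit
    return size


# With the defaults (1.0, -1, -1) the detected size passes through unchanged.
print(adjust_font_size(20))
# A multiplier of 1.5 with a maximum of 24 scales first, then caps the result.
print(adjust_font_size(20, multiplier=1.5, max_size=24))
```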
Model Files
- data/models/comictextdetector.pt (PyTorch, for CUDA)
- data/models/comictextdetector.pt.onnx (ONNX, for CPU)
Features
- Language Support: Japanese, Chinese, English
- GPU Acceleration: CUDA support with automatic fallback to ONNX on CPU
- Font Size Detection: Automatically estimates font sizes for detected text
- Configurable Resolution: Higher detection sizes improve accuracy but require more VRAM
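The CUDA-to-ONNX fallback described above amounts to choosing one of the two model files. The following is a minimal sketch of that choice, assuming the caller already knows whether CUDA is usable (for example from `torch.cuda.is_available()`). The helper `pick_ctd_model` is hypothetical, and the sketch does not model the `mps` device.

```python
PT_MODEL = "data/models/comictextdetector.pt"         # PyTorch weights, for CUDA
ONNX_MODEL = "data/models/comictextdetector.pt.onnx"  # ONNX weights, for CPU


def pick_ctd_model(cuda_available: bool) -> str:
    """Return the PyTorch checkpoint when CUDA is usable, else the ONNX file.

    Hypothetical helper for illustration; the detector's real loading code
    may differ, and 'mps' is deliberately left out of this sketch.
    """
    return PT_MODEL if cuda_available else ONNX_MODEL


print(pick_ctd_model(cuda_available=False))
```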
YSGYoloDetector (ysgyolo)
Overview
Community-trained YOLO-based detector by lhj5426. Better at filtering onomatopoeia in Japanese manga/CG.
Setup
Configuration
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model path | selector | ysgyolo_1.2_OS1.0.pt | YOLO model checkpoint path |
| merge text lines | checkbox | True | Merge detected lines into blocks |
| confidence threshold | float | 0.3 | Detection confidence threshold |
| IoU threshold | float | 0.5 | NMS IoU threshold |
| font size multiplier | float | 1.0 | Font size adjustment multiplier |
| font size max | int | -1 | Maximum font size |
| font size min | int | -1 | Minimum font size |
| detect size | int | 1024 | Input image size for detection |
| device | selector | auto | Processing device |
| source text is vertical | checkbox | True | Handle vertical text |
| mask dilate size | int | 2 | Mask dilation size |
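To show how the confidence threshold and IoU threshold interact, here is a generic greedy non-maximum suppression (NMS) sketch. It uses axis-aligned boxes for simplicity, whereas this detector also supports oriented boxes, so it is an illustration of the general technique and not the detector's own post-processing. The names `iou` and `nms` are chosen for this example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        conf_thresh: float = 0.3, iou_thresh: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest in descending score order.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.2]
print(nms(boxes, scores))
```

Raising the confidence threshold discards more weak detections before suppression. Lowering the IoU threshold merges overlapping detections more aggressively.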
Label Filtering
Control which text types to detect:
- balloon: Speech bubbles
- qipao: Chinese-style speech bubbles
- shuqing: Narrative text boxes
- changfangtiao: Rectangular text boxes
- hengxie: Horizontal slanted text
- other: Other text types
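Label filtering can be pictured as keeping only the detections whose class is enabled. The sketch below assumes a hypothetical `Detection` record and a hypothetical `filter_by_label` helper. Only the label names come from the list above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Detection:
    """Hypothetical detection record, used only for this illustration."""
    label: str
    confidence: float


def filter_by_label(detections: Iterable[Detection],
                    enabled: Set[str]) -> List[Detection]:
    """Keep only detections whose label is in the enabled set."""
    return [d for d in detections if d.label in enabled]


detections = [
    Detection("balloon", 0.92),
    Detection("other", 0.81),
    Detection("shuqing", 0.55),
]
# Keep speech bubbles and narrative boxes, drop everything else.
kept = filter_by_label(detections, {"balloon", "shuqing"})
print([d.label for d in kept])
```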
Features
- Onomatopoeia Filtering: Better at distinguishing text from sound effects
- Oriented Bounding Boxes: Supports rotated text detection
- Customizable Labels: Filter detection by text type
- Multiple Model Support: Compatible with YOLOv5 and RT-DETR models
Stariver OCR Detector (stariver_ocr)
Overview
Cloud-based detector using Stariver Cloud (团子翻译器). Requires a username and password.
Setup
- Create an account at https://cloud.stariver.org.cn/
- Configure credentials in settings:
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| User | string | - | Stariver username |
| Password | string | - | Stariver password (stored in plaintext) |
| expand_ratio | float | 0.01 | Mask expansion ratio |
| refine | checkbox | True | Refine detection results |
| filtrate | checkbox | True | Filter low-quality detections |
| disable_skip_area | checkbox | True | Disable skip area detection |
| detect_scale | int | 3 | Detection scale factor |
| merge_threshold | float | 2.0 | Text line merge threshold |
| low_accuracy_mode | checkbox | False | Use lower resolution (faster) |
| force_expand | checkbox | False | Force mask expansion |
| font_size_offset | int | 0 | Font size adjustment offset |
| font_size_min | int | -1 | Minimum font size |
| font_size_max | int | -1 | Maximum font size |
| font_size_multiplier | float | 1.0 | Font size multiplier |
Features
- Cloud-Based: No local model required
- High Accuracy: Professional OCR service
- Automatic Login: Token management handled automatically
- Combined Detection + OCR: Can extract text directly
Usage Notes
- When using Stariver Detector, set OCR to `none_ocr` to use extracted text directly
- Saves time and API request credits
- Auto-detects language and text orientation
- Returns foreground/background colors
Usage Example
Choosing a Detector
- ComicTextDetector (ctd): Best all-around choice for manga and comics
- YSGYoloDetector: Better for Japanese manga with lots of sound effects
- Stariver: Best accuracy but requires cloud service account
Performance Tips
- Resolution: Higher `detect_size` improves accuracy but uses more VRAM
- Batching: Adjust `det_rearrange_max_batches` based on available memory
- Device Selection: Use CUDA for faster processing on NVIDIA GPUs
- Mask Dilation: Increase for better coverage of text regions
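To make the effect of mask dilation concrete, the sketch below grows a binary mask with a square neighborhood in plain Python. It is a generic illustration of dilation, not the detectors' own code. The `radius` parameter is an assumption, and how "mask dilate size" maps to a kernel in the real implementation is not specified here.

```python
from typing import List

Mask = List[List[int]]  # rows of 0/1 values


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Binary dilation with a (2 * radius + 1) square neighborhood.

    Hypothetical helper: every set pixel marks all pixels within
    `radius` of it (clipped to the image bounds) in the output.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


mask = [[0] * 5 for _ in range(5)]
mask[2][2] = 1  # a single text pixel in the center
for row in dilate(mask, radius=1):
    print(row)
```

A larger radius covers more of the area around each text pixel, which helps remove stroke edges and outlines at the cost of erasing more of the surrounding artwork.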
Font Detection
YuzuMarker Font Detection (Optional)
Overview
Optional font detection module from YuzuMarker.FontDetection.
Setup
- Download model files:
  - data/models/YuzuMarker.FontDetection/font_dataset
  - data/models/YuzuMarker.FontDetection/name=4x-epoch=18-step=368676.ckpt
  - data/font_demo_cache.bin
- Enable in OCR settings
Features
- Detects font names with confidence > 60%
- Stores results in the JSON `_detected_font_name` field
- Useful for exporting to PS/ID for manual typesetting
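The confidence cutoff and the JSON field can be sketched as follows. The helper `attach_font_name`, the shape of the text block dictionary, and the `(name, confidence)` candidate format are all assumptions for illustration. Only the 60% threshold and the `_detected_font_name` field name come from the list above.

```python
import json
from typing import Dict, List, Tuple


def attach_font_name(block: Dict, candidates: List[Tuple[str, float]],
                     min_conf: float = 0.6) -> Dict:
    """Copy `block`, adding `_detected_font_name` if the best guess is confident.

    Hypothetical helper: `candidates` holds (font_name, confidence) pairs
    with confidence in the range 0..1.
    """
    result = dict(block)
    best = max(candidates, key=lambda c: c[1], default=None)
    if best is not None and best[1] > min_conf:
        result["_detected_font_name"] = best[0]
    return result


# The font names below are placeholders, not real model output.
block = attach_font_name({"text": "example"}, [("FontA", 0.91), ("FontB", 0.40)])
print(json.dumps(block))
```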
