Journal article
SynJAC: synthetic-data-driven joint-granular adaptation and calibration for domain specific scanned document key information extraction
Y Ding, SC Han, Z Li, H Chung
Information Fusion | Elsevier BV | Published : 2026
Abstract
Visually Rich Documents (VRDs), comprising elements such as charts, tables, and paragraphs, convey complex information across diverse domains. However, extracting key information from these documents remains labour-intensive, particularly for scanned formats with inconsistent layouts and domain-specific requirements. Despite advances in pretrained models for VRD understanding, their dependence on large annotated datasets for fine-tuning hinders scalability. This paper proposes SynJAC (Synthetic-data-driven Joint-granular Adaptation and Calibration), a method for key information extraction in scanned documents. SynJAC leverages synthetic, machine-generated data for domain adaptation and emplo..
Grants
Awarded by Ministry of Trade, Industry and Energy