Circuit diagram:
- Bethlehem Steel Dataset (in collaboration with Lehigh University)
- CircuitGraphHandDrawn: handwritten circuit diagrams (paper)
Floor plan:
- BRIDGE: Building Plan Repository for Image Description Generation, and Evaluation (paper)
- CVC-FP (Database for structural floor plan analysis)
- FPLAN-POLY: dataset of vectorized graphic documents (floor plans)
- SESYD: Systems Evaluation SYnthetic Documents, 11 types of synthetic documents (paper)
- R-FP-500: floor plans from Rakuten Real Estate with pixel-wise wall labels (by Rakuten Institute of Technology)
Maps/cadastral:
- Map Border Dataset: dataset for detection and segmentation tasks in historical cadastral maps
Music Scores:
- List of Music Scores datasets
- ICDAR/GREC competitions on music scores (CVC-MUSCIMA)
Comic book:
- BCBID: Bangla Comic Book Image Dataset, a total of 3,327 images of various kinds of Bengali comic books by a diverse set of renowned authors (published at ICDAR 2019).
- CDVSR: Comics Dataset for Visual Sentiment Recognition, 10,281 images of comic and manga.
- ComSet: 54K strips harvested from 13 popular comics available online.
- COMICORDA: A Novel Dataset for Dialogue Act Recognition in Comics, an extension of the EmoRecCom dataset.
- COMICS: 1.2 million panels paired with automatic textbox transcriptions from the Golden Age collection of the Digital Comics Museum (published at CVPR 2017). A new OCR transcription of the text was released in 2024.
- Comics Datasets Framework: a mix of comics datasets for detection benchmarking (ICDAR 2024)
- COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts
- DCM772: 772 annotated images from 27 comic books in the Golden Age collection of the Digital Comics Museum. It includes ground-truth bounding boxes for all panels and all characters (body + faces), small or big, human-like or animal-like (published in the MDPI Journal of Imaging, 2018).
- eBDtheque: a representative database of 100 pages of comics, including manual annotations of 850 panels and 1,092 balloons, paired with 1,620 comic characters and 4,693 text lines (published at ICDAR 2013).
- EmoRecCom: ICDAR 2021 Competition on Multimodal Emotion Recognition on Comics Scenes (CodaLab) (published at ICDAR 2021).
- FGC 2019: ICDAR 2019 Competition on Fine-Grained Classification of Comic Characters
- GNC: the Graphic Narrative Corpus currently contains textual metadata for about 219 titles written in English. The corresponding images are not provided due to copyright issues (published at ICDAR 2017).
- iCartoonFace: a large-scale challenging dataset established for cartoon face recognition. 389,678 images of 5,013 cartoon persons collected from 1,302 cartoon albums (published at ICM 2020).
- IMCDB: Indian Mythological Comic Dataset, digitized Indian comic storybooks in English (published at ICDAR 2021).
- KABOOM ONOMATOPEA: Comic Onomatopoeia Dataset for Extracting Arbitrary or Truncated Texts
- Manga 109: 109 manga volumes from “Manga Library Z” drawn by professional manga artists in Japan (published in Multimedia Tools and Applications Journal 2017).
- OpenMantra: evaluation dataset of 5 manga titles (JA/EN/ZH text+images) for machine translation, presented at AAAI 2021.
- SSGCI 2016: ICPR 2016 Competition on Subgraph Spotting in Graph Representations of Comic Book Images
- VLRC: Visual Language Research Corpus made up of ~36,000 coded panels from 300+ comics from Europe, Asia, and the United States, across time periods (1940-present), and various genres.