Engineering drawing datasets:
- Bethlehem Steel Dataset (in collaboration with Lehigh University)
- BRIDGE (by Shreya Goyal, Chiranjoy Chattopadhyay) (Paper) (Dataset)
- Dataset for Handwritten Circuit Diagram Images (paper)
- SESYD (11 types of synthetic documents, with the corresponding ground-truth)
Floor plan datasets:
- CVC-FP (Database for structural floor plan analysis)
- FPLAN-POLY: dataset of vectorized graphic documents (floor plans)
- SESYD (11 types of synthetic documents, with the corresponding ground-truth)
- R-FP-500: floor plans from Rakuten Real Estate with pixel-wise wall labels (by Rakuten Institute of Technology)
Maps / cadastral datasets:
- Map Border Dataset: dataset for detection and segmentation tasks in historical cadastral maps
Music score datasets:
- List of music score datasets
- ICDAR/GREC competitions on music scores (CVC-MUSCIMA)
Comic book datasets:
- BCBID: Bangla Comic Book Image Dataset, containing a total of 3327 images of different kinds of Bengali comic books from a diverse set of renowned authors (published at ICDAR 2019).
- COMICORDA: a dataset annotated with dialogue acts and with links between balloon text and faces.
- COMICS: 1.2 million panels paired with automatic textbox transcriptions from the Golden Age collection of the Digital Comics Museum (published at CVPR 2017).
- DCM772: 772 annotated images from 27 Golden Age comic books of the Digital Comics Museum. It includes ground-truth bounding boxes of all panels and all characters (body + faces), whether small or big, human-like or animal-like (published in MDPI Journal of Imaging 2018).
- eBDtheque: a representative database of 100 comic pages, including manual annotations of 850 panels and 1092 balloons, paired with 1620 comic characters and 4693 text lines (published at ICDAR 2013).
- EmoRecCom: ICDAR 2021 Competition on Multimodal Emotion Recognition on Comics Scenes (CodaLab) (published at ICDAR 2021).
- FGC 2019: ICDAR 2019 Competition on Fine-Grained Classification of Comic Characters
- GNC: the Graphic Narrative Corpus currently contains textual metadata of about 219 titles written in English. The corresponding images are not provided due to copyright issues (published at ICDAR 2017).
- IMCDB: Indian Mythological Comic Dataset, consisting of digitized Indian comic storybooks in English (published at ICDAR 2021).
- Manga 109: 109 manga volumes from “Manga Library Z”, drawn by professional manga artists in Japan (published in Multimedia Tools and Applications, 2017).
- SSGCI 2016: ICPR 2016 Competition on Subgraph Spotting in Graph Representations of Comic Book Images