Datasets/Software

Circuit diagram:

Floor plan:

  • BRIDGE: Building Plan Repository for Image Description Generation, and Evaluation (paper)
  • CVC-FP (Database for structural floor plan analysis)
  • FPLAN-POLY dataset of vectorized graphic documents (floorplans)
  • SESYD: Systems Evaluation SYnthetic Documents, 11 types of synthetic documents (paper)
  • R-FP-500: Floor plans from Rakuten Real Estate with pixel-wise wall labels (by Rakuten Institute of Technology)

Maps/cadastral:

  • Map Border Dataset: dataset for Detection and Segmentation tasks in Historical Cadastral Maps

Music Scores:

Comic book:

  • BCBID: Bangla Comic Book Image Dataset contains a total of 3327 images of different kinds of ‘Bengali Comic Books’ from a diverse set of renowned authors (published at ICDAR 2019).
  • CDVSR: Comics Dataset for Visual Sentiment Recognition, 10,281 comic and manga images.
  • ComSet: 54K strips, harvested from 13 popular comics available online.
  • COMICORDA: A Novel Dataset for Dialogue Act Recognition in Comics, an extension of the EmoRecCom dataset.
  • COMICS: 1.2 million panels paired with automatic textbox transcriptions from the Golden Age collection of the Digital Comics Museum (published at CVPR 2017). New OCRed text (2024) is also available.
  • Comics Datasets Framework: Mix of Comics datasets for detection benchmarking (ICDAR 2024)
  • COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts
  • DCM772: 772 annotated images from 27 Golden Age comic books of the Digital Comics Museum, with ground-truth bounding boxes for all panels and all characters (bodies and faces), small or big, human-like or animal-like (published in the MDPI Journal of Imaging, 2018).
  • eBDtheque: a representative database of 100 comic pages with manual annotations of 850 panels and 1092 balloons paired with 1620 comic characters and 4693 text lines (published at ICDAR 2013).
  • EmoRecCom: ICDAR 2021 Competition on Multimodal Emotion Recognition on Comics Scenes (codalab) (published at ICDAR 2021).
  • FGC 2019: ICDAR 2019 Competition on Fine-Grained Classification of Comic Characters
  • GNC: the Graphic Narrative Corpus currently contains textual metadata for about 219 titles written in English. The corresponding images are not provided due to copyright issues (published at ICDAR 2017).
  • iCartoonFace: a large-scale challenging dataset established for cartoon face recognition. 389,678 images of 5,013 cartoon persons collected from 1,302 cartoon albums (published at ICM 2020).
  • IMCDB: Indian Mythological Comic Dataset – digitized Indian comic storybook in the English language (published at ICDAR 2021).
  • KABOOM ONOMATOPEA: Comic Onomatopoeia Dataset for Extracting Arbitrary or Truncated Texts
  • Manga 109: 109 manga volumes from “Manga Library Z” drawn by professional manga artists in Japan (published in Multimedia Tools and Applications Journal 2017).
  • OpenMantra: evaluation dataset of 5 manga titles (JA/EN/ZH text + images) for machine translation, presented at AAAI 2021.
  • SSGCI 2016: ICPR 2016 Competition on Subgraph Spotting in Graph Representations of Comic Book Images
  • VLRC: Visual Language Research Corpus made up of ~36,000 coded panels from 300+ comics from Europe, Asia, and the United States, across time periods (1940-present), and various genres.