About the project

DATA-KBR-BE will optimise KBR’s existing ICT infrastructure to facilitate sustainable data-level access to KBR’s digitised collections for digital humanities research. For this project, research teams at Ghent University and the University of Antwerp will work closely with the collection, digitisation and ICT experts at KBR to co-design two interdisciplinary research scenarios. These scenarios will extract relevant thematic datasets from BelgicaPress (KBR’s digitised historical newspaper collection) for reuse and analysis in the field of digital humanities.

First, the newspaper archives were segmented into individual articles. Next, each article’s text was analysed with NLP tools and linked to open data. In addition, the images on each page were classified (e.g. cartoon, portrait, …) and clustered based on visual similarity. This additional metadata greatly improves the accessibility of the collection. An interactive demonstrator called NewspAIper was developed to query and visualise the collection and the extracted metadata.

IDLab role

IDLab has the following tasks within the DATA-KBR-BE project:

  1. Article segmentation and document layout analysis
  2. Named entity recognition and linking with open data
  3. Image classification and clustering based on visual similarity
  4. Development of an interactive demonstrator

Contact the IDLab researchers involved

Main researcher
ing. Dilawar Ali
Research supervisor
prof. dr. Steven Verstockt