DATA-KBR-BE will optimise KBR’s existing ICT infrastructure to enable sustainable data-level access to KBR’s digitised collections for digital humanities research. For this project, research teams at Ghent University and the University of Antwerp will work closely with the collection, digitisation and ICT experts at KBR to co-design two interdisciplinary research scenarios that will extract relevant thematic datasets from BelgicaPress (KBR’s digitised historical newspaper collection) for reuse and analysis in the field of digital humanities.
First, article segmentation was performed on the newspaper archives. Next, the text of each article was analysed with NLP tools and linked to open data. In addition, the images on each page were classified (e.g. cartoon, portrait) and clustered by visual similarity. This additional metadata makes the collection considerably easier to search and explore. An interactive demonstrator called NewspAIper was developed to query and visualise the collection and the extracted metadata.
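The text does not say how the images were clustered by visual similarity. The sketch below shows one simple way the idea can work. It assumes that each image has already been turned into a numeric feature vector, for example by an image model. That step is not shown. It then groups images whose vectors have high cosine similarity, using a greedy "leader" clustering. The function names, the threshold value and the toy vectors are hypothetical illustrations, not the project's actual method or data.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def leader_cluster(features: List[Sequence[float]], threshold: float = 0.9) -> List[List[int]]:
    """Greedy 'leader' clustering (hypothetical illustration).

    Each image joins the first existing cluster whose leader (first member)
    is at least `threshold` similar to it; otherwise it starts a new cluster.
    Returns clusters as lists of image indices.
    """
    clusters: List[List[int]] = []
    for i, vec in enumerate(features):
        for cluster in clusters:
            if cosine_similarity(features[cluster[0]], vec) >= threshold:
                cluster.append(i)
                break
        else:
            # No sufficiently similar cluster found: this image becomes a new leader.
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    # Hypothetical 3-dimensional feature vectors standing in for five page images.
    toy_features = [
        [1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.95, 0.05],
        [0.0, 0.0, 1.0],
    ]
    print(leader_cluster(toy_features, threshold=0.9))  # prints [[0, 1], [2, 3], [4]]
```

Leader clustering was chosen here only because it is short and easy to follow. Its result depends on the order of the images and on the threshold. A production pipeline would more likely use an established method such as k-means or hierarchical clustering over learned image embeddings.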
IDLab has the following tasks within the DATA-KBR-BE project: