Historical Documents and Automatic Text Recognition: Introduction
EPIsciences | Journal of Data Mining & Digital Humanities (JDMDH)
by Pinche, Ariane, Stokes, Peter, Anthony
1M ago
With this special issue of the Journal of Data Mining and Digital Humanities (JDMDH), we bring together in one single volume several experiments, projects and reflections related to automatic text recognition applied to historical documents. More and more research projects now include automatic text acquisition in their data processing chain, and this is true not only for projects focussed on Digital or Computational Humanities but increasingly also for those that are simply using existing digital tools as the means to an end. The increasing use of this technology has led to an automation of tasks ...
Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done
EPIsciences | Journal of Data Mining & Digital Humanities (JDMDH)
by Romein, C. Annemieke, Hodel, Tobias, Gordijn, Femke, Zundert, Joris J. van, Chagué, Alix, Lange, Milan van, Jensen, Helle Strandgaard, Stauder, Andy, Purcell, Jake, Terras, Melissa M., Heuvel, Pauline van den, Keijzer, Carlijn, Rabus, Achim, Sitaram, Chantal, Bhatia, Aakriti, Depuydt, Katrien, Afolabi-Adeolu, Mary Aderonke, Anikina, Anastasiia, Bastianello, Elisa, Benzinger, Lukas Vincent, Bosse, Arno, Brown, David, Charlton, Ash, Dannevig, André Nilsson, Gelder, Klaas van, Go, Sabine C.P.J., Goh, Marcus J.C., Gstrein, Silvia, Hasan, Sewa, Heide, Stefan von der, Hindermann, Maximilian, Huff, Dorothee, Huysman, Ineke, Idris, Ali, Keijzer, Liesbeth, Kemper, Simon, Koenders, Sanne, Kuijpers, Erika, Rønsig Larsen, Lisette, Lepa, Sven, Link, Tommy O., Nispen, Annelies van, Nockels, Joe, Noort, Laura M. van, Oosterhuis, Joost Johannes, Popken, Vivien, Estrella Puertollano, María, Puusaag, Joosep J., Sheta, Ahmed, Stoop, Lex, Strutzenbladh, Ebba, Sijs, Nicoline van der, Spek, Jan Paul van der, Trouw, Barry Benaissa, Van Synghel, Geertrui, Vučković, Vladimir, Wilbrink, Heleen, Weiss, Sonia, Wrisley, David Joseph, Zweistra, Riet
1M ago
This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how one can place Ground Truth data in a repository and, subsequently, inform others through HTR-United. Furthermore, we want to suggest appropriate citation methods for ATR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the ...
Toward Automatic Typography Analysis: Serif Classification and Font Similarities
EPIsciences | Journal of Data Mining & Digital Humanities (JDMDH)
by Wasim, Syed Talal, Collaud, Romain, Défayes, Lara, Henchoz, Nicolas, Slazmann, Mathieu, Ribes Lemay, Delphine
2M ago
Whether a document is of historical or contemporary significance, typography plays a crucial role in its composition. From the early days of modern printing, typographic techniques have evolved and transformed, resulting in changes to the features of typography. By analyzing these features, we can gain insights into specific time periods, geographical locations, and messages conveyed through typography. Therefore, in this paper, we aim to investigate the feasibility of training a model to classify serif types without knowledge of the font and character. We also investigate h ...
ArchEthno - a new tool for sharing research materials and a new method for archiving your own research
EPIsciences | Journal of Data Mining & Digital Humanities (JDMDH)
by Weber, Florence, Zwölf, Carlo, Maria, Trouche, Arnaud, Tricoche, Agnès, Sastre, José
3M ago
The archiving of ethnographic material is generally considered a blind spot in ethnographic working methods which place more importance on actual investigations and analysis than on how archives are constructed. A team of computer scientists and ethnographers has built an initial tool for sharing ethnographic materials, based on an SQL relational data model that suited the first survey processed but proved difficult to transpose to other surveys. The team developed a new tool based on dynamic vocabularies of concepts which breaks down archiving into three stages. Firstly ethnographers can sele ...
You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine
EPIsciences | Journal of Data Mining & Digital Humanities (JDMDH)
by Clérice, Thibault
4M ago
Layout Analysis (the identification of zones and their classification) is the first step along line segmentation in Optical Character Recognition and similar tasks. The ability of identifying main body of text from marginal text or running titles makes the difference between extracting the work full text of a digitized book and noisy outputs. We show that most segmenters focus on pixel classification and that polygonization of this output has not been used as a target for the latest competition on historical document (ICDAR 2017 and onwards), despite being the focus in the early 2010s. We prop ...
EpiSearch. Identifying Ancient Inscriptions in Epigraphic Manuscripts
EPIsciences | Journal of Data Mining & Digital Humanities (JDMDH)
by Calvelli, Lorenzo, Boschetti, Federico, Tommasi, Tatiana
4M ago
Epigraphic documents are an essential source of evidence for our knowledge of the ancient world. Nonetheless, a significant number of inscriptions have not been preserved in their material form. In fact, their texts can only be recovered thanks to handwritten materials and, in particular, the so-called epigraphic manuscripts. EpiSearch is a pilot project that explores the application of digital technologies deployed to retrieve the epigraphic evidence found in these sources. The application of Handwritten Text Recognition (HTR) to epigraphic manuscripts is a challenging task, given the ...
La reconnaissance de l'écriture pour les manuscrits documentaires du Moyen Âge
EPIsciences | Journal of Data Mining & Digital Humanities (JDMDH)
by Torres Aguilar, Sergio, Jolivet, Vincent
4M ago
Handwritten Text Recognition (HTR) techniques aim to accurately recognize sequences of characters in input manuscript images by training artificial intelligence models to capture historical writing features. Efficient HTR models can transform digitized manuscript collections into indexed and quotable corpora, providing valuable research insight for various historical inquiries. However, several challenges must be addressed, including the scarcity of relevant training corpora, the consequential variability introduced by different scribal hands and writing scripts, and the complexity of page lay ...
Preparing Big Manuscript Data for Hierarchical Clustering with Minimal HTR Training
EPIsciences | Journal of Data Mining & Digital Humanities (JDMDH)
by Perdiki, Elpida
4M ago
HTR (Handwritten Text Recognition) technologies have progressed enough to offer high-accuracy results in recognising handwritten documents, even on a synchronous level. Despite the state-of-the-art algorithms and software, historical documents (especially those written in Greek) remain a real-world challenge for researchers. A large number of unedited or under-edited works of Greek Literature (ancient or Byzantine, especially the latter) exist to this day due to the complexity of producing critical editions. To critically edit a literary text, scholars need to pinpoint text variations on sever ...
Generic HTR Models for Medieval Manuscripts. The CREMMALab Project
EPIsciences | Journal of Data Mining & Digital Humanities (JDMDH)
by Pinche, Ariane
4M ago
In the Humanities, the emergence of digital methods has opened up research questions to quantitative analysis. This is why HTR technology is increasingly involved in humanities research projects following precursors such as the Himanis project. However, many research teams have limited resources, either financially or in terms of their expertise in artificial intelligence. It may therefore be difficult to integrate handwritten text recognition into their project pipeline if they need to train a model or to create data from scratch. The goal here is not to explain how to build or improve a new ...
The Challenges of HTR Model Training: Feedback from the Project Donner le gout de l'archive a l'ere numerique
EPIsciences | Journal of Data Mining & Digital Humanities (JDMDH)
by Couture, Beatrice, Verret, Farah, Gohier, Maxime, Deslandres, Dominique
5M ago
The arrival of handwriting recognition technologies offers new possibilities for research in heritage studies. However, it is now necessary to reflect on the experiences and the practices developed by research teams. Our use of the Transkribus platform since 2018 has led us to search for the most significant ways to improve the performance of our handwritten text recognition (HTR) models which are made to transcribe French handwriting dating from the 17th century. This article therefore reports on the impacts of creating transcribing protocols, using the language model at full scale and determ ...