Improving OCR Accuracy for Historical Bulgarian Texts

The combined efforts of the National Library “Ivan Vazov” – Plovdiv and the Institute of Information and Communication Technologies at the Bulgarian Academy of Sciences (IICT-BAS) have led to promising advances in the optical character recognition (OCR) of historical Bulgarian texts printed before the last orthographic reform in 1945. The approach is based on the creation and application of an inflectional lexicon comprising 1 121 872 wordforms written in accordance with the rules of the Drinov-Ivanchev orthography. The results of the tests conducted at NLIB during the final phase were presented at the workshop Twin Talks 3: Understanding and Facilitating Collaboration in DH, organized by CLARIN ERIC and DARIAH ERIC.

The paper “Towards Improving OCR Accuracy with Bulgarian Language Resources” is available online: http://ceur-ws.org/Vol-2717.