International CLaDA-BG Conference 2023

Sofia, Bulgaria

10-12 May 2023

First day: 10. 05. 2023 (Wednesday)

Venue: "Prof. Marin Drinov" Hall at BAS-Administration (Sofia, 1040, 1 "15 November" Str.)

09:30 – 10:15 Registration

Session 01 (Chair: Galia Angelova)

This session will be streamed on YouTube.

10:15 – 10:25 Opening

10:25 – 10:45 Official ceremony presenting a Certificate of Recognition for Merits to the Bulgarian Academy of Sciences to Prof. Erhard Hinrichs

10:45 – 11:15 Coffee break

Session 02 (Chair: Kiril Simov)

11:15 – 12:15 Invited talk: Erhard Hinrichs. FAIRification of Research Data and Services and Incorporation of New Technologies in Text+

12:15 – 14:00 Lunch break

Session 03 (Chair: Petya Osenova)

14:00 – 14:25 Michele Mallia, Michela Bandini, Valeria Quochi, An Interface for Linking Ancient Languages

14:25 – 14:50 Alek Keersmaekers, Frédéric Pietowski, Toon Van Hal and Mark Depauw, The Browser-based GLAUx Treebank Infrastructure: Framework, Functionality and Future

14:50 – 15:15 Zhivko Angelov, Ivaylo Nachev, Kiril Simov, Design and Implementation of CLaDA-BG-Research System: Modeling Streets Through a Sofia Usecase. Presentation.

15:15 – 15:45 Coffee break

Session 04 (Chair: Yura Konstantinova)

15:45 – 16:10 Georgi Vasilev. Interactive Virtual Museum Development for Oculus Quest in Unity 3D

16:10 – 16:35 Zlatomira Gerdzhikova. LABedia as Digital Resource for Academic Research

Demo

16:40 – 17:45 Conference participants will demonstrate a range of tools.

Welcome Party 17:50 – 19:00

Second day: 11. 05. 2023 (Thursday)

Venue: Conference hall at Eurostars Sofia City

Session 05 (Chair: Dimitar Popov)

09:30 – 09:55 Stamatia Fotiadou, “In the beginning was the railway”: An interactive narration about the history of Alexandroupolis

09:55 – 10:20 Ivan Kratchanov, Serious Games In Support of the Promotion of Textual Cultural Heritage: State-of-the-Art Survey

10:20 – 10:45 Miglena Raykovska, Nikolay Petkov, Hristina Klecherova, Kristen Jones, Stefan Alexandrov, Combining 3D Scanning and Photogrammetry for High-resolution Artifact Documentation

10:45 – 11:15 Coffee break

Session 06 (Chair: Dimitar Illiev)

11:15 – 12:15 Invited talk: Agiatis Benardou. From Archives to Headsets: Digital Storytelling as Mediator of History

12:15 – 14:00 Lunch break

Session 07 (Chair: Petya Osenova)

14:00 – 15:00 Invited talk: Alessandro Lenci. The Linguist and ChatGPT. Presentation.

15:00 – 15:30 Coffee break

Session 08 (Chair: Kiril Simov)

15:30 – 15:55 Preslava Georgieva, Petya Osenova, Kiril Simov, The Treatment of Named Entities in the Bulgarian Event Corpus. Presentation.

15:55 – 16:20 Dimitar Popov, Velka Popova, Zhaneta Andreeva, Krasimir Kordov, Stanimir Zhelezov, LABMETA - a Web-based System for Studying Cognitive Metaphors in Bulgarian Political Speeches

16:20 – 16:45 Keith Peter Kiely, A Framework for Analysing Disinformation Narratives: Migrants and Refugees in Bulgaria

16:45 – 17:10 Cindy Rico Carmona. Automatic Abbreviation Resolution for Early Modern Latin and Spanish TEI Texts

Dinner 18:30 – details will be announced at the conference

Third day: 12. 05. 2023 (Friday)

Venue: Conference hall at Eurostars Sofia City

Session 09 (Chair: Ivan Georgiev)

09:30 – 09:55 Iva Marinova, Kiril Simov, Petya Osenova, Developing Transformer-Based Language Models for Bulgarian. Presentation.

09:55 – 10:20 Kristiyan Simeonov, Development of a Language Model and Tools for the Analysis of the Instances of Vulgar Latin in the Work of Petronius

10:20 – 10:45 Margaret Dimitrova, Maxim Goynov, Detelin Luchev, Konstantin Rangochev, Encyclopaedia Slavica Sanctorum in 2012-2022: Main Tendencies and Perspectives

10:45 – 11:15 Coffee break

Session 10 (Chair: Desislava Paneva-Marinova)

11:15 – 12:15 Invited talk: Milena Dobreva. From Digitisation Frenzy to Datafication Frenzy: Are Data Spaces the Silver Bullet for “Real” Digital Transformation? Presentation.

12:15 – 14:00 Lunch break

Session 11 (Chair: Dimitar Popov)

14:00 – 14:25 Vasiliki Kokla, Myrto Vouleli and Anthi Theodoropoulou, Combination of Spectral Imaging with Computational Techniques to Illegible Texts of Copy Letters Come from Alterations of Writing. Case Study: Outgoing Copy Letters of the National Bank of Greece 1853-1858

14:25 – 14:50 Mila Maeva, Ethnography and Digital Humanities

14:50 – 15:20 Coffee break

Session 12 (Chair: Mila Maeva)

15:20 – 15:45 Zara Kancheva, Kiril Simov, Petya Osenova, Current State of BTB-WordNet: Overview of Semantic Relations. Presentation.

15:45 – 16:10 Alexandra Nikolova, Emanuela Mitreva, Vladimir Georgiev, Personalization in Digital Libraries

16:10 – 16:35 Alexandru Colesnicov and Ludmila Malahov, Revitalizing Romanian Dialectal Phonetic Texts with Computational Technology

16:35 – 16:45 Closing

Keynote speech abstracts

Language Technologies and Digital Humanities: Resources and Applications (LTaDH-RA)

Agiatis Benardou (DARIAH EU)

From Archives to Headsets: Digital Storytelling as Mediator of History

Since the 1970s, the commemoration and preservation of ‘difficult heritage’, a term coined by Sharon Macdonald over fifteen years ago, have become a subject of increasing public attention. In the escalation of the European historical turn to memory, we are witnessing the emergence of a new dimension: the distinction of place through reference to historical narrative, whereby historical content is legitimised through exhibitions, memorial plaques, and other modes of urban commemoration. However, despite the opportunities afforded by immersion, there has been a lack of substantive evidence to evaluate current approaches and guide future developments, especially in difficult heritage sites. Particularly in Europe, immersion has not been employed widely in such sites. This talk will discuss and expand on the affordances and challenges of designing, developing and assessing the first Virtual Reality production in Greece on Block 15, the building which served as an isolation and torture space within the Concentration Camp of Haidari, Attica, Greece during 1943 and 1944. “Block 15” aims at identifying and re-purposing archival and historical resources towards the development of an immersive VR production on the tangible and intangible heritage of the site. To that end, a series of challenges had to be addressed and overcome, ranging from the overarching methodology, the point of view and narrative backbone of the digital storytelling, and the development of historically accurate assets to the integration of findings of user experience surveys carried out for the purposes of the production.

Alessandro Lenci (Università di Pisa, Italy)

The Linguist and ChatGPT

The new generation of Large Language Models (LLMs), of which ChatGPT is the most popular representative, has stormed the fields of Artificial Intelligence (AI) and Natural Language Processing (NLP). Even if LLMs still have enormous limits, far more than appear prima facie, it is undeniable that they have changed (perhaps forever) the way computational models are developed to match humans in language understanding and generation tasks. The Linguist has always been a protagonist in this endeavor, though its role has changed with the evolution of scientific paradigms. First, the Linguist was tasked with developing theories and “grammars” to model linguistic knowledge; then, with the advent of statistical methods, its focus shifted to the development of annotated language resources. What now, when LLMs seem able to develop abilities to solve tasks even without annotated training data? What role could and should the Linguist play in the era of LLMs?

Erhard Hinrichs (Leibniz Institut für Deutsche Sprache Mannheim and Tübingen University, Germany)

FAIRification of Research Data and Services and Incorporation of New Technologies in Text+

Text+ (https://www.text-plus.org/en/home) is a research data infrastructure for the humanities, social sciences, and beyond. It is developed as part of the German national research infrastructure NFDI (https://www.nfdi.de/) and focuses on three types of research data: editions, lexical resources, and collections of written, spoken, and multimodal language data. Text+ is a consortium of more than thirty German institutions, including universities, academies of arts and sciences, research institutes, libraries, and archives. It is organized as a federated network of certified data and competence centers that share a common technical infrastructure. Text+ is committed to the FAIR Guiding Principles for scientific data management and stewardship (https://www.go-fair.org/fair-principles/) and the CARE Principles (https://www.gida-global.org/care) for indigenous data governance.

In this presentation, I will focus on ongoing efforts by Text+ to FAIRify its portfolio of research data and services: (i) to improve findability and interoperability of its data portfolio by resolving named entities with the help of authority files such as VIAF (https://viaf.org/); (ii) to improve accessibility and interoperability of research data by generalizing CLARIN's protocol of federated content search (CLARIN-FCS; https://www.clarin.eu/content/federated-content-search-clarin-fcs-technical-details) from collections of language data to lexical resources of various kinds; (iii) to facilitate access and re-usability of copyrighted research data by generating derived data formats.

In conclusion, I will comment on opportunities to incorporate new technologies such as generative pre-trained transformers for academic and industrial end users of Text+.

Milena Dobreva (Sofia University St Kliment Ohridski, Bulgaria)

From Digitisation Frenzy to Datafication Frenzy: Are Data Spaces the Silver Bullet for “Real” Digital Transformation?

Reflection on transformative initiatives in cultural heritage during the last two or three decades brings into focus digitisation as a defining theme. As Dr Adriana Muñoz, curator from the National Museum for World Culture (Sweden), observed, these processes started off slowly but quickly became so widely adopted that the international scale of activity is best described as a ‘digitisation frenzy’.

In the early stages, the two main drivers for digitisation were access and preservation. During the last decade, analysis emerged as a third prong, as institutions, researchers, and citizens recognised the power of tools for the exploration, mining, visualisation, and publishing of data within the cultural heritage sector. This has resulted in a ‘datafication frenzy’ that commentators recognise as a manifestation of the “datafication turn” in cultural heritage.

Initially, communities of practice enacted datafication as a process of exploring and implementing how digital collections, especially big-scale ones, could be used. Now an active international community explores how representing and interpreting collections as data enables new kinds of research and empowers open innovation opportunities for increasingly diverse user communities, with a particular focus on citizen science.

The emergence of a new ecosystem of data spaces offers a novel, and in the long term probably a more significant, driver for datafication. Widespread perceptions of the meaning of “data” and popular understandings of the concept of “space” have produced an ambiguous landscape where many believe they understand what ‘data spaces’ are, but stakeholders in the cultural heritage domain recognise that this community is still in an agenda-setting stage. The European Commission, Member States, researchers, cultural heritage institutions, professionals and citizens have, over the past two years, begun to invest in building a common European data space. The delivery and assessment of the value of cultural heritage ‘data spaces’ depend upon building a rich and shared understanding of what is meant by the term and how this development will transform the sector.

This talk explores how the data space developments are reshaping our communities’ conceptualisations of digital cultural heritage and how they will transform the cultural heritage sector and its user communities more broadly, and the steps we should take to build a data spaces research agenda.

Keywords: Digital transformation, data space, collections as data, Europeana, GLAM labs, data owners, data providers, data users