Language Technologies and Digital Humanities: Resources and Applications (LTaDH-RA)
Sofia, Bulgaria
10-12 May 2023
Agiatis Benardou (DARIAH EU)
From Archives to Headsets: Digital Storytelling as Mediator of History
Since the 1970s, the commemoration and preservation of ‘difficult heritage’, a term coined by Sharon Macdonald over fifteen years ago, has become a subject of increasing public attention. In the escalation of the European historical turn to memory, we are witnessing the emergence of a new dimension: the distinction of place through reference to historical narrative, whereby historical content is legitimised through exhibitions, memorial plaques, and other modes of urban commemoration. However, despite the opportunities afforded by immersion, there has been a lack of substantive evidence to evaluate current approaches and guide future developments, especially in difficult heritage sites. Particularly in Europe, immersion has not been employed widely in such sites. This talk will discuss and expand on the affordances and challenges of designing, developing and assessing the first Virtual Reality production in Greece on Block 15, the building which served as an isolation and torture space within the Concentration Camp of Haidari, Attica, Greece during 1943 and 1944. “Block 15” aims at identifying and re-purposing archival and historical resources towards the development of an immersive VR production on the tangible and intangible heritage of the site. To that end, a series of challenges had to be addressed and overcome, ranging from the overarching methodology, the point of view and narrative backbone of the digital storytelling, and the development of historically accurate assets to the integration of findings of user experience surveys carried out for the purposes of the production.
Alessandro Lenci (Università di Pisa, Italy)
The Linguist and ChatGPT
The new generation of Large Language Models (LLMs), of which ChatGPT is the most popular representative, has stormed the fields of Artificial Intelligence (AI) and Natural Language Processing (NLP). Even if LLMs still have enormous limits, far more than appear prima facie, it is undeniable that they have changed (perhaps forever) the way of developing computational models aiming at matching humans in language understanding and generation tasks. The Linguist has always been a protagonist in this endeavor, though the Linguist's role has changed with the evolution of scientific paradigms. First, the Linguist was tasked with developing theories and “grammars” to model linguistic knowledge; then, with the advent of statistical methods, the focus shifted to the development of annotated language resources. What now, when LLMs seem to be able to develop abilities to solve tasks even without annotated training data? What role could and should the Linguist play in the era of LLMs?
Erhard Hinrichs (Leibniz Institut für Deutsche Sprache Mannheim and Tübingen University, Germany)
FAIRification of Research Data and Services and Incorporation of New Technologies in Text+
Text+ (https://www.text-plus.org/en/home) is a research data infrastructure for the humanities, social sciences, and beyond. It is developed as part of the German national research infrastructure NFDI (https://www.nfdi.de/) and focuses on three types of research data: editions, lexical resources, and collections of written, spoken, and multimodal language data. Text+ is a consortium of more than thirty German institutions, ranging from universities and academies of arts and sciences to research institutes, libraries, and archives. It is organized as a federated network of certified data and competence centers that share a common technical infrastructure. Text+ is committed to the FAIR Guiding Principles for scientific data management and stewardship (https://www.go-fair.org/fair-principles/) and the CARE Principles (https://www.gida-global.org/care) for indigenous data governance.
In this presentation, I will focus on ongoing efforts by Text+ to FAIRify its portfolio of research data and services: (i) to improve findability and interoperability of its data portfolio by resolving named entities with the help of authority files such as VIAF (https://viaf.org/); (ii) to improve accessibility and interoperability of research data by generalizing CLARIN's protocol of federated content search (CLARIN-FCS; https://www.clarin.eu/content/federated-content-search-clarin-fcs-technical-details) from collections of language data to lexical resources of various kinds; and (iii) to facilitate access and re-usability of copyrighted research data by generating derived data formats.
In conclusion, I will comment on opportunities to incorporate new technologies such as generative pre-trained transformers for academic and industrial end users of Text+.
Milena Dobreva (Sofia University St Kliment Ohridski, Bulgaria)
From Digitisation Frenzy to Datafication Frenzy: Are Data Spaces the Silver Bullet for “Real” Digital Transformation?
Reflection on transformative initiatives in the cultural heritage sector during the last two or three decades brings digitisation into focus as a defining theme. As Dr Adriana Muñoz, curator from the National Museum for World Culture (Sweden), observed, these processes started off slowly but quickly became so widely adopted that the international scale of activity is best described as a ‘digitisation frenzy’.
In the early stages, the two main drivers for digitisation were access and preservation. During the last decade, analysis emerged as a third prong, as institutions, researchers, and citizens recognised the power of tools for the exploration, mining, visualisation, and publishing of data within the cultural heritage sector. This has resulted in a ‘datafication frenzy’ that commentators recognise as a manifestation of the “datafication turn” in cultural heritage.
Initially, communities of practice enacted datafication as a process of exploring and implementing how digital collections, especially big-scale ones, could be used. Now an active international community explores how representing and interpreting collections as data enables new kinds of research and empowers open innovation opportunities for increasingly diverse user communities, with a particular focus on citizen science.
The emergence of a new ecosystem of data spaces offers a novel, and in the long term probably a more significant, driver for datafication. Widespread perceptions of the meaning of “data” and popular understandings of the concept of “space” have produced an ambiguous landscape where many believe they understand what ‘data spaces’ are, but stakeholders in the cultural heritage domain recognise that this community is still in an agenda-setting stage. The European Commission, Member States, researchers, cultural heritage institutions, professionals and citizens have, over the past two years, begun to invest in building a common European data space. The delivery and assessment of the value of cultural heritage ‘data spaces’ depend upon building a rich and shared understanding of what is meant by the term and how this development will transform the sector.
This talk explores how the data space developments are reshaping our communities’ conceptualisations of digital cultural heritage and how they will transform the cultural heritage sector and its user communities more broadly, and the steps we should take to build a data spaces research agenda.
Keywords: Digital transformation, data space, collections as data, Europeana, GLAM labs, data owners, data providers, data users