Software systems and services

Automatic text annotation
Automatic text annotation integrates multiple language models, each specifically trained for a distinct linguistic task. These models work together to perform a sequence of analyses across several key areas of language processing. Initially, a segmentation model identifies sentence and word boundaries within the input text. This segmentation is followed by linguistic analysis of the sentences, carried out in parallel by several specialized models.
System description.
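
The flow described above (segmentation first, then several analyses run in parallel on each sentence) can be sketched roughly as follows. This is a minimal illustration, not the CLaDA-BG implementation: the regex-based segmenter and the two toy analyzers (`tag_capitalised`, `count_words`) are hypothetical stand-ins for the trained models.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def segment(text):
    """Stand-in segmenter: split into sentences, then into word/punctuation tokens."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [re.findall(r"\w+|[^\w\s]", s) for s in sentences]


def tag_capitalised(tokens):
    """Hypothetical analyzer: collect tokens that start with an uppercase letter."""
    return [t for t in tokens if t[:1].isupper()]


def count_words(tokens):
    """Hypothetical analyzer: count tokens that are words rather than punctuation."""
    return sum(1 for t in tokens if t[0].isalnum())


# Each entry plays the role of one specialized model (assumed names).
ANALYZERS = {"capitalised": tag_capitalised, "word_count": count_words}


def annotate(text):
    """Segment the text, then run every analyzer on each sentence concurrently."""
    results = []
    with ThreadPoolExecutor() as pool:
        for tokens in segment(text):
            futures = {name: pool.submit(fn, tokens) for name, fn in ANALYZERS.items()}
            annotation = {"tokens": tokens}
            annotation.update({name: f.result() for name, f in futures.items()})
            results.append(annotation)
    return results
```

Calling `annotate("Sofia is the capital. It is a large city!")` returns one dictionary per sentence, each holding the tokens together with the output of every analyzer. Because the analyzers depend only on the segmenter's output and not on each other, they can be dispatched independently.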

Pretrained Large Language Models
The trained language models used at CLaDA-BG for tasks such as Bulgarian language processing, vectorization, and indexing are published on the following website.

LABLASS – Web-based system for presenting and studying word associations
LABLASS is the first Bulgarian web-based system for studying word associations. It was designed by the team of the Laboratory of Applied Linguistics at Konstantin Preslavsky University of Shumen under the national project CLaDA-BG. The system contains data from word association collections compiled under the CLaDA-BG project, as well as data from other dictionaries belonging to the Bulgarian lexicographic tradition.

LABMETA – Web-based system for presenting and studying cognitive metaphors
LABMETA is the first Bulgarian web-based system for studying cognitive metaphors in Bulgarian political speeches. It was created by the team of the Laboratory of Applied Linguistics at Konstantin Preslavsky University of Shumen under the national project CLaDA-BG.

PHRASO-LAB – Dictionaries of Phraseological Units
PhrasoLAB-BG is the first web-based platform for studying Bulgarian and German phraseological units that relate to the human being as an axiological system in the Bulgarian and German phraseological picture of the world. The platform was developed within the framework of the national research project “National Interdisciplinary Research E-Infrastructure for Resources and Technologies for Bulgarian Linguistic and Cultural Heritage, Integrated within the European Infrastructures CLARIN and DARIAH” (CLaDA-BG).
PhrasoLAB-BG is an electronic resource containing Bulgarian and German phraseological units.

Bulgarian corpus of text segments
Developed by Ontotext as part of the National Interdisciplinary Research E-Infrastructure for Bulgarian Language and Cultural Heritage Resources and Technologies integrated within European CLARIN and DARIAH infrastructures (CLaDA-BG).
A text segment is a fragment of text consisting of a sequence of several adjacent linguistic elements (e.g., words or numbers). In the Bulgarian corpus of text segments, you can search for several segments, up to a maximum of 6. Several text collections are available, and you can select one or more of them in which to search for a given segment. The search results are displayed in two tables. The left table shows the size and the frequency of the selected segment, while the right one shows its left and right contexts, the publication date of the document, and the source from which the segment was extracted.
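
As a rough illustration of what such a search computes, the sketch below looks up segments in a small in-memory collection and returns, for each one, a summary (size and frequency) and a list of context rows. It is not Ontotext's implementation: the document fields `text`, `date`, and `source`, the whitespace tokenization, the `window` parameter, and the reading of the limit of 6 as a cap on the number of segments per search are all assumptions made for the example.

```python
MAX_SEGMENTS_PER_SEARCH = 6  # assumed reading of "up to a maximum of 6"


def find_segment(documents, segment, window=3):
    """Return (summary, hits) for one segment over the selected documents."""
    query = segment.split()
    if not query:
        raise ValueError("segment must contain at least one element")
    hits = []
    for doc in documents:
        tokens = doc["text"].split()
        for start in range(len(tokens) - len(query) + 1):
            end = start + len(query)
            if tokens[start:end] == query:
                hits.append({
                    # Up to `window` elements on each side of the match.
                    "left": " ".join(tokens[max(0, start - window):start]),
                    "right": " ".join(tokens[end:end + window]),
                    "date": doc["date"],
                    "source": doc["source"],
                })
    summary = {"size": len(query), "frequency": len(hits)}
    return summary, hits


def search(documents, segments, window=3):
    """Look up several segments at once, respecting the assumed limit."""
    if not 1 <= len(segments) <= MAX_SEGMENTS_PER_SEARCH:
        raise ValueError("between 1 and 6 segments can be searched at once")
    return {s: find_segment(documents, s, window) for s in segments}
```

Here the summary corresponds to the left table and the list of hits to the right table. Restricting the search to chosen collections amounts to passing only those collections' documents in `documents`.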

Semantic search in current news and Internet content
Developed by Ontotext as part of the National Interdisciplinary Research E-Infrastructure for Bulgarian Language and Cultural Heritage Resources and Technologies integrated within European CLARIN and DARIAH infrastructures (CLaDA-BG).

Key features

  • Search in contemporary news content collected from multiple Internet sources covering various topics (e.g., Economy, Health, Culture).
  • Extraction of semantically enriched content and diverse information from the CLaDA-BG Knowledge Graph.
  • Observation of trends in popular topics over specific periods of time.
  • Provision of content with a similar thematic profile.

EU Context and Financial Support