Services, Resources and Tools

CLaDA-BG provides resources and services for research purposes, as well as for a wide range of other users.

The resources and services have been developed by the partners of the infrastructure and are publicly available through dedicated APIs.

For offline access, please contact us.

Event Corpus of Bulgarian 0.1

Valency Dictionary of Bulgarian 0.1

UD BulTreeBank 2.10

More info at: https://universaldependencies.org/

Reference corpus of Bulgarian – BulTreeBank (web concordance): http://webclark.org/

Corpus of Parliamentary and Journalistic Speech (web concordance): http://political.webclark.org/

Semantic dictionary with senses and synonyms (BTB-Wordnet) – version 1: http://compling.hss.ntu.edu.sg/omw/

CLaDA-BG Corpora in NoSketch Engine

We will keep adding new corpora to the search engine, which lets you search for words in context.

Currently there are three corpora:

The dependency version of BulTreeBank.
Bulgarian news corpus, automatically annotated in UD format.
Bulgarian literature corpus, automatically annotated in UD format.

BTB-WN

The Bulgarian BulTreeBank WordNet is available for online browsing here. It can be used to explore semantic relations between words and to check spelling and meaning.

The integrated system for corpora and dictionaries

The integrated system for corpora and dictionaries provides unified access to several electronic language resources: Concordance/Words in their context (presents words or phrases in their left or right context), All about words (searches for a word simultaneously in explanatory, inflectional and etymological dictionaries, and gives grammatical information and examples), and the BulTreeBank WordNet, which combines lexical, semantic and encyclopedic information.

Online Bulgarian grammar drills

Online Bulgarian grammar drills in 11 categories (including agreement, tense, plural forms and pronouns) can be used to check and improve one’s skills in Bulgarian grammar. The exercises are suitable for pupils, students, foreigners and anyone who wants to improve their knowledge of Bulgarian.

The Meaning game

The Meaning game asks you to select the correct meaning of a word in a sentence. It has two levels of difficulty. Here you can check how well you interpret word senses in a given text.

CHILDES Bulgarian LabLing Corpus
DOI: 10.21415/PHWH-J834
The Bulgarian LabLing Corpus, the first Bulgarian children's speech corpus, has been published on the CHILDES (Child Language Data Exchange System) platform.
The Bulgarian LabLing Corpus was created by researchers at LABLING, the laboratory of applied linguistics at “Episkop Konstantin Preslavski” University, Shumen. The laboratory is a technological partner in the national CLaDA-BG project.

LABLASS – Web-based system for presenting and studying word associations
LABLASS is the first Bulgarian web-based system for studying word associations. It was designed by the team of the Laboratory of applied linguistics at Konstantin Preslavsky University of Shumen under the national project CLaDA-BG. The LABLASS web system contains data from word association collections compiled under the CLaDA-BG project, as well as data from other dictionaries belonging to the Bulgarian lexicographic tradition.

LABMETA – Web-based system for presenting and studying cognitive metaphors
LABMETA is the first Bulgarian web-based system for studying cognitive metaphors in Bulgarian political speeches. It was created by the team of the Laboratory of applied linguistics at Konstantin Preslavsky University of Shumen under the national project CLaDA-BG.

IICT-BAS has published three resources in the ELRA catalogue:

Bulgarian Treebank Corpus
ISLRN: 761-430-854-533-2
The Bulgarian Treebank Corpus consists of 156,149 tokens (11,138 sentences) from three main sources: Grammar Notebooks (1,391 sentences), News (6,698 sentences) and Other (3,049 sentences). It is available with syntactic and morphological annotation on a sentence basis in Universal Dependencies format.
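Universal Dependencies treebanks are normally distributed as CoNLL-U files, so sentence and token totals like the ones above can be checked with a few lines of code. The following is a minimal sketch only, not the procedure used to produce the published figures: the function name count_conllu and the two-sentence sample are illustrative assumptions, and the way the catalogue counts tokens may differ from the word count used here.

```python
def count_conllu(text):
    """Return (sentences, words) for a string in CoNLL-U format.

    Assumptions of this sketch: sentences are separated by blank lines,
    comment lines start with '#', and only lines whose ID is a plain
    integer are counted as words (multiword ranges such as '1-2' and
    empty nodes such as '1.1' are skipped).
    """
    sentences = 0
    words = 0
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level metadata, e.g. '# text = ...'
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            words += 1
        in_sentence = True
    if in_sentence:
        # The file ended without a trailing blank line.
        sentences += 1
    return sentences, words


def row(*fields):
    """Join the ten CoNLL-U fields of one token with tabs."""
    return "\t".join(fields)


# Hypothetical two-sentence sample, not taken from the actual corpus.
SAMPLE = "\n".join([
    "# text = Той чете.",
    row("1", "Той", "той", "PRON", "_", "_", "2", "nsubj", "_", "_"),
    row("2", "чете", "чета", "VERB", "_", "_", "0", "root", "_", "_"),
    row("3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
    "",
    "# text = Тя спи.",
    row("1", "Тя", "тя", "PRON", "_", "_", "2", "nsubj", "_", "_"),
    row("2", "спи", "спя", "VERB", "_", "_", "0", "root", "_", "_"),
    row("3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
])

if __name__ == "__main__":
    print(count_conllu(SAMPLE))
```

To apply the same function to a real treebank file, read the file as UTF-8 text and pass its contents to count_conllu.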

Bulgarian Event Corpus
ISLRN: 832-960-876-604-2
The Bulgarian Event Corpus is composed of 324,905 tokens and is suitable for training Named Entity Recognition (NER), Named Entity Linking (NEL) and Event Recognition models for Bulgarian in a multi-domain context within the Humanities. The texts are domain-related. They include documents from the area of Social Sciences and Humanities – scientific papers, archive documents, popular documents, and Wikipedia articles in the relevant areas.

Bulgarian Valency Frame Lexicon
ISLRN: 188-702-981-369-5
The Bulgarian Valency Frame Lexicon is composed of 9,547 lexical entries organized by frames, with 960 mappings to Princeton WordNet. It is available in XML format. It is a treebank-driven resource: the valency frames were extracted from BulTreeBank and then manually curated. The structure of the frames follows the BulTreeBank syntactic structure.

Parliamentary Corpora from Phase 1 of the ParlaMint project:

Corpora as Data: Multilingual comparable corpora of parliamentary debates ParlaMint 1.0 in the CLARIN.SI repository: http://hdl.handle.net/11356/1345

Corpora in Concordancers: NoSketch Engine: https://www.clarin.si/noske/index-en.html
Look for:
ParlaMint-SI 1.0 (parliament: COVID)
ParlaMint-BG 1.0 (parliament: COVID)
ParlaMint-HR 1.0 (parliament: COVID)
ParlaMint-PL 1.0 (parliament: COVID)

Kontext

List of Intransitive Verbs in Bulgarian
If you use this resource, please cite the link.

Frequency List of Bulgarian Verbs
The list is lemma-based, but it reflects all the synthetic wordforms of a verb.
If you use this resource, please cite the link.
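
One way to read "lemma-based, but reflecting all the synthetic wordforms" is that every inflected form of a verb contributes to the count of its lemma. The sketch below illustrates that idea only and is not the method used to build the published list: the function name lemma_frequencies, the (wordform, lemma, part-of-speech) input shape and the sample tokens are all hypothetical.

```python
from collections import Counter


def lemma_frequencies(tagged_tokens):
    """Count verb occurrences per lemma.

    tagged_tokens is an iterable of (wordform, lemma, upos) triples.
    Every wordform tagged VERB adds one to the count of its lemma, so
    different inflected forms of the same verb are pooled together.
    """
    counts = Counter()
    for _wordform, lemma, upos in tagged_tokens:
        if upos == "VERB":
            counts[lemma] += 1
    # most_common() returns (lemma, count) pairs, highest count first.
    return counts.most_common()


# Hypothetical sample: three forms of 'чета', one of 'спя', one noun.
SAMPLE_TOKENS = [
    ("чете", "чета", "VERB"),
    ("четох", "чета", "VERB"),
    ("четат", "чета", "VERB"),
    ("спи", "спя", "VERB"),
    ("книга", "книга", "NOUN"),
]

if __name__ == "__main__":
    for lemma, count in lemma_frequencies(SAMPLE_TOKENS):
        print(lemma, count)
```

Pooling by lemma keeps the list short and comparable across verbs, at the cost of hiding how often each individual form occurs.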