Enhancing Grammar for Bulgarian Verb Forms: A Flexible Approach
Performance of the Grammar
The grammar performs well in recognizing complex tense forms built from auxiliaries and full-content verbs. However, building a comprehensive grammar for compound verb forms requires learning from rare syntagmatic patterns and using them to enrich the paradigmatic knowledge.
Handling Discontinuity
Discontinuous compound verb forms with adverbial and nominal inserts pose a challenge for the current grammar. One way to address this is to apply rules that identify shorter segments within the verb complex (auxiliary chunks and main verb chunks). This approach, which moves from syntagmatic realization to paradigmatic knowledge, is consistent with methodologies applied to other languages. Further refinement also requires exploring discontinuity in relation to main verb forms, such as passive participles separated by adverbial inserts.
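As a rough illustration of how a shorter-segment rule could bridge adverbial inserts, the sketch below runs a regular expression over a sentence's tag sequence. It rests on several assumptions. The tag set (AUX, ADV, PPTC, N) is simplified and hypothetical, not the grammar's own tagset. Python's `re` module stands in for the grammar's rule formalism. The transliterated example sentence is for illustration only.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, tag)

# Hypothetical simplified tags, NOT the grammar's tagset:
# AUX = auxiliary, ADV = adverb, PPTC = passive participle, N = noun.
# Rule: an auxiliary, up to two adverbial inserts, then a passive participle.
# (?<!\S) and (?!\S) keep matches aligned to whole tags.
DISCONTINUOUS_PASSIVE = re.compile(r"(?<!\S)AUX(?: ADV){0,2} PPTC(?!\S)")

def find_passive_complexes(tokens: List[Token]) -> List[List[str]]:
    """Return the word forms of each auxiliary + participle complex,
    including any adverbs inserted between the two parts."""
    tags = " ".join(tag for _, tag in tokens)
    results = []
    for m in DISCONTINUOUS_PASSIVE.finditer(tags):
        start = tags[:m.start()].count(" ")  # number of tokens before the match
        length = m.group().count(" ") + 1     # number of tokens inside the match
        results.append([word for word, _ in tokens[start:start + length]])
    return results

# "Knigata beshe barzo prochetena" ("The book was quickly read"), transliterated.
sentence = [("Knigata", "N"), ("beshe", "AUX"), ("barzo", "ADV"), ("prochetena", "PPTC")]
print(find_passive_complexes(sentence))  # [['beshe', 'barzo', 'prochetena']]
```

The rule admits only ADV between the two parts, so a nominal insert (AUX N PPTC) is not matched. Covering that case would take an additional rule, which reflects the difficulty with nominal inserts noted above.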
Evaluating the Application of Regular Grammars for Bulgarian Verb Forms
Some Preliminary Evaluation of the Grammar Application
Applying Recognizer 2 correctly delimits and marks up most of the longest compound verb form patterns. Experiments with a newspaper corpus show that, in communicatively unmarked written prose, the chunks identified by Recognizer 1 are often adjacent within these longest patterns. This conclusion is supported by the paradigmatic representation of word order within the verb complex, which takes into account the “communicative organization of Bulgarian sentences” (Avgustinova 1997).
The following figures illustrate the application of the regular grammars:
In a 4292-word text, Recognizer 1 identifies 536 occurrences of main verbs with or without small words, 164 occurrences of auxiliary verbs with or without small words, …
Building a Two-Level Grammar
Cascade for Recognizing Bulgarian Verb Forms
In the MV rule explained below, the element is the context node. XPath expressions search the local tree structure for the PCDATA strings of the ph element (the running word string) and the ta element (the correct morphosyntactic tag), so that input words can be matched against the regular expression. In the rule, each input word is an ordered pair of ph content and ta content, and irrelevant content is marked with the # symbol, as in <"da","#"> or <"#","Pp@d@@@t">. The grammar also specifies the elements of the XML document to which it is applied (e.g., paragraph, head, and highlighted elements).
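To make the idea of input words as ordered <ph, ta> pairs with a # wildcard more concrete, here is a minimal sketch. It serialises each word as `<ph|ta>` and compiles every pair pattern into a regular expression fragment. The serialisation format, the helper names (`word`, `encode`, `match_rule`), the tags PRON and V, and the example rule are all hypothetical. They are not the grammar's actual encoding or tagset. The sketch also assumes that ph and ta strings never contain `|`, `<` or `>`.

```python
import re
from typing import List, Tuple

Word = Tuple[str, str]  # (ph content, ta content)
ANY = "#"               # marks irrelevant content, as in <"da","#">
FIELD = r"[^|<>]*"      # assumes ph/ta strings never contain '|', '<' or '>'

def word(ph: str, ta: str) -> str:
    """Regex fragment matching one input word given as the pair <ph, ta>."""
    ph_re = FIELD if ph == ANY else re.escape(ph)
    ta_re = FIELD if ta == ANY else re.escape(ta)
    return rf"<{ph_re}\|{ta_re}>"

def encode(words: List[Word]) -> str:
    """Serialise the sentence as <ph|ta><ph|ta>... so one regex can scan it."""
    return "".join(f"<{ph}|{ta}>" for ph, ta in words)

def match_rule(rule: str, words: List[Word]) -> List[List[str]]:
    """Return the ph strings of every non-overlapping match of the rule."""
    return [re.findall(r"<([^|<>]*)\|", m.group())
            for m in re.finditer(rule, encode(words))]

# Hypothetical rule: "da" with any tag, an optional pronoun clitic, then a verb.
# The tags PRON and V are illustrative, not the grammar's tagset.
DA_RULE = word("da", ANY) + "(?:" + word(ANY, "PRON") + ")?" + word(ANY, "V")

sentence = [("iskam", "V"), ("da", "T"), ("go", "PRON"), ("prochete", "V")]
print(match_rule(DA_RULE, sentence))  # [['da', 'go', 'prochete']]
```

Each pair becomes one bracketed unit in the string, so the usual regular expression operators (`?`, `*`, alternation) apply to whole words rather than to characters.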
Currently, the regular grammar cascade for recognizing compound verb forms consists of two levels. Recognizer 1 includes rules operating on the first level, producing:
Verb forms with a single main verb or a main verb accompanied by small words
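The two-level organisation can be approximated by a toy cascade. The first level groups tokens into auxiliary and main verb chunks, and the second level joins adjacent chunks into a compound verb form, in the spirit of the Recognizer 1 / Recognizer 2 split. Everything concrete here is a hypothetical assumption, not the grammar's actual rules: the input tags (SW for a small word, AUX, V, N), the chunk labels (`MvChunk`, `AuxChunk`, `CompoundVF`), and the choice to attach a clitic to the following main verb chunk.

```python
import re
from typing import List, Tuple

Unit = Tuple[str, List[str]]   # (label, word forms covered by the unit)
Rule = Tuple[str, str]         # (new label, regex over space-separated labels)

def apply_level(units: List[Unit], rules: List[Rule]) -> List[Unit]:
    """Run one cascade level left to right: at each position, the first rule
    whose pattern matches a run of whole labels rewrites that run into a
    single new unit; positions where no rule matches are copied unchanged."""
    labels = [label for label, _ in units]
    out: List[Unit] = []
    i = 0
    while i < len(units):
        rest = " ".join(labels[i:])
        for new_label, pattern in rules:
            m = re.match(f"(?:{pattern})(?!\\S)", rest)
            if m:
                n = m.group().count(" ") + 1  # number of labels consumed
                words = [w for _, ws in units[i:i + n] for w in ws]
                out.append((new_label, words))
                i += n
                break
        else:
            out.append(units[i])
            i += 1
    return out

# Hypothetical rules. Level 1 builds chunks with optional small words;
# level 2 joins one or more adjacent auxiliary chunks with a main verb chunk.
LEVEL_1 = [("MvChunk", r"(?:SW )*V"), ("AuxChunk", r"(?:SW )*AUX")]
LEVEL_2 = [("CompoundVF", r"(?:AuxChunk )+MvChunk")]

def recognize(tokens: List[Tuple[str, str]]) -> List[Unit]:
    units = [(tag, [word]) for word, tag in tokens]
    return apply_level(apply_level(units, LEVEL_1), LEVEL_2)

# "ne sam ya chel knigata" ("I have not read the book"), transliterated.
tokens = [("ne", "SW"), ("sam", "AUX"), ("ya", "SW"), ("chel", "V"), ("knigata", "N")]
print(recognize(tokens))
# [('CompoundVF', ['ne', 'sam', 'ya', 'chel']), ('N', ['knigata'])]
```

Because the second level only sees chunk labels, it succeeds when the chunks are adjacent. A chunk separated by an insert would be left unmerged unless a further rule allows the insert.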