Building a Two-Level Grammar Cascade for Recognizing Bulgarian Verb Forms

In the MV rule described below, the element is the context node. XPath expressions search the local tree structure for the PCDATA strings of the ph element (the running word string) and the ta element (the correct morphosyntactic tag), so that input words can be matched against the regular expression. In the rule, input words are ordered pairs of ph content and ta content, and a component that is irrelevant to the match is marked with the # symbol, as in <"da","#"> or <"#","Pp@d@@@t">. The grammar also specifies the elements of the XML document to which it is applied (e.g., paragraph, head, and highlighted elements).
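Matching a single input word against such a pair can be illustrated with a minimal Python sketch. It assumes that each token is a hypothetical `w` element with `ph` and `ta` children, and it uses ElementTree's limited XPath support (`findall(".//w")`, `findtext`) rather than a full XPath engine. Only `#` is treated as a wildcard; every other component is compared literally, so any further pattern syntax inside a tag string is not interpreted here. The tag values `T` and `V` are placeholders, not tags from the actual tagset.

```python
import xml.etree.ElementTree as ET

WILDCARD = "#"  # a pair component that is irrelevant to the match

def token_pair(w: ET.Element) -> tuple[str, str]:
    """Read the PCDATA of the ph and ta children of a token element."""
    return w.findtext("ph", default=""), w.findtext("ta", default="")

def matches(pattern: tuple[str, str], w: ET.Element) -> bool:
    """Match one token against a <ph, ta> pattern pair.

    A component equal to '#' is ignored; any other component must equal
    the token's text exactly (no further pattern syntax is interpreted).
    """
    ph, ta = token_pair(w)
    want_ph, want_ta = pattern
    return want_ph in (WILDCARD, ph) and want_ta in (WILDCARD, ta)

# Hypothetical input: <w> is assumed as the token element; ta values are placeholders.
doc = ET.fromstring(
    "<p>"
    "<w><ph>da</ph><ta>T</ta></w>"
    "<w><ph>chete</ph><ta>V</ta></w>"
    "</p>"
)
tokens = doc.findall(".//w")  # XPath-style search of the local tree

print([matches(("da", "#"), w) for w in tokens])  # [True, False]
print([matches(("#", "V"), w) for w in tokens])   # [False, True]
```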

Currently, the regular grammar cascade for recognizing compound verb forms consists of two levels. Recognizer 1, the first-level set of rules, produces:

Verb forms with a single main verb or a main verb accompanied by small words.
Verb forms with a single auxiliary verb or an auxiliary verb accompanied by small words.
Groups of small words enclosed in a separate chunk because, in specific cases of linear order and discontinuity, they cannot be attached to a verb.
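The first level can be approximated with an ordinary regular expression over token classes. This is a minimal sketch under stated assumptions, not the actual Recognizer 1 rules: each token is assumed to have already been reduced to one hypothetical letter (V main verb, A auxiliary, S small word, O anything else), small words attach only to the verb that immediately follows them, and any leftover run of small words becomes a CT chunk.

```python
import re

# Hypothetical token classes (an assumption of this sketch):
#   V = main verb, A = auxiliary verb, S = small word, O = anything else
LEVEL1 = re.compile(r"(?P<MV>S*V)|(?P<XV>S*A)|(?P<CT>S+)")

def chunk_level1(classes: str) -> list[tuple[str, int, int]]:
    """Return (category, start, end) chunks over a string of token classes.

    Small words attach to the verb that immediately follows them; a run of
    small words with no following verb becomes a CT chunk. O tokens are
    left outside all chunks.
    """
    return [(m.lastgroup, m.start(), m.end()) for m in LEVEL1.finditer(classes)]

# "ne sam go vidyal" ('I have not seen him'), classified by hand as S A S V
print(chunk_level1("SASV"))     # [('XV', 0, 2), ('MV', 2, 4)]
print(chunk_level1("SSAOSVS"))  # [('XV', 0, 3), ('MV', 4, 6), ('CT', 6, 7)]
```

Because Python's `re` tries alternatives in order at each position, a run of small words is first offered to a following main verb, then to a following auxiliary, and only then closed off as CT.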
Recognizer 2, the second-level set of rules, outputs segments corresponding to complete tense, mood, and voice forms, including the small words. Examples of Recognizer 2 rules include:

\w -> ?, +,
\w -> , +

Here, the input words of the right-hand-side regular expressions are the category tags assigned by Level 1 recognition: MV for a main verb chunk, XV for an auxiliary verb chunk, and CT for an independent chunk of small words. These rules tend to overgenerate verb complexes, because they lack a detailed description of which verb complex constructions are possible in relation to subordinate clauses. The left-hand side of each rule names the category of the recognized pattern, a verb complex, which is marked in the XML document with a VC tag.
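The second level runs the same kind of matching over Level 1 categories instead of words. The sketch below encodes each chunk element as one letter and wraps every match in a new `VC` element. Its rule, `CT? XV+ MV` (an optional small-word chunk, one or more auxiliary chunks, then a main verb chunk), is a hypothetical example chosen for illustration, not a reconstruction of the rules above. Like them, it sees only category sequences and no clause boundaries, which is exactly why such rules can overgenerate.

```python
import re
import xml.etree.ElementTree as ET

# One letter per Level 1 category; any other element becomes "o".
CODE = {"MV": "m", "XV": "x", "CT": "c"}

# Hypothetical Level 2 rule for illustration: CT? XV+ MV  ->  VC
VC_RULE = re.compile(r"c?x+m")

def mark_verb_complexes(parent: ET.Element) -> None:
    """Wrap each run of chunk children matching VC_RULE in a new VC element."""
    children = list(parent)
    codes = "".join(CODE.get(child.tag, "o") for child in children)
    # Rewrite right to left so the indices of earlier matches stay valid.
    for m in reversed(list(VC_RULE.finditer(codes))):
        vc = ET.Element("VC")
        for child in children[m.start():m.end()]:
            parent.remove(child)
            vc.append(child)
        parent.insert(m.start(), vc)

sentence = ET.fromstring("<s><CT/><XV/><MV/><MV/></s>")
mark_verb_complexes(sentence)
print(ET.tostring(sentence, encoding="unicode"))
# <s><VC><CT /><XV /><MV /></VC><MV /></s>
```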

