Morphology is the subfield of linguistics that studies the internal structure of words. Although words are often treated as the basic units of text analysis, they can be broken down further into smaller constituents called morphemes. Morphemes are the smallest units of grammatical form in a language, and they fall into classes such as roots, prefixes, and suffixes. Word formation rules combine morphemes into words. In short, just as sentences have internal structure built from words by syntactic rules, words have internal structure built from morphemes by morphological rules.
Morphemes fall into two important categories: those that can stand alone (e.g., “car,” “free,” “sing”) and those that must attach to other morphemes (e.g., “-ish,” “bi-,” “-er,” “-est”). The former are called “free morphemes” and the latter “bound morphemes.” Bound morphemes can attach at different positions. A bound morpheme attached to the beginning of its base is a prefix (e.g., “bi-” in “bi-” + “weekly”); one attached to the end is a suffix (e.g., “-er” in “teach” + “-er”). Affixes can also be inserted inside a morpheme (“infixes”) or surround one (“circumfixes”), but these are rare or non-existent in English.
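The prefix and suffix distinction can be sketched in a few lines of Python. This is only a toy illustration: the inventories below are hand-picked and hypothetical, the function name segment is an assumption made for this example, and real morphological analysis needs a full dictionary and spelling rules.

```python
# Toy inventories, hand-picked for illustration only (hypothetical data).
# Note: "weekly" is itself complex (week + -ly) but is treated as a unit here.
FREE_FORMS = {"car", "sing", "teach", "tall", "weekly"}
PREFIXES = ("bi",)
SUFFIXES = ("er", "est", "ish")


def segment(word):
    """Split a word into (prefix, base, suffix) using the toy inventories.

    Returns None when no analysis is found. At most one affix is stripped,
    and a split is accepted only if the remaining base can stand alone.
    """
    if word in FREE_FORMS:
        return (None, word, None)
    for prefix in PREFIXES:
        base = word[len(prefix):]
        if word.startswith(prefix) and base in FREE_FORMS:
            return (prefix, base, None)
    for suffix in SUFFIXES:
        base = word[:-len(suffix)]
        if word.endswith(suffix) and base in FREE_FORMS:
            return (None, base, suffix)
    return None


print(segment("biweekly"))  # prefix + base
print(segment("teacher"))   # base + suffix
print(segment("car"))       # free morpheme, no affix
```

Requiring the base to be a known free form keeps the sketch from splitting words such as “finger” into “fing” + “-er”.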
Morphologically complex words can be decomposed into roots and affixes. A root is the core lexical unit of a word, to which affixes attach. A root is often a free morpheme (e.g., “teach” in “teacher”), but some words have a bound morpheme as their root. For example, “receive” decomposes into the prefix “re-” and the root “-ceive,” which cannot stand alone. Multimorphemic words can also combine with further affixes to derive more complex words. Take “global”: it consists of the root “globe” and the suffix “-al,” and it can take additional affixes to form “globalize,” “globalization,” and so on. This is derivational morphology, in which added morphemes derive words with new meanings. Adding a morpheme does not always produce a new meaning, however. Inflectional morphemes serve primarily grammatical functions: they leave a word’s meaning intact and change only its grammatical attributes. For instance, adding “-ed” to a verb (e.g., “walk” + “-ed” -> “walked”) yields the past tense form, while the verb’s meaning remains unaffected.
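The derivation chain and the inflection example above can be traced with a short Python sketch. The attach helper and its single spelling rule (drop a final “e” before a vowel-initial suffix) are assumptions made for this illustration; the rule is deliberately crude and does not cover English spelling in general.

```python
VOWELS = "aeiou"


def attach(base, suffix):
    """Join base and suffix, dropping a final 'e' before a vowel-initial suffix.

    Hypothetical helper: this one rule is enough for the examples below
    but is not a general model of English spelling.
    """
    if base.endswith("e") and suffix[0] in VOWELS:
        base = base[:-1]
    return base + suffix


# Derivational morphology: each suffix derives a new word from the last one.
derived = ["globe"]
for suffix in ("al", "ize", "ation"):
    derived.append(attach(derived[-1], suffix))
print(" -> ".join(derived))

# Inflectional morphology: same verb, different grammatical form.
print(attach("walk", "ed"))
```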
In PolyAnalyst, a built-in algorithm backed by a morphological dictionary automatically tags every token in the text with morpheme-level information such as grammatical category or tense. For example, verbs ending in the “-ed” morpheme are tagged as the past tense or past participle form of the verb. PolyAnalyst also offers a variety of morphological functions in the PDL query language, so end users can write precise queries tailored to their needs. The video clip below illustrates some of these functions. The lemma function retrieves all word forms of a given lemma. For example, lemma(face) finds all inflectional forms, such as “faces,” “faced,” and “facing.” Because “face” can function as both a noun and a verb, a user can restrict the search to the noun forms with lemma(noun, face).
lemma(face) and lemma(noun, face)
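To make the matching behavior concrete, here is a minimal Python sketch that mimics what the text says about lemma(face) and lemma(noun, face). It is not PolyAnalyst code or PDL: the Token type, the tagged token list, and the lemma_query function are all hypothetical stand-ins for the tags the morphological dictionary would supply.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    text: str   # surface form as it appears in the document
    lemma: str  # dictionary form (assumed to come from a tagger)
    pos: str    # part of speech (assumed to come from a tagger)


# Hypothetical pre-tagged tokens, for illustration only.
TOKENS = [
    Token("faces", "face", "noun"),
    Token("faced", "face", "verb"),
    Token("facing", "face", "verb"),
    Token("face", "face", "noun"),
    Token("facial", "facial", "adjective"),
]


def lemma_query(tokens, lemma: str, pos: Optional[str] = None):
    """Return surface forms whose lemma matches, optionally filtered by POS."""
    return [
        t.text
        for t in tokens
        if t.lemma == lemma and (pos is None or t.pos == pos)
    ]


print(lemma_query(TOKENS, "face"))              # all forms of the lemma
print(lemma_query(TOKENS, "face", pos="noun"))  # noun forms only
```

In this sketch “facial” is never returned, because it has its own lemma; matching across lemmas is a different kind of query.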
By default, PolyAnalyst searches for all word forms of a query word. That is, if the word “facing” appears in a query, PolyAnalyst retrieves all inflectional forms of its lemma “face”: “face,” “faced,” “faces,” and “facing.” To restrict the search to one specific word form, a user can apply the form function, which matches only the exact form given as its argument; for instance, form(facing) matches “facing” and nothing else.
Of these functions, singleroot has the broadest scope. It searches not only for all inflectional forms of a word but also for all words that share the same root. For instance, singleroot(face) matches “facial” as well as “face,” “faced,” “facing,” and so on.
form(facing) and singleroot(facing)
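The difference in scope between a plain query word, form, and singleroot can be summarized with a toy Python model. This is an illustrative sketch under stated assumptions, not PolyAnalyst's implementation: the LEXICON table and the three function names are hypothetical, and the lemma and root columns stand in for what a morphological dictionary would provide.

```python
# Hypothetical lexicon rows: (surface form, lemma, root).
LEXICON = [
    ("face", "face", "face"),
    ("faced", "face", "face"),
    ("faces", "face", "face"),
    ("facing", "face", "face"),
    ("facial", "facial", "face"),
    ("table", "table", "table"),
]


def _entry(word):
    """Look up the (form, lemma, root) row for a surface form."""
    for row in LEXICON:
        if row[0] == word:
            return row
    raise KeyError(word)


def form_match(word):
    """Narrowest scope: only the exact surface form."""
    return [form for form, _, _ in LEXICON if form == word]


def default_match(word):
    """Default scope: every inflectional form of the word's lemma."""
    lemma = _entry(word)[1]
    return [form for form, lem, _ in LEXICON if lem == lemma]


def singleroot_match(word):
    """Broadest scope: every word that shares the word's root."""
    root = _entry(word)[2]
    return [form for form, _, rt in LEXICON if rt == root]


print(form_match("facing"))
print(default_match("facing"))
print(singleroot_match("facing"))
```

Each function returns a superset of the one before it, which mirrors the ordering described above: exact form, then all inflections of the lemma, then all words built on the same root.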