For beginners, text analysis often begins with a simple keyword search. Analysts might start by searching for “great service”, then add variations like “service was great” or “service is so great”. This, of course, is a rudimentary approach that misses many relevant comments, so users who want broader coverage soon advance to proximity searches, for example, searching for “great” within three words of “service”. Some go further with word lists (“awesome”, “good”, or “excellent” service) or parts of speech (adjectives near “service”), but most text analysis systems stop right there.
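A minimal sketch of such a proximity search might look like the following; the tokenization rule and the three-word window are simplifications for illustration, not a description of any particular product:

```python
import re

def proximity_match(text, word_a, word_b, max_gap=3):
    """Return True if word_a and word_b appear no more than max_gap token positions apart."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positions_a = [i for i, t in enumerate(tokens) if t == word_a]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b]
    return any(abs(i - j) <= max_gap for i in positions_a for j in positions_b)

print(proximity_match("The service was truly great.", "great", "service"))  # True: a genuinely positive comment
print(proximity_match("Our service? Not great.", "great", "service"))       # also True: a false positive
```

As the second call shows, proximity alone says nothing about how the two words relate to each other.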
For advanced text searching, however, you need information about word relationships. For example, “the insured driver hit a bus” and “a bus hit an insured driver” describe two very different events, and the difference could help a text analysis system determine who was at fault in the accident. A basic search might try to find the first event by looking for “insured driver”, “hit”, and “bus”, in that order, but what about “the insured driver was hit by a bus”? That sentence matches too, even though it describes the opposite of what the search was looking for.
Of course, there are ways to compensate for a computer’s lack of knowledge about word relationships. For example, we could change the search to exclude any “was hit by” sentences. But there are many other factors to consider. If we don’t restrict the distance between terms, we will correctly find “the insured driver, who drove a red 2008 Ford Focus, hit a bus”, but we will also find “the insured driver reported that a red 2008 Ford Focus hit a bus.” Human language is so rich and intricate that it is virtually impossible to account for every possible expression of the event we’re searching for without a deeper understanding of how words are structured and related within a sentence.
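To make the problem concrete, here is a small sketch of an ordered keyword match applied to the sentences above; it is a simplified stand-in for the kind of search just described, not an actual product feature:

```python
def contains_in_order(text, *phrases):
    """Return True if all phrases occur in the text in the given order."""
    position = 0
    for phrase in phrases:
        position = text.lower().find(phrase, position)
        if position == -1:
            return False
        position += len(phrase)
    return True

sentences = [
    "The insured driver hit a bus.",                                      # the event we want
    "The insured driver was hit by a bus.",                               # the opposite event
    "The insured driver reported that a red 2008 Ford Focus hit a bus.",  # a different event entirely
]
for s in sentences:
    print(contains_in_order(s, "insured driver", "hit", "bus"), "-", s)
# All three print True: word order alone cannot tell us who hit whom.
```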
Dependency parsing provides this information. For example, dependency parsing can tell you what the subjects and objects of a verb are, as well as which words are modifying (describing) the subject. This can help you find precise answers to specific questions, such as:
Did the claimant run a red light?
Knowing which party ran a red light can help us determine who was at fault in an accident. In this case, “claimant” is the subject of “run”, which has the object “red light”. From this, we can deduce that the claimant performed the action of “running a red light”, and it may be more likely that the claimant is at fault for what happened next.
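As an illustration of what such a parse provides, the sketch below uses the open-source spaCy library (one of many dependency parsers, chosen here purely for illustration; it is not part of PolyAnalyst) on an invented sentence of this kind:

```python
import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The claimant ran a red light at the intersection.")

for token in doc:
    if token.pos_ == "VERB":
        # Collect the verb's subject(s) and object(s) from its dependency children.
        subjects = [child.text for child in token.children if child.dep_ in ("nsubj", "nsubjpass")]
        objects = [child.text for child in token.children if child.dep_ in ("dobj", "obj")]
        print(token.lemma_, "| subjects:", subjects, "| objects:", objects)
# Expected output along the lines of: run | subjects: ['claimant'] | objects: ['light']
```

In a typical parse, “red” attaches to “light” as a modifier, so the full phrase “red light” can be reconstructed by following the object’s own dependents.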
What assets are my competitors buying up?
In a headline such as “Amazon to buy Whole Foods”, Amazon is the subject of “buy” and Whole Foods is the object being purchased. Dependency parsing identifies this correctly, and it can just as accurately analyze a headline that contains the very same phrase but announces a different deal.
Consider a headline along the lines of “Amazon to buy Whole Foods competitor Trader Joe’s”. The phrase “Amazon to buy Whole Foods” is still present, but here it is Whole Foods’ competitor, Trader Joe’s, that is being purchased. While these major events (one real, one imaginary) are unlikely to go unnoticed, many smaller competitor acquisitions are regularly announced this way in publications and press releases that few people read.
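Continuing with spaCy as a stand-in parser, the sketch below pulls out what is actually being purchased in each headline. The second headline is a hypothetical reconstruction of the contrasting case described above, and the exact dependency labels can vary by model, so treat the output as illustrative:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

headlines = [
    "Amazon to buy Whole Foods",                          # the real 2017 acquisition
    "Amazon to buy Whole Foods competitor Trader Joe's",  # hypothetical: a different target
]

for headline in headlines:
    doc = nlp(headline)
    for token in doc:
        if token.lemma_ == "buy":
            # Collect the grammatical object(s) of "buy", i.e., what is being purchased,
            # including each object's full subtree so multi-word names stay intact.
            targets = [" ".join(t.text for t in child.subtree)
                       for child in token.children if child.dep_ in ("dobj", "obj")]
            print(f"{headline!r} -> purchased: {targets}")
```

The same surface phrase appears in both headlines, but the object of “buy” differs, which is exactly the distinction a keyword or proximity search cannot make.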
What impression do our customer service representatives leave?
For a review such as “The customer service from the tech department was outstanding”, dependency parsing identifies multiple pieces of information about the customer service: both that the customer service is “outstanding” and that the customer service being discussed relates specifically to the “tech department”.
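The sketch below shows how these attributes can be read off a parse, again using spaCy and the invented review sentence from above; the dependency labels (amod, prep, acomp) reflect typical spaCy conventions and may differ in other parsers:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The customer service from the tech department was outstanding.")

for token in doc:
    if token.text == "service":
        # Adjectives attached directly to the noun (e.g., "outstanding service").
        direct_modifiers = [child.text for child in token.children if child.dep_ == "amod"]
        # Prepositional attachments such as "from the tech department".
        attachments = [" ".join(t.text for t in child.subtree)
                       for child in token.children if child.dep_ == "prep"]
        # Adjectives linked through a copula ("the service ... was outstanding").
        linked = []
        if token.dep_ in ("nsubj", "nsubjpass"):
            linked = [child.text for child in token.head.children if child.dep_ == "acomp"]
        print("modifiers:", direct_modifiers, "| attachments:", attachments, "| linked:", linked)
```

Handling both patterns matters because the same opinion can be expressed either as a direct modifier (“outstanding customer service”) or through a linking verb (“the customer service was outstanding”).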
There are many ways to use dependency parsing to more accurately query your text data. Of course, there are many other intricacies of language to consider, but those are topics for another day.
Dependency parsing is one of the features of PolyAnalyst, Megaputer’s text analysis software.