Megaputer Intelligence Inc. and MicroSystems Ltd. have partnered to develop a new text mining product called TextAnalyst. The desktop tool provides semantic analysis, summarization, and navigation of natural language texts.
TextAnalyst employs several innovative techniques to mine text. The product draws on MicroSystems’ more than 15 years of experience in the field of text analysis. Megaputer, a data mining company, plans to integrate its PolyAnalyst neural networking tool with TextAnalyst to create a complete suite of analysis tools.
Three-Phase Process
TextAnalyst performs three main processes. Text is pushed through a variable-length window one symbol at a time. Symbols can be letters, punctuation, or blank spaces. The window can be set from two to 20 symbols wide. As a stream of text is passed through the window, snapshots are made that are used to create a representation of words, word roots, and word groupings in the text.
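The windowing step can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not TextAnalyst's implementation: the function names `window_snapshots` and `frequent_fragments` and the `min_count` cutoff are hypothetical, and the article does not say how snapshots are turned into words and word roots. Here a fragment simply counts as a candidate if it recurs.

```python
from collections import Counter


def window_snapshots(text, width):
    """Slide a fixed-width window over the text one symbol at a time
    and count how often each snapshot (fragment) is seen."""
    if not 2 <= width <= 20:
        raise ValueError("window width must be between 2 and 20 symbols")
    return Counter(text[i:i + width] for i in range(len(text) - width + 1))


def frequent_fragments(text, min_width=2, max_width=20, min_count=2):
    """Collect fragments of every allowed width that recur in the text.

    Assumption: recurrence is used as a crude stand-in for 'this fragment
    is a word, word root, or word grouping'.
    """
    counts = Counter()
    for width in range(min_width, max_width + 1):
        counts.update(window_snapshots(text, width))
    return {frag: n for frag, n in counts.items() if n >= min_count}
```

For example, `window_snapshots("mining text, mining data", 6)` counts the fragment `"mining"` twice, which is the kind of repetition a later stage could treat as a candidate word.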
The next step is to identify how often these concepts are encountered together in some semantic piece of the text, such as a sentence. “We go through and identify how often they appear in a paragraph or chapter to discover preliminary relations between concepts,” reported Sergei Ananyan, vice president, marketing, Megaputer.
After this step, the system has a preliminary semantic network developed where every word has a weight and every concept found has a corresponding weight based on a frequency analysis. The relationships between terms also are weighted based on how often they occur together.
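A preliminary network of this kind can be sketched in a few lines of Python. This is a minimal illustration, assuming the concepts have already been extracted as lowercase single words and that the sentence is the unit of co-occurrence; `build_network` and its arguments are hypothetical names, not part of the product.

```python
import re
from collections import Counter
from itertools import combinations


def build_network(text, concepts):
    """Return (node_weight, edge_weight) for a co-occurrence network.

    node_weight[c]      = number of sentences that mention concept c
    edge_weight[(a, b)] = number of sentences that mention both a and b
    Edge keys are alphabetically ordered pairs, so each link is stored once.
    """
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    node_weight = Counter()
    edge_weight = Counter()
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence))
        present = sorted(c for c in concepts if c in words)
        node_weight.update(present)
        edge_weight.update(combinations(present, 2))
    return node_weight, edge_weight
```

Swapping the sentence splitter for a paragraph or chapter splitter would give the coarser co-occurrence counts Ananyan mentions.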
The third step in the process involves the use of a neural network similar to the old Hopfield networks, which are single-layer neural networks in which every neuron is connected to every other. “Although Hopfield networks were abandoned a long time ago, in this task they are very effective,” noted Ananyan.
The initial semantic network is used as input to the neural network. The result is a refined semantic network, Ananyan said. Through a reweighting of relationships, the network is renormalized, producing a final semantic network.
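The article does not describe the refinement procedure itself, so the following Python sketch is only a generic Hopfield-style relaxation, not TextAnalyst's algorithm. It assumes symmetric link weights, continuous neuron states, a tanh activation, and a final rescaling so the strongest concept has weight 1; the function name `refine` and the parameters `steps` and `gain` are hypothetical.

```python
import math


def refine(node_weight, edge_weight, steps=20, gain=1.0):
    """Relax a weighted concept network and return renormalized node weights.

    node_weight: {concept: positive weight}
    edge_weight: {(concept_a, concept_b): positive weight}, one entry per link
    """
    nodes = sorted(node_weight)
    top = max(node_weight.values())
    # Initial states (and constant biases) are the scaled frequency weights.
    bias = {n: node_weight[n] / top for n in nodes}
    state = dict(bias)

    # Every neuron is linked to every other; missing links have weight 0.
    max_edge = max(edge_weight.values(), default=1)
    link = {}
    for (a, b), w in edge_weight.items():
        link[(a, b)] = link[(b, a)] = w / max_edge

    for _ in range(steps):
        new_state = {}
        for n in nodes:
            field = bias[n] + sum(
                link.get((n, m), 0.0) * state[m] for m in nodes if m != n
            )
            new_state[n] = math.tanh(gain * field)
        state = new_state

    # Renormalize so the strongest concept has weight 1.
    top = max(state.values())
    return {n: s / top for n, s in state.items()}
```

In this sketch a concept that is strongly linked to other well-weighted concepts ends up with a higher final weight than an isolated concept that started with the same frequency, which is one plausible reading of the reweighting described above.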
The creation of the semantic network is the most important underlying element. “Once you have done that you can do lots of things,” noted Ananyan. Applications include building knowledge bases, analyzing the contents of arbitrary texts, abstracting texts, classifying texts into specified subjects, and performing semantic searches of information.
None of the Megaputer/MicroSystems beta customers were prepared to talk to the press on the record, but one client said anonymously that the program was “extremely user friendly” and could be recommended to collaborators less familiar with qualitative data analysis techniques.
Mating with Data Mining Tool
Megaputer plans to integrate its existing data mining tools with TextAnalyst, perhaps as early as the next release. An interesting cross-mining application is market response analysis.
Many large companies want to better monitor the market response to their activities, such as a new product launch. TextAnalyst can examine press articles and other outside information about a company and its competitors. Information clusters that describe the key concepts in this material can be created with TextAnalyst. The PolyAnalyst data mining tool can then be used over time to examine cluster dynamics.
“As companies take actions we can monitor how clusters start to change, how competitors’ clusters evolve and how their clusters evolve. We provide a measure of the company’s influence on the market,” Ananyan said.