Normalization in preparation for fuzzy matching
When we compare entities stored as strings, we often need to consider several distinct pieces of each string. For example, suppose our data contains the following two names:

Steven M Johnson
Steve Michael Johnson

As humans, we do not compare each string as a whole. Instead, we compare substrings: “Steven” to “Steve,” “M” to “Michael,” and “Johnson” to “Johnson.”…
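
The piece-by-piece comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `tokenize` and `compare_tokens` are hypothetical, tokens are paired purely by position, and the standard library's `difflib.SequenceMatcher` stands in for whatever similarity measure is actually used.

```python
from difflib import SequenceMatcher
from itertools import zip_longest


def tokenize(name):
    """Lowercase a name and split it on whitespace (hypothetical helper)."""
    return name.lower().split()


def compare_tokens(name_a, name_b):
    """Pair tokens by position and score each pair between 0.0 and 1.0.

    Assumption: positional pairing is enough for this example; names with
    missing or reordered parts would need a smarter alignment.
    """
    pairs = zip_longest(tokenize(name_a), tokenize(name_b), fillvalue="")
    return [(a, b, SequenceMatcher(None, a, b).ratio()) for a, b in pairs]


for a, b, score in compare_tokens("Steven M Johnson", "Steve Michael Johnson"):
    print(f"{a!r} vs {b!r}: {score:.2f}")
```

In this sketch “johnson” scores a perfect match against “johnson” and “steven” scores high against “steve,” while the initial “m” scores poorly against “michael” even though a human reader would accept the pair. A plain character-level score does not capture that judgment.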