A world-leading pharmaceutical company in the USA
Case Study: Vendor Name Deduplication
Executive Summary
The company works with a large number of vendors across many categories and from different parts of the world. Many vendor description fields are incorrect or incomplete because of a lack of standardization and because some vendors have limited English proficiency. The company needed an intelligent classification tool that could classify different types of vendors from the semantics of the vendor description fields.
Incomplete, misspelled, and inconsistently standardized vendor names, addresses, and other fields
Missing data such as names and addresses
Automatically classifying vendors into the correct class
Data engineering on about 100,000 data points, including data cleaning and validation
Machine learning algorithms used to address the key challenges
Pattern matching and string distance algorithms for feature identification
SVM and Naive Bayes for accurately classifying the strings into binary classes
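
As a rough illustration of the string distance step, the sketch below turns a pair of vendor names into numeric similarity features of the kind a binary classifier such as an SVM or Naive Bayes model could consume. It is not the implementation used in the project: the function names, the list of legal suffixes, and the choice of Levenshtein distance, sequence ratio, and token overlap are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def pair_features(name_a: str, name_b: str) -> dict:
    """Similarity features for one candidate duplicate pair (illustrative)."""
    a, b = normalize(name_a), normalize(name_b)
    longest = max(len(a), len(b)) or 1
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return {
        # 1.0 means identical after normalization, 0.0 means fully different.
        "edit_similarity": 1 - levenshtein(a, b) / longest,
        "sequence_ratio": SequenceMatcher(None, a, b).ratio(),
        "token_jaccard": len(tokens_a & tokens_b) / (len(union) or 1),
    }


if __name__ == "__main__":
    print(pair_features("Acme Pharma, Inc.", "ACME Pharma LLC"))
    print(pair_features("Acme Pharma, Inc.", "Zenith Logistics Ltd"))
```

Each feature dictionary could then be paired with a duplicate / not-duplicate label and passed to a binary classifier. Normalizing before measuring distance keeps differences in case, punctuation, and legal suffixes from dominating the score.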