SKYBITS

Case Study

A world-leading pharmaceutical company in the USA

Case Study: Vendor Name Deduplication

Executive Summary

The company works with a large number of vendors of various categories from different parts of the world. Many vendor fields were incorrect or incomplete because data entry was not standardized and some vendors have limited English proficiency. The company needed an intelligent classification tool that could classify vendors by type based on the semantics of the vendor description fields.

Challenges
  • Incomplete, misspelled, and inconsistently standardized vendor names, addresses, and other fields
  • Missing data, such as vendor names and addresses

Objective
  • Automatically classify each vendor into the correct class

Solutions
  • Data engineering on about 100,000 data points, including data cleaning and validation
  • Machine learning algorithms to address the key challenges
  • Pattern matching and string distance algorithms for feature identification
  • SVM and Naive Bayes classifiers to classify the strings accurately into binary classes
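The Solutions list names the building blocks (cleaning, string distance features, Naive Bayes or SVM) but not how they fit together. The following is a minimal Python sketch, not the project's actual pipeline, assuming that the binary classes are "same vendor" (1) versus "different vendor" (0) for a pair of name records, as the deduplication title suggests. Names are normalized, scored with a Levenshtein similarity and a token Jaccard overlap, and fed to a small Gaussian Naive Bayes classifier written with the standard library only. The suffix list, the two features, the helper names (`normalize`, `pair_features`, `GaussianNaiveBayes`), and the toy training pairs are all hypothetical illustrations, not data from the engagement. An SVM (for example scikit-learn's `SVC`) could be trained on the same feature vectors instead.

```python
import math
import re
import unicodedata

# Hypothetical legal-form suffixes stripped during cleaning (illustrative list).
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "corp", "co", "gmbh", "sa", "plc"}


def normalize(name: str) -> str:
    """Strip accents and punctuation, lowercase, and drop legal suffixes."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Anything that is not a lowercase ASCII letter, digit, or space becomes a space.
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(t for t in text.split() if t not in LEGAL_SUFFIXES)


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def pair_features(a: str, b: str) -> list[float]:
    """Two string-distance features for a pair of raw vendor names."""
    na, nb = normalize(a), normalize(b)
    longest = max(len(na), len(nb)) or 1
    edit_sim = 1.0 - levenshtein(na, nb) / longest
    ta, tb = set(na.split()), set(nb.split())
    jaccard = len(ta & tb) / len(ta | tb) if ta | tb else 1.0
    return [edit_sim, jaccard]


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes over continuous features."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # A small variance floor avoids division by zero on constant features.
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-3
                         for c, m in zip(cols, means)]
            self.stats[label] = (len(rows) / len(X), means, variances)
        return self

    def _log_posterior(self, x, label):
        prior, means, variances = self.stats[label]
        log_p = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return log_p

    def predict(self, X):
        return [max(self.stats, key=lambda lbl: self._log_posterior(x, lbl))
                for x in X]


# Toy labelled pairs, invented for illustration: 1 = same vendor, 0 = different.
PAIRS = [
    ("Acme Pharma Inc.", "ACME Pharma", 1),
    ("Müller Chemie GmbH", "Mueller Chemie", 1),
    ("Global Lab Supplies Ltd", "Global Labs Supply", 1),
    ("Acme Pharma Inc.", "Beta Logistics LLC", 0),
    ("Global Lab Supplies Ltd", "Zenith Packaging Co", 0),
    ("Mueller Chemie", "Nordic Freight", 0),
]

X_train = [pair_features(a, b) for a, b, _ in PAIRS]
y_train = [label for _, _, label in PAIRS]
model = GaussianNaiveBayes().fit(X_train, y_train)

if __name__ == "__main__":
    queries = [("ACME PHARMA, INC", "Acme Pharma Inc"),
               ("Beta Logistics LLC", "Acme Pharma Inc.")]
    print(model.predict([pair_features(a, b) for a, b in queries]))
```

Pairs predicted as 1 would then be merged or sent for review. With about 100,000 records, scoring every possible pair grows quadratically, so a real pipeline would likely restrict candidate pairs first (for example to records sharing a normalized token) before computing features.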