Record Linking for Meltwater's Knowledge Graph
Marton Mihaltz
zoom
Record linking, also known as entity resolution, clusters database records or knowledge base entries so that each cluster corresponds to a single distinct real-world entity (e.g., a business or a person). It is a crucial step in data cleaning and data integration. Meltwater's Knowledge Graph is a graph-structured database in which vertices represent business entities such as organizations, key persons, industries, stock indices and addresses, while edges represent relationships such as affiliations, subsidiaries and industry associations, or events such as mergers and acquisitions. Its core is fused from more than 10 knowledge sources, including structured databases such as Wikidata and Crunchbase. In this industry-oriented talk, we describe how we trained, deployed and improved our record linking solution, which uses machine learning and big data techniques to cluster millions of related entities from these sources with high accuracy for our Knowledge Graph.
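To make the clustering idea concrete, here is a minimal sketch of record linking on toy data. It is not Meltwater's pipeline: the records, the suffix list, the string-similarity rule in `is_match` and the 0.85 threshold are all assumptions made for illustration, and a hand-written rule stands in for the trained model the talk refers to. Pairs judged to match are merged with a union-find structure, so each resulting cluster stands for one real-world entity.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; ids, names and countries are invented.
RECORDS = [
    {"id": "wikidata:1", "name": "Acme Corporation", "country": "US"},
    {"id": "crunchbase:7", "name": "ACME Corp.", "country": "US"},
    {"id": "wikidata:2", "name": "Globex GmbH", "country": "DE"},
    {"id": "crunchbase:9", "name": "Globex", "country": "DE"},
    {"id": "wikidata:3", "name": "Initech", "country": "US"},
]

# Assumed list of legal-form suffixes to ignore when comparing names.
SUFFIXES = {"corporation", "corp", "gmbh", "inc", "ltd"}


def normalize(name):
    """Lowercase, strip punctuation and drop legal-form suffixes."""
    tokens = name.lower().replace(".", "").replace(",", "").split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def is_match(a, b, threshold=0.85):
    """Toy pairwise rule: same country and similar normalized names."""
    if a["country"] != b["country"]:
        return False
    score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return score >= threshold


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def link_records(records):
    """Cluster records by merging every pair that is_match accepts."""
    parent = {r["id"]: r["id"] for r in records}
    for a, b in combinations(records, 2):
        if is_match(a, b):
            parent[find(parent, a["id"])] = find(parent, b["id"])
    clusters = {}
    for r in records:
        clusters.setdefault(find(parent, r["id"]), []).append(r["id"])
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    for cluster in link_records(RECORDS):
        print(cluster)
```

Comparing every pair is quadratic in the number of records, so a sketch like this only works for small inputs. At the scale of millions of entities, candidate pairs are typically restricted first (for example by grouping records on a shared key), and the pairwise rule is replaced by a learned classifier.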