Brand-Company Name Matching Dataset

In Progress Posted 3 months ago Paid on delivery

Challenge: We need to combine two large datasets. One dataset contains information on 170,000 COMPANIES, with their legal names (in the blue column), and the other dataset includes only about 10,000 BRANDS, which have shorter names (closer to how people actually know the brands, e.g., from the supermarket shelf). We need to MATCH the two datasets by working out which company names (in one dataset) belong to which brands (in the other dataset). Only then can we achieve our goal and “connect the dots”.

Your task: The attached table is the outcome of an automated matching process. We used an algorithm that searched for the brand name (left, yellowish column) WITHIN the longer legal company name. Every time the brand name (or a version of it, which we defined) showed up in a company name, the pair ended up as one row in the table. Is the fact that the brand name is part of the company name PROOF that the two names mean the same company? No. Some brand names are words that can be found in many other company names, and the algorithm cannot assess this. So we need to check all of these “potential matches” for correctness: do the brand and the company name belong together (=1) or not (=0)?
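To make clear why the table contains so many false candidates, here is a minimal sketch of the kind of substring search described above. It is not the algorithm we actually ran: the function name, the case-insensitive comparison, and the sample names are assumptions for illustration only, and the brand "versions" we defined are left out.

```python
def find_candidate_matches(brands, companies):
    """Return (brand, company) pairs where the brand text occurs inside the company name.

    Hypothetical helper: a plain case-insensitive substring test,
    without the brand-name variants used in the real process.
    """
    candidates = []
    for brand in brands:
        needle = brand.casefold()
        for company in companies:
            if needle in company.casefold():
                candidates.append((brand, company))
    return candidates


# Illustrative sample names only, not rows from the attached table.
brands = ["Volkswagen", "Apple"]
companies = ["Volkswagen China", "Pineapple Trading GmbH", "Apple Inc."]

candidates = find_candidate_matches(brands, companies)
for brand, company in candidates:
    print(f"{brand!r} found in {company!r}")
```

In this sketch the brand "Apple" is also found inside "Pineapple Trading GmbH". That pair would show up as a row in the table even though the two almost certainly do not belong together, which is exactly the kind of case that needs a manual 0.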

Sometimes the names sound very similar. In that case, it is a good idea to also quickly check whether the “sector” information for the brand and the company is similar. The sector labels in the two databases are not the same, so a brand might be in the category “Online brand” in one database but in the category “E-commerce” in the other. These are basically the same. Only if the sector information seems wildly different (e.g., “Insurance” vs. “Food”) should you take this as a sign that the brand and the company are probably not the same.

By the way, the country of the company does NOT matter; what matters is only whether the brand and the company belong together. For example, “Volkswagen” and “Volkswagen China” belong together. If you are not sure, you can mark these cases in the last column, “not sure”, and check them later by collecting additional information, for example by googling the names.

I coded the first 200 cases to give you a first impression of what the coding could look like: many 0s for non-matching cases, some 1s for cases where the brand and the company seem to belong together, and some “not sure” cases, which call for more information later on.
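If you want to sanity-check your coding before delivery, something like the following sketch could help. It assumes each row's decision has been reduced to a single value ("1", "0", or "not sure"); the actual table keeps "not sure" in its own last column, so the layout, the function name, and the sample values here are all hypothetical.

```python
from collections import Counter

# Assumed coding scheme: one decision per row.
VALID_LABELS = {"1", "0", "not sure"}


def summarize_labels(labels):
    """Count each valid label and collect the 1-based row numbers of anything else."""
    counts = Counter()
    invalid_rows = []
    for row_number, raw in enumerate(labels, start=1):
        label = str(raw).strip().lower()
        if label in VALID_LABELS:
            counts[label] += 1
        else:
            invalid_rows.append(row_number)
    return counts, invalid_rows


# Made-up sample decisions, including one empty cell and one typo.
sample = ["0", "0", "1", "not sure", "0", "", "2"]
counts, invalid_rows = summarize_labels(sample)
print(dict(counts))
print("Rows to fix:", invalid_rows)
```

Rows reported as invalid are simply cells that are empty or contain something other than the three allowed values, so they are worth a second look before the file is sent back.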

Freelancers with experience in data matching and analysis tasks are highly sought here, especially if you've dealt specifically with company and brand name data. Please make sure to discuss any relevant experience in your proposal. Expect fast communication and feedback from me throughout the duration of this project. Looking forward to some great collaborations!

The project is limited to the first 2,000 entries. After reviewing the quality of the matching, the project can be renewed/extended for the rest of the list.

Classification, Data Entry, Excel, Research, Web Search

Project ID: #37744835

About the project

22 proposals Remote project Active 3 mos ago