For a research project, I need to find the identifiers (a tax code called "YTJ", made up of 8 digits) for a list of roughly 7,700 Finnish companies. I also need to know whether each company's name has changed over time.
I have written some R code to do this, using rvest and RSelenium for the newspaper webpage and jsonlite for the API. I would greatly appreciate it if you could alter this code so that I can keep using it later if my list of companies changes. I would also really like to learn more about how to do this, and from your code.
I found that the YTJ and name-change information is available from two sources: an API (link: [login to view URL]) and the webpage of a Finnish newspaper, which actually seems to be linked to the same API (link: [login to view URL]).
My R code is not finished, and I would of course appreciate it if you could build on it. However, if it is absolutely useless, it is fine if you write your own.
I look forward to your help and to hearing from you! Below is an example of what I have done so far. I'll be happy to clarify further if helpful.
**********
Using Nokia as an example:
1) From the newspaper webpage
- On [login to view URL] I search for "Nokia".
- The results page shows many "Nokia" companies and their Y-tunnus (also called "YTJ"). This is the number/identifier I need.
- Clicking on, for instance, "Nokia Oyj" shows a list of characteristics. The two attached images, img1 and img2, show the information I would like to have, from the tables "Perustiedot" and "Edellinen nimitieto". The second table shows whether there has been a name change in the past.
Problems I can't solve:
- I could not figure out how to loop through my company list so that, if the search engine doesn't find a result, it simply moves on to the next item in my list.
- I could not work out how to loop through the results page to get the info for all the "Nokias" (for instance, "Nokia Technologies Oy").
- The second table, with the name changes ("Edellinen nimitieto"): some companies have it and some don't, and it appears in different positions on the page.
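A minimal sketch in R of the skip-and-continue loop described in the first problem, using only base R. The names `lookup_all` and `lookup` are hypothetical: `lookup` stands in for whatever rvest/RSelenium function searches the page for one name and returns a data frame with one row per matching company (zero rows when nothing is found). The `fake_lookup` function and its IDs are invented placeholders for illustration.

```r
# Run `lookup` for every company name. A name that errors or returns no
# rows is recorded in `missed` and the loop moves on to the next name.
# `lookup` is a hypothetical function(name) returning a data frame.
lookup_all <- function(company_names, lookup) {
  found <- list()
  missed <- character(0)
  for (nm in company_names) {
    hit <- tryCatch(
      lookup(nm),
      error = function(e) {
        message("Skipping '", nm, "': ", conditionMessage(e))
        NULL
      }
    )
    if (is.null(hit) || NROW(hit) == 0) {
      missed <- c(missed, nm)   # nothing usable: note it and continue
      next
    }
    hit$query <- nm             # remember which search term gave these rows
    found[[length(found) + 1]] <- hit
  }
  list(found = do.call(rbind, found), missed = missed)
}

# Stand-in for the real scraper (placeholder data, not real identifiers).
fake_lookup <- function(nm) {
  if (nm == "Nokia") {
    data.frame(name = c("Nokia Oyj", "Nokia Technologies Oy"),
               business_id = c("id-1", "id-2"))
  } else if (nm == "Broken") {
    stop("page did not load")
  } else {
    data.frame()
  }
}

out <- lookup_all(c("Nokia", "No Such Firm", "Broken"), fake_lookup)
print(out$found)
print(out$missed)
```

Because every row of a results page comes back in one data frame, all the "Nokias" are kept, and the `missed` vector gives a list of names to retry later with a cleaned-up spelling.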
2) From the API
- On the page [login to view URL] I can type "Nokia" under "name".
- When I click "try it out", I receive a JSON response listing the Nokia companies (see Img3 for what it looks like).
- Using jsonlite, I turn this response into a table of information.
Problems I can't solve:
- Looping through my company list using this page.
- The JSON includes a URL to additional info for each company, which should include the name change; following that URL is the part I have not managed.
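The two API steps above could be sketched in R roughly as follows. Everything specific here is an assumption to be checked against the real API: `base_url` is a placeholder for the elided endpoint, the `?name=` query parameter mirrors the "name" field on the test page, and the response fields `results` and `detailsUri` are guesses at how the JSON is laid out. The function names are hypothetical. The `fetch` argument defaults to `jsonlite::fromJSON`, which accepts a URL, and can be swapped for a stub when testing offline.

```r
# Search the API for one name; returns the table of hits or NULL.
# ASSUMPTIONS: `base_url` is a placeholder, and the response is assumed
# to hold its hits in a `results` element.
search_api <- function(name, base_url, fetch = jsonlite::fromJSON) {
  url <- paste0(base_url, "?name=", utils::URLencode(name, reserved = TRUE))
  resp <- tryCatch(fetch(url), error = function(e) NULL)
  hits <- resp$results
  if (is.null(hits) || NROW(hits) == 0) return(NULL)
  hits
}

# Loop over the whole company list, pausing between requests to be polite.
# Names without hits simply get a NULL entry.
search_many <- function(company_names, base_url,
                        fetch = jsonlite::fromJSON, pause = 0.5) {
  out <- lapply(company_names, function(nm) {
    Sys.sleep(pause)
    search_api(nm, base_url, fetch)
  })
  names(out) <- company_names
  out
}

# Follow the per-company detail URL (assumed column: `detailsUri`),
# where the name-change information is expected to be.
fetch_details <- function(hits, fetch = jsonlite::fromJSON, pause = 0.2) {
  lapply(hits$detailsUri, function(u) {
    Sys.sleep(pause)
    tryCatch(fetch(u), error = function(e) NULL)
  })
}

# Usage (replace the placeholder with the real endpoint):
# firms <- c("Nokia", "Kone")
# hits <- search_many(firms, base_url = "<API endpoint>")
# nokia_details <- fetch_details(hits[["Nokia"]])
```

Wrapping each request in `tryCatch` means a failed or empty lookup yields `NULL` rather than stopping a run over several thousand names.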
Other problems:
- Some firms can't be found because of typos, or because the name includes a company form such as "Ltd" or "Oy" that adds nothing and hinders the search. I tried simply removing those forms, but was wondering whether one could also experiment with substring searches or something similar.
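One possible approach to the typo and company-form problem, sketched in base R: normalise the names first, then pick the closest candidate by edit distance with `utils::adist`. The helper names `clean_name` and `best_match`, the list of company forms, and the distance threshold of 2 are all illustrative assumptions to tune against the real list.

```r
# Lower-case a name, drop punctuation and stand-alone company forms.
# The list of forms is illustrative only; extend it as needed.
clean_name <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("(^|\\s)(oyj|oy|abp|ab|ltd|ky)(?=\\s|$)", " ", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x))
}

# Return the candidate whose cleaned name is closest to the cleaned query,
# or NA if even the closest one is more than `max_dist` edits away.
best_match <- function(query, candidates, max_dist = 2) {
  d <- utils::adist(clean_name(query), clean_name(candidates))
  i <- which.min(d)
  if (length(i) == 0 || d[i] > max_dist) NA_character_ else candidates[i]
}

print(clean_name(c("Nokia Oyj", "Oy Nokia Ab", "NOKIA, Ltd.")))
print(best_match("Nokai Ltd", c("Nokia Oyj", "Kone Oyj")))
```

A fixed threshold is crude for very short names, where two edits can turn one company into another, so scaling `max_dist` with the name length may work better. Base R's `agrep` is an alternative for approximate substring searches.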
Attached:
- the list of firms as "[login to view URL]" with 7744 names in it
- Images of the search outcomes
- my R code so far in
Hi,
This is Santhosh from India. I am a Business Management graduate with a Computer Science engineering background, and I am passionate about data, mathematics, and technology.
I have worked on many data analytics projects involving SAS and R, as well as projects involving Excel (pivot tables, macros) and VBA.
I believe I have enough expertise in R to complete this project to your requirements, and I am sure you will be 100% satisfied with my work. Please get in touch. Looking forward to hearing from you.
Thanks & Regards,
Santhosh
€155 EUR in 5 days
7 freelancers are bidding on average €150 EUR for this job
Hi sir, I am Hasan Jack, and I can help you scrape company information from the website; I have built 300+ scrapers so far.
Looking forward to hearing from you.
Best Regards,
Hasan Jack