Hello
I need a scraper for [login to view URL]
Go on the website and have a look so that the below requirements will be clear to you.
It must take three input parameters from a file:
settore (category), provincia (province), localita (location)
It must read the inputs from the file sequentially, one row at a time.
Note that localita or provincia can be blank (one at a time, not both).
Example:
settore,provincia,localita
finance,rome
commerce,rome
finance,,milan
commerce,,milan
and so on.
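To make the input format concrete, here is a minimal Python sketch of how such a file could be read. The names `read_search_terms` and `FIELDS` are made up for illustration, and the sketch assumes the first row is always the header and that short rows (like `finance,rome`) simply omit the trailing blank column.

```python
import csv

# Column order taken from the example header above.
FIELDS = ("settore", "provincia", "localita")


def read_search_terms(lines):
    """Yield one dict per input row, in file order.

    `lines` is any iterable of text lines (an open file works).
    Missing trailing columns are filled with empty strings.
    """
    reader = csv.reader(lines)
    next(reader, None)  # assumption: the first row is the header
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip empty lines
        cells = [cell.strip() for cell in row] + [""] * len(FIELDS)
        term = dict(zip(FIELDS, cells[:len(FIELDS)]))
        # provincia or localita may be blank, but not both.
        if not term["provincia"] and not term["localita"]:
            raise ValueError(f"row has neither provincia nor localita: {row}")
        yield term
```

Because this is a generator, the scraper can finish all the results for one row before it pulls the next one from the file.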
The website is structured in two levels. The search results are links to a page with the company details, and on that page there is another link called "mini dossier". The scraper must follow that link and get all the info on that page. The only info I need is what is on this "mini dossier" page.
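As a rough illustration of the second level, this Python sketch finds a link by its visible text using only the standard library. It assumes the "mini dossier" link is a plain `<a href="...">` whose text contains those words; the real markup has to be checked on the site. `LinkByTextFinder` and `find_mini_dossier_link` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkByTextFinder(HTMLParser):
    """Collect the href of every <a> whose text contains `wanted_text`."""

    def __init__(self, wanted_text):
        super().__init__()
        self.wanted = wanted_text.lower()
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text pieces seen inside that <a>
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace and case before comparing.
            text = " ".join("".join(self._text).split()).lower()
            if self.wanted in text:
                self.matches.append(self._href)
            self._href = None


def find_mini_dossier_link(html, base_url):
    """Return the absolute URL of the first matching link, or None."""
    finder = LinkByTextFinder("mini dossier")
    finder.feed(html)
    return urljoin(base_url, finder.matches[0]) if finder.matches else None
```

A return value of None means the company page has no such link, so the scraper can skip that record instead of failing.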
The output must be a CSV file with | as the separator, formatted so that the fields always line up.
By that I mean that not all the records have the same info, so when a piece of info is missing the corresponding field is left blank.
This way all the records have the same fields and the data is easier to manipulate.
The output will have these fields
settore (from the search term)
provincia (from the search term)
localita (from the search term)
Name
Address
Tel
Fax
Web
Email
Classificazione
Codice Fiscale/Iscr. Reg. Imprese
(the above ones from the mini dossier page)
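The fixed-column output described above could be produced along these lines in Python. This is only a sketch: `write_records` and `OUTPUT_FIELDS` are illustrative names, and it assumes each scraped record is a dict keyed by the field names listed above, with missing info simply absent from the dict.

```python
import csv

# Field order exactly as listed in this brief.
OUTPUT_FIELDS = [
    "settore", "provincia", "localita",
    "Name", "Address", "Tel", "Fax", "Web", "Email",
    "Classificazione", "Codice Fiscale/Iscr. Reg. Imprese",
]


def write_records(records, out):
    """Write records to `out` as |-separated CSV with a fixed column set."""
    writer = csv.DictWriter(
        out,
        fieldnames=OUTPUT_FIELDS,
        delimiter="|",
        restval="",             # missing fields become blank columns
        extrasaction="ignore",  # drop anything not in OUTPUT_FIELDS
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
```

With `restval=""` every row has the same number of separators, so the columns stay lined up even when a company has no fax or website.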
The scraper must recognise when it has finished with one input and needs to move on to the next input in the input file.
The scraper should also have a way to limit the crawling speed, for example 5 requests per second, or per minute, etc.
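One simple way to get such a limit is to enforce a minimum gap between requests. The Python sketch below does that; `RateLimiter` is a hypothetical helper, not part of any library, and the `clock` and `sleep` parameters exist only so it can be tested without real waiting.

```python
import time


class RateLimiter:
    """Allow at most `max_requests` per `per_seconds` by spacing calls evenly."""

    def __init__(self, max_requests, per_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = per_seconds / max_requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed, then reserve its slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

For example, `RateLimiter(5, 1)` gives 5 requests per second and `RateLimiter(5, 60)` gives 5 per minute; the scraper would call `wait()` right before every page fetch.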
I would like a scraper that runs on Windows like a normal application, but I'm open to Linux applications etc.
The script must be robust, e.g. it must not crash if the website returns an error or something similar.
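As a sketch of what "robust" could mean in practice, the Python function below retries a failed request a few times and returns None instead of raising, so one bad page does not stop the whole run. It uses only the standard library; the name `fetch`, the retry count, and the backoff values are assumptions chosen for illustration.

```python
import logging
import time
import urllib.request


def fetch(url, retries=3, backoff=2.0,
          opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the page body as bytes, or None if every attempt fails."""
    for attempt in range(1, retries + 1):
        try:
            with opener(url, timeout=30) as response:
                return response.read()
        except OSError as exc:
            # urllib's URLError and HTTPError, timeouts and connection
            # errors are all subclasses of OSError.
            logging.warning("attempt %d/%d failed for %s: %s",
                            attempt, retries, url, exc)
            if attempt < retries:
                sleep(backoff * attempt)  # wait a little longer each time
    return None
```

The caller checks for None, logs the skipped record, and continues with the next result or the next input row.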
I will escrow 30% before the demo, and once you have a demo that works well I'll escrow the rest.
I'll release only when it is all working fine
The demo must be able to fetch 150 records so that I can test all the functions
The job is actually easier than it looks.
This is a small project, so keep the bids low (below 100).
Thanks and best regards