We use a crawling technology to find all relevant content on specific sites. We need someone who can help us configure those crawlers using regular expressions so that:
1) The crawlers find ALL RELEVANT content. That means following all relevant links while skipping the irrelevant ones.
2) The crawlers extract all relevant metadata from a specific HTML page or URL. We already have a system for this too, but we need the regular expressions from you.
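To give applicants a rough idea of the two tasks above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the domain example.com, the URL paths, the skipped file types and the helper names should_follow and extract_title are hypothetical, and the real patterns will depend on each target site. Our crawlers use the .NET Regex engine, so Python's re module only approximates the behaviour.

```python
import re

# Hypothetical rule 1: links worth following (assumed URL scheme).
FOLLOW = re.compile(
    r"^https?://example\.com/(?:news|articles)/[^?#]+$",
    re.IGNORECASE,
)

# Hypothetical rule 2: links to skip even if they pass the FOLLOW rule.
SKIP = re.compile(
    r"/(?:login|print|share)(?:/|$)|\.(?:jpg|png|css|js)$",
    re.IGNORECASE,
)

# Hypothetical metadata rule: capture the page title in a named group.
TITLE = re.compile(
    r"<title[^>]*>\s*(?P<title>.*?)\s*</title>",
    re.IGNORECASE | re.DOTALL,
)


def should_follow(url):
    """Return True when the URL matches FOLLOW and does not match SKIP."""
    return bool(FOLLOW.search(url)) and not SKIP.search(url)


def extract_title(html):
    """Return the trimmed text of the <title> element, or None if absent."""
    match = TITLE.search(html)
    return match.group("title") if match else None
```

For example, should_follow("https://example.com/news/2024/story-1") is accepted, while the same site's /news/login page and any .jpg link are rejected. Named groups such as (?P<title>...) keep the extracted metadata fields easy to identify; .NET writes the same idea as (?<title>...).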
We will do this in several countries around the world, so if you perform well here there will be a lot of work in the months and years to come.
Applicants for this job will be invited to take a short test before they are selected.
The test has 10 Regular Expression questions and 1 open question, and it takes between 30 minutes and 1 hour. Expressions will be tested with the .NET Regex engine, with the Case Insensitive and Single Line options enabled.
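As a rough illustration of what those two options change, the sketch below uses Python's re module, whose re.IGNORECASE and re.DOTALL flags behave much like .NET's RegexOptions.IgnoreCase and RegexOptions.Singleline. This is an approximation for illustration only, since the test itself runs on the .NET engine, and the HTML snippet and pattern are made up.

```python
import re

# Made-up snippet: upper-case tag names and a value that spans two lines.
html = '<DIV class="desc">Line one\nline two</DIV>'
pattern = r'<div class="desc">(?P<desc>.*?)</div>'

# No options: the lower-case pattern does not match the upper-case tags.
no_options = re.search(pattern, html)

# Case Insensitive only: the tags match, but "." still stops at the newline.
ignore_case_only = re.search(pattern, html, re.IGNORECASE)

# Case Insensitive + Single Line: "." also matches the newline character.
both_options = re.search(pattern, html, re.IGNORECASE | re.DOTALL)

print(no_options)                  # None
print(ignore_case_only)            # None
print(both_options.group("desc"))  # the two-line text between the tags
```

In practice this means a greedy .* can run across the whole page when Single Line is on, so lazy quantifiers such as .*? or negated character classes such as [^<]* are usually the safer choice.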
We will provide a tool for testing the Regular Expressions during the test.
We are using ASP.NET, so please keep that in mind to some extent. Take a look at the attached file.