I am looking for someone to create an advanced search engine where, instead of a keyword, the user enters the URL of a website or the URLs of several similar websites. The output of the search engine is a listing of all other websites that are similar, ranked by level of similarity.

I expect that a basic crawler will be used and that the difficult part of the project will be building the engine that predicts similarity. All of the input URLs will be within a specific industry sector that has roughly 500,000 websites associated with it, so the trick will be first finding those 500,000 websites and then predicting similarity among them.

I have many ideas, but I need somebody who has done similar projects and who is a Perl and MySQL expert. I need the code to be structured into logical groups that we can discuss, as I want to reuse the code for several different ongoing projects. The code must be installed on my host and be fully documented.

Please respond with your level of experience in this type of work and point me to similar projects that you have done. Please also include any questions that you have about the project.
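To make the expected output concrete, here is a minimal sketch of what "ranked by level of similarity" could look like. It is an illustration only, not the required design: it assumes the crawler has already fetched page text, it uses cosine similarity over raw word counts (my choice for the example; a real engine would likely need better term weighting and more signals), and the URLs, sample text, and function names `term_vector`, `cosine`, and `rank_similar` are all hypothetical. It assumes Perl 5.10 or later for the `//` operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: site URL => text already fetched by a crawler.
my %pages = (
    'http://seed.example'  => 'discount garden tools and lawn mowers',
    'http://alpha.example' => 'garden tools lawn mowers and hedge trimmers',
    'http://beta.example'  => 'tax advice for small business owners',
);

# Build a term-frequency vector (word => count) from raw text.
sub term_vector {
    my ($text) = @_;
    my %tf;
    $tf{ lc $_ }++ for $text =~ /(\w+)/g;
    return \%tf;
}

# Cosine similarity between two term-frequency vectors (0 = nothing shared).
sub cosine {
    my ($u, $v) = @_;
    my ($dot, $nu, $nv) = (0, 0, 0);
    $dot += $u->{$_} * ($v->{$_} // 0) for keys %$u;
    $nu  += $_ ** 2 for values %$u;
    $nv  += $_ ** 2 for values %$v;
    return 0 unless $nu && $nv;
    return $dot / sqrt($nu * $nv);
}

# Rank every non-seed site against the combined text of the seed URLs.
# Returns a list of [url, score] pairs, highest score first.
sub rank_similar {
    my ($pages, @seeds) = @_;
    my %is_seed  = map { $_ => 1 } @seeds;
    my $seed_vec = term_vector(join ' ', map { $pages->{$_} } @seeds);

    my %score;
    for my $url (keys %$pages) {
        next if $is_seed{$url};
        $score{$url} = cosine($seed_vec, term_vector($pages->{$url}));
    }
    return map  { [ $_, $score{$_} ] }
           sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
}

# Print "score  url" for each candidate, most similar first.
printf "%.3f  %s\n", $_->[1], $_->[0]
    for rank_similar(\%pages, 'http://seed.example');
```

With the sample data, the site that shares vocabulary with the seed is listed ahead of the unrelated one. Passing several seed URLs to `rank_similar` pools their text into one query vector, which is one simple way to handle the "multiple URLs" input case.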
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
AMS Computer is my host. It runs on a Linux platform, and the code must use only Perl and MySQL unless otherwise agreed. Any use of GPL code must be agreed to before the project starts. The browser interface must work in any browser.