RSS cataloger

$100-500 USD

Completed
Posted over 16 years ago

Paid on delivery
We seek a developer or development team to implement part of a technology proof-of-concept. This element will scrape ~1,000 RSS feeds into a set of MySQL databases. Bidders should be familiar with Python and/or Ruby, MySQL, and RSS/XML.

## Deliverables

*1. Overview*

We seek a developer or development team to implement part of a technology proof-of-concept. This element will scrape ~1,000 RSS feeds into a set of MySQL databases.

*2. Specific Functionality*

* Site list: Create a directory of URLs in a MySQL database, including fields for time of last update and update interval. This list should be populated from the Bloglines top 1,000 most popular RSS feeds <[login to view URL]>, or a similar source if one is available. Each URL/record should also be assigned a unique ID.
* Update scheduler: A procedure/application that queues feeds to be checked for updates into a MySQL database, taking the update interval as an input as follows: when the time elapsed since a URL was last updated exceeds its update interval, that URL ID should be added to the bottom of the queue. There should also be a throttle variable that limits the total number of requests per day.
* Update checker: A robot that takes the next URL in the MySQL schedule DB and checks that URL for any updates. It should first check for changes to the XML LastUpdateDate field; if the feed does not use that field, it should compare the most recent entry for that URL to the one stored to see whether it has been updated.
* Parser: Upon detection of an update, the parser should dump **only** the new/updated XML items into another MySQL database, including the full Description field for each item if available. In addition to the XML fields, this DB will include a unique ID for each item (i.e. blog post/article), a time, and a link to the unique URL ID it was sourced from.
* Summarizer: For each item (i.e. blog post/article) parsed, this will provide a summary word count of the Title and Description (i.e. the number of times each word appears). This will be stored in another MySQL database.

These modules do not necessarily reflect an architecture preference, but simply the necessary features of the application. It is fine if all of the functions exist as procedures within a single application, for instance.

Thus, the application will use 4 different MySQL databases (which should be created as part of this project):

| **Table** | **Fields** | **Notes** |
| --- | --- | --- |
| URL directory | URL, ID, lastUpdated, updateInterval | |
| Update scheduler | URLID, updateRequestID | |
| RSS entry database | Fields for each XML field parsed, ID, sourceURL, timeStamp | |
| RSS content summary | Word, Count, RSSEntryID | links to RSS entry DB |

*3. Implementation Requirements*

The application should be written in either Python or Ruby and MySQL. It is fine (and indeed encouraged) to use [Feed Parser][1] for initial parsing of the URLs (although this does not parse description fields, which is a key requirement). The application will run on Amazon Web Services (AWS), which is a Red Hat Linux (Fedora 6) x86 environment.

*4. Deadline and deliverables*

All code must be clearly commented, and source code, scripts, and databases should be delivered as a single zipped tarball. The deadline for receipt of a full, initial working program will be 5 business days from the start of the project. After testing by us, which will take 2-10 business days, revisions will be noted and final code will be due 5 business days later.
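For bidders, the update scheduler rule and the summarizer word count described under Specific Functionality might be sketched roughly as below. This is only an illustration under stated assumptions, not a required design: the function names `feeds_due` and `word_counts` are hypothetical, feeds are held in plain dictionaries rather than MySQL rows, times are in seconds, the oldest feeds are queued first, feeds already in the queue are skipped, and HTML tags in descriptions are stripped with a simple regular expression. Only the field names `lastUpdated` and `updateInterval` come from the table above.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")       # crude HTML tag stripper (assumption)
WORD_RE = re.compile(r"[a-z0-9']+")   # what counts as a "word" (assumption)


def feeds_due(feeds, queued_ids, now, requests_today, daily_limit):
    """Return the feed IDs to append to the bottom of the update queue.

    feeds: iterable of dicts with keys "id", "lastUpdated", "updateInterval",
    mirroring the URL directory fields (times in seconds, an assumption).
    """
    due = []
    budget = daily_limit - requests_today  # throttle: requests left today
    # Oldest feeds first; this ordering is an assumption, not in the spec.
    for feed in sorted(feeds, key=lambda f: f["lastUpdated"]):
        if budget <= 0:
            break
        if feed["id"] in queued_ids:
            continue  # already waiting in the queue
        # Queue the feed once the elapsed time exceeds its update interval.
        if now - feed["lastUpdated"] > feed["updateInterval"]:
            due.append(feed["id"])
            budget -= 1
    return due


def word_counts(title, description):
    """Count how often each word appears in an item's title and description."""
    text = TAG_RE.sub(" ", f"{title} {description}").lower()
    return Counter(WORD_RE.findall(text))


feeds = [
    {"id": 1, "lastUpdated": 0, "updateInterval": 3600},
    {"id": 2, "lastUpdated": 9000, "updateInterval": 3600},
]
print(feeds_due(feeds, set(), now=10000, requests_today=0, daily_limit=500))  # [1]
print(word_counts("RSS feeds", "<p>Feeds and more feeds</p>")["feeds"])       # 3
```

In a MySQL-backed version, each returned ID would become a row in the update scheduler table, and each (word, count) pair would become a row in the RSS content summary table keyed by RSSEntryID.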
Project ID: 3737784

About the project

3 proposals
Remote project
Active 16 yrs ago

Awarded to:
See private message.
$425 USD in 14 days
5.0 (10 reviews)
3.9
3 freelancers are bidding on average $361 USD for this job
See private message.
$233.75 USD in 14 days
5.0 (79 reviews)
6.6
See private message.
$425 USD in 14 days
5.0 (20 reviews)
4.9

About the client

United States
5.0
1
Member since Jan 27, 2008
