Write some Software

Cancelled Posted Feb 25, 2014 Paid on delivery

We are mining a database of English articles and need a Perl programmer to assist us in generating some simple Perl scripts for this purpose.

The following task is one of many upcoming projects, and we are looking to hire somebody with a longer-term working relationship in mind.

The task:

We have a large tab-delimited file, the 6th column of which contains the data of interest. The 6th column of each row holds a set of comma-separated English words that we have already reduced to their dictionary form. We require a script that loops through all the sets of words and generates the following statistics as output.

1. A list of all unique words present in the entire column. Each unique word should be listed with its term frequency (number of occurrences of the word), term rate (term frequency of the word divided by the total number of words in the file), document frequency (number of rows the word appears in) and document rate (document frequency of the word divided by the number of rows).

2. All of the above statistics, but this time for bi-grams: bi-gram frequency, bi-gram rate, and document frequency and rate for bi-grams. Bi-grams are ordered pairs of neighboring words. For example, in the previous sentence, the bi-grams would be (all,of), (of,the), (the,above), etc., but NOT (of,all).
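
To make the expected numbers concrete, here is a rough reference sketch of the two statistics sets. It is written in Python purely for illustration and is not the Perl deliverable. The names `ngram_stats` and `WORD_COLUMN` are hypothetical. The sketch also makes two assumptions the description above does not spell out: bi-grams are formed within a row only (never across rows), and the bi-gram rate is the bi-gram frequency divided by the total number of bi-grams in the file.

```python
import csv
import io
from collections import Counter

WORD_COLUMN = 5  # zero-based index of the 6th column (hypothetical constant)


def ngram_stats(rows, n):
    """Return {ngram: (frequency, rate, doc_frequency, doc_rate)}.

    rows is a list of already-split rows; n=1 gives words, n=2 gives bi-grams.
    Assumption: n-grams never span two rows.
    """
    freq = Counter()      # total occurrences of each n-gram
    doc_freq = Counter()  # number of rows containing each n-gram
    num_rows = 0
    for row in rows:
        num_rows += 1
        cell = row[WORD_COLUMN] if len(row) > WORD_COLUMN else ""
        words = [w.strip() for w in cell.split(",") if w.strip()]
        # Ordered neighboring n-tuples: (w0, w1), (w1, w2), ...
        grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
        freq.update(grams)
        doc_freq.update(set(grams))  # count each n-gram once per row
    total = sum(freq.values())
    return {
        gram: (count, count / total, doc_freq[gram], doc_freq[gram] / num_rows)
        for gram, count in freq.items()
    }


# Tiny two-row sample standing in for the real tab-delimited file.
sample = "a\tb\tc\td\te\tcat,sit,mat\na\tb\tc\td\te\tcat,sit\n"
reader = csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

unigrams = ngram_stats(rows, 1)
bigrams = ngram_stats(rows, 2)

for gram, stats in sorted(bigrams.items()):
    print(",".join(gram), *stats, sep="\t")
```

In this sample, "cat" occurs twice among five words and appears in both rows, so its term rate is 2/5 and its document rate is 2/2. The bi-gram (cat,sit) is counted, while (sit,cat) is not, because order matters.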

Please include both cost and duration estimates in your application. Please also include a brief sample of your previous Perl code.

Perl

Project ID: #5486715

About the project

Remote project Active Feb 25, 2014