PHP-based text processing web application.
Input (HTML web form):
Two lists (X and Y) of phrases, each phrase consisting of a series of words. The length of these lists is variable, and each phrase can contain from 1 to 10 words. The first list is the complete list of phrases, whereas the second list is a subset consisting of just the phrases which satisfied a criterion.
A third input (Z), an integer, sets the level of noise filtering applied prior to output.
Output (simple HTML):
A list of the unique 1-, 2-, and 3-word sub-phrases which appear 'Z' times in list 'X' and never appear in list 'Y'.
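The output rule above can be sketched as a small filter. This is only an illustration under stated assumptions: the function name filterSubPhrases and the sample tallies are hypothetical, the tallies for both lists are assumed to be already available as associative arrays (sub-phrase => count), and "'Z' times" is read as "at least Z times", which the description does not state explicitly.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: keep sub-phrases tallied at least $z times in X
 * that have no entry at all in Y.
 *
 * @param array<string,int> $tallyX sub-phrase => count for list X
 * @param array<string,int> $tallyY sub-phrase => count for list Y
 * @return string[] sub-phrases to show in the output
 */
function filterSubPhrases(array $tallyX, array $tallyY, int $z): array
{
    $result = [];
    foreach ($tallyX as $phrase => $count) {
        // Assumption: Z is a minimum count, not an exact match.
        if ($count >= $z && !isset($tallyY[$phrase])) {
            // PHP converts numeric-looking keys such as "42" to integers,
            // so cast back to a string before returning.
            $result[] = (string) $phrase;
        }
    }
    return $result;
}

// Made-up sample tallies, for illustration only.
$tallyX = ['red' => 3, 'red shoes' => 2, 'blue' => 2, 'cheap' => 1];
$tallyY = ['blue' => 1];
$kept = filterSubPhrases($tallyX, $tallyY, 2);
```

With these sample values, 'blue' is dropped because it occurs in Y and 'cheap' is dropped because it falls below Z. If Z is meant as an exact count, the comparison would change from >= to ===.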
Business logic:
Each line item in list 'X' is parsed into all possible 1-, 2-, and 3-word sub-phrases (preserving word order). These results are recorded in an associative array along with a tally, which is incremented for each unique occurrence of that sub-phrase in the entire list. A counted sub-phrase must lie entirely within a single line item; word sequences that span two consecutive line items are not counted.
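A minimal sketch of the parsing and tallying step might look like the following. The function name tallySubPhrases is hypothetical, words are assumed to be separated by whitespace, and every occurrence is counted, including repeats within the same line item (the description leaves open whether repeats within one line should count once or several times).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: tally every 1- to $maxWords-word sub-phrase,
 * one line item at a time so no sub-phrase spans two line items.
 *
 * @param string[] $lines one phrase per element
 * @return array<string,int> sub-phrase => number of occurrences
 */
function tallySubPhrases(array $lines, int $maxWords = 3): array
{
    $tally = [];
    foreach ($lines as $line) {
        // Assumption: words are delimited by runs of whitespace.
        $words = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY) ?: [];
        $n = count($words);
        for ($start = 0; $start < $n; $start++) {
            // Take windows of 1..$maxWords words that fit inside this line.
            for ($len = 1; $len <= $maxWords && $start + $len <= $n; $len++) {
                $phrase = implode(' ', array_slice($words, $start, $len));
                $tally[$phrase] = ($tally[$phrase] ?? 0) + 1;
            }
        }
    }
    return $tally;
}

// Made-up sample input, for illustration only.
$tally = tallySubPhrases(['red leather shoes', 'red shoes']);
```

Because each line item is split and windowed on its own, a sequence such as "shoes red", which would only exist across the boundary between the two sample lines, never enters the tally.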