Data extraction from 3 different sources: PDF, HTML, and Word Files

Completed Posted Feb 2, 2014 Paid on delivery

I need an interface developed that allows a user to upload a file in PDF, Word (DOC or DOCX), or HTML format. Once a file is uploaded, the PHP page will extract information from it and store that information in a comma-separated CSV file. Even though the three file types are in different formats, the same type of information will be extracted from each. This will create consistent output in the CSV file.
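The upload-and-export flow described above can be sketched roughly as follows. This is a minimal Python illustration of the logic only (the project itself asks for a PHP page); the column names, the `detect_format` helper, and the `rows_to_csv` helper are hypothetical names chosen for this sketch and are not part of the posting.

```python
import csv
import io
from pathlib import Path

# Hypothetical column names, one per field requested in the posting.
COLUMNS = [
    "class_name", "armband", "dog_name", "registration_no", "date_of_birth",
    "placement", "breeders", "sire_dam", "place_of_birth", "owners", "agent",
]

# Map accepted file extensions to the three source formats.
SUPPORTED = {".pdf": "pdf", ".doc": "word", ".docx": "word",
             ".html": "html", ".htm": "html"}


def detect_format(filename: str) -> str:
    """Return 'pdf', 'word' or 'html' for an uploaded file name."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"unsupported file type: {suffix or filename}")
    return SUPPORTED[suffix]


def rows_to_csv(rows: list) -> str:
    """Serialize extracted records (dicts) with one fixed column order."""
    buffer = io.StringIO()
    # restval="" leaves optional fields (placement, agent) as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Using a CSV writer rather than joining strings with commas matters here, because values such as multiple owner names contain commas themselves and have to be quoted to keep the columns aligned.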

I attached a copy of the HTML, Word, and PDF files to the project for viewing. The information that needs to be extracted from each document is the following:

● Class name: e.g. Sr. Puppy (9-12 Months) – Male

● Armband number – 2 to 4 digits

● Dog name

● Registration number: alphanumeric or “Listed”

● Date of Birth

● Class Placement: may be blank

● Breeder name: can be more than one name

● Sire and Dam: (parents) format is name of sire X name of dam

● Place of birth: Canada or Elsewhere

● Owner name: can be more than 1 person

● Agent name: optional
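The constraints stated in the list above (armband length, registration format, place of birth) could be checked after extraction with something like the following Python sketch. The `check_record` function and the dictionary keys are hypothetical, and the patterns assume the constraints are exactly as worded in the list.

```python
import re

ARMBAND_RE = re.compile(r"\d{2,4}")            # 2 to 4 digits
REGISTRATION_RE = re.compile(r"[A-Za-z0-9]+")  # alphanumeric; also accepts "Listed"
PLACES = ("Canada", "Elsewhere")


def check_record(record: dict) -> list:
    """Return a list of problems found in one extracted record."""
    problems = []
    if not ARMBAND_RE.fullmatch(record.get("armband", "")):
        problems.append("armband must be 2 to 4 digits")
    if not REGISTRATION_RE.fullmatch(record.get("registration_no", "")):
        problems.append("registration number must be alphanumeric or 'Listed'")
    if record.get("place_of_birth") not in PLACES:
        problems.append("place of birth must be Canada or Elsewhere")
    return problems
```

Class placement and agent name are not checked because the list marks them as blank or optional.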

Using the Word document format as an example, the following shows what is to be extracted:

(Section from Word Document)

Sr. Puppy (9-12 Months) - Male

102 GRASSRIDGE I AM A ROCK, AE499458, 04-Mar-2013

1ST Breeders: Denise Cranna. Ch. Malhaven Skyrockets In Flight x Ch. Grassridge Heavenly Grace. Canada. Owner: Karen IBBITSON, Denise CRANNA. Agent: Ingrid WINKLER

Information to be extracted:

Class Name: Sr. Puppy (9-12 Months) – Male

Armband number: 102

Dog name: Grassridge I Am A Rock

Registration No: AE499458

DOB: 04-March-2013

Class Placement: 1st

Breeder Name: Denise Cranna

Sire & Dam: Ch. Malhaven Skyrockets In Flight x Ch. Grassridge Heavenly Grace

Place of Birth: Canada

Owner: Karen Ibbitson, Denise Cranna

Agent: Ingrid Winkler
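As a rough illustration of how the sample section maps onto the fields listed above, here is a Python sketch that parses the three text lines with regular expressions. It assumes every entry follows the layout of this one sample (class line, then armband line, then details line); the `parse_entry` function, the dictionary keys, and both patterns are assumptions made for illustration, and real catalogues may need looser patterns.

```python
import re
from datetime import datetime

# "102 GRASSRIDGE I AM A ROCK, AE499458, 04-Mar-2013"
ENTRY_RE = re.compile(
    r"^(\d{2,4})\s+(.+?),\s*([A-Za-z0-9]+),\s*(\d{2}-[A-Za-z]{3}-\d{4})$"
)
# "1ST Breeders: ... . Sire x Dam. Canada. Owner: ... . Agent: ..."
DETAIL_RE = re.compile(
    r"^(?:(\S+)\s+)?Breeders?:\s*(.+?)\.\s+"
    r"(.+?\s+x\s+.+?)\.\s+(Canada|Elsewhere)\.\s+"
    r"Owners?:\s*(.+?)(?:\.\s+Agent:\s*(.+))?$"
)


def parse_entry(class_line: str, entry_line: str, detail_line: str) -> dict:
    """Turn the three text lines of one catalogue entry into a record."""
    entry = ENTRY_RE.match(entry_line.strip())
    detail = DETAIL_RE.match(detail_line.strip())
    if not entry or not detail:
        raise ValueError("lines do not match the sample layout")
    armband, dog, reg_no, dob = entry.groups()
    placement, breeders, sire_dam, birthplace, owners, agent = detail.groups()
    born = datetime.strptime(dob, "%d-%b-%Y")  # e.g. 04-Mar-2013
    return {
        "class_name": class_line.strip(),
        "armband": armband,
        # title() is a simplification; names like "McDONALD" would need care.
        "dog_name": dog.title(),
        "registration_no": reg_no,
        "date_of_birth": born.strftime("%d-%B-%Y"),
        "placement": (placement or "").lower(),  # may be blank
        "breeders": breeders,
        "sire_dam": sire_dam,
        "place_of_birth": birthplace,
        "owners": owners.title(),
        "agent": (agent or "").title(),          # optional
    }
```

The same patterns should carry over to PHP's `preg_match` with little change, although that port is not shown here. Month names depend on the process locale; the default locale yields English names such as "March".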

Data Mining, Data Processing, PHP

Project ID: #5386847

About the project

9 proposals Remote project Active Feb 17, 2014

Awarded to:

techmartsol

Hello, We have gone through the scope of work and would be happy to provide complete solution on this application that will extract data from uploaded file and future support also if required. Let's take it to th...

$149 USD in 3 days
(41 Reviews)
6.0

9 freelancers are bidding on average $311 for this job

rajeshsonisl

Hello, With 99% completion rate, 650+ successfully completed projects, and a 5.00 reputation (maximum possible, 5.0) (Yes, not even 4.99 average rating, can be verified on my profile page !!)... you can never go wro...

$1030 USD in 4 days
(499 Reviews)
7.5

programer786

Hi there, I have major experience in PHP and scrapers. I will create it for you to extract the data from those 3 formats, but there are some limitations on extraction. The given files must be formatted as you describe. Tha...

$211 USD in 7 days
(53 Reviews)
5.7

kollurnath

A proposal has not yet been provided

$155 USD in 3 days
(0 Reviews)
0.0