Data extraction from 3 different sources: PDF, HTML, and Word Files
$30-250 USD
Paid on delivery
I need an interface developed that allows a user to upload a file in PDF, Word (DOC or DOCX), or HTML format. Once a file is uploaded, the PHP page will extract information from it and store that information in a comma-separated CSV file. Although the three file types differ in format, the same fields will be extracted from each, which will produce consistent output in the CSV file.
I have attached a copy of the HTML, Word, and PDF files to the project for viewing. The following information needs to be extracted from each document:
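As a rough illustration of the upload step, the sketch below maps an uploaded filename to one of the three source types so that the right extractor can be chosen. It is written in Python for brevity, although the posting asks for a PHP page. The names `SUPPORTED` and `detect_format` are hypothetical, and accepting `.htm` alongside `.html` is an assumption.

```python
from pathlib import Path

# Hypothetical mapping from file extension to extractor type.
# ".htm" is an assumed alias; the posting only names PDF, Doc/Docx and HTML.
SUPPORTED = {
    ".pdf": "pdf",
    ".doc": "word",
    ".docx": "word",
    ".html": "html",
    ".htm": "html",
}


def detect_format(filename: str) -> str:
    """Return 'pdf', 'word' or 'html' for an uploaded filename."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return SUPPORTED[suffix]
```

A real upload handler would likely also check the file contents, since an extension alone can be wrong.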
● Class name: e.g. Sr. Puppy (9-12 Months) – Male
● Armband number – 2 to 4 digits
● Dog name
● Registration number: alphanumeric or “Listed”
● Date of Birth
● Class Placement: may be blank
● Breeder name: can be more than one name
● Sire and Dam: (parents) format is name of sire X name of dam
● Place of birth: Canada or Elsewhere
● Owner name: can be more than 1 person
● Agent name: optional
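The field list above could map onto one CSV column each. The following is a minimal sketch in Python (the posting itself asks for PHP); the column labels in `FIELDS` and the helper `records_to_csv` are assumptions made for illustration. Using a CSV writer rather than joining strings by hand matters here because breeder and owner values can contain commas.

```python
import csv
import io

# Assumed column labels, one per field in the list above.
FIELDS = [
    "Class name", "Armband number", "Dog name", "Registration number",
    "Date of birth", "Class placement", "Breeder name", "Sire and dam",
    "Place of birth", "Owner name", "Agent name",
]


def records_to_csv(records):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    # restval="" leaves optional fields (placement, agent) blank when absent.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="",
                            lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # values containing commas are quoted
    return buffer.getvalue()
```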
Using the Word document format as an example, the following shows what is to be extracted:
(Section from Word Document)
Sr. Puppy (9-12 Months) - Male
102 GRASSRIDGE I AM A ROCK, AE499458, 04-Mar-2013
1ST Breeders: Denise Cranna. Ch. Malhaven Skyrockets In Flight x Ch. Grassridge Heavenly Grace. Canada. Owner: Karen IBBITSON, Denise CRANNA. Agent: Ingrid WINKLER
Information to be extracted:
Class Name: Sr. Puppy (9-12 Months) – Male
Armband number: 102
Dog name: Grassridge I Am A Rock
Registration No: AE499458
DOB: 04-March-2013
Class Placement: 1st
Breeder Name: Denise Cranna
Sire & Dam: Ch. Malhaven Skyrockets In Flight x Ch. Grassridge Heavenly Grace
Place of Birth: Canada
Owner: Karen Ibbitson, Denise Cranna
Agent: Ingrid Winkler
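The mapping from the sample section to the extracted fields can be sketched with two regular expressions, assuming the three lines have already been pulled out of the document as plain text. This is an illustrative Python sketch, not the PHP deliverable; `ENTRY_RE`, `DETAIL_RE` and `parse_entry` are hypothetical names, and the patterns are fitted only to the one sample entry shown above. The use of `str.title()` for name casing is also an assumption and will mishandle names such as "McDonald".

```python
import re
from datetime import datetime

# Line 2: armband, dog name, registration number (or "Listed"), date of birth.
ENTRY_RE = re.compile(
    r"^(\d{2,4})\s+(.+?),\s*([A-Za-z0-9]+),\s*(\d{1,2}-[A-Za-z]{3}-\d{4})\s*$"
)
# Line 3: optional placement, breeders, sire x dam, birthplace, owner, optional agent.
DETAIL_RE = re.compile(
    r"^(?:(\S+)\s+)?Breeders?:\s*(.+?)\.\s+(.+? x .+?)\.\s+(Canada|Elsewhere)\."
    r"\s+Owners?:\s*(.+?)(?:\.\s+Agent:\s*(.+?))?\s*$"
)


def parse_entry(class_line, entry_line, detail_line):
    """Parse one three-line entry laid out like the Word sample above."""
    entry = ENTRY_RE.match(entry_line.strip())
    detail = DETAIL_RE.match(detail_line.strip())
    if entry is None or detail is None:
        raise ValueError("entry does not match the assumed layout")
    armband, dog, reg_no, dob = entry.groups()
    placement, breeder, sire_dam, birthplace, owner, agent = detail.groups()
    # Expand the month abbreviation, e.g. 04-Mar-2013 -> 04-March-2013.
    dob_long = datetime.strptime(dob, "%d-%b-%Y").strftime("%d-%B-%Y")
    return {
        "Class Name": class_line.strip(),
        "Armband number": armband,
        "Dog name": dog.title(),
        "Registration No": reg_no,
        "DOB": dob_long,
        "Class Placement": (placement or "").lower(),
        "Breeder Name": breeder.title(),
        "Sire & Dam": sire_dam,
        "Place of Birth": birthplace,
        "Owner": owner.title(),
        "Agent": (agent or "").title(),
    }
```

The HTML and PDF inputs would need their own text-extraction step before a parser like this could be reused, and real catalogues may contain layouts this pattern does not cover.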
Project ID: #5386847
About the project
Awarded to:
Hello, we have gone through the scope of work and would be happy to provide a complete solution for this application that will extract data from the uploaded file, along with future support if required. Let's take it to th
9 freelancers are bidding on average $311 for this job
Hello, with a 99% completion rate, 650+ successfully completed projects, and a 5.00 reputation (the maximum possible is 5.0; yes, not even a 4.99 average rating, which can be verified on my profile page!), you can never go wro
Hi there, I have major experience in PHP and scrapers. I will create this for you to extract the data from those 3 formats, but there are some limitations to the extraction. The given files will be formatted as you describe. Tha