Closed

Develop an script to generate OCR with ALTO standards for PDF and other outputs

The main scope of the project is to create a python bash or perl script that process OCR over a master folder of TIFF files or a PDF file.

The workflow will be:

1. Read a Master directory file with .Tiff files uncompressed

2. Start OCR processing

3. Generate Output files:

3.1. Output #1: Master PDF 1.4 (PDF/A-1)

3.2. Output #2: Access PDF 1.4 (PDF/A-1) - (Never bigger than 20mb.)

3.3. Output #2: ALTO XML with all the OCR information

3.4. Output #3: TXT with all the OCR text results

3.5. Output #4: TXT with all the OCR text results

4. Create XML with the info of the process

4.1 XML has to record all the processes and also record any failure on the process.

The project scope is to generate a script on bash, perl or python based on any Open-source OCR tools to do those tasks. To complete the project, the script has to be tested, checked and documented in detail. No chance to make anything different. During the development, the developer has to give us a report of the fields and values we have to work with and we will help on the definition. Can be a long term project if the results are good with much more image processing tasks.

Some valid Open-source tools that can bu used are:

- [login to view URL]

- [login to view URL]

- [login to view URL]

Reference documentation links:

- [login to view URL]

- [login to view URL]

- [login to view URL]

We can provide image samples and also give much more detail of the script to develop.

Other options can be used but need to be validated by us before start the development.

Project will be divided in two milestones, defined as here:

Milestone 1 is to complete:

- Point 1

- Point 2

- Point 3.1

- Point 3.3

Milestone 2 is to complete the project:

- Performance and correct any error of Milestone 1

- Point 3.2

- Point 3.4

- Point 3.5

- Point 4

- Documentation

- Testing

Feel free to ask any question

Skills: Python, XML, Shell Script, OCR, PDF

About the Employer:
( 26 reviews ) Marín, Spain

Project ID: #32722135

14 freelancers are bidding on average $564 for this job

(39 Reviews)
6.8
Devrits

Hi Hiring manager I am Full Stack Principal Architect,Expert Image processing,OCR Most popular spells that I use: C#, Python, .NET, WPF, WCF, VSTO, SQL, OCR, PHP, Java, React.js, Node.js, Laravel. I have more than 15 More

$750 USD in 4 days
(35 Reviews)
6.0
mannanmaan1425

Note: I am available to start right now! Dear Hiring Manager, I have read out your job description and the requirements for the job.I am a professional Artificial intelligence and machine learning programmer . I More

$250 USD in 1 day
(30 Reviews)
5.3
umairkaramat24

Hello, I read your project details and really interested in your mentioned job. I have 5+ years’ experience doing similar jobs related to these skills Python, Shell Script, PDF, XML and OCR. I think its doable job, and More

$750 USD in 8 days
(8 Reviews)
4.1
ayesha0124

Hi there, How r u? I have had a look and i am sure that i can handle this project well as i have experience in XML, PDF, OCR, Shell Script and Python. I have worked on similar projects before too. Please initiate the c More

$750 USD in 24 days
(1 Review)
4.2
(2 Reviews)
4.0
mykrsolovlo

How are you? We have AI & Data Science,Django team who are highly experienced in Machine learning and Deep Learning and can deliver products as per your requirements. We have done many real time projects like Semantic More

$700 USD in 7 days
(5 Reviews)
3.7
demvitalii3

Dear, sir. How are you? ~~~ Computer Vision Professional is here. ~~~ I've a good interest about your project as a computer vision professional who has been specializing in this field for over 8 years. I recently devel More

$700 USD in 7 days
(8 Reviews)
3.2
yinshu2020

----------------Professional OCR Expert! Best Result in Time!----------- Dear sir. I've read your project description very carefully. I've extensive experience in OCR, so I believe that I can provide excellent result i More

$500 USD in 7 days
(2 Reviews)
2.3
vladilavsuhovoy1

Hi! I am an expert Python engineer. I am familiar with Python and I have a lot of work experiences in OCR, XML, Shell Script, Python and PDF. I can start right away. I want to discuss for this project in detail. Plea More

$500 USD in 5 days
(2 Reviews)
1.5
aafreenkhan208

Hi, I'm Aafreen Khan! Hope you’re doing well. I'll complete your project in the way you'll fall in love with because I've been working as a Full Stack Web & Software developer for 5 years. I provide end to end solutio More

$500 USD in 7 days
(0 Reviews)
0.0
aiworks4u

Hello to Spain, your project sounds very interesting for me and i could already realize numerous comparable projects. I have a lot of experience with your required tasks like: - OCR (OpticalDocumentRecognition More

$251 USD in 3 days
(0 Reviews)
0.0
SyedAdeel2020

Greetings! I have reviewed your project details and as you need. I believe that I can assist you with this project. I have been working as a website developer for the past 8+ years and I have impressed a lot of my cl More

$500 USD in 7 days
(0 Reviews)
0.0
kofanovsan

Hello, Jorge! I have read your project requirements very carefully and with great interest. I am a python expert with 5 years of experience. Recently, I have performed tasks to interpret pdf documents with python. http More

$500 USD in 5 days
(0 Reviews)
0.0