PDF Data Extraction and Storage in SQL with OCR and Password Removal

₹100-500 INR / hour

Closed
Posted 12 months ago

Overview

This project aims to develop a comprehensive C# application that can read PDF files from a specified location, extract data, implement OCR techniques for image-based files, remove password protection, and store the extracted data in an SQL database. A Web API will be created to facilitate these activities.

Objectives

- Read PDF files from a specific location
- Extract data from PDF files
- Implement OCR techniques for image-based PDFs
- Remove password protection from PDF files
- Store extracted data in an SQL database
- Develop a Web API to perform these activities
- Implement a user authentication and authorization system for the Web API
- Create logging and error handling mechanisms
- Develop a monitoring system for the application

Required Libraries and Tools

- Visual Studio 2019 or later
- .NET Core 3.1 or later
- iText 7 for .NET (PDF processing)
- Tesseract OCR (OCR processing)
- [login to view URL] (Entity Framework Core for SQL)
- [login to view URL] (Web API)
- [login to view URL] (Logging)
- [login to view URL] (Health monitoring)
- IdentityServer4 (Authentication and Authorization)

Project Structure

4.1. PDFDataExtractor [login to view URL] [login to view URL]
4.2. [login to view URL] [login to view URL] [login to view URL] [login to view URL]
4.3. [login to view URL] [login to view URL] [login to view URL] [login to view URL] [login to view URL] [login to view URL] [login to view URL] [login to view URL] [login to view URL]
4.4. [login to view URL] [login to view URL] Controllers [login to view URL] [login to view URL]

Implementation Steps

5.1. Read PDF files from a specific location
Implement a method that reads PDF files from a specified location using iText 7.

5.2. Extract data from PDF files
Implement a method to extract text from PDF files using iText 7.

5.3. Implement OCR techniques for image-based PDFs
Implement a method to recognize text from image-based PDF files using Tesseract OCR.

5.4. Remove password protection from PDF files
Implement a method to remove password protection from PDF files using iText 7.

5.5. Store extracted data in an SQL database
Use Entity Framework Core to store the extracted data in an SQL database.

5.6. Develop a Web API to perform these activities
Create a Web API using .NET Core and expose endpoints to perform the data extraction and storage process.

5.7. Implement a user authentication and authorization system for the Web API
Use IdentityServer4 to create an authentication and authorization system for the Web API.

5.8. Create logging and error handling mechanisms
Implement logging using [login to view URL] and create custom error handling middleware.

5.9. Develop a monitoring system for the application
Implement health checks using Microsoft.Extensions.Diagnostics.HealthChecks.

Testing

- Unit tests for service classes
- Integration tests for Web API endpoints
- Load testing for the application

Deployment

- Deploy the Web API to a hosting provider (e.g., Azure, AWS, or Heroku)
- Implement CI/CD pipelines using Azure DevOps, GitHub Actions, or other similar tools

Documentation

Write comprehensive documentation for the project, including how to use the Web API, any limitations, and authentication/authorization details.
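
As a rough illustration of how steps 5.1 to 5.5 might fit together, the following is a minimal C# sketch of the orchestration only. The class name `PdfPipeline` and its three delegates are hypothetical and do not come from the brief: `extractText` stands in for an iText 7 based reader (which is assumed to handle any password), `runOcr` for a Tesseract wrapper, and `save` for an Entity Framework Core repository. The OCR fallback rule (use OCR when direct extraction returns no text) is likewise an assumption.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical orchestration class; the real extraction, OCR, and storage
// logic is injected so this sketch depends only on the base class library.
public sealed class PdfPipeline
{
    private readonly Func<string, string> _extractText;   // path -> text (assumed iText 7 wrapper)
    private readonly Func<string, string> _runOcr;        // path -> text (assumed Tesseract wrapper)
    private readonly Action<string, string> _save;        // (file name, text) -> persisted (assumed EF Core)

    public PdfPipeline(
        Func<string, string> extractText,
        Func<string, string> runOcr,
        Action<string, string> save)
    {
        _extractText = extractText ?? throw new ArgumentNullException(nameof(extractText));
        _runOcr = runOcr ?? throw new ArgumentNullException(nameof(runOcr));
        _save = save ?? throw new ArgumentNullException(nameof(save));
    }

    // Processes every .pdf file directly inside the folder and returns the count.
    public int ProcessFolder(string folder)
    {
        // Compare extensions case-insensitively so "A.PDF" is also picked up.
        var pdfPaths = Directory.EnumerateFiles(folder)
            .Where(p => string.Equals(Path.GetExtension(p), ".pdf",
                                      StringComparison.OrdinalIgnoreCase));

        int processed = 0;
        foreach (var path in pdfPaths)
        {
            string text = _extractText(path);

            // Assumed rule: no extractable text means an image-based PDF, so try OCR.
            if (string.IsNullOrWhiteSpace(text))
            {
                text = _runOcr(path);
            }

            _save(Path.GetFileName(path), text);
            processed++;
        }
        return processed;
    }
}
```

Passing the three operations in as delegates keeps the orchestration unit-testable without iText 7, Tesseract, or a database present, which lines up with the "unit tests for service classes" item under Testing.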
Project ID: 36356307

About the project

10 proposals
Remote project
Active 11 mos ago

10 freelancers are bidding on average ₹238 INR/hour for this job
Hello, I am a professional Python developer. My main specializations are automation, web scrapers, and bot development. I have already developed over 200 scrapers, from the simplest (for example, a competitor's price collector) to complex parsers (with authorization, captcha bypassing, rotating IPs, and more) that can collect millions of products from Amazon.
I have done web scrapers for:
- Amazon
- Instagram
- Facebook
- Google
- Twitter
- LinkedIn
- Pinterest
- Walmart
- And many others
For scraping I use:
- Python
- Requests
- BeautifulSoup
- Selenium
- Scrapy
- Pyautogui
- Undetected Chromedriver
- Rotating IPs
I can bypass:
- CloudFlare
- IP blocking
- Captcha
- Authorization required
- Other limitations
Django / PostgreSQL
For big scraping projects I usually use Django with PostgreSQL. This allows us to store information in a database for further processing and use. I also set up an administration area which allows us to check the data and set up scraper configs. If you need a professional solution in this area, I am ready to cooperate. I am ready to make a sample script before we start. Regards, Oleg
₹100 INR in 40 days
4.9 (9 reviews)
Hello Amar G., We would like to take this opportunity and will work until you are 100% satisfied with our work. We are an expert team with many years of experience in Python. Let's connect in chat so that we can discuss further. Regards
₹300 INR in 7 days
0.0 (0 reviews)
Hi, I read your requirements and they are almost crystal clear to me, but I would like to discuss things in detail once so we can confirm that we are both on the same page. I am a Senior Developer with many years of experience in Python, .NET, SQL, C# Programming, and Microsoft SQL Server. Let's connect in chat so that we can discuss further. Regards
₹300 INR in 7 days
0.0 (0 reviews)
As a skilled data analyst, I possess a strong proficiency in various analytical tools such as PowerBI, Tableau, Excel, SQL, Python, and Machine Learning. With my expertise in data analysis and visualization, I am adept at transforming complex data sets into insightful and actionable insights for decision-making purposes. My ability to utilize these tools effectively enables me to efficiently analyze and visualize data, making it more accessible and understandable for stakeholders. Whether it's identifying trends, uncovering patterns, or creating data-driven solutions, my skillset allows me to derive valuable insights that help organizations achieve their objectives. Overall, I am a highly motivated and detail-oriented data analyst with a strong passion for utilizing data to drive business outcomes.
₹250 INR in 40 days
0.0 (0 reviews)
Hi, I went through your project description and it seems I am a great fit for this job. I am an expert with many years of experience in Django and Amazon Web Services. Let's connect in chat so that we can discuss further. Thanks! Best Regards, SHAHRUKH GHAFFAR
₹100 INR in 36 days
0.0 (0 reviews)
This project aims to develop a comprehensive C# application that can read PDF files from a specified location, extract data, implement OCR techniques for image-based files, remove password protection, and store the extracted data in an SQL database. A Web API will be created to facilitate these activities. The project requires expertise in C# programming, .NET, SQL, and Microsoft SQL Server. The implementation steps include reading PDF files, extracting data, implementing OCR techniques, removing password protection, storing data in SQL database, developing a Web API, implementing user authentication and authorization, creating logging and error handling mechanisms, developing a monitoring system, testing, deployment, and documentation. The project requires experience in unit testing, integration testing, and load testing, as well as knowledge of deployment and CI/CD pipelines. The final deliverable will include comprehensive documentation on how to use the Web API, limitations, and authentication/authorization details.
₹134 INR in 1 day
0.0 (0 reviews)
Dear Client, I am excited to submit my proposal for your project. As a highly skilled developer with extensive experience in C#, PDF processing, OCR, and SQL databases, I am confident that I can develop a comprehensive application that meets all your requirements. My proposed solution includes building a C# application that can read PDF files from a specified location, extract data using OCR techniques for image-based files, remove password protection, and store the extracted data in an SQL database. I will use advanced OCR technologies such as Google Cloud Vision API, Microsoft Cognitive Services, and Tesseract OCR to ensure high accuracy in data extraction. To ensure the security and reliability of the application, I will use Microsoft SQL Server for data storage and implement robust error handling and logging mechanisms. The application will also feature a Web API to facilitate data extraction and storage from any device with internet access. The Web API will be built using .NET Core and adhere to best practices for security, scalability, and performance. Throughout the project, I will communicate regularly with you to ensure that the application meets all your requirements and exceeds your expectations. I am committed to delivering a high-quality, user-friendly application within the agreed-upon timeframe and budget. Thank you for considering my proposal, and I look forward to hearing from you soon. Best regards, Kundan Kumar
₹300 INR in 40 days
0.0 (1 review)
I am an experienced .NET professional from Kerala, India. I have worked on various projects such as online delivery apps, online hotel booking systems, and accounting-related projects. I am confident in my abilities and believe that I can provide excellent results for your project. I am always up for a challenge and enjoy working on projects that involve complex functionality and security systems. Programming is my passion and I find myself fully immersed in it. I enjoy the challenge of finding solutions to complex problems and am constantly seeking to learn more. My expertise lies in C# and .NET technologies; I am less knowledgeable in UI work, though not bad at it. I hope this brief introduction gives you an idea of my skills and abilities. If you need further information, please feel free to go through my profile. Thank you
₹300 INR in 25 days
0.0 (0 reviews)

About the client

New Delhi, India
5.0
10
Payment method verified
Member since Jan 7, 2019

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)