OCR

PDF Table Extraction

A FastAPI application that extracts structured data from PDF files using OpenAI's GPT-4 model. It pulls reference numbers, dates, and amounts from PDF documents. Features include regex-based initial extraction, GPT-4 enhanced parsing, result caching, and CORS support for cross-origin requests.
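
To make the regex-based initial extraction step concrete, here is a minimal sketch of what such a first pass might look like. The patterns and the function name `extract_fields` are hypothetical: the project description does not specify the actual reference, date, or amount formats, so the ones below are illustrative assumptions only.

```python
import re

# Hypothetical patterns (assumptions, not taken from the project):
# references like "INV-20241115", ISO or slash-separated dates,
# and amounts prefixed with a currency symbol.
REFERENCE_RE = re.compile(r"\b[A-Z]{2,4}-\d{4,10}\b")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{4}\b")
AMOUNT_RE = re.compile(r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d{2})?")


def extract_fields(text: str) -> dict[str, list[str]]:
    """Run a regex-only first pass over text already extracted from a PDF."""
    return {
        "references": REFERENCE_RE.findall(text),
        "dates": DATE_RE.findall(text),
        "amounts": AMOUNT_RE.findall(text),
    }


if __name__ == "__main__":
    sample = "Ref INV-20241115 dated 2024-11-15, total $1,250.00"
    print(extract_fields(sample))
```

In a pipeline like the one described, a pass of this kind would presumably run before the GPT-4 step, so that the model only has to resolve what the patterns miss or get wrong.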

Project Duration

Nov 2024 to Jan 2025

Client

Document Processing Client (NDA)

Key Features

PDF text extraction
AI-powered data parsing
Structured data output
Result caching
RESTful API
Cross-origin support
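
One plausible way to implement the result caching listed above is to key results on a hash of the uploaded file's bytes, so that re-uploading the same PDF skips the expensive parsing step. The sketch below assumes an in-memory store. The class name `ResultCache` and the `compute` callback are hypothetical, and the project description does not say how its cache is actually keyed or stored.

```python
import hashlib
from typing import Callable


class ResultCache:
    """In-memory cache keyed by the SHA-256 digest of the PDF bytes (an assumed design)."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}
        self.hits = 0

    def get_or_compute(self, pdf_bytes: bytes, compute: Callable[[bytes], dict]) -> dict:
        # Identical file contents always produce the same key.
        key = hashlib.sha256(pdf_bytes).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: run the expensive extraction once and remember the result.
        result = compute(pdf_bytes)
        self._store[key] = result
        return result
```

Hashing the content rather than the filename means a renamed copy of the same document still hits the cache, while an edited file with the same name does not.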

Technology Stack

Python, FastAPI, OpenAI GPT-4, PyMuPDF, Pandas, Regex

Project Metrics

Accuracy: 97%
Formats: PDF
Speed: < 5s
