
Intelligent Document Summarizer API

An AI-powered REST API that summarizes PDF, DOCX, and TXT documents, extracts keywords, and answers natural language questions — backed by Groq LLaMA 3.3 70B with an offline transformer fallback.

  • API Design
  • NLP Engineering
  • Python / FastAPI
  • Deployment (Railway)
[Screenshot: Intelligent Document Summarizer API interface]

Overview

A production-ready REST API that accepts uploaded documents in PDF, DOCX, or plain-text format and returns concise abstractive summaries, ranked keyword lists, and answers to natural language questions about the document content. The system is designed for high availability — when Groq's cloud API is unreachable, it falls back automatically to a locally hosted HuggingFace transformer model.

Summarization Pipeline

Documents are first parsed with PyMuPDF (PDF) or python-docx (DOCX), then split into chunks that fit the model's context window. The Groq LLaMA 3.3 70B model is prompted to produce abstractive summaries (condensing meaning rather than extracting sentences verbatim) with a configurable output length.
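
The chunking step can be illustrated with a minimal sketch. The project description does not state the chunk size or splitting strategy, so the word-based window, the overlap, the default values, and the function name `chunk_text` are all assumptions made for illustration.

```python
def chunk_text(text: str, max_words: int = 800, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words.

    Hypothetical helper: the real service may chunk by tokens,
    sentences, or pages instead.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in the range [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence cut at a boundary still appears whole in one of them. Each chunk would then be summarized separately and the partial summaries combined.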

Keyword Extraction & QA

Keywords are ranked using a TF-IDF scoring approach, surfacing the most distinctive terms in each document. The question-answering endpoint passes the user's query alongside the full document context to the LLM, which grounds answers in the actual content and reduces the risk of hallucinated answers.
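
As a rough illustration of TF-IDF ranking, the sketch below scores each term by its frequency in the document multiplied by a smoothed inverse document frequency. The reference corpus, the tokenizer, the smoothing formula, and the name `top_keywords` are assumptions; the description does not say how the service builds its IDF statistics.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic runs only (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(doc: str, corpus: list[str], k: int = 5) -> list[str]:
    """Return the k highest-scoring TF-IDF terms of `doc`.

    `corpus` is a hypothetical list of reference documents used to
    estimate how common each term is.
    """
    doc_tokens = tokenize(doc)
    if not doc_tokens:
        return []

    tf = Counter(doc_tokens)
    df = Counter()
    for other in corpus:
        df.update(set(tokenize(other)))  # count each term once per document

    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        # Smoothed IDF so that unseen or universal terms stay finite.
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1
        scores[term] = (count / len(doc_tokens)) * idf

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

A term that appears in every reference document (such as "the") receives the minimum IDF, so frequent but uninformative words sink below terms that are specific to the document being analyzed.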

Offline Fallback

When the Groq API is unavailable, the system automatically switches to a local HuggingFace transformer model for summarization, so summaries can still be produced without the external service. This dual-mode design makes the service more resilient in production use.
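
The dual-mode behaviour could look roughly like the sketch below. It takes the two summarizers as plain callables, so it makes no claim about how the project actually calls Groq or HuggingFace; the function name `summarize_with_fallback` and the returned mode label are hypothetical.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

Summarizer = Callable[[str], str]


def summarize_with_fallback(
    text: str, primary: Summarizer, fallback: Summarizer
) -> tuple[str, str]:
    """Try the primary (cloud) summarizer, then the local one on failure.

    Returns the summary together with a label saying which path produced it.
    """
    try:
        return primary(text), "primary"
    except Exception as exc:
        # Assumption: any error counts as "unavailable". A real service
        # would likely catch only timeouts and connection errors here.
        logger.warning("Primary summarizer failed (%s); using fallback", exc)
        return fallback(text), "fallback"
```

Returning the mode label alongside the summary lets the API report to the client which model answered, which is useful because the local model's output quality may differ from the cloud model's.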

Tech Stack

Python · FastAPI · Groq LLaMA 3.3 70B · HuggingFace Transformers · PyMuPDF · python-docx · TF-IDF · Railway