Financial Research Agent

A retrieval-augmented research assistant for BFSI (banking, financial services, and insurance) documents. It extracts answers from financial PDFs using semantic search and two-pass LLM reasoning, augments incomplete responses with external data, and scores every answer for confidence and provenance, so analysts can move from document to decision with traceable, source-backed insights.

Tags & Technologies

RAG, Semantic Search, LLMs, Agentic AI, Confidence Scoring, Explainable AI, AWS Bedrock, Llama 3.3 70B, Amazon Titan Embeddings, Annoy, SerpAPI, Python, Streamlit, BFSI

Key Impact & KPIs

Project Overview

1. BFSI Document Ingestion Pipeline

Designed a document ingestion pipeline for BFSI materials (annual reports, regulatory filings, and investment research) that extracts text via PyPDF2, segments it into overlapping chunks, and embeds each chunk with Amazon Titan Embeddings for semantic retrieval.
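The chunking step can be illustrated with a minimal sketch. The function names, the character-based windows, and the default sizes are assumptions for illustration; the project's actual chunk size, overlap, and splitting unit are not stated here. The embedding call is left out: each returned chunk would then be sent to the embedding model.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into character windows that share `overlap` characters.

    chunk_size and overlap are hypothetical defaults, not the project's values.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def extract_pdf_text(path: str) -> str:
    """Concatenate page text from a PDF using PyPDF2 (imported lazily)."""
    from PyPDF2 import PdfReader  # third-party; only needed for real PDFs

    reader = PdfReader(path)
    # extract_text() can return None or "" for image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of embedding some text twice.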

2. Two-Pass Retrieval-Augmented Generation

Implemented a two-pass RAG workflow where the first pass synthesizes answers strictly from internal document evidence via Llama 3.3 70B, and a second pass — triggered only when coverage is incomplete — augments the response with external data from SerpAPI.
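The control flow of the two passes can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `retrieve`, `synthesize`, and `search_external` are hypothetical callables standing in for the vector search, the Llama 3.3 70B call, and the SerpAPI lookup, and the `complete` flag is an assumed way of signalling incomplete coverage.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Draft:
    text: str
    complete: bool  # assumed signal: did the evidence fully cover the question?


@dataclass
class Result:
    text: str
    used_external: bool


def two_pass_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    synthesize: Callable[[str, List[str]], Draft],
    search_external: Callable[[str], List[str]],
) -> Result:
    # Pass 1: answer strictly from internal document evidence.
    internal = retrieve(question)
    first = synthesize(question, internal)
    if first.complete:
        return Result(first.text, used_external=False)

    # Pass 2: only reached when coverage is incomplete; add external evidence.
    external = search_external(question)
    second = synthesize(question, list(internal) + list(external))
    return Result(second.text, used_external=True)
```

Keeping the external search behind the completeness check means questions that the document fully answers never trigger a web call, and the `used_external` flag records provenance for the answer.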

3. Confidence Scoring and Hallucination Safeguards

Built a verification layer that computes a weighted confidence score from semantic similarity, source quality, evidence coverage, and consistency. It also flags numeric contradictions, outdated references, and unsourced claims, so structured quality signals are attached before results reach the analyst.
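A weighted confidence score over the four signals might look like the sketch below. The weights and the function name are hypothetical; the project's actual weighting is not given here. Each signal is assumed to be normalized to the range 0 to 1.

```python
from typing import Dict

# Hypothetical weights for illustration only; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "similarity": 0.40,      # semantic similarity of retrieved chunks
    "source_quality": 0.20,  # internal document vs. external web source
    "coverage": 0.25,        # share of the question backed by evidence
    "consistency": 0.15,     # agreement between the pieces of evidence
}


def confidence_score(signals: Dict[str, float]) -> float:
    """Combine per-signal scores in [0, 1] into one weighted score in [0, 1]."""
    missing = set(WEIGHTS) - set(signals)
    if missing:
        raise ValueError("missing signals: %s" % sorted(missing))
    for name in WEIGHTS:
        if not 0.0 <= signals[name] <= 1.0:
            raise ValueError("signal %r must be in [0, 1]" % name)
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
```

For example, signals of 0.8 similarity, 1.0 source quality, 0.5 coverage, and 1.0 consistency combine to 0.795 under these assumed weights.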

4. Per-Document Semantic Memory

Developed a per-document memory system that persists Q&A pairs with embeddings, indexes them via Annoy for similarity lookup, and injects relevant prior answers into synthesis prompts — eliminating redundant LLM calls and preserving continuity across research sessions.
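The lookup idea can be sketched with a small in-memory store. Note the substitution: this sketch uses a brute-force cosine scan in place of the Annoy index, and it does not persist anything to disk. The class name, the `min_score` threshold, and the tuple layout are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class QAMemory:
    """Stores (question, answer, embedding) triples for one document."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, str, List[float]]] = []

    def add(self, question: str, answer: str, embedding: Sequence[float]) -> None:
        self._items.append((question, answer, list(embedding)))

    def lookup(
        self, embedding: Sequence[float], k: int = 3, min_score: float = 0.8
    ) -> List[Tuple[float, str, str]]:
        """Return up to k prior (score, question, answer) hits above min_score."""
        scored = [(cosine(embedding, e), q, a) for q, a, e in self._items]
        hits = [h for h in scored if h[0] >= min_score]
        hits.sort(key=lambda h: h[0], reverse=True)  # best match first
        return hits[:k]
```

Hits returned by `lookup` could be injected into the synthesis prompt as prior context, or returned directly when the score is high enough to skip a new LLM call. An approximate index such as Annoy serves the same role as the linear scan once the memory grows large.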

5. Interactive Research Interface

Delivered a multi-interface research tool with a Streamlit dashboard for document upload and query execution, a CLI for programmatic access, and a nine-scenario evaluation harness, showing how RAG can be put into practice for financial document analysis under realistic latency and trust constraints.

Model Selection Rationale