
Secure RAG Assistant

A production-grade Retrieval-Augmented Generation system with role-based document access control — built as a proof-of-concept for enterprise internal knowledge management.

Enterprise PoC · RAG / GenAI · AWS Bedrock

Source code · Get in touch

What it does

Users authenticate via SSO (Keycloak / OIDC) and query a document knowledge base through a conversational AI interface. The system enforces data-access policies at the retrieval layer: an employee only ever receives answers grounded in documents their role permits them to see.

Key engineering highlights

Role-scoped RAG retrieval

At query time, the roles in the user's JWT are resolved through a privilege hierarchy and applied as a pgvector metadata filter, so access policy is enforced at the vector-search layer, not just in the UI.
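A minimal sketch of how role resolution could feed a metadata filter, using only the Java standard library. The role names, the linear hierarchy, and the `access_role` metadata key are assumptions made for illustration, not the project's actual model. The resulting string follows the general style of a textual metadata filter expression.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch: expands JWT roles through a linear privilege hierarchy. */
final class RoleScopeSketch {

    // Assumed hierarchy, lowest privilege first; the real role names may differ.
    static final List<String> HIERARCHY = List.of("PUBLIC", "EMPLOYEE", "MANAGER", "ADMIN");

    /** Returns every role at or below the caller's highest recognised role. */
    static Set<String> readableRoles(List<String> jwtRoles) {
        int highest = -1;
        for (String role : jwtRoles) {
            highest = Math.max(highest, HIERARCHY.indexOf(role.toUpperCase(Locale.ROOT)));
        }
        // Unknown or missing roles yield an empty set, so the filter fails closed.
        return new LinkedHashSet<>(HIERARCHY.subList(0, highest + 1));
    }

    /** Builds a metadata filter string; "access_role" is an assumed metadata key. */
    static String filterExpression(List<String> jwtRoles) {
        Set<String> readable = readableRoles(jwtRoles);
        if (readable.isEmpty()) {
            return "access_role == '__none__'"; // sentinel value that matches no document
        }
        // Values come from the fixed HIERARCHY list, never from raw user input.
        return readable.stream()
                .map(r -> "'" + r + "'")
                .collect(Collectors.joining(", ", "access_role in [", "]"));
    }
}
```

With these assumptions, `RoleScopeSketch.filterExpression(List.of("employee"))` yields `access_role in ['PUBLIC', 'EMPLOYEE']`, and a token with no recognised role yields a filter that matches nothing.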

Multi-layer prompt guard

Four checks: regex injection patterns and a keyword blocklist (both local, so zero API cost), Amazon Comprehend toxicity detection, and a canary-word advisor that detects system-prompt exfiltration attempts.
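The local layers and the canary check can be sketched in plain Java as follows. The patterns, blocklist entries, and class name are illustrative assumptions. The Comprehend call is left out because it needs AWS credentials and would sit between the pre-screen and the model call.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical sketch of the local guard layers; not the project's actual rules. */
final class PromptGuardSketch {

    // Illustrative patterns only; a real deployment would maintain a larger, tested list.
    static final List<Pattern> INJECTION_PATTERNS = List.of(
            Pattern.compile("ignore\\s+(all\\s+)?(previous|prior)\\s+instructions",
                    Pattern.CASE_INSENSITIVE),
            Pattern.compile("reveal\\s+(your\\s+)?system\\s+prompt",
                    Pattern.CASE_INSENSITIVE));

    // Assumed blocklist entries, stored in lower case.
    static final List<String> BLOCKLIST = List.of("jailbreak", "dan mode");

    /** Regex and blocklist checks: run locally, before any paid API call. */
    static boolean passesPreScreen(String prompt) {
        for (Pattern p : INJECTION_PATTERNS) {
            if (p.matcher(prompt).find()) {
                return false;
            }
        }
        String lower = prompt.toLowerCase(Locale.ROOT);
        return BLOCKLIST.stream().noneMatch(lower::contains);
    }

    /** Canary check: the canary is planted in the system prompt and must never appear in output. */
    static boolean leaksCanary(String modelOutput, String canary) {
        return modelOutput.contains(canary);
    }
}
```

Running the cheap checks first means an obviously hostile prompt is rejected before any request reaches Comprehend or the model.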

Two-phase LLM pipeline

A tool-first pass (MCP over SSE) answers when the tools suffice; otherwise the pipeline falls back to RAG, where multi-query expansion (4 variants) improves retrieval recall.
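The control flow can be sketched with plain functional interfaces. Every parameter below (`toolPass`, `expander`, `retriever`, `generator`) is a hypothetical stand-in for the corresponding component, and merging variant results in first-seen order is an assumption about how they are combined.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch of a tool-first pass with a RAG fallback. */
final class TwoPhaseSketch {

    static String ask(String question,
                      Function<String, Optional<String>> toolPass,
                      Function<String, List<String>> expander,
                      Function<String, List<String>> retriever,
                      BiFunction<String, List<String>, String> generator) {
        // Phase 1: if the tool-enabled call produced an answer, return it directly.
        Optional<String> toolAnswer = toolPass.apply(question);
        if (toolAnswer.isPresent()) {
            return toolAnswer.get();
        }
        // Phase 2: retrieve once per query variant, keeping first-seen order
        // and dropping chunks that several variants return.
        Set<String> context = new LinkedHashSet<>();
        for (String variant : expander.apply(question)) {
            context.addAll(retriever.apply(variant));
        }
        return generator.apply(question, new ArrayList<>(context));
    }
}
```

Deduplicating across variants matters because rephrasings of one question tend to retrieve overlapping chunks, and repeated chunks would only consume prompt space.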

Dual ingestion modes

REST endpoint for local development; SQS consumer in production (S3 event notifications → async chunk-embed-store pipeline).
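Both ingestion modes end in the same chunk-embed-store step, which can be sketched as below. The word-count splitter is a deliberately simple stand-in for a token-based splitter, and the `embedder` and `store` parameters are hypothetical placeholders for the embedding model and the vector store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical sketch of a chunk -> embed -> store step shared by both ingestion modes. */
final class IngestionSketch {

    /** Splits text into chunks of at most maxWords whitespace-separated words. */
    static List<String> chunk(String text, int maxWords) {
        if (maxWords <= 0) {
            throw new IllegalArgumentException("maxWords must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += maxWords) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }

    /** Embeds and stores every chunk, returning how many chunks were written. */
    static int ingest(String text, int maxWords,
                      Function<String, float[]> embedder,
                      BiConsumer<String, float[]> store) {
        List<String> chunks = chunk(text, maxWords);
        for (String c : chunks) {
            store.accept(c, embedder.apply(c));
        }
        return chunks.size();
    }
}
```

Keeping this step independent of its trigger is what would let a REST upload and an SQS message share one code path.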

Multi-module Maven architecture

A shared common library (embeddings, role model) consumed by independent Spring Boot services.

Architecture

┌──────────────────────────────────────────────────────────────┐
│                        USER BROWSER                          │
│                    Angular 21 SPA (:4200)                    │
│        Keycloak PKCE login → Bearer token on each request    │
└───────────────────────────────┬──────────────────────────────┘
                                │ HTTPS + JWT
                                ▼
┌──────────────────────────────────────────────────────────────┐
│                       BACKEND  (:8080)                       │
│                                                              │
│  POST /ask ─► PromptGuardService                             │
│                [1] regex injection patterns    (zero cost)   │
│                [2] keyword blocklist           (zero cost)   │
│                [3] Comprehend DetectToxicContent             │
│                      │                                       │
│                      ▼                                       │
│              ChatService                                     │
│                ┌ Phase 1: Tool-first ───────────────────┐    │
│                │  ChatClient + MCP tools (no RAG)       │    │
│                │  → if LLM answers: return ✓            │    │
│                └ Phase 2: RAG fallback ─────────────────┘    │
│                   MultiQueryExpander (4 query variants)      │
│                   RoleFilterDocumentRetriever                │
│                     JWT role → RoleHierarchy → pgvector      │
│                   ContextualQueryAugmenter                   │
│                   CanaryWordAdvisor (leak detection)   [4]   │
│                   EvaluationService (relevancy score)        │
│                                                              │
│  POST /upload ─► DocumentUploadService ─► S3                 │
│  GET  /history ─► SPRING_AI_CHAT_MEMORY + RAG_SOURCES (PG)   │
└──────┬─────────────────────────┬─────────────────────────────┘
       │ SSE (MCP)               │ pgvector query / JDBC
       ▼                         ▼
┌────────────────┐  ┌─────────────────────────────┐
│ TOOLS (:8082)  │  │  PostgreSQL + pgvector      │
│ DocumentAccess │  │  (:5433)                    │
│ Tool → S3      │  │  • vector_store             │
└────────────────┘  │  • SPRING_AI_CHAT_MEMORY    │
                    │  • RAG_SOURCES              │
                    └─────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                     INGESTION SERVICE  (:8081)               │
│  local profile  POST /ingest (multipart) ───────────────┐    │
│  aws profile    SQS ◄─ S3 event notification            │    │
│                   HeadObject (metadata) + GetObject     │    │
│                   TikaDocumentReader → TokenTextSplitter│    │
│                   VectorStore.accept() → Titan V2 → pgvector │
└──────────────────────────────────────────────────────────────┘

┌───────────────┐   ┌────────────────────────────────┐
│ KEYCLOAK      │   │  AWS BEDROCK  (eu-west-3)       │
│ (:8180)       │   │  • claude-haiku-4-5 (chat)      │
│ realm:        │   │  • titan-embed-text-v2          │
│ rag-assistant │   │    (1024-dim embeddings)        │
└───────────────┘   └────────────────────────────────┘

┌──────────────────────────────────────────────┐
│  OBSERVABILITY                               │
│  Prometheus (:9090) ◄─ Spring Boot actuators │
│  Grafana    (:3000) ◄─ Prometheus            │
│  Jaeger     (:16686) ◄─ OTLP traces          │
└──────────────────────────────────────────────┘

The key architectural insight is the trust boundary at retrieval: security is enforced inside RoleFilterDocumentRetriever, so the LLM never receives documents the user isn't permitted to see, regardless of what the user asks.
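The effect of that boundary can be illustrated with an in-memory sketch that filters by role before ranking. The `Chunk` record, its fields, and the precomputed similarity score are assumptions for the example. In the described system the equivalent filtering happens inside the pgvector query.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch: filter by role first, rank second. */
final class RetrievalBoundarySketch {

    /** A stored chunk, the role required to read it, and its similarity to the query. */
    record Chunk(String text, String requiredRole, double score) {}

    /** Returns the top-k readable chunks; restricted chunks are dropped before ranking. */
    static List<String> contextFor(List<Chunk> candidates, Set<String> readableRoles, int topK) {
        return candidates.stream()
                .filter(c -> readableRoles.contains(c.requiredRole()))
                .sorted(Comparator.comparingDouble(Chunk::score).reversed())
                .limit(topK)
                .map(Chunk::text)
                .collect(Collectors.toList());
    }
}
```

If an admin-only chunk is the best match for a question, an employee's context still contains only the lower-ranked chunks that the employee may read, so no prompt wording can surface the restricted text.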

Tech stack

Backend

Spring Boot 3.5
Spring AI 1.1
PostgreSQL + pgvector
AWS Bedrock: Claude Haiku 4.5
Titan Embeddings V2
Amazon Comprehend
S3 · SQS

Frontend

Angular 21
TypeScript (strict)
Signals-first architecture
Keycloak JS — PKCE / OIDC

Infrastructure

Docker Compose
Terraform (IaC)
AWS ECS Fargate
Aurora Serverless
Keycloak 26

Observability

Prometheus
Grafana
Jaeger
OpenTelemetry tracing

Have a similar project in mind?

Let's talk about how I can design and ship secure GenAI solutions for your organization.
