Secure RAG Assistant
A production-grade Retrieval-Augmented Generation system with role-based document access control — built as a proof-of-concept for enterprise internal knowledge management.
What it does
Users authenticate via SSO (Keycloak / OIDC) and query a document knowledge base through a conversational AI interface. The system enforces data-access policies at the retrieval layer: an employee only ever receives answers grounded in documents their role permits them to see.
Key engineering highlights
Role-scoped RAG retrieval
JWT roles map to a privilege hierarchy at query time via pgvector metadata filtering — access policy is enforced at the vector-search layer, not just the UI.
Multi-layer prompt guard
Zero-cost regex / blocklist pre-screening, Amazon Comprehend toxicity detection, and a canary-word advisor that detects system-prompt exfiltration attempts.
Two-phase LLM pipeline
Tool-first pass (MCP / SSE) with RAG fallback; multi-query expansion (4 variants) improves retrieval recall.
Dual ingestion modes
REST endpoint for local development; SQS consumer in production (S3 event notifications → async chunk-embed-store pipeline).
Multi-module Maven architecture
A shared common library (embeddings, role model) consumed by independent Spring Boot services.
Architecture
┌──────────────────────────────────────────────────────────────┐
│ USER BROWSER │
│ Angular 21 SPA (:4200) │
│ Keycloak PKCE login → Bearer token on each request │
└───────────────────────────────┬──────────────────────────────┘
│ HTTPS + JWT
▼
┌──────────────────────────────────────────────────────────────┐
│ BACKEND (:8080) │
│ │
│ POST /ask ─► PromptGuardService │
│ [1] regex injection patterns (zero cost) │
│ [2] keyword blocklist (zero cost) │
│ [3] Comprehend DetectToxicContent │
│ │ │
│ ▼ │
│ ChatService │
│ ┌ Phase 1: Tool-first ───────────────────┐ │
│ │ ChatClient + MCP tools (no RAG) │ │
│ │ → if LLM answers: return ✓ │ │
│ └ Phase 2: RAG fallback ─────────────────┘ │
│ MultiQueryExpander (4 query variants) │
│ RoleFilterDocumentRetriever │
│ JWT role → RoleHierarchy → pgvector │
│ ContextualQueryAugmenter │
│ CanaryWordAdvisor (leak detection) [4] │
│ EvaluationService (relevancy score) │
│ │
│ POST /upload ─► DocumentUploadService ─► S3 │
│ GET /history ─► SPRING_AI_CHAT_MEMORY + RAG_SOURCES (PG) │
└──────┬─────────────────────────┬─────────────────────────────┘
│ SSE (MCP) │ pgvector query / JDBC
▼ ▼
┌──────────────┐ ┌────────────────────────────┐
│ TOOLS (:8082)│ │ PostgreSQL + pgvector │
│ DocumentAccess │ (:5433) │
│ Tool → S3 │ │ • vector_store │
└──────────────┘ │ • SPRING_AI_CHAT_MEMORY │
│ • RAG_SOURCES │
└────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ INGESTION SERVICE (:8081) │
│ local profile POST /ingest (multipart) ───────────────┐ │
│ aws profile SQS ◄─ S3 event notification │ │
│ HeadObject (metadata) + GetObject │ │
│ TikaDocumentReader → TokenTextSplitter │ │
│ VectorStore.accept() ─► Titan V2 ─► pgvector│
└──────────────────────────────────────────────────────────────┘
┌───────────────┐ ┌────────────────────────────────┐
│ KEYCLOAK │ │ AWS BEDROCK (eu-west-3) │
│ (:8180) │ │ • claude-haiku-4-5 (chat) │
│ realm: │ │ • titan-embed-text-v2 │
│ rag-assistant │ │ (1024-dim embeddings) │
└───────────────┘ └────────────────────────────────┘
┌──────────────────────────────────────────────┐
│ OBSERVABILITY │
│ Prometheus (:9090) ◄─ Spring Boot actuators │
│ Grafana (:3000) ◄─ Prometheus │
│ Jaeger (:16686) ◄─ OTLP traces │
└──────────────────────────────────────────────┘The key architectural insight is the trust boundary at retrieval: security is enforced inside RoleFilterDocumentRetriever — the LLM never receives documents the user isn't permitted to see, regardless of what they ask.
Tech stack
Backend
- Spring Boot 3.5
- Spring AI 1.1
- PostgreSQL + pgvector
- AWS Bedrock — Claude Haiku
- Titan Embeddings V2
- Amazon Comprehend
- S3 · SQS
Frontend
- Angular 21
- TypeScript (strict)
- Signals-first architecture
- Keycloak JS — PKCE / OIDC
Infrastructure
- Docker Compose
- Terraform (IaC)
- AWS ECS Fargate
- Aurora Serverless
- Keycloak 26
Observability
- Prometheus
- Grafana
- Jaeger
- OpenTelemetry tracing
Have a similar project in mind?
Let's talk about how I can design and ship secure GenAI solutions for your organization.