US SaaS · Internal Engineering Knowledge Base

Engineering Knowledge Base —
Onboarding 2 Weeks → 3 Days

A 60-person US SaaS engineering team had three years of accumulated documentation that nobody could find. Runbooks, architecture decisions, incident post-mortems — all written down, none of it accessible. New engineers spent their first two weeks pinging senior devs. Senior devs were the search engine.

RAG Systems · LangChain · FastAPI · Confluence API · Slack Integration

The situation

A 60-person US SaaS engineering org had accumulated three years of documentation: Confluence spaces covering architecture decisions, system design records, incident post-mortems, onboarding guides, API references, and operations runbooks. The knowledge existed. The problem was finding it.

In practice, new engineers learned the system by asking the people who already knew it. Every new hire's onboarding looked the same: two weeks of pairing sessions, Slack questions to senior engineers, and manual document hunting. Senior engineers answered the same questions repeatedly — questions that already had written answers, buried in a Confluence page nobody could locate quickly enough.

As the team scaled hiring, this pattern became expensive. The senior engineers most capable of doing complex work were spending 6–8 hours per week fielding questions that a well-built internal search tool should have answered in seconds. Onboarding new hires was starting to eat into the team's delivery capacity on core product work.

The company had tried Confluence's native search. It returned pages, not answers — and it ranked by edit date, not relevance. They'd looked at off-the-shelf AI search tools, but none could be configured around their specific content structure and access control requirements without significant engineering effort. They needed a system built around how their knowledge was actually organized.

What we built

A production RAG system that ingested their full documentation corpus, made it queryable via natural language, and surfaced citations so engineers could navigate to the original source — not just read a summary and wonder if it was current.

  • Multi-source ingestion pipeline — a scheduled Airflow job pulls content from Confluence (via REST API), GitHub wikis, internal markdown documentation repositories, and architecture decision record (ADR) files committed to their monorepo. Each document is chunked, embedded, and indexed with source metadata (space, page ID, last-updated timestamp) preserved for citation and freshness tracking
  • Hybrid retrieval (ChromaDB + BM25) — the system runs dense vector search and BM25 keyword search in parallel, merging results via reciprocal rank fusion. This handles both intent-based queries ("how do we handle rate limiting in the payments service") and exact-match lookups ("what is the SLA for the notification service") equally well — the two query patterns that come up most often in engineering knowledge bases
  • Access-aware retrieval — Confluence spaces marked as HR, legal, or leadership-only are excluded from the engineering index at ingestion time. Engineers can only surface documents from spaces they already have Confluence read access to, enforced via a pre-filter applied before vector search runs
  • Citation tracing on every response — every answer links back to the source Confluence page or GitHub file, including the last-updated timestamp. Engineers can see at a glance whether the document they're reading is from a recent architecture review or a three-year-old runbook that may be stale
  • Slack slash command integration — a /kb Slack command routes queries through the FastAPI backend and returns a threaded answer with citation links, so engineers can search without leaving their workflow. The most common use case: a developer hits an unfamiliar error and searches the knowledge base directly from the incident Slack thread
  • Staleness detection and re-indexing — documents updated in Confluence trigger a webhook that queues a targeted re-embedding job. The index is never more than 15 minutes stale for actively maintained documents, and the staleness timestamp visible in citations means engineers know when to trust the answer and when to verify
  • RAGAS evaluation against a golden test suite — 150 query/answer pairs covering common onboarding questions, architecture queries, and operational procedures. Runs on every deployment as a regression gate. Retrieval precision and answer faithfulness are tracked in a dashboard the engineering leads review weekly
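The hybrid retrieval step above merges the two rankings with reciprocal rank fusion. A minimal sketch of that merge, assuming two best-first result lists (the document IDs are invented and `k=60` is the commonly used default constant, not necessarily the production value):

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge best-first ranked lists: a document's fused score is the
    sum of 1 / (k + rank) across every list it appears in."""
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Dense (vector) and sparse (BM25) rankings for the same query.
dense = ["adr-payments", "runbook-rate-limit", "wiki-gateway"]
sparse = ["adr-payments", "wiki-gateway", "page-sla-notify"]
fused = reciprocal_rank_fusion([dense, sparse])
# "adr-payments" ranks first in both lists, so it tops the fused ranking.
```

Because RRF works on ranks rather than raw scores, it needs no calibration between the vector-similarity scale and the BM25 scale, which is why it is a common default for hybrid search.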

Why this required careful architecture

The core challenge wasn't retrieval quality on clean documents — it was the heterogeneity of the corpus. Confluence pages vary from structured architecture documents with clear headers to freeform meeting notes written in five minutes. GitHub wiki pages follow no consistent format. ADR files are highly structured but use domain-specific vocabulary that standard embedding models underweight.

We handled this by building content-type-aware chunking strategies: structured pages are split at heading boundaries to preserve section context; freeform pages use fixed-size chunks with overlap; ADR files are indexed as atomic units because splitting them breaks the problem/decision/consequence structure that makes them useful. The chunking strategy had measurable impact on retrieval precision — early evaluations with naive fixed-size chunking scored 0.71 faithfulness on the golden test suite; content-aware chunking pushed it to 0.89.
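The dispatch described above can be sketched as a single routing function. This is an illustration, not the production code: the `doc_type` values, size thresholds, and heading regex are all assumptions.

```python
import re

def chunk_document(doc_type, text, max_chars=1200, overlap=200):
    """Route a document to the chunking strategy that fits its shape.
    doc_type values and size thresholds are illustrative."""
    if doc_type == "adr":
        # ADRs are indexed whole: splitting breaks the
        # problem/decision/consequence structure.
        return [text]
    if doc_type == "structured":
        # Split at markdown heading boundaries to preserve section context.
        sections = re.split(r"\n(?=#{1,3} )", text)
        return [s for s in sections if s.strip()]
    # Freeform notes: fixed-size windows with overlap.
    step = max_chars - overlap
    return [text[i:i + max_chars]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap on freeform chunks means a sentence that straddles a window boundary still appears intact in at least one chunk, which matters for faithfulness scoring.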

Access control was a second constraint that shaped the architecture from the start. A naive approach — index everything and filter results at the API layer — has the same flaw as API-layer RBAC in any RAG system: semantically similar queries can surface restricted content in the retrieved chunks before the access check runs. We pre-filter at ingestion: restricted spaces never enter the index. The engineering corpus is clean by construction, not filtered after the fact.
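The "clean by construction" approach amounts to two filters, both applied before similarity ranking ever runs. A sketch under assumed names (the space keys and record fields are invented; the actual vector search is elided):

```python
# Spaces that must never reach the engineering index (illustrative keys).
RESTRICTED_SPACES = {"HR", "LEGAL", "LEADERSHIP"}

def ingestable(pages):
    """Ingestion-time filter: restricted spaces are dropped before
    chunking or embedding, so their content never enters the index."""
    return [p for p in pages if p["space_key"] not in RESTRICTED_SPACES]

def retrieve(index, query_embedding, user_spaces):
    """Query-time pre-filter: restrict the candidate set to spaces the
    user can already read in Confluence *before* similarity ranking."""
    candidates = [d for d in index if d["space_key"] in user_spaces]
    # ... vector search would run over `candidates` only (elided).
    return candidates
```

Contrast this with API-layer filtering, where restricted chunks are retrieved first and discarded afterward: there, a prompt-injection or logging bug can leak content that, here, simply does not exist in the index.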

The Slack integration introduced a token-budget constraint that doesn't exist in a web interface. Slack thread replies have display limits that make verbose multi-paragraph responses harder to read than a short answer plus a citation link. We built a summarization layer that produces a 2–3 sentence direct answer for the Slack response and a more complete answer for the web interface, both grounded in the same retrieved context and carrying the same citations.
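Shaping the Slack-side response is mostly a formatting concern. A minimal sketch, assuming citation dicts with invented `url`/`title` fields; the 3,000-character figure reflects Slack's documented cap on a section block's text:

```python
SLACK_TEXT_LIMIT = 3000  # Slack caps a section block's text at this length

def slack_reply(short_answer, citations):
    """Build the Slack-facing response: a 2-3 sentence answer plus
    citation links, truncated to fit a single message block."""
    links = " | ".join(f"<{c['url']}|{c['title']}>" for c in citations)
    text = f"{short_answer}\n\nSources: {links}"
    return {"response_type": "in_channel", "text": text[:SLACK_TEXT_LIMIT]}
```

The web interface would render the full answer from the same retrieved context; only the surface formatting differs.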

Platform metrics

Engagement duration: 6 weeks
Onboarding time: 3 days (previously 2 weeks)
Team adoption (month 1): 80%+
Senior engineer time recovered: ~8 hrs/week
RAGAS faithfulness score: 0.89
Documents indexed: 4,200+

Tech stack

Python 3.11 · FastAPI · LangChain · ChromaDB · BM25 · OpenAI Ada · Apache Airflow · PostgreSQL · Redis · Slack API · Confluence API · RAGAS · AWS ECS

Results

3 days
New engineer onboarding time, down from 2 weeks of pairing sessions and document hunting
80%+
Team adoption in the first month — engineers reached for the knowledge base before pinging colleagues
~8 hrs
Senior engineer time recovered per week from repeated questions that already had written answers
0.89
RAGAS faithfulness score on the golden test suite — answers grounded in retrieved source documents

"New engineers were spending their first two weeks pinging senior devs for context that was already written down somewhere — nobody could find it. The system Jonix built didn't just solve onboarding. It stopped the senior engineers from being the default search engine for institutional knowledge."

CTO

US SaaS company, 60-person engineering team