Getting Started

The Cape Documentation Agent is a documentation intelligence layer for cape.io. It ingests documentation from multiple sources — Markdown files, OpenAPI specs, Confluence pages, CI/CD pipelines — into a single PostgreSQL vector store, and exposes that knowledge through several consumption surfaces.

Architecture

Sources                    Knowledge Base              Surfaces
──────────────────────     ────────────────────────    ────────────────────────
Markdown files         ──► Documents + Chunks      ──► Chat (embedded in Cape)
OpenAPI specs          ──► Vector embeddings        ──► REST API
Confluence pages       ──► (PostgreSQL + pgvector)  ──► MCP server (Claude/Cursor)
CI/CD pipelines        ──►                          ──► FAQ page

Every surface reads from the same store. The surfaces differ only in how much context is retrieved and how the output is structured.

Namespaces

All content is partitioned by namespace. Queries are always scoped — namespaces are never mixed unless explicitly requested.

Namespace       Audience                           Auth required
─────────────   ────────────────────────────────   ─────────────
user_docs       Cape external customers            No
tech_docs       Internal development team          API key
api_endpoints   Developers integrating with Cape   API key
confluence      Internal teams via Confluence      API key

Retrieval modes

The retrieval parameter controls how context is gathered before generating a response.

Standard — embeds the query, runs a top-K vector similarity search, passes results to the LLM in one call. Fast, one round-trip.
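In essence, standard mode is a single top-K ranking pass. The sketch below shows that core step with plain cosine similarity; in production the search runs inside PostgreSQL via pgvector, and the embedding and LLM calls (omitted here) are external services.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=5):
    """chunks: list of (text, embedding). Return the k most similar texts,
    which standard mode passes to the LLM in one call."""
    ranked = sorted(chunks, key=lambda c: cosine_sim(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```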

Smart — agentic multi-step process (2–4 LLM calls, max 3 iterations):

  1. Initial top-K search
  2. LLM evaluates whether context is sufficient; identifies gaps
  3. Inspects document outlines (heading paths, no content) to find targeted sections
  4. Fetches those sections; re-evaluates
  5. Generates final answer

Use smart retrieval for complex questions that span multiple document sections.
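The five steps above amount to a bounded refine-context loop. Here is a sketch of that control flow under assumed interfaces — the `search`, `evaluate`, `fetch_sections`, and `generate` callables stand in for the vector search, LLM sufficiency check, outline-guided section fetch, and final LLM call.

```python
MAX_ITERATIONS = 3  # per the description above

def smart_retrieve(query, search, evaluate, fetch_sections, generate):
    """search: query -> chunks; evaluate: (query, context) -> gaps or None;
    fetch_sections: gaps -> chunks; generate: (query, context) -> answer."""
    context = search(query)                 # step 1: initial top-K search
    for _ in range(MAX_ITERATIONS):
        gaps = evaluate(query, context)     # step 2: is context sufficient?
        if not gaps:
            break
        context += fetch_sections(gaps)     # steps 3-4: fetch targeted sections
    return generate(query, context)         # step 5: final answer
```

Each loop iteration costs an extra LLM call, which is why smart mode lands at 2–4 calls rather than standard mode's one.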

Chunking strategy

Documents are split hierarchically:

  • Splits at H1/H2/H3 boundaries — each section becomes a parent chunk
  • Parent chunks are further split into child chunks (max ~500 tokens, 100-token overlap)
  • Code blocks are never split mid-block
  • Each child chunk stores its heading breadcrumb (headingPath) so context stays meaningful

Both parent and child chunks are stored. Embeddings are generated on child chunks for precision; retrieval returns parent content for context.
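The parent/child split can be sketched as two passes over a Markdown document. This is a simplified illustration: token counts are approximated with word counts, code-block preservation is omitted, and the constants and names are assumptions matching the description above.

```python
import re

MAX_CHILD_TOKENS = 500  # approximate child chunk cap
OVERLAP_TOKENS = 100    # overlap between consecutive children

def split_parents(markdown: str):
    """Yield (heading_path, body) pairs, one per H1/H2/H3 section."""
    parents, path = [], []
    for block in re.split(r"\n(?=#{1,3} )", markdown):
        heading, _, body = block.partition("\n")
        level = len(heading) - len(heading.lstrip("#"))
        path = path[: level - 1] + [heading.lstrip("# ").strip()]
        parents.append((" > ".join(path), body.strip()))
    return parents

def split_children(heading_path: str, body: str):
    """Overlapping word windows, each tagged with its heading breadcrumb."""
    words = body.split()
    step = MAX_CHILD_TOKENS - OVERLAP_TOKENS
    return [
        {"headingPath": heading_path, "text": " ".join(words[i : i + MAX_CHILD_TOKENS])}
        for i in range(0, max(len(words), 1), step)
    ]
```

The breadcrumb (e.g. "Getting Started > Chunking strategy") is what keeps a small child chunk interpretable when it is retrieved in isolation.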

Language support

Language codes follow BCP 47 (en, nl, de, fr). If the language is omitted from a request, it is auto-detected from the user's input and passed to the LLM in the system prompt, so responses come back in the same language.