Getting Started

The Cape Documentation Agent is a documentation intelligence layer for cape.io. It ingests documentation from multiple sources — Markdown files, OpenAPI specs, Confluence pages, CI/CD pipelines — into a single PostgreSQL vector store, and exposes that knowledge through several consumption surfaces.

Architecture

Sources                    Knowledge Base              Surfaces
──────────────────────     ────────────────────────    ────────────────────────
Markdown files         ──► Documents + Chunks      ──► Chat (embedded in Cape)
OpenAPI specs          ──► Vector embeddings        ──► REST API
Confluence pages       ──► (PostgreSQL + pgvector)  ──► MCP server (Claude/Cursor)
CI/CD pipelines        ──►                          ──► FAQ page

Every surface reads from the same store. The surfaces differ only in how much context is retrieved and how the output is structured.

Namespaces

All content is partitioned by namespace. Queries are always scoped — namespaces are never mixed unless explicitly requested.

Namespace       Audience                           Auth required
─────────────   ────────────────────────────────   ─────────────
user_docs       Cape external customers            No
tech_docs       Internal development team          API key
api_endpoints   Developers integrating with Cape   API key
confluence      Internal teams via Confluence      API key

Retrieval modes

The retrieval parameter controls how context is gathered before generating a response.

Standard — embeds the query, runs a top-K vector similarity search, passes results to the LLM in one call. Fast, one round-trip.
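In essence, standard mode is a single top-K ranking pass. The sketch below shows that core step with plain cosine similarity; in production the search runs inside PostgreSQL via pgvector, and the embedding and LLM calls (omitted here) are external services.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=5):
    """chunks: list of (text, embedding). Return the k most similar texts,
    which standard mode passes to the LLM in one call."""
    ranked = sorted(chunks, key=lambda c: cosine_sim(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```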

Smart — agentic multi-step process (2–4 LLM calls, max 3 iterations):

  1. Initial top-K search
  2. LLM evaluates whether context is sufficient; identifies gaps
  3. Inspects document outlines (heading paths, no content) to find targeted sections
  4. Fetches those sections; re-evaluates
  5. Generates final answer

Use smart retrieval for complex questions that span multiple document sections.
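The five steps above amount to a bounded refine-context loop. Here is a sketch of that control flow under assumed interfaces — the `search`, `evaluate`, `fetch_sections`, and `generate` callables stand in for the vector search, LLM sufficiency check, outline-guided section fetch, and final LLM call.

```python
MAX_ITERATIONS = 3  # per the description above

def smart_retrieve(query, search, evaluate, fetch_sections, generate):
    """search: query -> chunks; evaluate: (query, context) -> gaps or None;
    fetch_sections: gaps -> chunks; generate: (query, context) -> answer."""
    context = search(query)                 # step 1: initial top-K search
    for _ in range(MAX_ITERATIONS):
        gaps = evaluate(query, context)     # step 2: is context sufficient?
        if not gaps:
            break
        context += fetch_sections(gaps)     # steps 3-4: fetch targeted sections
    return generate(query, context)         # step 5: final answer
```

Each loop iteration costs an extra LLM call, which is why smart mode lands at 2–4 calls rather than standard mode's one.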

Chunking strategy

Documents are split hierarchically:

  • Splits at H1/H2/H3 boundaries — each section becomes a parent chunk
  • Parent chunks are further split into child chunks (max ~500 tokens, 100-token overlap)
  • Code blocks are never split mid-block
  • Each child chunk stores its heading breadcrumb (headingPath) so context stays meaningful

Both parent and child chunks are stored. Embeddings are generated on child chunks for precision; retrieval returns parent content for context.
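The parent/child split can be sketched as two passes over a Markdown document. This is a simplified illustration: token counts are approximated with word counts, code-block preservation is omitted, and the constants and names are assumptions matching the description above.

```python
import re

MAX_CHILD_TOKENS = 500  # approximate child chunk cap
OVERLAP_TOKENS = 100    # overlap between consecutive children

def split_parents(markdown: str):
    """Yield (heading_path, body) pairs, one per H1/H2/H3 section."""
    parents, path = [], []
    for block in re.split(r"\n(?=#{1,3} )", markdown):
        heading, _, body = block.partition("\n")
        level = len(heading) - len(heading.lstrip("#"))
        path = path[: level - 1] + [heading.lstrip("# ").strip()]
        parents.append((" > ".join(path), body.strip()))
    return parents

def split_children(heading_path: str, body: str):
    """Overlapping word windows, each tagged with its heading breadcrumb."""
    words = body.split()
    step = MAX_CHILD_TOKENS - OVERLAP_TOKENS
    return [
        {"headingPath": heading_path, "text": " ".join(words[i : i + MAX_CHILD_TOKENS])}
        for i in range(0, max(len(words), 1), step)
    ]
```

The breadcrumb (e.g. "Getting Started > Chunking strategy") is what keeps a small child chunk interpretable when it is retrieved in isolation.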

Language support

Language codes follow BCP 47 (en, nl, de, fr). If the language is omitted from a request, it is auto-detected from the user's input and passed to the LLM in the system prompt, so responses come back in the same language.