AgencyRelay
Capability · RAG & Knowledge AI

White-label RAG that cites, not hallucinates

Retrieval-augmented generation built around your client's content — with a chunking strategy, reranking, real citations, and an honest "I don't know" path when the corpus doesn't have the answer.

  • Grounded answers with real citations
  • Honest "I don't know" path by default
  • White-label safe by default
What a RAG engagement looks like

Retrieval pod, predictable spine

  • Format: RAG pod inside your SOW
  • Cadence: Weekly delivery · async daily standup
  • Stack: pgvector · Pinecone · Cohere rerank · OpenAI / Anthropic
  • Starting at $3,200 / week
Final SOW is scoped against your brief. Multi-track AI pods (e.g. RAG + Agents) and pods that mix capabilities are quoted at the highest applicable rate.
When agencies bring us in

Four moments where the retrieval brief deserves more than a vector store and a hopeful prompt

These are the conversations agency owners describe when a knowledge-grounded brief is in front of them and the in-house team has wired up an embedding API but not yet a retrieval system that survives a real audit.

Signal 01 / 04

An LLM is confidently inventing answers your client's customers will quote back

The chatbot looks helpful and is making things up — wrong policy, wrong price, wrong contact. We re-architect the surface around real retrieval, citation rendering, and an evaluated "I don't know" path so confidence and accuracy line up.
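As a rough sketch of how that path can be wired (the threshold values, the `Chunk` fields, and the `generate` callable below are illustrative, not the shipped implementation): gate generation on retrieval confidence, and only answer from chunks the response can cite.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Chunk:
    text: str
    source_url: str
    score: float            # reranker relevance score in [0, 1]

MIN_SCORE = 0.45            # illustrative threshold, tuned against the labelled eval set
MIN_SUPPORT = 2             # require at least two supporting chunks before answering

def answer(question: str, retrieved: list[Chunk],
           generate: Callable[[str, list[Chunk]], str]) -> dict:
    """Return a grounded answer, or an explicit refusal when retrieval is weak.

    `generate` stands in for the model call, constrained to the supplied chunks.
    """
    supporting = [c for c in retrieved if c.score >= MIN_SCORE]
    if len(supporting) < MIN_SUPPORT:
        # The honest path: surface "I don't know" instead of an invented answer.
        return {"refused": True, "answer": None,
                "reason": "not enough supporting content in the corpus"}
    return {"refused": False,
            "answer": generate(question, supporting),
            "citations": [c.source_url for c in supporting]}
```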

Signal 02 / 04

The corpus is scattered across PDFs, Notion, Drive, Confluence, a CMS, and ticketing

Half the questions are answered in one source and half in another. We build the ingestion, normalisation, and chunking layer once, with per-source tuning, so the retriever sees the corpus the way the buyer thinks about it.
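A per-source chunking configuration can be as small as the sketch below. The source names and token sizes are illustrative, and the naive whitespace-window splitter stands in for the semantic, recursive, or hierarchical splitters the real build would use.

```python
# Per-source chunking settings (illustrative numbers, not defaults we ship).
# A 40-page policy PDF and a two-paragraph Notion page shouldn't be chunked
# the same way, so each source carries its own sizing and overlap.
CHUNKING_CONFIG = {
    "pdf":        {"max_tokens": 800, "overlap": 120},
    "confluence": {"max_tokens": 512, "overlap": 64},
    "notion":     {"max_tokens": 400, "overlap": 40},
    "ticketing":  {"max_tokens": 600, "overlap": 0},
}

def chunk_document(source: str, text: str) -> list[str]:
    """Naive whitespace-window chunker, standing in for the real per-source splitter."""
    cfg = CHUNKING_CONFIG[source]
    tokens = text.split()
    step = cfg["max_tokens"] - cfg["overlap"]
    return [" ".join(tokens[i:i + cfg["max_tokens"]])
            for i in range(0, max(len(tokens), 1), step)]
```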

Signal 03 / 04

The first RAG demo nails three sample queries and breaks on the fourth

Top-k chosen by feel, no reranker, no eval set, no failure mode for empty retrieval. We add hybrid search, a reranker, and a labelled question set so retrieval quality is measured, not vibed.
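One common shape for the hybrid step, sketched with illustrative chunk ids: merge the keyword and vector candidate lists with reciprocal rank fusion, then hand the top slice to a reranker so the final cut is ranked by answer relevance rather than raw embedding similarity.

```python
def reciprocal_rank_fusion(keyword_hits: list[str], vector_hits: list[str],
                           k: int = 60) -> list[str]:
    """Merge a BM25 result list and a vector result list of chunk ids.

    Chunks near the top of either list score highest; k=60 is the usual
    damping constant from the RRF literature.
    """
    scores: dict[str, float] = {}
    for hits in (keyword_hits, vector_hits):
        for rank, chunk_id in enumerate(hits):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Usage sketch: fuse the two candidate lists, then pass the top slice to a
# reranker (e.g. Cohere Rerank) before the final top-k cut.
candidates = reciprocal_rank_fusion(
    keyword_hits=["c12", "c07", "c31"],      # from BM25 / full-text search
    vector_hits=["c07", "c44", "c12"],       # from the vector store
)[:25]
```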

Signal 04 / 04

The buyer wants citations, audit logs, and a paper trail for every answer

Compliance, legal, regulated content — the brief asks for traceable answers, not opaque ones. We make citation discipline a first-class part of the system: every claim ties back to a chunk a user can open, with run-time logs the buyer's auditor can read.
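Concretely, the paper trail is just a structured record per answer. A minimal sketch, with illustrative field names such as `chunk_id`, `source_uri`, and `quote`:

```python
import json, time, uuid

def log_answer(question: str, answer: str, citations: list[dict],
               log_path: str = "answer_audit.jsonl") -> dict:
    """Append one auditable record per generated answer.

    Each citation carries the chunk id, the source the user can open, and the
    quoted span the claim rests on; field names here are illustrative.
    """
    record = {
        "run_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "question": question,
        "answer": answer,
        "citations": citations,  # e.g. {"chunk_id": ..., "source_uri": ..., "quote": ...}
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```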

What this track is — and isn't

A senior retrieval pod under your brand. Not a vector store demo, not an out-of-the-box chatbot.

What it covers
  • Ingestion pipelines for the corpus (PDFs, web, Notion, Confluence, Drive, CMS, ticketing)
  • Chunking strategy — semantic, recursive, hierarchical — with per-source tuning
  • Embedding model selection and a vector store wired into your client's data layer
  • Hybrid search (BM25 + vector) and a reranker for answer-relevant ranking
  • Citation discipline — every claim ties back to a source the user can open
  • An evaluated "I don't know" path when retrieval is empty or low-confidence
  • Eval suite against a labelled question set, run in CI on every retrieval change (a minimal gate is sketched after this list)
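A minimal version of that CI gate might look like the sketch below. The eval file format, the `retrieve` callable, and the 0.90 threshold are illustrative, not fixed parts of the engagement.

```python
import json, sys

PASS_THRESHOLD = 0.90   # illustrative gate; the real number is agreed with the buyer

def retrieval_pass_rate(retrieve, eval_path: str = "eval_questions.jsonl") -> float:
    """Share of labelled questions whose retrieved set contains an expected source.

    `retrieve` is whatever retriever is under test; each line of the eval file is
    assumed to look like {"question": ..., "expected_sources": [...]}.
    """
    with open(eval_path, encoding="utf-8") as fh:
        cases = [json.loads(line) for line in fh]
    hits = 0
    for case in cases:
        found = {chunk["source_uri"] for chunk in retrieve(case["question"])}
        if found & set(case["expected_sources"]):
            hits += 1
    return hits / len(cases)

def ci_gate(retrieve) -> None:
    rate = retrieval_pass_rate(retrieve)
    print(f"retrieval pass rate: {rate:.1%}")
    sys.exit(0 if rate >= PASS_THRESHOLD else 1)   # a non-zero exit fails the CI job
```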
What it doesn't do
  • Tool-using agents that take actions on the client's stack — that's AI Agents
  • Internal automation pipelines without a retrieval surface — that's Workflow Automation
  • Wiring AI into an existing CRM or helpdesk surface — that's AI Integrations
  • Direct-to-client pitching — the pod sits inside your team, not in front of the client
  • Recruiting, placing, or staff-augmenting a developer onto your payroll
How a RAG engagement runs

From brief to first evaluated retriever in under three weeks

Retrieval work runs eval-first. The pod doesn't ship a grounded answer surface into a real workflow without a labelled question set and a documented citation contract behind it.

  1. Step 01 · Days 1–4

    Brief & feasibility

    Working session with your delivery lead and the buyer-side stakeholder. We map the corpus, the expected query types, the citation contract, and what a "wrong" answer looks like. NDA and SOW signed under Salt Technologies, Inc.

  2. Step 02 · Week 1

    Architecture + eval set

    Architecture readout: chunking strategy per source, embedding model, vector store, hybrid search and reranker, citation rendering, the "I don't know" path. In parallel, we build the labelled question set every retrieval change runs against in CI.

  3. Step 03 · Weeks 2–5

    Ingestion + retrieval build

    Iterative build with weekly working review. Ingestion pipeline first, then embeddings and the vector store, then retrieval and reranking. Eval pass rate, citation accuracy, and latency tracked from build one — not retrofitted at the end.

  4. Step 04 · Week 5+

    Grounded generation + rollout

    Production deployment behind a feature flag, with eval gates on every release, observability for retrieval quality and cost, and a written runbook for stale-document handling and re-indexing. Post-launch transitions cleanly into a Support & Maintenance retainer for corpus drift, model upgrades, and v1.x feature work.
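One small piece of that runbook, sketched under assumptions (the 30-day window, row fields, and timestamp handling below are illustrative): a freshness sweep that flags chunks whose source changed after they were embedded, so stale passages surface before they turn into wrong answers.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=30)   # illustrative policy; set per corpus in the runbook

def find_stale_chunks(index_rows: list[dict], source_updated_at: dict) -> list[tuple]:
    """Flag chunks whose source changed after embedding, or that are simply old.

    index_rows        -> [{"chunk_id", "source_id", "embedded_at"}, ...]
    source_updated_at -> {source_id: last-modified timestamp from the source system}
    Output feeds either incremental re-indexing or the freshness dashboard.
    """
    now = datetime.now(timezone.utc)
    stale = []
    for row in index_rows:
        changed = source_updated_at.get(row["source_id"])
        if changed and changed > row["embedded_at"]:
            stale.append((row["chunk_id"], "source changed since embedding"))
        elif now - row["embedded_at"] > STALE_AFTER:
            stale.append((row["chunk_id"], "past freshness window"))
    return stale
```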

How to engage

Two engagement shapes — pick the one that matches your brief

Retrieval work is long-arc once a real corpus shows up. The two shapes below are the engagements where production-grade RAG systems actually land — a new service line under your brand, or a delivery pod inside an active client SOW.

Capability rate
$3,200 per week

Starting weekly rate for a single-capability AI pod. Multi-track AI pods (RAG + Agents, RAG + Integrations) and pods that mix capabilities are quoted at the highest applicable rate. Final SOW is scoped against the brief.

Stack & deliverables

Senior AI engineers, your tools, ship-ready output

We work inside the retrieval tooling your team and your client already use — no parallel platform, no "we'll just rebuild it our way" surprise.

Embeddings & vector stores
  • OpenAI text-embedding-3
  • Voyage · Cohere embeddings
  • pgvector · Postgres
  • Pinecone · Weaviate · Qdrant
Retrieval & reranking
  • Hybrid search (BM25 + vector)
  • Cohere Rerank · Voyage Rerank
  • Per-source retrieval tuning
  • Citation-aware response shaping
Evals & observability
  • Ragas · Promptfoo
  • Labelled question sets
  • LangSmith · Helicone · OpenTelemetry
  • Citation accuracy + drift dashboards
Outputs we ship
  • Production RAG system
  • Citation-rendering UI components
  • Eval suite + CI integration
  • "I don't know" path with telemetry
  • Observability dashboard
  • Ingestion + re-index runbook
Operating principles

Partner-safe inside your top knowledge accounts

Every RAG & Knowledge AI engagement runs on the same operating spine that protects long-arc retainers and Dedicated Partner Pods — contracted through Salt Technologies, Inc.

Principle

No client-facing footprint

We don't email your client, join their calls, or appear in the proposal — unless you explicitly white-list a named engineer in the SOW.

Principle

Inside your accounts

We work in your GitHub, your model-provider accounts, your hosting, and your shared channel under aliases that fit your team's naming.

Principle

Mutual no-poach

Mutual non-solicitation written into every MSA, with a defined window after the engagement ends. Same clause across every track.

Principle

Salt Technologies, Inc.

MSA, NDA, and engagement SOW are issued by Salt — the Delaware C-Corp behind AgencyRelay.

The same operating spine sits underneath every AgencyRelay capability. Read the no-poach and confidentiality page for the contractual instruments behind these defaults.

RAG & Knowledge AI FAQ

What agency owners ask before sizing a RAG build

Direct answers to the questions that come up on almost every RAG & Knowledge AI scoping call.

See full FAQ
  • Q.01

    What's the difference between RAG and AI Agents on this site?

    RAG grounds an answer in your client's content — the model retrieves and cites, but doesn't act. Agents take actions — they call tools, update systems of record, kick off workflows. Most agentic systems we ship combine both (an agent with a RAG tool in its toolbox), but the *primary* capability on the SOW is the one that defines the system. We make this call inside the brief & feasibility step in week one.

  • Q.02

    How do you stop the model from making things up?

    Three layers. First, a citation contract — the model is constrained to answer from retrieved chunks and surface the source for each claim. Second, an evaluated "I don't know" path that fires when retrieval is empty or low-confidence, instead of letting the model reach for its training data. Third, an eval suite that runs in CI on every retrieval or prompt change against a labelled question set the buyer signs off on, including hallucination scoring. All three are part of default scope.

  • Q.03

    Which vector store and embedding model do you use?

    We don't have a single default. The choice between pgvector (when the corpus is small-to-medium and Postgres is already in the stack) and a managed vector store like Pinecone, Weaviate, or Qdrant (for scale, hybrid filters, or multi-tenant routing) sits on corpus size, latency budget, and your client's existing data layer. Embedding choice — OpenAI text-embedding-3, Voyage, Cohere — is selected in the architecture readout with the trade-offs written down so your team and your client can refer back.

  • Q.04

    How do you handle the corpus — who owns it, who maintains it?

    The corpus and the ingestion pipeline belong to your client. We build the ingestion, normalisation, and chunking layer once, with per-source tuning, and ship a re-index runbook so your client's team (or a Support & Maintenance retainer crew) can keep it fresh. Source-of-truth lives in your client's existing systems — Notion, Drive, Confluence, CMS, ticketing — and the vector store is a derived index, not a parallel content system. A small pgvector-flavoured sketch of that derived-index idea sits at the end of this FAQ.

  • Q.05

    What happens when documents change or go stale?

    Two patterns, picked in the architecture readout. For corpora that change daily — product docs, support content, CMS — we wire incremental re-indexing on source events (webhooks, polling, or change feeds). For slower-moving corpora — policy, legal, archived knowledge — we ship a scheduled re-index runbook with a freshness dashboard so stale chunks are visible before they become wrong answers. Either way, citation rendering shows the source's last-updated timestamp so users see freshness alongside relevance.

  • Q.06

    What's the smallest engagement you'll take?

    Production RAG systems aren't a one-week capability. The most common starting shape is a 4–6 week pod inside an Invisible Delivery Team SOW, sized around the corpus surface and the citation contract. For shorter retrieval work (a tightly scoped feature, an eval-suite stand-up, a chunking-strategy audit), we'll quote a tighter window against the same weekly rate.

  • Q.07

    How does the pricing work for a multi-track or multi-capability AI pod?

    The starting weekly rate for a single-capability AI pod is $3,200 per week. Multi-track AI pods (RAG + Agents, RAG + Integrations) and pods that mix capabilities (RAG + UI/UX, RAG + Backend) are quoted at the highest applicable rate. Final SOW is scoped against the brief; the rate is the floor, not a ceiling.

  • Q.08

    What's the right way to support a RAG system after launch?

    Most RAG systems graduate cleanly into a Support & Maintenance retainer post-launch — corpus drift monitoring, eval regression on new model versions, reranker re-tuning, citation-accuracy review, and v1.x feature work inside a monthly envelope. Either the same pod or a smaller maintenance crew carries it on the same MSA, no second sales cycle.

  • Q.09

    Do we own the work the pod produces?

    Yes. IP ownership and assignment on delivered code, prompts, evals, ingestion pipelines, retrieval logic, and supporting artefacts is written into the MSA — the work belongs to your agency (and onwards to your client per your own client contract) on payment of the relevant invoice. The Salt Technologies templates are counsel-reviewed and shared before signing.
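To make the "derived index" point from Q.04 concrete, a pgvector-flavoured sketch follows. The table name, columns, embedding dimension, and connection string are illustrative assumptions; the only load-bearing idea is that every row can be rebuilt from the source systems, so dropping and re-indexing the table is always safe.

```python
import psycopg2

# The vector store as a *derived* index: nothing here is authoritative content.
# Column names, the 1536 dimension, and the DSN below are illustrative.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS corpus_chunks (
    chunk_id     text PRIMARY KEY,
    source_id    text NOT NULL,          -- points back to the Notion / Drive / CMS item
    source_uri   text NOT NULL,          -- what the citation opens for the user
    chunk_text   text NOT NULL,
    embedding    vector(1536),
    content_hash text NOT NULL,          -- detects source changes on re-index
    embedded_at  timestamptz NOT NULL
);
"""

with psycopg2.connect("postgresql://localhost/clientdb") as conn:   # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(DDL)
```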

Bring the brief, get the right shape

Tell us the retrieval work you're sizing — we'll respond with a clean read on architecture, pod shape, and starting rate.

Knowledge portal, support deflection, internal search, brand-grounded answers, or a regulated-content surface. Either way, the conversation starts with the work — not with a deck.

Operating defaults: MSA / NDA / SOW issued by Salt Technologies, Inc. · US-aligned working hours · No-poach commitments · White-label safe by default