AgencyRelay
Capability · RAG & Knowledge AI

White-label RAG that cites, not hallucinates

Retrieval-augmented generation built around your client's content — with a chunking strategy, reranking, real citations, and an honest "I don't know" path when the corpus doesn't have the answer.

  • Grounded answers with real citations
  • Honest "I don't know" path by default
  • White-label safe by default
What a RAG engagement looks like

Retrieval pod, predictable spine

  • Format: RAG pod inside your SOW
  • Cadence: Weekly delivery · async daily standup
  • Stack: pgvector · Pinecone · Cohere rerank · OpenAI / Anthropic
  • Starting at $3,200 / week
Final SOW is scoped against your brief. Multi-track AI pods (e.g. RAG + Agents) and pods that mix capabilities are quoted at the highest applicable rate.
When agencies bring us in

Four moments where the retrieval brief deserves more than a vector store and a hopeful prompt

These are the conversations agency owners describe when a knowledge-grounded brief is in front of them and the in-house team has wired up an embedding API but not yet a retrieval system that survives a real audit.

Signal 01 / 04

An LLM is confidently inventing answers your client's customers will quote back

The chatbot looks helpful and is making things up — wrong policy, wrong price, wrong contact. We re-architect the surface around real retrieval, citation rendering, and an evaluated "I don't know" path so confidence and accuracy line up.
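As a rough sketch of how that path can be wired (the threshold values, the `Chunk` fields, and the `generate` callable below are illustrative, not the shipped implementation): gate generation on retrieval confidence, and only answer from chunks the response can cite.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Chunk:
    text: str
    source_url: str
    score: float            # reranker relevance score in [0, 1]

MIN_SCORE = 0.45            # illustrative threshold, tuned against the labelled eval set
MIN_SUPPORT = 2             # require at least two supporting chunks before answering

def answer(question: str, retrieved: list[Chunk],
           generate: Callable[[str, list[Chunk]], str]) -> dict:
    """Return a grounded answer, or an explicit refusal when retrieval is weak.

    `generate` stands in for the model call, constrained to the supplied chunks.
    """
    supporting = [c for c in retrieved if c.score >= MIN_SCORE]
    if len(supporting) < MIN_SUPPORT:
        # The honest path: surface "I don't know" instead of an invented answer.
        return {"refused": True, "answer": None,
                "reason": "not enough supporting content in the corpus"}
    return {"refused": False,
            "answer": generate(question, supporting),
            "citations": [c.source_url for c in supporting]}
```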

Signal 02 / 04

The corpus is scattered across PDFs, Notion, Drive, Confluence, a CMS, and ticketing

Half the questions are answered in one source and half in another. We build the ingestion, normalisation, and chunking layer once, with per-source tuning, so the retriever sees the corpus the way the buyer thinks about it.
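A per-source chunking configuration can be as small as the sketch below. The source names and token sizes are illustrative, and the naive whitespace-window splitter stands in for the semantic, recursive, or hierarchical splitters the real build would use.

```python
# Per-source chunking settings (illustrative numbers, not defaults we ship).
# A 40-page policy PDF and a two-paragraph Notion page shouldn't be chunked
# the same way, so each source carries its own sizing and overlap.
CHUNKING_CONFIG = {
    "pdf":        {"max_tokens": 800, "overlap": 120},
    "confluence": {"max_tokens": 512, "overlap": 64},
    "notion":     {"max_tokens": 400, "overlap": 40},
    "ticketing":  {"max_tokens": 600, "overlap": 0},
}

def chunk_document(source: str, text: str) -> list[str]:
    """Naive whitespace-window chunker, standing in for the real per-source splitter."""
    cfg = CHUNKING_CONFIG[source]
    tokens = text.split()
    step = cfg["max_tokens"] - cfg["overlap"]
    return [" ".join(tokens[i:i + cfg["max_tokens"]])
            for i in range(0, max(len(tokens), 1), step)]
```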

Signal 03 / 04

The first RAG demo nails three sample queries and breaks on the fourth

Top-k chosen by feel, no reranker, no eval set, no failure mode for empty retrieval. We add hybrid search, a reranker, and a labelled question set so retrieval quality is measured, not vibed.
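One common shape for the hybrid step, sketched with illustrative chunk ids: merge the keyword and vector candidate lists with reciprocal rank fusion, then hand the top slice to a reranker so the final cut is ranked by answer relevance rather than raw embedding similarity.

```python
def reciprocal_rank_fusion(keyword_hits: list[str], vector_hits: list[str],
                           k: int = 60) -> list[str]:
    """Merge a BM25 result list and a vector result list of chunk ids.

    Chunks near the top of either list score highest; k=60 is the usual
    damping constant from the RRF literature.
    """
    scores: dict[str, float] = {}
    for hits in (keyword_hits, vector_hits):
        for rank, chunk_id in enumerate(hits):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Usage sketch: fuse the two candidate lists, then pass the top slice to a
# reranker (e.g. Cohere Rerank) before the final top-k cut.
candidates = reciprocal_rank_fusion(
    keyword_hits=["c12", "c07", "c31"],      # from BM25 / full-text search
    vector_hits=["c07", "c44", "c12"],       # from the vector store
)[:25]
```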

Signal 04 / 04

The buyer wants citations, audit logs, and a paper trail for every answer

Compliance, legal, regulated content — the brief asks for traceable answers, not opaque ones. We make citation discipline a first-class part of the system: every claim ties back to a chunk a user can open, with run-time logs the buyer's auditor can read.
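Concretely, the paper trail is just a structured record per answer. A minimal sketch, with illustrative field names such as `chunk_id`, `source_uri`, and `quote`:

```python
import json, time, uuid

def log_answer(question: str, answer: str, citations: list[dict],
               log_path: str = "answer_audit.jsonl") -> dict:
    """Append one auditable record per generated answer.

    Each citation carries the chunk id, the source the user can open, and the
    quoted span the claim rests on; field names here are illustrative.
    """
    record = {
        "run_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "question": question,
        "answer": answer,
        "citations": citations,  # e.g. {"chunk_id": ..., "source_uri": ..., "quote": ...}
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```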

What this track is — and isn't

A senior retrieval pod under your brand. Not a vector store demo, not an out-of-the-box chatbot.

What it covers
  • Ingestion pipelines for the corpus (PDFs, web, Notion, Confluence, Drive, CMS, ticketing)
  • Chunking strategy — semantic, recursive, hierarchical — with per-source tuning
  • Embedding model selection and a vector store wired into your client's data layer
  • Hybrid search (BM25 + vector) and a reranker for answer-relevant ranking
  • Citation discipline — every claim ties back to a source the user can open
  • An evaluated "I don't know" path when retrieval is empty or low-confidence
  • Eval suite against a labelled question set, run in CI on every retrieval change (a minimal gate is sketched after this list)
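A minimal version of that CI gate might look like the sketch below. The eval file format, the `retrieve` callable, and the 0.90 threshold are illustrative, not fixed parts of the engagement.

```python
import json, sys

PASS_THRESHOLD = 0.90   # illustrative gate; the real number is agreed with the buyer

def retrieval_pass_rate(retrieve, eval_path: str = "eval_questions.jsonl") -> float:
    """Share of labelled questions whose retrieved set contains an expected source.

    `retrieve` is whatever retriever is under test; each line of the eval file is
    assumed to look like {"question": ..., "expected_sources": [...]}.
    """
    with open(eval_path, encoding="utf-8") as fh:
        cases = [json.loads(line) for line in fh]
    hits = 0
    for case in cases:
        found = {chunk["source_uri"] for chunk in retrieve(case["question"])}
        if found & set(case["expected_sources"]):
            hits += 1
    return hits / len(cases)

def ci_gate(retrieve) -> None:
    rate = retrieval_pass_rate(retrieve)
    print(f"retrieval pass rate: {rate:.1%}")
    sys.exit(0 if rate >= PASS_THRESHOLD else 1)   # a non-zero exit fails the CI job
```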
What it doesn't do
  • Tool-using agents that take actions on the client's stack — that's AI Agents
  • Internal automation pipelines without a retrieval surface — that's Workflow Automation
  • Wiring AI into an existing CRM or helpdesk surface — that's AI Integrations
  • Direct-to-client pitching — the pod sits inside your team, not in front of the client
  • Recruiting, placing, or staff-augmenting a developer onto your payroll
How a RAG engagement runs

From brief to first evaluated retriever in under three weeks

Retrieval work runs eval-first. The pod doesn't ship a grounded answer surface into a real workflow without a labelled question set and a documented citation contract behind it.

  1. Step 01 · Days 1–4

    Brief & feasibility

    Working session with your delivery lead and the buyer-side stakeholder. We map the corpus, the expected query types, the citation contract, and what a "wrong" answer looks like. NDA and SOW signed under Salt Technologies, Inc.

  2. Step 02 · Week 1

    Architecture + eval set

    Architecture readout: chunking strategy per source, embedding model, vector store, hybrid search and reranker, citation rendering, the "I don't know" path. In parallel, we build the labelled question set every retrieval change runs against in CI.

  3. Step 03 · Weeks 2–5

    Ingestion + retrieval build

    Iterative build with weekly working review. Ingestion pipeline first, then embeddings and the vector store, then retrieval and reranking. Eval pass rate, citation accuracy, and latency tracked from build one — not retrofitted at the end.

  4. Step 04 · Week 5+

    Grounded generation + rollout

    Production deployment behind a feature flag, with eval gates on every release, observability for retrieval quality and cost, and a written runbook for stale-document handling and re-indexing. Post-launch transitions cleanly into a Support & Maintenance retainer for corpus drift, model upgrades, and v1.x feature work.
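One small piece of that runbook, sketched under assumptions (the 30-day window, row fields, and timestamp handling below are illustrative): a freshness sweep that flags chunks whose source changed after they were embedded, so stale passages surface before they turn into wrong answers.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=30)   # illustrative policy; set per corpus in the runbook

def find_stale_chunks(index_rows: list[dict], source_updated_at: dict) -> list[tuple]:
    """Flag chunks whose source changed after embedding, or that are simply old.

    index_rows        -> [{"chunk_id", "source_id", "embedded_at"}, ...]
    source_updated_at -> {source_id: last-modified timestamp from the source system}
    Output feeds either incremental re-indexing or the freshness dashboard.
    """
    now = datetime.now(timezone.utc)
    stale = []
    for row in index_rows:
        changed = source_updated_at.get(row["source_id"])
        if changed and changed > row["embedded_at"]:
            stale.append((row["chunk_id"], "source changed since embedding"))
        elif now - row["embedded_at"] > STALE_AFTER:
            stale.append((row["chunk_id"], "past freshness window"))
    return stale
```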

How to engage

Two engagement shapes — pick the one that matches your brief

Retrieval work is long-arc once a real corpus shows up. The two shapes below are the engagements where production-grade RAG systems actually land — a new service line under your brand, or a delivery pod inside an active client SOW.

Capability rate
$3,200 per week

Starting weekly rate for a single-capability AI pod. Multi-track AI pods (RAG + Agents, RAG + Integrations) and pods that mix capabilities are quoted at the highest applicable rate. Final SOW is scoped against the brief.

Stack & deliverables

Senior AI engineers, your tools, ship-ready output

We work inside the retrieval tooling your team and your client already use — no parallel platform, no "we'll just rebuild it our way" surprise.

Embeddings & vector stores
  • OpenAI text-embedding-3
  • Voyage · Cohere embeddings
  • pgvector · Postgres
  • Pinecone · Weaviate · Qdrant
Retrieval & reranking
  • Hybrid search (BM25 + vector)
  • Cohere Rerank · Voyage Rerank
  • Per-source retrieval tuning
  • Citation-aware response shaping
Evals & observability
  • Ragas · Promptfoo
  • Labelled question sets
  • LangSmith · Helicone · OpenTelemetry
  • Citation accuracy + drift dashboards
Outputs we ship
  • Production RAG system
  • Citation-rendering UI components
  • Eval suite + CI integration
  • "I don't know" path with telemetry
  • Observability dashboard
  • Ingestion + re-index runbook
Operating principles

Partner-safe inside your top knowledge accounts

Every RAG & Knowledge AI engagement runs on the same operating spine that protects long-arc retainers and Dedicated Partner Pods — contracted through Salt Technologies, Inc.

Principle

No client-facing footprint

We don't email your client, join their calls, or appear in the proposal — unless you explicitly white-list a named engineer in the SOW.

Principle

Inside your accounts

We work in your GitHub, your model-provider accounts, your hosting, and your shared channel under aliases that fit your team's naming.

Principle

Mutual no-poach

Mutual non-solicitation written into every MSA, with a defined window after the engagement ends. Same clause across every track.

Principle

Salt Technologies, Inc.

MSA, NDA, and engagement SOW are issued by Salt — the Delaware C-Corp behind AgencyRelay.

The same operating spine sits underneath every AgencyRelay capability. Read the no-poach and confidentiality page for the contractual instruments behind these defaults.

RAG & Knowledge AI FAQ

What agency owners ask before sizing a RAG build

Direct answers to the questions that come up on almost every RAG & Knowledge AI scoping call.

See full FAQ
  • Q.01

    What's the difference between RAG and AI Agents on this site?

    RAG grounds an answer in your client's content — the model retrieves and cites, but doesn't act. Agents take actions — they call tools, update systems of record, kick off workflows. Most agentic systems we ship combine both (an agent with a RAG tool in its toolbox), but the *primary* capability on the SOW is the one that defines the system. We make this call inside the brief & feasibility step in week one.

  • Q.02

    How do you stop the model from making things up?

    Three layers. First, a citation contract — the model is constrained to answer from retrieved chunks and surface the source for each claim. Second, an evaluated "I don't know" path that fires when retrieval is empty or low-confidence, instead of letting the model reach for its training data. Third, an eval suite that runs in CI on every retrieval or prompt change against a labelled question set the buyer signs off on, including hallucination scoring. All three are part of default scope.

  • Q.03

    Which vector store and embedding model do you use?

    We don't have a single default. The choice between pgvector (when the corpus is small-to-medium and Postgres is already in the stack) and a managed vector store like Pinecone, Weaviate, or Qdrant (for scale, hybrid filters, or multi-tenant routing) sits on corpus size, latency budget, and your client's existing data layer. Embedding choice — OpenAI text-embedding-3, Voyage, Cohere — is selected in the architecture readout with the trade-offs written down so your team and your client can refer back.

  • Q.04

    How do you handle the corpus — who owns it, who maintains it?

    The corpus and the ingestion pipeline belong to your client. We build the ingestion, normalisation, and chunking layer once, with per-source tuning, and ship a re-index runbook so your client's team (or a Support & Maintenance retainer crew) can keep it fresh. Source-of-truth lives in your client's existing systems — Notion, Drive, Confluence, CMS, ticketing — and the vector store is a derived index, not a parallel content system. A small pgvector-flavoured sketch of that derived-index idea sits at the end of this FAQ.

  • Q.05

    What happens when documents change or go stale?

    Two patterns, picked in the architecture readout. For corpora that change daily — product docs, support content, CMS — we wire incremental re-indexing on source events (webhooks, polling, or change feeds). For slower-moving corpora — policy, legal, archived knowledge — we ship a scheduled re-index runbook with a freshness dashboard so stale chunks are visible before they become wrong answers. Either way, citation rendering shows the source's last-updated timestamp so users see freshness alongside relevance.

  • Q.06

    What's the smallest engagement you'll take?

    Production RAG systems aren't a one-week capability. The most common starting shape is a 4–6 week pod inside an Invisible Delivery Team SOW, sized around the corpus surface and the citation contract. For shorter retrieval work (a tightly scoped feature, an eval-suite stand-up, a chunking-strategy audit), we'll quote a tighter window against the same weekly rate.

  • Q.07

    How does the pricing work for a multi-track or multi-capability AI pod?

    The starting weekly rate for a single-capability AI pod is $3,200 per week. Multi-track AI pods (RAG + Agents, RAG + Integrations) and pods that mix capabilities (RAG + UI/UX, RAG + Backend) are quoted at the highest applicable rate. Final SOW is scoped against the brief; the rate is the floor, not a ceiling.

  • Q.08

    What's the right way to support a RAG system after launch?

    Most RAG systems graduate cleanly into a Support & Maintenance retainer post-launch — corpus drift monitoring, eval regression on new model versions, reranker re-tuning, citation-accuracy review, and v1.x feature work inside a monthly envelope. Either the same pod or a smaller maintenance crew carries it on the same MSA, no second sales cycle.

  • Q.09

    Do we own the work the pod produces?

    Yes. IP ownership and assignment on delivered code, prompts, evals, ingestion pipelines, retrieval logic, and supporting artefacts is written into the MSA — the work belongs to your agency (and onwards to your client per your own client contract) on payment of the relevant invoice. The Salt Technologies templates are counsel-reviewed and shared before signing.
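To make the "derived index" point from Q.04 concrete, a pgvector-flavoured sketch follows. The table name, columns, embedding dimension, and connection string are illustrative assumptions; the only load-bearing idea is that every row can be rebuilt from the source systems, so dropping and re-indexing the table is always safe.

```python
import psycopg2

# The vector store as a *derived* index: nothing here is authoritative content.
# Column names, the 1536 dimension, and the DSN below are illustrative.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS corpus_chunks (
    chunk_id     text PRIMARY KEY,
    source_id    text NOT NULL,          -- points back to the Notion / Drive / CMS item
    source_uri   text NOT NULL,          -- what the citation opens for the user
    chunk_text   text NOT NULL,
    embedding    vector(1536),
    content_hash text NOT NULL,          -- detects source changes on re-index
    embedded_at  timestamptz NOT NULL
);
"""

with psycopg2.connect("postgresql://localhost/clientdb") as conn:   # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(DDL)
```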

Bring the brief, get the right shape

Tell us the retrieval work you're sizing — we'll respond with a clean read on architecture, pod shape, and starting rate.

Knowledge portal, support deflection, internal search, brand-grounded answers, or a regulated-content surface. Either way, the conversation starts with the work — not with a deck.

Operating defaults: MSA / NDA / SOW issued by Salt Technologies, Inc. · US-aligned working hours · No-poach commitments · White-label safe by default