Why We Built a Different Kind of RAG Pipeline

When we started building the retrieval layer for Innovista Intelligence, the obvious approach was straightforward: embed signals, run cosine similarity, pass the results to the model. Standard RAG. It worked on day one.

It was also solving the wrong problem.

Generic retrieval finds text that's semantically close to your query. That's useful for documents. For competitive intelligence, it misses the point. An analyst querying "what's TSMC's competitive moat in advanced packaging" doesn't need signals that mention "TSMC" and "packaging" in close proximity. They need signals where someone has already connected TSMC's CoWoS lock-in to its implications for hyperscaler procurement decisions — where the analytical work has already been done. Proximity isn't the constraint. Strategic relevance is.

That distinction drove every decision in the pipeline we ended up building.

One Index, Three Surfaces

Before the architecture: one structural point worth naming.

The same retrieval infrastructure — same embedding index, same vector engine, same pre-analyzed signal corpus — powers three distinct surfaces in the platform: the War Room Copilot (interactive Q&A and structured analysis), On-Demand Reports (a multi-agent pipeline producing 12,000-word research reports), and the Proactive Insight Engine (background trend detection that surfaces shifts you didn't know to look for).

Each surface queries differently. All three share the same foundation. That means when the signal library grows, every surface improves simultaneously — not as a planned feature, but as a structural consequence of the architecture.

The Data Model Comes First

Most retrieval systems treat ingestion as a preprocessing step — get the data in, worry about structure later. We built the schema first, because the schema determines what retrieval can do.

Every signal enters the platform with the same structure: a factual description of what changed, an analyst's first-pass assessment of why it matters, the entities involved, the sector, the severity, the date, and a source link. These aren't metadata fields attached to raw text. The structured fields are the primary data. Raw text is a byproduct.
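
A minimal sketch of that record shape, assuming a Python dataclass. The field names, the severity label, and the sample values are illustrative placeholders, not the platform's actual schema; only the list of fields comes from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Signal:
    """One signal, with the structured fields as the primary data."""
    what_changed: str          # factual description of what changed
    why_it_matters: str        # analyst's first-pass assessment
    entities: tuple[str, ...]  # companies or technologies involved
    sector: str
    severity: str              # hypothetical labels, e.g. "low" / "medium" / "high"
    signal_date: date
    source_url: str


# Placeholder values for illustration only.
example_signal = Signal(
    what_changed="TSMC raised prices",
    why_it_matters="Wafer cost assumptions in downstream plans may need revisiting.",
    entities=("TSMC",),
    sector="semiconductors",
    severity="high",
    signal_date=date(2025, 1, 1),
    source_url="https://example.com/source",
)
```

Because every field is typed and required, a retrieval layer can filter on `sector`, `entities`, or `severity` without parsing free text.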

The payoff comes at query time. Retrieval can filter on sector, entity, or severity before running vector similarity — not as a post-processing step, but as a SQL condition that runs alongside the cosine distance calculation. A War Room scoped to semiconductors never sees an AI applications signal, regardless of how semantically similar it might be to the query.
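
To illustrate filtering before ranking, here is an in-memory sketch. In a SQL engine the scope check would be a WHERE condition in the same statement that orders by cosine distance; the code below only mimics that behavior. The function names, row layout, and toy two-dimensional vectors are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def scoped_search(query_vec, rows, sector, top_k=5):
    """Apply the sector filter first, then rank only the survivors by similarity."""
    candidates = [row for row in rows if row["sector"] == sector]
    scored = [(cosine_similarity(query_vec, row["embedding"]), row) for row in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


rows = [
    {"id": "s1", "sector": "semiconductors", "embedding": [0.9, 0.1]},
    {"id": "s2", "sector": "ai_applications", "embedding": [1.0, 0.0]},
    {"id": "s3", "sector": "semiconductors", "embedding": [0.0, 1.0]},
]
results = scoped_search([1.0, 0.0], rows, sector="semiconductors")
result_ids = [row["id"] for _, row in results]
```

Signal `s2` is the closest vector to the query, yet it never appears in `result_ids` because it fails the sector condition before similarity is considered.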

Analysis Before Indexing

This is the layer that sets the pipeline apart from standard RAG.

Before a signal is embedded, the AI analysis layer runs against it. The output that matters most is a single field called soWhat: two to three sentences on why the signal matters strategically, written for a senior executive, leading with the implication rather than the event. Not "TSMC raised prices" — but "every tape-out schedule modeled on 2025 wafer pricing is wrong, and fabless companies without locked capacity face structural disadvantage in the 2027–2028 cycle."

The model also generates investment angles, competitive implications, and an impact score. But soWhat is the one that changes the retrieval behavior.

The reason is the embedding input. When the embedding is built, soWhat is included in the input text alongside the signal's title and factual description, so the vector is built on pre-analyzed content. Cosine similarity then rewards more than overlapping vocabulary: it favors signals whose analytical conclusions align with your query. Ask about "competitive positioning in the foundry market" and you surface signals where the strategic consequence of a capacity decision is already articulated, not just signals that mention the word "foundry."

That's the architectural insight: retrieval quality is a function of what you embed, not just how you embed it.
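
As a sketch of what "what you embed" could look like in practice, the helper below joins the title, the factual description, and the soWhat analysis into one input string. The function name, the field order, and the blank-line separator are assumptions; the embedding call itself is left out because the model used is not specified here.

```python
def build_embedding_text(title, description, so_what):
    """Concatenate the fields that feed the embedding, skipping empty ones."""
    parts = [title, description, so_what]
    return "\n\n".join(part.strip() for part in parts if part and part.strip())


embedding_input = build_embedding_text(
    title="TSMC raised prices",
    description="TSMC announced higher wafer pricing.",  # placeholder description
    so_what=(
        "Every tape-out schedule modeled on 2025 wafer pricing is wrong, and "
        "fabless companies without locked capacity face structural disadvantage."
    ),
)
```

The resulting string, not the raw event text alone, is what would be passed to the embedding model, so the analytical conclusion contributes to the vector.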

Two Different Retrieval Shapes

The Copilot and the report pipeline both run vector similarity queries against the same index, but they retrieve differently — and those differences were deliberate.

The Copilot retrieves within the War Room's scope: if you built a TSMC + 2nm War Room, the retrieval filter is exactly that. Narrow by design. One interesting decision here: date filters are intentionally excluded. The full signal history is available for every Copilot query, so a question asked today can surface a signal from six weeks ago if it's the most relevant context. Recency doesn't automatically win — relevance does.

The report research agent works differently. A 12,000-word Deep Dive on TSMC covers seven or eight distinct sections — competitive landscape, technology roadmap, supply chain, financial position, and others. Each section runs its own retrieval pass, embedding the section title and instructions as the query. The scope logic is deliberately wider: rather than requiring exact entity matches, it accepts any signal that overlaps with the report's entities or sectors. A section on competitive dynamics should surface signals where TSMC appears alongside Samsung or Intel, not just signals about TSMC in isolation. When a section comes up short — fewer than 15 relevant signals — web search triggers automatically to fill the gap.

Same index. Two query shapes. The retrieval pattern follows the analytical task, not the other way around.
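
The two scope rules can be sketched as a pair of predicates plus the fallback check. This is one reading of the behavior described above, not the production logic: the Copilot rule is modeled as "the signal must carry every War Room tag", and all function and constant names are hypothetical. Only the threshold of 15 comes from the text.

```python
MIN_SIGNALS_PER_SECTION = 15  # below this, a report section falls back to web search


def copilot_in_scope(signal_tags, war_room_tags):
    """Narrow scope (assumed): the signal must match every tag of the War Room."""
    return set(war_room_tags) <= set(signal_tags)


def report_in_scope(signal_entities, signal_sector, report_entities, report_sectors):
    """Wide scope: any overlap with the report's entities or sectors is enough."""
    shares_entity = bool(set(signal_entities) & set(report_entities))
    return shares_entity or signal_sector in set(report_sectors)


def needs_web_search(relevant_signal_count):
    """True when a section retrieved fewer signals than the threshold."""
    return relevant_signal_count < MIN_SIGNALS_PER_SECTION


# A signal about TSMC and Samsung, checked against both shapes.
in_copilot = copilot_in_scope({"TSMC", "Samsung"}, {"TSMC", "2nm"})
in_report = report_in_scope({"TSMC", "Samsung"}, "semiconductors", {"TSMC"}, set())
```

The same signal is excluded from the narrow TSMC + 2nm scope but accepted by the report scope, which is the difference between the two query shapes.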

What Goes Into the Context Window

Retrieved signals don't arrive as raw text. Each one is assembled into a structured block that includes the signal's date, sector, severity, similarity score, what changed, why it matters — and the soWhat field. The similarity score is surfaced explicitly so the model can calibrate confidence. The soWhat analysis is visible at synthesis time, not just at retrieval time.
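
A minimal sketch of such a block, assuming each retrieved signal arrives as a dictionary. The key names, labels, and line layout are illustrative; the set of fields is the one listed above.

```python
def format_signal_block(signal):
    """Render one retrieved signal as a labeled text block for the context window."""
    header = (
        f"[{signal['date']}] sector={signal['sector']} "
        f"severity={signal['severity']} similarity={signal['similarity']:.2f}"
    )
    return "\n".join([
        header,
        f"What changed: {signal['what_changed']}",
        f"Why it matters: {signal['why_it_matters']}",
        f"So what: {signal['so_what']}",
    ])


# Placeholder values for illustration only.
block = format_signal_block({
    "date": "2025-01-01",
    "sector": "semiconductors",
    "severity": "high",
    "similarity": 0.87,
    "what_changed": "TSMC raised prices",
    "why_it_matters": "Wafer cost assumptions may need revisiting.",
    "so_what": "Every tape-out schedule modeled on 2025 wafer pricing is wrong.",
})
```

Putting the similarity score in the header keeps it visible to the model next to the analysis it applies to.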

For the Copilot, this context block is built under a 120,000-character budget — roughly 30,000 tokens of signal content — with the lowest-similarity signals truncated first if the budget fills. For reports, each section gets its own signal context block, assembled independently.
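
The budget rule can be sketched as a greedy fill. This version interprets "truncated first" as dropping whole low-similarity blocks once the budget is reached, which is an assumption; it also ignores separator characters between blocks. The names are hypothetical, and the 4-characters-per-token ratio is only the rough conversion implied by the figures above.

```python
CONTEXT_BUDGET_CHARS = 120_000
APPROX_CHARS_PER_TOKEN = 4  # rough ratio: 120,000 characters is about 30,000 tokens


def fit_to_budget(blocks, budget=CONTEXT_BUDGET_CHARS):
    """Keep the highest-similarity blocks that fit within the character budget.

    `blocks` is a list of (similarity, text) pairs. Blocks are taken in
    descending similarity order, so the lowest-similarity ones are cut first.
    """
    ranked = sorted(blocks, key=lambda block: block[0], reverse=True)
    kept, used = [], 0
    for _similarity, text in ranked:
        if used + len(text) > budget:
            break  # everything after this point has lower similarity
        kept.append(text)
        used += len(text)
    return kept


# Toy example with a 100-character budget.
kept = fit_to_budget(
    [(0.9, "a" * 60), (0.5, "b" * 60), (0.7, "c" * 30)],
    budget=100,
)
```

In the toy example the 0.9 and 0.7 blocks fit (90 characters) and the 0.5 block is the one left out.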

The effect: the model never reasons over a pile of raw events. It reasons over events that have already been analyzed once.

The Library Compounds

The most important property of this architecture is temporal. The vector index doesn't reset or degrade. Every new signal extends the retrieval surface across all three product surfaces simultaneously: Copilot, reports, and proactive insights all improve with the same import.

But the deeper compounding effect is in the embeddings themselves. Because soWhat is baked into each vector, every new signal adds more than data: it adds an analytical conclusion that later queries can match against, one the index could not have returned before. The library doesn't just grow larger. It covers more of the questions an analyst might ask.

That's the property generic RAG built on raw text can't replicate. The index improves because the corpus is analytically processed before it enters the index — and that processing compounds with every signal added to the library.

Every query searches the full signal history within its scope. The library compounds with every import.

