Knowledge Sphere: turning a cabinet of documents into a knowledge base that answers questions
Every research team and professional institution sits on a mountain of documents — reports, papers, manuals, contracts. But once knowledge is written into a PDF it is effectively sealed: finding an answer means flipping through file after file, page after page, relying on memory and keyword luck.
Knowledge Sphere was built to solve this — an AI-driven document-intelligence platform. Users upload documents, ask in natural language, and the AI assistant answers grounded in the content, attaching a traceable source to every sentence. It turns a cabinet of static documents into a knowledge base you can have a conversation with.
Shepherd Tech delivered the full-stack build of this platform, from architecture to launch. Below are its seven core capabilities and the engineering trade-offs behind them.
1. AI chat with inline citations: trustworthy because it is traceable
At the core is an AI assistant that understands your documents:
- Natural-language Q&A: users can ask complex questions about uploaded content; the AI extracts and organises relevant information to answer in real time.
- Conversation memory: the AI remembers earlier questions within a session, keeping context coherent and supporting follow-ups.
- Inline citations: every response sentence carries a citation marker pointing to a specific passage in the documents. This is the bedrock of the product’s trustworthiness — the AI’s answers are not a black box; every sentence can be verified back against the source.
Design stance: augment, don’t replace. The system surfaces the answer and its evidence chain from across a large body of documents; the final judgement stays with the professional, and the clickable citations are what let them check every claim.
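To make the citation idea concrete, here is a minimal sketch of resolving citation markers in an answer back to source passages. It assumes a hypothetical marker syntax of `[[n]]`, where `n` is the 1-based position of a retrieved passage; the `Passage` shape and the `resolveCitations` name are illustrative assumptions, not the platform's actual format.

```typescript
// Hypothetical shape of a retrieved passage; the field names are assumptions.
interface Passage {
  documentId: string;
  page: number;
  text: string;
}

interface ResolvedCitation {
  marker: number;   // the number written in the answer, 1-based
  passage: Passage; // the passage that number points to
}

// Find every [[n]] marker in the answer and resolve it to a passage.
// Markers that point outside the passage list are reported separately,
// so the UI never renders a citation it cannot open.
function resolveCitations(
  answer: string,
  passages: Passage[],
): { citations: ResolvedCitation[]; unresolved: number[] } {
  const citations: ResolvedCitation[] = [];
  const unresolved: number[] = [];
  const seen: Record<number, boolean> = {};
  const pattern = /\[\[(\d+)\]\]/g;
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(answer)) !== null) {
    const marker = parseInt(match[1], 10);
    if (seen[marker]) continue; // list each source only once
    seen[marker] = true;

    const passage = passages[marker - 1];
    if (passage) {
      citations.push({ marker, passage });
    } else {
      unresolved.push(marker);
    }
  }
  return { citations, unresolved };
}
```

Reporting unresolved markers instead of dropping them is a deliberate choice in this sketch: a marker with no matching passage is a signal that the answer should be flagged or regenerated rather than shown as if it were verified.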
2. Smart understanding and summaries: the big picture first, then the detail

- Automatic summaries: each uploaded document gets multi-level summaries — quick overview, detailed summary, section summaries — so users grasp the whole without reading it all.
- Cross-document analysis: the AI can synthesise information across multiple documents at once, not just a single file.
- Context awareness: the AI considers the full context and intent of a question, not just literal matching.
- Question refinement: when initial retrieval is insufficient, the AI adjusts its strategy and searches again for a more complete answer.
3. The document pipeline: large files and scans, all handled

Accurate Q&A starts with reading documents cleanly and structuring them. The platform builds an enterprise-grade document pipeline:
- High-quality OCR: AWS Textract handles scanned PDFs, recognising text in photocopied and scanned files with high accuracy.
- Layout understanding: the system recognises headings, paragraphs and tables, preserving the logic of the content rather than flattening a page into a wall of text.
- Structured OCR context: OCR results are structured and attached to the AI’s citations, deepening its understanding of the document.
- Async background processing: large files are processed by retryable background jobs (Upstash QStash Workflow) so the frontend never blocks; users can keep working and track each file’s status in real time.
- Bulk upload: import and process many documents at once, with processing tuned for speed and stability on long, multi-page documents.
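The retryable, status-tracked processing described above can be pictured as a small state machine per file. The sketch below is library-free and does not use the Upstash QStash Workflow API; the status names, the retry budget of 3 and the 1-second backoff base are all assumptions chosen for illustration.

```typescript
type JobStatus = "queued" | "processing" | "retrying" | "done" | "failed";

interface DocumentJob {
  fileName: string;
  status: JobStatus;
  attempts: number;       // how many times processing has been started
  nextRetryInMs?: number; // set only while status is "retrying"
}

const MAX_ATTEMPTS = 3;     // assumed retry budget
const BASE_DELAY_MS = 1000; // assumed base for exponential backoff

// Exponential backoff: 1s, 2s, 4s, ... after attempt 1, 2, 3, ...
function backoffDelayMs(attempt: number): number {
  return BASE_DELAY_MS * Math.pow(2, attempt - 1);
}

// Pure transition function: given a job and what just happened,
// return the next state. The frontend only ever reads `status`.
function advance(
  job: DocumentJob,
  event: "start" | "success" | "error",
): DocumentJob {
  switch (event) {
    case "start":
      return { fileName: job.fileName, status: "processing", attempts: job.attempts + 1 };
    case "success":
      return { fileName: job.fileName, status: "done", attempts: job.attempts };
    case "error":
      if (job.attempts >= MAX_ATTEMPTS) {
        // Retry budget spent: surface the failure to the user.
        return { fileName: job.fileName, status: "failed", attempts: job.attempts };
      }
      return {
        fileName: job.fileName,
        status: "retrying",
        attempts: job.attempts,
        nextRetryInMs: backoffDelayMs(job.attempts),
      };
  }
}
```

Keeping the transition pure makes each file's status easy to persist and display: the background worker applies events, and the upload screen simply renders whatever status is stored.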
4. Interactive PDF: from "reading the answer" to "jumping to the source"
This is where Knowledge Sphere most embodies trust — a complete traceable chain from question to source.
Step one: ask a question, get an answer with citations. The user asks in natural language, the AI answers, and each relevant sentence carries a citation marker (e.g. 📄 2) indicating which document and passage it came from.

Step two: click a citation to jump to the source and highlight it. When the user clicks a citation marker (or the matching page button), the system flips the PDF to the corresponding page and selects and highlights that passage. With almost no gap between answer and source, users can verify on the spot exactly where the AI got its answer.

- Text-block highlighting: OCR blocks are colour-coded by extraction confidence, so users see at a glance where recognition is solid.
- Direct copy: copy source text straight from a highlighted block, paired with an inline action menu (copy / search / explain / translate / follow-up).
- Page navigation: jump anywhere in the document from the page bar or the passage list.
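As a rough sketch of how text-block highlighting can work: AWS Textract reports each block's confidence on a 0 to 100 scale and its bounding box as fractions of the page size, so an overlay needs to scale the box to the rendered page and pick a colour. The `OcrBlock` shape below is a simplified stand-in for the Textract response, and the 95 / 80 thresholds and colour names are assumptions, not the platform's actual values.

```typescript
// Simplified stand-in for an OCR block; mirrors Textract's confidence
// (0 to 100) and bounding box (fractions of page width and height).
interface OcrBlock {
  text: string;
  confidence: number;
  box: { left: number; top: number; width: number; height: number };
}

interface Highlight {
  x: number;
  y: number;
  width: number;
  height: number;
  color: string;
}

// Assumed thresholds: tune these against real scans.
function confidenceColor(confidence: number): string {
  if (confidence >= 95) return "green";
  if (confidence >= 80) return "yellow";
  return "red";
}

// Scale a page-relative box to pixel coordinates on the rendered page.
function toHighlight(
  block: OcrBlock,
  pageWidthPx: number,
  pageHeightPx: number,
): Highlight {
  return {
    x: Math.round(block.box.left * pageWidthPx),
    y: Math.round(block.box.top * pageHeightPx),
    width: Math.round(block.box.width * pageWidthPx),
    height: Math.round(block.box.height * pageHeightPx),
    color: confidenceColor(block.confidence),
  };
}
```

Because the box is stored as fractions rather than pixels, the same OCR result can be redrawn at any zoom level by passing the current rendered page size.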
5. Smart search: semantic + keyword, hybrid retrieval

- Semantic search: retrieves on the meaning and intent of the query rather than literal matching, so passages with close meaning are found even when the wording differs.
- Full-text search: keeps traditional exact keyword matching.
- Hybrid search: combines semantic vectors and full-text search, merging their rankings with Reciprocal Rank Fusion (RRF) so that passages ranked well by either method surface near the top.
- Scope filtering: search within specific documents or a Space, with results updating live as you type.
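Reciprocal Rank Fusion itself is short enough to show in full. Each result list contributes 1 / (k + rank) to a passage's score, and the scores are summed across lists. The sketch below assumes the two searches have already returned passage IDs ordered best-first; the function name and the IDs are illustrative, and k = 60 is the commonly used default rather than a value confirmed for this platform.

```typescript
// Merge several best-first rankings of passage IDs into one ranking.
// A passage's score is the sum of 1 / (k + rank) over every list it appears in.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60,
): { id: string; score: number }[] {
  const scores: Record<string, number> = {};

  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[id] = (scores[id] || 0) + 1 / (k + rank);
    });
  }

  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] }))
    .sort((a, b) => b.score - a.score); // highest fused score first
}

// Usage: fuse a semantic ranking with a keyword ranking.
const semanticHits = ["p3", "p1", "p7"];
const keywordHits = ["p1", "p9", "p3"];
const fused = reciprocalRankFusion([semanticHits, keywordHits]);
```

RRF works on ranks only, so vector similarity scores and full-text relevance scores never need to be normalised onto a common scale. Passages that appear in both lists (here `p1` and `p3`) accumulate two contributions and rise above passages found by only one method.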
6. Team collaboration and permissions: share knowledge, hold the line
- Shared Spaces: group documents by topic into collaborative workspaces for the team.
- Role permissions: tiered Owner and Viewer access.
- Invite-only: members join securely by invite code; only invitees can enter a workspace.
- Document-level permissions and activity tracking: fine-grained control over document access, with tracking of who accessed what.
7. Accounts and security: enterprise-grade data protection
- Complete account system: email-verified sign-up and login, secure password reset, idle auto-logout, profile management.
- Multi-layer access control: permissions at the user, Space and document levels.
- Data isolation: user data is fully separated; passwords are stored hashed with Argon2.
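The three levels of access control (user, Space, document) can be read as a chain of checks that must all pass. The sketch below is one possible reading, not the platform's actual rules: the `Space` and `Doc` shapes, the optional per-document allow-list, and the rule that Owners bypass that allow-list are all assumptions made for illustration.

```typescript
type Role = "owner" | "viewer";
type Action = "view" | "manage";

interface Space {
  id: string;
  members: Record<string, Role>; // userId -> role within this Space
}

interface Doc {
  id: string;
  spaceId: string;
  restrictedTo?: string[]; // assumed: optional document-level allow-list
}

function canAccess(
  userId: string | null,
  space: Space,
  doc: Doc,
  action: Action,
): boolean {
  // User level: an anonymous visitor gets nothing.
  if (userId === null) return false;

  // Space level: the document must live in this Space and the user must be a member.
  if (doc.spaceId !== space.id) return false;
  const role = space.members[userId];
  if (!role) return false;
  if (action === "manage" && role !== "owner") return false;

  // Document level: a restricted document is visible only to listed users
  // (assumption: Owners always keep access).
  if (doc.restrictedTo && role !== "owner" && doc.restrictedTo.indexOf(userId) === -1) {
    return false;
  }
  return true;
}
```

Ordering the checks from broadest to narrowest keeps the default outcome a denial: access is granted only when every level says yes.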
Architecture: a rebuild that cut latency from about 20 seconds to about 3
Knowledge Sphere’s engineering depth shows most in one key rebuild of the AI agent.
The early platform used a fixed-flow state machine (a seven-step LangGraph pipeline). It was controllable, but response latency was 15–25 seconds and streaming was only "fake streaming": the full answer was computed first, then shown all at once.
We rebuilt it into an autonomous AI agent on Vercel AI SDK tool calling:
- The agent decides for itself when to call retrieval tools (passage search / summary search); the prompt offers strategy guidance rather than a forced fixed order — simple questions skip the full seven steps, complex ones get multiple retrieval rounds.
- Response latency dropped from 15–25s to 2–5s, and fake streaming became real token streaming.
- Added hybrid search (vector embeddings + a PostgreSQL full-text GIN index + RRF merge) and summary-layer embeddings for sharper retrieval.
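The difference between a fixed pipeline and an autonomous agent comes down to who controls the loop. The sketch below is a library-free illustration of that loop and deliberately does not use the Vercel AI SDK API: the `Model` and `Tool` types, the `runAgent` name, the step budget of 5 and the fallback message are all assumptions. On each turn the model either requests a retrieval tool or returns the final answer.

```typescript
// What the model can return on each turn: call a tool, or give the final answer.
type ModelTurn =
  | { type: "tool_call"; tool: string; query: string }
  | { type: "answer"; text: string };

interface Message {
  role: "user" | "tool";
  content: string;
}

// Hypothetical interfaces: a real model call and real retrieval tools plug in here.
type Model = (messages: Message[]) => Promise<ModelTurn>;
type Tool = (query: string) => Promise<string>;

async function runAgent(
  question: string,
  model: Model,
  tools: Record<string, Tool>,
  maxSteps: number = 5, // assumed cap on retrieval rounds
): Promise<{ answer: string; toolCalls: number }> {
  const messages: Message[] = [{ role: "user", content: question }];
  let toolCalls = 0;

  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(messages);

    // Simple questions exit here on the first turn, with no retrieval at all.
    if (turn.type === "answer") {
      return { answer: turn.text, toolCalls };
    }

    // Otherwise run the requested tool and feed the result back to the model.
    const tool = tools[turn.tool];
    const result = tool ? await tool(turn.query) : "Unknown tool: " + turn.tool;
    toolCalls++; // counts every tool request, including unknown tools
    messages.push({ role: "tool", content: result });
  }

  // Step budget exhausted without a final answer.
  return { answer: "Not enough evidence found to answer.", toolCalls };
}
```

The number of retrieval rounds is an outcome of the conversation rather than a property of the code: a greeting costs one model call, while a cross-document question may trigger several searches before the model commits to an answer. The step cap keeps worst-case latency bounded.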
Stack at a glance
- Frontend: Next.js · React · TypeScript · tRPC (type-safe API)
- AI agent: Vercel AI SDK (streamText + tool calling) · large language models via OpenRouter
- Embeddings: Azure OpenAI text-embedding-3-small (1536-dim)
- Retrieval: PostgreSQL vector retrieval + full-text search, merged with Reciprocal Rank Fusion
- Document processing: AWS Textract OCR · Upstash QStash Workflow (async, retryable) · cloud object storage
- Data layer: Drizzle ORM · PostgreSQL
- Auth: Lucia · Argon2 password hashing
In closing
Knowledge Sphere is a production-ready AI document-intelligence platform, with every core capability implemented and running — full AI chat with inline citations, the document pipeline, hybrid search, interactive PDF, team collaboration and enterprise security.
It demonstrates Shepherd Tech’s ability to deliver complex AI systems: not just calling an LLM API, but holding the entire RAG chain steady — from retrieval architecture and document pipeline to a traceable product experience.
Have something you want built?
Message us on WhatsApp. For websites we'll build a free demo; for bigger builds we'll scope it with you.
