cat, can't roll back, don't have per-agent path scopes, and aren't the readable text the LLM actually wants. That's puppyone's job.

Vector databases are great. We've built on top of them; our customers run them. We are not a vector database and we don't want to be one.
Pinecone, Zilliz, Milvus, FAISS and friends solve a real, narrow problem: given a query embedding, return the K most similar items, fast, at scale. They are the best in the world at that. Their data model is a vector with a payload, not a versioned filesystem.
What we keep seeing teams discover the painful way:
"We put all our docs into Pinecone. Now the LLM 'kind of' answers questions, but we can't `cat` the actual file, we don't have version history, we don't have per-agent permissions, and the chunked text we stored has drifted from the real source. Where do we put the real document?"
The answer is: in a file workspace. That's puppyone. The vector DB indexes over puppyone. Each tool does what it's good at.
| Dimension | Vector DB (Pinecone / Zilliz / FAISS) | puppyone |
|---|---|---|
| What it stores | Vector embeddings + small metadata payload | Canonical files (markdown, JSON, CSV, anything) |
| What you query with | An embedding (similarity search) | A path (cat, ls, grep, read_file) |
| What you get back | IDs / payloads of the K most similar items | The actual file content |
| Version history | Per-vector upsert; usually no diff between versions | Git-style commits, per-file diffs, instant rollback |
| Per-agent permissions | Namespace / collection-level (per-tenant), not per-agent path | Per-agent Access Points with explicit read/write path scopes |
| Native interface | SDK / REST around query(), upsert(), fetch() | Bash, MCP, REST, sandbox mount |
| What an LLM does with it | "Find me 5 chunks like this query, here are the IDs" | "Read me /research/spec.md, here are the bytes" |
| SaaS ingestion | Not the job — you build the embedding pipeline | Built-in connectors: Notion, Slack, Gmail, Postgres, GitHub, etc. |
| Self-hosted | Some yes (FAISS / Milvus / pgvector), some no (Pinecone managed) | Yes (open source, Docker) |
| Best at | Fast similarity search at scale | Being the canonical store of files agents read and write |
Use Pinecone / Zilliz / FAISS / pgvector for the job they're built for: embedding-backed retrieval at scale.
We don't replace any of that. puppyone has no built-in similarity search and we deliberately don't ship a vector engine — most teams already have one or want to choose their own.
Use puppyone when the question is about the canonical store, not retrieval:
- "What does this file actually say, word for word, right now?"
- "Who changed it, when, and can we roll back?"
- "Which agent is allowed to read or write this path?"
- "Where is the real document I can `cat`?"

A vector DB doesn't answer any of those well, because it wasn't built to. It answers "what's similar to this?" — which is also a critical question, just a different one.
In every production RAG / agent setup we've seen, the layout looks something like this:
- puppyone holds the canonical files (e.g. `/research/spec.md`).
- The vector DB indexes chunks of those files, and each chunk's payload stores the puppyone path it came from, not the raw text.
- Agents query the vector DB to find candidate paths, then `cat` those paths in puppyone (via Bash, MCP, or REST) to get the actual canonical text.

The clean rule: vectors find it, puppyone stores it. Anything else is going to drift.
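In code, the loop is tiny. A minimal sketch, assuming a Pinecone-style `index.query` and an injected `read_file` callable (an HTTP GET against puppyone's REST API, or a `cat` over MCP; both are hypothetical stand-ins here):

```python
def retrieve(query_vec, index, read_file, k=3):
    """Vector DB finds the paths; the file store returns the bytes."""
    # 1. Similarity search: payloads carry ONLY paths, never the raw text.
    hits = index.query(vector=query_vec, top_k=k)
    paths = [m["metadata"]["path"] for m in hits["matches"]]
    # 2. Canonical read: fetch the real file content for each path.
    return {p: read_file(p) for p in paths}
```

Because the payload holds a path rather than a copy of the text, there is nothing to drift: the answer always comes from the canonical file.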
Can't you just stuff the raw text into the vector DB? You can. People do. Common breakage:

- You duplicate the raw text into the payload on each vector. Now your "store" is two-headed: one source for embeddings, one for raw text, and they drift.
- There's no path an agent can address, so nothing ever just `cat`s the file.
- Updating a document means an upsert, and the old vector is gone — usually with no diff, no rollback, no audit. Compliance and debugging hate this.

If you've been writing markdown into a Pinecone payload column to "have the original around", you've quietly turned your vector DB into a bad filesystem. That's the seam.
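One way to keep the seam honest is to store a pointer plus a content hash in the payload instead of the text itself. A sketch (the `index.upsert` dict shape mimics Pinecone's format; the helper name is ours):

```python
import hashlib

def upsert_pointer(index, chunk_id, vector, path, text):
    # The payload carries the puppyone path and a hash of the text that was
    # embedded. The text itself lives only in the file store.
    digest = hashlib.sha256(text.encode()).hexdigest()
    index.upsert(vectors=[{
        "id": chunk_id,
        "values": vector,
        "metadata": {"path": path, "sha256": digest},
    }])
    return digest
```

The hash buys you a cheap staleness check: if the file's current hash differs from the indexed one, the chunk needs re-embedding.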
pgvector and Supabase Vector are excellent for embeddings next to your structured data in Postgres. The vector layer becomes part of your relational stack — great for many workloads. It still doesn't solve:

- canonical file storage with per-file diffs and rollback,
- per-agent path-scoped permissions,
- built-in SaaS ingestion (Notion, Slack, Gmail, GitHub, etc.),
- a file interface (`cat`, `ls`, `grep`) the LLM can use directly.
In production, the layout is usually: structured app data + vectors in Postgres / Supabase (with pgvector); canonical files / agent context in puppyone; agents query vectors, read files. Nothing is duplicated, nothing drifts.
FAISS is excellent as an embedded library when you want low-latency similarity search inside your own process — and you don't need a server. It is not a storage system at all. You feed it vectors, it returns IDs. Whatever those IDs point to has to live somewhere — most production setups use puppyone (or a database) as that "somewhere".
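The contract is easy to see even without FAISS installed: the index maps vectors to integer IDs, and you keep an ID-to-path table on the side. A brute-force stand-in for the search step (in a real setup a FAISS index would do the math; the paths below are hypothetical):

```python
import numpy as np

# ID -> path table: FAISS only ever hands back the integer IDs.
id_to_path = {0: "/research/spec.md", 1: "/notes/meeting.md"}
vectors = np.array([[1.0, 0.0], [0.0, 1.0]], dtype="float32")

def search(query, k=1):
    q = np.asarray(query, dtype="float32")
    # Cosine similarity against every stored vector (FAISS does this fast).
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    ids = np.argsort(-scores)[:k]
    # The IDs are useless on their own; the path table makes them `cat`-able.
    return [id_to_path[int(i)] for i in ids]
```

Swap the dict for a database or a puppyone workspace and the shape of the system doesn't change: the index finds IDs, the store resolves them to content.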
There is no migration. You don't replace your vector DB with puppyone; you add puppyone underneath it:
1. Keep your vector DB exactly where it is.
2. Put the canonical files in puppyone (connectors can pull them from Notion, Slack, GitHub, etc.).
3. Point each vector's payload at a puppyone path instead of a copy of the text.
4. At query time, the agent searches the vector DB, gets a path back, and `cat`s the file in puppyone. This is the pattern.

After a month, the architecture stops being "a vector DB with text as payload trying to be a knowledge base" and starts being a clean two-layer system: puppyone holds the files, the vector DB indexes them.
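Once payloads point at paths, keeping the index fresh is a comparison, not a migration. A sketch of the staleness sweep, assuming payload metadata shaped like `{"path": ..., "sha256": ...}` (our convention, not a vector-DB feature):

```python
import hashlib

def stale_paths(index_meta, read_file):
    # index_meta: {chunk_id: {"path": ..., "sha256": ...}} pulled from the
    # vector DB. read_file returns the current canonical text for a path.
    stale = set()
    for meta in index_meta.values():
        current = hashlib.sha256(read_file(meta["path"]).encode()).hexdigest()
        if current != meta["sha256"]:
            stale.add(meta["path"])
    return sorted(stale)
```

Anything returned gets re-chunked and re-embedded; everything else is already in sync.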
Does puppyone replace Pinecone / Zilliz / FAISS / pgvector? No. We don't ship a vector engine. We're the canonical file store the vector DB indexes over. They're complementary layers.
Does puppyone do semantic search? Not natively. We make it easy to plug a vector DB on top of puppyone content (Pinecone, Zilliz, FAISS, pgvector — your choice). The "vector search → cat file" pattern is the recommended setup.
Why not just bake a vector store into puppyone? Three reasons: (1) most teams already have one or want to choose, (2) vector engine choice has real ops trade-offs (managed vs self-hosted, cost vs scale), (3) keeping puppyone single-purpose makes it interchangeable with whatever vector layer you pick.
My RAG pipeline is currently Pinecone-only with text in the payload. Is that wrong? Wrong is too strong. It's a setup that works until version history, agent provenance, per-agent permissions, or SaaS ingestion become real requirements. When they do, puppyone is the layer to add underneath, with Pinecone staying as the index.
Does puppyone work with hybrid search (BM25 + vectors)? Yes — both BM25 and vector indexes can be built over puppyone content. The canonical text and structure live in puppyone; the indexes live wherever you want them.
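Fusing the two rankings is a few lines. A sketch using standard reciprocal-rank fusion (plain RRF, not a puppyone API), where each input is a ranked list of puppyone paths:

```python
def rrf(rankings, k=60):
    # Each ranked list contributes 1 / (k + rank) per path; paths that score
    # well in both the BM25 and the vector ranking float to the top.
    scores = {}
    for ranked in rankings:
        for rank, path in enumerate(ranked, start=1):
            scores[path] = scores.get(path, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused ranking is still just a list of paths, so the final step is unchanged: `cat` the top hits from the canonical store.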
Vector databases find documents. puppyone stores documents. Don't ask one to do the other's job. Run both: puppyone as the canonical, version-controlled, per-agent-scoped file store; your vector DB of choice as the similarity index over it. The LLM gets the right document, with the right history, every time.
Stop turning your vector DB into a half-broken filesystem. Add the layer it's missing.