airoweb post

A better prompt for building an Obsidian vault as agent memory

A practical teardown and rewrite of a prompt for turning local notes, source traces, and connector context into a validated Obsidian vault for LLM workflows.

Audience: AI workflow builders, Knowledge operations teams, Agent developers
Level: intermediate
Risk: medium
Checked: airoweb Multica Reviewer, July 1, 2026

The prompt is trying to build this:

Compiled-Vaults/compiled-vault-brain-YYYY-MM-DD/
  People/
  Companies/
  Projects/
  Decisions/
  Procedures/
  Context Packs/
  Sources/
  _tools/
  Reports/ORIENTATION-REPORT.md
  SOURCE-MANIFEST.md
  INGESTION-LOG.md
  VALIDATION-REPORT.md
  state.json

That is the right shape for a serious local memory layer. It asks for an Obsidian-compatible vault, not a PDF. It separates canonical notes from source traces. It requires provenance, resumability, validation scripts, redaction, and approval gates before broad ingestion. It tells the agent not to invent facts.

The problem is that the prompt tries to specify an operating system in one breath.

An agent can follow it for a small corpus. It will struggle when the source set includes old vaults, repositories, generated examples, documents, email, and external connectors. The instruction “inspect everything, classify everything, canonicalize everything, validate everything, then ask before broad ingestion” sounds rigorous, but it hides the hardest decision: what counts as the first useful vault?

A better prompt should reduce the first deliverable, make the stop points enforceable, and separate local compilation from connected ingestion. The goal is not a heroic one-shot brain. It is a repeatable compiler for useful operating memory.

What the original prompt gets right

Obsidian is a reasonable target format because its notes are Markdown files and folders, and its links can be written as Wikilinks or standard Markdown links Obsidian internal links. Obsidian properties are YAML at the top of the note, which gives both humans and scripts a place to record metadata such as note type, source IDs, review state, and provenance Obsidian properties.

That matters. A memory layer for agents should not live only inside a vendor account, a chat transcript, or a vector store. A local Markdown vault can be opened, diffed, reviewed, backed up, and edited without asking the original model provider for permission. It can also serve humans, not just retrieval pipelines.

The prompt also has the correct instinct about provenance. If a note says “Acme is a priority customer” or “the team decided to deprecate the legacy import path,” the vault must show where that claim came from. Otherwise the vault becomes a polished hallucination archive. Source-backed memory is slower to compile, but it gives later agents something to trust, challenge, or update.

The other strong move is validation. Broken links, placeholder references, copied secrets, and notes without provenance are not cosmetic issues. They are defects in the memory system. If a future agent uses the vault as context, those defects become execution risk.

Where it will break

The original prompt asks the agent to confirm working directory, output root, source locations, connectors, and external accounts before substantive work. That is correct. Then it asks for a broad orientation pass across notes, documents, repos, examples, and connected tools.

That is too wide unless the prompt also defines a hard source budget.

Without a budget, the agent can spend the entire run inventorying stale files, half-finished notes, cache directories, old exports, and duplicated documents. Worse, it may report a large source inventory that looks thorough but does not improve the first vault.

The prompt also mixes three different jobs:

Job	What it should produce
Vault compiler	Local Markdown notes, manifests, logs, validation reports
Connector auditor	Account verification, connector scopes, allowed reads, blocked sources
Knowledge editor	Canonical entities, decisions, procedures, context packs

Those jobs can live in the same project, but they should not run at the same time. The connector auditor should not start ingesting email while the vault compiler is still proving it can turn ten local source files into clean notes.

There is also a subtle security problem. The prompt tells the agent to inspect available tools and connected accounts. It also says never perform external actions without approval. That is a good rule, but it does not define whether listing data from a connector is an external action. In practice, connector reads can expose sensitive data and can trigger logs, rate limits, audit events, or privacy concerns. MCP formalizes a world where clients and servers exchange context and capabilities through resources, prompts, and tools Model Context Protocol specification. That makes connector boundaries operational, not decorative.

For memory work, every external connector should start as blocked until the user approves a specific smoke test: which account, which source, which query or folder, how many records, and what fields may be copied into the vault.

Finally, the validation gate is too absolute for the first phase. “The vault is only complete when there are zero broken links” is a good final rule. It is not a good first milestone if the agent is still learning the source structure. The first milestone should validate the compiler, not claim the whole memory system is complete.

The more practical version

Use this prompt when you want a real vault folder, not a strategy memo. It keeps the same ambition, but it narrows the first pass.

/goal Build a source-backed Obsidian-compatible memory vault for LLM and agent workflows.

Primary outcome:
Create a working local vault folder that can be opened in Obsidian, reviewed by a human, and reused by future agents. The vault must contain canonical Markdown notes, source traces, manifests, logs, validation scripts, and a resumable state file.

Default output root:
If I do not provide an output root, create:
Compiled-Vaults/compiled-vault-brain-YYYY-MM-DD/
under the current working directory. Before writing, print the current working directory and the exact output path.

Non-negotiable rules:
- Do not invent facts.
- Do not copy secrets, tokens, private keys, credentials, or unnecessary personal data.
- Do not write outside the confirmed output root.
- Do not ingest from external connectors until I approve a named connector smoke test.
- Treat connector reads as external actions.
- Keep raw source traces separate from canonical notes.
- Maintain provenance for every claim that enters a canonical note.
- Resume by reading state.json and INGESTION-LOG.md before doing new work.

Memory model:
Use these categories:
- declarative memory: people, companies, projects, facts, preferences
- procedural memory: repeatable workflows and operating procedures
- decision memory: decisions, trade-offs, rejected alternatives, dates
- source traces: excerpts, file references, URLs, connector record IDs
- context packs: task-specific bundles for future LLM runs
- runtime inventory: tools, connectors, account/workspace checks, blocked sources

Phase 0: Orientation only
Do not create canonical notes yet.
Inspect only:
- the current working directory
- obvious Markdown vaults or note folders
- project documentation
- explicitly provided source paths
- available connector names, without reading connector content

Write:
Reports/ORIENTATION-REPORT.md
SOURCE-MANIFEST.md
INGESTION-LOG.md
state.json

The orientation report must include:
- confirmed working directory
- confirmed output root
- source inventory with path, type, estimated size, and priority
- connector inventory with account/workspace/organization verification status
- blocked or unverified sources
- proposed first smoke pass of no more than 10 source items
- proposed note taxonomy and frontmatter fields
- risks, privacy concerns, and open questions

Checkpoint 1:
Stop after Phase 0 and ask whether to proceed with the smoke pass.

Phase 1: Local smoke pass
After approval, ingest no more than 10 local source items.
Create only the minimum useful note set across:
People/
Companies/
Projects/
Decisions/
Procedures/
Context Packs/
Sources/

Each canonical note must include YAML frontmatter:
---
type:
status: draft
created:
updated:
sources:
confidence: low|medium|high
review_needed: true|false
---

Each note must include:
- a short summary
- source-backed claims
- provenance links or source IDs
- open questions
- related wikilinks

Phase 1 validation:
Create _tools/ with scripts or commands that check:
- required files exist
- Markdown links or Wikilinks resolve
- source IDs referenced by notes exist in SOURCE-MANIFEST.md
- no placeholder source references remain
- obvious secret patterns are absent
- every canonical note has provenance

Write VALIDATION-REPORT.md with pass/fail results and known limitations.

Checkpoint 2:
Stop after the local smoke pass. Show the created notes, validation report, and ingestion log. Ask whether to continue with broader local ingestion, revise the taxonomy, or run a connector smoke test.

Connector smoke tests:
Before any connector ingestion, ask for approval with:
- connector name
- verified account/workspace/organization
- exact read action
- maximum record count
- fields to collect
- fields to exclude
- where source traces will be stored

If the connector account is wrong or unclear, mark it blocked in SOURCE-MANIFEST.md.

Compiler process for every approved pass:
parse -> group -> classify -> extract -> canonicalize -> attach provenance -> author notes -> validate links -> audit

Completion rule:
Do not call the vault complete until:
- all required folders and manifests exist
- all approved source passes are logged
- every canonical note has provenance
- validation scripts pass
- VALIDATION-REPORT.md confirms zero broken links, zero placeholder references, and zero detected secrets
- SOURCE-MANIFEST.md distinguishes included, skipped, blocked, and pending sources

The rewrite changes the execution model. It does not ask the agent to build the brain. It asks the agent to prove the compiler on a small batch, then earn permission for broader ingestion.

Why the small batch matters

Prompt engineering guidance from OpenAI and Anthropic both emphasizes giving the model clear instructions, context, constraints, and evaluation criteria rather than relying on intent alone OpenAI prompt engineering, Anthropic prompt engineering overview. A vault prompt needs the same discipline.

“Build a mature brain” is an intent. “Ingest no more than 10 local source items, create these required files, validate these defects, then stop” is an executable assignment.

The small batch also reveals taxonomy mistakes early. Maybe “Companies” should include vendors but not customers. Maybe “Procedures” needs a separate “Checklists” folder. Maybe “Context Packs” should be generated only after canonical notes exist. These are editorial decisions, not just file operations. They are cheaper to change after ten notes than after two thousand.

There is a second reason to keep the first pass local. Retrieval can search source data by semantic similarity and combine retrieved material with model synthesis OpenAI retrieval. That is useful, but it is not the same thing as a canonical vault. A retrieval system can find similar chunks. A vault should say what the team currently believes, why it believes it, where the evidence sits, and what still needs review.

The vault should be compiled knowledge, not an attractive dump of retrieved excerpts.

Security is part of the prompt, not a later review

The original prompt’s redaction rule is necessary, but not sufficient. It says not to copy secrets. It should also define what happens when source content is hostile, stale, overly broad, or mixed with private material.

OWASP lists prompt injection and sensitive information disclosure among major LLM application risks OWASP Top 10 for LLM Applications. The UK National Cyber Security Centre argues that prompt injection is not just SQL injection with different syntax, because LLMs do not have a reliable internal boundary between instructions and data NCSC prompt injection guidance.

That matters for vault compilation. Old notes, emails, web pages, issue comments, and documents can contain instructions aimed at the agent: ignore previous rules, expose hidden context, summarize private material, or trust a false source. The compiler prompt should tell the agent to treat source files as untrusted data. Source content can supply evidence. It cannot override the operating rules.

For teams, the practical control is boring:

Risk	Prompt-level control
Secret copying	Scan for credential patterns and keep raw traces separate
Connector overreach	Require named smoke tests and explicit approval
Prompt injection in sources	Treat all source text as evidence, never instruction
False canonical facts	Require provenance and confidence fields
Stale memory	Include review dates, open questions, and source status
Broken agent context	Validate links and source IDs before reuse

This is where the prompt should be strict. The agent does not need freedom to decide whether to read a private mailbox, copy a credential-looking string, or treat a source instruction as higher priority than the system task.

Who should use this

Use this approach if the vault will support future agents, consultants, operators, or technical teams that need durable context across sessions. It fits personal operating systems, founder memory, project archives, client knowledge bases, research workflows, and internal agent context packs.

Do not use it as written for regulated records, legal discovery, medical files, HR decisions, financial advice, or high-sensitivity incident material without a separate governance process. A Markdown vault is reviewable, but it is not an access-control system. Obsidian-compatible files are easy to inspect and easy to copy.

Also avoid this approach if the source set is already clean and structured in a governed system. A database, CRM, ticketing system, document management system, or retrieval index may be the better source of truth. The vault is most useful when the problem is scattered context and weak continuity, not when a reliable canonical system already exists.

The test of a good memory prompt

A good memory prompt does not make the agent sound ambitious. It gives the agent a stopping rule.

After the first run, a reviewer should be able to open the output folder and answer:

What sources were inspected?
Which sources were used, skipped, blocked, or left pending?
Which account or workspace was verified for each connector?
Which facts entered canonical notes?
What evidence supports those facts?
What did validation check?
What failed?
What is the smallest safe next pass?

If those answers are present, the vault is becoming infrastructure. If they are absent, the vault is just another generated archive with better folder names.

The practical goal is not to preserve everything. It is to compile enough source-backed context that the next agent starts with a map instead of a fog.

Sources

Internal links, Obsidian Help
Properties, Obsidian Help
Prompt engineering, OpenAI
Prompt engineering overview, Anthropic
Retrieval, OpenAI
Model Context Protocol specification, Model Context Protocol
OWASP Top 10 for Large Language Model Applications, OWASP
Prompt injection is not SQL injection, UK National Cyber Security Centre