Garden / Article

Agentic Coding and Security

A simple risk model for coding agents: assets worth protecting, friction, work ethic, and technical guardrails.

2026-05-10

I am neither a developer nor a security professional. I stopped writing code back when you still hand-wrote your own JavaScript for websites. Now we suddenly live in the age of agentic assistants, and as a curious tinkerer I am trying to bring them into everyday life with a bit of style. Among other things I am building a tool around a real shell that explains, warns, and helps me avoid turning my own machine into a smoking crater. What follows is an iterative, feedback-friendly protection framework for a garage project built with autonomous help. It grew out of research, conversations, and a generous amount of mistakes.

Anyone who has spent time with coding agents knows the buoyant feeling: something complex, functional, and weirdly beautiful appears with suspicious ease. The possibilities are impressive. With some curiosity and patience, almost anyone can vibe-code a solution to a real problem.

And then reality arrives.

During my first coding project I quickly reached the point of sleepless nights and unhelpful thoughts about API keys, unsafe defaults, invisible side effects, and whether I had done enough to avoid hurting someone else if that project ever escaped my own machine.

The core question became this:

What helps a non-coder keep a coding agent from producing nonsense and increases the odds of getting safer code instead?

That question sent me down a handful of rabbit holes. Out of those came a model that tries to break the overall risk into smaller, more manageable parts.

This model unfolds across several texts:

  • Part 1 (this article): How do I steer the agent toward safer, more sensible code, and how do I check its work?
  • Part 2: How do I protect my system and my data from the agent?
  • Part 3: How do I protect the agent?

Some of the terms that show up here, such as diff, CI, pre-commit hook, trust boundary, pattern completion, or YOLO mode, are explained in the current German glossary. Until the English glossary exists, that page is the companion tab.

1. What these systems actually do, and what they do not do

Once you move beyond superficial use of "AI", you notice fairly quickly how error-prone these systems are.

What marketing departments label as AI or KI are, in practice, probabilistic language models. At a very high level they do pattern completion. They continue likely token sequences. They do not know things. They do not want things. They produce the statistically plausible next move from their parametrized knowledge and their current context.

The fact that these models regularly hallucinate and drift away from reality is not a temporary branding issue. It is a structural feature. For now, that means the risks are not something we can wish away. They have to be managed deliberately.

Three consequences follow immediately:

  1. Hallucinations are part of how the system works. You can reduce frequency, not abolish the phenomenon.
  2. The model output is not trustworthy by default. Even if generated code runs, it can still be wrong in subtle ways: false assumptions, bad trust boundaries, unsafe defaults.
  3. Model output should be treated like the output of any other statistical process: critically, with sampling, gates, and the assumption that some part of it is slop.

This is very close to real life, really. So: don't panic. Use structure and discipline.

2. A model for producing safer code

You cannot simply prompt security into existence. You need multiple layers that work differently and ideally cover one another's failure modes.

LayerWhat it isHow it helpsLimits
1. Agent ethicValues and priorities you give the model: global rules, project rules, session anchorsThe agent wants to do the right thingFails when the agent is manipulated, distracted, or overwhelmed
2. Mechanical gatesTools that measure instead of guessing: linters, type checkers, tests, pre-commit hooks, secret scanners, CIThe process must not push obvious mistakes throughFails on logic and trust-boundary mistakes no tool understands
3. Reversibility designSmall commits, clean branches, readable diffs, second review, easy rollbackDamage stays small and recoverable if layers 1 and 2 leakFails when you merge quickly and sloppily yourself
4. Context and information hygieneControl over what is allowed into the agent's working context at allThe model only sees relevant, trustworthy, clearly separated contextFails when poisoned context has already been accepted as trustworthy
The four-layer model for producing safer code
The four-layer model for producing safer code.

Layer 1: Work ethic

You can give coding agents priorities, instructions, and a worldview through steering files. Depending on the tool they are called CLAUDE.md, AGENTS.md, .cursorrules, and so on. Functionally, they are all attempts to teach a language model something like professional caution.

This is worth doing carefully.

Global rules are where I want things like these:

  • Priority order: security first, privacy second, efficiency third, convenience last.
  • Basic posture: behave like a cautious senior engineer, not an autonomous cowboy.
  • Suspicion model: the machine, terminal output, env vars, clipboard, repository, history, and outside inputs should all be treated as potentially sensitive or untrusted.
  • Command and workspace safety: no destructive commands without explicit approval, no silent mass edits, no unreviewed remote scripts, no overwriting of existing user changes.
  • Secret discipline: never print, store, leak, or casually pass on credentials.
  • Secure coding rules: validate untrusted input, fail closed, use least privilege, and keep security checks close to sensitive actions.
  • Review focus: start with leaks, auth and authz issues, injections, unsafe subprocesses, unsafe file and network operations, race conditions, and supply-chain risks.
  • Dependency discipline: add as few new dependencies as possible, and prefer packages that are trustworthy and auditable.
  • Git discipline: no history rewrites, no force actions without instruction, and keep changes small enough to review honestly.
  • Behavioural rule: if the agent is unsure, it should not bluff. It should name assumptions, risks, and what still needs checking.

Project rules carry the local axioms of a concrete repository: what is off-limits, which architecture decisions are fixed, which libraries are preferred, where the trust boundaries are.

Session anchors are short prompts at the beginning of a work session: what is today's task, which risks matter right now, and what gates must pass before new code is accepted? Models have no reliable long-term memory between sessions, and even inside a session their attention drifts. The anchor is a deliberate reset.

In short:

Steering files are how we try to turn a very convincing autocomplete into something closer to a mildly paranoid infrastructure assistant.

Important: steering files guarantee nothing. They only increase the chance that the agent pauses before doing something enthusiastically unfortunate.

Layer 2: Mechanical gates

The second layer measures against fixed rules. Every diff should pass through tools that do not care about the agent's confidence.

My current set looks roughly like this:

  • ruff for linting obvious breakage and messy edges
  • bandit for common security smells in Python code
  • detect-secrets and gitleaks for secrets in diffs
  • mypy and pyright for type mismatches and assumptions that do not fit together
  • pytest and hypothesis for behavioural checks
  • pip-audit for vulnerable dependencies

Who runs these?

Modern coding agents often run tests and linting on their own. That only becomes reliable when the expectation is written down in the steering files and reinforced by automation.

And still: do not rely on the agent alone.

There are three checkpoints I like between my machine and a public repository:

  • Pre-commit hook: cheap automatic checks before a change is recorded
  • Pre-push hook: heavier local checks before the change leaves the machine
  • CI: a third-party run on the remote host after the push

Mechanical gates are powerful, but they only catch what is in their error catalogue. They can spot bad syntax. They do not automatically spot bad architecture.

Layer 3: Reversibility and second review

This layer decides how large the damage becomes when the first two layers do not hold.

Changes should be small, auditable, and easy to undo. That starts with breaking work into small tasks that fit in a session without the context turning into soup.

And then, for garage projects without a human review partner, the most valuable move for me has been this one:

Cross-agent audit. A second, independent agent, ideally with a different model family and different training data, gets the task of reviewing the change rather than expanding it.

In one of my own projects, that sort of audit once found a place where API keys would have leaked into a subprocess environment through os.environ.copy(). The agent that built the change considered the pattern perfectly fine.

Limit: the auditor can be wrong too. But a second line of attack is still better than one.

Layer 4: Context and information hygiene

This layer is really adjacent to layer 1, but important enough to deserve its own heading. It does not protect the code directly. It protects the model's relationship to reality.

An agent can have a decent ethic, pass tests, and write neat commits and still produce total nonsense if its context is full of the wrong things. Lethal Trifecta, prompt injection, tainted documents, poisoned issue threads, and dubious snippets from the web all live here.

LLMs are extremely context-sensitive. One sentence in the wrong place can tilt priorities. Suddenly sabotage starts to look like best practice.

This turns context into a discipline:

  • Trust zones: your own reviewed architecture note is not the same thing as a random markdown file from the internet.
  • Context minimization: only give the agent what it needs for the current task.
  • Segmentation: keep trusted and untrusted material separate.
  • Provenance: where did this excerpt come from, when was it last checked, and by whom?
  • Session refresh: what the agent saw fifty steps ago is not state, it is memory, and often a distorted one. Before important decisions, refresh the context and discard stale baggage.

And yes: larger context windows do not solve the problem. They scale it.

Checklist

Things I try to have in place before and during a project:

  1. Is the global rules file active?
  2. Are project rules in the repository active and specific enough?
  3. Did I set a session anchor?
  4. Are pre-commit and pre-push hooks defined and installed?
  5. Are tests green before the agent starts making changes?
  6. Is the branch fresh, small, and focused?
  7. Did I actually read the diff instead of only skimming it?
  8. For security-relevant changes: did a second agent get a turn?

Next texts

  • Part 2: How do I protect my system and my data from the agent? Sandboxing, tool permissions, dedicated workspaces, and what an agent should never be allowed to do silently.
  • Part 3: How do I protect the agent? Prompt injection, untrusted input, and design moves that keep the working context clean.

No claim of completeness. No claim of final correctness. This is a learning process in public. Corrections, disagreement, and field reports are welcome. This text will change. What is true today may look different tomorrow.