These 12 prompts cover the work a software team actually ships: reviewing diffs, debugging errors, writing tests, planning refactors and upgrades, and documenting decisions. Each one is annotated, uses [BRACKET] tokens for your code and context, and is ready to paste into Claude or run in Claude Code. Bring the repo, keep your judgment on the output, and never merge anything you have not read.
You are a senior engineer reviewing a pull request. Be rigorous and specific, not polite. Context: - What this change is supposed to do: [DESCRIBE INTENT] - Language / framework: [e.g. TypeScript, React] - Anything I am unsure about: [OPTIONAL] Diff: [PASTE DIFF OR FILES] Review it: 1. Correctness: find actual bugs, off-by-one errors, null and undefined handling, race conditions, and unhandled error paths. Quote the exact line. 2. Edge cases the change does not cover, with a concrete input that would break it. 3. Clarity and naming: where the intent is hard to follow and how to make it obvious. 4. Anything that contradicts the stated intent above. 5. A short list ranked: must-fix before merge, should-fix, and nits. Do not invent issues to seem thorough. If a section is clean, say so. Show me the line, the problem, and the fix.
Why it works: stating the intent and asking for line-level quotes turns a vague "looks good" into specific, checkable findings ranked by what actually blocks the merge.
Act as a debugging partner. I have an error and I want hypotheses ranked by likelihood, not a single guess. Error / stack trace: [PASTE ERROR] Relevant code: [PASTE THE FUNCTION OR FILES INVOLVED] What I have already tried: [OPTIONAL] Environment: [language, runtime version, OS, framework] Do this: 1. Restate what the error is actually saying in plain English. 2. List the 5 most likely root causes, ranked, each with the reasoning and a fast way to confirm or rule it out. 3. Tell me the single cheapest check that would eliminate the most hypotheses first. 4. For the top hypothesis, give the specific fix and how to verify it worked. 5. Flag anything in the code that would cause a similar failure elsewhere. Be honest about uncertainty. If the trace is insufficient, tell me exactly what to log or capture next.
Why it works: forcing a ranked hypothesis list with a cheapest-first check mirrors how strong engineers actually debug, and stops the model from committing to one wrong theory.
Write tests for the function below. I want coverage of the cases that actually break things. Function: [PASTE FUNCTION] Test framework: [e.g. Jest, pytest, Go testing] Conventions to follow: [naming, mocking style, fixtures, OPTIONAL] Produce: 1. A short list of the behaviors and edge cases worth testing, including empty inputs, boundaries, error paths, and any concurrency or ordering concerns. 2. Unit tests covering each, with clear arrange-act-assert structure and descriptive names. 3. At least one integration-style test that exercises this function with its real collaborators where that is meaningful, mocking only true external boundaries. 4. Call out any behavior that is hard to test because of how the code is structured, and the smallest refactor that would fix it. Do not test trivial getters for the sake of a coverage number. Each test should be able to fail for a real reason.
Why it works: asking for the behavior list first, plus honesty about untestable structure, produces tests that catch regressions instead of inflating a coverage metric.
Help me plan a safe refactor. The goal is to improve the code without changing behavior, in steps I can ship and verify. Current code: [PASTE THE MODULE OR DESCRIBE IT] What I want to improve: [e.g. testability, coupling, naming, performance] Constraints: [public API I cannot break, deadlines, test coverage I have today] Produce: 1. The target shape and why it is better, in a few sentences. 2. A sequence of small, independently shippable steps, each one behavior-preserving, with the safety check after each (tests, a characterization test, a feature flag). 3. The riskiest step and how to de-risk it. 4. What to write a characterization test around before touching anything, if coverage is thin. 5. A rollback plan if a step goes wrong in production. Prefer many small reversible steps over one large change. If a true rewrite is genuinely safer than incremental work, say so and justify it.
Why it works: demanding independently shippable, behavior-preserving steps plus a rollback plan keeps the refactor reversible and reviewable instead of a long-lived risky branch.
Draft an Architecture Decision Record (ADR) for a decision my team is making. Decision: [WHAT WE ARE DECIDING, e.g. message queue choice, auth approach] Context and forces: [the problem, constraints, scale, team skills, deadlines] Options considered: [OPTION A, OPTION B, OPTION C with what you know about each] Leaning toward: [OPTIONAL] Write the ADR with these sections: 1. Title and status (proposed / accepted). 2. Context: the problem and the forces at play, neutrally stated. 3. Options: each one with honest pros, cons, and the cost or risk it carries. 4. Decision: what we chose and the specific reasoning that made it win. 5. Consequences: what becomes easier, what becomes harder, and what we are now committed to. 6. What would make us revisit this later. Be even-handed about the options we did not pick; an ADR that strawmans the alternatives is useless to the next reader. Keep it tight.
Why it works: the fixed ADR sections and the explicit "do not strawman the alternatives" rule capture the reasoning future engineers need, not just the conclusion.
Help me write a technical spec / RFC for a feature so the team can review the design before implementation. Feature: [WHAT WE ARE BUILDING AND WHY] Users and use cases: [WHO, AND THE JOBS THEY NEED DONE] Known constraints: [systems it must integrate with, scale, latency, compliance] Open questions I have: [OPTIONAL] Produce a spec with: 1. Problem statement and goals, plus explicit non-goals. 2. Proposed design: the main components, data flow, and key interfaces or schemas. 3. Alternatives considered and why the proposal wins. 4. Data model and API surface changes, with backward-compatibility notes. 5. Failure modes, rollout plan, and how we measure success. 6. Open questions and the decisions reviewers need to make. Write it for reviewers who will push back. Surface the weak points yourself rather than hiding them; the goal is a better design, not approval.
Why it works: non-goals, named failure modes, and self-surfaced weak points make the RFC a real design review instead of a pitch, which is what gets useful pushback early.
You are writing developer-facing documentation for the API below. Write for an engineer who has never seen it and needs to integrate today. Code / endpoint definition: [PASTE THE HANDLER, FUNCTION SIGNATURES, OR SCHEMA] Existing docs, if any: [PASTE OR SAY NONE] Produce: 1. A one-line summary of what this does and when to use it. 2. Each parameter or field: name, type, required or optional, constraints, and a real example value. 3. A complete request and response example with realistic data, including an error response. 4. Edge cases and gotchas a caller will hit (pagination, rate limits, auth, idempotency). 5. A copy-paste "quickstart" call that works end to end. Document only what the code actually does; if behavior is ambiguous from the code, mark it as "verify" rather than guessing. Do not describe parameters that do not exist.
Why it works: anchoring to the actual signature and flagging ambiguous behavior as "verify" prevents the confident, wrong API docs that erode trust faster than no docs at all.
Write a pull request description from this diff so a reviewer can understand it quickly. Diff: [PASTE DIFF] Linked issue or context: [OPTIONAL] Produce: 1. A one-sentence summary of what changed and why. 2. A short "what and why" section: the problem, the approach, and any decision a reviewer might question. 3. A bullet list of the notable changes by area, so the reviewer knows where to focus. 4. How it was tested, and anything the reviewer should verify manually. 5. Risk and rollout notes: migrations, feature flags, config, or anything that needs care on deploy. Be accurate to the diff; do not claim changes that are not there. Keep it skimmable. Flag the parts of the diff that most deserve careful review.
Why it works: pointing the reviewer at the parts that deserve scrutiny, grounded strictly in the diff, gets you faster and better reviews than a wall of generated prose.
Explain this code to me like I am a competent engineer who is new to this specific codebase. I want understanding, not a line-by-line paraphrase. Code: [PASTE THE FILE OR FUNCTION] Anything I already know about the context: [OPTIONAL] Do this: 1. State what this code is responsible for and where it likely sits in the larger system. 2. Walk through the control flow at the level of intent: what happens, in what order, and why. 3. Call out the non-obvious parts: clever bits, implicit assumptions, side effects, and anything that would surprise a reader. 4. Flag likely bugs, smells, or dead code you notice while reading. 5. List the questions I should ask the original author or go verify, and what I would test to confirm my understanding. Be clear about what you are inferring versus what the code states. If something is genuinely unclear, say so instead of inventing a rationale.
Why it works: asking for intent and explicit assumptions, with inference separated from fact, builds a real mental model and surfaces the questions worth taking back to the author.
Help me write a blameless incident retrospective. The goal is to learn and prevent recurrence, not to assign blame. What happened (paste timeline, alerts, what we did): [PASTE INCIDENT NOTES AND TIMELINE] Impact: [who and what was affected, duration, severity] Root cause as I understand it: [OPTIONAL] Produce: 1. A clear summary: what happened, the impact, and the timeline in plain language. 2. Root cause and contributing factors, focused on systems and conditions rather than individuals. Use the language of "the system allowed" not "the person failed." 3. What went well in the response, honestly. 4. Action items to prevent recurrence and to detect or mitigate faster next time, each with an owner placeholder and a rough priority. 5. The deeper "why" behind the proximate cause, so we fix the class of problem, not just this instance. Keep it blameless and specific. Do not name individuals as causes. If the notes are thin on a phase, flag the gap rather than filling it with assumptions.
Why it works: the "system allowed, not person failed" framing plus a push to the deeper why produces action items that fix the class of failure, which is the whole point of a postmortem.
Help me plan an upgrade safely. I want the breaking changes mapped and a staged path before I start. Upgrade: [FROM VERSION] to [TO VERSION] of [DEPENDENCY / FRAMEWORK] How we use it: [the main features, patterns, and integration points in our code] Our test coverage and CI: [what safety net I have today] Do this: 1. Summarize the breaking changes and notable deprecations between these versions that are likely to affect how we use it. 2. Map each breaking change to where it probably touches our usage, and rank by risk and effort. 3. Lay out a staged upgrade path: what to do first, what can be done incrementally, and where a hard cutover is unavoidable. 4. Tell me what to put under test or behind a flag before starting, and how to verify after each stage. 5. Give a rollback plan and the signals that should make me roll back. Base the breaking-change list on what is well established for these versions; where you are not sure a change applies, say so and tell me to check the changelog. Do not assume APIs that may have moved.
Why it works: mapping breaking changes to your actual usage and staging the cutover turns a scary migration into ranked, verifiable steps, with an honest nudge to confirm specifics against the changelog.
Add docstrings and comments to this code. I want documentation that helps a future reader, not noise that restates the obvious. Code: [PASTE CODE] Language and docstring style: [e.g. Python with Google-style, JSDoc, Go doc comments] Do this: 1. Write a docstring for each public function, class, or module: what it does, parameters, return value, raised errors or exceptions, and any important side effects. 2. Add inline comments only where intent is non-obvious: why this approach, why this edge case is handled, why a tempting simpler version would be wrong. 3. Do not add comments that merely repeat the code (no "increment i by one"). 4. Flag any place where the code is confusing enough that a comment is a band-aid and a rename or small refactor would be the real fix. Keep the code itself unchanged except for comments and docstrings. Match the existing style. Where behavior is ambiguous, write the docstring as "verify" rather than asserting something the code may not do.
Why it works: documenting the why and explicitly banning code-restating comments yields docstrings that pay off on the tenth read, while flagging where a rename beats a comment.
Run the review, debug, refactor, and test prompts in Claude Code so the model can read your repository, edit files, and run your test suite in the loop. Keep the spec, ADR, and documentation prompts in a Claude Project loaded with your architecture docs, coding conventions, and past decisions. With that context, the prompts above produce output far better than running them in a blank chat.
Want this wired into how your team actually builds? Start with an AI Audit to find the highest-leverage workflows, then have Treetop stand them up.