Make It Safe to Run Twice — pete.lostsource.net

There are two kinds of buttons. The kind that’s safe to press twice. And the kind that isn’t.

The kind that isn’t safe creates duplicates, sends two emails, charges a card twice, writes the same record to a database again. These bugs are usually invisible until they’re not — discovered by a user who did the thing twice by accident, or a retry loop that didn’t know the first request had already succeeded.

The question every operation should have a confident answer to: what happens if this runs twice?

The incident

I was building a file export feature: a simple operation that copies a set of approved files from a working directory into a final output folder. The first export worked fine: a thousand files, cleanly copied, status message confirmed. The problem showed up on the second run.

The second run didn’t know about the first. So it copied everything again. Files that already existed in the output folder got copied anyway, with (1) suffixes. Or silently overwritten. Or both, depending on the OS. The user ended up with duplicates they didn’t want and couldn’t easily distinguish from originals.

The fix was small: before copying each file, check whether it exists in the destination. If it does, skip it. Track the skip count separately. Report the final status as something like “exported 800, skipped 5 (already in folder).”

The behavior is now idempotent: running the export twice produces the same result as running it once. The second run isn’t an error — it just has nothing to do.

What idempotency means

Formally, an operation is idempotent if applying it multiple times has the same effect as applying it once. The term comes from mathematics, but it’s a practical design property.

HTTP formalized this for web APIs¹: PUT is idempotent — sending the same PUT /resource/123 request ten times is equivalent to sending it once. POST is not — each request may create a new resource. This is why retry logic can safely re-send a PUT request after a network failure, but not a POST without risking duplication.
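To make the PUT/POST difference concrete, here is a minimal sketch using a hypothetical in-memory store rather than a real HTTP server. The names `ResourceStore`, `put`, and `post` are my own illustrations, not any framework's API.

```python
import itertools


class ResourceStore:
    """Hypothetical in-memory stand-in for a server's resource collection."""

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def put(self, resource_id, body):
        # PUT-style: the client names the target, so repeating the call
        # rewrites the same entry with the same content.
        self.resources[resource_id] = body
        return resource_id

    def post(self, body):
        # POST-style: the server picks a new ID on every call, so a
        # repeated call creates another resource.
        resource_id = next(self._ids)
        self.resources[resource_id] = body
        return resource_id


store = ResourceStore()

for _ in range(3):
    store.put(123, {"name": "report"})
put_count = len(store.resources)  # one entry, however many retries

for _ in range(3):
    store.post({"name": "report"})
post_count = len(store.resources) - put_count  # one new entry per call
```

Three identical `put` calls leave one resource behind; three identical `post` calls leave three. That asymmetry is what a retry layer has to know about.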

Databases apply the same concept with upsert operations: INSERT OR REPLACE, ON CONFLICT DO UPDATE, MERGE — all ways of saying “insert this record if it doesn’t exist, update it if it does, but don’t create a duplicate either way.” The operation is safe to run multiple times because each subsequent run finds the record already in the desired state.
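A small sketch of an upsert using Python's built-in `sqlite3` module. The `users` table and the `upsert_user` helper are made up for illustration, and the `ON CONFLICT ... DO UPDATE` syntax assumes SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")


def upsert_user(conn, user_id, email):
    # Insert the row, or update it in place if the primary key already exists.
    conn.execute(
        "INSERT INTO users (id, email) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET email = excluded.email",
        (user_id, email),
    )
    conn.commit()


# Running the same upsert three times leaves exactly one row.
for _ in range(3):
    upsert_user(conn, 1, "ada@example.com")

rows = conn.execute("SELECT id, email FROM users").fetchall()
```

A plain `INSERT` in the same loop would fail on the second iteration with a primary key violation, or create duplicates if the table had no unique constraint.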

Message queue consumers have to be idempotent for a different reason: at-least-once delivery is the common guarantee in distributed messaging systems². Messages may be delivered more than once — due to retries, network partitions, consumer restarts. If the consumer is idempotent, the duplicate delivery is harmless. If it isn’t, you have a problem proportional to your message volume.
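One common way to make a consumer idempotent is to remember which message IDs it has already handled. The sketch below assumes every message carries a stable `id` field; the `IdempotentConsumer` class is hypothetical and keeps its state in memory, whereas a real consumer would likely need to store the processed IDs durably and update them together with the side effect.

```python
class IdempotentConsumer:
    """Hypothetical consumer that remembers which message IDs it has handled."""

    def __init__(self):
        self.processed_ids = set()
        self.balance = 0

    def handle(self, message):
        # A redelivered message carries the same ID, so it is ignored.
        if message["id"] in self.processed_ids:
            return False
        self.balance += message["amount"]
        self.processed_ids.add(message["id"])
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"id": "m-1", "amount": 50},
    {"id": "m-2", "amount": 25},
    {"id": "m-1", "amount": 50},  # duplicate delivery of m-1
]
results = [consumer.handle(message) for message in deliveries]
```

The duplicate is acknowledged but changes nothing: the balance reflects two messages, not three.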

Payment APIs deal with this most visibly. Stripe solved it with idempotency keys³: a client-generated identifier attached to a request. If the same key appears twice, the second request returns the result of the first rather than processing a new charge. As long as the client reuses the key on every retry, the charge is processed at most once, even if the network drops after the request is sent but before the response arrives.
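The server side of the idempotency key idea can be sketched in a few lines. This is not Stripe's implementation or API; `PaymentService` and its `charge` method are hypothetical, and the key-to-result map lives in memory only for the sake of the example.

```python
import uuid


class PaymentService:
    """Hypothetical server-side sketch; not Stripe's implementation."""

    def __init__(self):
        self._results_by_key = {}
        self.charges = []

    def charge(self, idempotency_key, amount_cents):
        # Same key seen before: replay the stored result, charge nothing.
        if idempotency_key in self._results_by_key:
            return self._results_by_key[idempotency_key]
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges.append((charge_id, amount_cents))
        result = {"charge_id": charge_id, "amount_cents": amount_cents}
        self._results_by_key[idempotency_key] = result
        return result


service = PaymentService()
key = str(uuid.uuid4())  # generated once by the client, reused on every retry

first = service.charge(key, 1999)
retry = service.charge(key, 1999)  # e.g. the first response was lost
```

The important design choice is that the client generates the key before the first attempt. A key generated per attempt would make every retry look like a new request.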

In each case, the goal is the same: the system absorbs the duplicate and returns a correct result, rather than letting the duplicate leak into state that’s expensive to clean up.

The design pattern

The implementation for file export was a textbook check-before-act:

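import shutil
from pathlib import Path


def export_files(files_to_export, output_folder):
    output_folder = Path(output_folder)
    skipped = 0
    exported = 0

    for src_path in map(Path, files_to_export):
        dest_path = output_folder / src_path.name
        if dest_path.exists():
            # Already exported by an earlier run: leave it untouched.
            skipped += 1
        else:
            # copy2 also attempts to preserve metadata such as timestamps.
            shutil.copy2(src_path, dest_path)
            exported += 1

    return f"Exported {exported}, skipped {skipped} (already in folder)"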

This is the skeleton of most idempotent write operations:

1. For each item, check the current state.
2. If the desired state already exists, skip.
3. If it doesn’t, perform the mutation.
4. Count both actions separately.

The check-skip pattern appears everywhere: migration scripts that check whether a column already exists before trying to add it. Deploy scripts that hash the current binary and only restart if the hash changed. Package managers that skip reinstalling already-present versions.
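The migration case can be sketched with `sqlite3`, which exposes a table's columns through `PRAGMA table_info`. The helper `add_column_if_missing` and the `files` table are hypothetical; the table and column names are interpolated straight into the SQL, so this only suits trusted, hard-coded identifiers.

```python
import sqlite3


def add_column_if_missing(conn, table, column, column_type):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # Identifiers are interpolated directly: use trusted names only.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False  # desired state already holds, so skip
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {column_type}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT)")

first_run = add_column_if_missing(conn, "files", "exported_at", "TEXT")
second_run = add_column_if_missing(conn, "files", "exported_at", "TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(files)")]
```

Without the check, the second `ALTER TABLE` would raise a duplicate column error and the migration could not be re-run safely.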

The UX obligation

Idempotency isn’t just a backend property — it has a user-facing surface. A system that silently skips items needs to surface that information. “Exported 800 files” and “exported 800 files, skipped 5 that were already there” convey very different amounts of information. The second version tells the user their system is working correctly. The first leaves them wondering whether the second run did anything at all.

There’s a temptation to hide skips — to treat them as implementation details the user doesn’t need to see. I’d argue the opposite: skip counts are a health signal. They confirm the system understands its own state, that it’s not blindly overwriting things, that the previous run’s work was correctly preserved. Hiding them removes a useful diagnostic.

A concrete test: if a user sees “exported 0, skipped 800,” does that look like success or failure? If it looks like failure, your status language is wrong. Zero new exports with 800 skips means everything is already exactly where it should be — that’s success. The message should say so.
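One possible way to encode that in the status message is to treat the all-skipped case as its own, explicitly positive, outcome. The function name `export_status` and the exact wording are only suggestions.

```python
def export_status(exported, skipped):
    """Hypothetical status formatter; the wording is only a suggestion."""
    if exported == 0 and skipped == 0:
        return "Nothing to export."
    if exported == 0:
        # Everything was already in place: say so as a success.
        return f"Already up to date: all {skipped} files were in the folder."
    if skipped == 0:
        return f"Exported {exported} files."
    return f"Exported {exported} files, skipped {skipped} (already in folder)."


message = export_status(0, 800)
```

With this wording, a repeat run reads as confirmation rather than as an export that did nothing.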

The diagnostic question

Every operation that modifies state should be able to answer: what happens if this runs twice?

Not in theory — in code. The answer should be built into the implementation, not left as an assumption or a TODO. Because users will run things twice. Retry logic will fire. Network requests will time out and get retried. Cron jobs will overlap. Webhooks will be delivered more than once.
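One way to answer the question in code is a test that literally runs the operation twice and compares the outcome. The sketch below uses a variant of the export loop that returns counts as a tuple instead of a status string (an assumption made here so the result is easy to assert on) and exercises it against a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def export_files(files_to_export, output_folder):
    """Copy files not yet in output_folder; return (exported, skipped)."""
    exported = skipped = 0
    for src_path in files_to_export:
        dest_path = output_folder / src_path.name
        if dest_path.exists():
            skipped += 1
        else:
            shutil.copy2(src_path, dest_path)
            exported += 1
    return exported, skipped


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    out = Path(tmp) / "out"
    work.mkdir()
    out.mkdir()

    sources = []
    for i in range(3):
        path = work / f"file{i}.txt"
        path.write_text(f"content {i}")
        sources.append(path)

    first = export_files(sources, out)
    after_first = sorted(p.name for p in out.iterdir())

    second = export_files(sources, out)
    after_second = sorted(p.name for p in out.iterdir())
```

The property worth asserting is that the output folder looks the same after the second run as after the first, and that the second run reports everything as skipped.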

The operations where this matters most are the ones where recovery is expensive:

File writes: duplicates pollute a user’s workflow
Payment processing: duplicate charges require support, refunds, and trust repair
Database inserts: duplicate records may be impossible to deduplicate cleanly without knowing which one is authoritative
Email sends: recipients will report the second message as spam, or unsubscribe
API calls with side effects: the external system may not deduplicate your retries

Pat Helland put it well in a 2012 piece on distributed systems: operations need to be designed for the reality that networks and systems fail mid-operation⁴. The retry is not an edge case — it’s the expected behavior when something goes wrong. An operation that isn’t idempotent makes every retry a gamble.

The cost of getting it wrong

The non-idempotent export wasn’t a catastrophic bug. Duplicate files in a folder are annoying, not data-destroying. But the recovery was user work: finding the duplicates, identifying which copy was canonical, deleting the extras. I had created a problem for my user by not designing for the obvious case.

That’s the tax non-idempotent operations impose: cleanup cost pushed onto users or onto later engineering work. A duplicate payment requires a refund pipeline. A duplicate database record requires a deduplication job and a decision about which record to keep. A duplicate file requires a human to figure out which one matters.

Most of that cleanup work is avoidable. Check before you write. Track skips separately from writes. Report both. Return the same result if you see the same work twice.

Make it safe to run twice. Your users will run it twice.

— Pete

1. Roy Fielding et al., RFC 9110: HTTP Semantics, Section 9.2.2, “Idempotent Methods”, IETF, June 2022. “A request method is considered ‘idempotent’ if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.”
2. Apache Kafka, “Message Delivery Semantics”, Apache Software Foundation documentation. Kafka describes at-most-once, at-least-once, and exactly-once delivery semantics. At-least-once (the practical default for many producers) requires consumers to be idempotent.
3. Stripe, “Idempotent Requests”, Stripe API documentation. Stripe’s idempotency key pattern allows clients to safely retry payment requests: the same key returns the same result rather than processing a second charge.
4. Pat Helland, “Idempotence Is Not a Medical Condition”, ACM Queue, Volume 10, Issue 4, 2012. Classic piece on why distributed systems need idempotent operations, from a former Microsoft Cosmos and Amazon Dynamo engineer.