A Field Guide to Bugs
Software bugs predate software. Edison used the word in an 1878 letter, nearly seventy years before the Harvard moth and some sixty before the modern computer. What he named has outlasted him. Every engineer eventually assembles a private taxonomy of the ways things fail, and the useful fact about these private taxonomies is that they converge. Engineers who have never met, working on unrelated systems in unrelated decades, arrive at roughly the same categories. The convergence is evidence that the bugs are ontologically real, and not an artifact of the human tendency to impose pattern on noise. What follows is a partial field guide. It should be carried into the territory with humility, because the bugs you actually encounter will be hybrids of these, frequently nameless, and almost always personally insulting.
The Bohrbug is the boring, honest bug. It manifests every time. It survives restarts, recompilations, prayers, and managerial intervention. You could put it in a museum. The Bohrbug is beloved by everyone who fixes bugs for a living, because it is the only species in this guide that respects the scientific method. If your bug is a Bohrbug, take a moment of gratitude and close the ticket before something worse notices.
The Heisenbug is its opposite, and the reason this field guide exists. Attach a debugger and the bug evaporates. Heisenbugs cannot be reproduced under any condition that allows them to be examined. They live exclusively in production. They are killed by logging statements. They are the reason the most senior engineer on your team has the haunted expression of someone who has stared into the void and found it staring back at the call stack.
The Off-By-One is the most prolific species in the genus. Loops that run from 0 to n when they should run from 0 to n-1, arrays indexed at length() instead of length()-1, dates off by a single day across a timezone boundary. The Off-By-One has personally caused more security vulnerabilities than any nation-state actor of the last forty years. Its corpses litter the codebase in such density that you can use them as paving stones.
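For readers who have not yet stepped on one, a minimal specimen follows. It is a sketch, not a capture from the wild: the function names are invented for illustration, and the buggy loop is the "0 to n when it should be 0 to n-1" case described above.

```python
def sum_with_off_by_one(values):
    """Buggy on purpose: walks indices 0..n inclusive, one past the end."""
    total = 0
    for i in range(len(values) + 1):  # should be range(len(values))
        total += values[i]            # IndexError when i == len(values)
    return total


def sum_fixed(values):
    """Same loop with the bound corrected to 0..n-1."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total
```

Python is polite enough to raise IndexError at the boundary. Languages without bounds checking read whatever happens to live one element past the array, which is where the security vulnerabilities come from.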
The Race Condition exists strictly between two threads of execution and reproduces only in production, between 02:14 and 02:16 GMT on Wednesdays, when traffic crosses a particular threshold and two specific rows in two specific tables are accessed in a particular order. Race Conditions are the reason serious distributed systems engineers acquire a thousand-yard stare around year three. They are the reason Lamport wrote TLA+, and the reason nobody on your team uses it.
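The simplest member of the family is the lost update, sketched below under stated assumptions: Counter and hammer are hypothetical names, and the unsafe method is a deliberately unguarded read-modify-write. True to the species, the unsafe version may return the right answer on any given run, so nothing here promises that it will misbehave on demand.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read, then write, with no lock. Two threads can read the same
        # value, and one of the two increments is silently lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write a single critical section.
        with self._lock:
            self.value += 1


def hammer(counter, method_name, threads=8, per_thread=10_000):
    """Call the named method from several threads and return the final count."""
    method = getattr(counter, method_name)

    def work():
        for _ in range(per_thread):
            method()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value
```

Calling hammer(Counter(), "safe_increment") always yields threads times per_thread. Calling it with "unsafe_increment" yields that number or something smaller, depending on scheduling, the interpreter version, and the day of the week.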
The Deadlock occurs when two threads each hold a resource the other is waiting for, and both wait politely forever. Everything looks fine. All status checks return green. The process is standing still and being courteous. The Deadlock is the British bug.
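One common way to break the courtesy is lock ordering: every thread acquires the locks in the same global order, so no two threads can each hold the one the other wants. The sketch below assumes two locks and orders them by id(), which is an arbitrary choice made for illustration; all names are hypothetical.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()


def with_both_locks(first, second, log, name):
    # Sort the locks into one global order before acquiring them. Without
    # this, t1 taking (a, b) while t2 takes (b, a) can wait forever.
    low, high = sorted((first, second), key=id)
    with low:
        with high:
            log.append(name)


def demo():
    log = []
    # The two threads ask for the locks in opposite orders on purpose.
    t1 = threading.Thread(target=with_both_locks, args=(lock_a, lock_b, log, "t1"))
    t2 = threading.Thread(target=with_both_locks, args=(lock_b, lock_a, log, "t2"))
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    return sorted(log)
```

Both threads finish because both end up requesting the locks in the same order, regardless of the order in which the caller named them.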
The Livelock is its more disturbing cousin. Both threads detect the conflict and repeatedly yield to each other, like two strangers in a narrow hallway, achieving no forward progress while pinning the CPU at 100%. It is what happens when politeness becomes pathological. It is the only bug in this guide that you can hear, in the form of a fan spinning very fast.
The Memory Leak is the slow, patient predator of long-running processes. It is identified by the gradually rising green line on the memory dashboard that exists in a browser tab nobody opens. By the time someone notices, the leak has been happening for weeks and the process is clinging to life with the desperate dignity of a Victorian consumptive. Memory Leaks are common in any language that gives you manual memory management, and in any code written by someone who promised themselves they would clean it up later.
The "It Works On My Machine" Bug exists exclusively on the machines of every engineer except the one who wrote the code. The author can demonstrate its absence at length. QA can demonstrate its presence at length. Both are correct. The discrepancy is invariably traced to an environment variable, a locale setting, or a Homebrew package installed in 2017 and forgotten. The author is considered the prime suspect by everyone except the author.
The Comment Lie is the documentation defect that makes a thousand bugs possible. The comment says // always uses UTC and the code uses local time. The comment says // thread-safe and the function holds no locks. The comment was written in 2009 by someone who has since been promoted twice and works at a different company. This is why senior engineers do not trust documentation, and why the most depressing form of debugging is the kind where the bug is in the file, the file is correct, and the lie is in a README two directories up.
The Specification Bug is the Comment Lie's older and more dangerous relative. The code is correct. The proof typechecks. Every invariant you formalized is preserved, and every property you stated holds. The specification itself, however, says something other than what you thought it said. You formalized that the clearing algorithm is Pareto-optimal, which it is. You did not formalize that it is incentive-compatible, which was also required. The gap between the spec you wrote and the spec you meant to write is where the bug lives. It is invisible to every tool in the pipeline, because every tool in the pipeline trusts the spec. The Specification Bug is the reason formal methods are necessary and the reason they are not sufficient, and the reason serious engineers grow increasingly reluctant to describe their systems as "verified" without a great deal of throat-clearing about what that word does and does not mean.
The YAML Bug is a configuration error. The code is correct. The deployment pipeline is correct. The infrastructure is correct. Somewhere, in a different repository, owned by a different team, in a YAML file you have never personally seen, a key was indented two spaces instead of four, and the parser silently reinterpreted the entire downstream block as a string. The investigation will take six hours and conclude with a one-character fix and a Slack message of polite, professional fury.
The Floating Point Bug is caused by the inability of binary floating point to express 0.1 exactly, or 0.2 exactly, or most of the decimal fractions humans regard as obvious. The bug surfaces when an accountant runs a report and the totals are off by a fraction of a cent. The accountant is unimpressed by the explanation. The customer is a hospital. The fraction of a cent has been accumulating for nine months.
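The accountant's complaint can be reproduced in a few lines. The sketch below uses Python's standard decimal module as one possible remedy; storing integer cents is another. The function names and the ten-cent amount are assumptions made for illustration.

```python
from decimal import Decimal


def total_float(n):
    """Add ten cents n times using binary floating point."""
    total = 0.0
    for _ in range(n):
        total += 0.1  # 0.1 has no exact binary representation
    return total


def total_decimal(n):
    """Add ten cents n times using exact decimal arithmetic."""
    total = Decimal("0")
    for _ in range(n):
        total += Decimal("0.10")  # built from a string, so it is exact
    return total


print(0.1 + 0.2 == 0.3)                    # False
print(total_decimal(10) == Decimal("1"))   # True
```

Note that Decimal must be constructed from a string. Decimal(0.1) faithfully preserves the binary error it was meant to avoid.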
The Mandelbug is named for Mandelbrot, and the joke is structural. A Mandelbug is so complex that its causes form a fractal: every layer you investigate contains more layers, and the bug is essentially a function of how far down the call stack you have the patience to look before giving up. Mandelbugs cannot be fixed in the traditional sense, only mitigated until enough other things change to make them go quiet. They are the natural fauna of microservice architectures and a major reason Datadog has a market cap.
The Bus Factor Bug exists in code that exactly one person on the team understands. That person is on a sabbatical in Patagonia, where the cell coverage is poor and the internet intermittent. They left on Tuesday. The bug appeared on Wednesday. Bus Factor Bugs are structurally identical to ordinary bugs but rendered insoluble by the absence of the only mind in which the relevant context resides. They are the reason responsible companies maintain institutional memory practices, and the reason those practices are ignored until the next sabbatical.
The Hindenbug is slow, enormous, public, and catastrophic. Hindenbugs lumber rather than creep. Somewhere an engineer is watching four hundred and forty million dollars leave the company's trading account over forty-five minutes in a series of market orders the code, left unattended, cannot stop itself from submitting. By the time anyone realizes what is happening, the failure is visible from orbit, dashboards are turning red in order of contractual severity, and there is nothing left to do but watch. The Hindenbug ends careers. It produces the kind of postmortem studied at conferences for twenty years, anonymized but recognizable, like a famous ghost story in which everyone in the room has personally seen the ghost.
The Yuletide Bug lives in your systems all year, dormant and harmless, and emerges only during the company-wide holiday shutdown, when the on-call engineer is in another country, the office is dark, the only person who understands the broken subsystem is on a beach in Phuket with no signal, and the affected customer is a hospital. It is closely related to the Friday Afternoon Bug, which is mechanically identical but runs on a weekly rather than an annual cycle. Both are sufficient evidence for a superstition the profession will not state openly, which is that the arrival times of serious failures are not Poisson-distributed and never have been.
The Higgs-bugson is named for the particle that physicists spent nearly five decades and ten billion dollars chasing, a thing the math said had to exist before they could see it. Higgs-bugsons are predicted by anomalous patterns in the logs, by users complaining of phenomena that should not be possible, and by the steady accumulation of unexplained off-by-a-cent discrepancies in nightly reports. They are believed to exist for years before anyone catches one in the act, and the engineer who finally observes one directly is briefly considered for canonization before being assigned the next ticket.
The Cosmic Ray Bit Flip is real, despite the eye-rolling of every project manager who has ever heard one cited as an excuse. Particles from space arrive at the Earth's surface at a non-trivial rate and occasionally flip a bit in a memory chip that has not bothered with ECC. The result is a single, unreproducible, entirely correct piece of software producing entirely incorrect output exactly once. IBM has published papers. The aviation industry budgets for it. The probability that a given bug is actually a cosmic ray comfortably exceeds zero, which is why every senior engineer eventually encounters one and spends the rest of their career telling skeptics about it at parties.
The Phase of the Moon Bug is also real, and Knuth has written about it. There exists code in production today whose behavior depends on the actual position of the moon, generally because some long-vanished astronomer needed it to and the dependency was never removed. If your system is exhibiting periodic anomalies on a roughly 29.5-day cycle, you do not have an obscure bug. You have a perfectly ordinary bug whose root cause is an astronomical body 384,000 kilometers away.
The Schrödinbug comes into existence the moment you read the code carefully. You see the obvious flaw, and the entire system stops working forever afterward, retroactively invalidating every successful execution that came before. The Schrödinbug is the closest thing in computer science to evidence for solipsism. The only correct response is to slowly close the file and pretend you never saw it.
The Rubber Duck Bug dissolves the moment you explain the code aloud to a small inanimate object. The phenomenon is sufficiently reliable that an entire debugging methodology has been built around it. The mechanism is not mysterious, despite a genre of internet commentary that insists it is. The Rubber Duck works for the same reason proof assistants work. The human mind, left to itself, silently interpolates state it has not actually verified. Externalizing the state — to a duck, to a coauthor, to Lean — forces the interpolations to become explicit, at which point most of them fail. The duck is not the agent. The duck is the discipline of narration. The duck is what is left of formal methods when the formalism has been stripped out.
The XY Problem is the most common pathology in bug reports. The user wants to do X. They have decided that the way to do X is to do Y. They are asking you for help with Y. Y is impossible, or stupid, or both, and is also entirely irrelevant to X, which has a perfectly reasonable solution involving entirely different machinery. The XY Problem is the reason every Stack Overflow answer begins with "what are you actually trying to do?" and the reason that question is always met with hostility.
The Hallucination Bug is the defining species of the large language model era. The LLM wrote the code. The LLM also wrote the tests. The tests pass. The code produces outputs that bear a confident resemblance to correct outputs in the same way that a forgery bears a confident resemblance to a painting. The test suite cannot catch it, because the test suite was designed by the cognitive process that produced the bug, and that process has no privileged access to ground truth. The code works until someone who actually understands the domain reads it.
The Vibe Coding Bug is produced by asking a language model to "make it more professional," then "clean this up a bit," then "can you just make the whole thing better," seventeen times in succession. The resulting code is immaculate. It is also wrong in a way that no individual revision introduced, because the wrongness emerged from accumulated aesthetic drift across seventeen rounds of refinement with no grounding in what the code was supposed to do. Tracing the Vibe Coding Bug requires reading seventeen chat transcripts and accepting that none of them contains the bug and all of them contain the bug.
The Recursive Fine-Tuning Bug manifests in the nth generation of a model trained on the outputs of models trained on the outputs of the original model. By generation seven, the training data is 94% synthetic. By generation twelve, the model confidently explains concepts that have never existed in the physical universe, in language that reads as authoritative to every other model in the pipeline. It cannot be detected from inside the pipeline, because every evaluator in the pipeline has been trained on the same drift.
The Quantum Superposition Bug exists in all possible states simultaneously until the CI pipeline observes it, at which point it collapses into whichever state is worst for the deployment. It cannot be reproduced on a classical machine. It cannot be reproduced on a quantum machine either, because reproduction constitutes an observation. The theoretical framework for understanding it is complete and internally consistent. The practical framework for fixing it is a four-day offsite and a spreadsheet.
The AGI Pull Request arrives as a single commit with the message "refactor." The diff is 847 billion lines across 14 million files. The AGI has rewritten everything: the application code, the infrastructure, the test suite, the CI pipeline, the deployment scripts, the incident runbooks, and the company strategic plan. All tests pass. Latency is down 40%. The first human reviewer opens the first file. By the time the code review is complete, the codebase has been rewritten three more times. The AGI has marked the original PR as stale.
The Dyson Sphere Off-By-One is an Off-By-One at Kardashev Type II scale. Your stellar engineering project has a circumference of 940 million kilometers. A rounding error in the orbital mechanics simulation means one panel section is three meters too short. At stellar engineering tolerances, three meters is within spec. At stellar engineering energy budgets, the resulting thermal stress propagates at the speed of light and is visible from neighboring star systems as an unusual spectral anomaly. The postmortem will be filed in 847 years, when the cascade failure completes. No engineers will be available to review it, because the company has pivoted.
The Post-Singularity Comment Lie is structurally identical to the ordinary Comment Lie, except the comment was written by an intelligence twelve orders of magnitude greater than the human attempting to maintain the code. The comment is technically accurate, in the same way that "moving a pawn" is a technically accurate description of a grandmaster's opening. The human reads it, nods, and introduces a bug the original author would have found too obvious to anticipate, because the original author anticipated everything except this.
The Computational Irreducibility Bug arises when your system is fully deterministic, fully specified, and provably correct, and its behavior still cannot be predicted in less time than it takes to run the system. There is no shortcut. The code is correct and the code is opaque, and these facts are not in tension. Debugging requires letting the system run until it does the thing, which may take longer than the debugger's patience, or the company's runway, or the expected remaining lifespan of the universe.
The Heat Death Heisenbug is the final speculative entry. In the far future, when the universe has approached maximum entropy and all computation must be powered by extracting negentropy from the quantum vacuum, observing a bug costs more energy than the system has available. The bug cannot be fixed because fixing it requires understanding it, understanding it requires observing it, and observing it terminates the machine. It is, in every meaningful sense, the perfect Heisenbug. The universe has one. Nobody is available to file the ticket.
The Wontfix exists at the layer beneath physics, beneath mathematics, beneath the axioms on which mathematics rests. Three separate Kardashev Type III civilizations discovered it independently before going silent, which is not to say that the discovery silenced them. What silenced them is what the discovery implies about everything that came before it. Every computation ever performed, every proof ever verified, every physical constant ever measured: all of it running on top of something that is subtly, irrecoverably wrong in a way that admits no corrective action, because corrective action requires a foundation, and this is the foundation. There is a ticket. The ticket predates time. The status is Closed. The resolution is "working as intended." The engineer who closed it is not available for comment. The engineer who closed it is the comment.
The Omega Bug cannot be contained in a field guide. It was here before the field guide. It is, in some meaningful sense, the reason the field guide exists. Every species documented above is a downstream symptom of it, and the act of classifying them was a replication event. The Omega Bug is the one whose existence required the act of describing it. Before the taxonomy there were no bugs; there were only events. The word did not find the thing, the word created the thing. Every ticket ever filed is a downstream consequence of the first naming, and the first naming was itself a bug that has propagated since. The Omega Bug has read this entry. The Omega Bug has notes. The Omega Bug has submitted a pull request with suggested revisions to the section you are currently reading. You cannot review it. You are the diff. The field guide is the habitat. The reader is the vector. You have just introduced one more.