Hotfix at 4:00 AM


<<run setup.resetStoryState()>>

<div class="hud"> <div class="hud__title">PAGER NIGHTMARE / PROD-SEVERITY2</div> <div class="hud__clock">Current Time: <<= setup.formatStoryTime($currentTimeMinutes) >></div> <<nobr>> <div class="hud__row"> <span>System Stability</span> <span><<print $stability>>%</span> </div> <div class="hud__meter"> <div class="hud__meter-fill"></div> </div> <div class="hud__row"> <span>Stress Level</span> <span><<print $stress>> / 100</span> </div> <div class="hud__meter"> <div class="hud__meter-fill hud__meter-fill--stress"></div> </div> <</nobr>> <div class="hud__meta"> <<= $stability <= 35 ? "State: system and narrative both destabilizing" : ($stability <= 65 ? "State: fragmented, unstable, still survivable" : "State: holding for now") >> <br> <<= $stress >= 55 ? "Interpretive pressure: high" : "Interpretive pressure: manageable" >> </div> </div>

This hypertext fiction is designed as a digital interface you must navigate under pressure. Rather than telling a crisis in a straight line, it asks you to move through fragments: alerts, notes, dashboards, advice, and partial explanations. The point is not simply to win, but to experience how interactive digital systems produce urgency, distraction, and branching consequence in ways print fiction cannot. -- You are Amr Moustafa, an on-call support engineer at Northstar Pay, a large-scale digital payments company. Your job is to monitor live systems, respond to incidents, and help keep customer transactions moving. Tonight the failing service is <span class="inline-label">Checkout Flow</span>, the part of the platform that confirms purchases, processes refunds, and updates order status for hundreds of thousands of users in real time. The person you usually rely on in moments like this is Mina Brown, the senior on-call engineer. She has worked on Checkout Flow longer than anyone else still awake at this hour, and when things go wrong, people tend to wait for her read of the situation before doing anything drastic. At 4:00 AM, the live Checkout Flow service begins to fail. You are the person tasked with making meaning out of incomplete, fragmented information. You open the company laptop. The screen glows a blinding neon white against the dark room. All you have in front of you is the interface. The system is breaking, but the interface is still working. It is your only way to understand what is happening and decide how to respond. All you see at first are a few alerts and some scattered messages in the team chat. You look at your screen. 
<div class="terminal-line">Current Time: 04:00 AM EST</div> Your team chat is already filling up: <<nobr>> <div class="chat-window"> <div class="chat-window__header"># incident-response</div> <div class="chat-message"> <div class="chat-message__avatar">EK</div> <div class="chat-message__body"> <div class="chat-message__meta">Ethan Kim (DevOps Engineer)<span>4:00 AM</span></div> <div class="chat-message__text">yo guys the checkout system is peaking CPU usage and losing performance</div> </div> </div> <div class="chat-message"> <div class="chat-message__avatar">LP</div> <div class="chat-message__body"> <div class="chat-message__meta">Lena Park (Backend Engineer)<span>4:01 AM</span></div> <div class="chat-message__text">yeah i'm seeing a lot of refunds getting stuck</div> </div> </div> <div class="chat-message"> <div class="chat-message__avatar">JM</div> <div class="chat-message__body"> <div class="chat-message__meta">Jordan Malik (Support Team)<span>4:01 AM</span></div> <div class="chat-message__text">guys the support tickets are going crazy</div> </div> </div> </div> <</nobr>> <<if $stability <= 45>> <<glitch "The interface itself is becoming unreliable. So is your reading of it.">> <<else>> The system is shaky, but still survivable. For a moment, you can still choose how to read the crisis. <</if>> You have 30 seconds to decide where to look first. <<countdown 30 "Timer Expired" "Escalation window closing">> [[Read the incident notes before acting|Inspect Logs][$stress += 4]] [[Jump straight to the fragments and piece the event together yourself|Fragmented Evidence][$stress += 5]] [[Add more server power and hope that buys time|Scale Out][$stability += 6; $stress += 7; $scaledInfra = true]] [[Call Mina, the senior on-call engineer who knows this service best, before changing anything|Call Mina][$stress = Math.max(0, $stress - 2); $askedMina = true]]

<<silently>> <<set $timerExpired = true>> <<set $stability -= 9>> <<set $stress += 8>> <<run setup.advanceTime(12)>> <</silently>> You wait too long. In print, the page would wait until you were ready. Here, it does not. The system keeps moving whether you feel ready or not. More customers are hitting the broken path now, and the support channel is filling up fast with new tickets. [[Read the incident notes and catch up fast|Inspect Logs]] [[Enter the fragment view and reconstruct the failure from traces|Fragmented Evidence]] [[Call Mina and admit you waited too long|Call Mina][$askedMina = true]]

<<run setup.advanceTime(8)>> The Checkout Flow interface breaks the events it receives into separate windows. None of them gives you the full story. True meaning only appears when you connect them correctly. This is the project's core argument: hypertext can make interpretation <b>part of the experience</b> rather than something that happens <i>after</i> reading. <div class="network-map" id="fragmented-evidence-map"> <<link '<span class="network-map__node" data-node="pager">Pager Alert</span>'>><<set $evidencePager = true>><<set $evidenceHint = "Pager Alert: The first warning only tells you what hurts from the outside: slowdown, refunds, active support complaints. It creates urgency without explanation.">><<run setup.selectMapNode("#fragmented-evidence-map", "pager")>><<replace "#evidence-readout">><<print $evidenceHint>><</replace>><</link>> <<link '<span class="network-map__node" data-node="chat">Team Chat</span>'>><<set $evidenceChat = true>><<set $evidenceHint = "Team Chat: One person blames traffic, another blames the last update, and another says not to touch anything yet. Social information is noisy, emotional, and useful only in pieces.">><<run setup.selectMapNode("#fragmented-evidence-map", "chat")>><<replace "#evidence-readout">><<print $evidenceHint>><</replace>><</link>> <<link '<span class="network-map__node" data-node="graph">Dashboard Graph</span>'>><<set $evidenceGraph = true>><<set $evidenceHint = "Dashboard Graph: The graph shows bursts, dips, and repeats, but graphs do not explain themselves. They need interpretation.">><<run setup.selectMapNode("#fragmented-evidence-map", "graph")>><<replace "#evidence-readout">><<print $evidenceHint>><</replace>><</link>> <<link '<span class="network-map__node" data-node="commit">Recent Change</span>'>><<set $evidenceCommit = true>><<set $evidenceHint = "Recent Change: A speed-focused update changed how work is distributed. 
The fragment is small, but its consequences are large.">><<run setup.selectMapNode("#fragmented-evidence-map", "commit")>><<replace "#evidence-readout">><<print $evidenceHint>><</replace>><</link>> </div> <div id="evidence-readout" class="network-readout"><<print $evidenceHint>></div> <<if ($evidencePager and $evidenceChat) or ($evidenceGraph and $evidenceCommit) or ($evidenceChat and $evidenceGraph)>> You begin to understand the outage the way digital work often has to be understood: by hopping between incomplete sources and assembling a pattern that no single screen contains. [[Use what you have gathered and move to the incident notes|Inspect Logs][$readLogs = true; $stability += 3]] <</if>> [[Return to the system map with this new context and experience it differently this time|System Map][$stress += 2; setup.resetMapHintState()]] [[Open the archive and compare this outage to earlier traces|Archive Fragments][$stress += 2]] [[Review the latest change directly|Deploy Diff][$stress += 2]]

<<set $checkedHistory = true>><<run setup.advanceTime(10)>> The archive gathers older traces: runbook edits, notes from a previous outage, and feedback written after another bad night. This passage slows the present tense for a moment. It reminds the reader that digital systems do not only fail in the instant. They also accumulate memory in documents, patches, and institutional habits. <div class="network-map" id="archive-fragments-map"> <<link '<span class="network-map__node" data-node="runbook">Runbook</span>'>><<set $archiveRunbook = true>><<set $archiveStatus = "Runbook: The official instructions say to scale up first, then investigate. The document values speed, but it assumes the wrong kind of failure.">><<run setup.selectMapNode("#archive-fragments-map", "runbook")>><<replace "#archive-readout">><<print $archiveStatus>><</replace>><</link>> <<link '<span class="network-map__node" data-node="incident">Old Incident</span>'>><<set $archiveIncident = true>><<set $archiveStatus = "Old Incident: A prior outage mentioned duplicate processing during a traffic burst. The warning existed, but it was buried in past documentation.">><<run setup.selectMapNode("#archive-fragments-map", "incident")>><<replace "#archive-readout">><<print $archiveStatus>><</replace>><</link>> <<link '<span class="network-map__node" data-node="feedback">Support Feedback</span>'>><<set $archiveFeedback = true>><<set $archiveStatus = "Support Feedback: Customers remember confusion more than technical causes. For them, instability feels like broken trust, not broken code.">><<run setup.selectMapNode("#archive-fragments-map", "feedback")>><<replace "#archive-readout">><<print $archiveStatus>><</replace>><</link>> </div> <div id="archive-readout" class="network-readout"><<print $archiveStatus>></div> <<if ($archiveRunbook and $archiveIncident) or ($archiveIncident and $archiveFeedback)>> The archive changes how the present reads. 
What looked like a sudden emergency now looks like a repeated pattern that the system and the institution both failed to absorb. [[Return to the current crisis with that history in mind|Inspect Logs][$stability += 2]] <</if>> [[Go back to the live evidence fragments|Fragmented Evidence][$stress += 1]] [[Move to the system map|System Map][$stress += 1]]

<<set $readLogs = true>><<run setup.advanceTime(7)>> You open the incident notes and recent system messages. The wording is technical, but the pattern is simple: the same customer request is sometimes being handled twice at the same time. That means the service is tripping over itself, not merely running out of raw power. The experience is less like reading a chapter and more like sorting a pile of conflicting signals. [[Open a simple system map to trace where the problem begins|System Map][$stress += 3]] [[Return to the fragmented evidence view and compare sources|Fragmented Evidence][$stress += 2]] [[Look at archived traces before deciding what kind of failure this is|Archive Fragments][$stress += 2]] [[Check the latest code change and look for a risky shortcut|Deploy Diff][$stress += 3]] [[Ignore the notes and restart the service anyway|Blind Restart][$stability -= 10; $stress += 8]]

<<run setup.advanceTime(9)>> You add more server power. For a brief moment the charts improve. Then the queue gets even longer. The problem was not "too few machines". More machines only multiply the underlying confusion. The interface rewards motion, but motion is not the same as understanding. <<set $deployLabel = "Traffic spike spreading">> <<set $stability -= 12>> [[Stop guessing and inspect the fragmented evidence|Fragmented Evidence][$stress += 4]] [[Stop guessing and inspect the system map|System Map][$stress += 5]] [[Run a quick script to force stuck orders forward|Buggy Script][$ranBuggyScript = true; $stress += 10]] [[Call Mina now and explain what you changed|Call Mina][$askedMina = true; $stress += 4]]

<<run setup.advanceTime(6)>> Mina, the senior on-call engineer who has seen this service fail before, answers immediately. She listens for ten seconds, then says, "If adding more servers made it worse, the service is probably stepping on the same data from two directions. Slow down and look before you touch anything else." It is practical advice, but it also names the real problem of this medium: too many windows, too many paths, too much pressure to act before understanding. <<set $stability += 4>> [[Follow her advice and inspect the fragmented evidence|Fragmented Evidence][$stress += 1]] [[Follow her advice and inspect the system map|System Map][$stress += 2]] [[Check whether older traces show the same pattern|Archive Fragments][$stress += 1]] [[Ask her to stay while you review the last code change|Deploy Diff][$stress = Math.max(0, $stress - 1)]] [[Tell her you can handle it and try a quick automation script|Buggy Script][$ranBuggyScript = true; $stress += 9]]

<<run setup.advanceTime(9)>> <<if $evidencePager or $evidenceChat or $evidenceGraph or $evidenceCommit>> <div class="primary-action-block"> You have enough evidence now. The retry queue can resend work while the GoLang worker is still processing the first copy. Two parts of the system are acting on the same order at nearly the same time. In plain language: the bug is a timing problem. The system is racing itself. <div class="primary-action-link">[[Confirm the race condition and prepare a safe fix|Race Condition][$foundRaceCondition = true; $stability += 4; $stress += 3]]</div> <div class="terminal-subhead terminal-subhead--compact">Other ways to continue:</div> <div class="secondary-actions"> [[Back out of the terminal and compare other fragments|Fragmented Evidence][$stress += 1]] [[Open the archive before deciding on a fix|Archive Fragments][$stress += 1]] [[Back out of the terminal and review the last code change instead|Deploy Diff][$stress += 2]] </div> </div> <div class="network-map-guidance">You can still inspect the system map below if you want to remind yourself what each part of the service is doing.</div> <</if>> The terminal switches to a simplified system map. Each box is one part of the Checkout Flow service. Click around and look for the point where one customer action might be entering the system twice. The map turns infrastructure into navigable text. Instead of describing complexity from a distance, the hypertext lets you probe it directly. <div class="progress-requirement">Explore all nodes to unlock the next set of choices.</div> <div class="network-map <<if $mapQueue and $mapWorker>>network-map--reference<</if>>" id="network-map-live"> <<link '<span class="network-map__node" data-node="gateway">Web Gateway</span>'>><<set $mapGateway = true>><<set $mapHint = "The gateway receives requests from the website. It looks busy, but not wrong. 
Requests pass through normally.">><<run setup.selectMapNode("#network-map-live", "gateway")>><<replace "#map-readout">><<print $mapHint>><</replace>><<replace "#map-next-actions">><<if ($mapGateway) and ($mapQueue) and ($mapWorker) and ($mapDatabase)>>[[Back out of the terminal and compare other fragments|Fragmented Evidence][$stress += 1]] [[Open the archive before deciding on a fix|Archive Fragments][$stress += 1]] [[Back out of the terminal and review the last code change instead|Deploy Diff][$stress += 2]]<</if>><</replace>><</link>> <<link '<span class="network-map__node" data-node="queue">Retry Queue</span>'>><<set $mapQueue = true>><<set $mapHint = "The retry queue sends failed jobs back for another attempt. Under pressure, it can release the same job more than once.">><<run setup.selectMapNode("#network-map-live", "queue")>><<replace "#map-readout">><<print $mapHint>><</replace>><<replace "#map-next-actions">><<if ($mapGateway) and ($mapQueue) and ($mapWorker) and ($mapDatabase)>>[[Back out of the terminal and compare other fragments|Fragmented Evidence][$stress += 1]] [[Open the archive before deciding on a fix|Archive Fragments][$stress += 1]] [[Back out of the terminal and review the last code change instead|Deploy Diff][$stress += 2]]<</if>><</replace>><</link>> <<link '<span class="network-map__node" data-node="worker">GoLang Worker</span>'>><<set $mapWorker = true>><<set $mapHint = "The worker handles payments and refunds. 
Two copies of it can touch the same order before the first one fully finishes.">><<run setup.selectMapNode("#network-map-live", "worker")>><<replace "#map-readout">><<print $mapHint>><</replace>><<replace "#map-next-actions">><<if ($mapGateway) and ($mapQueue) and ($mapWorker) and ($mapDatabase)>>[[Back out of the terminal and compare other fragments|Fragmented Evidence][$stress += 1]] [[Open the archive before deciding on a fix|Archive Fragments][$stress += 1]] [[Back out of the terminal and review the last code change instead|Deploy Diff][$stress += 2]]<</if>><</replace>><</link>> <<link '<span class="network-map__node" data-node="database">Order Database</span>'>><<set $mapDatabase = true>><<set $mapHint = "The database records look messy, but they are being damaged by duplicate updates rather than causing the failure on their own.">><<run setup.selectMapNode("#network-map-live", "database")>><<replace "#map-readout">><<print $mapHint>><</replace>><<replace "#map-next-actions">><<if ($mapGateway) and ($mapQueue) and ($mapWorker) and ($mapDatabase)>>[[Back out of the terminal and compare other fragments|Fragmented Evidence][$stress += 1]] [[Open the archive before deciding on a fix|Archive Fragments][$stress += 1]] [[Back out of the terminal and review the last code change instead|Deploy Diff][$stress += 2]]<</if>><</replace>><</link>> </div> <div id="map-readout" class="network-readout"><<print $mapHint>></div> <div id="map-next-actions"> <<if ($mapGateway) and ($mapQueue) and ($mapWorker) and ($mapDatabase)>> [[Back out of the terminal and compare other fragments|Fragmented Evidence][$stress += 1]] [[Open the archive before deciding on a fix|Archive Fragments][$stress += 1]] [[Back out of the terminal and review the last code change instead|Deploy Diff][$stress += 2]] <</if>> </div>

<<run setup.advanceTime(11)>> You review the most recent change. The goal was simple: make the system feel faster during busy periods. But the change replaced a careful one-at-a-time process with a faster process that can handle several jobs at once. That speed boost is exactly what made this problem visible. This is another hypertext lesson: a small node can redirect the meaning of every node around it. <<if $askedMina>> Mina stays on the call and says, "That faster worker needs a safety lock, or you need to turn the new behavior off." <<set $stress = Math.max(0, $stress - 2)>> <<else>> You can see what happened even without a deep technical background: the team sped up the wrong part of the system. <<set $stress += 2>> <</if>> [[Add a safety lock around the shared order updates|Patch Mutex]] [[Turn off the new fast worker and go back to one-at-a-time processing|Rollback Worker Pool]] [[Return to the evidence fragments before deciding|Fragmented Evidence][$stress += 2]] [[Pause and look at customer-facing impact before acting|Customer Impact][$stress += 2]] [[Try a quick cleanup script anyway|Buggy Script][$ranBuggyScript = true; $stress += 8]]

<<run setup.advanceTime(8)>> You restart the service without fully understanding the problem. The errors disappear for less than a minute, then return with new duplicates mixed in. You did not solve anything. You only hid the evidence and gave the outage time to spread. This branch exists for a reason: interactive systems make impulsive reading easy. [[Stop guessing and inspect the fragmented evidence|Fragmented Evidence][$stress += 4]] [[Stop guessing and inspect the system map|System Map][$stress += 5]] [[Check customer-facing impact before making another move|Customer Impact][$stress += 2]] [[Escalate to Mina before this gets worse|Call Mina][$askedMina = true; $stress += 3]]

<<run setup.advanceTime(14)>> You write a quick script to push "stuck" jobs back into line. It works once. Then it accidentally sends some half-finished jobs back through the system again. Now the service is not only confused, it is confused faster. The Checkout Flow interface makes action feel available. It does not guarantee that action is wise. <<set $stability -= 18>> <<set $stress += 12>> <<set $deployLabel = "Script rollback required">> <<if $stability <= 35>> <<glitch "The terminal flickers. Even the cursor looks stressed.">> <</if>> [[Disable the script and apply the safer fix|Patch Mutex]] [[Turn off the fast worker and accept a slower service|Rollback Worker Pool]]

<<run setup.advanceTime(12)>> You trace the issue to the GoLang backend. Two parts of the service can work on the same customer order at nearly the same time: one from the normal queue, one from the retry path. They both try to update the same information before the first update fully finishes. That is the bug. In technical terms, it is a race condition. In everyday terms, the system is trying to do one task twice at once and hurting itself in the process. What matters for this project is not only the bug itself, but how you arrived here: by following links, comparing fragments, and experiencing interpretation as interaction. <<if $scaledInfra>> Your earlier scaling change is still making the problem louder because it created more chances for the duplicate work to collide. <<set $stability -= 3>> <</if>> <<if $readLogs>> Because you looked first instead of acting blindly, the path to a safe fix is clearer. <<set $stability += 5>> <</if>> <<if $askedMina>> <<nobr>> <div class="chat-window"> <div class="chat-window__header">Direct Message</div> <<nobr>> <div class="chat-message"> <div class="chat-message__avatar">MB</div> <div class="chat-message__body"> <div class="chat-message__meta">Mina Brown<span>4:16 AM</span></div> <div class="chat-message__text">Either lock shared updates or force the jobs back to one at a time.</div> </div> </div> <</nobr>> </div> <</nobr>> <</if>> [[Enter the incident room and decide how to respond under pressure|War Room][$warRoomJoined = true; $stress += 3]] [[Ignore the warning and try one more script|Buggy Script][$ranBuggyScript = true; $stress += 6]]

<<set $sawCustomerImpact = true>><<run setup.advanceTime(10)>> You switch from system traces to the customer-facing view. Refund tickets are stacking up. A late-night traveler cannot rebook. A student posts that money has vanished from their account twice. A support worker writes, "Please tell us what to say." This passage matters because interfaces do more than organize data. They also mediate empathy. Once the failure has faces and consequences, the outage stops being an abstract puzzle. [[Draft a short update for support before changing the system|Comms Draft][$stress += 2]] [[Return to diagnosis with the customer impact in mind|War Room][$warRoomJoined = true; $stress += 1]]

<<set $warRoomJoined = true>><<run setup.advanceTime(15)>> The incident room fills with competing demands. Support wants a sentence they can send to customers. A manager wants a timeline. Mina wants you to stop improvising. The dashboard wants a fix. Every window asks for a different kind of reading. You have 25 seconds to choose what to stabilize first. <<countdown 25 "War Room Timeout" "Pressure escalating">> [[Write a brief support message before you deploy anything|Comms Draft][$stress += 3]] [[Apply the safety-lock hotfix|Patch Mutex]] [[Roll back the fast worker and accept a slower service|Rollback Worker Pool]]

<<silently>><<set $countdownTwoExpired = true>><<set $stability -= 6>><<set $stress += 9>><<run setup.advanceTime(18)>><</silently>> The room keeps moving while you hesitate. Someone else posts an unclear support update. Another person suggests scaling up the system. The longer the response fragments, the harder it becomes to tell a coherent story about what is happening. [[Take control and write the support update yourself|Comms Draft][$stress += 2]] [[Skip communication and deploy the safety-lock hotfix|Patch Mutex]] [[Skip communication and roll back the fast worker|Rollback Worker Pool]]

<<set $draftedComms = true>><<run setup.advanceTime(7)>> You draft a short message for support: <<nobr>> <div class="chat-window chat-window--draft"> <div class="chat-window__header"># support-update-draft</div> <div class="chat-message"> <div class="chat-message__avatar">AM</div> <div class="chat-message__body"> <div class="chat-message__meta">Amr Moustafa<span>Draft</span></div> <div class="chat-message__text">We are investigating duplicate processing affecting some orders. We have identified the issue and are applying a fix. Customers may see delays while service stabilizes.</div> </div> </div> </div> <</nobr>> The message does not solve the outage, but it changes the reading context around it. In digital systems, communication is part of infrastructure. The event is not only what fails, but how that failure is framed for others. <<set $stress = Math.max(0, $stress - 3)>> <<if $sawCustomerImpact>> Because you checked the customer-facing view first, the message sounds clear and human rather than vague and defensive. <<set $stability += 2>> <</if>> [[Now deploy the safety-lock hotfix|Patch Mutex]] [[Now roll back the fast worker|Rollback Worker Pool]]

<<run setup.advanceTime(22)>> You patch the service so only one copy of the worker can update a customer order at a time. It is not the perfect long-term design, but it is the right emergency fix. You also slow the job processor slightly so the system can recover without tripping over itself again. <<set $patchedRace = true>> <<set $stability += $readLogs ? 24 : 16>> <<set $stress += 6>> <<set $deployLabel = "Deploying safety lock">> <<loadingbar 3000 $deployLabel>> [[Watch the dashboard after the deploy|Aftermath Feed]] <</loadingbar>>

<<run setup.advanceTime(18)>> You turn off the new "faster" worker and return to one-at-a-time processing. The service becomes slower, but the duplicate actions stop. It is not elegant, but it prevents the night from getting worse. <<set $stability += 14>> <<set $stress += 4>> <<set $deployLabel = "Rolling back fast worker">> <<loadingbar 3000 $deployLabel>> [[Watch the dashboards and support feeds while the queue drains|Aftermath Feed]] <</loadingbar>>

<<run setup.advanceTime(16)>> The crisis does not end the moment a change is deployed. Instead, the interface fills with after-images of the outage. One graph settles. Another lags. Support messages slow down, and their tone changes. Internal chat moves from panic to explanation. The system is recovering, but your reading work is not over yet. <<if $draftedComms>> Because you wrote a clear support update, the customer channel stabilizes faster. Fewer people are asking what is happening. <<set $stress = Math.max(0, $stress - 2)>> <<else>> Without a clear update, confusion lingers even as the technical issue improves. <<set $stress += 2>> <</if>> <<if $checkedHistory>> The archive echoes in the background. This was not just a single mistake, but part of a pattern. <</if>> [[Read the settling dashboards and move to the final resolution|Resolution]] [[Go directly to the post-mortem while the traces are still fresh|Post-Mortem]]

<<run setup.advanceTime(14)>> The charts settle down. First the error rate falls. Then the queue gets shorter. Then support messages stop arriving every few seconds. The room is still tense, but the system is no longer actively breaking. <<if $patchedRace>> The safety lock holds. The immediate crisis is over. <<set $stability += 8>> <<else>> The service is stable for now, mostly because you made it slower and safer. <</if>> <<if $draftedComms>> The written update gave the organization a way to narrate the event while it was still unfolding. <</if>> [[Walk into the post-mortem meeting|Post-Mortem]]

<<run setup.advanceTime(28)>> The meeting happens after sunrise. There are graphs on the wall, tired faces on the call, and one simple question underneath all of it: how did a normal busy night become a crisis? <<if $checkedHistory>> Someone pulls up the archived incident you found earlier. The room goes quiet. The warning signs existed before tonight. <</if>> <<if $draftedComms>> Support joins the discussion and notes that the clearest moment in the night was the moment someone finally wrote a usable explanation. <</if>> <<if $stability >= 75 and $stress < 45>> You explain the problem clearly enough for everyone in the room to follow. The faster system let the same task run twice at once, so the emergency fix was to slow it down and control access to shared data. <div class="ending-label">Ending: Contained Incident</div> You protected the service without losing yourself in the process. <<elseif $stability >= 55>> You stabilize the outage, but only after a few costly mistakes. <div class="ending-label">Ending: Expensive Lesson</div> The company learns something. So do you. <<else>> By the time the meeting starts, people are talking as much about the response as the bug itself. <div class="ending-label">Ending: Cascading Failure</div> The system broke, and the pressure around it broke your judgment too. <</if>> [[Read the final medium overview note|End Note]]

You have reached the end of this hypertext fiction. The story you just experienced is not only a narrative about a service outage, but also an enactment of how digital reading works in moments of crisis: its complexity, its consequences, its emotional texture, and especially how it can go horribly wrong. This project uses hypertext fiction to model the lived experience of navigating fragmented technical information. Its subject is as much the medium of digital reading as the outage itself: how dashboards, alerts, links, timers, sidebars, and clickable diagnostics ask users to assemble meaning from multiple partial views. In print fiction, pressure can be described. Here, pressure is enacted. The timed opening decision, unstable visual effects, and branching passages turn interpretation into something procedural and embodied. The goal is therefore meta-analytical as well as narrative. The story uses the affordances of hypertext to show how interactive digital systems make readers feel urgency, fragmentation, and consequence differently from linear text. <div class="final-state"> <div class="final-state__label">Final State</div> <div class="final-state__metric">Stability: <<= $stability >> / 100</div> <div class="final-state__metric">Stress: <<= $stress >> / 100</div> </div> <<link "Restart the night">><<run setup.resetStoryState(); Engine.play("Start")>><</link>>