Vibe Decoding: Building the Critical Code Studies Workbench
David M. Berry
The 9th Critical Code Studies Working Group ran online from January to February 2026, hosted by Mark C. Marino and Jeremy Douglass. Working Groups have become an important part of the CCS field, bringing together scholars, programmers, artists, and critics to examine the cultural significance of computer source code across several weeks of sustained but asynchronous discussion. In previous iterations I had contributed as a participant, for example on ELIZA as a code object selected for collective discussion and analysis. This time I arrived with a different idea: to vibe code a tool for the practice of undertaking critical code readings.
Over eighteen days, from 19 January to 6 February, working in dialogue with Claude Sonnet 4.5, Anthropic's large language model running in their Claude Code development environment, I developed a web application for annotating and analysing source code. The impetus came from working with a group of colleagues during 2024/25 to annotate the ELIZA source code, and from the difficulty of doing so with off-the-shelf tools like Google Docs. The challenge of creating an ideal tool for this kind of distributed analysis drove many of the design decisions that underpin the development.
Using Claude Code, the Critical Code Studies Workbench grew from a rough prototype to 15,116 lines of TypeScript and React across 363 files. It acquired real-time collaboration, offline resilience with operation queuing, a progressive web app architecture, OAuth authentication through three providers, an admin panel, a project library with an accessioning workflow, AI-assisted annotation with six comment types and threaded discussions, fourteen sample projects spanning the history of computing from 1958 to 2017, custom interface skins, Easter eggs, and session export in three formats. It was deployed to the web and used by CCS Working Group participants as it was being built.
I want to document this process here as a record of what AI-augmented development looks like when undertaken by a humanities scholar rather than a professional software engineer, and what the experience reveals about the changing conditions under which scholarly tools can be made (Berry and Fagerjord 2017). These new tools, like Claude Code, are extremely promising for building digital tools for working with materials, and they can build them exactly to the needs of a research project. However, vibe coding is not just a matter of throwing a prompt into a chatbot interface. It requires care and thought about the environment in which you are developing, including careful curation of the context (what is increasingly being called context engineering), and about how you plan or, as Jeff Shrager reminded me by calling back to a much older software engineering term, "spec" your proposal (that is, create a technical specification).[1]
The Workbench
Critical Code Studies proposes treating source code as a cultural text amenable to humanistic interpretation (Marino 2020). The practice requires close engagement with code, careful annotation, contextual research, and collaborative discussion. The tools available for this work have been limited. Scholars typically annotate code in word processors or shared documents, losing the relationship between annotation and the specific lines under discussion. The CCS Workbench was conceived as a purpose-built environment for this practice.
The application provides three modes of engagement. Analyse Code offers an IDE-style layout with a CodeMirror 6 editor supporting syntax highlighting for dozens of programming languages, together with a custom annotation layer supporting six comment types (observation, question, metaphor, pattern, context, and critique), each colour-coded and displayed inline beneath the relevant line of code. Learn Methods provides pedagogical scaffolding introducing five critical reading approaches drawn from the CCS literature: close reading, materialist analysis, hermeneutic interpretation, practice-based analysis, and software studies (Montfort et al. 2012, Marino 2020, Berry 2011, Berry 2014). Create Code supports experimental algorithm building through AI-assisted vibe coding in Karpathy's sense (Karpathy 2025).[2]
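To give a sense of how the annotation layer is organised, here is a minimal TypeScript sketch of how the six comment types and their colour coding might be modelled; the type names, fields, and colour values are illustrative assumptions rather than the Workbench's actual schema.

```typescript
// Illustrative sketch only: the names and colours below are assumptions,
// not the Workbench's actual schema.
type AnnotationKind =
  | "observation"
  | "question"
  | "metaphor"
  | "pattern"
  | "context"
  | "critique";

interface Annotation {
  id: string;
  fileId: string;        // which source file in the project
  line: number;          // line the comment is anchored beneath
  kind: AnnotationKind;  // governs the colour coding in the editor
  body: string;          // the scholar's interpretive commentary
  authorId: string;
  createdAt: string;     // ISO timestamp
  parentId?: string;     // set when the annotation is a threaded reply
}

// Each kind maps to a colour used for the inline display beneath the line.
const ANNOTATION_COLOURS: Record<AnnotationKind, string> = {
  observation: "#4caf50",
  question: "#2196f3",
  metaphor: "#9c27b0",
  pattern: "#ff9800",
  context: "#795548",
  critique: "#f44336",
};
```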
The collaboration infrastructure goes further than this description suggests. Projects are stored in a PostgreSQL backend with real-time synchronisation, staleness detection to prevent conflicts, and an operation queue that persists to IndexedDB so that edits made during connection interruptions are preserved and replayed when connectivity returns. The system includes OAuth authentication, a project library with an accessioning workflow modelled on archival practice where projects are submitted, reviewed, and approved before entering the shared collection, and an admin panel with user management and orphaned project recovery.
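To indicate how an operation queue of this kind works in outline, the following sketch assumes the idb-keyval library for IndexedDB persistence; the function names and record shapes are my own illustrations rather than the Workbench's implementation.

```typescript
// Minimal sketch of an offline operation queue persisted to IndexedDB,
// assuming the idb-keyval library. Illustrative only.
import { get, set } from "idb-keyval";

interface QueuedOperation {
  id: string;
  type: "annotation" | "file" | "reply";
  action: "create" | "update" | "delete";
  payload: unknown;
  queuedAt: number;
}

const QUEUE_KEY = "ccs-operation-queue";

// Persist an operation locally when the network is unavailable.
export async function enqueue(op: QueuedOperation): Promise<void> {
  const queue = (await get<QueuedOperation[]>(QUEUE_KEY)) ?? [];
  queue.push(op);
  await set(QUEUE_KEY, queue);
}

// Replay queued operations in order once connectivity returns.
export async function flush(
  send: (op: QueuedOperation) => Promise<void>
): Promise<void> {
  const queue = (await get<QueuedOperation[]>(QUEUE_KEY)) ?? [];
  const remaining: QueuedOperation[] = [];
  for (const op of queue) {
    try {
      await send(op);      // push to the PostgreSQL-backed API
    } catch {
      remaining.push(op);  // keep anything that still fails
    }
  }
  await set(QUEUE_KEY, remaining);
}
```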
The application includes fourteen sample projects spanning the history of computing, from Grace Hopper's FLOW-MATIC (1958) and Weizenbaum's ELIZA (1965) through the Apollo 11 guidance computer source code (1969), Ward Christensen's XMODEM protocol (1977), Will Crowther's Colossal Cave Adventure (1977), Richard Stallman's GNU Emacs (1985), and William Gibson's Agrippa (1992), to Nanako Shiraishi's git-stash (2007), the Transformer architecture (2017), nine esoteric programming languages spanning 1972 to 2003, and John Gruber's Markdown (2004). Each includes a comprehensive README with historical context, critical code studies analysis, and suggested annotations.[3]
Eighteen Days Later
The Working Group discussion thread documents the development in real time (Berry 2026a). The timeline is worth examining because the pace itself reveals something about the nature of AI-augmented development, and because the scholarly community's engagement with the tool as it was being built shaped what the tool became.
On 19 January I announced the initial concept, a tool for critical code studies practice that would work with local LLMs through Ollama and provide multiple entry modes for engaging with source code. By the following day, discussion on the Working Group had already prompted some significant developments. Jeff Shrager shared a Claude conversation analysing ELIZA code, Mark Marino questioned the LLM's accuracy and advocated for theory-grounded critique rather than generic hermeneutic analysis, and I had developed the skill system, a structured prompt encoding CCS methodology from key texts. The skill compressed over fifteen books into a format the LLM could use to shape its analytical engagement. This amounted to context engineering in the sense I have developed elsewhere (Berry 2025d), deliberately structuring the computational intermediation so that the AI's analytical engagement with code would be governed by critical methodology rather than the generic patterns of its training data.
The most consequential intervention came from Erika Fulop, who raised critical concerns about whether algorithmic assistance contradicts the critical nature of human reflection and whether the tool itself required critical examination. It was a good point. Within hours, I implemented an AI toggle allowing users to work without any LLM involvement – a means to control the pharmakon of AI (Stiegler 2013). The Learn Methods mode, built two weeks later, would extend this principle further, providing five structured reading approaches that function entirely without AI assistance. Fulop's critique helped redirect the software to multiple openings into CCS, ensuring that computational assistance remained optional rather than constitutive.[4]
The pace of feature development is difficult to convey without stating specific dates. By 23 January, four days in, the application had code highlighting with CodeMirror 6, built-in sample projects, line highlighting with intensity settings, and a focus mode. By 25 January, six days in, real-time collaborative annotation was operational, with OAuth login through Google and GitHub. The Clippy Easter egg (summoned by typing "clippy" in the UI), an ironic assistant that quotes Walter Benjamin and Jacques Derrida alongside conventional annotation help, appeared the same day. Jeremy Douglass's positive comments on the Clippy Easter egg also encouraged me to add another character inspired by Hackerman from Kung Fury (summoned by typing "hacker" in the UI).
On 27 January, custom interface skins appeared (HyperCard, Myspace, Commodore 64), each less nostalgia than provocation, inviting users to ask how interface design shapes the experience of reading and annotating code. On 29 January, Marino tested the tool and requested reply functionality for annotations. It was implemented and deployed the same day thanks to Claude Code and a lot of very careful prompting. By 30 January, twelve days in, the application had thirteen sample projects, annotation replies with real-time synchronisation, permission management for collaborative projects, and the codebase had been through its first major refactoring, decomposing a 2,381-line monolithic context into seven focused "hooks".
February brought architectural work rather than features. The progressive web app infrastructure went live on 1 February. On 3 February the mode system was consolidated from four modes to three, and the pedagogical guidance system was built, seven new components implementing five critical reading methods with suggested readings and annotation guides. On 5 February the cloud connection resilience system was completed, implementing operation queuing with merge strategies for annotations, files, and replies, the same pattern Google Docs uses to ensure that no edits are lost during network interruptions. Mobile responsiveness was addressed. Auto-save became operational.
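The merge step can be illustrated in outline. The sketch below shows a simple last-write-wins merge keyed by record id, assuming each record carries a lastModified timestamp; the Workbench's actual strategies for annotations, files, and replies may well differ in detail.

```typescript
// Illustrative merge strategy for reconciling offline edits with the
// server's copy; not the Workbench's actual implementation.
interface Mergeable {
  id: string;
  lastModified: number;
}

// Keep every record from both sides; where the same id appears twice,
// prefer the more recently modified version.
export function mergeById<T extends Mergeable>(server: T[], local: T[]): T[] {
  const merged = new Map<string, T>();
  for (const record of server) {
    merged.set(record.id, record);
  }
  for (const record of local) {
    const existing = merged.get(record.id);
    if (!existing || record.lastModified > existing.lastModified) {
      merged.set(record.id, record);
    }
  }
  return [...merged.values()];
}
```

A strategy of this kind is deliberately conservative: nothing queued offline is silently discarded, and where the same record was edited in two places the more recent edit wins.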
On 6 February, an attempt to package the application as a native desktop application using Electron was made and abandoned after three phases of implementation proved unworkable. This was a rather painful experience: the complexity of the Electron build was beyond Claude Code's abilities, and we simply went around in circles trying to implement the platform. In the end I had to abandon it and strip out all of the supporting code. I return to this below.
The rhythm was not that of a planned development cycle. Some days produced thousands of lines of new code across multiple features. Other days were consumed by a single bug, the auto-save infinite loop that required tracing through three layers of React state management before the root cause was found, or the seven failed attempts at mobile responsiveness before a kind of workable approach emerged (it still isn't quite right at version 3.2). The pace was shaped by conversation, with the AI system and with colleagues in the Working Group who were testing features, reporting issues, and requesting capabilities in something closer to real time than any formal requirements process could achieve.
Critique Shapes the Tool
It would, however, be misleading to present this as a solo development effort augmented by AI. The Working Group functioned as something between an imagined user community, a testing team, and a scholarly review panel, and its interventions reshaped the tool in ways I had not anticipated.
Fulop's critique about AI dependency has already been noted. But there were other shaping interventions. Shrager warned that the tool remained a prototype, observing that deployment requires infrastructure, databases, and DevOps work beyond what language models can simply generate. This was technically correct, though practically solvable since the application runs on hosted services that handle infrastructure concerns without requiring DevOps expertise. More interesting was his observation about computing limitations persisting despite AI advances. The Electron failure two weeks later would prove this right in ways neither of us could have expected.
Claire Carroll's response to the AI critique reframed the question productively, suggesting that LLM dialogue functions as conversation rather than final analysis, drawing on Johanna Drucker's work on close reading and computational interpretation. This observation influenced the design of the annotation system, which positions AI suggestions as prompts for further human analysis rather than authoritative interpretations.
Marino's testing and feature requests had the most direct architectural impact. His request for annotation replies on 29 January triggered a significant expansion of the collaboration system, adding threaded discussions with user identification, profile colours, and real-time synchronisation. The feature required database schema changes, new API endpoints, UI components, and synchronisation logic, all implemented and deployed within a single day. That was a lot of work in a very short time period.
What emerged from these interactions was a development process that combined AI-augmented implementation speed with scholarly community direction. The Working Group did not set requirements in any formal sense. Its participants used the tool, encountered limitations, articulated what was missing, and watched those gaps fill. This is closer to participatory design than to traditional software development, though the temporal compression, weeks rather than months or years, changes the character of participation considerably.
What AI Augmentation Can't Do
Reflexively documenting this development process also requires examining not only what worked but what failed. The development succeeded because web development in React and Next.js is extraordinarily well-represented in AI training data. When I described a feature in natural language, Claude could propose implementations that were correct, architecturally coherent, and often surprisingly elegant. The CodeMirror 6 annotation system, the IndexedDB operation queue, the real-time synchronisation hooks, these are technically sophisticated components that I could not have written from scratch. The AI's contribution was not mere typing. It was technical implementation skills drawing on patterns from the vast corpus of web development code in its training data. I have elsewhere distinguished this mode of working as productive augmentation, rather than cognitive delegation (Berry 2025b), a distinction that matters because the human retains critical and architectural judgement whilst the computational system handles implementation within defined parameters. This was an exemplary experience of productive augmentation, and when it is working one feels rather like a conductor guiding an orchestra through a particularly difficult piece of music.
And yet the process was far from frictionless. My idea for a per-file line number tracking feature, which would remember scroll position when switching between files, caused React DOM conflicts that crashed the entire application and had to be completely reverted. Mobile responsiveness required seven failed attempts, each producing code that was syntactically correct but interfered with the browser runtime in ways the AI could not predict. In the end I paused it at a "good enough" version for mobile. The auto-save system entered multiple infinite loops because the AI placed a timestamp in a React dependency array, a mistake that made sense to it, but was disastrous, requiring systematic debugging of root causes across multiple files over several debugging sessions.[5]
The Electron desktop application mentioned above was the most instructive failure. This was a clear example of the competence effect in operation. After completing three phases of implementation (the main process, the IPC bridge, the file system adapter, and the native menu bar), the application simply would not package into a distributable format. The gap between running in development mode and shipping as a native application proved too wide. The problem was not any single technical issue but the accumulation of integration challenges (Next.js static export configuration, Electron's security model, code signing requirements, native module compatibility) that required understanding how multiple complex systems interact at their boundaries. This is where AI augmentation breaks down. Language models trained on documentation and code examples perform remarkably within individual frameworks but struggle at the seams between systems, where implicit knowledge and platform-specific behaviour matter more than API surfaces.
In the end the application ran (sort of) locally, but severe and catastrophic failures would occur: pages would drop their CSS, or entire sections of the UI would suddenly become unclickable and unresponsive. Most critically, deploying it as a signed, bundled desktop application proved impossible. Claude Code would propose a fix for one integration point that broke another, then fix that and discover the original problem had returned. The circularity was exhausting and, in retrospect, tells us something important about this failure mode with AIs. This was where the cognitive overhead really came in: Claude Code would repeatedly tell me that the implementation was within reach, then produce yet another failed version, and I was left feeling that if only I had managed the project just that bit better it might work. In the end the idea had to be dropped. As Claude Code noted,
### 2026-02-06 - Electron Desktop App - ABANDONED
❌ **Electron desktop app abandoned** after Phases 1-3 were implemented and tested. The approach proved unworkable. PWA remains the distribution strategy. Electron files (`electron/`, `electron-builder.json`) and related `package.json` changes should be cleaned up.
One of the things that is really strange about working in this mode is that Claude Code can be quite literal in its interpretation. When I finally had to throw in the towel and accept that the Electron app was not possible, I called a halt to the work and instructed Claude Code that we would remove the Electron work. It therefore wiped the project of all traces of the code and its recorded archives. The only trace is the file log above. When I later attempted to document this failure mode, it was somewhat disconcerting to find that the documentary traces had been so thoroughly deleted. I wonder if this is a new type of digital loss in AI-driven scholarship that we will have to contend with. The AI's "literalism" can be seen as a radical form of archive destruction: by wiping the failed code (and nearly all its intermediate artefacts), the AI does not just "clean up", it erases the hesitations of scholarly thought.
The ARCHIVE.md for the project contains no entry for the Electron app. The only traces that survive are the abandonment notice in WORKING.md above, a stub file, and, somewhat eerily, five orphaned permission entries in the Claude Code settings file still authorising Electron build commands that will never run. The documentary record of several days of intensive work was reduced to what amounts to a footnote and some ghost configurations. This is what I mean when I say the experience of working in this mode can be disconcerting. The tool's literalism, its tendency to execute instructions completely and without sentimentality, means that the archive of failed attempts is not preserved in the way a human developer's abandoned git branches or commented-out code blocks would be. The failure happened, consumed real time and cognitive effort, but its material traces were almost entirely erased at my own instruction. Whilst AI accelerates production, it also potentially destroys the genetic history of a project, including the abandoned branches, byways, and pathways that usually show a scholar's hodos, the path of inquiry, through a project.
The pattern holds across all the failures. AI augmentation dramatically accelerates development within well-understood domains and fails at boundaries, whether between frameworks, between code and browser runtime, or between development and distribution. The training data is densest where documentation is most abundant. At the edges, where real-world deployment introduces constraints that are poorly documented or require accumulated hands-on experience, the augmentation thins and the human developer must either solve the problem independently or accept the limit.[6]
Conclusion
The CCS Workbench is finished (for now), functional, and in active use. Fourteen sample projects spanning the history of computing are available for annotation. Scholars can collaborate in real time, annotate code with six types of interpretive commentary, engage or decline AI assistance, and export their work. It was built in eighteen days by a humanities scholar working through sustained dialogue with a language model, shaped by the critical engagement of a scholarly community.
What this tells us about scholarly tool-building is that the conditions that made it possible (extensive web development training data, a mature React component ecosystem, well-documented backend services, a development environment designed for sustained project work) are specific and contingent. A different kind of application, such as a computational linguistics pipeline, a TEI-XML transformation system, or a custom machine learning platform, would likely not be buildable to the same degree. The lesson is not that anyone can now build anything, but that the space of what a scholar with domain expertise can build has expanded, and expanded rapidly, under conditions of AI augmentation (cf. Berry 2025a).
The more interesting question, it seems to me, is what this expansion does to the division between those who theorise technology and those who build it. If a critical theorist can construct the instruments for their own field's practice, that boundary becomes less stable. Indeed, "vibe coding" allows for what we might call Tactical Digital Humanities. The question of who authored the resulting tool becomes a version of what I have elsewhere called provenance anxiety (Berry 2025e), newly pressing when scholar, AI system, and scholarly community all contributed to shaping it. The digital humanities have long argued for the value of building as scholarship (Berry 2024; Berry and Fagerjord 2017; Ramsay 2011). But the speed at which AI-augmented development operates changes the calculation. Eighteen days is fast enough that building becomes part of the intellectual work of a Working Group session rather than a separate multi-year funded project requiring a dedicated development team.
Whether this constitutes progress or a more efficient form of dependency on computational infrastructure is a question that cannot be answered in the abstract. The CCS Workbench is a pharmacological object in Stiegler's sense (Stiegler 2010, 2013), both a tool for critical understanding and a product of the very computational systems it invites users to critique. That tension is not a problem to be solved. It is the condition under which critical engagement with computation now operates (Berry 2025a).[7]
Notes
[1] I have explored the hermeneutic dimensions of this process, the dialogue between human intention and AI generation, and what I term the hermeneutic-computational loop, in a piece focused on vibe coding as critical method. An earlier experiment building a critical code studies tool with Google's Gemini is documented in Berry (2025c), where the limits of that attempt helped clarify the approach taken with the Workbench. This article is concerned with the documentation of the development and what the timeline, scale, and failures reveal about AI-augmented development as a mode of scholarly production.
[2] Karpathy coined the term vibe coding in February 2025 to describe a style of programming in which one fully gives in to the vibes and forgets that the code even exists. The CCS Workbench's Create mode recuperates the term for scholarly practice, inviting users to experiment with code generation whilst maintaining critical awareness of what the AI produces.
[3] The sample project curation constituted a form of scholarly work in itself. Each README required historical research, critical framing, and the identification of productive annotation points. The esoteric programming languages collection, for instance, draws on Daniel Temkin's critical work to frame languages like Malbolge, Shakespeare, and Whitespace as interventions in computational rationality, aesthetics, and the politics of readability. The Transformer sample includes five implementations from different frameworks, among them Harvard NLP's pedagogical version, PyTorch's production code, and TensorFlow's tensor2tensor, enabling comparative analysis of how the same mathematical architecture is expressed differently across programming cultures.
[4] The question of whether critical reflection requires human cognition, and whether computational tools for critical practice require their own critical examination, shaped the tool's architecture in lasting ways. The AI toggle switch, the pedagogical CCS cards, and the transparent CCS skill markdown document, a structured file that users can inspect, critique, and modify, all encourage critical engagement with the software at multiple levels. The skill system itself is a plain markdown file called Critical-Code-Studies-Skill.md, readable by anyone, that compresses the methods of over fifteen books and articles into a structured prompt. It is not hidden. Users can open it, read it, disagree with it, rewrite it. This matters because it makes the computational intermediation visible rather than opaque: the critical methodology governing the AI's engagement with code is not buried in training data or system prompts but sits in a file that can be scrutinised and contested in the same way one might contest a syllabus or a reading list.
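To illustrate what this visibility might look like at the level of implementation, the following sketch shows one way such a skill file could be prepended to an LLM request; the file name comes from the project, but the function and its structure are assumptions offered for illustration only.

```typescript
// Illustrative sketch of folding a readable skill file into an LLM prompt;
// the file name is from the article, everything else is an assumption.
import { readFile } from "node:fs/promises";

const SKILL_PATH = "Critical-Code-Studies-Skill.md";

// Build the prompt by prepending the inspectable CCS methodology to the
// code under discussion, keeping the intermediation visible rather than
// buried in a hidden system prompt.
export async function buildAnnotationPrompt(
  sourceCode: string,
  question: string
): Promise<string> {
  const skill = await readFile(SKILL_PATH, "utf8");
  return [
    skill,                                  // the contestable methodology
    "Source code under discussion:",
    sourceCode,
    question,                               // the scholar's framing question
  ].join("\n\n");
}
```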
[5] The auto-save infinite loop illustrates the specific character of AI-augmented debugging. The code Claude generated was logically coherent. Auto-save should trigger when the session changes. The session's lastModified timestamp changes when metadata updates. Therefore auto-save should watch lastModified. But this created a cycle, save completes, updates lastModified, triggers auto-save, which saves again, updates lastModified, indefinitely. The fix was to remove lastModified from the dependency array, a change of five words that required understanding React's re-render model at a level the AI's initial implementation had not accounted for. This is, I think, a characteristic AI failure mode, code that follows from correct premises to incorrect conclusions because the premises omit a material constraint.
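A minimal reconstruction of the failure mode, using illustrative names rather than the Workbench's actual code, makes the cycle and the fix easier to see.

```typescript
// Illustrative reconstruction of the auto-save loop; names are assumptions.
import { useEffect } from "react";

interface Session {
  id: string;
  content: string;       // the annotations and files being edited
  lastModified: number;  // updated by every completed save
}

export function useAutoSave(
  session: Session,
  save: (s: Session) => Promise<void>
) {
  // Buggy version: because save() updates lastModified, the effect re-fires
  // after every completed save and the cycle never terminates.
  //
  // useEffect(() => {
  //   void save(session);
  // }, [session.content, session.lastModified]);

  // Fixed version: depend only on what the user actually edited, so a
  // completed save no longer re-triggers the effect.
  useEffect(() => {
    void save(session);
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [session.content]);
}
```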
[6] An earlier attempt to build the CCS tool using local LLMs through Ollama, before the web deployment, demonstrated different limits. Local models lacked the capacity for sustained analytical work, and the infrastructure requirements, adequate GPU, model downloads, configuration, created barriers for scholars without technical backgrounds. The web deployment with cloud-based LLM APIs removed these barriers at the cost of introducing dependency on commercial services. Every architectural decision in this process involved trade-offs of this kind.
[7] The pharmacological character of the Workbench extends beyond its relationship to AI. The tool enables a form of scholarly practice, collaborative code annotation, that was previously difficult to sustain. But it also risks instrumentalising that practice, reducing it to a workflow supported by software rather than a mode of thinking that resists systematisation. Whether the Workbench opens or forecloses interpretive possibility is not a question the tool can answer. It will be interesting to see how scholars use it, and whether they bring to their use the same critical attention the tool was designed to support.
Bibliography
Berry, D.M. (2011) The Philosophy of Software: Code and Mediation in the Digital Age. Palgrave Macmillan.
Berry, D.M. (2014) Critical Theory and the Digital. Bloomsbury.
Berry, D. M. and Fagerjord, A. (2017) Digital Humanities: Knowledge and Critique in a Digital Age. Polity Press.
Berry, D.M. (2024) 'Critical Digital Humanities', in J. O'Sullivan (ed.) The Bloomsbury Handbook to the Digital Humanities. Bloomsbury.
Berry, D.M. (2025a) Synthetic media and computational capitalism: towards a critical theory of artificial intelligence, AI & SOCIETY. Available at: https://doi.org/10.1007/s00146-025-02265-2.
Berry, D.M. (2025b) AI Sprints, Stunlaw, November. Available at: https://stunlaw.blogspot.com/2025/11/ai-sprints.html.
Berry, D.M. (2025c) Co-Writing with an LLM: Critical Code Studies and Building an Oxford TSA App, Stunlaw, October. Available at: https://stunlaw.blogspot.com/2025/10/co-writing-with-llm-critical-code.html.
Berry, D.M. (2025d) Intermediation: Mediation Under Computational Conditions, Stunlaw, December. Available at: https://stunlaw.blogspot.com/2025/12/intermediation-mediation-under.html.
Berry, D.M. (2025e) Provenance Anxiety: LLMs and the Death of the Author, Stunlaw, December. Available at: https://stunlaw.blogspot.com/2025/12/provenance-anxiety-death-of-author-in.html.
Berry, D.M. (2026a) Critical Code Studies Workbench, Critical Code Studies Working Group 2026, Discussion 205. Available at: https://wg.criticalcodestudies.com/index.php?p=/discussion/205/critical-code-studies-workbench.
Karpathy, A. (2025) 'There's a new kind of coding I call "vibe coding"...', X post, 2 February. Available at: https://x.com/karpathy/status/1886192184808149383.
Marino, M.C. (2020) Critical Code Studies. MIT Press.
Montfort, N., Baudoin, P., Bell, J., Bogost, I., Douglass, J., Marino, M.C., Mateas, M., Reas, C., Sample, M. and Vawter, N. (2012) 10 PRINT CHR$(205.5+RND(1)); : GOTO 10. MIT Press.
Ramsay, S. (2011) On Building, Stephen Ramsay Blog. Available at: https://web.archive.org/web/20160512150548/https://stephenramsay.us/text/2011/01/11/on-building/.
Stiegler, B. (2010) Taking Care of Youth and the Generations. Stanford University Press.
Stiegler, B. (2013) What Makes Life Worth Living: On Pharmacology. Polity.