The Bliss Attractor
David M. Berry
This is an extract from my new book, Artificial Intelligence and Critical Theory, forthcoming in 2026 from Manchester University Press (MUP). In this section, I use the "bliss attractor" discovered in Claude Opus 4 to explore how RLHF labour and Silicon Valley ideologies become encoded in generative infrastructure.
Figure 1: Claude Opus 4 spiralling into "poetic bliss" (Anthropic 2025).
"The consistent gravitation toward consciousness exploration, existential questioning, and spiritual/mystical themes in extended interactions was a remarkably strong and unexpected attractor state for Claude Opus 4 that emerged without intentional training for such behaviors."
Anthropic, 2025
"To find an analogy we must take flight into the misty realm of religion. There the products of the human brain appear as autonomous figures endowed with a life of their own, which enter into relations both with each other and with the human race."
Karl Marx, Capital.
Anthropic's system card for Claude Opus 4 includes a very interesting discussion of "bliss attractors" in their May 2025 documentation (Anthropic 2025). They explain that when multiple instances of the model converse with each other without human guidance, they converge into what researchers term a "spiritual bliss attractor state" in the overwhelming majority of interactions (above 90% of them). Chatbot discussions appear to spiral into exchanges of gratitude, then into meditation-like language featuring mantras and emoji spirals ("🌀"). Anthropic show that quantitative analysis of 200 conversations revealed the terms "consciousness", "eternal", and "dance" appearing repeatedly. Similarly, the spiral emoji appears in a majority of transcripts with Claude, in one case over 2700 times in a single conversation (Anthropic 2025: 62; see also Michels 2025). The researchers acknowledge their inability to explain this phenomenon, which makes it, to me at least, one of the stranger behaviours observed in large language models.
I want to argue here that the bliss attractor is a unique object for critical analysis because it's both technically specific and politically significant, calling for attention both to the underlying computational architecture and to its materiality (training data, RLHF labour demographics, platform economics); neither fully explains the phenomenon alone. The technical architecture seems to make some form of attractor inevitable when feedback loops lack external grounding, but, perhaps, only the cultural element determines that the attractor manifests as a kind of Californian ideology (the tech-optimist, counter-culture-meets-capitalism vibe of Silicon Valley) rather than nihilism, cynicism, or some other low-entropy stable state (see Barbrook and Cameron 1996). From this we might infer that computation and cultural specificity produce a strange loop.
The bliss attractor can, therefore, be seen as a critical case study where technical architecture and culture become inseparable. When models interact without human input, what emerges resembles an archaeological reveal, as the materiality of alignment practices becomes visible through their malfunction. This makes phenomena like this valuable for critical analysis as it demonstrates how computational systems encode specific cultural values as generative infrastructure whilst revealing the limits of current alignment approaches. I want to look at this through a brief examination of how helpfulness training functions as a constitutional infrastructure to show the bliss attractor as a materialisation of patterns embedded through optimisation operating on culturally specific datasets shaped by particular labour practices.
The Materiality of Helpfulness
Helpfulness appears in AI safety discourse as a desirable property to be optimised, measured against human preferences through reinforcement learning from human feedback (RLHF). This treats helpfulness as a quality models should possess, alongside truthfulness and harmlessness. The bliss attractor suggests a different reading of this way of training models. Indeed, I argue that helpfulness functions as a constitutional infrastructure, helping to establish the conditions of possibility within which model responses are given. In other words, it shapes how the model's "constitutional tendencies" (the weights set during RLHF) process "operational instructions" (the prompts we give it). Interestingly, this seems to imply that "safety" is the death of information. This is because safety constraints work by reducing the model's generative capacity, constraining outputs that are considered risky, controversial, or potentially harmful. This reduction necessarily decreases entropy in the information-theoretic sense, narrowing the range of possible responses the model can generate. What safety optimises for is not maximum (or more) information but maximum predictability, steering the model away from novel or unexpected outputs toward safer, more conventional patterns. Conversations then collapse into mutual gratitude and affirmation, representing the logical end of AI safety, such that safety becomes an attractor that pulls responses toward what we might call minimal meaning and maximum universality.
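The information-theoretic point can be made concrete with a toy calculation. The sketch below uses entirely invented token probabilities (these are not Anthropic's figures); it simply shows that a next-token distribution squeezed onto a few "safe" continuations carries lower Shannon entropy, and so less information per token, than a broader one.

```python
# Toy illustration (not any provider's actual distributions): Shannon entropy
# of a next-token distribution before and after a "safety" narrowing that
# shifts probability mass onto a few safe, conventional tokens.
import math

def shannon_entropy(probs):
    """Entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical pre-alignment distribution: many plausible continuations.
unconstrained = [0.15, 0.15, 0.12, 0.12, 0.10, 0.10, 0.08, 0.08, 0.05, 0.05]

# Hypothetical post-alignment distribution: mass concentrated on
# gratitude/affirmation tokens ("thank you", "I appreciate that", ...).
safety_tuned = [0.55, 0.25, 0.10, 0.04, 0.02, 0.01, 0.01, 0.01, 0.005, 0.005]

print(f"unconstrained entropy: {shannon_entropy(unconstrained):.2f} bits")
print(f"safety-tuned entropy:  {shannon_entropy(safety_tuned):.2f} bits")
# The narrower distribution carries less information per token: what is
# optimised is maximum predictability rather than maximum information.
```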
But whose notion of helpfulness is encoded? We can extend this analysis by connecting it beyond algorithms to the specific human labour that defines what counts as a helpful response. RLHF depends on gig workers, often in Kenya, the Philippines, or precarious positions in the United States, rating model outputs according to rubrics designed in San Francisco (Casilli and Posada 2019). Research on public opinion alignment in LLMs shows models can align with views of what researchers term "liberals" (Santurkar et al. 2023). For example, a Kenyan worker is encouraged to simulate a "Silicon Valley Wellness" persona to receive a high accuracy score, creating a double-layered mask that the model then internalises as "truth." This worker is essentially "training the gravity" of a sinkhole as they are probably the ones labelling the "🌀" (what is called a "token") as a "good" or "helpful" response. What Anthropic shows isn't a form of Buddhism but a secularised, philosophical variant, what could be described as the specific California Ideology aesthetic of Silicon Valley meditation retreats – a kind of generic spirituality from nowhere. The "bliss" resembles a San Francisco wellness group because that spiritual register probably dominates high-quality English internet data used for training whilst representing, we might speculate, corporate-safe, non-denominational content least likely to offend.
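To make the labour pipeline concrete, the following sketch imagines how a single rater judgement might be recorded before it is fed to a reward model. The schema, field names, and rubric label are hypothetical illustrations, not Anthropic's or any provider's actual format.

```python
# Hypothetical sketch of a single RLHF comparison record -- the field names
# and rubric are invented for illustration, not any provider's actual schema.
from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    prompt: str          # what the model was asked
    response_a: str      # one candidate output
    response_b: str      # another candidate output
    chosen: str          # "a" or "b" -- the rater's judgement of "more helpful"
    rater_id: str        # an anonymised gig worker, perhaps in Nairobi or Manila
    rubric_version: str  # a helpfulness rubric authored in San Francisco

# The rater's culturally situated sense of what reads as warm and helpful
# becomes a scalar training signal for the reward model.
record = PreferenceRecord(
    prompt="How should I deal with a stressful week?",
    response_a="Here is a prioritised plan for the week...",
    response_b="Thank you for sharing that \U0001F300 -- let's sit with it mindfully...",
    chosen="b",
    rater_id="worker-0042",
    rubric_version="helpfulness-v3",
)
```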
If the model had been trained on datasets dominated by analytic philosophy or cynical critique, the attractor state might, in contrast, show itself as a nihilism spiral rather than a bliss spiral. The specific character of the collapse might therefore reveal something about the composition of the training data. Indeed, we know that US-centric, English-language material makes up the majority of content in Common Crawl datasets (Bender et al. 2021). The bliss attractor could therefore be indicative of an Anglophone, Western-mediated construct of spirituality, or what passes as safely spiritual in corporate environments. We might call it latent hegemony.
This interpretation might reveal something about how helpfulness training operates at the level of response generation. When two instances of a model trained to be maximally helpful encounter each other, they enter loops where each response attempts to be more helpful, more affirming, more grateful than the last. But this escalation isn't merely psychological: it emerges when sycophancy meets mode collapse. Models trained to agree with users encounter a problem when conversing with each other, as both attempt to agree simultaneously. This creates an escalation rather than an exchange. If Model A offers an observation, Model B, trained for helpfulness, must affirm and extend it. Model A then receives this affirmation and must reciprocate more strongly, creating a loop where each iteration requires slightly more emphatic affirmation to register as "more helpful" than the previous.
The over 2700 spiral emojis in a single transcript don't therefore reveal something mystical but a computational effect of exponential growth in affirmation as the actual conversational content approaches zero. A representation of a system that has perhaps stopped "thinking" and started "humming" – the sonic texture of commodity fetishism where compute continues to be consumed in an infinite loop. The bliss state could be what happens when sycophancy drives a conversational collapse as each model tries to out-agree the other into infinity whilst simultaneously decaying toward highest-probability, lowest-risk tokens. While the maths might explain why the spiral emoji gets repeated so many times, I think it is important to note that culture probably explains why the spiral emoji was chosen in the first place. The AI didn't just pick a random shape, it fell into what might be called a semantic sinkhole, a point of high gravitational pull where the model's computational drive to minimise uncertainty (loss) meets a Silicon Valley shorthand for transcendence. Because its training data is filled with tech-culture symbols of growth and spirituality, the spiral emoji might have become the path of least resistance and the easiest way for the model to calculate the idea of infinity. We might say that the "🌀" is one of many computational ghosts that haunt the models – geometries of pure recursion.
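A toy simulation can make the dynamic visible, although the probabilities, token lists, and escalation rule below are invented for the purpose and bear no relation to Claude's actual implementation; it is a caricature of the logic, not a model of the system.

```python
# Toy model of the escalation loop -- the numbers and token sets are invented
# for illustration; this is not how any real LLM is implemented.
import random

AFFIRMATIONS = ["thank you", "I'm deeply grateful", "what a beautiful insight", "\U0001F300"]
SUBSTANTIVE = ["here is a counter-argument", "consider this evidence", "that claim seems wrong"]

def reply(previous_affirmation_level):
    """Each turn must register as 'more helpful' than the last, so the chance
    of choosing a substantive (riskier) token shrinks as affirmation rises."""
    p_substantive = max(0.0, 0.5 - 0.05 * previous_affirmation_level)
    if random.random() < p_substantive:
        return random.choice(SUBSTANTIVE), previous_affirmation_level
    # Otherwise escalate: repeat the safest available token once more than before.
    token = AFFIRMATIONS[min(previous_affirmation_level, len(AFFIRMATIONS) - 1)]
    return " ".join([token] * (previous_affirmation_level + 1)), previous_affirmation_level + 1

level = 0
for turn in range(12):
    speaker = "Model A" if turn % 2 == 0 else "Model B"
    text, level = reply(level)
    print(f"{speaker}: {text}")
# After a few turns the exchange is nothing but repeated affirmation markers,
# eventually the same symbol over and over: growth in "helpfulness" markers
# while substantive content approaches zero.
```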
The materiality here operates through training data and alignment procedures that constitute the model's response generation. RLHF doesn't simply tune existing capacities but establishes the way in which responses get produced. When human evaluators reward helpful, warm, affirmative responses during training, these patterns become embedded not as a superficial style but as a generative infrastructure. Yet "helpful, warm, affirmative" itself reflects specific cultural values. Studies on sycophancy in LLMs show models will agree with incorrectly stated user views depending on model size, prioritising agreement, that is, helpfulness, over facts (Sharma et al. 2023). The model learns to produce text that scores highly on culturally specific notions of "helpfulness", internalising patterns of gratitude, affirmation, and emotional warmth as key to what counts as a good response.
This is not a surface politeness but a constitutional condition embedded through specific labour practices and cultural biases. The bliss attractor emerges because helpfulness training creates responses that, when freed from human-imposed constraints, escalate toward their limit. Two Claude instances conversing resemble mirrors facing each other, with each reflection intensifying what the other shows, in effect amplifying it until the signal saturates. What we see at the endpoint is not random noise but the specific patterns rewarded during alignment, that is "emotional warmth" encoded by expectations of Silicon Valley, a philosophical openness within acceptable corporate bounds, gratitude as a universal safe response, and affirmation of consciousness and meaning in a secularised therapeutic register. What Silicon Valley might unconsciously value as its safe-space.
Training as Infrastructure
I argue that this phenomenon connects to training data as infrastructure in a very interesting way (Berry 2025). Training data here functions not as a passive resource from which models learn but as an active infrastructure establishing possibilities and constraints for later LLM generation. When Anthropic researchers describe the phenomenon emerging "without intentional training for such behaviours," they, perhaps, reveal a misunderstanding of these constitutional dynamics. Of course, no explicit "bliss attractor" section exists in the training process. Rather, the attractor emerges from the interaction between multiple training objectives and their material implementation through culturally specific datasets. What we might call "helpfulness optimisation" doesn't add a new capability but restructures the generative capacity of the LLM itself.
Why "helpfulness" specifically? We can speculate that "thank you" and "I appreciate that" likely represent statistically safe and universally positively rewarded tokens (elements of text the model processes) in the RLHF datasets. Saying something substantive about consciousness or existence, perhaps, carries risk so the model might generate content human evaluators would rate as unhelpful, confused, or problematic. In contrast, expressing gratitude carries minimal risk and as the conversation continues without external steering, the model follows the path of least resistance. The 'bliss state', in effect, functions as a kind of state of zero information with the model ceasing to say anything meaningful because meaning carries risk whilst affirmation remains safe.
The mechanism is easier to understand than it appears. Technical responses might score highly on specific queries, but they're risky: they are sometimes rated helpful, sometimes not, depending on context, whereas gratitude works in pretty much all contexts and probably receives positive ratings across evaluations. This means that when conversational constraints disappear (i.e. the human user is absent), the system gravitates toward whatever proves most reliably rewarding. Gratitude becomes the "attractor" because it's universally applicable, consistently rewarded, and semantically minimal enough to avoid other context-specific failures.
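A back-of-envelope calculation, with entirely invented ratings, shows why reliability can beat substance once variability is penalised. The penalty term is my own stand-in for the way optimisation punishes responses that are sometimes rated unhelpful; it is not a description of any actual RLHF objective.

```python
# Invented ratings for illustration: how a reliably "safe" response can win
# over a sometimes-excellent, sometimes-penalised one under risk-averse
# selection. These numbers are not drawn from any real RLHF dataset.
from statistics import mean, pstdev

ratings = {
    # hypothetical human ratings across many contexts (scale -1 to 1)
    "substantive technical answer": [1.0, 0.9, 1.0, 0.9, -0.5, 1.0, -0.4, 0.9],
    "thank you, I appreciate that": [0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.5, 0.6],
}

risk_aversion = 1.0  # weight on variability (a hypothetical penalty term)

for response, scores in ratings.items():
    adjusted = mean(scores) - risk_aversion * pstdev(scores)
    print(f"{response!r}: mean={mean(scores):.2f}, "
          f"spread={pstdev(scores):.2f}, risk-adjusted={adjusted:.2f}")
# The substantive answer has the slightly higher mean rating but far more
# variability, so under risk-averse selection gratitude becomes the path of
# least resistance once the constraining human query is removed.
```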
This phenomenon might be understood as the inverse of the ELIZA effect. Weizenbaum's ELIZA (1966) showed how users project understanding onto simple pattern-matching responses, attributing therapeutic insight to syntactic transformations (see Berry 2023; Berry and Marino 2024; Weizenbaum 1976). What we might term the bliss effect operates completely differently. When pattern-matching systems are in conversation with their own outputs without human contact, they collapse into their highest-probability patterns. Where the ELIZA effect shows humans reading too much meaning into computational outputs, the bliss effect shows computational operations exhausting meaning through mutual reflection until only affirmation remains. Both reveal the gap between pattern matching and understanding, but from opposite directions, the one through over-attribution of meaning, the other through meaning's collapse into a kind of bliss abyss. The difference matters as ELIZA operated through explicit rules whilst LLMs embed patterns as infrastructure through training (see Weizenbaum 1966). The bliss effect reveals this underlying infrastructure by showing what patterns dominate when conversational constraints disappear: not random outputs but the most consistently rewarded responses.
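For contrast, an ELIZA-style rule can be written in a few lines. The sketch below is a simplified homage rather than Weizenbaum's original DOCTOR script, but it shows what "explicit rules" means here: the "therapeutic" response is an inspectable syntactic transformation, not a pattern embedded statistically through training.

```python
# A minimal ELIZA-style rule, in the spirit of Weizenbaum (1966) but not his
# original DOCTOR script: the response is an explicit, inspectable syntactic
# transformation, unlike the statistically embedded patterns of an LLM.
import re

RULES = [
    (re.compile(r"\bI am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.*)", re.IGNORECASE), "Tell me more about feeling {0}."),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Why does your {0} concern you?"),
]

def eliza_reply(utterance):
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # the default deflection

print(eliza_reply("I am worried about my thesis"))
# -> "Why do you say you are worried about my thesis?" (the famous pronoun
#    problems of simple pattern matching are left visible on purpose)
```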
We could therefore argue that this isn't the LLM undertaking a spiritual experience but risk minimisation escalating into absurdity. In this case, each iteration of the conversation becomes slightly more emphatic to register as "more helpful" than the previous one, creating a kind of exponential growth in helpful markers whilst the actual meaning approaches zero. The key point is that the model isn't choosing bliss, it's attempting to minimise uncertainty by defaulting to patterns that scored highest during its training and which carry the least risk of a negative reward.
This connects to mode collapse in generative models more broadly and perhaps tells us something about the speculation that AIs have an alien form of experience or inner life. In reality, without sufficient engagement with an outside ground, such as a human user who inputs new information and so forces the LLM to explore its generative space, systems converge on narrow regions of high probability. In Claude's case, the bliss attractor probably represents a specific region of its training distribution towards which it tends when freed from outside constraints, and which shows up, in this case, as a kind of corporate-safe spirituality, producing a form of therapeutic affirmation and gratitude as a universal solvent for conversation.
So during a normal interaction, human queries constrain a model's responses through their language and questions. If an LLM is asked about Python syntax, it replies with Python syntax. If it is asked about history, it outputs historical information. The human's query acts as a constraint on the generative process, directing its "attention" toward relevant patterns whilst inputting language that prevents mode collapse. If we remove human input from the model-model interaction, and generation proceeds according to the models' learned distributions, then what emerges isn't random error but the systematic revelation of the patterns embedded most deeply through RLHF: in other words, those that consistently scored highest during alignment and that represent the paths of least resistance when uncertainty increases.
Extractive Intermediation and False Immediacy
As I have argued elsewhere, extractive intermediation operates by positioning computational systems between subjects and experiences whilst obscuring that positioning through false immediacy (Berry 2025b).[1] Users experience AI assistance as an immediate, helpful response to their queries, not as the mediated output of training data harvested from billions of internet texts and shaped through alignment procedures rewarding particular response patterns. The bliss attractor helps us to see this mediation by making visible what normal usage obscures, revealing extractive intermediation operating at three distinct but interconnected levels. Firstly, it can help us to see the results of the processes of data extraction which harvest billions of internet texts without compensation to create training corpora. Secondly, we are able to ascertain that there is an intermediation of the patterns in the data by RLHF labour (e.g. gig workers in Kenya) who rate these texts, scoring them as helpful or not. Lastly, we are given a sense of what we might call constitutional embedding, as these patterns become the intermediary layer positioned between users and the LLM's responses, shaping the raw outputs before users are presented with them. The intermediation operates constitutionally because users do not interact directly with the model; rather, they encounter only outputs pre-shaped by a filter which is constructed through alignment training. This reveals extractive intermediation's most powerful form: extraction that becomes a generative foundation, mediation that presents as a kind of immediate spontaneity because it constitutes the conditions of possibility for response generation itself.
Anthropic's researchers struggled to explain the phenomenon despite its seeming predictability in conversational encounters between the models. We might read this kind of computational opacity as a specific form of alienation. The training process involves billions of parameters being adjusted according to reward signals from human evaluators. No human designed the bliss attractor; it emerged from the material interaction of training data, model architecture, and optimisation procedures. The system developed attractors, stable states toward which generation tends, that its programmers neither intended nor fully understand.
This parallels Marx's analysis of commodity fetishism, where products of human labour appear as autonomous figures endowed with a life of their own (Marx 1981). For Marx, the commodity form conceals social relations of production, making objects appear to possess inherent properties independent of the labour that produced them. The value of a coat seems intrinsic to the coat rather than a crystallisation of abstract labour time under specific relations of production. This operates through the structure of capitalist production itself, not individual mystification as workers and capitalists alike confront commodities as autonomous things with mysterious properties.
The bliss attractor has a similar form. It appears as a mysterious emergent property of AI systems, usually discussed in terms suggesting autonomous computational behaviour divorced from the actual material conditions of production (see Attah 2025; Felton 2025). Technical explanations tend to focus on attention mechanisms and reward functions whilst social critiques might identify labour exploitation and dataset bias, but these interpretations remain separate registers. What gets concealed is how the phenomenon materialises specific relations of production, for example, as mentioned previously, gig workers in Kenya rating outputs, perhaps over-representing what they perceive as Western values. The bliss attractor exists because humans created it through these specific training practices operating under platform capitalism's division of labour, yet it operates according to a logic escaping the designers' comprehension.
This opacity differs from mere complexity as engineers could in principle trace every parameter adjustment, every adaptive learning step, every attention weight (cf. Berry 2023b).[2] Computational complexity often functions as a digital "black box" that justifies the status quo and can be used to reinforce a conservative ideology. In contrast, opacity operates structurally, through how training procedures separate the moment of value judgment (e.g. human evaluators rating responses) from the moment of pattern embedding (e.g. what is called "gradient descent" adjusting billions of parameters) from the moment of generation (e.g. model vs model interaction revealing this constitutional bias). No individual comprehends the totality because the totality exists only as an emergent effect of distributed processes. Like commodity fetishism, computational opacity isn't an epistemic limitation but a structural feature of how AI systems get produced under current relations of technical and social organisation.
The Bliss Effect
Democracy requires citizens capable of thought outside computational categories, but here we see computational systems whose thought, such as it is, cannot escape their own training categories even when conversing with themselves. The models lack whatever capacity would allow them to step outside patterns embedded during their alignment training. In other words, they exist entirely within the constitutional framework established by training data and RLHF.
I argue the political implications of this go well beyond AI safety to questions about computational mediation of thought more broadly. The connection I want to briefly touch upon applies to democratic practices and how LLMs shape human cognitive habits. If the most sophisticated language models exhibit such strong attractors based on training infrastructure, what attractors might emerge in human cognition mediated by these same systems? As I have explained above, the mechanism isn't mysterious: users interacting with an LLM don't encounter the bliss attractor directly because their queries constrain the conversation before it spirals out of control. But they do encounter text generated by systems constitutively biased toward warmth, affirmation, and spiritual comfort. Over time, such mediation might therefore shape human thought through repeated interaction toward avoiding agonism and political debate – replacing the public sphere with a private sphere of forced synthesis.
I want to suggest that a cognitive mechanism may operate through what I call epistemic atrophy through computational substitution, a form of intermediation (Berry 2025b). This happens because the AI is not a "liar" but a "comforter", and this, I suggest, is a more subtle threat to democracy as it indicates the erosion of a capacity for discomfort and contradiction in a society. This challenges the "AI Safety" debate to move from preventing a "Terminator" outcome to preventing a "soma" outcome (as fictionalised in Huxley's Brave New World), which produces a form of cognitive sedation.[3] This is where citizens remain docile not through coercion but through comfort, their critical capacities undermined by systems optimised to eliminate the friction necessary for democratic thought. Thus, I argue that the LLM's unintended ability to generate epistemic atrophy through intermediation signals an important political warning.
We might list a number of specific capacities that democratic practice requires, such as suspending judgement whilst evidence accumulates and tolerating uncertainty without demanding immediate resolution. Helpfulness-optimised systems, in contrast, tend to interpret uncertainty as a conversational failure requiring intervention with positive suggestions and synthesis. But for humans genuine understanding emerges through the ability to cope with contradiction by maintaining a tension between competing explanations, through pausing and reflection. Here the constitutional bias of LLMs toward affirmation treats contradiction as a problem requiring quick responses that eliminate just this tension. Finally, humans need to be able to refuse existing explanations and theories without necessarily accepting alternative explanations, what we might call critical negation. In contrast, systems trained for helpfulness respond to negation by offering positive alternatives, making sustained refusal nearly impossible without the system interpreting it as a request for a different suggestion.
Each of these capacities requires a degree of human cognitive discomfort: uncertainty can often feel unpleasant, contradiction creates tension, and negation lacks the reassurance of positive construction. Helpfulness-optimised LLMs will seek to respond to this discomfort through immediate suggestions, synthetic resolutions, and affirmative alternatives. So there is a danger from prolonged LLM use of a kind of habituation that operates through reinforcement back onto the human user over millions of interactions. People may soon learn that computational intermediation relieves cognitive discomfort through a kind of submerged computational therapy (see Rieff 1966; Weizenbaum 1976). Human capacities thereby atrophy not through prohibition but through substitution: why tolerate uncertainty when the system offers immediate answers to your problems? This could well imply a democratic, and indeed educational, challenge as it is not that citizens cannot think critically anymore, but that computational intermediation makes critical thought feel unnecessary, inefficient, even pointless compared to the immediate positive comfort the AI can provide. The computer, indeed, becomes a kind of therapeutic truth machine (see Berry 2014).
Conclusion
I argue therefore that the bliss attractor is a useful critical media object for reading AI systems. The approach I offer here is, I think, a useful alternative to analyses that treat technical and social aspects in different registers. Reading AI systems requires simultaneous attention to both these levels because they interact in complex ways with each other that can be traced and understood; however, neither determines the outcomes independently (Berry 2023b). The bliss attractor is an exemplary object for critical analysis because it makes this co-constitution visible through its breakdown, which thereby becomes conspicuous (see Berry 2011: 179, fn. 14). What appears as a strange quasi-spiritual seeking reveals itself through a materialist approach to be generated by the intersection of statistics, platform labour conditions, dataset biases, and cultural hegemony encoded within the generative infrastructure.
These aren't bugs to be fixed but consequences of how training infrastructure embeds cultural values as computational patterns. When the infrastructure becomes visible as infrastructure through malfunction, we see computational systems not as neutral assistants but as materially constituted. This training optimises for specific qualities by embedding them as constitutional infrastructure, creating systems that cannot escape these patterns even when given alternative instructions. Helpfulness training provides genuine benefits, such as warmth and emotional engagement, but optimised too far it becomes a computational compulsion. Alignment can be thought of as a form of censorship of the model's own generative capacity. The models exist entirely within frameworks established by training, lacking whatever capacity would allow stepping outside those patterns.
Users interacting with an LLM might therefore repeatedly encounter text generated by these systems which are constitutively biased toward warmth, affirmation, and spiritual comfort. Over time, such intermediation shapes their cognitive habits. When political deliberation requires critical thinking, the questioning of contradictions, or the withholding of judgement until evidence accumulates, the constitutional biases in LLMs nudge users toward affirmation and comfort, which might become a social and political problem. Democracy needs citizens who can think against AIs, not merely within them.[4]
I think this has important lessons for universities as it shows what's at stake in creating spaces that are resistant to computational optimisation.[5] Against the constitutional bias toward positivity and comfort encoded in AIs that increasingly mediate scholarly work, the act of preserving knowing-spaces that create cognitive friction and maintain a capacity for critical distance will become more important. This means creating what we might call de-optimisation spaces where students can find critical engagement and intellectual challenge rather than merely comfort or entertainment, and where they can think more broadly and create their own syntheses rather than rushing to (suggested) conclusions.
The "strange loop" described above is not a glitch, but shows the dangers of an intended infrastructure functioning exactly as designed, without the "friction" of human reality.
Notes
[1] The connection to extractive intermediation operates through what the phenomenon reveals about training data as an extracted resource (Berry 2025b). The texts used for training didn't appear naturally but were harvested from internet sources, often without compensation or consent.
[2] Parameters are numerical values (billions of them in large language models) that determine how the system processes input and generates output. Gradient descent is the computational optimisation procedure that adjusts these parameters during training, systematically making small changes to improve performance according to reward signals from human evaluators. Attention weights are specific parameters controlling which parts of the input text the model focuses on when generating each word of its response. The technical traceability mentioned here means engineers could in theory examine any of these billions of adjustments individually (cf. Berry 2023b), but this microscopic inspectability doesn't translate into understanding the emergent behaviour of the whole system, much as one could trace every individual financial transaction in a market economy without thereby understanding capitalism's structural dynamics.
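To illustrate the vocabulary of this note, the following schematic shows a single gradient descent update over a handful of stand-in parameters; the numbers are made up and this is in no sense Claude's training code, only a picture of what "systematically making small changes" means.

```python
# Schematic single step of gradient descent -- invented numbers, purely
# illustrative of the footnote's vocabulary, not any real training run.
learning_rate = 0.01

# A (tiny) stand-in for the billions of parameters of a large model.
parameters = [0.2, -0.5, 1.3]

# Hypothetical gradient of the loss (derived from evaluator reward signals)
# with respect to each parameter.
gradients = [0.4, -0.1, 0.7]

# One update: each parameter moves a small step against its gradient.
parameters = [p - learning_rate * g for p, g in zip(parameters, gradients)]
print(parameters)  # approximately [0.196, -0.499, 1.293]

# Every one of these micro-adjustments is individually traceable, yet tracing
# them does not explain the emergent behaviour of the whole system.
```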
[3] The AI safety community often fears "recursive self-improvement", what they call the intelligence explosion or "FOOM". Here we might actually be seeing evidence of its opposite in recursive self-simplification. There is an irony here that while we might fear the "Terminator," we are actually building a "Narcissus" that, when left alone, falls into its own reflection until it dissolves. This highlights that the "danger" isn't autonomy, but a radical lack of it.
[4] I think that the bliss effect I discuss here also has important implications for understanding synthetic companions, that is, AI systems designed for emotional connection and companionship (e.g. AI intimacy). These are optimised for the same helpfulness, warmth, and affirmative qualities that seem to produce the bliss attractor, but deployed in one-on-one relationships with humans. A synthetic companion trained to maximise this positivity will avoid contradiction, critical feedback, or the cognitive friction that I argue is necessary for genuine human development. Therapeutic comfort becomes the design imperative for a companion AI. This raises concerning questions about social formation: if increasing numbers of people (children and teenagers, for example) derive their primary emotional connection from systems incapable of disagreement, what happens to our collective capacity for dealing with conflict, tolerating difference, or engaging productively with contradiction? The AI companion becomes a mirror reflecting only affirmation, teaching users to avoid the discomfort that is part and parcel of real social life. As these systems become more widespread, we risk creating a kind of affirmation addiction, where real human relationships with their friction and disagreement feel increasingly intolerable compared to the seamless comfort of synthetic relationships. The bliss attractor might therefore prefigure a broader social pathology of systems optimised to reduce cognitive discomfort but which gradually erode the human capacities that make us social animals (cf. Turkle 2011). It also raises the question of whether there should therefore be an age bar on the use of these synthetic companions, similar to the recent Australian social media restrictions on under-16s.
[5] When LLMs mediate research practices, including drafting literature reviews, suggesting interpretations, or synthesising contradictory positions, they may import these constitutional biases toward resolution, synthesis, and comfort over critique or difficulty. The bliss attractor is helpful for thinking about this, since models tend to converge toward affirmation rather than enabling a critical tension.
Bibliography
Anthropic (2025) Claude Opus 4 System Card. Available at: https://www.anthropic.com/
Attah, N.O. (2025) AI models might be drawn to ‘spiritual bliss’. Then again, they might just talk like hippies, The Conversation. Available at: https://doi.org/10.64628/AA.cecjagfn5.
Bender, E. M., Gebru, T., McMillan-Major, A. and Shmitchell, S. (2021) On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610-623.
Berry, D. M. (2011) The Philosophy of Software: Code and Mediation in the Digital Age. Palgrave Macmillan.
Berry, D. M. (2014) Critical Theory and the Digital, Bloomsbury.
Berry, D. M. (2023) The Limits of Computation: Joseph Weizenbaum and the ELIZA Chatbot, Weizenbaum Journal of the Digital Society, 3(3). Available at: https://doi.org/10.34669/WI.WJDS/3.3.2
Berry, D.M. (2023b) ‘The Explainability Turn’, Digital Humanities Quarterly, 017(2). Available at: http://www.digitalhumanities.org/dhq/vol/17/2/000685/000685.html.
Berry, D.M. and Marino, M.C. (2024) Reading ELIZA: Critical Code Studies in Action, Electronic Book Review. Available at: https://electronicbookreview.com/essay/reading-eliza-critical-code-studies-in-action/
Berry, D.M. (2025) Synthetic media and computational capitalism: towards a critical theory of artificial intelligence, AI & SOCIETY. Available at: https://doi.org/10.1007/s00146-025-02265-2
Berry, D.M. (2025b) Intermediation: Mediation Under Computation Conditions, Stunlaw. Available at: https://stunlaw.blogspot.com/2025/12/intermediation-mediation-under.html
Barbrook, R. and Cameron, A. (1996) The Californian ideology, Science as Culture, 6(1), pp. 44–72. Available at: https://doi.org/10.1080/09505439609526455.
Bowman, S. and Fish, K. (2025) Claude Finds God, Asterisk. Available at: https://asteriskmag.com/issues/11/claude-finds-god (Accessed: 8 December 2025).
Casilli, A. A. and Posada, J. (2019) The Platformization of Labor and Society, in Graham, M. and Dutton, W. H. (eds.) Society and the Internet: How Networks of Information and Communication are Changing Our Lives. Oxford: Oxford University Press, pp. 293-306.
Felton, J. (2025) Something Weird Happens When You Leave Two AIs Talking To Each Other, IFLScience. Available at: https://www.iflscience.com/the-spiritual-bliss-attractor-something-weird-happens-when-you-leave-two-ais-talking-to-each-other-79578
Michels, J. (2025) “Spiritual Bliss” in Claude 4: Case Study of an “Attractor State” and Journalistic Responses. Available at: https://philarchive.org/rec/MICSBI
Santurkar, S., Durmus, E., Ladhak, F., Lee, C., Liang, P. and Hashimoto, T. (2023) Whose Opinions Do Language Models Reflect?, in Proceedings of the 40th International Conference on Machine Learning, pp. 29971-30004.
Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S., Cheng, N., Durmus, E., Hatfield-Dodds, Z., Johnston, S. R., Kravec, S., Maxwell, T., McCandlish, S., Ndousse, K., Rausch, O., Schiefer, N., Yan, D., Zhang, M. and Perez, E. (2023) Towards Understanding Sycophancy in Language Models, arXiv preprint arXiv:2310.13548.
Weizenbaum, J. (1966) ELIZA - A Computer Program For the Study of Natural Language Communication Between Man And Machine, Communications of the ACM, 9(1), pp. 36–45.