
Exercises in Meta-cohesion

Anna Garbier

Summary

In 2019, OpenAI released GPT-2, a language model capable of generating whole paragraphs of text at a time. GPT-2’s output, stripped of inhibition and ego, offers delightful linguistic surprises run after run.

As exciting as it is to watch a machine produce something so convincingly human, the novelty eventually wears off. When it does, we’re left to wonder: how do we make this statistical trick — an assembly of words no longer contingent on an author’s intention — mean something to us?

In Exercises in Meta-cohesion, my mechanical co-writer (GPT-2) and I tell two stories. First, we tell a fictional tale of characters whose connections to each other, fragile as they are, build a society out of selves. Underneath this surface, we tell our own tale of human and machine working together through formulas, improv, and endless material to put words artfully together.

GPT-2 as co-writer

GPT-2, short for Generative Pre-trained Transformer 2, is a type of machine learning model known as a language model. Language models are designed to perform a narrow task: given a prompt, predict the most likely word to follow. For example, given the prompt “It was a dark and stormy,” GPT-2 may predict “night” as the most likely next word. Applied iteratively, this prediction lets GPT-2 generate whole paragraphs of text, one word at a time. GPT-2’s predictions are based on learned statistical likelihoods; these likelihoods are encoded during a training process in which GPT-2 is exposed to 8 million web pages’ worth of internet text.[1]
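For readers curious about the mechanics, that one-word-at-a-time loop can be sketched in a few lines of Python. The sketch below uses the publicly released GPT-2 weights through the Hugging Face transformers library purely for illustration (it is not the tooling used in this project) and always takes the single most likely next token:

```python
# A minimal sketch of GPT-2's one-word-at-a-time loop, shown with the
# Hugging Face `transformers` library for illustration only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("It was a dark and stormy", return_tensors="pt")

# Predict one token at a time, always appending the single most likely next token.
for _ in range(25):
    with torch.no_grad():
        logits = model(ids).logits        # a score for every token in the vocabulary
    next_id = logits[0, -1].argmax()      # the most likely continuation
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))           # e.g. "It was a dark and stormy night. ..."
```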

Language models are used in many everyday products. Sometimes their presence is visible, for example when Gmail’s Smart Compose finishes a sentence in a draft email, or when a mobile keyboard suggests next words. Other times, their presence is hidden, for example in smart speakers, where language models help virtual assistants resolve ambiguous speech input by finding the most likely of several candidate interpretations. Increasingly — and especially with the introduction of GPT-2 — language models have found their way into less utilitarian tasks as well, including creative writing. In this new application, GPT-2’s predictions become potential content.

In small bits, the content is intriguing. Given the prompt “It was a dark and stormy,” for example, the model continues:

It was a dark and stormy night. The sun had just set, and heavy rains had begun. Richard’s friends had been drinking and had no clue what to do when all of a sudden the pounding on their door got so loud it could be heard all over the island. They immediately grabbed their shotguns and waded into the rain…

In creative settings, where delightful surprises are more important than correct predictions, GPT-2 is often run non-deterministically. For every prompt, it can produce many different continuations. For example, a second and third run of the same prompt above produces two new possibilities:

It was a dark and stormy night. The foggy wind rattled a local cabbage patch and the doors to the Schoeffleri church swung open…

It was a dark and stormy night. Just after midnight I started my journey. I had a tent, a sleeping bag and a few things that had no apparent purpose…

1. Radford, Alec. “Better Language Models and Their Implications.” OpenAI, OpenAI, 13 Dec. 2019, openai.com/blog/better-language-models/.
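The run-to-run surprise comes from sampling: instead of always taking the single most likely token, the model draws each next token from its probability distribution, so every run can wander differently. A minimal sketch, again using the transformers library as a stand-in and with illustrative parameter values:

```python
# Sampling several continuations of the same prompt, rather than always taking
# the single most likely word. Hugging Face `transformers` used as a stand-in;
# parameter values are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
runs = generator(
    "It was a dark and stormy",
    max_new_tokens=60,         # roughly a short paragraph
    do_sample=True,            # draw from the probability distribution
    temperature=0.9,           # higher values invite more surprise
    num_return_sequences=3,    # three independent runs of the same prompt
)
for run in runs:
    print(run["generated_text"], "\n---")
```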

Compared to previous models of its kind, GPT-2 is remarkably flexible — “chameleon-like” as its authors note.[2] Given a tabloid headline as a prompt, for example, GPT-2 produces tabloid-flavored text:

Kanye West, 42, finally earned the title he has long desired: billionaire! To celebrate, he released the new fashion line, the YEEZY Ultra Boost. The shoes were some of the world’s most expensive, featuring high-end fabrics and colors. Clearly, Kanye’s dedication to paying a luxury premium has paid off. So does this mean we have reached Peak Kanye? No, not yet…

When reading GPT-2’s generated content, it is tempting to think that the model composes language like we do; it does not. Whereas we can think about language from the molecular level (as sounds and symbols) to the abstract (as stories that hold meaning in the context of this world), GPT-2 is single-minded. It treats every word as a token, each more or less likely than others to occur in a given context. Any appearance of writerly agility, such as the ability to tinker with word choice for just the right rhythm or play with grammar for added drama, is a matter of chance. That GPT-2 can do a passable job by crunching the numbers should give us pause. What should we make of the fact that one of our most human instincts — to tell stories — can be mimicked by a machine that only knows how to do one thing: predict the most likely next word?

2. Ibid.

Embracing GPT-2

GPT-2 is a prolific freewriter. With a rich vocabulary gleaned from “as large and diverse a dataset as possible,”[3] it places one word after another quickly and resolutely. Once a word is chosen, it moves on, never backtracking, never pausing in self-doubt, and never anticipating where it ultimately needs to land. It can do this repeatedly, producing hundreds of variations on a single prompt in seconds.

Within the fields of computational and AI-generated writing, a few narrative forms reign: cave adventures, stream-of-consciousness tales, and fantastical odysseys. These frameworks make sense: when human writers embrace their mechanical counterparts, everyone — writers, readers, machines, and characters alike — steps into the dark. To proceed means putting one foot boldly in front of the other and entering the unexpected.

3. Radford, Alec, et al. “Language Models are Unsupervised Multitask Learners.” OpenAI, OpenAI, 13 Dec. 2019, openai.com/blog/better-language-models/.

As fitting as these narrative frameworks are, none reference GPT-2’s mechanics as precisely as Gordon Lish’s concept of consecution. Consecution is a “recursive procedure”[4] by which every sentence feeds off of “what was left unsaid in previous sentences.”[5] A good sentence, consecution theory argues, offers the reader an immediate moment of satisfaction, while simultaneously inviting another moment to follow — whatever that moment may be.

4. Lutz, Gary. “The Sentence Is a Lonely Place.” Believer Magazine, 25 July 2018, believermag.com/the-sentence-is-a-lonely-place/.
5. Moran, Joe. First You Write a Sentence: the Elements of Reading, Writing...and Life. Penguin Books, 2019.

In Reasons to Live, for example, Amy Hempel begins each short vignette with a sentence that demands answers. “This time it happened with a fire,” starts one. What happened? “There is a typo in the hospital menu this morning,” starts another. Why is the narrator in the hospital? How many mornings have they been there? Or “After the dog’s cremation, I lie in my husband’s bed and watch the Academy Awards for animals.” Where is the husband? What are the Academy Awards for animals? Who is this person?[6]

Like GPT-2, Hempel moves resolutely forward, trusting in the organic path of one sentence demanding another. This comparison does not mean that Hempel and GPT-2 are equal writers. As the next section shows, there is more to Hempel’s method than pure, directionless improvisation. The comparison does, however, suggest a fresh literary framework for adventuring through linguistic space in the age of AI. Just as the text adventure form was particularly well suited to the rule-based computational writing of the 1970s, consecution is particularly well suited to GPT-2.

6. Hempel, Amy. Reasons to Live: Stories. Harper Collins, 2008.

Constraining GPT-2

Though each of Hempel’s vignettes can be read as a stand-alone narrative morsel, Hempel composes them into a collection, and titles it Reasons to Live. Linking otherwise disparate bodies of text together through parallel form and a shared title is an act of meta-cohesion.[7] By linking the vignettes, Hempel creates a whole out of parts: a gestalt. Behind the improvisation, there was a plan after all.

Asking GPT-2 to generate text within the constraints of a plan — even a loose one — requires significant labor. GPT-2 has a few control points that make this work possible. The first is the prompt. The prompt launches GPT-2 in an initial direction. For example, consider the trajectories created by two slightly different prompts:

The lights flickered. She turned around and placed her head on Elsa’s shoulder. The embrace of the priestess had Elsa relax, but only for a moment.

The lights shone bright. She stepped onto the stage, sure to have the audience erupt into applause.

If GPT-2 were to continue each, the two narrative paths would likely diverge. By using a second prompt, though, we can reel them both in:

The lights flickered. She turned around and placed her head on Elsa’s shoulder. The embrace of the priestess had Elsa relax, but only for a moment. Just then she woke from her dream. “What was that all about?”

The lights shone bright. She stepped onto the stage, sure to have the audience erupt into applause. Just then she woke from her dream. “What just happened???”

This reeling-in, or anchoring, tactic is a natural pattern in discourse between two humans, where phrases like “back to the main point…” or “anyway, what I was saying was…” keep two minds on the same page. Here, the tactic is extended to human-GPT-2 dialogue so that GPT-2 does not stray too far from a plan.
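As a sketch of the pattern (again using the transformers library as a stand-in, with the anchor sentence and parameters chosen only for illustration), the tactic amounts to letting the model wander from each opening, then appending the same anchoring prompt before generating again:

```python
# Sketch of the anchoring tactic: let GPT-2 wander from each opening, then
# append the same anchoring prompt and continue. Hugging Face `transformers`
# used as a stand-in; the anchor and parameters are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
anchor = " Just then she woke from her dream."

openings = [
    "The lights flickered. She turned around and placed her head on Elsa's shoulder.",
    "The lights shone bright. She stepped onto the stage.",
]

for opening in openings:
    wandered = generator(opening, max_new_tokens=40, do_sample=True)[0]["generated_text"]
    reeled_in = generator(wandered + anchor, max_new_tokens=40, do_sample=True)[0]["generated_text"]
    print(reeled_in, "\n---")
```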

The second control point is hidden. It is accessed before the generation begins, during an optional tuning step. By default, GPT-2 is pre-trained on a large dataset (corpus) of internet text. This corpus, called WebText, is intentionally broad; its diversity allows GPT-2 to adapt to different prompts, from Shakespearean sonnets to tabloid gossip.

7. Cohesion is “The linguistic means by which a text is enabled to function as a single meaningful unit.” (Halliday & Hasan, 1976) By meta-cohesion I mean the linking of multiple texts to create an outer “meta” whole.

It is possible to override this chameleon-like adaptability by fine-tuning the original model. During the tuning phase, the model is exposed to a second, narrower corpus.[8] Having learned the patterns of this narrower corpus, the tuned model then produces text based on the new patterns. That is to say, the model learns to mimic its tuning corpus.
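A tuning run might look roughly like the sketch below, which uses gpt-2-simple (see note 8); the corpus file name, run name, step count, and sampling parameters are illustrative assumptions rather than the project’s actual settings:

```python
# Sketch of the tuning step with gpt-2-simple (see note 8). The corpus file,
# run name, step count, and sampling parameters are illustrative assumptions.
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="124M")        # fetch the pre-trained weights once

sess = gpt2.start_tf_sess()
gpt2.finetune(
    sess,
    dataset="morticians_corpus.txt",         # the narrower tuning corpus
    model_name="124M",
    steps=1000,                              # how long the model absorbs the new patterns
    run_name="mortician",
)

# The tuned model now mimics the voice of its tuning corpus.
print(gpt2.generate(sess, run_name="mortician",
                    prefix="I wish people understood me better.",
                    length=100, temperature=0.8,
                    return_as_list=True)[0])
```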

In my work, Exercises in Meta-cohesion, I use both tactics. I use differently-tuned GPT-2 models to create a collection of unique voices, and I use the anchoring power of prompts to convince those voices to exist sensibly together.

8. While tuning was made possible by OpenAI, it was made broadly accessible by Max Woolf’s gpt-2-simple. The library, a Python interface for GPT-2 that exposes particularly useful parameters, was used throughout this project both to tune models and to generate text from them.

Exercises in Meta-cohesion

Exercises in Meta-cohesion is a collection of twelve portraits, each created through a collaboration between GPT-2 and me. Like its namesake (Raymond Queneau’s Exercices de Style), this collection is created through a process of repetition with slight variation. Below is one portrait, which grounds the succeeding explanation:

I wish my friends understood me better. I do not know if it’s because I am quiet and withdrawn, or it’s because my friends think i’m a little weird, or because i’m a little weird, or both. I just want someone to talk to me. I want to feel like a normal person.

Yesterday, me and my friend overheard a guy talking about exhuming a body part and how it was embalming and it was gross. And I thought, wow, that’s weird. Like, really weird. Like damn, what a weird person he is. I don’t know. Maybe we all have some flaws. I wonder if he would talk to me.

Repetition

Most of the surface text (the text shown above) is generated by GPT-2. The particular narrative features of this text — the first-person voice, the vulnerability, the need to connect with another person — are prompted.

Specifically, I give GPT-2 four fixed prompts: I wish people understood me better; I just want one thing; Why? Because; and Maybe we’re not that different. I ask GPT-2 to continue each prompt for one to three sentences before moving to the next prompt. Once GPT-2 has filled in the narrative scaffolding, I remove the scaffolding from the final surface form.
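A rough sketch of this procedure, using gpt-2-simple (see note 8), might look like the following; the run name, the sentence-trimming heuristic, and the parameter values are illustrative assumptions rather than the project’s exact settings:

```python
# Sketch of the four-prompt scaffolding, using gpt-2-simple (see note 8).
# The run name, trimming heuristic, and parameters are illustrative assumptions.
import gpt_2_simple as gpt2

PROMPTS = [
    "I wish people understood me better.",
    "I just want one thing.",
    "Why? Because",
    "Maybe we're not that different.",
]

def build_portrait(sess, run_name):
    scaffolded, surface = "", []
    for prompt in PROMPTS:
        scaffolded = (scaffolded + " " + prompt).strip()
        generated = gpt2.generate(sess, run_name=run_name, prefix=scaffolded,
                                  length=60, temperature=0.8,
                                  return_as_list=True)[0]
        continuation = generated[len(scaffolded):].strip()   # keep only GPT-2's new text
        sentences = [s.strip() for s in continuation.split(".") if s.strip()]
        continuation = ". ".join(sentences[:3]) + "."        # keep one to three sentences
        scaffolded += " " + continuation
        surface.append(continuation)
    return " ".join(surface)   # the scaffolding prompts are removed from the surface form

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="teen")   # a model previously tuned on the r/AskTeens corpus
print(build_portrait(sess, "teen"))
```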

Writer Robin Sloan calls this tactic of giving prompts only to hide them later “whispering” to the machine.[9] It is similar to the more traditional writing tactic in which authors omit logical connectors (therefore, but) from the beginnings of sentences, giving the reader space to move intuitively about the text. To reveal the mechanics, though, the prompts are shown below in the context of the same example portrait:

I wish people understood me better. I wish my friends understood me better. I do not know if it’s because I am quiet and withdrawn, or it’s because my friends think i’m a little weird, or because i’m a little weird, or both. I just want one thing. I just want someone to talk to me. Why? Because I want to feel like a normal person.

Yesterday, me and my friend overheard a guy talking about exhuming a body part and how it was embalming and it was gross. And I thought, wow, that’s weird. Like, really weird. Like damn, what a weird person he is. Maybe we’re not that different. I don’t know. Maybe we all have some flaws. I wonder if he would talk to me.

9. Roguelike Celebration. “Writing with the machine: GPT-2 and text generation.” Online video clip. YouTube, 23 October 2019. Web. 7 May 2020.

Variation

The above example was created using a GPT-2 model tuned on a corpus of r/AskTeens Subreddit conversations.[10] However, the formula generalizes across a wide range of model variations. The first prompt (“I wish people understood me better”) consistently establishes a present-tense, first-person voice. The second (“I just want one thing”) invites the voice to reveal a desire. The third (“Why? Because”) invites a deeper desire. The fourth (“Maybe we’re not that different”) invites a connection based on this desire. For example, a GPT-2 model tuned on r/morticians conversations produces a structurally parallel portrait:

I wish people understood me better. I don’t have a lot to say, except that I really do care about what happens to your body after death. Have you ever had to do an exhumation? I have seen an exhumed body at the funeral home and it looked pretty, how you say, mummy-ish? I just want one thing. I want to be put in a nice, big ol’ box at the end. Why? Because I don’t want to make a big mess.

My brother went hunting yesterday. Brought back a deer. Now THAT was a mess. I don’t mind death, but the idea of being hunted is sick. Maybe we’re not that different. Anyway we’ve both made some strange choices.

10. Each tuning corpus contains 400 full Subreddit conversations collected using PRAW (the Python Reddit API Wrapper).
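A collection script might look roughly like the sketch below; the credentials, the Subreddit listing method, the limit, and the output format are illustrative assumptions:

```python
# Sketch of corpus collection with PRAW: full conversations (submission plus
# comments) from one Subreddit written to a plain-text tuning file.
# Credentials, listing method, and output format are illustrative assumptions.
import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",              # placeholder credentials
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="meta-cohesion corpus collector",
)

with open("askteens_corpus.txt", "w", encoding="utf-8") as corpus:
    for submission in reddit.subreddit("AskTeens").hot(limit=400):
        corpus.write(submission.title + "\n")
        if submission.selftext:
            corpus.write(submission.selftext + "\n")
        submission.comments.replace_more(limit=0)    # drop "load more comments" stubs
        for comment in submission.comments.list():   # flatten the whole comment tree
            corpus.write(comment.body + "\n")
        corpus.write("\n")
```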

In Exercices de Style, Queneau wrote the same story ninety-nine times, each time in a different style. In Exercises in Meta-cohesion, I push my template in a “multiplicity of directions”[11] as well, creating twelve distinct portraits, each from the same formula but a different tuning corpus.[12]

After twelve iterations, a cast of characters comes to life. In succession, the reader meets a Teen, Mortician, Gun Enthusiast, Eagles Fan, Intergalactic Traveler, Banker, Cook, Immigrant, Anarchist, Absurdist, Mother, and Girl.[13]

11. Queneau, Raymond, and Barbara Wright. Exercises in Style ... Translated by Barbara Wright. London, 1958.
12. Before Queneau wrote the now-famous ninety-nine versions, he began with twelve, the same number I begin with here.
13. These characters correspond to twelve different tuning corpora, each based on a particular Subreddit. Respectively: r/AskTeens, r/morticians/, r/Firearms, r/fantasyfootball, r/religion, r/bankers, r/KitchenConfidential, r/immigration, r/Anarchist, r/absurdism, r/women, and r/Parents. All corpora also have r/nyc Subreddit data, to establish some common ground.

Meta-cohesion

The work could stop here as a collection of pieces, each tied together through procedural continuity and a binding title (as are both Reasons to Live and Exercices de Style). But Exercises in Meta-cohesion adds one more linking step within the narrative frame of the portraits: each portrait includes exactly one sentence that connects its individual to another.

The Teen, for example, is introduced to the Mortician with “Yesterday, me and my friend overheard a guy talking about exhuming…” These cross-character bridges are the only human-generated text left in the final output.

I wish people understood me better. I wish my friends understood me better. I do not know if it’s because I am quiet and withdrawn, or it’s because my friends think i’m a little weird, or because i’m a little weird, or both. I just want one thing. I just want someone to talk to me. Why? Because I want to feel like a normal person.

Yesterday, me and my friend overheard a guy talking about exhuming a body part and how it was embalming and it was gross. And I thought, wow, that’s weird. Like, really weird. Like damn, what a weird person he is. Maybe we’re not that different. I don’t know. Maybe we all have some flaws. I wonder if he would talk to me.

In asking each character to connect to another, I invite them to travel outside of their isolated linguistic and narrative bubbles (something GPT-2 does not naturally do). Seen from above, the characters’ interactions weave a web, equal parts designed and organic: the Teen overhears the Mortician; the Mortician questions the Gun Enthusiast; the Gun Enthusiast defies the Woman; the Eagles Fan seeks a friend in the Intergalactic Traveler; the Intergalactic Traveler finds a confidant in the Eagles Fan; the Banker ponders advice from the Absurdist; the Cook caters to the Immigrant; the Immigrant observes the privilege of the angry Anarchist; the Absurdist tests the Cook; the Mother gives an understanding glance to the Girl; and in the end, the Girl sees our first character, the Teen.

These small moments of observation, in which each character reaches out to understand another, are the threads that create narrative out of bits. They are also the threads that create society out of selves. In this grand linking scheme, GPT-2 and I have very different roles: GPT-2 links words; I link the ideas that we — meaning-making humans — find in those words.


Exercises in Meta-cohesion can be read in full.

About
Anna is a creative technologist. Her work at Parsons explores procedural form through both text and image.