Paper Cities
To the sky.

Hartmann Schedel (German, 1440–1514), View of Florence from Liber chronicarum (Nuremberg Chronicle) (Nuremberg: Anton Koberger, 1493).
The Clark Library, NE1255 S35
Daisy Alioto on OpenAI’s Sora and an ongoing art exhibition at the Clark Art Institute.
On February 17th, 2024, OpenAI introduced Sora: “a new AI model that can create realistic and imaginative scenes from text prompts.” The video debut is over 10 minutes long and, at least on YouTube, completely silent. Each prompt is timestamped, with video clips ranging from a “Candle Monster” to a “Surfing Otter.” The clip that generated the most buzz, at least on my timeline, comes in at 08:31 and is called simply, “Tokyo Train.”
Each clip is preceded by a title card with the prompt that allegedly produced the moving image. “Reflections in the window of a train traveling through the Tokyo suburbs,” reads this one. We see brown and tan apartment complexes, the silhouettes of other passengers, and—briefly—a potential self, a young woman with bangs and wired headphones. If there are flaws in this video, my eyes aren’t discerning enough to note them. The silence was pretty eerie though.

In Europe, the mesolithic period is more of an idea than a time period. “It has become clear that the term has significance only in a temporal sense,” writes archaeologist T. Douglas Price, “The Mesolithic is not associated exclusively with the utilization of microlithic tools, nor with the exploitation of forests and coasts, nor with the domestication of the dog.” Instead, like the stone that is carved away and cast off to reveal a statue, the mesolithic is primarily known as absence: the retreat of the last Ice Age, the lack of cultivated agriculture and ceramics, the void of permanent structure. This period lasted roughly from 10,000 to 8,000 BC.
“Time is not a cage in which the ‘no longer now,’ the ‘not yet now,’ and the ‘now’ are cooped up together. How do matters stand with ‘time’? They stand thus: time goes,” writes Martin Heidegger in What Is Called Thinking? “What comes in time never comes to stay, but to go. What comes in time always bears beforehand the mark of going past and passing away. This is why everything temporal is regarded simply as what is transitory.” In this way, the mesolithic is like every epoch.
It was an era defined by triangles: the microliths that made up spear tips and flint tools, the tents that provided temporary shelter, and the shape of the glacial shelf retreating north through modern-day Britain. The Venus of Willendorf, the most erotic collection of triangles of all, is even older: paleolithic. Similar carvings have been found from France to Siberia. Venus is a stable relic from a nomadic time, an axis the world could rotate around. We haven’t changed much.
In his 1965 article Utopia, the City and the Machine, historian and sociologist Lewis Mumford writes about “the pre-urban community of the Neolithic cultivator,” the period directly after the mesolithic. “The members of the community shared in its goods and its gods—in which there was no ruling class to exploit the villagers, no compulsion to work for a surplus the local community was not allowed to consume, no taste for idle luxury, no jealous claim to private property, no exorbitant desire for power, no institutional war,” writes Mumford. These are conditions that Classical thinkers, like Plato, would later describe as the attributes of a utopian society.
But then came something new: the city. “Much of the contents of the city—houses, shrines, storage bins, ditches, irrigation works—was already in existence in smaller communities: but though these utilities were necessary antecedents of the city, the city itself was transmogrified into an ideal form—a glimpse of eternal order, a visible heaven on earth, a seat of the life abundant—in other words, utopia,” Mumford writes.
For as long as we have had cities, people have been fascinated by them—and no city is more fascinating than the one we don’t live in. Paper Cities, an exhibition currently on at The Clark in Williamstown, Massachusetts through June 23, 2024, tells the story of what happened when accounts of foreign cities were no longer primarily written and oral. Here, we jump 1800 years after Plato.
“In the sixteenth century, a vast consumer market emerged (largely in Europe) for images of cities, spurred by developments in print technology and new global exploration,” write the curators of Paper Cities, which includes travel books and artists’ own renditions of maps. The exhibition is divided into three sections: The City in View, The City in Focus, and The City in the Background. The museum provides clear magnifying glasses for visitors because so many of the prints are dense with small details.
I was drawn to Albrecht Dürer’s Saint Anthony Reading, a 1519 engraving on paper categorized within The City in the Background. The saint reads barefoot above a river, his hooded robe tipped like a seashell, beard long enough to tuck under his book. The buildings behind him wind higher and higher, from least to most fortified. The eye loses itself in the pile; even with a magnifying glass, the furthest connecting roofs might as well be part of an Escher.
But it wasn’t the art, rather the label, that caught my eye: “Albrecht Dürer reproduced the background of a drawing that he made two decades prior in this print. It has not been identified as a specific location; rather it is a generalized representation of towns that would have existed during this period.”
And here we have a fundamental truth about the public appetite for cities. It is not stable; rather, the pendulum swings between the specific and the composite in step with the available mediums of the time. Sixteenth-century citizens wanted more detail, so they created the demand that brought forth works like the panoramic Long Venice (1886) by James McNeill Whistler, best known for his painting of his mother.
Long Venice is, well, long. As I walked through The City in View section of the exhibition I regretted not taking a magnifying glass. My discomfort was in accord with the narrative of the exhibition, though, which quickly turned to The City in Focus—photography. The label of A.C. Webb’s New York Cityscape (1931), an etching on wove paper, explains the turn to the lens. “Astonished and disoriented by the massive buildings and mechanical architecture of industrialization, A.C. Webb produced numerous views of large American cities.” The artist retreated from detail into blocked forms both by choice and necessity.
With photography, the pendulum swung back to mechanical and industrial details—skylines otherwise hard to capture in a single etching. Then, of course, to overhead video. An even better way to take in the city. But city footage is in a way a composite as well—so personal photography proliferated. Eventually, we consume cities in TikTok-framed bursts. And now, we can prompt engineer the city from both close up and afar.
My hope is that the ability to instantly teleport into these AI-generated composites will spur new desires, similar to the dissatisfaction that created the marketplace for visuals in the first place, shepherding people toward firsthand accounts of city life. Paul Skallas, best known as prolific tweeter LindyMan, had a strong reaction to Sora’s video launch. “Video and the ‘image’ is entering its Faustian stage and will burn out,” he tweeted, adding, “I’m long on text.”
I compared Sora’s “Tokyo Train” prompt to a passage in one of my favorite books, Breasts and Eggs by Mieko Kawakami, in which the protagonist, Natsu, travels from Tokyo to Osaka by rail. “Maybe it was the way people held themselves. How they made eye contact, how they walked…I watched the way that people moved and listened to the folks beside me talking. Once I was on the next train, I gazed out on the city through the windows. The train leaned into turns and rumbled down the tracks, uncomfortable reminders of the laws of gravity.”
Or consider a building like the Sphere, which broadcasts its message across the world through outdoor advertising, echoing across the internet in images real and doctored. It still demands to be visited. “Like with cathedrals or caves, the first thing I noticed when I got to the main space was not exactly the structure itself, but all the air it held above my head, empty space I could feel and hear,” writes Elena Saavedra Buckley in The Paris Review.
I’m not afraid of Sora (which means “sky” in Japanese). A technology can be productive and useful, even beautiful, and still not compare to the real and the visceral—something that literature will always provide. In autofiction, especially about cities, we see the need to zero in on specific details prompted by the brain’s most powerful engineer—memory, a composite trained on inputs we scrape from our brief existence. After all, Sora may be trained on the world, but not the way we are trained on it. How could it be?
However, I am afraid that as these images achieve more technical perfection, it will be harder to make the case for urban preservation, which is urgently needed, whether in Venice or Palmyra. AI can do the “generalized representation” work Dürer once did by hand, flooding the internet with images of Tokyo best understood as the idea of Tokyo. AI is also a very poor crutch for the city intentionally destroyed by war or recklessly surrendered to climate change.
Lewis Mumford again: “In the end, utopia merges into the dystopia of the twentieth century; and one suddenly realizes that the distance between the positive ideal and the negative one was never so great as the advocates or admirers of utopia had professed.”
When we tire of the number of pixels at our fingertips, we must default to experiencing the world one triangle at a time.