An interactive VR environment presenting historical 3D photos and colonial texts of China in 1900.

Map 1 and Map 2 from the box set. Courtesy of the Map and Data Library, University of Toronto

Introduction

Latent Cartographies is one of two artworks that resulted from a joint project between the VRC at HKBU and the eMPlus at EPFL. A paper was presented at SIGGRAPH Asia 2025 under the title Entangled Gazes: Reconstituting the Stereoscopic Box Set with LLMs and Virtual Reality. Stereoscopic box sets were multimedia storytelling devices that allowed their viewers to travel to and learn about distant worlds at a time when international travel was largely impossible. One such box set is China Through The Stereoscope: A Journey Through The Dragon Empire At The Time Of The Boxer Uprising, written and conducted by James Ricalton and published in 1901.

The journey begins

Ricalton starts his adventure in the Canton region, specifically in Hong Kong. After visiting today's Guangdong he travels to Shanghai, then up the Yangtze River, and ultimately reaches Beijing - the focal point of the Boxer Uprising. He records what he sees and hears along the way; those observations become the backbone of the book.
But there's more. He travels with a stereoscopic camera - a two-lens device that captures paired photographs with slight differences. When viewed correctly, those differences create depth, making the scene feel three-dimensional. Below are example stereographs from a set that originally accompanied the book.

Looking into Shappat-Po Street from one of the Nightwatch Bridges, Canton, China. Courtesy: Moonchu Foundation


Seeing double

At first glance, the two photos on a stereograph look identical. Look closer and small differences appear; the closer an object is to the camera, the more its position shifts between images. That shift (parallax) is a major cue our brain uses to perceive depth.
Viewed in a stereoscope, each image is shown to a different eye, allowing the brain to fuse the pair into a single 3D scene. Without a dedicated viewer on the web, we hint at this depth by alternating between the left and right images - just enough to suggest the 3D structure your eyes would otherwise reconstruct.
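In the standard pinhole stereo model, the shift of a point between the two images (its disparity) is inversely proportional to its distance, so nearby objects shift more than distant ones. The sketch below illustrates that relation only; the focal length in pixels and the lens separation are assumed, illustrative values, not measurements of Ricalton's camera.

```python
def disparity_px(depth_m: float, baseline_m: float = 0.08, focal_px: float = 1500.0) -> float:
    """Horizontal shift (in pixels) of a point between the left and right image.

    Pinhole stereo model: disparity = focal_length * baseline / depth.
    baseline_m and focal_px are assumed values chosen for illustration.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


# Nearby objects produce a larger shift than distant ones.
for depth in (2.0, 10.0, 50.0):
    print(f"object at {depth:5.1f} m -> shift of {disparity_px(depth):5.1f} px")
```

Alternating between the left and right image makes exactly this difference visible: foreground details jump noticeably, while the distant background barely moves.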

I wish to state to those not already familiar with the genuine realism of the stereograph that its power to produce vivid and permanent impressions on the mind is scarcely less than that of one's natural vision. — James Ricalton

What's inside the book

The book contains 100 short chapters, each paired with a stereograph. Ricalton visits 13 locations, and in eight of the larger places he gives the precise bearing and position for his photographs. It's important to recognize that this so-called “armchair travel” product was an immersive entertainment device aimed at Western audiences of the time. It offers rich impressions rather than objective documentation. As Ricalton moves north, his writing shifts toward political tensions and the Boxer Uprising, which dominates the book's later stages.

The objectives

Seeing the vivid photographs and detailed maps, we wanted to let people see through Ricalton's lens - and also see the lens itself: its assumptions, its omissions, its era. The book was produced for Western consumers and reflects a colonial viewpoint that shaped how the Boxers, and ultimately China, were portrayed. Our aim is to surface and challenge those portrayals, preserving the historical record while offering critical frames that foreground Chinese perspectives and the broader human cost of the conflict. As non-Chinese ourselves, we didn't want to add our own bias, or any AI's bias, on top of Ricalton's. Instead, we wanted to let the public explore the book's content and form with the help of modern tools and a critical eye. Our objectives and scope are therefore as follows:

Restore the book's historical content through digitization, cleaning, and structuring of the text and images
Amplify its immersive, multimedia character by combining text, maps, and stereoscopic images in a VR world
Create an intuitive browsing experience with modern tools, relying on text embeddings and information retrieval as the main means of navigation
Highlight the colonial gaze by breaking the book's linear, temporal form so visitors can find deeply concerning content in semantic clusters
Critique the author through communal annotation of the content

Preparing the historical content

The Moonchu Foundation, Hong Kong, provided a full set of the original photographs, which we were allowed to scan at high resolution. From there, it was careful manual work: cropping, aligning, cleaning, and color correcting each stereo pair. For this task we were fortunate to have help from stereography expert Paul Bourke.

Scanned pages are available via the Internet Archive. We started with their OCR and then manually corrected it.
We split each chapter into smaller chunks based on token count and punctuation. Each chunk, and each image description, was embedded using the mixedbread-embed-large model. This produces a 1,024-dimensional vector for each piece of text, allowing us to compute semantic distances - the foundation of our information retrieval.
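A minimal sketch of this kind of chunking is shown here. It is not the project's actual code: tokens are approximated by whitespace-separated words, the budget of 120 tokens is an arbitrary assumption, and the function name `chunk_text` is hypothetical. A production pipeline would count tokens with the embedding model's own tokenizer.

```python
import re


def chunk_text(text: str, max_tokens: int = 120) -> list[str]:
    """Split text into chunks that always end on sentence punctuation.

    Tokens are approximated by whitespace-separated words (an assumption).
    A single sentence longer than max_tokens is kept whole as its own chunk.
    """
    # Split after ., !, ? or ; when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?;])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would exceed the budget.
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# Placeholder text for illustration, not a quotation from the book.
sample = "One two three four five six. Seven eight nine ten eleven twelve; alpha beta gamma delta. Red green blue cyan pink."
for chunk in chunk_text(sample, max_tokens=12):
    print(chunk)
```

Breaking only at punctuation keeps each chunk readable on its own, which matters once the chunks are shown to visitors as standalone passages.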

To perceive the semantic structure in a virtual world, we reduced the 1,024-dimensional space to three dimensions using the UMAP algorithm. Effectively, this method spatially groups related text passages.

In the above example, 2,492 text chunks appear as points in a 3D space. Color encodes chapter order, with yellow to purple mapping to chapters 1 to 100, while meaning is encoded spatially as clusters. Just by looking at the color distribution, we can see that topics remain consistent across chapters, which suggests that the embedding captures the book's structure.

The User Experience

The following footage was captured at the Visualization Research Centre, Hong Kong Baptist University. Its 4-meter-tall, 8-meter-wide, 3D/360-degree LED wall - nVis - lets us immerse visitors in James Ricalton's adventure. Inspired by Ricalton's ideas about two realities - a physical and a conscious world - we separated the viewed content into two scenes.

Physical Realm: gallery view and handheld device for interaction


The Physical Realm

The first scene shows the full set of stereographs randomly placed in a 360-degree gallery view. A handheld device lets visitors reorganize the images by typing in a prompt, which filters the images by how relevant the prompt is to each image's chapter. Visitors can also click an image to foreground it, which scales the picture to about 3 m in height and reveals its 3D character. Alongside the picture, the location and angle of the shot are shown on one of the 7 maps Ricalton prepared, and the transcript of the accompanying chapter can be scrolled through on the right. With a selection foregrounded, visitors may again type a prompt into the handheld device, which highlights relevant passages in the chapter. Clicking on a passage then triggers the transition to the second scene.
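One plausible way to score the relevance of a prompt to a chapter is cosine similarity between embedding vectors. The sketch below assumes that approach; the source does not specify the exact scoring used. The function names are hypothetical, and the 3-dimensional toy vectors stand in for real 1,024-dimensional embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chapters(prompt_vec: list[float],
                  chapter_vecs: dict[int, list[float]],
                  top_k: int = 3) -> list[int]:
    """Return the chapter numbers most similar to the prompt, best first."""
    scored = sorted(
        chapter_vecs.items(),
        key=lambda item: cosine_similarity(prompt_vec, item[1]),
        reverse=True,
    )
    return [chapter for chapter, _ in scored[:top_k]]


# Toy vectors for illustration only (assumed, not real embeddings).
toy_chapters = {1: [1.0, 0.0, 0.0], 2: [0.7, 0.7, 0.0], 3: [0.0, 0.0, 1.0]}
toy_prompt = [0.9, 0.1, 0.0]
print(rank_chapters(toy_prompt, toy_chapters, top_k=2))
```

In the gallery, the images belonging to the top-ranked chapters would be the ones kept or moved to the front; the same scoring applied to chunks instead of chapters could drive the passage highlighting.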

The Conscious Realm

We exported the semantic model into the virtual space and replaced dots with text. A handheld mic captures a visitor's voice. Our Narrative Engine then retrieves the most relevant passages and smoothly guides a virtual camera to those excerpts. As the text glides by, visitors pick up cues for their next prompt; when the camera settles, nearby passages offer related context to continue exploring. This approach invites visitors to navigate by curiosity, not just chronology.
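How the camera is guided is not detailed here, but a common way to get smooth motion between two points is to ease the interpolation with a smoothstep curve. The following is a sketch under that assumption, with hypothetical function names; it is not a description of the Narrative Engine itself.

```python
def smoothstep(t: float) -> float:
    """Ease-in/ease-out curve: 0 at t=0, 1 at t=1, zero slope at both ends."""
    t = max(0.0, min(1.0, t))  # clamp progress to [0, 1]
    return t * t * (3.0 - 2.0 * t)


def camera_position(start: tuple[float, float, float],
                    target: tuple[float, float, float],
                    t: float) -> tuple[float, ...]:
    """Interpolate a 3D camera position from start to target for progress t."""
    s = smoothstep(t)
    return tuple(a + (b - a) * s for a, b in zip(start, target))


# Sample the path at a few progress values (coordinates are made up).
start, target = (0.0, 0.0, 0.0), (10.0, 0.0, -4.0)
for step in range(5):
    print(camera_position(start, target, step / 4))
```

Because the curve starts and ends with zero slope, the camera accelerates gently away from the current passage and slows down as it settles on the next one, leaving time to read text along the way.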

What's next

With the basics in place, we're focusing on:

Better tools to surface and contextualize political worldviews
Richer text analysis: semantic relations, topic modeling, named entities, and paragraph-level function tags (e.g., scene description, cross-reference)
Integrating geolocation to deepen the sense of journey and place
More natural interaction via features like SQL generation, web search, and prompt reformulation

Our hope is to preserve the book's immediacy and wonder while making its perspectives legible - so visitors can see both the world Ricalton recorded and the lens through which he recorded it.

About the Boxers (Yihetuan Movement)

The Boxer Uprising (1899-1901), known in Chinese as the Yihetuan Movement (trad. Chinese: 義和團運動), was a complex, fast-moving conflict centered in northern China. The participants, commonly called Boxers, were members of local militias - formally the Yihetuan, often rendered in English as “Righteous and Harmonious Fists.” Their movement grew out of years of social strain: droughts and famine, economic disruption, anti-foreign sentiment, and resistance to missionary activity. Some factions within the Qing court sympathized with, and at times supported, the Boxers; others opposed them.
In 1900, violence escalated around Beijing and Tianjin, culminating in the siege of the foreign legations and a multinational intervention by the Eight-Nation Alliance. The aftermath included heavy casualties among Chinese civilians, Boxers, Qing soldiers, and foreigners, and it ended with the 1901 Boxer Protocol - large indemnities, foreign garrisons, and further entanglement with imperial powers.