
Building Touring Test, a geography guessing game with AI-generated content

2026-03-16

Back in the summer of 2023, when DALL-E-2 and GPT-3.5 had just become accessible via API, I had the idea to make a GeoGuessr-like game with AI-generated text and images. This became Touring Test. The game featured paintings from DALL-E-2 and various writing samples (travel blogs, poems, landmark descriptions) from GPT-3.5.

In early 2026, I rebuilt much of the backend and UI of the game, and thought it would be a good opportunity to write my first ever personal blog post.

Selecting cities

Selecting cities was surprisingly difficult. I wanted about 250 cities that were mostly well known, without bias towards a particular country or region. The most obvious choice was to use the list of cities from the Wikipedia article on the most populous cities in the world, but this had several issues.

I ended up hand-selecting cities from several most-populous-cities lists, and also including a few personal picks. This resulted in a slightly biased list, but I think it's a good compromise.

Generating the content for the game

The game had two types of content: paintings and text. The paintings were generated using DALL-E-2, and the text was generated using GPT-3.5. The scripts and data for generating the content are available here.

Paintings

The paintings were generated with DALL-E-2 from simple prompts that contained the city and country name and asked for either a painting, an oil painting, or simply a picture. Some examples of the paintings are shown below:

Almaty, Kazakhstan (DALL-E-2, 2023)

Ottawa, Canada (DALL-E-2, 2023)

Amritsar, India (DALL-E-2, 2023)
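The prompt format described above can be sketched roughly as follows. This is only an illustration: the function name, the exact wording of the style phrases, and the random choice between them are assumptions, not the original generation script.

```python
import random

# Hypothetical style phrases. The post only says the prompts asked for
# a painting, an oil painting, or simply a picture.
STYLES = ("A painting", "An oil painting", "A picture")


def build_image_prompt(city, country, rng):
    """Return a short prompt that names the city and country in one style."""
    style = rng.choice(STYLES)
    return f"{style} of {city}, {country}"


rng = random.Random(0)  # seeded so repeated runs produce the same prompts
for city, country in [("Almaty", "Kazakhstan"), ("Ottawa", "Canada")]:
    print(build_image_prompt(city, country, rng))
```

Keeping the prompt this short leaves most of the interpretation to the model, which matches the simple prompts described above.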

Why modern image models don't work as well

When DALL-E-3 came out, I was excited to try it and used it to generate some paintings. Surprisingly, the results were much worse for the purposes of this game. Despite my best efforts with prompting, the paintings were far more detailed than DALL-E-2's, and they often included elements that aren't present in the actual city or made the guess far too easy. I recently tried again with Nano Banana 2 (Google Gemini's image generation model) and ran into similar issues.

Hong Kong, China (DALL-E-2, 2023)

Hong Kong, China (Nano Banana, 2026)

Hong Kong, China (DALL-E-3, 2024)

As seen in the paintings of Hong Kong above, the original DALL-E-2 version already features a recognisable skyline, leaning towards easy but still a fair challenge. Modern models go much further: adding a dramatic sunset, a harbour full of boats, and geographic details that don't exactly match Hong Kong. The boats make it immediately obvious you're looking at a Chinese coastal city, while the invented details undermine any confidence in your answer.

Travel blogs, poems, and landmark descriptions

The game also featured various pieces of text such as travel blogs, poems, and landmark descriptions. At the time, GPT-3.5 was the best model available via API. It was frustratingly poor at following constraints, especially negative ones such as "do not reveal the city name". This meant I had to rely on good old scripts to scrub out the city and country names and replace them with *** to hide the location from players. This was largely fixed with GPT-4, but I never got around to regenerating the text clues.
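A minimal sketch of that kind of scrubbing, assuming a plain list of names to hide per city. The function name and the matching rules (whole words only, case-insensitive) are assumptions for illustration, not the original script.

```python
import re


def scrub_location(text, names, mask="***"):
    """Replace every whole-word occurrence of the given names with the mask."""
    if not names:
        return text
    # Try longer names first so a multi-word name wins over a shorter prefix.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # The lookarounds stop a name from matching inside a longer word.
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", flags=re.IGNORECASE)
    return pattern.sub(lambda _match: mask, text)


sample = "As I stroll down the bustling streets of Hanoi, Vietnam, I am struck"
print(scrub_location(sample, ["Hanoi", "Vietnam"]))
```

A name list like this only hides exact matches. Landmark names and adjectives derived from the place name would pass through unchanged and still give the location away.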

The samples below are the raw GPT-3.5 output; notice that both the travel blog and the landmark description name the city and country directly, which is exactly the problem the scrubbing script had to fix.

Travel Blog sample - Hanoi, Vietnam

As I stroll down the bustling streets of Hanoi, Vietnam, I am struck by the vibrant energy of this bustling city. From the delicious street food to the stunning architecture, there is something truly magical about exploring Hanoi.

One of my favorite experiences was wandering through the charming Old Quarter, where narrow streets are lined with colorful buildings and shops selling everything from silk scarves to handcrafted ceramics. As I weaved my way through the labyrinthine alleys, the scent of sizzling noodles and fragrant spices filled the air.

But Hanoi isn't just about the food (although it's definitely a highlight!). The city is also home to numerous historic landmarks, such as the majestic Ho Chi Minh Mausoleum and the beautifully ornate Temple of Literature.

Landmark description sample - Vienna, Austria

One of the most famous landmarks in Vienna, Austria is the Schönbrunn Palace. This Baroque-style palace was built in the 1700s as a summer residence for the imperial family. It has over 1,400 rooms and is home to multiple art galleries, museums, and gardens. The palace has a rich history, having been the site of various important events such as the signing of the Treaty of Schönbrunn in 1809 and the state funeral of Emperor Franz Joseph in 1916. It is also a UNESCO World Heritage Site and a popular tourist attraction in Vienna.

Poem sample - Montreal, Canada

Montreal, city of wonder
A vibrant spirit that will never sunder
Streets alive with art and sound
Building old and new that astound

A blend of French and English pride
A culture that you cannot hide
Mont-Royal standing tall and grand
A symbol of the city’s brand

Hallucinations

Text samples sometimes included hallucinations, especially for "interesting facts" prompts and landmark descriptions. With lesser-known cities, GPT-3.5 would sometimes invent a landmark that doesn't actually exist. This was hard to verify at the time without a lot of manual work, though I did get a few reports about it via the feedback Google Form. More modern models would likely hallucinate less often, but including the city's Wikipedia article as context in the prompt would probably be the right solution.

Building the game

The game was originally built in 2023 the way you would build enterprise software. Not because I thought it was a good idea, but mostly because that's what I knew and I didn't care enough about this part to build it well. I've recently completely reworked the backend and redesigned the frontend to make it feel a lot more like a geography game (and work a lot better!).

Building and hosting the frontend

The frontend is a standard React/Next.js app with Material UI components, hosted on Cloudflare Pages. Initially it used the default white-and-blue Material UI theme, but I recently redesigned it with a green and tan theme that fits the game better.

Original UI (2023): default Material UI white/blue theme.

New UI (2026): custom green/tan theme matching the game style.

The map uses MapLibre with static tiles hosted alongside the website. Since scoring is based on how close your guess is to the correct city, the map doesn't need precise coordinates. Knowing the approximate location is enough, and using low-fidelity static tiles is much simpler than integrating something like OpenStreetMap or Google Maps.
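Distance-based scoring of this kind can be sketched with the haversine formula for great-circle distance. The scoring curve below (1000 points at zero distance, decaying exponentially) and its parameter values are purely hypothetical; the post does not say how the game turns distance into points.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def score_guess(distance_km, max_points=1000, scale_km=2000.0):
    """Hypothetical curve: full points at 0 km, falling off with distance."""
    return round(max_points * math.exp(-distance_km / scale_km))


# A guess placed on Vienna when the answer is Hanoi (approximate coordinates).
distance = haversine_km(48.21, 16.37, 21.03, 105.85)
print(round(distance), score_guess(distance))
```

Because the score depends only on the distance between two points, a click on a coarse map is precise enough, which fits the low-fidelity tiles described above.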

Building the original backend

As I mentioned earlier, the original backend was built like overly complicated enterprise software. The game is turn-based: each round has a timer, and every player's guess needs to be broadcast to all others. Simple enough, but I significantly over-engineered it. It used a web server hosted on GCP Cloud Run to handle HTTP polling from web clients, while game timeouts were handled by a separate Cloud Run instance reading from a GCP Pub/Sub queue, triggered whenever a round would finish or time out. To avoid race conditions between a round ending early (when all players guessed before the timeout) and the timeout itself firing, Redis locking was used. The game state itself was also stored in a temporary Redis entry.
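The race is easier to see in a small model. The sketch below uses an in-process threading.Lock as a stand-in for the Redis lock, and the class and method names are hypothetical. The point is only that "is the round still open?" and "close it" have to happen as one atomic step, whichever side gets there first.

```python
import threading
from typing import Optional


class Round:
    """One round that can be ended by the last guess or by the timeout."""

    def __init__(self, players):
        self.players = set(players)
        self.guesses = {}
        self.finished_by: Optional[str] = None
        self._lock = threading.Lock()  # stand-in for the Redis lock

    def submit_guess(self, player, lat, lon):
        """Record a guess; return True if it was accepted."""
        with self._lock:
            if self.finished_by is not None or player not in self.players:
                return False
            self.guesses[player] = (lat, lon)
            if len(self.guesses) == len(self.players):
                self.finished_by = "all_guessed"  # round ends early
            return True

    def on_timeout(self):
        """Called by the timeout handler; True only if this call ended the round."""
        with self._lock:
            if self.finished_by is not None:
                return False  # the last guess already closed the round
            self.finished_by = "timeout"
            return True


game_round = Round(["ana", "ben"])
game_round.submit_guess("ana", 21.0, 105.8)
game_round.submit_guess("ben", 48.2, 16.4)
print(game_round.finished_by, game_round.on_timeout())
```

Without the lock, the timeout handler and the web server could both see an open round and both try to finish it.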

Replacing the backend

I ended up tearing down the backend completely and replacing it with PartyKit. PartyKit uses WebSockets instead of HTTP polling, which improved response times significantly and made multiplayer feel noticeably more responsive. It also supports timers and broadcasting, which meant I could get rid of the queue entirely. Going from three separate services and a message queue down to a single PartyKit worker dramatically simplified the codebase.
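To illustrate why a single stateful worker removes the need for the queue and the lock, here is a rough model using Python's asyncio. It is not PartyKit's API: the Room class, its methods, and the list standing in for a WebSocket broadcast are all assumptions. Because one event loop owns the state, the timer and the incoming guesses cannot interleave in the middle of an update.

```python
import asyncio


class Room:
    """In-memory room: one event loop owns all state, so no lock is needed."""

    def __init__(self, players, round_seconds):
        self.players = set(players)
        self.round_seconds = round_seconds
        self.guesses = {}
        self.outbox = []  # stand-in for messages broadcast to every client
        self._all_guessed = None  # created when a round starts

    def broadcast(self, message):
        self.outbox.append(message)

    def on_guess(self, player, lat, lon):
        if self._all_guessed is None or player not in self.players:
            return  # no round running, or unknown player
        self.guesses[player] = (lat, lon)
        self.broadcast({"type": "guess", "player": player})
        if len(self.guesses) == len(self.players):
            self._all_guessed.set()

    async def run_round(self):
        """Wait for every guess or for the timer, whichever comes first."""
        self._all_guessed = asyncio.Event()
        try:
            await asyncio.wait_for(self._all_guessed.wait(), timeout=self.round_seconds)
            reason = "all_guessed"
        except asyncio.TimeoutError:
            reason = "timeout"
        finally:
            self._all_guessed = None  # late guesses are ignored
        self.broadcast({"type": "round_over", "reason": reason})
        return reason


async def demo():
    room = Room(["ana", "ben"], round_seconds=5.0)
    round_task = asyncio.create_task(room.run_round())
    await asyncio.sleep(0)  # let the round start before guesses arrive
    room.on_guess("ana", 21.0, 105.8)
    room.on_guess("ben", 48.2, 16.4)
    await round_task
    return room.outbox


print(asyncio.run(demo()))
```

The timer, the state, and the broadcast all live in one place here, which is the simplification the rewrite was after.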

Releasing and sharing the game

I like getting projects in front of real users, so I shared the game on various geography and quiz subreddits. It peaked at around 2k monthly active users according to Cloudflare Analytics. A few people made YouTube videos about it (1, 2), and I received a handful of responses through the feedback form with suggestions and reports of AI hallucinations. Not a blockbuster, but good enough for a mostly experimental side project.

Closing thoughts on AI-generated art

As of early 2026, AI image generation models are good at generating attractive images, but I haven't found them practical enough for building software products yet. Older models worked well enough for this game: the images were recognisable and captured the feel of each city. The main limitation I ran into was following constraints, as seen in my attempts to prompt for lower-fidelity oil paintings.

I recently tried generating textures and 3D models for my Minecraft mod Little Logistics, with similarly poor results. As someone who is not great at making art, I'll continue commissioning paid artists for projects that need to look good.