Collective Note: We're opening the notebook early. Twelve Balloons is where we explore what happens when interfaces start listening and responding. This space will collect our prototypes, reflections, and the occasional provocation as we design interfaces that talk.
So We Built This Thing
There wasn't a grand plan. Just a feeling. We work with AI, but we didn't want to build another chatbot. Well… we kind of already did. Our Twelve Balloons chatbot exists, but that's a story for another post. This time, we were after something else. Something you could feel. Something responsive. Designed with care. And, above all, something useful.
So we started with music. We've always cared about it, all kinds, all moods. It felt like the right place to begin.
Now, does the world need yet another way to play "Espresso Macchiato"? Probably not. But imagine calling up a song just by saying it in a conversation. Not searching. Or better yet, imagine a system that understands what you're already talking about and starts the right track without you even asking.
Thatâs the shift we were curious about. It led us to a bigger thought:
What if conversation wasn't just an extra layer?
What if it was the interface itself?
Technically neat, but not the point
Where it actually lives
Our Spotify Agent isn't a product. It's not even standalone. When you use it, you're inside the Twelve Balloons chatbot interface. That's the whole idea.
You log in with a button right there in the chatbot. The browser opens Spotify's own login flow, and OAuth handles the permissions directly.
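For the curious, that login is the standard authorization-code dance against Spotify's accounts endpoints. A minimal TypeScript sketch; the client ID, redirect URI, and function names here are placeholders, not our actual configuration:

```ts
// Sketch of Spotify's authorization-code flow.
// CLIENT_ID and REDIRECT_URI are placeholders.
const CLIENT_ID = "your-spotify-client-id";
const REDIRECT_URI = "https://example.com/callback";

// Step 1: send the user to Spotify's own login page.
function buildAuthorizeUrl(): string {
  const params = new URLSearchParams({
    client_id: CLIENT_ID,
    response_type: "code",
    redirect_uri: REDIRECT_URI,
    scope: "user-read-playback-state user-modify-playback-state",
  });
  return `https://accounts.spotify.com/authorize?${params}`;
}

// Step 2: after the redirect, exchange the code for an access token.
async function exchangeCode(code: string, clientSecret: string): Promise<string> {
  const res = await fetch("https://accounts.spotify.com/api/token", {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Authorization: "Basic " + btoa(`${CLIENT_ID}:${clientSecret}`),
    },
    body: new URLSearchParams({
      grant_type: "authorization_code",
      code,
      redirect_uri: REDIRECT_URI,
    }),
  });
  const json = await res.json();
  return json.access_token;
}
```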
Under the Hood
Let's get something out of the way. This isn't a technical breakthrough. It's just a working UI that feels right. We're not claiming it's a new protocol. It's a tool call. It behaves a bit like MCP, but there's no official integration, and Spotify doesn't offer one anyway.
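Concretely, "a tool call" means declaring a capability the model can invoke with structured arguments. A rough sketch of what that declaration can look like; the name, description, and schema follow the common JSON-Schema function-calling convention and are illustrative, not an MCP or Spotify spec:

```ts
// Illustrative tool declaration the agent could expose to the model.
// The schema shape follows the usual function-calling convention.
const playSongTool = {
  name: "play_song",
  description:
    "Search Spotify for a track and start playback on the user's active device.",
  parameters: {
    type: "object",
    properties: {
      query: {
        type: "string",
        description:
          "Free-text description of the track, e.g. 'Three Little Birds by Bob Marley'",
      },
    },
    required: ["query"],
  },
};
```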
Say I ask it to play a song by Bob Marley. The agent queries Spotify, gets a list of matching tracks, grabs the top result by its ID, and sends the play command.
No scrolling, no clicking, no typing out track names. You just ask, and it plays on the device where you're logged in. There's a caveat, though: Spotify has its own rules. It only plays music when the player is active and in the foreground. If it's not, nothing happens. So we had to work around that. Not ideal, but fair enough.
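Condensed into code, the whole sequence looks roughly like this, including the active-device check that caveat forces. The endpoints are Spotify's documented Web API; the function name and reply strings are ours, and token refresh and error handling are left out for brevity:

```ts
// Sketch of the search → top result → play sequence.
async function playTopResult(token: string, query: string): Promise<string> {
  const headers = { Authorization: `Bearer ${token}` };

  // 1. Search for tracks and take the top hit.
  const searchRes = await fetch(
    `https://api.spotify.com/v1/search?${new URLSearchParams({
      q: query,
      type: "track",
      limit: "1",
    })}`,
    { headers }
  );
  const track = (await searchRes.json()).tracks.items[0];
  if (!track) return "No matching track found.";

  // 2. Playback only works against an active device, so check first.
  const devRes = await fetch("https://api.spotify.com/v1/me/player/devices", {
    headers,
  });
  const active = (await devRes.json()).devices.find(
    (d: { is_active: boolean }) => d.is_active
  );
  if (!active) return "Open Spotify on a device first, then ask again.";

  // 3. Send the play command with the track's URI.
  await fetch("https://api.spotify.com/v1/me/player/play", {
    method: "PUT",
    headers: { ...headers, "Content-Type": "application/json" },
    body: JSON.stringify({ uris: [track.uri] }),
  });
  return `Now playing: ${track.name}`;
}
```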
The UI is the thing
Whether it's a tool call, an MCP instruction, or some other framework, it's all background noise. What matters is the feel. The responsiveness. The way the screen reacts to your sentence and the way the button shifts when a track starts.
Everything happens inside the conversation. No extra layers. No clutter. Just a clear action and a fast reply. Something that behaves the way you expect it to.
That's where we're aiming.
That's what makes it interesting.
And honestly, we think that's the beauty of it.
We shaped the music player in a soft-edged, neumorphic shell and deployed it quietly to Vercel. It has a sense of physicality: touchable edges, something you'd want to press, even though it's digital. The UI stays minimal, with just enough feedback to feel alive. No flashy transitions.
Right now, it's separate from the chatbot, but everything points toward convergence. The idea is for design and conversation to eventually meet in one space.
You communicate, it listens. Then it acts. You can queue songs, skip, shuffle, turn it up, all through natural language. No keyword memorising, no command syntax, no visual overload. The UI responds immediately.
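Under the hood, each of those phrases resolves to one small Web API call. A sketch of the mapping; the intent names are invented for illustration, while the endpoints are Spotify's real ones:

```ts
// Illustrative mapping from recognised intents to Spotify Web API calls.
async function runIntent(token: string, intent: string, arg?: string) {
  const headers = { Authorization: `Bearer ${token}` };
  const api = "https://api.spotify.com/v1/me/player";

  switch (intent) {
    case "queue": // "queue that up next"
      await fetch(`${api}/queue?uri=${encodeURIComponent(arg!)}`, {
        method: "POST",
        headers,
      });
      break;
    case "skip": // "skip this one"
      await fetch(`${api}/next`, { method: "POST", headers });
      break;
    case "shuffle": // "shuffle it"
      await fetch(`${api}/shuffle?state=true`, { method: "PUT", headers });
      break;
    case "volume_up": // "turn it up"
      await fetch(`${api}/volume?volume_percent=80`, { method: "PUT", headers });
      break;
  }
}
```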
Our Collective
We're a collective of people who build with language and interaction: writers, designers, developers, researchers, and people who blur those lines.
The goal isn't to chase hype or crank out demos. It's to create things that actually work. Tools that feel alive. Interfaces that listen and do something useful.
This Spotify Agent is our first working piece. Not the biggest. Just the first to fly off 🎈.
Built by two brothers, Michele and Claudio Romano, as the first of many Twelve Balloons experiments.





