Thoughts on XMTP

I use 9 messaging services to varying degrees: Slack, Discord, Telegram, WhatsApp, Facebook Messenger, Signal, SMS, Google Voice, and Twitter. Not counting various narrower inboxy things for vertical services and “secure messaging centers” associated with banking or healthcare apps. About half of these ride on top of phone numbers in one way or another, for discovery and meta-alerting, and the other half ride on email (which of course is now primarily for password resets, 2FA codes, official government notices, and newsletters, with the Substack app trying to siphon away that last category and reduce it to pure messaging middleware).

While there is some rationale for segregated basic comms (I think financial and healthcare being segregated is good, and perhaps some, like dating, might have softer cases for being compartmentalized), this is overall obviously a stupid situation. Stupid but foreseeable, since every agent in the picture except the end user has an interest in aggregating and walling off their share of messaging flow for various other reasons, and both of the identity systems underlying these services are controlled by third parties you’re forced to trust (the phone carrier or the email provider).

And trying to solve the problems within the paradigms (web/email and phone systems) that created it will inevitably lead to the xkcd “Now there are 15 competing standards” situation. In the non-cartoon world, we have the beyond-stupid green bubble/blue bubble wars in Appleverse and the associated sorry tale of Beeper. Is there no way out of this misery?

Enter XMTP. It’s not a solution so much as a way to potentially replace this tired old misery with a fresh new one. The xkcd cartoon dynamic will reappear, but at a whole new locus, and that might be at least fun.

The cost of a workable single standard, for the moment, is apparently unavoidable conflation with financial transactions on blockchains. Let me explain, to the extent I understand what’s going on.

In brief, XMTP is a messaging protocol that implements the by-now-familiar architectural pattern of separating a decentralized, encrypted data state and persistence layer (a network of nodes) from a public identity and discovery layer, using a user-controlled public/private keypair to hook up the two layers. Blockchains are the most familiar and fully realized example, but other kinds of decentralized systems use versions of this basic architectural pattern, usually in less-complete ways. The pattern itself is validated, and here to stay. The question is whether we can do messaging with it.
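To make the layering concrete, here is a toy sketch in Python. It assumes nothing about XMTP’s actual wire format or cryptography: the names (`keygen`, `address_of`, `DataLayerNode`) are hypothetical, the Schnorr-style signature runs over a deliberately tiny group that offers no real security, the address only loosely mimics Ethereum’s hash-of-public-key addresses (SHA-256 here, not Keccak-256 over secp256k1), and encryption is elided, so payloads are treated as already-encrypted bytes. What it does show is the shape of the pattern: the only link between a public address and the data the network persists is possession of the matching private key.

```python
import hashlib
import secrets

# Toy Schnorr group: P = 2*Q + 1 is a safe prime and G = 4 generates the
# order-Q subgroup. Far too small to be secure; real systems use curves such as
# secp256k1 (Ethereum). Illustration only.
P, Q, G = 2039, 1019, 4


def keygen() -> tuple[int, int]:
    """Return (private_key, public_key). The private key never leaves the user."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def address_of(public_key: int) -> str:
    """Identity/discovery layer: a public address derived by hashing the public key."""
    return "0x" + hashlib.sha256(str(public_key).encode()).hexdigest()[-40:]


def _challenge(r: int, message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(str(r).encode() + message).digest(), "big")


def sign(private_key: int, message: bytes) -> tuple[int, int]:
    k = secrets.randbelow(Q - 1) + 1  # fresh per-signature nonce
    e = _challenge(pow(G, k, P), message)
    return e, (k + private_key * e) % Q


def verify(public_key: int, message: bytes, sig: tuple[int, int]) -> bool:
    e, s = sig
    # G^s * Y^(-e) recovers G^k for a valid signature (exponents live mod Q).
    r = (pow(G, s, P) * pow(public_key, (-e) % Q, P)) % P
    return _challenge(r, message) == e


class DataLayerNode:
    """Persistence layer: stores opaque envelopes and checks signatures, not accounts."""

    def __init__(self) -> None:
        self.store: dict[str, list[tuple[str, bytes]]] = {}

    def publish(self, to: str, sender_pub: int, payload: bytes, sig: tuple[int, int]) -> bool:
        # The signature must cover both the recipient address and the payload.
        if not verify(sender_pub, to.encode() + payload, sig):
            return False
        self.store.setdefault(to, []).append((address_of(sender_pub), payload))
        return True
```

Any node can check that an envelope addressed to `to` really came from whoever holds the key behind the sender’s address, without a username, phone number, or email account appearing anywhere.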

To use XMTP, you use an Ethereum wallet address as your identity, and any wallet (or generic sort of app) that can manage keypair-based identifiers can access the single, unified message history. Neat, huh? You can even achieve various sorts of soft segregation, I suspect, by designing the apps correctly. The identity, being attached to the user-controlled keypair rather than to a fragile 2FA-based client-server relationship, is entirely portable. There is no data portability issue at all, since the data is neither in the client layer nor associated with a particular centralized server. If you have a Coinbase wallet, you can use XMTP already (and as far as I can tell, you don’t actually need any tokens in the wallet; XMTP is just using the global addressing capability of the Ethereum blockchain, with message passing itself being offchain). If you decide you don’t like Coinbase wallet, you can take your keys and go somewhere else, and your messages will just go with you.
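A minimal sketch of that portability claim, assuming one shared message network and treating a client as nothing more than the holder of a private key. `Network`, `Client`, and `export_key` are hypothetical names, the toy key arithmetic is not secure, and encryption is omitted; real clients do considerably more. The point is only that history is stored against the address rather than inside the app, so a second app given the same key sees the same inbox.

```python
import hashlib
import secrets

P, Q, G = 2039, 1019, 4  # toy group parameters; NOT secure, illustration only


def address_of(public_key: int) -> str:
    return "0x" + hashlib.sha256(str(public_key).encode()).hexdigest()[-40:]


class Network:
    """Shared data layer (hypothetical): envelopes keyed by recipient address."""

    def __init__(self) -> None:
        self._inboxes: dict[str, list[tuple[str, str]]] = {}

    def deliver(self, to: str, sender: str, body: str) -> None:
        self._inboxes.setdefault(to, []).append((sender, body))

    def fetch(self, address: str) -> list[tuple[str, str]]:
        return list(self._inboxes.get(address, []))


class Client:
    """A 'dumb' client: it stores one private key and derives everything else."""

    def __init__(self, network: Network, private_key=None) -> None:
        self.network = network
        if private_key is None:
            private_key = secrets.randbelow(Q - 1) + 1
        self._private_key = private_key

    @property
    def address(self) -> str:
        return address_of(pow(G, self._private_key, P))

    def send(self, to: str, body: str) -> None:
        self.network.deliver(to, self.address, body)

    def inbox(self) -> list[tuple[str, str]]:
        # Nothing is cached locally: history is always read from the network.
        return self.network.fetch(self.address)

    def export_key(self) -> int:
        return self._private_key
```

Switching apps then amounts to `Client(network, private_key=old_app.export_key())`: the new client derives the same address and reads the same history, with no export or migration of messages.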

Note something important: XMTP does not ride on either the phone number or the email address system. For now, it rides on an independent third system of basic identity, Ethereum addresses, which can ride any sort of physical layer (giving it a degree of resistance to censorship by whoever controls any particular physical layer). The cost of riding on Ethereum is that you share the infrastructure with token transaction flows to some extent. Kinda like riding trains implies riding a system that also has cargo trains moving around. A passenger train might collide with a cargo train carrying dangerous chemicals. XMTP is built for phishing scams, you might say. The financial flows are right there, riding the same rails. You’re doing messaging inside a wallet. This is not a bad thing in my view, just a dangerous thing that requires more adult responsibility to navigate safely. And of course you can be fairly safe by simply not using an XMTP address for financial transactions at all.

More worrying to nation-states, it is also a perfect fit for messaging tied to money laundering, black markets, terror-funding, and so on. All the reasons nation-states hate E2EE in regular messaging apps apply doubly to XMTP.

To put this in a longer historical context: XMTP is basically the old idea of “smart pipes” except that they work, and in ways that are hard for hostile interests to dumb down. All the smarts — the secure, personal data — is in the pipes (the network of backend nodes) rather than in either the client end points or some centralized server somewhere. Central aggregation is not of much value if you have end-to-end encryption. In fact it’s a pure burden that you probably have to pay for via subsidies from something else.

Speaking of smart pipes, I was at a conference nearly 20 years ago now, during my postdoc, where a senior engineering leader from Lucent/Bell Labs gave a very wishful talk about hoping to make the traditional telecom pipes “smart” by making them support precisely such a ghost-in-the-machine identity layer that would be independent of both client and server ends of infrastructure. The problem was, in that paradigm of technology, there was no way to actually make that happen besides appealing to people’s loftier design aesthetics and hoping the parties with the means, motives, and opportunity to murder the ghost in the machine with aggregation plays would… not notice. And that people would be sympathetic to the idea of telecom companies being in charge of identity, which is… funny, to put it mildly.

Of course we know what actually happened. Storing your voicemails on their servers rather than in answering machines was about as far as the telcos got towards being “smart.” Now those voicemail boxes are mostly full of spam. Media content got pwned by streaming services. And as a two-decade war for the messaging turf unfolded, email got captured by the big providers and turned into a meta switchboard for other messaging systems (a mostly legitimate response to the spam problem), and the phone system became entirely captured by autodial spam in a way that makes all of us wish the entire phone number system would just die. There’s some sort of larger story there, I think, about how messaging media trace out an arc that goes from vertically integrated, to horizontally interoperable, to aggressively vertically integrated, to killed by parasitic activity.

XMTP does not, let me emphasize, “solve this” (the crypto world has been very ill-served by over-broad techno-triumphalist postures best exemplified by “bitcoin fixes this”-type sloganeering). For instance, it’s already overflowing with spam before legitimate users have arrived in significant numbers. When I first logged in to one of my addresses with an XMTP app, it was already full of junk.

What XMTP does do is give the “smart pipes” architectural pattern a fair shot. Unlike the utterly wishful, failed telco vision of smart pipes 20 years ago, XMTP aims for smart pipes based on fundamentally new technological affordances: the ability of public-key cryptography to separate data and identity in a way that is more capture-resistant than older patterns, and relatively new technologies for distributed data persistence (commoditizing the idea of a “server” with which you have a trusted client-server relationship). So, for example, the spam problem on XMTP (which is actually a positive sign for me: if it’s worth spamming, it’s probably robust enough to use) will get addressed in ways that must necessarily avoid the sort of aggregation and blocking effects Gmail has had on email. Aggregation effects may happen elsewhere on the stack (I’m eyeing ENS, which is a non-critical but arguably soft-necessary piece of the puzzle), but not at a level that allows arbitrary Google designers to send things to spam in ways you can’t protest.
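One shape a non-aggregating answer to spam could take is consent managed on the user’s side: senders the user has allowed land in the main inbox, unknown senders land in a requests folder, and blocked senders are hidden rather than deleted, so nothing is silently discarded by a third party. The sketch below uses hypothetical names (`ConsentInbox`, `allow`, `deny`) and is not any particular XMTP client’s API.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentInbox:
    """User-controlled spam handling (hypothetical): nothing is ever deleted."""

    allowed: set[str] = field(default_factory=set)
    denied: set[str] = field(default_factory=set)

    def allow(self, address: str) -> None:
        self.denied.discard(address)
        self.allowed.add(address)

    def deny(self, address: str) -> None:
        self.allowed.discard(address)
        self.denied.add(address)

    def sort(self, messages: list[tuple[str, str]]) -> dict[str, list[tuple[str, str]]]:
        folders: dict[str, list[tuple[str, str]]] = {"main": [], "requests": [], "hidden": []}
        for sender, body in messages:
            if sender in self.allowed:
                folders["main"].append((sender, body))
            elif sender in self.denied:
                folders["hidden"].append((sender, body))  # still reviewable by the user
            else:
                folders["requests"].append((sender, body))
        return folders
```

Because the allow and deny lists belong to the user, a misfiled sender is fixed by calling `allow` rather than by appealing to a provider.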

And perhaps XMTP will be fundamentally less friendly to broadcast than email, but not quite as unfriendly as Web2 messaging systems (the “group DM” model does not scale the way email distribution lists do; the meaningful limit for a DM group for me is about 200, but I’m on many good newsletters that have tens of thousands of subscribers).
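One plausible reason for the scaling gap shows up in back-of-envelope arithmetic, under the simplifying assumption of a naive end-to-end encrypted group in which the sender encrypts each message separately for every other member (real group-messaging protocols reduce some of these costs). An email list, by contrast, needs one upload to the list server, which does the fan-out. The function name is hypothetical.

```python
def naive_e2ee_group_costs(members: int) -> dict[str, int]:
    """Back-of-envelope costs for a naive pairwise-encrypted group of `members` people."""
    return {
        # One ciphertext per recipient for every message sent.
        "ciphertexts_per_message": members - 1,
        # Distinct pairwise secure channels the group has to maintain.
        "pairwise_channels": members * (members - 1) // 2,
    }


EMAIL_LIST_UPLOADS_PER_MESSAGE = 1  # the list server does the fan-out

for size in (200, 20_000):
    print(size, naive_e2ee_group_costs(size))
```

For a 200-member group this gives 199 ciphertexts per message and 19,900 pairwise channels; for a newsletter-sized audience of, say, 20,000 it gives 19,999 ciphertexts per post and 199,990,000 channels, against a single upload to a mailing list.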

That doesn’t mean the pattern XMTP is aiming to enable is easy to implement. It’s easy to see how hard it is if you compare three protocol-based decentralized/federated Twitter replacements.

  • ActivityPub, the protocol underlying the Fediverse, does not support meaningful and secure direct messaging at all, but sort of fakes it in an insecure way (correct me if I’m wrong)
  • AT Proto, the protocol underlying Bluesky, seems to have an architecture that I think could support something like XMTP, but because it ultimately rides the DNS system for identity, it is about as vulnerable as DNS itself (which is vulnerable to nation-state interference, but a much better thing than being reliant on a single corporation’s ethics)
  • Farcaster, the protocol underlying a variety of clients, with Warpcast being the default, does not have a proper DM layer either, despite riding on top of Ethereum. As I understand it, the direct cast model is a sort of lightweight Web2ish hack that’s specific to Warpcast. Correct me if I’m wrong.

I don’t blame any of these 3 systems for not solving the problem. The problem is not a minor side quest on the roadmap towards decentralized social media. Secure private messaging is in fact the dog being wagged by the tail here. Public social media is easier than private cozy media, which in turn is easier than personal media.

Decentralized identity and secure, non-capturable, censorship-resistant messaging is hard. The best mainstream messenger, Signal, is still vulnerable to control/shutdown (if not snooping) by telecom regulators.

Hell, even the centralized version is hard. Slack has struggled to allow for coherent identity management across individual instances you might belong to (Discord does somewhat better). And I’ve already mentioned Apple’s dumb blue/green problem.

XMTP is currently best supported on the Coinbase wallet, and there are a few others, like Converse, which I haven’t yet tried. From my limited kicking of tires (I’ve exchanged messages with one friend), it works as advertised, and seems slightly magical. At first glance, I don’t see anything in the design that necessarily restricts the idea to Ethereum. That’s just one address space that meets the conditions to participate. I suspect any system that satisfies two conditions can probably be made to interoperate with XMTP: a) the data persistence layer is separate, encrypted, and sufficiently decentralized; b) the client layer is “dumb” in the sense of being mostly stateless, except for some cosmetic configuration elements, with all load-bearing data being in the data layer and accessed via a keypair-based identity. You’d need something like TCP/IP-style meta layers to move messages across address-space boundaries (that doesn’t seem hard technically, just politically; it’s a bit like how credit-card readers can distinguish among Visa, MasterCard, and AmEx based on their distinct numbering schemes).
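The credit-card analogy suggests how such a meta layer might dispatch: card networks are recognizable from leading digits (Visa numbers start with 4, American Express with 34 or 37), and identity address spaces are recognizable from their formats. The sketch below is a hypothetical router, not part of XMTP. The Ethereum pattern matches the standard `0x` plus 40 hex characters; the Nostr pattern matches the shape of a bech32 `npub1` public key (no checksum validation); the gateway functions are placeholders.

```python
import re
from typing import Callable, Optional

# Hypothetical table of address spaces, recognized by format alone.
ADDRESS_SPACES = [
    ("ethereum", re.compile(r"0x[0-9a-fA-F]{40}")),
    # npub1 + 58 bech32 characters (shape check only; the checksum is not verified)
    ("nostr", re.compile(r"npub1[02-9ac-hj-np-z]{58}")),
]


def classify(address: str) -> Optional[str]:
    for name, pattern in ADDRESS_SPACES:
        if pattern.fullmatch(address):
            return name
    return None


def route(address: str, body: str, gateways: dict[str, Callable[[str, str], str]]) -> str:
    """Hand a message to whichever gateway serves the recipient's address space."""
    space = classify(address)
    if space is None or space not in gateways:
        raise LookupError(f"no gateway for address {address!r}")
    return gateways[space](address, body)
```

Adding an address space is one more table entry plus a gateway; agreeing on who runs the gateways is the political part.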

But of course, there’s no such thing as a free lunch, so what’s the dark side of XMTP? The cost of “solving” the messenger fragmentation problem? It’s riding rails that were primarily designed with financial transaction flows in mind. For now at least, Ethereum addresses are mainly used for financial transaction flows, though the roadmap features future capabilities designed to support a lot more. This is obviously not a necessary feature for messaging, more like a kind of initial-conditions comorbidity. You can imagine an architecturally compatible decentralized system implementing an XMTP flavor without recognizing the financial flows attached to Ethereum addresses.

Now, the idea of conflating financial flows and messaging flows is not by itself insane or even bad per se. In fact it’s a qualified good idea, especially for trusted communications, and we’re already familiar with it. Apple Pay and Google Pay deliberately couple a messaging device with financial flows. M-Pesa deservedly became the Southwest Airlines of developmental economics case studies. Credit card POS systems ride the same pipes as phones.

Going the other way, Venmo offers a minor example of a messaging flow that piggybacks on a financial flow. The vibe of Venmo’s social messaging layer is weird. It feels voyeuristic to see people turning payments activity into an editorialized sort of public performance art. Like many older people, by default I do not make my Venmo payments publicly visible, but younger people apparently enjoy sharing that they paid their roommates so many dollars for groceries or whatever. So putting a “phone” inside a wallet (or more accurately, a phone inside a wallet inside a phone) is not obviously any worse than putting a “wallet” inside a phone. Slightly adjacent, the flow of commit messages on GitHub is another example of a messaging layer that rides a different kind of transactional layer (code commits).

The problem, as illustrated by the vibes of Venmo and GitHub, is that the base medium sets the message of the medium. When messages ride smart financial pipes, the resulting sociology takes on a decidedly economic character. When they ride code-commit pipes, they take on decidedly engineering vibes. I don’t know if that’s good or bad. But I do know it will have consequences.

I do think it is rather unfortunate that the “wallet” became the primary UX metaphor for Web3 and beyond. But I suppose it’s no worse than the “document” becoming the primary UX metaphor for the Web and the “infinite scrolling stream” for Web2. Or the “phone” for “pocket supercomputer.” Metaphoric initial conditions are an unavoidable cost in innovation. This is how manufacturing normalcy works.

I have a wishful thought experiment I’d like to see tried, if only to demonstrate that it fails: a high-quality “pure” XMTP messaging app that hides or disables the token-transactional affordances of Ethereum addresses, and only implements XMTP. It should commit to hiding the public-key cryptography bits behind a traditional local password model until the user is literate enough to manage keys under an “advanced” tab. It should be built in a way that encourages interoperable XMTP flavors to be built on non-Ethereum identity spaces (which probably means being able to hold multiple keypairs that use different schemes). And of course, it should do the keypair(s) in a way that allows you to move off the reference client. This would allow us to explore how far the idea of smart pipes can go, and whether they can ride on things other than blockchains.
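The local-password and multiple-keypair requirements of that hypothetical app could look roughly like the sketch below: each identity scheme’s private key is stored encrypted under a key derived from the user’s password, and the still-encrypted blobs can be exported to another client. All names are hypothetical, and the HMAC-SHA256 counter-mode cipher is a standard-library-only stand-in; a real app should use a vetted AEAD such as AES-GCM and an established keystore format.

```python
import hashlib
import hmac
import secrets

# Illustrative PBKDF2 work factor; real apps tune this upward or use a memory-hard KDF such as scrypt.
ITERATIONS = 100_000


def _derive_keys(password: str, salt: bytes) -> tuple[bytes, bytes]:
    material = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS, dklen=64)
    return material[:32], material[32:]  # (encryption key, MAC key)


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 in counter mode: a PRF-based stream cipher built from stdlib parts.
    blocks = [
        hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(secret: bytes, password: str) -> dict:
    """Encrypt-then-MAC a private key under a password-derived key."""
    salt, nonce = secrets.token_bytes(16), secrets.token_bytes(16)
    enc_key, mac_key = _derive_keys(password, salt)
    ciphertext = _xor(secret, _keystream(enc_key, nonce, len(secret)))
    tag = hmac.new(mac_key, salt + nonce + ciphertext, hashlib.sha256).digest()
    return {"salt": salt, "nonce": nonce, "ciphertext": ciphertext, "tag": tag}


def unwrap(blob: dict, password: str) -> bytes:
    enc_key, mac_key = _derive_keys(password, blob["salt"])
    signed = blob["salt"] + blob["nonce"] + blob["ciphertext"]
    expected = hmac.new(mac_key, signed, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, blob["tag"]):
        raise ValueError("wrong password or corrupted key blob")
    return _xor(blob["ciphertext"], _keystream(enc_key, blob["nonce"], len(blob["ciphertext"])))


class Keyring:
    """Hypothetical multi-scheme keyring: one password-wrapped private key per address space."""

    def __init__(self, blobs=None) -> None:
        self._blobs = dict(blobs or {})

    def add(self, scheme: str, private_key: bytes, password: str) -> None:
        self._blobs[scheme] = wrap(private_key, password)

    def unlock(self, scheme: str, password: str) -> bytes:
        return unwrap(self._blobs[scheme], password)

    def schemes(self) -> list[str]:
        return sorted(self._blobs)

    def export(self) -> dict:
        # Blobs stay encrypted, so they can be copied to a different client as-is.
        return dict(self._blobs)
```

Moving off the reference client is then `Keyring(old_ring.export())` on the new client, followed by unlocking with the same password.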

This thought experiment illustrates how and why the xkcd cartoon problem will reappear. If XMTP, or something close to it, takes off and sparks a polyglot metaverse of data-layer “nations” defined by particular public-key-based (i.e., user-controlled) address spaces, you’ll replace the “14 messaging standards” with “14 key-flavor standards.” Maybe you’ll maintain One Keypair To Rule Them All and be forced to manage the security of that against Tom Cruise’s Mission Impossible team. It will be miserable. But at least it will be an interesting new kind of misery.

About Venkatesh Rao

Venkat is the founder and editor-in-chief of ribbonfarm.

Comments

  1. Just as the Fediverse stopped too short of the ultimate goal, I fear this approach does, also. Fediverse / Mastodon replaces one Elon with 50,000 petty admins nuking the online identities of anyone they don’t like (or simply abandoning the server your identity was @).

    Blockchain describes nodes all syncing up to mirror One Grand Authoritative State. That makes any blockchain as centralized as Facebook’s database. A blockchain can only ever be called ‘distributed’, not ‘decentralized’. With XMTP your messages are stored in the same database alongside the messages of everyone else in the world. That’s a step backward from SMTP. You also invoke a huge architecture, a massive technology stack, and centrally controlled code.

    Nostr, on the other hand (Notes and Other Stuff Transmitted by Relays), implements the key-pair identity layer without a blockchain or an authoritative code base. Nostr itself is simply an idea, a JSON blob design. https://github.com/nostr-protocol/nostr is just a selection of markdown text files that describe Nostr implementation possibilities (NIPs) to any developer who wants to build compatible client or relay software. Anyone can host a relay, with whatever policies they like. You can (and are encouraged to) publish the same Note to as many relays as you like, and subscribe to as many or few relays as you like.

    Syncing between relays is NOT part of the Nostr protocol. The transport layer is truly decentralized, unlike a blockchain, where your messages live in the same database as everyone else’s. You can publish to a relay nobody else uses, if you want to set one up. Relays can be pay-walled or enforce whatever content policies they like. What they cannot do is have your Notes deleted from other relays. That makes it effectively uncensorable.

    Your identity is also uncensorable in the same way. A given relay or client can block requests for your pubkey, but other relays or clients should always be available (or you can create one or both). Characters with permanent uncensorable identities seen on Nostr include Edward Snowden, Jack Dorsey and Max Keiser.

    Your Nostr public key is an indestructible, non-revocable identity for a (very) public messaging system (no real delete option). It is totally decentralized, but there is a nice introduction at https://nostr.com

    – Popular iPhone apps: https://damus.io (by Nostr’s author) / https://nostur.com
    – Popular Android apps: https://github.com/frnandu/yana/releases / https://play.google.com/store/apps/details?id=com.vitorpamplona.amethyst
    – Desktop apps: https://lume.nu / https://github.com/mikedilger/gossip
    – Web interfaces: https://iris.to / https://hamstr.to / https://nostrudel.ninja / https://coracle.social

    Enjoy! Just remember the price of non-revocable identities and uncensorable notes: There’s no real Delete option, so mind your manners. :-)

  2. A decent XMTP client should also let you filter your inbox by public keys you care about. Bam, no more spam.

    • I don’t even think XMTP is on a blockchain yet, is it? At least it’s not even in Alpha as of May 17, 2024, according to their roadmap. You’re right, though, that if you have a lot of money in an address then everyone will spam you because they want to get a piece of your money.

      As for encryption… I don’t even know how you would do that when attempting to do a chat-room-like feature. Or maybe that isn’t within the scope? If you want to send a message to thousands of people, like a newsletter, then I would hope that it would at least cost you a lot of money.

      As for SPAM, I think that can be handled on the server / client side as long as you allow people to still have access to the hidden messages.

      Decentralized messaging sounds good but the devil is definitely in the details.