montfort.dev

8 min read · #earthquake-alert #push-infra #distributed-systems #build-in-public

How Mexico City's earthquake alert reached phones in 2012

A 7.5 earthquake, a brother calmly looking at the sky, and one question: why can't this alert reach a phone? How I ended up running push delivery for half a million people — and what it taught me.

This story starts at midday on March 20, 2012, when a magnitude 7.5 earthquake struck off the coast of Oaxaca, in southern Mexico. For Mexico City, it was the strongest tremor felt since the terrible morning of September 19, 1985.

By 2012 the city was already covered by the Mexican Seismic Alert System (SASMEX): a network of seismic sensors that turns physics into warning time. The rupture is detected near the coast, and because seismic waves take time to travel the ~300 km to the valley, the alert can arrive up to sixty seconds before the shaking does. In the city, it was broadcast over radio stations running a partial implementation of the NWR SAME protocol — the same one the United States uses for weather radio, the kind that announces tornadoes and civil emergencies — on frequencies near 162 MHz. Those signals were monitored by receivers installed in schools, government offices, and street poles wired to loudspeakers that sound the alarm for everyone nearby.
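
The warning window described above is simple arithmetic, and a rough sketch makes it concrete. Only the ~300 km distance comes from the text; the S-wave speed and the detection-and-broadcast overhead are illustrative assumptions, with the overhead chosen so the result lands near the sixty seconds quoted above. None of these are published SASMEX parameters.

```python
# Rough warning-time estimate. Every constant except the distance is an
# illustrative assumption, not a SASMEX figure.

DISTANCE_KM = 300.0      # coast-to-valley distance quoted in the essay
S_WAVE_KM_PER_S = 3.5    # assumed speed of the damaging S waves
OVERHEAD_S = 25.0        # assumed time to detect, decide, and broadcast


def warning_seconds(distance_km: float, wave_speed: float, overhead_s: float) -> float:
    """Seconds between the alert sounding and the strong shaking arriving."""
    travel_time = distance_km / wave_speed      # time the waves need to arrive
    return max(0.0, travel_time - overhead_s)   # a warning can never be negative


if __name__ == "__main__":
    seconds = warning_seconds(DISTANCE_KM, S_WAVE_KM_PER_S, OVERHEAD_S)
    print(f"about {seconds:.0f} s of warning")
```

The same function shows why the system helps distant cities most: shrink the distance and the overhead eats the whole window.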

Everyone nearby. That day I was home with my family, and the closest loudspeaker was several blocks away. We heard nothing. What got us out of the house was the shaking itself: half the water from my 120-liter aquarium ended up on the floor, along with most of our bookshelves. Out in the courtyard everyone was frightened, grabbing for something to hold while the ground moved — everyone except my brother Emiliano, who stood calmly looking at the sky. When the earth went still, he turned to me, serene, and said:

We should figure out how the seismic alert is actually transmitted. It’s dangerous that those speakers can’t be heard. We could send the alert to phones.

It was hard, because there was almost no public information about how the alert worked. The piece that unlocked it was a PowerPoint deck, written by a SASMEX employee and accidentally indexed on a university server. From there came three purchases that were a real stretch for us at the time: a seismic alert receiver from SARMEX (the only vendor authorized to sell compliance-grade receivers for civil protection in Mexico) for around $300, an iPhone 4 for about $600, and a Mac mini for close to $800. And then a long road of experimentation: from decoding a digital message hidden inside a radio audio signal, to delivering remote notifications to an iPhone.

Decoding the signal

I ran several experiments to decode NWR SAME messages, and they all failed in the same way. I got a C++ prototype that satisfied the protocol on paper, but I could never optimize it enough to work anywhere near real time — and for an earthquake alert, “near real time” is the only kind that counts.
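
For context, the expensive part is demodulating the audio; once that is done, a SAME message reduces to a short ASCII header. The sketch below parses that header using the field layout of the public NWR SAME format. The names `SameHeader` and `parse_same_header` are hypothetical, and the essay does not say which fields SASMEX's partial implementation actually populates, so the sample in the usage note is a US-style header, not a captured SASMEX message.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Header layout in the public NWR SAME format:
#   ZCZC-ORG-EEE-PSSCCC[-PSSCCC...]+TTTT-JJJHHMM-LLLLLLLL-
SAME_HEADER = re.compile(
    r"ZCZC-(?P<org>[A-Z]{3})-(?P<event>[A-Z]{3})"
    r"(?P<locations>(?:-\d{6})+)"
    r"\+(?P<duration>\d{4})-(?P<issued>\d{7})-(?P<sender>[^-]{1,8})-"
)


@dataclass(frozen=True)
class SameHeader:
    originator: str        # who issued the message, e.g. WXR
    event: str             # three-letter event code
    locations: List[str]   # six-digit area codes
    valid_minutes: int     # TTTT is HHMM of validity, converted to minutes
    issued_day: int        # JJJ, day of the year
    issued_hhmm: str       # HHMM, UTC
    sender: str            # station identifier


def parse_same_header(text: str) -> Optional[SameHeader]:
    """Return the first well-formed header found in `text`, else None."""
    m = SAME_HEADER.search(text)
    if m is None:
        return None
    duration, issued = m["duration"], m["issued"]
    return SameHeader(
        originator=m["org"],
        event=m["event"],
        locations=m["locations"].strip("-").split("-"),
        valid_minutes=int(duration[:2]) * 60 + int(duration[2:]),
        issued_day=int(issued[:3]),
        issued_hhmm=issued[3:],
        sender=m["sender"],
    )
```

Calling `parse_same_header("ZCZC-WXR-TOR-039173-039051+0030-1591829-KCLE/NWS-")` yields a tornado warning for two areas, valid for 30 minutes. Because the format sends the header several times, a real receiver would also compare the copies before trusting any one of them.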

What saved me was a CTI vendor — computer telephony integration — selling an ActiveX control on .NET Framework 4 that, among other protocols, decoded NWR SAME. I bought it and wrote a program around it. It was efficient, it was reliable, and it was done. But decoding the alert was never the real challenge. Getting it to the phones was.

Talking to the phones

This was practically the prehistory of mobile computing. iOS 5 ran on the iPhone 4; Android 4 was only just becoming a usable operating system. Remote notifications were brand new, and they were primitive.

Apple’s APNs had no broadcast — unicast only. If I wanted to notify a thousand users, I wrote a loop and made a thousand individual requests. Android, meanwhile, had just launched GCM (Google Cloud Messaging, the ancestor of Firebase Cloud Messaging), which replaced C2DM and had one feature APNs lacked: limited multicast. I could send a list of up to a thousand device IDs in a single request. 2012 was also the year users migrated en masse from BlackBerry to iPhone and Android, so I didn’t worry about that third platform.
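
A minimal sketch of that asymmetry, assuming nothing beyond the limits stated above: APNs costs one request per device, while GCM takes a JSON body with a `registration_ids` list of up to a thousand entries. The helper names and the contents of `data` are hypothetical; no network call is made.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List

GCM_MAX_IDS = 1000   # per-request multicast limit mentioned above


def batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def gcm_request_bodies(registration_ids: Iterable[str], data: dict) -> List[str]:
    """Build one JSON body per multicast request."""
    return [
        json.dumps({"registration_ids": chunk, "data": data})
        for chunk in batches(registration_ids, GCM_MAX_IDS)
    ]


def request_count(n_ios: int, n_android: int) -> int:
    """APNs: one request per device. GCM: one per batch of up to 1,000."""
    android_requests = -(-n_android // GCM_MAX_IDS)   # ceiling division
    return n_ios + android_requests
```

With 1,000 iPhones and 2,500 Android devices, `request_count` gives 1,003: almost all of the work sits on the iPhone side, which is where the trouble later showed up.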

Somewhere to keep the devices

So I could push notifications — one at a time to iPhones, in batches of a thousand to Android. But first I had to register every device ID somewhere I could query when an alert came in. I wrote a small server in PHP exposing two endpoints: one to receive device registrations, one to receive seismic alerts from the decoder. The data lived in MySQL.
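
The two endpoints boil down to two database operations: an idempotent insert when a device registers, and a lookup when an alert arrives. The sketch below stands in for the original PHP and MySQL with Python's built-in `sqlite3`, purely for illustration; the table layout and function names are assumptions, not the original schema.

```python
import sqlite3
from typing import List


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Create the device table if needed and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS devices ("
        " token TEXT PRIMARY KEY,"
        " platform TEXT NOT NULL CHECK (platform IN ('ios', 'android')))"
    )
    return conn


def register_device(conn: sqlite3.Connection, token: str, platform: str) -> None:
    """Registration endpoint: re-registering the same token is harmless."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO devices (token, platform) VALUES (?, ?)",
            (token, platform),
        )


def tokens_for(conn: sqlite3.Connection, platform: str) -> List[str]:
    """Alert endpoint: the list of devices to notify on one platform."""
    rows = conn.execute(
        "SELECT token FROM devices WHERE platform = ? ORDER BY token",
        (platform,),
    )
    return [row[0] for row in rows]
```

Keying the table on the token matters more than it looks: apps re-register on every launch, and without the primary key the same phone would be notified several times per alert.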

That was the whole system: a decoder and an antenna in my house, and a database server running on shared hosting from a German company. And it all worked fine — until it didn’t.

And then the users arrived

The system was fine for the first 100 users. At 1,000 it proved itself and held. At 50,000, notifications started taking up to two minutes to arrive — especially to iPhones. The first one went out almost instantly, but they went out one by one, and the full list took two minutes to drain. All of this in the first three weeks after launch.
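
Those figures imply a per-send cost, and the arithmetic is worth writing down. The sketch assumes, as a simplification, that all 50,000 devices sit in one strictly serial list and that the useful budget is the sixty-second warning window; the essay does not give the split between iPhone and Android.

```python
def per_send_ms(total_drain_s: int, devices: int) -> float:
    """Average cost of one send when the list is drained one by one."""
    return total_drain_s * 1000 / devices


def reached_within(budget_s: int, total_drain_s: int, devices: int) -> int:
    """Devices notified before the budget runs out, at a constant send rate."""
    return min(devices, budget_s * devices // total_drain_s)


if __name__ == "__main__":
    print(per_send_ms(120, 50_000), "ms per send")
    print(reached_within(60, 120, 50_000), "devices reached in 60 s")
```

Under those assumptions each send costs a couple of milliseconds, which sounds fast, yet half the list is still waiting when the shaking starts. Per-request speed was never the problem; the serial queue was.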

So I changed the stack. I replaced PHP + MySQL with Python Tornado + MongoDB, because I needed to claw back every millisecond I could, and I moved from the German shared host to a dedicated server in the United States to cut network latency. That bought me the next 50,000 users, through the second month, until it collapsed again. The iPhone list still took an eternity to process.

This time I measured before I rewrote, and the measurement changed the approach: the server wasn’t at its limit. The software simply couldn’t process a single list any faster. So I borrowed Android’s own model — batches of a thousand — and built it myself for APNs. I wrote a dispatcher that ran multiple Tornado instances, each with its own ID that told it which batch it owned. Each instance pulled its own slice of a thousand users from MongoDB and processed it in parallel with the others.

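# Each instance owns a slice by its id; they drain MongoDB in parallel.
# A late earthquake alert is not late: it is wrong.
# Sketch only: `devices` (async MongoDB collection), `chunks` (async batching
# helper), `apns` (push client) and ALERT (payload) are defined elsewhere.
async def dispatch(instance_id, total_instances):
    # Each device document carries a `shard` field in range(total_instances).
    cursor = devices.find({"shard": instance_id % total_instances})
    async for batch in chunks(cursor, 1000):   # up to 1,000 devices per batch
        await apns.send(batch, payload=ALERT)   # one instance, one slice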
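
A self-contained way to try the same idea, without MongoDB or a push gateway: the simulation below shards an in-memory token list by index and runs one coroutine per shard. It is a sketch under stated assumptions, not the original dispatcher. `fake_send` is a hypothetical stand-in for the push client, and `asyncio` tasks in one process stand in for what were separate Tornado instances.

```python
import asyncio
from typing import Dict, List

BATCH_SIZE = 1000


async def fake_send(batch: List[str]) -> None:
    """Stand-in for a push client call; it only yields to the event loop."""
    await asyncio.sleep(0)


async def worker(instance_id: int, total: int, tokens: List[str],
                 sent: Dict[int, List[str]]) -> None:
    # This instance owns every token whose index falls on its shard.
    mine = tokens[instance_id::total]
    for start in range(0, len(mine), BATCH_SIZE):
        batch = mine[start:start + BATCH_SIZE]
        await fake_send(batch)
        sent[instance_id].extend(batch)   # record what this shard delivered


async def dispatch_all(tokens: List[str], total: int) -> Dict[int, List[str]]:
    """Run one worker per shard concurrently and return what each one sent."""
    sent: Dict[int, List[str]] = {i: [] for i in range(total)}
    await asyncio.gather(*(worker(i, total, tokens, sent) for i in range(total)))
    return sent


if __name__ == "__main__":
    result = asyncio.run(dispatch_all([f"t{i}" for i in range(5500)], total=4))
    print({shard: len(batch) for shard, batch in result.items()})
```

The property worth checking in any version of this design is that the shards are disjoint and together cover the whole list: every device notified exactly once, with no coordination between workers.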

And it all worked fine — until it didn’t.

Learning distributed systems the hard way

The growth was enormous. The app passed 500,000 users. Over time I built orchestrators to run several dedicated servers, each with multiple notification instances, and I learned to configure and operate sharding in MongoDB — all within the first eight to twelve months of the project.

It was the most intense stretch of work I had ever done, under the permanent threat of systems racing toward their operational ceiling. And there was a heavier weight underneath the technical one: the public treated this as an emergency service, even though it was always offered as experimental, a way to promote a culture of civil protection. I had never worked at that pace — and I wouldn’t again until AI became genuinely useful for serious work, in late 2025.

What the platforms shipped later

My worry about keeping a distributed system alive ended years later, and it ended because the platforms caught up. In 2016 Android got real multicast: users could subscribe to notification topics, so my system sent a single notification to a topic and the new Firebase Cloud Messaging did the rest. Apple improved APNs that same year with an HTTP/2 API that accepted batches of up to 50,000 devices, which let me collapse everything back to a single notification server holding cached lists for batched sends.
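
On the Android side, the change is visible in the request body itself. FCM calls these subscriptions topics, and in its legacy HTTP format a topic send addresses `/topics/<name>` instead of listing device IDs. The sketch below only builds that body; the topic name and the `data` fields are hypothetical, and nothing is sent.

```python
import json


def topic_message(topic: str, data: dict) -> str:
    """One body for every subscriber: the server no longer holds the list."""
    return json.dumps({"to": f"/topics/{topic}", "data": data})


if __name__ == "__main__":
    print(topic_message("seismic-alert", {"type": "quake"}))
```

The request is the same size for ten subscribers or ten million, which is why the fan-out stopped being the sender's problem.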

I want to be precise about what that means, because it would be easy to overclaim. I wasn’t first, and I wasn’t alone — there were already companies whose entire business was mass push delivery as a service, and buying my way out was always an option. I just wanted to build my own. And in building it, under fire, with an antenna in my living room, I ended up hand-rolling the exact things the platforms would later make native: shard-by-instance delivery, batched sends, subscription channels. For a few years I was running, by necessity, the infrastructure the giants would eventually ship as a checkbox.

That’s the part I keep. Not the throughput numbers — those were obsolete the moment Apple and Google decided the problem was worth solving themselves. What stayed with me is the way of thinking the project forced on me: measure before you rewrite; design the failure mode before the feature; respect the gap between what you promise and how people actually use what you built. I learned distributed systems the hard way — sharding, orchestration, back-pressure — but the more durable lesson was that building something yourself, when you could have bought it, is how you learn where the hard edges really are. It’s the same instinct that put me where I am now, building governance for systems that, once again, most people would rather treat as a black box.

The end

I kept the system running until September 2025, when Cell Broadcast arrived for SASMEX (the alert now reaches every phone in a cell directly, no app required) and made push notifications obsolete overnight.

I didn’t mind. The job of a project like this was always to make itself unnecessary. It took the platforms four years to match it and the authorities thirteen to retire it, and somewhere in Mexico City there is still a brother who looked at the sky instead of the ground, and asked the right question first.

Thanks for reading. I publish when there's something worth your time — rss.xml is the contract. I'm building StrayMark in public; the next essay will probably be about that.