How a single ChatGPT mistake cost us $10,000+

09 Jun, 2024

Edit: I want to preface this by saying that yes, the practices here were very bad and embarrassing (we've since added robust unit/integration tests and alerting/logging). The bug could and should have been avoided, was human error above all else, and is very obvious in hindsight.

This was from a different time, under heavy time constraints, at the very earliest stage (the first few weeks) of a company. I'm mostly sharing this as a funny story with unique circumstances around bug reproducibility in prod (due, again, to our own stupidity). Please read it with that in mind.

Asim Shrestha

We first turned on monetization for our startup last May. We had low expectations but were pleasantly surprised when we got our first customer within an hour of launching. It was a magical moment. We sent them a thank you, gave each other a toast, and, given we’d just spent two late nights getting everything ready, promptly went to sleep.

We woke up that morning to over 40 Gmail notifications of user complaints. Everything seemed to have caught fire overnight. None of these users could subscribe. We had no idea why.

Our path to monetization 🛣️

For some background, May was the start of the S23 YC batch and we were unsure about the optimal direction to take post-launch. Dalton, our YC group partner, advised us to use paying subscribers as a compass and told us to double whatever monthly price we already had in our heads. Eventually (and reluctantly) we landed at $40 a month. Following the meeting, we immediately got to work setting up monetization. Our project was originally full-stack Next.js, but we wanted to first migrate everything to Python/FastAPI. We got this done (with some help from ChatGPT), got Stripe fully integrated… and then followed this up with five days of the least sleep we’d get the whole month. (Yes, five days was a long time to find this bug.)

During that stretch of five days, we started to dread waking up, knowing we’d just be greeted with 30, 40, or 50 emails of complaints. I still like to ponder exactly how many customers we lost through this. 50 emails/day × 5 days × $40/month = $10,000 a month in lost sales, and that was only from people who cared enough to complain. We’d respond to these emails like clockwork every day. They would complain about an infinite loading spinner when they clicked subscribe, we’d investigate by opening a new account, verify that subscriptions worked completely fine for us, and then carry on with our day confused. Nothing we did would replicate the issue and, even more strangely, we got close to zero complaints during our actual working hours.

The $10,000 hallucination 💰

The journey from identifying the issue to actually resolving it felt like it took months. Fast forward five days, countless emails, hundreds of Sentry logs, long Discord messages with Stripe engineers, and hours upon hours of staring at five key files, and we found it 🎉. Try to see if you can spot it yourself before reading on.

[Image: screenshot of the offending code]

If you haven’t found it yet, the culprit was a single, innocent-looking line. A line that was the bane of our existence for that week. A line that quite literally cost us $10,000. The dreaded line 56.

What happened was that, as part of our backend migration, we were translating database models from Prisma/TypeScript into Python/SQLAlchemy. This was really tedious. We found that ChatGPT did a pretty exceptional job of this translation, so we used it for almost the entire migration. We copy-pasted the code it generated, saw everything worked fine, tried it in production, saw it also worked, and went on our merry way. At this point, however, we still used our Next API for all database insertions. Python was only ever reading from the database. The first time we actually started inserting DB records in Python was when we implemented subscriptions. Though we wrote completely new SQLAlchemy models by hand during the process, we ended up copying the same format that ChatGPT had written for our existing models. What we failed to notice was that we were also copying over the same issue with the way IDs were generated in all our models.

Bug Catching 🐛

The issue with line 56 was that we were passing in a single hardcoded ID string instead of a function or lambda to generate UUIDs for our records. The UUID was generated once, when the backend process started, and then reused as the default ID for every insert. This meant that for any given instance of our backend, once a single new user had subscribed and used this ID, no other user could complete the subscription flow, as it resulted in a unique ID collision. The problem was really well hidden because of our backend setup. We had eight ECS tasks on AWS, each running five instances of our backend (overkill, yes, we know, but to be fair we had AWS credits). This meant any single user had a pool of potentially 40 unique IDs they could land on.
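
To make the failure mode concrete, here is a minimal sketch of the bug and the fix. It is not our actual model code: the `Column` class below is a hypothetical stand-in (not SQLAlchemy's real class) that only mimics the fact that SQLAlchemy's column `default` accepts either a fixed value or a callable. The `subscription` table and the `insert_subscriptions` helper are likewise made up for illustration, and SQLite stands in for our real database.

```python
import sqlite3
import uuid


class Column:
    """Hypothetical stand-in for an ORM column (not SQLAlchemy's real class)."""

    def __init__(self, default):
        self.default = default

    def new_value(self):
        # Mimics an ORM `default`: call it if it is callable, else reuse the value.
        return self.default() if callable(self.default) else self.default


# Buggy: uuid4() runs exactly once, when this line is evaluated at process start.
# Every insert made by this process then reuses the same string.
buggy_id = Column(default=str(uuid.uuid4()))

# Fixed: the lambda is stored and called again for each insert.
fixed_id = Column(default=lambda: str(uuid.uuid4()))


def insert_subscriptions(column, count):
    """Insert `count` rows into a fresh in-memory table; return how many succeeded."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscription (id TEXT PRIMARY KEY)")
    succeeded = 0
    for _ in range(count):
        try:
            conn.execute(
                "INSERT INTO subscription (id) VALUES (?)", (column.new_value(),)
            )
            succeeded += 1
        except sqlite3.IntegrityError:
            # Primary-key collision: the same ID was already used.
            pass
    conn.close()
    return succeeded


if __name__ == "__main__":
    print("buggy default:", insert_subscriptions(buggy_id, 3), "of 3 inserts succeeded")
    print("fixed default:", insert_subscriptions(fixed_id, 3), "of 3 inserts succeeded")
```

With the buggy default, only the first insert per process can succeed, which matches what we saw: one lucky subscriber per backend instance, then collisions for everyone after.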

During the work day, this was fine. We probably committed 10-20 times a day (directly to main, of course), which would trigger new backend deployments, giving us 40 new IDs for customers to potentially use. At night, however, when we finally stopped making commits (how lazy of us, right?), the single ID in every server would get used up, causing all new subscriptions to hit ID collisions. Users would start with 40 possible servers that could allow them to subscribe, and quickly end up with near zero as the night progressed. Finally solving this was like a weight being lifted from our shoulders. Adam quickly pushed up the fix after discovering this, and for the first time that week we could finally rest easy (well, not really, since we still had ten other fires, but those are stories for another write-up).
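
The way the pool of 40 IDs drains overnight can be sketched with a tiny simulation. This is purely illustrative: it assumes each subscription attempt is routed to a uniformly random backend process, which is a simplification and not a description of how our load balancer actually behaved. The function name and parameters are made up for this example.

```python
import random


def simulate_night(num_processes=40, attempts=200, seed=0):
    """Return a list of booleans, one per subscription attempt (True = success).

    Each process holds exactly one usable ID, so only the first attempt
    routed to a given process succeeds; later ones collide.
    """
    rng = random.Random(seed)
    used_processes = set()
    results = []
    for _ in range(attempts):
        # Assumption: requests are spread uniformly at random across processes.
        process = rng.randrange(num_processes)
        if process in used_processes:
            results.append(False)  # this process's single ID is already taken
        else:
            used_processes.add(process)
            results.append(True)
    return results


if __name__ == "__main__":
    results = simulate_night()
    for start in range(0, len(results), 50):
        window = results[start:start + 50]
        print(f"attempts {start + 1}-{start + len(window)}: {sum(window)} succeeded")
```

However many attempts come in, at most 40 can ever succeed between deployments, and the success rate falls as the night goes on. A fresh deployment is equivalent to calling the function again with an empty pool, which is why our daytime commits masked the problem.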

Conclusion 🤖

In retrospect, however painful those five days were, it was one of those startup moments we’ll never forget. Like all startups, we've made a ton of mistakes throughout our journey, with this perhaps being the worst. Maybe I'll share some of the rest down the line. We’re just happy that we can now look back at those days and laugh. Yes, we should have done more testing. Yes, we shouldn't have copy-pasted code. Yes, we shouldn't have pushed directly to main. Regardless, I don't regret the experience.