How One Developer Fixed GTA Online's 6-Minute Load Times With a Profiler and a Hunch
If you played GTA Online between 2013 and 2021, you know the loading screen. You'd sit there, staring at a slowly rotating logo, watching minutes tick by. Not seconds — minutes. On a decent machine, six to eight of them. People ran errands. They made coffee. They questioned their life choices.
Rockstar Games, a studio with hundreds of engineers and billions in revenue, let this ship for nearly a decade.
Then, in February 2021, a developer going by the handle t0st posted a blog article explaining exactly what was wrong — and how they fixed it in a weekend. Rockstar later confirmed the findings and shipped a patch that cut load times by roughly 70%.
This is a story about profiling. About what happens when you actually look at what your code is doing. And about why embarrassing inefficiencies can live in production software for years when no one is willing to reach for the tools.
---
The Setup: Reverse Engineering a Live Game
t0st didn't have access to Rockstar's source code. This was a closed, shipped binary. So step one was figuring out what to look at.
The first move was the most obvious one: measure wall clock time against actual work. By comparing how long the loading screen took against how long the actual asset loading seemed to take — estimated via disk I/O monitoring — t0st noticed a huge gap. The disk was idle most of the time. The CPU, however, was not. Something was burning cycles, but it wasn't reading files.
That gap — idle disk, hot CPU — is a classic smell. It usually means one of three things: lock contention, unnecessary computation, or a bad algorithm. This was the first real hypothesis.
---
The Profiling Methodology: Don't Guess, Sample
Here's where the approach gets instructive for anyone doing performance work.
t0st attached a sampling profiler to the GTA Online process. Sampling profilers work by interrupting the program at a fixed rate — typically hundreds or a few thousand times per second — and recording the current call stack. Over time, you accumulate a statistical picture of where the CPU is actually spending its time. No instrumentation, no recompilation required — you just attach and watch.
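To make the mechanism concrete, here's a toy sampling profiler in Python — a sketch, not a production tool. A background thread periodically grabs the main thread's call stack via `sys._current_frames()` and tallies identical stacks. The names `hot_function` and `profile`, and the 1 ms interval, are all illustrative choices, not anything from t0st's setup:

```python
import collections
import sys
import threading
import time

def sample_stacks(target_thread_id, interval_s, samples, stop_event):
    # Interrupt-style sampling: capture the target thread's stack on a timer.
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            stack = []
            while frame is not None:              # walk from leaf to root
                stack.append(frame.f_code.co_name)
                frame = frame.f_back
            samples[tuple(reversed(stack))] += 1  # store root-first
        time.sleep(interval_s)

def profile(workload, interval_s=0.001):
    # Run workload() on this thread while a sampler thread watches it.
    samples = collections.Counter()
    stop = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), interval_s, samples, stop),
    )
    sampler.start()
    try:
        workload()
    finally:
        stop.set()
        sampler.join()
    return samples

def hot_function():
    # Deliberately CPU-bound: this is where samples should pile up.
    total = 0
    for i in range(3_000_000):
        total += i * i
    return total

counts = profile(hot_function)
widest = counts.most_common(1)[0][0]  # the most frequently sampled stack
print(widest[-1])
```

With enough samples, the most common stack ends at the CPU-bound function — the statistical picture converges on the truth without the program ever being instrumented.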
The tooling here didn't need to be exotic: t0st's write-up describes simple stack sampling plus a disassembler, with the results read the way you'd read a flamegraph — looking for where the samples pile up.
If you haven't used flamegraphs before, the mental model is simple: the y-axis is stack depth, and the x-axis is share of samples — not time order; sibling frames are usually sorted alphabetically, and a bar's width is how often that frame appeared in the captured stacks. Wide bars mean "this function was on the stack a lot." The hotspots are the wide plateaus at the top edge of the flame — frames that were actually on-CPU when sampled. Narrow towers are noise, and a wide bar at the very bottom just tells you main ran the whole time, which you already knew.
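As a minimal illustration of where wide bars come from, here's how raw samples collapse into the "folded stack" text format that flamegraph generators consume. The function names are made up for the example:

```python
import collections

# Each sample is one captured call stack, listed root-first.
samples = [
    ("main", "load", "parse_json"),
    ("main", "load", "parse_json"),
    ("main", "load", "parse_json"),
    ("main", "load", "read_assets"),
    ("main", "render"),
]

# Identical stacks collapse into "root;child;leaf count" lines.
# A flamegraph draws each line as a bar whose width is its count.
folded = collections.Counter(";".join(stack) for stack in samples)
for line, count in folded.most_common():
    print(line, count)
```

Here `main;load;parse_json` gets the widest bar — three of the five samples landed there — which is exactly the shape t0st saw, at much larger scale.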
The flamegraph showed something immediately suspicious: a huge portion of loading time was spent inside what appeared to be a JSON parsing routine. Not asset decompression. Not network calls. JSON parsing.
---
The Smoking Gun: O(n²) in a 63,000-Entry List
Digging into the JSON parsing code (via disassembly — remember, no source), t0st found the problem.
GTA Online loads a roughly 10 MB JSON file describing every item purchasable in the game's online store — somewhere around 63,000 entries. For each entry in that array, the parser was doing a duplicate check: scanning the already-processed entries to make sure it hadn't seen this one before.
That duplicate check was a linear scan. Every single time.
So for entry number k, the parser scanned k-1 previous entries. For 63,000 entries, that's:
```
sum(0..62,999) = 62,999 × 63,000 / 2 ≈ 1.98 billion comparisons
```
That's the definition of O(n²). And even when each individual comparison is cheap — a few characters or bytes checked before a mismatch — two billion of them are not.
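A Python sketch of the pattern — not Rockstar's actual code — makes the blow-up easy to verify, with n scaled down to 3,000 so it runs in a moment:

```python
def dedup_linear_scan(entries):
    # The slow pattern: before accepting each entry, scan every prior one.
    seen = []
    comparisons = 0
    for entry in entries:
        duplicate = False
        for prev in seen:            # linear scan over all previous entries
            comparisons += 1
            if prev == entry:
                duplicate = True
                break
        if not duplicate:
            seen.append(entry)
    return seen, comparisons

# All-unique input is the worst case: every scan runs to the end.
entries = [f"entry_{i}" for i in range(3_000)]
unique, comparisons = dedup_linear_scan(entries)
print(comparisons)   # 3,000 * 2,999 / 2 = 4,498,500
```

Scale n up to 63,000 and the same n(n−1)/2 formula gives the ~1.98 billion figure above.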
On t0st's machine, this routine alone was consuming the bulk of those missing minutes.
It gets worse: there was a second quadratic hotspot. Numeric values were parsed by calling sscanf in a tight loop, and common C runtime implementations of sscanf effectively run strlen over the entire input string on every call — so parsing each token meant another full walk over the multi-megabyte JSON buffer. Between the duplicate scan and the sscanf rescans, essentially all of the missing minutes were accounted for.
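The sscanf problem is the same shape in a different spot. Here's a Python analogy (again, not the game's code): a tokenizer that re-touches the whole remaining buffer per token, versus one that just advances an index:

```python
def parse_rescan(text):
    # Re-slices the remaining buffer on every token. Each partition() walks
    # and copies the tail, so total work grows with (tokens x buffer size) --
    # the same shape as sscanf re-running strlen over the buffer per call.
    values, remaining = [], text
    while remaining:
        head, _, remaining = remaining.partition(",")
        values.append(int(head))
    return values

def parse_offset(text):
    # Advances an index instead: every character is visited a constant
    # number of times, so the whole parse is O(n).
    values, start = [], 0
    while start < len(text):
        end = text.find(",", start)
        if end == -1:
            end = len(text)
        values.append(int(text[start:end]))
        start = end + 1
    return values

text = ",".join(str(i) for i in range(10_000))
assert parse_rescan(text) == parse_offset(text)   # same result, different cost
```

Both parsers produce identical output; only the second does an amount of work proportional to the input size.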
---
The Fix: What a Hash Set Does in Milliseconds
The correct data structure for "have I seen this string before?" is a hash set. Insert is O(1) amortized. Lookup is O(1) amortized. For 63,000 entries, you go from ~2 billion operations to ~63,000. That's not a 10x improvement. That's a 30,000x improvement on the duplicate checking alone.
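The replacement pattern, sketched in Python (illustrative, not the shipped fix):

```python
def dedup_hash_set(entries):
    # "Have I seen this before?" becomes one O(1)-average hash lookup.
    seen = set()
    unique = []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            unique.append(entry)   # preserve first-seen order
    return unique

# 63,000 entries with some duplicates mixed in (synthetic data).
entries = [f"entry_{i % 40_000}" for i in range(63_000)]
unique = dedup_hash_set(entries)
print(len(unique))   # 40,000 distinct entries, ~63,000 hash lookups total
```

One pass, one hash operation per entry — the quadratic term simply disappears.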
t0st couldn't modify Rockstar's binary directly in a clean way, so the fix was implemented as a DLL injection that hooked the relevant function and replaced the slow path with a fast one. The patched version reduced load times from around 6 minutes to under 2 minutes on the same hardware. A 70% reduction.
The code change itself was trivial. A few dozen lines. The work was in finding the problem.
Rockstar later shipped their own official fix, credited t0st, and paid out $10,000 through its bug-bounty program — a classy move. Their patch shaved even more time, since with full source access they could clean up the surrounding code too.
---
Why Did This Survive for Eight Years?
This is the question worth sitting with.
The most honest answer is: nobody profiled it. Or if they did, nobody acted on the results.
There are a few dynamics at play in large shipped products that make this kind of thing common:
1. Load time becomes normalized. If it's always been slow, it becomes a baseline assumption. Engineers stop questioning it. "GTA Online takes forever to load" becomes a meme, not a bug report.
2. The cost of investigating shipped binaries feels high. When you don't have the source context in front of you, performance work feels harder than it is. The truth is, a sampling profiler doesn't care whether you have source code. It still shows you where the time goes.
3. The "it's complex" excuse. GTA Online is a massive game with enormous asset loads. It's easy to attribute slow startup to "there's just a lot of stuff to load." That explanation feels plausible enough that it stops further investigation. But plausible isn't measured.
4. Ownership gaps. In large codebases, old subsystems often lose their original authors. Nobody has a strong sense of ownership over a JSON parser written in 2012 for a game that shipped in 2013. When nobody owns it, nobody optimizes it.
This isn't unique to Rockstar. I've seen similar things in every large engineering organization I've worked in or around. Legacy code accumulates. Hot paths go unmeasured. Slow becomes the new normal.
---
The Repeatable Template
What makes t0st's approach worth studying isn't that they're some genius 10x developer. It's that they followed a dead-simple methodology that any developer can apply:
Step 1: Measure the gap. Compare expected time to actual time. If there's unexplained time missing, find out where it went. I/O profilers, CPU monitors, and simple wall-clock timing can tell you a lot before you even open a profiler.
Step 2: Attach a sampling profiler. Don't instrument by hand. Don't add logging everywhere. Just sample. Tools like perf on Linux, VTune or the Visual Studio profiler on Windows, Instruments on macOS, or open-source options like py-spy for Python — they all give you a flamegraph or equivalent. Run it. Look at the wide bars.
Step 3: Form a hypothesis, then verify. "The profiler says 80% of time is in this function" is a hypothesis, not a conclusion. Dig into why. Read the code (or the disassembly). Understand the algorithm. The big-O behavior of a 63,000-element array scanned inside a loop is not subtle — once you see it, it's obvious.
Step 4: Make the minimal fix, re-measure. Don't refactor the world. Make the targeted change. Run the profiler again. Did the hotspot go away? Did a new one appear? Performance work is iterative.
Step 5: Understand the before and after in real numbers. Percentage improvement is fine for reporting, but absolute numbers matter more. "Reduced from 6 minutes to under 2" is concrete; "a 70% reduction" is comparable across machines. Both together tell the full story.
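Steps 1, 4, and 5 boil down to a habit you can sketch in a few lines: time it, change it, time it again, and keep the absolute numbers. A minimal Python harness, with two hypothetical dedup variants standing in for "before" and "after" the fix:

```python
import time

def measure(label, fn, *args):
    # Wall-clock a callable and report the absolute time, not just a ratio.
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed

def before_fix(items):
    # Linear-scan dedup: 'in' on a list walks the list every time.
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen

def after_fix(items):
    # Hash-based dedup; dict preserves first-seen order.
    return list(dict.fromkeys(items))

items = [str(i % 2_000) for i in range(20_000)]
slow_result, slow_t = measure("before fix", before_fix, items)
fast_result, fast_t = measure("after fix", after_fix, items)
assert slow_result == fast_result   # same behavior -- only the cost changed
```

The assertion at the end is the part people skip: re-measuring proves the hotspot moved, and checking the output proves the minimal fix didn't change behavior.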
---
The Democratization Angle
One thing I genuinely appreciate about this story is that t0st used completely standard, publicly available tools. No proprietary profiler. No insider access. No special hardware. Just a sampler, a disassembler, and enough patience to understand what was being shown.
This matters because performance analysis has historically had a reputation for being a dark art. The kind of thing that only "performance engineers" do. The reality is that flamegraphs are readable by anyone who understands what a call stack is. Sampling profilers are either free or cheap. The bottleneck to this kind of analysis isn't tooling — it's the willingness to actually do it.
Brendan Gregg's work on flamegraphs and Linux perf basically handed every developer on the planet a profiling methodology for free. The fact that most developers rarely use it isn't a tooling problem. It's a habits problem.
---
What to Take Away
The GTA Online story is funny because of the scale — a billion-dollar game, a nearly decade-old bug, fixed in a weekend by someone with no source access. But the underlying lesson isn't about Rockstar's embarrassment. It's about what's probably lurking in your own codebase.
Every sufficiently large, sufficiently old system has code that nobody has profiled in years. Some of it has O(n²) behavior that was fine at n=100 and catastrophic at n=60,000. Some of it calls sscanf in a loop because the original author didn't know better and nobody changed it. Some of it does duplicate detection with a linear scan because that's the obvious first implementation and there was never a reason to revisit it.
You won't find it by reading the code. You'll find it by measuring.
So next time something is slower than it should be and the instinct is to shrug and say "it's just a complex system" — open a profiler. Look at the flamegraph. Find the wide bar. It might take a weekend. The fix might take an hour.
That's the job.
---
Further reading: t0st's original blog post is still up and worth reading in full. Brendan Gregg's flamegraph repository and his book "Systems Performance" are the canonical references for this kind of work.