Facebook claims to have slashed iOS app crashes by half

Facebook's iOS bug was discovered several months ago and is said to stem from Apple's Core Data framework.
Written by Rachel King, Contributor

Given that mobile is Facebook's cash cow, it makes sense that the social network's engineering teams are dedicated to ensuring the apps run as smoothly as possible.

Facebook's engineering team touted on Tuesday that it recently solved a long-term mobile debugging problem, slashing the crash rate on the iOS app by more than 50 percent.

The problem, discovered several months ago, involved Apple's Core Data framework, which automates common tasks associated with object life-cycle and object graph management.

In a chicken-and-egg scenario, Slobodan Predolac and Nicolas Spiegelberg, members of Facebook's engineering team based in New York, admitted in a blog post that it took so long to fix the problem primarily because of how fast the world's largest social network continues to grow — especially on mobile.

"The ability to work at scale is one of the most exciting parts of engineering at Facebook. However, certain fundamental programming challenges inevitably become more difficult with scale. Debugging, for example, can prove difficult even if you can reliably reproduce the problem – and this difficulty increases when debugging a highly visible but nondeterministic [sic] issue in a rapidly changing codebase."

Facebook's monthly active user count stood at approximately 1.32 billion as of June 30, with mobile monthly active users alone up 31 percent year-over-year to 1.07 billion.

Predolac and Spiegelberg outlined the investigation into a myriad of crash reports, leading to "different theories about race-condition situations, architectural changes, and even false fundamental premises."

In the end, the engineering team traced the problem back to the networking stack, resolving the issue in partnership with the networking team and deploying Fishhook, an open source library for rebinding system APIs.

"It turns out that abandoning manual code analysis was a good strategy," the engineers concluded. "The bug surfaced with existing code that was exercised more as we ramped up default secure connections for all our users."
