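Instagram is an excellent, battle-hardened app, and the story of its growth reflects its team’s attitude and principles. I took a lot away from reading this, so I am reposting it inside the firewall for everyone to learn from.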
https://engineering.instagram.com/instagram-android-four-years-later-927c166b0201
The first version of Instagram for Android was built in four months by a team of two engineers. It’s been four years since that launch, and in that time we’ve added features such as video, direct messaging, photo maps, advertiser support, and new ways to discover and explore the amazing content shared by users around the world. We’ve regularly released new filters, editing tools, and apps to unlock creative potential. Almost 30 engineers now work in our Android codebase every day. All this — yet Instagram for Android is still one of the fastest-starting apps on the platform, and is only a 16MB APK download for most users. How did we scale the team and create so many awesome new features while maintaining our best-in-class app size and performance? We’ve focused on providing the best possible experience for a small, well-scoped feature set, we have an extremely efficient UI layer, we take ownership of all the important code in our app, and we’ve invested heavily in maintaining our values as the team grows.
A core value of Instagram Engineering is to “Do the Simple Thing First”. We build for the use case that exists now, rather than the one that may exist later. We still care deeply about performance — but “doing the simple thing first” reminds us not to prematurely optimize our code or chase every small performance win. We think holistically and pragmatically, always considering the downside of increased complexity that often accompanies micro-optimization.
At the core of this principle is the idea that the Instagram app is simply a renderer of server-provided data, much like a web browser. Almost all complex business logic happens server-side, where it is easier to fix bugs and add new features. We rely on the server to be perfect, enforced through continuous integration testing, and dispense with null-checking or data-consistency checking on the client. Because of this, the app crashes when there is malformed data, rather than remain in a weird state. Our automated crash reporting triggers alarms and an investigation, and we can fix the bug. It is much easier to fix a crash, with an attached stack trace, than to debug a weird state issue based on a user’s report.
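To illustrate the crash-instead-of-weird-state idea, here is a minimal sketch in Java. The `MediaItem` class, its field names, and the map-based input are assumptions made for this example, not Instagram’s actual code; the point is only that a missing field fails loudly at the parsing site instead of leaving a half-built object around.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical model of one feed item; the field names are illustrative only. */
final class MediaItem {
    final String id;
    final String imageUrl;

    private MediaItem(String id, String imageUrl) {
        this.id = id;
        this.imageUrl = imageUrl;
    }

    /**
     * Builds an item from already-parsed server fields. A missing field throws
     * immediately, so the stack trace points at the place where the bad data arrived.
     */
    static MediaItem fromServer(Map<String, String> fields) {
        String id = Objects.requireNonNull(fields.get("id"), "server omitted 'id'");
        String url = Objects.requireNonNull(fields.get("image_url"), "server omitted 'image_url'");
        return new MediaItem(id, url);
    }
}

final class FailFastDemo {
    public static void main(String[] args) {
        MediaItem ok = MediaItem.fromServer(
                Map.of("id", "42", "image_url", "https://example.com/a.jpg"));
        System.out.println(ok.id + " -> " + ok.imageUrl);

        try {
            // Malformed payload: 'image_url' is missing.
            MediaItem.fromServer(Map.of("id", "43"));
        } catch (NullPointerException e) {
            // A real app would let this crash and be reported; it is caught here
            // only so the demo can show the message.
            System.out.println("Crashed early: " + e.getMessage());
        }
    }
}
```

The rest of the client can then treat every `MediaItem` as fully populated, which is what makes it possible to drop defensive null checks further down.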
Living in a fast-growing codebase for four years has made us value straightforward, readable, debuggable code, and so we do not heavily rely on opaque code-gen, runtime annotation processing, or other clever “magic”. The only annotation processing that we use happens at compile-time, generating Java source files that look and behave as if they were handwritten. We prefer code to be right there on the screen in front of us, not hiding behind a complex meta-processor. It is simple for new developers to ramp up in this environment; they can easily trace what’s happening in the app and track down bugs.
The original Instagram app was much simpler than what exists today. As a small team, building quickly to keep up with market pressure, we used a lot of inheritance to share code. This approach didn’t scale with the team’s growth: it led to a confusing, tightly-coupled, brittle architecture, where execution bounced between different levels of a class hierarchy. Our desire to have small, simple, single-purpose classes has led us to embrace the principle of “composition over inheritance”. We’ve found that it often takes a little more thought to build things without falling back on inheritance, but as a team grows, more emphasis needs to be placed on architecture to give a solid foundation for future development.
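As a rough sketch of what composition over inheritance can look like, the example below builds a presenter from two small collaborators instead of extending a shared base class. All of the names (`ImpressionLogger`, `CaptionFormatter`, `FeedRowPresenter`) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-purpose collaborator: records that an item was shown. */
interface ImpressionLogger {
    void logImpression(String itemId);
}

/** Hypothetical single-purpose collaborator: turns an item id into display text. */
interface CaptionFormatter {
    String format(String itemId);
}

/** Composed from collaborators; there is no base class to bounce through. */
final class FeedRowPresenter {
    private final ImpressionLogger logger;
    private final CaptionFormatter formatter;

    FeedRowPresenter(ImpressionLogger logger, CaptionFormatter formatter) {
        this.logger = logger;
        this.formatter = formatter;
    }

    /** Logs the impression, then returns the text to display for the row. */
    String bind(String itemId) {
        logger.logImpression(itemId);
        return formatter.format(itemId);
    }
}

final class CompositionDemo {
    public static void main(String[] args) {
        List<String> logged = new ArrayList<>();
        FeedRowPresenter presenter =
                new FeedRowPresenter(logged::add, id -> "Photo #" + id);
        System.out.println(presenter.bind("7"));
        System.out.println(logged);
    }
}
```

Each collaborator can be swapped or tested on its own, and reading `bind` shows the whole flow in one place.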
Equally important to us is to “Optimize What Matters”. We have a high bar for the performance of our most-used features, and as the product has grown and evolved we have needed to continually reevaluate our previous assumptions. Sometimes code must be rewritten to deal with new feature requirements or operating system capabilities. This is best illustrated with a series of examples:
We have a sophisticated set of tools, mostly built by our counterparts working on Facebook for Android, which report on and analyze all sorts of data about our apps in the wild. We track scrolling performance, start time, data usage, stability, and bug reports to make sure that we’re never regressing on our commitment to provide the best experience to our users.
The first version of Instagram was a luscious, skeuomorphic masterpiece with textures, shadows, and gradients everywhere. In early 2014, we embarked upon one of our largest optimizations yet: a project to overhaul Instagram’s look and feel on Android. Our goals were to make the app both faster and more beautiful. We designed and built an interface that makes use of flat colors, lines, and simple icons, combined with a subtle sense of space and layout to create a refined, efficient UI layer. We expected some performance gain, but the magnitude of our results surprised us. I’ll summarize them here:
Over time, as we have refined our codebase via relentless optimization, we have reduced or eliminated dependencies on many third-party libraries that commonly appear in other Android apps, preferring to fully own our infrastructure code. One of my colleagues likes to describe our app as a “race car” — every single component is specialized for the job it needs to do.
Our image cache, for example, is homegrown, and comprises less than 1500 lines of Java code. It is designed to download, decode and display large images while the user is scrolling feed, without dropping frames. It is not a general purpose image library, so it eschews features that are not directly needed by the product, but it works extremely well for our use case.
As mentioned above, we developed a JSON parser/serializer generator, which works with jackson-core (a low-level streaming JSON parser) to generate fast, memory-efficient parsing code. We do not use dependency injection, as we believe the code size, complexity, and performance hit do not justify the benefits. We use only a small subset of Guava, carefully evaluated for performance on mobile. We do not include the Play Services library, writing our own code to interface with GCM.
At a time when many popular Android apps are multidex, we still ship a single dex file. Secondary dexes incur a performance penalty on every method call, and loading too much code is generally bad because it eats up a lot of memory. We carefully track method reference count. It’s important that we do not unintentionally or carelessly add new dependencies. We created tests that run on every diff and check the libraries it includes against an approved set. If a diff adds a new library without updating the test (a change that triggers a review from engineers on our team), it cannot be committed. We have also specified method count ‘budgets’ for internal libraries, and created tests to enforce them.
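The two checks described here, an approved-library list and per-library method budgets, could be sketched roughly as follows. This is an illustration under assumptions: the library names, the budgets, and the idea of feeding the checks plain sets and maps are all made up, and a real test would derive its inputs from the build.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical build-time checks; inputs would come from the build in practice. */
final class DependencyGuard {
    /** Returns the declared libraries that are not on the approved list, sorted. */
    static Set<String> unapproved(Set<String> declared, Set<String> approved) {
        Set<String> extra = new TreeSet<>(declared);
        extra.removeAll(approved);
        return extra;
    }

    /**
     * Returns the libraries whose method count exceeds their budget.
     * A library with no budget at all is flagged as well.
     */
    static Set<String> overBudget(Map<String, Integer> methodCounts,
                                  Map<String, Integer> budgets) {
        Set<String> offenders = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : methodCounts.entrySet()) {
            Integer budget = budgets.get(entry.getKey());
            if (budget == null || entry.getValue() > budget) {
                offenders.add(entry.getKey());
            }
        }
        return offenders;
    }
}

final class DependencyGuardDemo {
    public static void main(String[] args) {
        Set<String> approved = Set.of("jackson-core", "guava-subset");
        Set<String> declared = Set.of("jackson-core", "shiny-new-lib");
        System.out.println("Unapproved: " + DependencyGuard.unapproved(declared, approved));

        Map<String, Integer> budgets = Map.of("image-cache", 400, "networking", 900);
        Map<String, Integer> counts = Map.of("image-cache", 380, "networking", 950);
        System.out.println("Over budget: " + DependencyGuard.overBudget(counts, budgets));
    }
}
```

A test built on checks like these would fail whenever either result is non-empty. For context, a single dex file can reference at most 65,536 methods, which is one reason the count is worth tracking.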
We recognize that writing so much custom code may not be a feasible approach for smaller teams which do not have the resources to write everything from scratch. For us, too, this was an iterative process: as our team grew and new features made more demands on size and performance, we started to be much more selective about the code we shipped. We removed third-party libraries that we could replace with our own code, tailored to our use case and therefore smaller.
Now that we have a great set of libraries, we’ve made sure that they are reusable amongst our various apps. Having an “app starter kit” shortened the development time of both Layout and Boomerang by months. Tests enforce that the “app starter kit” doesn’t depend on app-specific code.
As the team size doubled, doubled, and then doubled again, it became important to teach new engineers about the codebase and our mobile engineering philosophy. We didn’t want people to invent new solutions to problems we had already solved simply because they were unaware of the existing ones.
Our main teaching tool has always been code review. Every diff at Instagram (and Facebook) is reviewed by another engineer. We are especially thorough, asking people to conform to patterns already in the app and to make their code fairly robust, as well as checking for obvious errors. We pair each new engineer with an experienced mentor, who serves as their main reviewer and can answer most questions. The best, longest-tenured engineers are expected to make themselves available regularly to debate and inform technical architecture decisions, and provide specialized expertise to help solve problems — not just sequester themselves away writing code. The average engineer spends 20–40% of her day on various code review and mentorship activities.
We try to make our code as self-documenting as possible. We use annotations such as @Nullable, or for an even stronger guarantee, Guava’s Optional class, to document the null-contract of methods. We are ruthless about proper naming, believing that it prevents bugs and promotes readability. We strongly-type everything, which in addition to documenting allows the compiler to do as much work as possible to prevent bugs. We use enums regularly because they are safer than Strings or ints and their performance downsides are minimal.
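A small sketch of these conventions is shown below. It uses `java.util.Optional` as a stand-in for Guava’s `Optional` so that the example is self-contained; that substitution, along with the `Media` and `MediaType` names, is an assumption for illustration only.

```java
import java.util.Optional;

/** Hypothetical media kinds; an enum rules out the typos a String or int would allow. */
enum MediaType { PHOTO, VIDEO }

final class Media {
    private final MediaType type;
    private final String caption; // may be absent; never handed out as a raw null

    Media(MediaType type, String caption) {
        this.type = type;
        this.caption = caption;
    }

    MediaType type() {
        return type;
    }

    /** The return type itself documents that a caption may be missing. */
    Optional<String> caption() {
        return Optional.ofNullable(caption);
    }

    /** Maps each enum constant to a label; a new, unhandled constant fails loudly. */
    private static String labelFor(MediaType type) {
        switch (type) {
            case PHOTO: return "photo";
            case VIDEO: return "video";
            default: throw new IllegalStateException("unhandled type " + type);
        }
    }

    String describe() {
        String label = labelFor(type);
        return caption().map(c -> label + ": " + c).orElse(label + " (no caption)");
    }
}

final class SelfDocumentingDemo {
    public static void main(String[] args) {
        System.out.println(new Media(MediaType.PHOTO, "sunset").describe());
        System.out.println(new Media(MediaType.VIDEO, null).describe());
    }
}
```

Callers of `caption()` must handle the empty case explicitly, and passing anything other than a `MediaType` to the constructor is a compile error rather than a runtime surprise.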
As the team has grown, we’ve constantly reevaluated how we work — doubling down where appropriate and shifting our tactics when things aren’t working. The thing that hasn’t changed is the values that underlie all these decisions. A team that shares a set of values can work well even when decentralized. We spend a couple hours every month presenting our mobile engineering values and culture to all new engineers who join Instagram.
Every team and codebase develops its own philosophy and strategy as it grows and matures. This is the one that has worked really well for us, built on many hours writing code and reading other people’s insights shared via blog posts like this one. We hope you can glean some useful strategies and ideas to incorporate into your development process.
Over the coming weeks, we’ll be sharing more specific details on a number of projects we’ve worked on recently that line up with the learnings above…stay tuned!