Large-Scale Static Analysis at Mozilla

Taras Glek - taras@mozilla.com

https://blog.mozilla.com/tglek

Software Development Stone Age

Any C++ developers in the audience? C hackers? Currently we rely too much on heroic efforts and sheer brute force by developers. Throwing more people at the code doesn't help. The general impression is that initially the developer is fully in control of a program, but as the program grows it develops a life of its own and the developer becomes more and more helpless. Code is always growing; our ability to understand the codebase is shrinking. There is no cure for this, but this talk will show how we can forestall the inevitable doom. There is little ability to ensure APIs are used correctly, and it is hard to ensure optimizations are not broken.

Mozilla is Big and Fast Moving

Can't stop programming to do refactoring. The competitive landscape means we are always looking into any potential wins. Tried switching Mozilla to garbage collection, a brand new JS engine, etc. Optimizations are very risky in a mature codebase; safeguarding them with static analysis makes them feasible.

Need Better Tools

Trivial bugs: no more whack-a-mole. API contracts: which objects are allowed on the stack, must clean up after allocating, allocators must match, code follows a set style, be able to specify a subset of C++. Invariants: the C++ type system is weak, and assertions aren't enforced statically. Example: does this code call into JavaScript?
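To make the idea of an API contract concrete, here is a minimal sketch of a class that is only correct as a local variable. Everything in it is hypothetical and for illustration only: the `STATIC_CHECKING` flag, the `STACK_ONLY` macro, the attribute spelling and the `AutoCounter` class are assumptions, not names taken from this talk. The C++ type system cannot express "never allocate this with new", so the annotation is there for an external checker to read.

```cpp
#include <cassert>

// Hypothetical annotation macro. It expands to nothing in a normal build,
// so an ordinary compiler never sees an attribute it does not know.
// The attribute spelling below is an assumption made for this sketch.
#if defined(STATIC_CHECKING)
#  define STACK_ONLY __attribute__((user("stack_only")))
#else
#  define STACK_ONLY
#endif

// A scope guard: it bumps a counter on entry and restores it on exit.
// It is only correct when its lifetime is tied to a scope.
class STACK_ONLY AutoCounter {
public:
  explicit AutoCounter(int& depth) : mDepth(depth) { ++mDepth; }
  ~AutoCounter() { --mDepth; }
  AutoCounter(const AutoCounter&) = delete;
  AutoCounter& operator=(const AutoCounter&) = delete;

private:
  int& mDepth;
};

// Correct use: the guard lives on the stack and is destroyed on return.
int depthInsideScope(int& depth) {
  AutoCounter guard(depth);
  return depth;
}

// Contract violation that compiles fine but that a checker could flag:
//   AutoCounter* leaked = new AutoCounter(depth);  // heap allocated, never destroyed
```

The compiler accepts both the correct and the incorrect use; only a tool that understands the annotation can tell them apart.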

Mozilla Tools

Refactoring C++

Mozilla did stop-the-world refactoring in the pre-Mozilla 1.0 days. That sucked. A competitive market means users won't wait years for us to make our code more elegant. At the time Bjarne was writing C++, C was thought to be unparsable with an LR parser, so he added all kinds of other ambiguities to the grammar. A C++ parser has to instantiate types, so implementing one requires a complete C++ type system. The C preprocessor makes this even more awesome by working at the lexical level.
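A small sketch of why a C++ parser needs type information: the same token sequence `T * p` is a declaration when `T` names a type and a multiplication when `T` names a variable. The namespaces and function names below are made up for illustration.

```cpp
#include <cassert>

namespace type_case {
typedef int T;  // here T names a type
int demo() {
  int v = 7;
  T * p = &v;   // "T * p" declares p as a pointer to int
  return *p;
}
}  // namespace type_case

namespace value_case {
int T = 6;      // here T names a variable
int demo() {
  int p = 7;
  return T * p; // "T * p" multiplies two integers
}
}  // namespace value_case
```

A tool that only tokenizes the source cannot tell these two cases apart; it has to know what `T` was declared as, which in real code can depend on template instantiation and on preprocessor output.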

Pork

GCC For Static Analysis?

GCC isn't really a choice. It is both THE C++ compiler on open source platforms AND THE ONLY C++ compiler that works, so it is really a question of why one would NOT use GCC for static analysis. When I started working on analyzing C++ there was a lot of folklore about how abysmal the GCC intermediate forms were. Clang has the potential to become a formidable GCC competitor, but at the moment its C++ frontend isn't complete, so it's not in the running. I started out with Elsa, a from-scratch C++ parser that is well suited for refactoring code, but not so well for analysis. After an initial failed attempt with Elsa, I moved on to GCC and never looked back. The other problem is that any non-GCC C++ frontend will end up in a C++ arms race with G++ as it introduces new features. Unfortunately, when I started, GCC did not support any way of being extended with third-party functionality.

GCC 4.5: Here Come the Plugins

GCC Features

GIMPLE is awesome because it basically allows one to treat C++ as C with a few extra features. It's a great simplified AST for static analysis. GCC attributes are fantastic. Messing with grammars to figure out an annotation scheme isn't trivial (as can be seen with C++0x). So far, GCC attributes have allowed us to annotate anything we want. The 4.5 release is due any day now. Currently Mozilla relies on 4.3 for production; we'll be moving to 4.5 ASAP. Other big 4.5 features: LTO.
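To give a feel for "C++ as C with a few extra features", here is a hand-written approximation of what lowering does to a small function. This is not actual GCC output and does not use GIMPLE syntax; the names `Rect`, `Rect_area` and `totalAreaLowered` are made up for the sketch. The point is only the shape: member calls become plain function calls with an explicit object pointer, expressions are split into single steps with temporaries, and loops become labels and jumps.

```cpp
#include <cassert>

struct Rect {
  int w, h;
  int area() const { return w * h; }
};

// Ordinary C++ as a developer writes it.
int totalArea(const Rect* rects, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) sum += rects[i].area();
  return sum;
}

// Hand-lowered equivalent of Rect::area: "this" is an explicit parameter
// and every statement performs one operation.
int Rect_area(const Rect* self) {
  int w = self->w;
  int h = self->h;
  int product = w * h;
  return product;
}

// Hand-lowered equivalent of totalArea: the loop is a label plus a jump.
int totalAreaLowered(const Rect* rects, int n) {
  int sum;
  int i;
  const Rect* elem;
  int area;
  sum = 0;
  i = 0;
  goto check;
body:
  elem = rects + i;
  area = Rect_area(elem);
  sum = sum + area;
  i = i + 1;
check:
  if (i < n) goto body;
  return sum;
}
```

An analysis written against the lowered shape only has to handle a handful of statement kinds instead of the full C++ surface syntax.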

The Hydras

Dehydra

Treehydra

DXR Demo

Before I get into how Mozilla writes analyses, here is a pretty demo. Search for nsJARInputStream. Show clicking on parent, members, how to jump to implementation search.

Mozilla Analyses

Future

A Browser Vendor Hacking Compilers?

Closer Cooperation is the Way Out

Open source tools are in a position to cross-pollinate. Yet in reality there is relatively little [vertical in relation to the diagram] cooperation spanning projects. I'm not sure how other open source projects work, but generally mature projects are treated as black boxes... eerily similar to non-open software.

Thank you