I don’t believe in any statistics I didn’t fake myself (German saying)
Doing good performance benchmarks is hard, and benchmarking on Android is no exception. Here is a checklist for fair and robust benchmarks:
- Avoid low power mode: connect your device and keep the screen on. This prevents the CPU from falling into “half sleep”, where it slows down drastically.
- Avoid apps doing background work: some apps (we noticed Facebook is a very good candidate) may do background work at any time, introducing jitter at random moments and distorting results. The fewer apps you have on your device, the better!
- Switch on Airplane mode: without a network connection, other apps cannot do many updates.
- Do explicit GCs in between tests: garbage collection (GC) can distort results, so at least try not to have too many dangling objects around before a test starts.
- Be aware of warm-ups: The thing with VMs is that they may execute code faster over time. VMs “warm up” because of JIT compilation and other optimizations. So be aware of what you are testing (cold vs. warm VM).
- Get a lot of data: Don’t just run your test once or twice. Collect as many data points as you can with reasonable effort. This will stabilize your results. This class may help you by appending test results to a TSV file, which can be opened in your favorite spreadsheet application.
- Check your data’s quality: now that you have collected lots of data, check its quality using basic statistical functions. A good first sign is the median being close to the mean. To be safe, however, also look at the variance: a high variance hints at an unstable test scenario or environment.
- Know your clock: In Android there’s not just System.currentTimeMillis()/.nanoTime(), but also SystemClock.currentThreadTimeMillis(), SystemClock.elapsedRealtime(), SystemClock.elapsedRealtimeNanos(), SystemClock.uptimeMillis(), and Debug.threadCpuTimeNanos(). Wow! It’s easy to make the wrong choice here.
- Remove logs and other disturbances: Are you really measuring what you think you are? For example, logs may screw up your results, especially if they do a lot of String concatenation.
- Be transparent: What *exactly* did you measure? And how? On what device? Etc. Just open source your benchmarks and let others check them for flaws.
- Try to be fair: pay attention to what you test and make the scenarios as comparable as possible. For example, when doing database tests, do not use a lot of transactions for the competition while your own product uses just one (DB transactions are very expensive).
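The warm-up, data-collection, and TSV-appending points above can be sketched in plain Java. This is only a minimal sketch, not a full harness: class and method names are made up, and on an Android device you would typically prefer a SystemClock timer over System.nanoTime():

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class BenchmarkRunner {
    // Hypothetical helper: runs a task repeatedly, discarding warm-up
    // iterations, and returns one elapsed time (nanoseconds) per run.
    static long[] benchmark(Runnable task, int warmUpRuns, int measuredRuns) {
        // Warm-up: give the JIT a chance to compile the hot path first.
        for (int i = 0; i < warmUpRuns; i++) {
            task.run();
        }
        long[] results = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            // Suggest a GC between runs so garbage from the previous run
            // is less likely to distort the next one (best effort only).
            System.gc();
            long start = System.nanoTime();
            task.run();
            results[i] = System.nanoTime() - start;
        }
        return results;
    }

    // Append one line per data point so results accumulate across runs
    // and can be opened in a spreadsheet application.
    static void appendToTsv(String file, String label, long[] nanos) throws IOException {
        try (PrintWriter out = new PrintWriter(new FileWriter(file, true))) {
            for (long n : nanos) {
                out.println(label + "\t" + n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long[] results = benchmark(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) sb.append(i);
        }, 5, 20);
        appendToTsv("results.tsv", "stringbuilder-append", results);
        System.out.println("collected " + results.length + " samples");
    }
}
```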
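For the data-quality check, the basic statistics are easy to compute yourself. A quick sketch (names are our own) showing how a single outlier pulls the mean away from the median:

```java
import java.util.Arrays;

public class StatsCheck {
    static double mean(long[] xs) {
        double sum = 0;
        for (long x : xs) sum += x;
        return sum / xs.length;
    }

    static double median(long[] xs) {
        long[] sorted = xs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 0
                ? (sorted[mid - 1] + sorted[mid]) / 2.0
                : sorted[mid];
    }

    // Population standard deviation: the square root of the variance.
    static double stdDev(long[] xs) {
        double m = mean(xs);
        double sumSq = 0;
        for (long x : xs) sumSq += (x - m) * (x - m);
        return Math.sqrt(sumSq / xs.length);
    }

    public static void main(String[] args) {
        long[] timings = {102, 98, 101, 99, 100, 350, 101}; // one outlier
        System.out.printf("mean=%.1f median=%.1f stddev=%.1f%n",
                mean(timings), median(timings), stdDev(timings));
        // The mean (~135.9) is far from the median (101): a hint that
        // something (here, the 350 outlier) disturbed the runs.
    }
}
```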
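The Android-specific clocks only exist on a device, but the key distinction behind “know your clock” — wall-clock time vs. CPU time your thread actually consumed — can be illustrated on a plain JVM with ThreadMXBean, here standing in for Debug.threadCpuTimeNanos():

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ClockDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            System.out.println("thread CPU time not supported on this JVM");
            return;
        }
        long wallStart = System.nanoTime();
        long cpuStart = bean.getCurrentThreadCpuTime();

        Thread.sleep(100); // sleeping consumes wall time, not CPU time

        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = bean.getCurrentThreadCpuTime() - cpuStart;
        System.out.println("wall: " + wallNanos / 1_000_000 + " ms");
        System.out.println("cpu:  " + cpuNanos / 1_000_000 + " ms");
        // Wall time is roughly 100 ms while CPU time stays near zero:
        // pick the clock that matches what you actually want to measure.
    }
}
```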
We hope this gives you some ideas on how to improve your next benchmark.