We started a new project a few weeks ago, and the discussions around how to write and structure automated tests popped up once again. Do we use mocks? Always? How do we measure code coverage? What level of coverage must we achieve?
Here are some personal thoughts on these matters:
- I think we should use Code Coverage as a tool to uncover untested or dead code and not as a target number we have to achieve (see Martin Fowler on TestCoverage).
- I think classifying tests (unit vs. integration vs. functional, etc.) should be driven by practical reasons:
  - Avoid arbitrary rules like “if it reads from disk, it’s not a unit test” or “if it tests more than N objects/classes, then it’s an integration test”.
  - Start with no classification at first. Then, if the need arises for some practical reason, separate your tests. For example, you might want to separate slow-running tests so they run less frequently, or separate those that require special environments to run (see the sketch after this list).
  - You may also handle acceptance tests differently (e.g., with Cucumber); they are useful if they are defined by (or with participation from) “business” people.
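If you do end up splitting tests for practical reasons, a lightweight option on the Java side is to tag the slow ones and exclude that tag from the default run. Here is a minimal sketch using JUnit 4 categories; the `SlowTests` marker interface and the test names are made up for illustration, and the build tool (e.g. Surefire or Gradle) would then be configured to exclude that category from the fast feedback loop:

```java
import org.junit.Test;
import org.junit.experimental.categories.Category;
import static org.junit.Assert.assertEquals;

public class ReportGenerationTest {

    // Hypothetical marker interface, used only as a label for slow tests.
    public interface SlowTests {}

    @Test
    public void totalsAreComputedPerCustomer() {
        // Fast test: runs on every build.
        assertEquals(30, 10 + 20);
    }

    @Test
    @Category(SlowTests.class)
    public void generatesTheFullYearlyReport() {
        // Slow test: tagged so it can be excluded from the quick feedback
        // loop and run less frequently (e.g. nightly on CI).
    }
}
```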
- Question: “Should we count integration/system/whatever tests into our code-coverage metric?” I think this is the wrong question to ask. The coverage level is not a goal (see the first bullet above); it’s just a metric that helps you identify untested code and make decisions about it. Therefore, I see benefit in knowing how much code is covered by each group of tests individually, as well as by all of them together.
  On previous projects I saw tons of unit tests that didn’t really assert much at all. Code coverage was high, however, and managers were happy! I’d rather have a few good tests than tons of bad tests which “cover” (no pun intended) the truth: that we have improperly tested code.
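To make that concrete, here is a made-up contrast (the `PriceCalculator` class is hypothetical): both tests execute the same code and therefore “cover” it, but only the second one would actually catch a regression.

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class PriceCalculatorTest {

    // Inflates the coverage number but asserts nothing:
    // it only proves that the code does not throw.
    @Test
    public void looksTestedButIsNot() {
        new PriceCalculator().totalWithTax(100.0, 0.21);
    }

    // Pins down the behaviour we actually care about.
    @Test
    public void addsTaxToTheNetPrice() {
        PriceCalculator calculator = new PriceCalculator();
        assertEquals(121.0, calculator.totalWithTax(100.0, 0.21), 0.001);
    }
}
```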
- I want the tests to be a safety net that gives us confidence in future refactorings/optimizations and that detects regressions:
  - Try to avoid writing brittle tests, i.e. tests that break after even minor code changes, even when the code still behaves correctly. To me, a test that fails when it shouldn’t is almost as bad as one that passes when it shouldn’t.
  - Tests that get into the inner workings of every method or function implementation will surely break during refactorings/optimizations… which makes them useless.
  - In Java land, I like Mockito over EasyMock because (among other things) it clearly separates stubbing from setting expectations, which helps to write less brittle tests (see the sketch below). See “Yoga for Your Unit Tests” (it’s about jMock, but the ideas are relevant). I think EasyMock’s “strict” mode is evil.
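To illustrate the difference, here is a minimal Mockito sketch; the `PriceCatalog`, `PaymentGateway` and `OrderService` types are hypothetical. The catalog call is merely stubbed, and only the one interaction that defines the behaviour is verified, so incidental calls on the mocks don’t break the test (unlike a “strict” mock, which fails on any unexpected call):

```java
import org.junit.Test;
import static org.mockito.Mockito.*;

public class OrderServiceTest {

    @Test
    public void chargesTheCustomerForTheOrderTotal() {
        // Stubbing: provide just the data this scenario needs.
        PriceCatalog catalog = mock(PriceCatalog.class);
        when(catalog.priceOf("book")).thenReturn(20.0);

        PaymentGateway gateway = mock(PaymentGateway.class);
        OrderService service = new OrderService(catalog, gateway);

        service.placeOrder("customer-42", "book", 2);

        // Expectation: verify only the interaction that matters here.
        verify(gateway).charge("customer-42", 40.0);
    }
}
```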
- Regarding “higher-level” tests vs. “lower-level” tests, my rule of thumb is to start writing tests that aim at the highest layer of the system that is possible/practical to test automatically, and to go with lower-level tests as needed:
  - Higher-level tests are usually better at asserting that the product really complies with a particular feature or behavior.
  - Higher-level tests (if written properly) are usually less brittle because they use the external interfaces of the system and assume less about implementation details. This helps when improving/refactoring the implementation. That said, one must be careful not to rely on high-level details which might break your tests. Testing via the GUI is the obvious example: it’s tricky and brittle if you rely on, say, the size or layout of UI controls. If you cannot avoid relying on such details, go a bit lower-level: decouple the UI logic from the on-screen widgets so you can test it separately (see the sketch after this list).
  - Covering all possible combinations and scenarios with high-level tests may explode into a huge number of tests, which is impractical.
  - Lower-level tests are better for testing specific code areas/algorithms that benefit from many scenarios and edge cases; this is harder to achieve with higher-level tests.
  - Mind that tests which are too low-level can easily become tied to the specifics of the current implementation. If that happens, they won’t help much with future refactorings.
  - It’s not black or white. Use your own judgement.
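As an example of that decoupling, here is a sketch of a presenter tested with a hand-rolled fake view; all the names (`LoginView`, `LoginPresenter`, the error message) are hypothetical. The real `LoginView` implementation would wire these calls to the on-screen widgets, but the logic can be exercised without any widget, layout or sizing details:

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class LoginPresenterTest {

    // The presenter talks to this abstraction instead of concrete widgets.
    interface LoginView {
        void showError(String message);
    }

    static class LoginPresenter {
        private final LoginView view;

        LoginPresenter(LoginView view) {
            this.view = view;
        }

        void submit(String user, String password) {
            if (user.isEmpty() || password.isEmpty()) {
                view.showError("User and password are required");
            }
            // ...otherwise delegate to the authentication logic.
        }
    }

    // Hand-rolled fake that just records what the presenter asked for.
    static class RecordingView implements LoginView {
        String lastError;
        public void showError(String message) { lastError = message; }
    }

    @Test
    public void reportsAnErrorWhenTheUserIsMissing() {
        RecordingView view = new RecordingView();
        new LoginPresenter(view).submit("", "secret");

        assertEquals("User and password are required", view.lastError);
    }
}
```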
- Let’s use mocks/stubs only when needed. It’s fine to mock an external system which is not available or is way too slow for unit testing; in most cases, use the real thing (see the sketch below).
  I’ve seen a tendency to write one unit test per class or module, mocking all collaborating objects/modules. This makes little sense to me. On a recent project, the tests for the application’s REST API mocked all the business logic, even the marshalling/unmarshalling of the XML: a waste of time, in my opinion.
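As a sketch of where a mock does earn its keep: the external system sits behind a boundary interface and is stubbed, while everything else stays real. The `ExchangeRateProvider`, `Invoice` and `InvoiceService` names are made up for illustration:

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.*;

public class InvoiceServiceTest {

    @Test
    public void convertsTheInvoiceTotalUsingTheCurrentRate() {
        // The external rate service is unavailable (and slow) in the test
        // environment, so it is the one collaborator worth stubbing.
        ExchangeRateProvider rates = mock(ExchangeRateProvider.class);
        when(rates.rate("EUR", "USD")).thenReturn(1.10);

        // Everything else is real: real Invoice, real calculation logic.
        Invoice invoice = new Invoice("EUR", 200.0);
        InvoiceService service = new InvoiceService(rates);

        assertEquals(220.0, service.totalIn("USD", invoice), 0.001);
    }
}
```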
- I strongly recommend Growing Object-Oriented Software, Guided by Tests.
On one of our new projects we are using Clojure and AngularJS. We don’t yet have much experience writing and structuring unit tests for big Clojure or AngularJS applications, so we might find new patterns that augment (or contradict!) my thoughts above. In particular, I don’t feel like using Angular’s built-in testing facilities, as they rely too much on Angular specifics, so we’ll try to keep most GUI tests in Selenium WebDriver, driven from Clojure.