
I think the testing pyramid reflects a false correlation — it seems to assert that higher up the pyramid tests are more expensive to write/maintain and longer to run.

In reality, the execution time of a test says nothing about how hard the test is to write. Sometimes a fast-to-execute unit test can be much harder to write and maintain than a longer-running test that avoids mocking an API and instead uses abstractions in the test definition that were already written to support the program's features.

I think test suite execution speed is the real metric to focus on for most projects — to get the most value, test suites should accelerate the time to useful feedback. Write tests in the simplest way that provides useful feedback into the behavior of the system and runs quickly enough that you can receive that feedback with low latency during development.

I quite like tools like jest and wallabyjs that use code coverage data to figure out which tests to rerun as code changes — it means you can have a test suite that includes slow(ish) tests yet still get feedback quickly as you make changes to the code.
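The core idea behind that kind of coverage-driven test selection can be sketched in a few lines. This is a hypothetical illustration, not how jest or wallabyjs is actually implemented: assume a prior run recorded, for each test, the set of source files it executed, and on each change we rerun only the tests whose coverage intersects the changed files.

```javascript
// Hypothetical sketch of coverage-based test selection: given a map from
// test name -> source files it executed on the last run, pick only the
// tests whose coverage overlaps the set of files that just changed.
function selectTestsToRerun(coverageMap, changedFiles) {
  const changed = new Set(changedFiles);
  const selected = [];
  for (const [testName, coveredFiles] of Object.entries(coverageMap)) {
    if (coveredFiles.some((f) => changed.has(f))) {
      selected.push(testName);
    }
  }
  return selected;
}

// Example data (made up): editing src/cart.js only reruns the cart test,
// even if that test is slow(ish) to execute.
const coverage = {
  "cart adds items":       ["src/cart.js", "src/money.js"],
  "checkout charges card": ["src/checkout.js", "src/money.js"],
  "renders banner":        ["src/banner.js"],
};
console.log(selectTestsToRerun(coverage, ["src/cart.js"]));
// → ["cart adds items"]
```

Slow tests only cost you when their covered files actually change, which is why this approach keeps feedback latency low even in mixed-speed suites.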



> to get the most value, test suites should accelerate the time to useful feedback

Well, they should also optimise the usefulness of the feedback they provide. Typically, tests higher up the pyramid are also more brittle (e.g. end-to-end tests might fire up an entire browser and Selenium), and thus are more likely to fail when in actuality, nothing is wrong. That's an additional reason for limiting the number of those tests.


Brittle tests don't seem useful in general though, do they?

I'm not sure it's necessarily true that brittleness must correlate with height in the pyramid or with execution time -- in my experience, brittleness correlates with Selenium more than it does with pyramid height (that's a statement about Selenium more than it is a statement about any particular category of the testing pyramid).

It's possible to write very useful, non-brittle tests using something like headless Chrome ...


No they're not.

But yes, Selenium is brittle. That said, Google engineers actually did some investigation into this, and although I think their methods were probably a bit heavyweight, they did conclude that it's mostly RAM use that leads to brittleness.

[1] https://testing.googleblog.com/2017/04/where-do-our-flaky-te...


Interesting, thanks for the link!

I’m curious how many tests in that chart fell into the small size range — that would show whether the size-flakiness correlation also holds for tests using tools associated with higher-than-average flakiness...

I’m also feeling like I want more clarity around the mechanism for measuring flakiness — the definition they use is that a test is flaky if it shows both failing and successful runs with the “same code”. Does “same code” refer to a freeze of only the codebase under test, or is it also a statement about changes to the tools in the testing environment ...?
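That definition is mechanical enough to sketch. Below is a hypothetical illustration (the run-record shape and function name are made up, not from the Google post): a test is classified flaky if, at a single code revision, its recorded runs include both passes and failures — and note the sketch deliberately keys only on the codebase revision, leaving the tooling-version question open.

```javascript
// Hypothetical sketch of the flakiness definition discussed above: a test
// is flaky if, for the *same* revision of the code under test, its recorded
// runs contain both passing and failing outcomes.
// runs: array of { test, revision, passed }
function findFlakyTests(runs) {
  const outcomes = new Map(); // "test@revision" -> Set of observed outcomes
  for (const { test, revision, passed } of runs) {
    const key = `${test}@${revision}`;
    if (!outcomes.has(key)) outcomes.set(key, new Set());
    outcomes.get(key).add(passed);
  }
  const flaky = new Set();
  for (const [key, seen] of outcomes) {
    if (seen.size === 2) flaky.add(key.split("@")[0]); // both true and false seen
  }
  return [...flaky];
}

// Example data (made up): "login" passes and fails at the same revision,
// so it's flaky; "search" only flips after the code changed, so it's not.
const runs = [
  { test: "login",  revision: "abc123", passed: true  },
  { test: "login",  revision: "abc123", passed: false },
  { test: "search", revision: "abc123", passed: false },
  { test: "search", revision: "def456", passed: true  },
];
console.log(findFlakyTests(runs));
// → ["login"]
```

If "same code" were meant to freeze the testing tools as well, the key would have to include the tool versions too — which is exactly what makes the definition ambiguous.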

I wonder what the test suites for tools like selenium/WebDriver look like ... do they track a concept of “meta-flakiness” to try and observe changes to test flakiness results caused by changes to the test tooling ...?


Yeah, good questions — the post leaves something to be desired. And meta-flakiness tooling actually sounds like it could be really useful!



