Stack Traces Considered Harmful

I’m trying to build an open source project in a language I’m unfamiliar with and hit this problem:

(base) $ ruby bin/setup
== Installing dependencies ==

A new release of RubyGems is available: 3.6.2 → 3.6.3!
Run `gem update --system 3.6.3` to update your installation.

Bundler 2.6.2 is running, but your lockfile was generated with 2.3.9. Installing Bundler 2.3.9 and restarting using that version.
Fetching gem metadata from https://rubygems.org/.
Fetching bundler 2.3.9
Installing bundler 2.3.9
Your Ruby version is 3.4.1, but your Gemfile specified ~> 3.0.4
bin/setup:16:in 'Kernel#system': Command failed with exit 18: bundle (RuntimeError)
	from bin/setup:16:in 'block in <main>'
	from /opt/homebrew/Cellar/ruby/3.4.1/lib/ruby/3.4.0/fileutils.rb:241:in 'Dir.chdir'
	from /opt/homebrew/Cellar/ruby/3.4.1/lib/ruby/3.4.0/fileutils.rb:241:in 'FileUtils#cd'
	from bin/setup:10:in '<main>'

This illustrates a common antipattern in error handling. This is a Ruby program, but I’ve encountered it often in Python programs too, including the Google Cloud SDK. It also happens in Java, though less frequently. The most common place it appears in Java is when JUnit tests fail. Do you see it?

The mistake here is printing a stack trace from the tool’s own code. In this example, there’s a correctly diagnosed user problem: I’m using Ruby 3.4.1 to build a program that requires Ruby 3.0.4. Ruby should (and does) tell me that. However, that’s all I need to know. I do not need to know that this happened on line 16 of bin/setup, much less that line 16 was called from line 241 of fileutils.rb. This is code that is not mine and is not buggy. The stack trace is distracting noise.
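As a rough illustration, here is one way a setup script could run a subcommand without producing a trace of its own when that subcommand fails. The error message in the output above looks like what Kernel#system raises when called with exception: true, but the real bin/setup is not shown here, so that is an assumption. The helper name run_step and the wording of its message are hypothetical.

```ruby
# Hypothetical helper, not the project's actual bin/setup.
# Runs a command. On failure it prints a one-line summary and exits with
# the child's status instead of raising an exception with a backtrace.
def run_step(*cmd)
  # system returns true on success, false on a nonzero exit,
  # and nil if the command could not be executed at all.
  return if system(*cmd)

  status = $?&.exitstatus || 127
  warn "setup: `#{cmd.join(' ')}` failed (exit #{status}); see its output above"
  exit status
end
```

The subcommand has already explained what went wrong, so the wrapper only needs to pass the exit status along.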

Two general principles before printing a stack trace:

1. Do not print a stack trace when the code itself did not fail. The user making a mistake does not mean the code failed. The environment being predictably not as expected (a checked exception in Java) does not mean the code has failed. The code should deal with these conditions and log them in a place that makes sense for the context. However, if fixing the problem isn’t going to mean changing the code, you don’t need a stack trace.

2. Do not mix the tool’s correctly functioning stack with the code’s error stack. This is the one JUnit gets wrong. When a test method unexpectedly throws an exception, causing the unit test to fail, JUnit unhelpfully logs the full stack, including the test runner’s own frames. Those frames should be removed before logging because they are not where the bug is.
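The second principle can be sketched in Ruby as a filter over backtrace frames. This is a minimal sketch, not how JUnit or any particular runner works: the marker path "/lib/runner/" and the names user_frames and report are hypothetical, and a real tool would need its own way to recognize its frames.

```ruby
# Hypothetical: path fragments that identify the tool's own source files.
TOOL_PATH_MARKERS = ["/lib/runner/"].freeze

# Returns only the frames that do not come from the tool itself.
def user_frames(backtrace, markers = TOOL_PATH_MARKERS)
  backtrace.reject { |frame| markers.any? { |marker| frame.include?(marker) } }
end

# Prints the error with the tool's frames stripped out.
def report(error)
  warn "#{error.class}: #{error.message}"
  user_frames(error.backtrace || []).each { |frame| warn "\tfrom #{frame}" }
end
```

What remains is the part of the stack that points at code the reader can actually change.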

Another way of thinking about it: stack traces in a project are for the developers committing code to that project, not for the developers who use the project. If the committer isn’t going to want to see the stack trace because it’s not a bug in the project, don’t log it. In this example, do the Ruby committers want to see the message I got because I was using the wrong version of Ruby? No, they don’t, so don’t log it.

Generally, distinguish errors in input from errors in the code. In Java, the former is conventionally a checked exception and the latter a runtime exception. Stack traces are only helpful when they point to a bug in the code. When the bug is not in the code and never was, there’s no reason to log a stack trace.
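Ruby has no checked exceptions, but a similar split can be approximated with a dedicated error class. The following is a sketch under that assumption: UserError, check_ruby_version, and main_with_reporting are hypothetical names, and the return codes 1 and 2 are arbitrary.

```ruby
# Hypothetical class for problems the user can fix without changing the code.
class UserError < StandardError; end

# Raises UserError when the running Ruby does not match the required prefix.
def check_ruby_version(required_prefix, actual = RUBY_VERSION)
  return if actual.start_with?(required_prefix)

  raise UserError, "Your Ruby version is #{actual}, but this project requires #{required_prefix}x"
end

# Runs the block and returns an exit code. User errors get a message only;
# anything else is treated as a bug and reported with its backtrace.
def main_with_reporting
  yield
  0
rescue UserError => e
  warn e.message
  1
rescue StandardError => e
  warn e.full_message(highlight: false)
  2
end
```

A script could end with something like exit main_with_reporting { check_ruby_version("3.0.") }, so that a version mismatch prints one line while a genuine bug still prints its trace.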
