
(not your OP) This is true, but I find that metrics are useful whether something is going wrong or not (metrics that show 100% success are useful in determining baselines and what "normal" is), whereas collecting traces _when nothing is going wrong_ is not useful -- it's just taking up space and ingress, and thus costing me money.

My typical approach in the past has been to use metrics to determine when something is going wrong, then enable either tracing or logs (usually logs) to determine exactly what is breaking. For a dev or team that is highly connected to their software, simply knowing what was recently released is enough to zero in on problems without relying upon tracing.
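
A rough sketch of that workflow, using only Python's standard `logging` module: a success-rate metric decides whether verbose logging is switched on. The logger name `orders`, the function `adjust_verbosity`, and the baseline and tolerance values are all made up for illustration; a real setup would read the counters from whatever metrics system is already in place.

```python
import logging

# Hypothetical service logger; the name is an assumption for this sketch.
logger = logging.getLogger("orders")


def adjust_verbosity(successes, total, baseline=0.99, tolerance=0.02):
    """Turn on DEBUG logging only when the success rate falls below baseline.

    `baseline` is the "normal" success rate learned from metrics;
    `tolerance` is how far below it we allow before paying for extra detail.
    """
    if total == 0:
        # No traffic in this window: leave the current level untouched.
        return logger.level
    rate = successes / total
    level = logging.DEBUG if rate < baseline - tolerance else logging.WARNING
    logger.setLevel(level)
    return level


healthy = adjust_verbosity(990, 1000)   # success rate at baseline
degraded = adjust_verbosity(900, 1000)  # success rate well below baseline
```

The point is that the expensive signal is only collected while the cheap one says something is off.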

Traces can be useful, but they're expensive relative to metrics, even if sampled at a very low rate.



Yes, and:

Not all problems result in error traces to analyse.

For example, you release a buggy client that doesn't call "POST /order/finalize" when it should.

There are no error traces, there are just missing HTTP requests. Metrics reveal that calls to "POST /order/finalize" for iOS apps are down 50% WoW.
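
A minimal sketch of that kind of check, assuming request counts are already aggregated per (endpoint, platform) for this week and last week. The function name `week_over_week_drops`, the 30% threshold, and the sample counts are invented for illustration, not taken from any real system.

```python
def week_over_week_drops(current, previous, threshold=0.3):
    """Return {key: fractional_drop} for keys whose count fell by at least `threshold`.

    `current` and `previous` map (endpoint, platform) to request counts.
    A key missing from `current` counts as zero requests, which is exactly
    the "missing HTTP requests" case that produces no error traces.
    """
    drops = {}
    for key, prev_count in previous.items():
        if prev_count == 0:
            continue  # no baseline to compare against
        cur_count = current.get(key, 0)
        drop = (prev_count - cur_count) / prev_count
        if drop >= threshold:
            drops[key] = drop
    return drops


# Illustrative counts only.
previous = {
    ("POST /order/finalize", "ios"): 10000,
    ("POST /order/finalize", "android"): 9000,
}
current = {
    ("POST /order/finalize", "ios"): 5000,
    ("POST /order/finalize", "android"): 9100,
}

alerts = week_over_week_drops(current, previous)
# Only the iOS series is flagged, with a drop of 0.5 (50% WoW).
```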



