If you knew exactly what to monitor and control, you would also have put in miti...

coward123 · on June 18, 2022

My point is that log analysis is mostly noise relative to the signal: a poor way to discern what went wrong, or to monitor proactively and avoid an incident in the first place. There are loads of tools out there, some of which have been mentioned in this thread, that monitor from the network to the user to the app layer and are superior for triage. If someone is down in the bowels of the logs, it's gonna be a bad time. I spent a decade triaging high-profile incidents around the world and teaching organizations how to do this stuff.

throwaway81523 · on June 18, 2022

The logs are what you have. It's like the investigation after a plane crash, where you have some black boxes, some radar images, the observed distribution of wreckage, whatever. You probably don't have all the data you would like, but you use whatever you can get your hands on.

Better tools for analyzing logs are fine, but the idea of some ML tool that you throw random logs through and have it automatically identify significant events seems like a pipe dream.

stochastimus · on June 18, 2022

This is all true. So for the times you end up there, wouldn’t you prefer a tool that surfaces the things you would otherwise spend hours digging for?