Wade gave a memorable talk during our YC batch. When asked how he ran Zapier remotely, which was unusual at the time, he said something along the lines of "When I want to talk with the team, I open Slack. When I want to file a support ticket, I open Zendesk."
Ironically, we were the last in-person batch that was cut short by COVID, so his comments turned out to be quite helpful.
+1 that existing log viewers are much better suited to text than to non-textual assets. My experience here is limited, but I believe Grafana has a dynamic image plugin that works if you store a link to an asset in blob storage or Base64-encode the asset itself.
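To make the Base64 approach concrete, here's a minimal sketch (not tied to any particular Grafana plugin) of encoding an image file as a data URI that an image panel or HTML renderer can display inline:

```python
import base64

def image_to_data_uri(path: str, mime: str = "image/png") -> str:
    """Encode an image file as a data URI for inline rendering.

    Sketch only: in practice you'd likely prefer storing a blob-storage
    link, since Base64 inflates payload size by ~33%.
    """
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The resulting string (`data:image/png;base64,...`) can then be stored alongside your log rows like any other text column.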
I've also heard of people storing those links in a database like Snowflake then creating displays on top using Tableau or Looker, to avoid having to build a web app from scratch.
OP here, I only try to write and share things that I find personally interesting, so if it came across as marketing fluff that was the opposite of what I was aiming for :/. But I do appreciate you reading the whole thing. FWIW I also thought including PhD might be pretentious.
Thanks for reading! Including examples is a great point, because otherwise the article can be kind of abstract, especially because each person has a different mental model of data. I'll add some later on.
Maybe thermodynamics is a hammer that makes all things seem like nails, but the connections pop up all over the place. Entropy is another highly applicable concept to data systems.
OP here, I posted this a few days ago and was surprised to see it on the front page this morning. Not sure why it says I submitted 4 hours ago when I wasn’t awake, maybe the second-chance pool (https://news.ycombinator.com/item?id=26998308)?
But I’m also generally skeptical of high upvote/comment ratios, because as a long-time HNer too I also want to read things that are genuinely interesting. In this case, I can promise you neither I nor anyone on the team is soliciting upvotes for this post.
On that note, if anyone has any comments about the content itself, happy to discuss further.
Thanks for commenting constructively. As you've understood, my intention was never to point fingers at you or your article, but rather to use it as a suitable context for raising the issue with the HN crowd.
Not yet, but eventually! We’re focused on data in warehouses and transactional DBs right now, just to limit the amount of integrations we need to build to start. We definitely plan to integrate with application sources like Google Analytics down the line though. Upstream applications are ultimately the sources of data truth, after all.
I wanted to +1 what you said about “organizations that don’t have a proper data warehouse and dedicated BI staff.” At the end of the day, a huge number of companies (maybe even most?) don’t have dedicated data teams but still want to be alerted about data anomalies. Heck, we at Metaplane even fall into that camp.
Our customers think of BigEye, Anomalo, and Monte Carlo very similarly (needing to go through a sales process, spending quite a bit of money), so this answer to a previous question about Monte Carlo might be useful: https://news.ycombinator.com/item?id=29228070 (linking to avoid redundancy)
That doesn't seem like real differentiation. What specifically do you do differently than Bigeye or Anomalo? Or is the real value add that I don't have to talk to a human?
I guess I just have a hard time seeing why this would help people solve real data quality issues.
Integrating with Microsoft SQL Server is definitely on our roadmap in the coming months. If you're up to discuss your use case, please reach out to team@metaplane.dev because we’d love to explore building this integration for you!
Amazing how many companies use dbt + Snowflake right? Such a different world from 2014…
Good idea, we actually do have a dbt integration that pulls in lineage and job metadata from your dbt manifests: https://docs.metaplane.dev/docs/dbt. Eventually we want to let you configure Metaplane tests from your dbt YAML.
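For anyone curious what "pulling lineage from dbt manifests" looks like under the hood, here's a rough sketch of extracting lineage edges from the `parent_map` in dbt's `manifest.json` artifact (real manifests carry far more metadata: sources, tests, compiled SQL, etc.):

```python
import json

def lineage_edges(manifest_path: str) -> list:
    """Extract (upstream, downstream) edges from a dbt manifest's parent_map.

    Sketch only -- a real integration would also walk sources,
    exposures, and job run metadata.
    """
    with open(manifest_path) as f:
        manifest = json.load(f)
    edges = []
    for node, parents in manifest.get("parent_map", {}).items():
        for parent in parents:
            edges.append((parent, node))
    return edges
```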
Pricing is still in flux to be honest. We wanted to start with a price that was approachable for small teams, comparable to other tools in your stack, and could be paid for without going through a whole procurement process. But we’re trying to stay as flexible on pricing as possible!
Thanks, and definitely! We're making a big push in the coming months to keep building out our downstream BI integrations. The Superset API is quite nice so we're looking forward to working with it.
Good question! Both Datafold and Atlan support data monitoring as a secondary feature, but have different main focuses:
Datafold is primarily known for their Data Diff regression testing that simulates the result of a PR on your data within a CI/CD workflow. There’s definitely a need for proactively preventing data issues from occurring in the first place, but issues introduced via code are only one subset of potential data quality issues.
Metaplane is focused on catching the symptoms first via continuous monitoring. Regression tests don’t replace the need for observability, and vice-versa.
Atlan is primarily known for their data workspace features that make collaboration easier, like a data dictionary, SQL editor, and governance.
Data collaboration is a huge unsolved problem and data monitoring does play a role there. But Metaplane is focused squarely on the problem of detecting data issues and giving you relevant metadata to prioritize and debug.
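To make "catching symptoms via continuous monitoring" a bit more concrete, here's a toy version of the kind of check a monitor might run on each metric (this is an illustration of the general idea, not Metaplane's actual algorithm):

```python
from statistics import mean, stdev

def is_anomalous(history: list, latest: float, z_threshold: float = 3.0) -> bool:
    """Flag the latest value if it deviates more than z_threshold standard
    deviations from the historical mean.

    Toy example: a production monitor would use seasonality-aware models,
    but the shape is the same -- compare each new observation (row count,
    freshness, null rate, ...) against its own history.
    """
    if len(history) < 2:
        return False  # not enough history to estimate variance
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold
```

Regression tests in CI catch the code changes that *would* break this metric; the monitor catches everything else (upstream vendor changes, bad backfills, schema drift).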