Wade gave a memorable talk during our YC batch. When asked how he ran Zapier remotely, which was unusual at the time, he said something along the lines of "When I want to talk with the team, I open Slack. When I want to file a support ticket, I open Zendesk."
Ironically, we were the last in-person batch that was cut short by COVID, so his comments turned out to be quite helpful.
+1 that existing log viewers are much better suited to text than to non-textual assets. My experience here is limited, but I believe Grafana has a dynamic image plugin that works if you store a link to an asset in blob storage or Base64-encode the asset itself.
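To make the Base64 approach concrete, here's a minimal sketch (not tied to any particular Grafana plugin) of encoding an image file as a data URI that an image panel or HTML renderer can display inline:

```python
import base64

def image_to_data_uri(path: str, mime: str = "image/png") -> str:
    """Encode an image file as a data URI for inline rendering.

    Sketch only: in practice you'd likely prefer storing a blob-storage
    link, since Base64 inflates payload size by ~33%.
    """
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The resulting string (`data:image/png;base64,...`) can then be stored alongside your log rows like any other text column.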
I've also heard of people storing those links in a database like Snowflake then creating displays on top using Tableau or Looker, to avoid having to build a web app from scratch.
OP here, I only try to write and share things that I find personally interesting, so if it came across as marketing fluff that was the opposite of what I was aiming for :/. But I do appreciate you reading the whole thing. FWIW I also thought including PhD might be pretentious.
Thanks for reading! Including examples is a great point, because otherwise the article can be kind of abstract, especially because each person has a different mental model of data. I'll add some later on.
Maybe thermodynamics is a hammer that makes all things seem like nails, but the connections pop up all over the place. Entropy is another highly applicable concept to data systems.
OP here, I posted this a few days ago and was surprised to see it on the front page this morning. Not sure why it says I submitted 4 hours ago when I wasn’t awake, maybe the second-chance pool (https://news.ycombinator.com/item?id=26998308)?
But I’m also generally skeptical of high upvote/comment ratios, because as a long-time HNer too I also want to read things that are genuinely interesting. In this case, I can promise you neither I nor anyone on the team is soliciting upvotes for this post.
On that note, if anyone has any comments about the content itself, happy to discuss further.
Thanks for commenting constructively. As you've understood, my intention was never to point fingers at you or your article, but rather to use it as a suitable context for raising the issue with the HN crowd.
Not yet, but eventually! We’re focused on data in warehouses and transactional DBs right now, just to limit the amount of integrations we need to build to start. We definitely plan to integrate with application sources like Google Analytics down the line though. Upstream applications are ultimately the sources of data truth, after all.
I wanted to +1 what you said about “organizations that don’t have a proper data warehouse and dedicated BI staff.” At the end of the day, a huge number of companies (maybe even most?) don’t have dedicated data teams but still want to be alerted about data anomalies. Heck, we at Metaplane even fall into that camp.
Our customers think of BigEye, Anomalo, and Monte Carlo very similarly (needing to go through a sales process, spending quite a bit of money), so this answer to a previous question about Monte Carlo might be useful: https://news.ycombinator.com/item?id=29228070 (linking to avoid redundancy)
That doesn't seem like real differentiation. What specifically do you do differently than Bigeye or Anomalo? Or is the real value add that I don't have to talk to a human?
I guess I just have a hard time seeing why this would help people solve real data quality issues.
Integrating with Microsoft SQL Server is definitely on our roadmap in the coming months. If you're up to discuss your use case, please reach out to team@metaplane.dev because we’d love to explore building this integration for you!
Amazing how many companies use dbt + Snowflake right? Such a different world from 2014…
Good idea, we actually do have a dbt integration that pulls in lineage and job metadata from your dbt manifests: https://docs.metaplane.dev/docs/dbt. Eventually we want to let you configure Metaplane tests from your dbt YAML.
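For anyone curious what "pulling lineage from dbt manifests" looks like under the hood, here's a rough sketch of extracting lineage edges from the `parent_map` in dbt's `manifest.json` artifact (real manifests carry far more metadata: sources, tests, compiled SQL, etc.):

```python
import json

def lineage_edges(manifest_path: str) -> list:
    """Extract (upstream, downstream) edges from a dbt manifest's parent_map.

    Sketch only -- a real integration would also walk sources,
    exposures, and job run metadata.
    """
    with open(manifest_path) as f:
        manifest = json.load(f)
    edges = []
    for node, parents in manifest.get("parent_map", {}).items():
        for parent in parents:
            edges.append((parent, node))
    return edges
```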
Pricing is still in flux to be honest. We wanted to start with a price that was approachable for small teams, comparable to other tools in your stack, and could be paid for without going through a whole procurement process. But we’re trying to stay as flexible on pricing as possible!
Thanks, and definitely! We're making a big push in the coming months to keep building out our downstream BI integrations. The Superset API is quite nice so we're looking forward to working with it.
Good question! Both Datafold and Atlan support data monitoring as a secondary feature, but have different main focuses:
Datafold is primarily known for their Data Diff regression testing that simulates the result of a PR on your data within a CI/CD workflow. There’s definitely a need for proactively preventing data issues from occurring in the first place, but issues introduced via code are only one subset of potential data quality issues.
Metaplane is focused on catching the symptoms first via continuous monitoring. Regression tests don’t replace the need for observability, and vice-versa.
Atlan is primarily known for their data workspace features that make collaboration easier, like a data dictionary, SQL editor, and governance.
Data collaboration is a huge unsolved problem and data monitoring does play a role there. But Metaplane is focused squarely on the problem of detecting data issues and giving you relevant metadata to prioritize and debug.
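To make "catching symptoms via continuous monitoring" a bit more concrete, here's a toy version of the kind of check a monitor might run on each metric (this is an illustration of the general idea, not Metaplane's actual algorithm):

```python
from statistics import mean, stdev

def is_anomalous(history: list, latest: float, z_threshold: float = 3.0) -> bool:
    """Flag the latest value if it deviates more than z_threshold standard
    deviations from the historical mean.

    Toy example: a production monitor would use seasonality-aware models,
    but the shape is the same -- compare each new observation (row count,
    freshness, null rate, ...) against its own history.
    """
    if len(history) < 2:
        return False  # not enough history to estimate variance
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold
```

Regression tests in CI catch the code changes that *would* break this metric; the monitor catches everything else (upstream vendor changes, bad backfills, schema drift).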