tison's comments | Hacker News

We have discussed these in previous blog posts:

1. Insight In No Time: https://www.scopedb.io/blog/insight-in-no-time

2. Manage Data in Petabytes for an Observability Platform: https://www.scopedb.io/blog/manage-observability-data-in-pet...

That is, Snowflake follows a traditional data warehouse workflow that requires an external ETL process to load data into the warehouse. Some of our customers evaluated Snowflake and found that their event-streaming ingestion cannot fit Snowflake's stage-based loading model: they need real-time insights end-to-end.

Apart from this major downside, when it comes to leveraging S3 as primary storage, Snowflake doesn't have adaptive indexes, so its performance degrades significantly as data grows and queries involve a large range of data with multi-condition filters, where the simple min-max index can't help.


FWIW, here is a general discussion about error handling in Rust and my comment to compare it with Go's/Java's flavor: https://github.com/apache/datasketches-rust/issues/27#issuec...

That said, I can live with "if err != nil", but the fact that every type has a zero value is quite a headache to handle: you end up fighting with nil, typed nil, and zero values.

For example, you need something like:

  type NullString struct {
      String string
      Valid  bool // Valid is true if String is not NULL
  }
.. to handle a nullable value, while the state `Valid == false` with `String` set to something is invalid by definition, yet quite hard to rule out. (Go has no sum type to express this.)
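In Rust, by contrast, the nullable string is a sum type, so the invalid combination simply cannot be constructed. A minimal sketch:

```rust
// A sum type makes the invalid state unrepresentable: there is no way
// to hold a string value while also being "not valid".
fn describe(name: Option<String>) -> String {
    match name {
        Some(s) => format!("name = {s}"),
        None => "name is NULL".to_string(),
    }
}

fn main() {
    assert_eq!(describe(Some("tison".to_string())), "name = tison");
    assert_eq!(describe(None), "name is NULL");
}
```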


I think they are almost compatible.

`thiserror` helps you define the error type, and that error type can then be used with `anyhow` or `exn`. Actually, we used thiserror + exn for a long time, and it worked well. Later we realized that `struct ModuleError(String)` can easily implement `Error` without thiserror, so we removed the thiserror dependency for conciseness.
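A minimal sketch of that hand-rolled approach (illustrative, not the exact production code):

```rust
use std::error::Error;
use std::fmt;

// A plain newtype over String can implement Error by hand, which is
// why the thiserror derive became unnecessary for this case.
#[derive(Debug)]
struct ModuleError(String);

impl fmt::Display for ModuleError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

// Error has default implementations for everything once Debug + Display exist.
impl Error for ModuleError {}

fn main() {
    let err: Box<dyn Error> = Box::new(ModuleError("disk full".to_string()));
    assert_eq!(err.to_string(), "disk full");
}
```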

`exn` can use `anyhow::Error` as its inner error. Conversely, one may use `Exn::as_error` to retrieve the outermost error layer and feed it to anyhow.

I once considered implementing `std::error::Error` for `exn::Exn`, but it would lose some information, especially when the error has multiple children.

`error-stack` did that at the cost of no longer exposing `source`:

* https://docs.rs/error-stack/0.6.0/src/error_stack/report.rs....

* https://docs.rs/error-stack/0.6.0/src/error_stack/error.rs.h...


Here is the pull request for this post: https://github.com/fast/fast.github.io/pull/12

See comments like https://github.com/fast/fast.github.io/pull/12#discussion_r2...

Quoting my comment from the other thread:

> That said, exn benefits something from anyhow: https://github.com/fast/exn/pull/18, and we feed back our practices to error-stack where we come from: https://github.com/hashintel/hash/issues/667#issuecomment-33...

> While I have my opinions on existing crates, I believe we can share experiences and finally converge on a common good solution, no matter who made it.


Rust's Future is somewhat like move semantics in C++, where you may leave a future in an invalid state after it finishes. Besides, Rust adopts a stackless coroutine design, so you need to maintain the state in your struct if you want to implement a poll-based async structure manually.
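A sketch of what maintaining that state by hand looks like, using a toy countdown future driven by a hand-built no-op waker (illustrative only, no runtime needed):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A hand-written future keeps its state in the struct itself, because a
// stackless coroutine has no stack frame to park local variables on.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            // Ask the executor to poll us again.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// A no-op waker, just enough to drive the future by hand.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Countdown { remaining: 2 };
    let mut fut = Pin::new(&mut fut); // Countdown is Unpin, so Pin::new is fine
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Pending));
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Pending));
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready("done")));
}
```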

These are all common traps. And now cancellation in async Rust adds yet another dimension to state management in futures.

While developing the mea (Make Easy Async) [1] library, I document the cancel-safety attribute whenever it's non-trivial.

Additionally, I recall [2] an instance where a careless async cancellation disrupted the IO stack.

[1] https://github.com/fast/mea

[2] https://www.reddit.com/r/rust/comments/1gfi5r1/comment/luido...


Not quite. As described in the FAQ:

What about the interoperability with SQL?

Some libraries and tools enable developers to write queries in a new syntax and translate them to SQL (e.g., PRQL, SaneQL, etc.). The existing SQL ecosystem provides solid database implementations and a rich set of data tools. People always tend to think you must speak SQL; otherwise, you lose the whole ecosystem.

But wait a minute, those libraries translate their new language to SQL because they don't implement the query engine (i.e., the database) themselves, so they have to talk to SQL databases in SQL. However, ScopeQL is the query language of ScopeDB, and ScopeDB is already a database built directly on top of S3.

Thus, what we can leverage from the SQL ecosystem are data tools, such as BI tools, that generate SQL queries to implement business logic. For this purpose, one should write a translator that converts SQL queries to ScopeQL queries. Since both ScopeQL and SQL are based on relational algebra, the translation is certainly doable.


The syntax is still changeable, and any comments for improvement are welcome.

To avoid divergent discussion, I'll include the two most significant FAQs:

*What about the interoperability with SQL?*

Some libraries and tools enable developers to write queries in a new syntax and translate them to SQL (e.g., [PRQL](https://prql-lang.org/), [SaneQL](https://www.cidrdb.org/cidr2024/papers/p48-neumann.pdf), etc.). The existing SQL ecosystem provides solid database implementations and a rich set of data tools. People always tend to think you must speak SQL; otherwise, you lose the whole ecosystem.

But wait a minute, those libraries translate their new language to SQL because they don't implement the query engine (i.e., the database) themselves, so they have to talk to SQL databases in SQL. However, ScopeQL is the query language of ScopeDB, and ScopeDB is already a database built directly on top of S3.

Thus, what we can leverage from the SQL ecosystem are data tools, such as BI tools, that generate SQL queries to implement business logic. For this purpose, one should write a translator that converts SQL queries to ScopeQL queries. Since both ScopeQL and SQL are based on relational algebra, the translation is certainly doable.

*Project Foo has already implemented similar features. Why not follow them?*

ScopeQL was developed from scratch but was not invented in isolation. We learned a lot from existing solutions, research, and discussions with their adopters. This includes the syntax of PRQL, SaneQL, and the SQL extensions provided by other analytical databases. We also deeply empathize with the challenges outlined in the [GoogleSQL](https://research.google/pubs/sql-has-problems-we-can-fix-the...) paper.

However, as answered in the previous question, we first developed ScopeDB as a relational database. Then, we learned from users' scenarios that an enhanced syntax helps maintain their business logic and increases their productivity. So, directly implementing the enhanced syntax is the most efficient way.


I don't have a dedicated benchmark for these primitives, but we use them in a database that processes petabytes of data [1], and we haven't found specific bottlenecks.

[1] https://www.scopedb.io/blog/manage-observability-data-in-pet...

Most of the performance cost would come from the sync Mutex in use. I can imagine that by switching between the std Mutex, parking_lot's Mutex, and perhaps a spin lock in some scenarios, one could gain better performance. Mea has an abstraction (src/internal/mutex.rs) for this switch, but I haven't implemented the feature flag for it since the current performance is acceptable in our use case.
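A hypothetical sketch of such an abstraction (the names are illustrative, not mea's actual API): a thin wrapper that gives std's Mutex a non-poisoning `lock()`, so call sites would not change if the backend were swapped for parking_lot behind a feature flag.

```rust
use std::sync::{Mutex as StdMutex, MutexGuard};

// Illustrative wrapper: parking_lot's lock() returns a guard directly and
// never poisons, so this adapter makes the std backend look the same.
pub struct Mutex<T>(StdMutex<T>);

impl<T> Mutex<T> {
    pub fn new(value: T) -> Self {
        Self(StdMutex::new(value))
    }

    pub fn lock(&self) -> MutexGuard<'_, T> {
        // Ignore poisoning, matching parking_lot's behavior.
        self.0.lock().unwrap_or_else(|e| e.into_inner())
    }
}

fn main() {
    let counter = Mutex::new(0);
    *counter.lock() += 1;
    assert_eq!(*counter.lock(), 1);
}
```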

The internal semaphore's implementation may also be improved. Currently, to keep the code safe, I implement the linked list with `Slab<Node>` (you can check src/internal/waitlist.rs for details). Using an intrusive list like [2] may help, but that's not always a net win and needs much more time to get right.

[2] https://github.com/Amanieu/intrusive-rs
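An illustrative sketch (not mea's actual waitlist) of the slab-backed approach: nodes link to each other by index instead of by pointer, which keeps everything in safe Rust at the cost of one extra indirection compared to an intrusive list.

```rust
// Nodes live in a slab-like Vec and link by index; freed slots are recycled.
struct Node<T> {
    value: T,
    next: Option<usize>,
}

struct WaitList<T> {
    slab: Vec<Option<Node<T>>>,
    head: Option<usize>,
    tail: Option<usize>,
    free: Vec<usize>, // recycled slot indices
}

impl<T> WaitList<T> {
    fn new() -> Self {
        Self { slab: Vec::new(), head: None, tail: None, free: Vec::new() }
    }

    fn push_back(&mut self, value: T) -> usize {
        let node = Node { value, next: None };
        let idx = match self.free.pop() {
            Some(i) => { self.slab[i] = Some(node); i }
            None => { self.slab.push(Some(node)); self.slab.len() - 1 }
        };
        match self.tail {
            Some(t) => self.slab[t].as_mut().unwrap().next = Some(idx),
            None => self.head = Some(idx),
        }
        self.tail = Some(idx);
        idx
    }

    fn pop_front(&mut self) -> Option<T> {
        let idx = self.head?;
        let node = self.slab[idx].take().unwrap();
        self.head = node.next;
        if self.head.is_none() {
            self.tail = None;
        }
        self.free.push(idx);
        Some(node.value)
    }
}

fn main() {
    let mut wl = WaitList::new();
    wl.push_back("a");
    wl.push_back("b");
    assert_eq!(wl.pop_front(), Some("a"));
    assert_eq!(wl.pop_front(), Some("b"));
    assert_eq!(wl.pop_front(), None);
}
```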


Interesting. Thanks! I've been experimenting a bit with my keepcalm library. I have some experimental async concurrency primitives in there but I'd like to compare with what you've got here to potentially replace them.


Feel free to create an issue on GitHub for sharing and discussion :D


> they were probably just trying to be humble about their accomplishment

Thanks for your reply. To be honest, I simply recognize that depending on open-source software is a trivial choice: any non-trivial Rust project can pull in hundreds of dependencies, and you see the same when you audit distributed systems written in C++/Java.

For example, Cloudflare's pingora has more than 400 dependencies. Other databases written in Rust, e.g., Databend and Materialize, have more than 1000 dependencies in the lockfile. TiKV has more than 700 dependencies.

People seem to jump into the debate over the number of dependencies, or complain about why the source code is closed, ignoring my actual purpose: to show how you can organically contribute to the open-source ecosystem during your DAYJOB, which is a way to write open-source code sustainably.


And contributing back is one of the approaches to maintaining open-source dependencies. I have described how to deal with OSS dependencies in [1] (yet to be translated :P).

[1] https://www.tisonkun.org/2024/11/17/open-source-supply-chain...

