Basically, a node is an object with one entry, whose key is the type and whose value is an array of the contents. It's a rather S-expressiony approach. If you really don't like using arrays for all the contents, you could always use more normal values at the leaves:
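For instance (an illustrative encoding, not the article's actual schema), an arithmetic expression could look like this, with plain numbers at the leaves:

```python
import json

# A hypothetical encoding (illustrative only): each node is an object with
# one entry; the key is the node type and the value is an array of children,
# with plain numbers at the leaves.
doc = '{"add": [{"mul": [2, 3]}, 4]}'

def evaluate(node):
    if not isinstance(node, dict):
        return node  # a leaf: just a normal JSON value
    (op, args), = node.items()  # exactly one entry: type, then contents
    vals = [evaluate(a) for a in args]
    return sum(vals) if op == "add" else vals[0] * vals[1]

print(evaluate(json.loads(doc)))  # 10
```

A walker only ever has to look at the single key to know what it's holding.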
It has the nice property that you're always guaranteed to see the type before any of the contents, even if object keys get reordered, so you can do streaming decoding without having to buffer arbitrary amounts of JSON. Probably not important when parsing a tax code, but can be useful for big datasets.
Agreed. Any language that wants to use the fact graph is going to have to “interpret” the chosen DSL anyways, and JSON is more ubiquitous and far simpler to parse than XML. Also way cheaper in the sense that the article uses it (how many langs can you parse and walk an XML document in off the top of your head? what about JSON?)
To see why JSON is simpler, imagine what the sum total of all code needed to parse and interpret the fact graph without any dependencies would look like.
With XML you’re carrying complex state in hash maps and comparing strings everywhere to match open/close tags. Even more complexity depending on how the DSL uses attributes, child nodes, text content.
With JSON you just need to match open/close [] {} and a few literals. Then you can skim the declarative part right off the top of the resulting AST.
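To make that concrete, here's a sketch of a dependency-free parser for a small JSON subset (no string escapes or exponent edge cases; purely illustrative, not production code):

```python
def parse(text):
    """Parse a small JSON subset: objects, arrays, escape-free strings,
    numbers, true/false/null. A sketch, not a spec-complete parser."""
    val, i = _value(text, _ws(text, 0))
    assert _ws(text, i) == len(text), "trailing garbage"
    return val

def _ws(s, i):
    # Skip insignificant whitespace.
    while i < len(s) and s[i] in " \t\r\n":
        i += 1
    return i

def _value(s, i):
    c = s[i]
    if c == "{":
        return _object(s, i)
    if c == "[":
        return _array(s, i)
    if c == '"':
        j = s.index('"', i + 1)  # no escape handling in this sketch
        return s[i + 1:j], j + 1
    for lit, val in (("true", True), ("false", False), ("null", None)):
        if s.startswith(lit, i):
            return val, i + len(lit)
    j = i  # number: consume sign/digit/float characters
    while j < len(s) and (s[j].isdigit() or s[j] in "+-.eE"):
        j += 1
    tok = s[i:j]
    return (int(tok) if tok.lstrip("-").isdigit() else float(tok)), j

def _array(s, i):
    out, i = [], _ws(s, i + 1)
    while s[i] != "]":
        v, i = _value(s, i)
        out.append(v)
        i = _ws(s, i)
        if s[i] == ",":
            i = _ws(s, i + 1)
    return out, i + 1

def _object(s, i):
    out, i = {}, _ws(s, i + 1)
    while s[i] != "}":
        k, i = _value(s, i)
        i = _ws(s, i)
        assert s[i] == ":", "expected colon after key"
        v, i = _value(s, _ws(s, i + 1))
        out[k] = v
        i = _ws(s, i)
        if s[i] == ",":
            i = _ws(s, i + 1)
    return out, i + 1

print(parse('{"a": [1, true, "x"]}'))  # {'a': [1, True, 'x']}
```

The whole thing is essentially bracket matching plus a handful of literals; an XML equivalent would additionally need open/close tag-name matching, attributes, and entity handling before you even get to the DSL's semantics.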
It’s easy to ignore all this complexity since XML libs hide it away, and sure it will get the job done. But like others pointed out, decisions like these pile up and result in latency getting worse despite computers getting exponentially faster.
What I don't like are all the freaking quotes. I look at json and just see noise. Like if you took a screenshot and did a 2d FFT, json would have tons of high frequency content relative to a lot of other formats. I'd sooner go with clojure's EDN.
Aesthetically, I consider such JSON structures degenerate. It's akin to building an ECMAScript app where every class and structure is only allowed to have one member.
If you want tagged data, why not just pick a representation that does that?
Because (imo) the goal should be to minimize overall complexity.
Pulling in XML and all of its additional complexity just to get a (debatably) cleaner way to express tagged unions doesn’t seem like a great tradeoff.
I also don’t buy the degenerate argument. XML is arguably worse here since you have to decide between attributes, child nodes, and text content for every piece of data.
Depends on the application, I suppose. For OP's application, pulling in XML is no trouble and gives you a much better solution for typed unions.
To get better than XML, I think you're looking at something closer to a Haskell- or LISP-embedded DSL, with obvious trade-offs when it comes to developer ecosystems and interoperability.
You don't even need to specify a DSL to make that code declarative. It can be real code that's manipulating expression objects instead of numbers (though not in JavaScript, where there's no operator overloading), with the graph of expression objects being the result.
I think strictly speaking this isn't actually microphonics, because microphonics means that mechanical noise causes electrical noise, which then results in audible noise, whereas what is happening here is just transmission of vibrations up the cable into the ear.
Anyway, it can be fixed with better cables. They don't have to be fancy (they don't have to be the 349 euro cables that site is selling!) - i have a pair of KZ ZS10 Pro X earphones, and using the stock cables, i don't get rustling through those.
(more generally, i have an embarrassing number of Chi-Fi earphones, and don't get rustling with any of them)
Well, those KZ ZS10 Pro X are still fairly cheap (40 euros), and i really like them. But there is a huge range of amazing value for money earphones out there. You just have to wade through dozens of pages of forum posts and Reddit threads to find them.
> This solution looks extremely similar to the previous one, which is a good thing. Our requirements have experienced a small change (reversing the traversal order) and our solution has responded with a small modification.
Now do breadth-first traversal. With the iterative approach, you just replace the stack with a queue. With the recursive approach, you have to make radical changes. You can make either approach look natural and elegant if you pick the right example.
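A minimal illustration, assuming trees are (value, children) tuples: the two iterative traversals share everything except which end of the deque they pop from.

```python
from collections import deque

# Trees as (value, children) tuples -- an illustrative representation.
def traverse(root, breadth_first=False):
    pending = deque([root])
    while pending:
        # Queue behaviour (popleft) gives BFS; stack behaviour (pop) gives
        # DFS. Note the DFS visits siblings right-to-left as written.
        value, children = pending.popleft() if breadth_first else pending.pop()
        yield value
        pending.extend(children)

tree = ("a", [("b", [("d", [])]), ("c", [])])
print(list(traverse(tree)))                      # ['a', 'c', 'b', 'd']
print(list(traverse(tree, breadth_first=True)))  # ['a', 'b', 'c', 'd']
```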
> Now do breadth-first traversal. With the iterative approach, you just replace the stack with a queue. With the recursive approach, you have to make radical changes.
The reason is that no programming language that is in widespread use has first-class support for co-recursion. In a (fictional) programming language that has this support, this is just a change from a recursive call to a co-recursive call.
  def visit_bf(g):
      n, children = g
      yield n
      if children:
          iterators = [visit_bf(c) for c in children]
          while iterators:
              try:
                  yield next(iterators[0])
              except StopIteration:
                  iterators.pop(0)
                  continue
              iterators = iterators[1:] + iterators[:1]
The difference between DFS and BFS is literally just the last line that rotates the list of child trees.
Python is a pretty mainstream language and even though the DFS case can be simplified by using `yield from` and BFS cannot, I consider that just to be syntactic sugar on top of this base implementation.
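For comparison, the `yield from` form of the DFS case (same assumed (node, children) tree shape):

```python
# DFS collapses to a one-line recursion with `yield from`; BFS has no
# equivalently terse form, which is the syntactic-sugar point above.
def visit_df(g):
    n, children = g
    yield n
    for c in children:
        yield from visit_df(c)

tree = (1, [(2, [(4, [])]), (3, [])])
print(list(visit_df(tree)))  # [1, 2, 4, 3]
```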
> It seems to be common knowledge that any recursive function can be transformed into an iterative function.
Huh. Where i work, the main problem is that everyone is hell-bent on transforming every iterative function into a recursive function. If i had a pound for every recursive function called "loop" in the codebase, i could retire.
My experience has gone the other way: lots of code with recursion, rewritten to be iterative. There really aren't that many use-cases in vanilla enterprise code that benefit from recursion when the entire cost is considered.
Crucially, SO's election system needs to be bootstrapped: users aren't eligible to vote until they have a history of participation. The level of participation is fairly trivial, but it provides enough signal to allow a reasonable detection (and elimination) of bot / sock puppet networks without resorting to crude measures like blacklists or "bot tests".
For new sites, this meant that the bulk of moderation was done by employees, followed by employee-appointed temporary moderators. This dramatically reduced abuse, but also reduced the explosion of new sub-communities that sites like Reddit thrived on.
It was pretty decent in the mid and late 00s. The community started turning toxic in the very early 10s and by about 2015 was quite poisonous. The saddest part is that the problem was known and spoken about frequently, but the response to that from staff and/or high-level mods was to just double down and dig in.
For sure, advanced difficult topics were never really their forte, although it was really common to get great book or blog recommendations via comments. For me, the golden combination was a good book on the language/framework/topic I was studying, supplemented with specific Q&A from Stack Overflow. I have extremely fond memories of learning C++ and Qt that way (although that Qt book was a little rough, but at least there was a Qt book. Nowadays every book just seems too outdated to be helpful).
VoltDB took this to an extreme - the way you interact with it is by sending it some code which does a mix of queries and logic, and it automatically retries the code as many times as necessary if there's a conflict. Because it all happens inside the DBMS, it's transparent and fast. I thought that was really clever.
I'm using the past tense here, but VoltDB is still going. I don't think it's as well-known as it deserves to be.
Interesting. How is that faster than just having the code running on the same machine as the DB? Guess it could be smarter about conflicts than random backoff.
Usually, you can. But occasionally you get mildly defective tools that require some directory to exist, even though it's empty. It's easier to add a gitkeep than fix them.
And i assume any large organisation running a monorepo has some vaguely equivalent tooling for making mass changes. Have any of them published about that?
You can write automated refactoring with clang tools if you need AST-level knowledge across your project (or monorepo).
I’m not sure if there’s other public examples leveraging this, but Chromium has this document [0] which has a few examples. And there’s also the clang-tidy docs [1].
This is a business that I suspect may not survive BABLR.
> Moderne's build plugins allow for LSTs to be serialized to disk. This makes the process of consuming and editing large quantities of them much more efficient. OpenRewrite's build plugins, on the other hand, store everything in memory and need to be reparsed every time there is a change.
So yeah I'm giving away open standards to everyone for free that do the thing they expect people to pay them for...
> The next-gen LR parser framework for creating elegant and efficient language tools
> BABLR is a new kind of thing that does not quite fit into any category of things that has existed before it. In purpose it is made to be an instrument of code literacy -- a unified toolchain for software developers that supports a new generation of richly visual interfaces for coding. In form BABLR is a collection of scripts and virtual machines written in plain Javascript that run in almost any modern web browser. BABLR is also a community and an ecosystem, including a small but rapidly growing collection of ready-to-use parsers for popular languages.
At first brush, everything about this sounds like overly ambitious vapourware. Is there a reason to think this is going to deliver? People involved, what's already shipped, etc?
I particularly loved this from their roadmap:
> Completed
> Shift operation
> Enables LR parsing of expressions like 2+2
Being able to parse 2 + 2 is definitely good!
And their thoughts on testing:
> How our project reaches production stability is a process that often surprises people. We don't write a lot of tests for example, and we often don't do much testing before we ship releases. Instead we test exhaustively after we ship releases, which is the only way we know of knowing for sure that the product we shipped does what we think it does. [...] We also don't (usually) practice TDD. If you look at the number of tests we have, it likely won't seem like it's anywhere near enough to keep a project of this size stable! The secret sauce here is that our key invariants aren't written in our test files, they're baked into the core of the implementation. Every time you use the code, you're essentially testing it. To gain confidence in our core, we simply try to use it to do a lot of real work.
Man, why did i not think of that, i could have got out of writing so many tests if i'd just baked the invariants into the core of the implementation!
In this case the tool is meant to parse programming languages, so once I write some parser grammars every valid code file in existence is a test case. Seen that way I have more test cases than I know what to do with.
We've come a ways from 2 + 2. This week my goal is to feed our own whole codebase through the JS parser, and I should be able to. I managed to parse a few hundred lines of real JS last week before running into Automatic Semicolon Insertion trouble that I needed to tinker with the core to fix.
While I get that our low profile smacks of vapor, we actually have working packages published: bablr and @bablr/cli. I'd consider them to be beta quality right now, having gone through many previous releases that I'd only consider alpha-quality, and even more releases before that.
It's not too hard to verify my central claim here, which is that we're giving away what they charge money for. Their serialization format is secret and proprietary. Ours, CSTML, is open: https://docs.bablr.org/guides/cstml. Their free product makes you re-parse the entire project with every code change you make. Ours is built with copy-on-write immutable data structures, so you can always build new things without losing old ones. Our way, you can compose fragments of trees together with new code into new trees like you're playing with Lego bricks.
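To illustrate the general idea (a toy sketch, not BABLR's actual data structures): with immutable nodes, "editing" a tree builds a new root that references the old tree's untouched subtrees instead of copying or re-parsing them.

```python
# Toy copy-on-write tree composition (illustrative only). Nodes are
# immutable tuples, so a modified tree can share the subtrees the edit
# didn't touch with the original.
def node(kind, *children):
    return (kind, children)

old_fn = node("function", node("params"), node("body", node("return")))
new_call = node("call", node("identifier"))

# A new function node whose body gains one statement; the params subtree
# is referenced, not duplicated.
params, body = old_fn[1]
new_fn = ("function", (params, ("body", body[1] + (new_call,))))

assert new_fn[1][0] is old_fn[1][0]  # params subtree shared, not copied
```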
> Nevertheless, detecting the holding of locks requires a careful and occasionally interprocedural analysis of the source code, and the other conditions, such as "in a completion handler", are not formally defined and require study of multiple files.
> Due to the complexity of the conditions governing the choice of new argument for usb_submit_urb, 71 of the 158 calls to this function were initially transformed incorrectly to use GFP_KERNEL instead of GFP_ATOMIC.
Okay, but how does Coccinelle help? Is it able to do this careful and not formally defined analysis? Or does it automate the undifferentiated heavy lifting and so make it easier for humans to do it?