I've been researching decompilation on and off for the past few years, and it's definitely challenging to decide how close you want to get to matching. If you can get the exact compiler and the exact compilation settings, matching decompilation is totally feasible, and if you can make it incremental so that you work up to 100% matching over time, it seems like a really good approach. It does require a lot of groundwork, though, and a real understanding of how the compiler and linker work. While matching functions from a binary I was analyzing that was compiled with Visual Studio 2003, I realized that very subtle differences can cause e.g. different register allocation, even in an old compiler with dramatically less sophisticated optimization passes.
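To make the "incremental" part concrete, here is a minimal sketch of how matching progress could be tracked, assuming you already have the original bytes of each function and the bytes your recompiled source produces. The `FunctionPair` type and the function names are made up for illustration, and a real project would also need to mask out relocated addresses before comparing, which this skips.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FunctionPair:
    name: str
    original: bytes    # bytes of the function as found in the target binary
    recompiled: bytes  # bytes produced by compiling the decompiled source


def is_match(pair: FunctionPair) -> bool:
    """A function counts as matching only if the bytes are identical."""
    return pair.original == pair.recompiled


def match_progress(pairs: Sequence[FunctionPair]) -> float:
    """Fraction of original code bytes that belong to matching functions."""
    total = sum(len(p.original) for p in pairs)
    if total == 0:
        return 0.0
    matched = sum(len(p.original) for p in pairs if is_match(p))
    return matched / total


# Hypothetical example: the second function differs only in which register
# the compiler picked, which is enough to break the match.
pairs = [
    # mov eax, [esp+4]; ret  (identical on both sides)
    FunctionPair("get_arg", bytes.fromhex("8b442404c3"), bytes.fromhex("8b442404c3")),
    # original: mov ecx, [esp+4]; mov eax, ecx; ret
    # ours:     mov eax, [esp+4]; ret
    FunctionPair("get_arg2", bytes.fromhex("8b4c24048bc1c3"), bytes.fromhex("8b442404c3")),
]
print(f"{match_progress(pairs):.1%} of code bytes matching")
```

Weighting by byte count rather than function count is just one choice; it keeps a pile of tiny matched getters from making the number look better than it is.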
Anyway, I guess this tangent is really unrelated, but I think more people should be embarking on decompilation projects. It's very fun, and it's uniquely rewarding if you manage to get some non-trivial decompilation of code to work properly.
I had one odd use case for decompiling that was actually, as far as I know, completely licit: WebView2Loader. Microsoft distributed the WebView2 SDK as 3-BSD so that you could integrate it into your applications without worrying about licensing, but the glue logic that actually interacts with the WebView2 installation and instantiates the COM objects is closed source. But... since it is closed-source 3-BSD, without a EULA... we can reverse engineer it. It being a relatively small shim, I did just that[1]. This was an easy exercise armed with an interactive disassembler, and since it was relatively simple and very small, I didn't need to bother with matching anything: I just roughly replicated the behavior instead. The use case for this was allowing people to make WebView2 bindings that don't have any external dependencies; the OpenWebView2Loader code was ported to Pascal and Go by others, making it possible to have pure bindings that don't require any C code or external DLLs and can talk directly to the WebView2 installation. There's now a static copy of the WebView2Loader with the SDK, which obviates some of the use of this, but it's still a nice approach for Go, where you can entirely avoid CGo or messing with weird object format conversion. (It's way better than my original approach for WebView2 in Go, which was to emulate the Windows linker to link and execute an entirely in-memory copy of the WebView2Loader DLL using a lot of unsafe code. That also works, but it is much more bug-prone and frankly horrifying.)
Perfect decompilation definitely has its advantages, but I simply don't have it in me to pull that off. Tracking down derelict toolchains and SDKs, then endlessly tweaking compiler options and source code to get that thousand-instruction-long function to match perfectly, is not for everybody.
The trouble with decompilation projects is that there's hardly any tooling or literature available on the topic [1], so people usually end up developing custom tooling and methodology on their own to solve their issues. For example, I personally went down the path of delinking programs back into object files in my own project. While I find it quite nifty, it also isolates me on my own desert island, and I know of other decompilation projects with totally different approaches that are in a similar situation.
Decompilation projects can be quite intellectually rewarding, but they are essentially R&D projects in a barely explored field. Wander off the thin strip of beaten path and you're basically on your own in an endless primordial jungle.
[1] At least until decomp.me came along, but that's geared for perfect decompilation only. Any other approach and you're still on your own.
Nice! I'm mainly touching Win32 stuff, so I suppose it would need quite a lot of adapting (and a COFF implementation), but this is very intriguing. I'll need to dive into this.
Depending on what you want to do, most of the pieces might already be there.
I've designed my Ghidra extension so that the object file exporters are decoupled from the relocation analyzers through a generic relocation data model. I've already implemented a relocation synthesizer for 32-bit x86, so writing a COFF object file format exporter should be a fairly self-contained project that doesn't impact the rest of the extension.
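For anyone wondering what a "generic relocation data model" could look like, here is a rough Python sketch of the idea. It is not the extension's actual code or design: the `Relocation` record, `unapply_absolute32` and `describe` are hypothetical names, and the hard part of a real analyzer (deciding which bytes are addresses in the first place) is skipped by passing the offset in directly.

```python
import struct
from dataclasses import dataclass
from typing import List, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class Relocation:
    offset: int     # byte offset of the patched field within the section
    symbol: str     # name of the symbol the field refers to
    addend: int     # constant added to the symbol's address
    width: int = 4  # field size in bytes (4 for a 32-bit absolute address)


def unapply_absolute32(
    code: bytes, offset: int, symbols: Mapping[int, str]
) -> Tuple[bytes, Relocation]:
    """Turn a linked 32-bit absolute address back into symbol + addend."""
    (target,) = struct.unpack_from("<I", code, offset)
    # Pick the closest symbol at or below the target address.
    candidates = [addr for addr in symbols if addr <= target]
    if not candidates:
        raise ValueError(f"no symbol covers address {target:#x}")
    base = max(candidates)
    reloc = Relocation(offset, symbols[base], target - base)
    # Zero the field so the bytes no longer depend on the original layout.
    patched = bytearray(code)
    struct.pack_into("<I", patched, offset, 0)
    return bytes(patched), reloc


def describe(relocs: Sequence[Relocation]) -> List[str]:
    """Stand-in for a format-specific exporter: it only sees Relocation records."""
    return [f"{r.offset:08x} ABS32 {r.symbol}+{r.addend:#x}" for r in relocs]


# mov eax, [0x00403010], where a (hypothetical) g_table symbol lives at 0x00403000
code = bytes.fromhex("a110304000")
patched, reloc = unapply_absolute32(code, 1, {0x403000: "g_table"})
print(patched.hex(), describe([reloc]))
```

The point of the split is that `describe` never looks at instruction bytes, so swapping it for an ELF or COFF writer wouldn't touch the analysis side.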
Thanks for extending the offer! I am a bit busy, but I really appreciate all of the information. I've got plenty of new things to dig into now. I'll have to seriously set aside some time to take another look; this could really help me with some of my reversing projects. I'm willing to put in tons of manual effort to bridge the gap, and it feels like if I could leverage this, it could act as quite a force multiplier.
[1]: https://github.com/jchv/OpenWebView2Loader