In my prompt I ask the LLM to write a short summary of how it solved the problem. I run multiple instances of the LLM concurrently, compare their summaries, and use the output of whichever one seems to have interpreted the instructions best, or arrived at the best solution.
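The orchestration side is only a few lines. A minimal sketch, assuming a generic async `complete()` call standing in for whatever provider client you use (the instance count, prompt wording, and `SUMMARY:` marker are all illustrative, not my exact setup):

```python
import asyncio

N_INSTANCES = 4  # illustrative fan-out width

PROMPT = (
    "Solve the problem below. After your solution, append a short "
    "summary of how you solved it, starting with 'SUMMARY:'.\n\n"
    "{problem}"
)

async def complete(prompt: str) -> str:
    """Placeholder for a real async LLM API call; wire up your provider here."""
    raise NotImplementedError

def split_summary(output: str) -> tuple[str, str]:
    """Separate the solution body from the trailing SUMMARY: section."""
    body, _, summary = output.rpartition("SUMMARY:")
    return (body.strip(), summary.strip()) if body else (output.strip(), "")

async def fan_out(problem: str) -> list[tuple[str, str]]:
    """Run N independent attempts concurrently; return (solution, summary) pairs."""
    prompt = PROMPT.format(problem=problem)
    outputs = await asyncio.gather(*(complete(prompt) for _ in range(N_INSTANCES)))
    return [split_summary(o) for o in outputs]
```

The resulting pairs can then be compared by hand, or fed to a separate "judge" call that reads only the summaries and picks the attempt that interpreted the instructions most faithfully.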
And you trust that the summary matches what was actually done? Your experience with LLMs' understanding of their own code changes must differ significantly from mine.