SunSpider has never relied on setTimeout performance. (AWFY runs everything through JS shells, and the standalone harness doesn't rely on any callback-providing host objects.) The closeness in performance is simply because the benchmark does practically nothing that is challenging to optimize; the biggest challenge is getting the compile-time/execution-time trade-off right.
Kraken was developed largely independently of SpiderMonkey, so I wouldn't say there's any deliberate bias there. V8, by contrast, was explicitly optimized for the V8 benchmark suite, with several design decisions made to that end, so its advantage there is unsurprising.