What Kind of Lithography Will Be Used at 7nm? (semiengineering.com)
94 points by Lind5 on March 9, 2016 | 25 comments


At a 7nm pitch you could draw roughly 10,000 lines across the width of a human hair, and only about 30 silicon atoms lying next to each other would span 7nm. I know the end of Moore's law has been predicted before, but this time has to be different.
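A quick back-of-envelope check of those scales (both constants are rough, commonly cited figures, not precise measurements):

```python
# Rough back-of-envelope scales; both constants are approximate.
HAIR_WIDTH_NM = 70_000      # a typical human hair is ~70 um wide
SI_ATOM_DIAMETER_NM = 0.22  # ~2x the covalent radius of silicon

lines_across_hair = HAIR_WIDTH_NM // 7       # 7nm-pitch lines across a hair
atoms_across_7nm = 7 / SI_ATOM_DIAMETER_NM   # atoms spanning one 7nm feature

print(lines_across_hair)        # 10000
print(round(atoms_across_7nm))  # 32
```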


Keep in mind that most features at 7nm would be quite a bit larger than 7nm. For example, in the 22nm technology I have worked with, a standard wire on the lowest metal layer might be around 40nm wide (higher metal layers use even wider wires, whose lower resistance makes them better for crossing larger distances).

I do agree with your fundamental point that there will definitely be an end to Moore's law. It's just that the mere number of silicon atoms doesn't make <7nm infeasible.


28nm was the last node in Moore's law. Transistors are smaller, but not cheaper. http://www.eetimes.com/author.asp?doc_id=1321536

I should also add: many people think 5nm will be technically feasible, although I don't think anyone yet knows how to do it at scale, but there are serious questions as to whether it makes economic sense. The node may therefore not be developed for some time, not for technical reasons but for business ones.


That article is very interesting. It suggests that hitting what sound like fundamental SRAM (cache) scaling limits is a major factor, if not the major one, so the prospect of EUV reducing the number of masks required won't help costs as much as I thought it might. And as noted in the OP, EUV still has to prove itself on power and uptime.


Is it like how inkjet printers advertise huge resolutions like 5760dpi, and the positioning can achieve that resolution, but the actual nozzles can't produce lines thinner than e.g. 720dpi?


No. There are process nodes that have real technical meaning but, especially as sizes shrink, what that means exactly in terms of the dimensions of specific features varies considerably.


The "Minimum feature size" slide included in this article shows some feature sizes for existing technologies. I think names like "14 nm" and "7 nm" are mostly marketing at this point.

http://www.anandtech.com/show/8367/intels-14nm-technology-in...


Each dot can only have a single color. Multiple dots will form a single "pixel" with the desired color.


I think it's more like Elmo's head is four inches wide on the screen but the pixels are two millimeters.


That's better than pure 720dpi. In GIMP, try using the Erode filter on ultra-light weight fonts. Although the minimum feature size is increased you still get the benefit of smoother curves. It looks better than a normal weight font that's been scaled up from low resolution.


Interesting how many chip-makers now have the power (read: financial and human resources) to keep the race at what seems to be the same level. Of them all, Apple is the most impressive, as they have kept the core count down. In my book, being able to squeeze in another 4 cores to reach 12 doesn't make much sense on cellphones and laptops/desktops.


Apple have been squeezing more parallelism onto their SoCs—just in the form of more continuous on-die GPGPU silicon, rather than more discrete CPU cores. Most "apps" seem to break down into serial-bottleneck and embarrassingly-parallel subcomponents; this fits a "CPU for the serial part + GPGPU for the parallel part" design much better than it fits a 12-core CPU. With every release of iOS, more is run on the GPU and less on the CPU—things are being parallelized, just not the way we expected.
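The serial-bottleneck point above is just Amdahl's law. A tiny sketch with illustrative numbers (not Apple's actual figures): once the serial part dominates, piling on parallel units stops paying off.

```python
def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Amdahl's law: overall speedup is capped by the serial fraction,
    no matter how many cores or GPU lanes attack the parallel part."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_units)

# With 90% parallel work, even 1000 units can't beat a 10x speedup.
print(round(amdahl_speedup(0.9, 12), 2))    # 5.71
print(round(amdahl_speedup(0.9, 1000), 2))  # 9.91
```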

A tangent: really, the only place that multiple discrete CPU cores make any sense at all is on servers, because servers are frequently made to run multiple submitted workloads, each containing their own serial-bottleneck parts (viz. the old concept of a "time-sharing" computer.) And even then, these days, those cores may be useful only for the sake of serving as a more efficient substrate for virtualization; there's not much you can do with one four-core VM that you couldn't do better by treating it as four one-core VMs.


Database servers do much better with 4 cores than 4 VMs. In fact they do even better without any VMs at all.


I knew someone would argue this point but I couldn't figure out how to correct the original post to be clearer.

"Virtual machine" has come to refer to a specific isolation and security model through, effectively, having a microkernel (the hypervisor) with hardware-accelerated microkernel RPC (hypercalls). But the abstract concept of a virtual machine doesn't require that isolation; it just requires partitioning of resources so that you can treat each partition as its own independent virtual Von Neumann machine.

A DB that pins each of its worker threads to a separate core, has a set amount of non-pageable memory reserved for each thread (and perhaps does SR-IOV to get network frames directly to the right thread) is effectively, in resource allocation terms, four independent virtual Von Neumann machines. Each pinned thread sees no context switches, gets no cache incoherency, suffers no NUMA-based latencies, etc. You've got four little machines, each with their own uncontested memory-bus and network bandwidth, that just happen to share die space (and thus have a cheap IPC fabric between them.)
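The pinning described above can be sketched with Linux's affinity API (the worker function is hypothetical, and `os.sched_setaffinity` is Linux-only, hence the guard):

```python
import os

def run_pinned(core_id, work_items):
    # Pin the calling process to a single core so the scheduler never
    # migrates it -- each worker becomes its own little "machine" with
    # a warm cache and no cross-core context switches.
    if hasattr(os, "sched_setaffinity"):  # Linux-only API
        os.sched_setaffinity(0, {core_id})
    return sum(work_items)  # stand-in for the real per-partition workload

print(run_pinned(0, range(100)))  # 4950
```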

I mean, you get 90% of what having a modern hypervisor-managed VM gives you just by using processes in the first place (as opposed to the old "single shared address space" model of the Apple II and IBM PC, where things like TSRs made sense.) But in the decades since the invention of the "process" concept, we've gone past the original conception of a "process" as its own virtual machine, and optimized for multi-user, heavily-multiprocessing time-sharing workloads where context switches are frequent, memory bandwidth is shared, IO hardware is shared (and thus has its access linearized through the kernel), etc. We've realized that most people don't need to be allocated a circuit, so scheduling packets will do; that most people don't need memory reservations, so an OOM killer will do; that most people don't write hard real-time software, so context switching will do; etc. (And we've even given up on the address-space isolation, with threads and other shared-memory IPCisms.)

Most of what we use hypervisor-managed VM setups for today is simply to enforce the resource partitioning that Operating Systems gave up on, so that we can have "processes" that actually do give us resource guarantees. If you're an ops person and you don't know what workload you're going to be running but want that enforcement there anyway, hypervisor-based VM management makes perfect sense. On the other hand, if you as the developer get to design your entire "embedded system" or "appliance" or "service" of OS+app yourself, you can throw out the hypervisor and get your VM model from the OS itself, by teaching your app to tweak the OS in all sorts of places ala Snabb Switch.

Either way, the goal is to partition your machine into several smaller independent ones that never have to wait in a queue behind one another. You can get that from an OS, or from a hypervisor; but what you get is virtual Von Neumann machines either way.


Not all parallelisable jobs can be profitably partitioned statically into independent threads or processes. Many (i.e. anything that is not embarrassingly parallel) require dynamic scheduling and load balancing, which is easier and faster to do in a single address space.
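A minimal sketch of that kind of dynamic load balancing, using a shared thread pool (illustrative workload; the point is the scheduling model, not the numbers):

```python
from concurrent.futures import ThreadPoolExecutor

def cost(n):
    # Deliberately uneven work: some items are far heavier than others.
    return sum(i * i for i in range(n))

# A shared pool hands out tasks as workers free up -- dynamic load
# balancing that is cheap in one address space, and much more awkward
# (serialization, IPC) across statically partitioned processes.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cost, [10, 10_000, 10, 10_000]))

print(results[0])  # 285
```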


I did hedge in the original comment with a "not much you can do that's faster" rather than a "nothing you can do faster." I write control-plane software in Erlang; I am well-aware of software that can take advantage of a 12- or 36- or 100-core CPU. :)

But most software that people "parallelize" with pthreads is not that type of software, and would be much better served being split into independently partitioned shared-nothing worker processes, ala Redis. Not only for performance's sake, but also because that frees you from the operational constraint of needing a single big machine to run it on. (You still can run all your shared-nothing worker processes on the same piece of Big Iron, but you don't have to, and that flexibility is important for designing a solution in the face of unknown usage profiles.)
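The shared-nothing split boils down to a deterministic key-to-shard mapping, roughly in the spirit of Redis Cluster's CRC-based hash slots (a sketch, not Redis's exact slot math):

```python
import zlib

def shard_for(key: bytes, n_shards: int) -> int:
    # Deterministic key -> shard mapping: each worker (a process on the
    # same box, or a remote machine) owns a disjoint slice of the keys,
    # so no memory ever needs to be shared between workers.
    return zlib.crc32(key) % n_shards

# The same key always routes to the same worker, wherever it runs.
print(shard_for(b"user:42", 4) == shard_for(b"user:42", 4))  # True
```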


Differences between VMs and theoretical virtual constructs aside, with most databases you don't have a shared-nothing architecture. If you do, then it doesn't make much difference whether you're on a 4-core VM or four 1-core VMs. But the common case, for relational databases anyway, is all the data in one big memory space shared by all threads. That would be seriously hobbled by having to do IPC instead of just sharing the memory.


Maybe you mean "how few"?

AFAIK, the only companies able to produce modern chips are Intel, TSMC, Samsung and GlobalFoundries; and Intel only makes chips for very select third parties, such as Altera, which they did ultimately buy.


Good thing chip production and chip design are splitting.

It's a bit like how separating the railway infrastructure owner from the railway operators is good for competition among the operators. If you need to build a whole network to run a few trains (compared to renting capacity on existing infrastructure), that's a pretty big moat.


Provided you get the performance you need, why would you care how many cores are in your device? Adding cores can allow for a device that runs cooler and uses less power.


Many times there is one big thread that can't scale across multiple cores. Thus at least one core needs to be clocked high, and higher clocks generally require higher voltage, which drives up power consumption and temperature. It doesn't matter if you have 15 other cores to spare unless the application can split its serial workload into parallel chunks.
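The clock/voltage coupling in rough numbers: dynamic power scales with C·V²·f, and voltage has to rise roughly with frequency, giving the familiar ~f³ rule of thumb (an illustrative model, not a datasheet figure):

```python
def relative_dynamic_power(freq_ratio: float) -> float:
    # Dynamic power ~ C * V^2 * f; assume voltage scales ~linearly with
    # frequency, yielding the rough ~f^3 rule of thumb.
    voltage_ratio = freq_ratio
    return voltage_ratio ** 2 * freq_ratio

# A 20% clock bump costs ~73% more dynamic power under this model --
# why one hot, fast core dominates the power budget.
print(round(relative_dynamic_power(1.2), 2))  # 1.73
```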


This is the reason why ARM SoCs have asymmetric cores which differ in speed and power efficiency.

https://en.wikipedia.org/wiki/ARM_big.LITTLE


> Many times there is one big thread that can't scale across multiple cores.

The number of applications that can't be multithreaded is very small, and not the kind of thing you run on your phone.


Surprisingly many demanding applications, such as high-end mobile games, are essentially single-threaded (there's one big main thread and then a number of helper threads with only fractional load), and thus produce asymmetric loads and very serialized execution between CPU and GPU. That makes managing power especially difficult.


Machine elves, obviously.



