On-line community providing technology information, analysis, commentary and an online forum for engineers manufacturing and designing semiconductors. Semiconductor Manufacturing and Design (SemiMD.com) was created by engineers and journalists to shed light on some of the incredibly complex technology and business issues in manufacturing and designing semiconductors.
Judging by the presentations at the 2018 Symposium on VLSI Technology, held in Honolulu this summer, the semiconductor industry has a challenge ahead of it: how to develop the special low-power hardware needed to support artificial intelligence-enabled networks.
To meet society’s needs for low-power-consumption machine learning (ML), “we do need to turn our attention to this new type of computing,” said Naveen Verma, an associate professor of electrical engineering at Princeton University.
While introducing intelligence into engineering systems has always been at the heart of what the semiconductor industry does, Verma said machine learning represents a “quite distinct” inflection point. For an industry accustomed to fast-growing applications, machine learning is on a growth trajectory that Verma said is “unprecedented in our own industry,” as ML algorithms have started to outperform human capabilities in a wide variety of fields.
Faster GPUs driven by Moore’s Law, and combining chips in packages by means of heterogeneous computing, “won’t be enough as we proceed into the future,” Verma said. “I would suggest we need to do something more, get engaged more deeply, affecting things done at all levels.”
Naresh Shanbhag, a professor at the University of Illinois at Urbana-Champaign, sounded a similar appeal at the VLSI symposium’s day-long workshop on machine learning. The semiconductor industry has taken a “back seat to the systems and algorithm researchers who are driving the AI revolution today,” he said.
Addressing several hundred device and circuit researchers, Shanbhag said their contributions to the AI revolution have been hampered by a self-limiting mindset, based on “our traditional role as component providers.”
Until last year, Shanbhag served as the director of a multi-university research effort, Systems on Nanoscale Information fabriCs (SONIC, www.sonic-center.org), which pursued new forms of low-power compute networks, including work on fault-tolerant computing. At the VLSI symposium he spoke on Deep In-Memory Architectures and other non-traditional approaches.
“Traditional solutions are running out of steam,” he said, noting the slowdown in scaling and the “memory wall” in traditional von Neumann architectures that contributes to high power consumption. “We need to develop a systems-to-devices perspective in order to be a player in the world of AI,” he said.
Stanford’s Boris Murmann: mixed-signal for edge devices
Boris Murmann, an associate professor in the Department of Electrical Engineering at Stanford University, described a low-power approach based on mixed-signal processing, which can be tightly coupled to the sensor front end of small-form-factor applications, such as an IoT edge camera or microphone.
“What if energy is the prime currency, how far can I push down the power consumption?” Murmann asked. By coupling analog-based computing to small-scale software macros, an edge camera could be awakened by a face-recognition algorithm. The “wake-up triggers” could alert more-powerful companion algorithms, in some cases sending data to the cloud.
Showing a test chip of mixed-signal processing circuits, Murmann said the Stanford effort “brings us a little bit closer to the physical medium here. We want to exploit mixed-signal techniques to reduce the data volume, keeping it close to its source.” In addition, mixed-signal computing could help lower energy consumption in wearables, or in convolutional neural networks (CNNs) in edge IoT devices.
In a remark that coincided with others’ views, Murmann said “falling back to non-deep learning techniques can be advantageous for basic classification tasks,” such as wake-up-type alerts. “There exist many examples of the benefits of analog processing in non-deep learning algorithms,” he said.
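The wake-up scheme Murmann describes can be sketched as a two-stage cascade: a cheap, always-on detector examines every frame, and the power-hungry model runs only when that detector fires. The Python sketch below computes the resulting average energy per frame; the stage names, energy figures, and 1 percent trigger rate are hypothetical placeholders, not measurements from the Stanford test chip.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One stage of a hypothetical always-on sensing pipeline."""
    name: str
    energy_per_frame_uj: float  # assumed energy per invocation, in microjoules


def average_energy_uj(wake: Stage, main: Stage, trigger_rate: float) -> float:
    """Expected energy per frame when `main` runs only on wake-up triggers.

    The wake-up stage sees every frame; the main stage runs only on the
    fraction `trigger_rate` of frames that the wake-up stage flags.
    """
    if not 0.0 <= trigger_rate <= 1.0:
        raise ValueError("trigger_rate must be between 0 and 1")
    return wake.energy_per_frame_uj + trigger_rate * main.energy_per_frame_uj


# Hypothetical numbers, chosen only to illustrate the arithmetic.
wake = Stage("mixed-signal face detector", 1.0)
main = Stage("CNN face recognizer", 1000.0)

always_on = main.energy_per_frame_uj            # run the CNN on every frame
cascaded = average_energy_uj(wake, main, 0.01)  # CNN runs on 1% of frames
print(f"always-on: {always_on:.1f} uJ/frame, cascaded: {cascaded:.1f} uJ/frame")
```

Under these assumptions the average cost is set mostly by how rarely the trigger fires; a detector that fired on every frame would add its own energy on top of the CNN's rather than save anything.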
That theme – when deep learning competes with less power-hungry techniques – was taken up by Vivienne Sze, an associate professor at the Massachusetts Institute of Technology. By good fortune, Sze recently had two graduate students who designed similar facial recognition chips, one based on the Histograms of Oriented Gradients (HOG) method of feature recognition, and the other using the MIT-developed Eyeriss accelerator for CNNs (eyeriss.mit.edu). Both chips were implemented in the same foundry technology, with similar logic and memory densities, and put to work on facial recognition.
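For readers unfamiliar with HOG, the core of the method is a histogram of gradient orientations computed over small image cells and weighted by gradient magnitude. The pure-Python sketch below computes that histogram for a single cell; it omits the block normalization and sliding-window classifier of a full HOG pipeline, and it is an illustration of the general technique, not the MIT chip's implementation.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a list of equal-length rows of pixel intensities. Gradients use
    central differences on interior pixels; orientations are folded into
    [0, 180) degrees, as in the unsigned variant of HOG.
    """
    h, w = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The final modulo guards against an angle that rounds to 180.0.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# A vertical edge: dark left half, bright right half.
edge = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(8)]
hist = cell_orientation_histogram(edge)
print([round(v, 2) for v in hist])  # all weight lands in bin 0 (gradient along x)
```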
MIT’s Vivienne Sze: CNNs not always the best
Calling it a “good controlled experiment,” Sze described the energy-versus-accuracy measurements, concluding that the Eyeriss machine-learning chip was twice as accurate on the AlexNet benchmark. However, that doubling in accuracy came at the price of a 300-fold increase in energy, rising to a 10,000-fold energy penalty in some cases, as measured in nanojoules per pixel.
“The energy gap was much larger than the throughput gap,” Sze said. The Eyeriss CNNs require more energy because of the programmability factor, with weights of eight bits per pixel. “The question becomes are you willing to give up a 300x increase in energy, or even 10,000x, to get a 2X increase in accuracy? Are you willing to sacrifice that much battery life?”
“The main point — and it is really important — is that CNNs are not always the best solution. Some hand-crafted features perform better,” Sze said.
Two image processing chips created at MIT resulted in sharply different energy consumption levels. Machine learning approaches, such as CNNs, are flexible, but often not as efficient as more hard-wired solutions. (Source: 2018 VLSI Symposium).
Two European consortia, CEA-Leti and Imec, were well represented at the VLSI symposium.
Denis Dutoit, a researcher at France’s CEA Tech center, described a deep learning core, PNeuro, designed for neural network processing chains.
The solution supports traditional image processing chains, such as filtering, without external image processing. The modular SIMD architecture can be sized for the best area/performance trade-off in each application.
Dutoit said the energy consumption was much less than that of traditional cores from ARM and Nvidia on a benchmark application, recognizing faces from a database of 18,000 images at a recognition rate of 97 percent.
GPUs Vs. Custom Accelerators
The sharp uptake of AI in image and voice recognition, navigation systems, and digital assistants has come in part because the training cycles could be completed efficiently on massively parallel architectures, i.e., GPUs, said Bill Dally, chief scientist at Nvidia Inc. Alternatives to GPUs and CPUs are being developed that are faster, but less flexible. Dally conceded that creating a task-specific processor might result in a 20 percent performance gain, compared with a GPU or Tensor Processing Unit (TPU). However, “you would lose flexibility if the algorithm changes. It’s a continuum: (with GPUs) you give up a little efficiency while maximizing flexibility,” Dally said, predicting that “AI will dominate loads going forward.”
Joe Macri, a vice president at AMD, said that modern processors have high-speed interfaces with “lots of coherency,” allowing dedicated processors and CPUs/GPUs to use shared memory. “It is not a question of an accelerator or a CPU. It’s both.”
Whether through reconfigurable architectures, hard-wired circuits, or other approaches, participants at the VLSI symposium agreed that AI is set to change lives around the globe. Macri pointed out that only 20 years ago, few people carried phones. Now, no one would even think of going out without their smartphone; it has become more important than carrying a billfold or purse, he noted. Twenty years from now, machine learning will be embedded into phones, homes, and factories, changing lives in ways few of us can foresee.
The exploding use of Artificial Intelligence (AI) is ushering in a new era for semiconductor devices that will bring many new opportunities but also many challenges. Speaking at the AI Design Forum hosted by Applied Materials and SEMI during SEMICON West in July, Dr. John E. Kelly, III, Senior Vice President, Cognitive Solutions and IBM Research, talked about how AI will dramatically change the world. “This is an era of computing which is at a scale that will dwarf the previous era, in ways that will change all of our businesses and all of our industries, and all of our lives,” he said. “This is the era that’s going to power our semiconductor industry forward. The number of opportunities is enormous.”
Also speaking at the event, Gary Dickerson, CEO of Applied Materials, said AI “needs innovation in the edge and in the cloud, in generating data on the edge, storing the data, and processing that data to unlock the value. At the same time Moore’s Law is slowing.” This creates the “perfect opportunity,” he said.
Ajit Manocha, President and CEO of SEMI, calls it a “rebirth” of the semiconductor industry. “Artificial Intelligence is changing everything – and bringing semiconductors back into the deserved spotlight,” he notes in a recent article. “AI’s potential market of hundreds of zettabytes and trillions of dollars relies on new semiconductor architectures and compute platforms. Making these AI semiconductor engines will require a wildly innovative range of new materials, equipment, and design methodologies.”
“Hardware is becoming sexy again,” said Dickerson. “In the last 18 months there’s been more money going into chip startups than the previous 18 years.” In addition to AI chips from traditional IC companies such as Intel and Qualcomm, more than 45 startups are working to develop new AI chips, with VC investments of more than $1.5B; at least five of them have raised more than $100 million from investors. Tech giants such as Google, Facebook, Microsoft, Amazon, Baidu and Alibaba are also developing AI chips.
Dickerson said having the winning AI chip 12 months ahead of anyone else could be a $100 billion opportunity. “What we’re driving inside of Applied Materials is speed and time to market. What is one month worth? What is one minute worth?”
IBM’s Kelly said there’s $2 trillion of decision support opportunity for artificial intelligence on top of the existing $1.5-2 trillion information technology industry. “Literally every industry in the world is going to be impacted and transformed by this,” he said.
AI needed to analyze unstructured data
Speaking at an Applied Materials event late last year during the International Electron Devices Meeting, Dr. Jeff Welser, Vice President and Director of the IBM Research – Almaden lab, said the explosion in AI is being driven by the need to process vast amounts of unstructured data, noting that in just two days, we now generate as much data as was generated in total through 2003. “Somewhere around 2020, the estimate is maybe 50 zettabytes of data being produced. That’s 21 zeros,” he said.
Welser, who will be delivering the keynote talk at The ConFab 2019 in May, noted that 80% of all data is unstructured and growing at 15 times the rate of structured data. “If you look at the growth, it’s really in a whole different type of data. Voice data, social media data, which includes a lot of images, videos, audio and text, but very unstructured text,” he said. And then there’s data from IoT-connected sensors.
There are various ways to crunch this data. CPUs work very well for structured floating point data, while GPUs work well for AI applications – but that doesn’t mean people aren’t using traditional CPUs for AI. In August, Intel said it sold $1 billion of artificial intelligence processor chips in 2017. Reuters reported that Navin Shenoy, its data center chief, said the company has been able to modify its CPUs to become more than 200 times better at artificial intelligence training over the past several years. This resulted in $1 billion in sales of its Xeon processors for such work in 2017, when the company’s overall revenue was $62.8 billion. Naveen Rao, head of Intel’s artificial intelligence products group, said the $1 billion estimate was derived from customers that told Intel they were buying chips for AI and from calculations of how much of a customer’s data center is dedicated to such work.
Custom hardware for AI is not new. “Even as early as the ’90s, they were starting to play around with ASICs and FPGAs, trying to find ways to do this better,” Welser said. Google’s Tensor Processing Unit (TPU), introduced in 2016, for example, is a custom ASIC chip built specifically for machine learning applications, allowing the chip to be more tolerant of reduced computational precision, which means it requires fewer transistors per operation.
It was really when GPUs appeared in the 2008-2009 time period that people realized that, in addition to their intended application (graphics processing), they were really good at the kind of math needed for neural nets. “Since then, we’ve seen a whole bunch of different architectures coming out to try to continue to improve our ability to run the neural net for training and for inferencing,” he said.
AI works by first “training” a neural network, during which the weights are adjusted based on the output, followed by an “inferencing” phase in which the weights are fixed. This may mean two different kinds of chips are needed. “If you weren’t trying to do learning on it, you could potentially get something that’s much lower power, much faster, much more efficient when taking an already trained neural net and running it for whatever application. That turns out to be important in terms of where we see hardware going,” he said.
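The split Welser describes can be illustrated with a toy single-neuron model: during training the weights are repeatedly adjusted from the error in the output, and during inference the same weights are frozen and only the forward pass runs. This is a minimal pure-Python sketch with made-up data and an assumed learning rate; real networks have millions of weights, but the division of labor is the same.

```python
def predict(weights, bias, x):
    """Forward pass: the only computation inference needs."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train(samples, n_inputs, lr=0.05, epochs=500):
    """Training: adjust the weights from the output error (SGD on squared error)."""
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = predict(weights, bias, x) - target
            # Gradient of 0.5 * error**2 with respect to weight i is error * x_i.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Toy data following y = 2*x0 - x1 + 0.5 (chosen for illustration only).
data = [((x0, x1), 2 * x0 - x1 + 0.5) for x0 in (0.0, 1.0, 2.0) for x1 in (0.0, 1.0)]

weights, bias = train(data, n_inputs=2)  # weights change during training ...
frozen = (tuple(weights), bias)          # ... and are fixed from here on
print(round(predict(*frozen, (3.0, 2.0)), 3))  # inference only; close to 4.5
```

Only `predict` would need to run on an inference-only chip, which is why such hardware can drop the machinery for computing and applying weight updates.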
The problem with present day technology – whether it’s CPUs, GPUs, ASICs or FPGAs — is that there is still a huge gap between what processing power is required and what’s available now. “We have a 1,000x gap in performance per watt that we have to close,” said Applied Materials’ Dickerson.
There’s a need to reduce the amount of power used in AI processors not only at data centers but also in mobile applications such as automotive and security, where decisions need to be made locally in real time rather than in the cloud. This also could lead to a need for different kinds of AI chips.
An interesting case in point: IBM’s world-leading Summit supercomputer employs 9,216 IBM processors boosted by 27,648 Nvidia GPUs, occupies a room the size of two tennis courts, and draws as much power as a small town!
To get to the next level in performance/Watt, innovations being researched at the AI chip level include:
Low-precision computing
In one study, IBM artificially reduced the precision in a neural net and the results were surprising. “We found we could get down the floating point to 14 bit, and we really were getting exactly the same precision as you could with 16 bit or 32 bit or 64 bit,” Welser said. “It didn’t really matter at that point.”
This means that some parts of a neural net could run at high precision and others at low precision. “There’s a lot of tradeoffs you can make there, that could get you lower power or higher performance for that power, by giving up precision,” Welser said.
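The effect IBM observed can be explored with a simple experiment: round the weights and inputs of a dot product (the core operation of a neural-net layer) to fewer bits and compare the result with the full-precision value. The sketch below uses uniform fixed-point quantization as a stand-in; it is not IBM's 14-bit floating-point format, and the random data is illustrative only.

```python
import random


def quantize(value, n_bits, max_abs=1.0):
    """Round `value` to one of 2**n_bits evenly spaced levels in [-max_abs, max_abs]."""
    levels = 2 ** n_bits - 1
    step = 2 * max_abs / levels
    clipped = max(-max_abs, min(max_abs, value))
    return round((clipped + max_abs) / step) * step - max_abs


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(256)]
inputs = [random.uniform(-1, 1) for _ in range(256)]
reference = dot(weights, inputs)

errors = {}
for bits in (16, 8, 4, 2):
    qw = [quantize(w, bits) for w in weights]
    qx = [quantize(x, bits) for x in inputs]
    errors[bits] = abs(dot(qw, qx) - reference)
    print(f"{bits:2d}-bit operands: absolute error {errors[bits]:.2e}")
```

In a mixed-precision design, the same kind of comparison could be run layer by layer to decide which parts of the network tolerate fewer bits.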
Old-school analog computing has even lower precision but may be well suited to AI. “Analog computing was extremely efficient at the time, it’s just you can’t control the errors or scale it in any way that makes sense if you’re trying to do high precision floating point,” Welser said. “But if what you really want is the ability to have a variable connection, say to neurons, then perhaps you could actually use an analog device.”
Resistive computing is a twist on analog computing that has the added advantage of eliminating the bottleneck between memory and compute. Welser said to think of it as layers of neurons, and the connections between those neurons would be an analog resistive memory. “By changing the level of that resistive memory, the amount of current that flows between one neuron and the next would be varied automatically. The next neuron down would decide how it’s going to fire based on the amount of current that flowed into it.”
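The analog computation Welser describes follows from two circuit laws: each resistive element passes a current equal to its conductance times the applied voltage (Ohm's law), and the currents arriving on a shared output wire add up (Kirchhoff's current law). A crossbar of resistive cells therefore computes a matrix-vector product in place, I_j = sum over i of G_ij * V_i. The idealized sketch below ignores wire resistance, device noise, and the peripheral DAC/ADC circuits; the conductance and voltage values are hypothetical.

```python
def crossbar_currents(conductances_s, voltages_v):
    """Ideal resistive-crossbar read-out.

    conductances_s[i][j] is the conductance (siemens) of the device joining
    input row i to output column j; voltages_v[i] is the voltage on row i.
    Each device contributes G * V (Ohm's law), and each column wire sums
    the contributions of its devices (Kirchhoff's current law).
    """
    n_cols = len(conductances_s[0])
    currents = [0.0] * n_cols
    for row, v in zip(conductances_s, voltages_v):
        for j, g in enumerate(row):
            currents[j] += g * v
    return currents


# Hypothetical 3x2 array: the conductances play the role of synaptic weights.
G = [
    [10e-6, 2e-6],
    [5e-6, 8e-6],
    [1e-6, 4e-6],
]
V = [0.2, 0.1, 0.3]  # input activations encoded as row voltages

I = crossbar_currents(G, V)
print([f"{i * 1e6:.2f} uA" for i in I])  # one analog dot product per column
```

Because the multiply and the sum happen where the weights are stored, no weight has to be fetched from a separate memory, which is the bottleneck the resistive approach aims to remove.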
IBM experimented with phase change memory for this application. “Obviously phase change memory can go to a low resistance or a high resistance (i.e., a 1 or a 0) but there is no reason you can’t take it somewhere in between, and that’s exactly what we would want to take advantage of here,” Welser said.
“There is hope for taking analog devices and using them to actually be some of the elements and getting rid of the bottleneck for the memory as well as getting away from the precision/power that goes on with trying to get to high precision for those connections,” he added.
A successful resistive analog memory ultimately winds up being a materials challenge. “We’d like to have like a thousand levels for the storage capacity, and we’d like to have a very nice symmetry in turning it off and on, which is not something you’d normally think about,” Welser said. “One of the challenges for the industry is to think about how you can get materials that fit these needs better than just a straight memory of one bit on or off.”
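Why level count and symmetry matter can be seen in a toy device model. Below, a weight in [0, 1] is stored as one of N evenly spaced conductance levels, and training nudges it up or down by one level per pulse. With only two levels (a one-bit memory) the stored weight cannot approximate most values, and if up and down steps are unequal, equal numbers of up and down pulses leave the weight drifting. The level counts, the start value, and the 20 percent asymmetry model are hypothetical, chosen only to make the effects visible.

```python
def stored_value(target, n_levels):
    """Snap a weight in [0, 1] to the nearest of n_levels evenly spaced states."""
    step = 1.0 / (n_levels - 1)
    return round(target / step) * step


def drift_after_pulse_pairs(n_levels, asymmetry, n_pairs, start=0.5):
    """Net weight change after n_pairs of one 'up' pulse followed by one 'down' pulse.

    Hypothetical asymmetry model: an up pulse moves the conductance by
    step * (1 + asymmetry), a down pulse by step * (1 - asymmetry).
    An ideal, symmetric device (asymmetry = 0) returns to where it started.
    """
    step = 1.0 / (n_levels - 1)
    g = start
    for _ in range(n_pairs):
        g = min(1.0, g + step * (1 + asymmetry))
        g = max(0.0, g - step * (1 - asymmetry))
    return g - start


target = 0.3141  # an arbitrary weight to store
for n in (2, 16, 1000):
    error = abs(stored_value(target, n) - target)
    print(f"{n:4d} levels: storage error {error:.4f}")

for asym in (0.0, 0.2):
    drift = drift_after_pulse_pairs(1000, asym, n_pairs=100)
    print(f"asymmetry {asym:.0%}: drift after 100 up/down pairs {drift:+.4f}")
```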
Sundeep Bajikar, head of market intelligence at Applied Materials, writing in a blog, said “addressing the processor-to-memory access and bandwidth bottleneck will give rise to new memory architectures for AI, and could ultimately lead to convergence between logic and memory manufacturing process technologies. IBM’s TrueNorth inference chip is one such example of a new architecture in which each neuron has access to its own local memory and does not need to go off-chip to access memory. New memory devices such as ReRAM, FE-RAM and MRAM could catalyze innovation in the area of memory-centric computing. The traditional approach of separating process technologies for high-performance logic and high-performance memory may no longer be as relevant in a new AI world of reduced precision computing.”
After decades of R&D, two emerging memory types – the phase change memory-based 3D XPoint, co-developed by Intel and Micron, and the embedded spin-torque transfer magnetic RAM (e-MRAM) from several foundries – are now coming to the market. One point of interest is that neither memory type relies on the charge-based SRAM and DRAM memory technologies that increasingly face difficult scaling challenges. Another is that both have inherent performance advantages that could extend their uses for decades to come.
3D XPoint is a storage class memory (SCM) based on phase change that fits in between fast DRAM and non-volatile NAND; it is currently available in SSDs and is sampling in a DIMM form factor. David Kanter, an analyst at Real World Technologies (San Francisco) said the Optane SSDs are selling now but the DIMMs are shaping up to be “an early 2019 story” in terms of real adoption. “People are very excited about the DIMMs, including customers, software developers, the whole computer ecosystem. There is a lot of software development going on that is required to take advantage of it, and a lot of system companies are saying they can’t wait. They are telling Intel ‘give me the hardware.’”
“Intel is taking the long view” when it comes to 3D XPoint (the individual devices) and Optane (the SSDs and DIMMs), Kanter said. “This is a new technology and it is not a trivial thing to bring it to the market. It is a testament to Intel that they are taking their time to properly develop the ecosystem.”
However, Kanter said there is not enough public information about 3D XPoint DIMMs, including performance, price, power consumption, and other metrics. Companies that sell enterprise database systems, such as IBM, Microsoft, Oracle, SAP, and others, are willing to pay high prices for a storage-class memory solution that will improve their performance. The Optane DIMMs, according to Intel, are well-suited to “large-capacity in-memory database solutions.”
According to the Intel Web site, Optane DC persistent memory “is sampling today and will ship for revenue to select customers later this year, with broad availability in 2019.” It can be placed on a DDR4 module alongside DRAM, and matched up with next-generation Xeon processors. Intel is offering developers remote access to systems equipped with Optane memory for software development and testing.
Optane DIMM reaches ‘broad availability’ in 2019
Speaking at the Symposium on VLSI Technology in Honolulu, Gary Tressler, a distinguished engineer at IBM Systems, said “the reliability of 3D NAND impacts the enterprise,” and predicted that the Optane storage class memory will serve to improve enterprise-class systems in terms of reliability and performance.
The DRAM scaling picture is not particularly bright. Tressler said “it could be four years before we go beyond the 16-gigabit size in terms of DRAM density,” adding that “DRAM companies are eking out scaling improvements of 1nm increments,” an indication of the physical limitations facing the established DRAM makers.
Al Fazio, a senior fellow at Intel who participated in the memory-related evening panel at the VLSI symposia, said that the early adopters of the Optane technology have seen significant benefits: one IT manager told Fazio that by adding a layer of Optane SSD-based memory he was able to rebuild a database in seconds versus 17 minutes previously. Fazio said he takes particular pride in the fact that, because of Optane, some doctors are now able to immediately read the results of magnetic resonance imaging (MRI) tests.
“An MRI now takes two minutes instead of 40 minutes to render,” Fazio said, adding that a second generation of 3D XPoint is being developed which he said draws upon “materials improvements” to enhance performance.
Chris Petti, a senior director of advanced technology at Western Digital, said DRAM pricing has been “flat for the last five to seven years,” making it more expensive to simply add more DRAM to overcome the latency gap between DRAM and flash. “DRAM is not scaling so there are a lot of opportunities for a new technology” such as Optane or the fast NAND technologies, he said. Samsung is working on a single-bit-per-cell form of fast NAND.
In a Monday short course on emerging memory technologies at the Symposium on VLSI Circuits, Petti said the drawback to phase change memories (PCMs), such as 3D XPoint, is the relatively high write-energy-per-bit, which he estimated at 460 pJ/bit, compared with 250 pJ/bit for standard NAND (based on product spec sheets). In terms of cost, latency, and endurance, Petti judged the PCM memories to be in the “acceptable” range. While the price is five to six times the price-per-bit of standard NAND, Petti noted that the speed improves “because PCM (phase change memory) is inherently faster than charge storage.”
Source: Chris Petti, Western Digital, short course presentation at 2018 Symposium on VLSI Circuits
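To put Petti's per-bit figures in perspective, the arithmetic below converts them into the energy needed to write a given amount of data. It uses only the 460 pJ/bit (PCM) and 250 pJ/bit (standard NAND) estimates quoted above; the 1 GiB data size is an arbitrary choice, and controller, interface, and other system-level overheads are ignored.

```python
PCM_WRITE_PJ_PER_BIT = 460.0   # Petti's estimate for PCM such as 3D XPoint
NAND_WRITE_PJ_PER_BIT = 250.0  # Petti's estimate for standard NAND


def write_energy_joules(n_bytes, pj_per_bit):
    """Energy to write n_bytes at a fixed per-bit write energy (array only)."""
    return n_bytes * 8 * pj_per_bit * 1e-12


one_gib = 2 ** 30
pcm = write_energy_joules(one_gib, PCM_WRITE_PJ_PER_BIT)
nand = write_energy_joules(one_gib, NAND_WRITE_PJ_PER_BIT)
print(f"1 GiB write: PCM {pcm:.2f} J, NAND {nand:.2f} J, ratio {pcm / nand:.2f}x")
```

On these figures PCM needs roughly 1.8 times the write energy of NAND, a penalty that has to be weighed against its speed advantage.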
Phase-change materials, such as Ge2Sb2Te5, change between two different atomic structures, each of which has a different electronic state. A crystalline structure allows electrons to flow while an amorphous structure blocks the flow. The two states are changed by heating the PCM bit electrically.
Philip Wong, a Stanford University professor, said the available literature on PCM materials shows that they can be extremely fast; the latencies at the SSD and DIMM levels are largely governed by “protocols.” In 2016, a team of Stanford researchers said the fundamental properties of phase-change materials could be as much as a thousand times faster than DRAM.
In a keynote speech at the VLSI symposia, Scott DeBoer, executive vice president of technology development at Micron (Boise, Idaho), said “clearly the most successful of the emerging memories is 3D XPoint, where the technology performance has been proven and volume production is underway. 3D XPoint performance and density are midway between DRAM and NAND, which offers opportunities to greatly enhance system-level performance by augmenting existing memory technologies or even directly replacing them in some applications.”
Currently, the 3D XPoint products are made at a fab in Lehi, Utah. The initial technology stores 128Gb per die across two stacked memory layers. Future generations can either add more memory layers or use lithographic pitch scaling to increase die capacity, according to Micron.
DeBoer noted that “significant system-level enablement is required to exploit the full value of 3D XPoint memory, and this ongoing effort will take time to fully mature.”
eMRAM Race Begins Among Major Foundries
Magnetic RAM technology has been under serious development for three decades, resolving significant hurdles along the way with breakthroughs in MgO magnetic materials and device architecture. Everspin Technologies has been shipping discrete MRAM devices for nearly a decade, and the three major foundries are readying embedded MRAM for SoCs, automotive ICs, and other products. The initial target is to replace embedded NOR-type flash, largely because the large charge pumps required to program NOR devices add multiple mask layers.
GlobalFoundries, which manufactures the Everspin discrete devices, has qualified eMRAM for its 22nm FD-SOI process, called 22FDX. TSMC also has eMRAM plans.
At the Symposium on VLSI Technology, Samsung Foundry (Giheung, Korea) senior manager Yong Kyu Lee described an embedded STT-MRAM in a 28-nm FDSOI logic process, aimed at high-speed industrial MCU and IoT applications.
Interestingly, Lee said the FD-SOI technology “has superior RF performance, low power, and better analog characteristics than 28-nm bulk and 14-nm FinFET CMOS.” Lee indicated that the FD-SOI-based eMRAM would be production-ready later this year.
Samsung ported its STT perpendicular-MTJ (magnetic tunnel junction) eMRAM technology from its 28-nm bulk to its FD-SOI CMOS process. The company offers the eMRAM as a module, complementing an RF module. The “merged embedded STT MRAM and RF-CMOS process is compatible to the existing logic process, enabling reuse of IP,” he said.
Looking forward to the day when MRAM could complement or replace SRAM, Lee said “even though we have not included data in this paper, our MTJ shows a potential for storage working memory due to high endurance (>1E10) and fast writing (<30ns).”
Beyond Embedded to Last Level Cache
As foundries and their customers gain confidence in eMRAM’s retention, power consumption, and reliability, it will begin to replace NOR flash at the 40-nm, 28-nm, and smaller nodes. However, further engineering improvements are needed before MRAM can take on SRAM replacement.
SRAM scaling is proving increasingly difficult, both in terms of the minimum voltages required and the size of the six-transistor-based bits. MRAM researchers are in hot pursuit of the ability to replace some of the SRAM on processors with Last Level Cache (LLC) iterations of magnetic memory. These LLC MRAMs would be fabricated at the 7nm and 5nm nodes and beyond.
Mahendra Pakala, senior director of memory and materials at the Applied Materials Advanced Product Technology Development group, said for eMRAM the main challenges now are achieving high yields with less shorting between the magnetic tunnel junctions (MTJs). “The big foundries have been working through those problems, and embedded MRAM is getting closer to reality, ramping up sometime this year,” he said.
For LLC applications, STT-MRAM has approached SRAM and DRAM performance levels for small sample sizes. At the VLSI symposium, researchers from Applied Materials, Qualcomm, Samsung, and TDK-Headway all presented work on SRAM cache-type MRAM devices with high performance, tight pitches, and relatively low write currents.
Applied’s VLSI symposium presentation was by Lin Xue, who said the LLC-type MRAM performance is largely controlled by the quality of the PVD-deposited layers in the MTJ, while yields are governed by the ability to etch the MTJ pillars efficiently. Etching is extremely challenging for the tight pitches required for SRAM replacement, since the tight-pitch MTJ pillars must be etched without redepositing material on the sidewalls.
Caption: Lin Xue et al., Applied Materials presentation at 2018 Symposium on VLSI Technology
Deposition is also difficult. The MTJ structures contain multiple stacks of cobalt and platinum, and the thickness of the multilayers must be reduced to meet the 7nm node requirements. Any roughness in the interfaces creates secondary effects which reduce perpendicular magnetic anisotropy (PMA). “The performance is coming from the interface, essentially. If you don’t make the interface sharp, you don’t end up with the expected improvement in PMA,” Pakala said.
Applied has optimized a PVD process for deposition of the 15-plus layers of many different materials required for the magnetic tunnel junctions. Pakala said the PVD technology can sputter more than 10 different materials. The Endura-based system uses a multi-cathode approach, enabling each chamber to have up to five targets. With a system of seven chambers, companies can deposit the required variety of materials and, if desired, increase throughput by doubling up on the targets.
The system would include a metrology capability, and because the materials are easily oxidized, the entire system operates at vacuum levels beyond the normal 10^-8 Torr level. For MRAM deposition, operating at 10^-9 or even 10^-10 Torr may be required.
“When we start talking about the 7 and 5 nanometer nodes for SRAM cache replacement, the cell size and distances between the bits becomes very small, less than 100 nm from one MTJ to another. When we get to such small distances, there are etching issues, mainly redepositing on the sidewalls. The challenge is: How do we etch at reduced pitch without shorting?” Pakala said.
“Integrated thermal treatment and metrology to measure the thicknesses, all of which has to be done at extremely low vacuum, are major requirements,” he said.
“At this point it is not a question of the basic physics. For MRAM, it is, as they say, ‘just engineering’ from here on out,” he said.
Applied Materials has introduced a set of processes that enable cobalt to be used instead of tungsten and copper for contacts and middle-of-line interconnects. Higher levels of metal, which typically have wider dimensions, will still employ copper as the material of choice, but at more advanced nodes, cobalt will likely be the best option as linewidths continue to shrink. Tungsten will still be used at the gate contact level.
To enable the use of cobalt, Applied has combined several materials engineering steps – pre-clean, PVD, ALD and CVD – on the Endura® platform. Moreover, Applied has defined an integrated cobalt suite that includes anneal on the Producer® platform, planarization on the Reflexion® LK Prime CMP platform and e-beam inspection on the PROVision™ platform. The process flow is shown in FIGURE 1.
While challenging to integrate, cobalt brings significant benefits to chips and chip making: lower resistance and variability at small dimensions; improved gapfill at very fine dimensions; and improved reliability. The move to cobalt, which is underway at Intel, GlobalFoundries and other semiconductor manufacturing companies, is the first major change in materials used as conductors since copper dual damascene replaced aluminum in 1997. “You don’t see inflections this large very often,” said Jonathan Bakke, global product manager, Metal Deposition Products at Applied Materials. “This is a complete metallization change.”
At IEDM last year, Intel said it would use cobalt for its 10nm logic process for several of the lower metal levels, including a cobalt fill at the trench contacts and cobalt M0 and M1 wiring levels. The result was much-improved resistance (a 60 percent reduction in line resistance and a 1.5X reduction in contact resistance) and improved reliability.
Today, critical dimensions of contacts and interconnects are about 20 nm, plus or minus a few nanometers depending on the customer and how it’s defined. “As you get smaller – and you typically get about 30% smaller with each node — you’re running out of room for tungsten. Copper is also facing challenges in both gap fill and electromigration,” Bakke said.
As shown in FIGURE 2, cobalt has advantages over copper when dimensions shrink to about 10nm; those dimensions are presently at about 30 nm. It’s not yet clear when that crossover point will arrive, but decisions will be based on how much resistivity and electromigration improvement can be gained.
Applied Materials started developing cobalt-based processes in the mid-2000s and in 2013 released the Volta CVD Cobalt system, designed to encapsulate copper interconnects in cobalt, which helped improve gap fill and electromigration. “It was shortly thereafter that we started depositing thick CVD cobalt films for metallization. We quickly realized that there’s a lot of challenges with doing this kind of metallization using cobalt because of its unique properties,” Bakke said. Cobalt can be reflowed and recrystallized, which eliminates seams and leads to larger grain sizes, which in turn reduces resistivity. “We started looking at things like interfaces, adhesion and microstructure of the cobalt to make sure that it was an efficient material and it had very low resistance and high yield for integrated device manufacturers,” he added. Once the processes were perfected, it took several years before they were fully qualified at customers. “This year is when we start to see proliferation and expect high-volume manufacturing of real devices with cobalt,” Bakke said.
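The timing question can be framed with simple arithmetic. If each node shrinks dimensions by about 30 percent, as Bakke notes, the number of nodes before reaching the roughly 10 nm crossover depends on the starting dimension. The sketch below tries both starting values that appear in the text (about 20 nm and about 30 nm); it assumes a uniform 30 percent shrink per node, which real roadmaps only approximate, and it says nothing about the resistivity physics itself.

```python
def nodes_to_reach(start_nm, target_nm, shrink_per_node=0.30):
    """Count nodes until a dimension falls to target_nm or below.

    Assumes every node scales the dimension by (1 - shrink_per_node).
    Returns the node count and the dimension reached at that node.
    """
    if not 0.0 < shrink_per_node < 1.0:
        raise ValueError("shrink_per_node must be between 0 and 1")
    nodes, size = 0, start_nm
    while size > target_nm:
        size *= 1.0 - shrink_per_node
        nodes += 1
    return nodes, size


for start in (20.0, 30.0):
    n, final = nodes_to_reach(start, 10.0)
    print(f"start {start:.0f} nm -> {n} nodes (about {final:.1f} nm)")
```

Under this assumption, a 20 nm starting point reaches the crossover in two nodes and a 30 nm starting point in four, which is one reason the timing remains uncertain.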