There was a reverberation throughout the manufacturing community this morning when Bloomberg published a story alleging that China implanted clandestine chips inside of motherboards destined for the servers of major U.S. companies. Apple and Amazon categorically denied that this happened at all, but the major question manufacturing insiders are asking is: could it have? The scary answer is yes, and the reality is that it’s likely not even as difficult as the Bloomberg article makes it out to be. Preventing an attack vector like this would be extremely difficult, and now that the idea is out there, manufacturing supply chains will need to put mitigations in place to protect themselves.
I’ve spent my entire professional career involved in the design and manufacture of devices, including six years as a design engineer at Apple. I’ve seen the inside of big and small companies, and how they work with vendors and trusted partners to design boards like those used in the servers mentioned in the Bloomberg article, but also for consumer, medical, and other devices. These processes are manual and involve deep collaboration between multiple parties: the design team, the manufacturing team, and oftentimes outside consultants. In well-resourced and secretive teams, it’s possible to maintain control of the most sensitive stuff, such as the schematic, which describes how everything should be connected, and the netlist, which describes how everything is connected. Keeping these design elements close ensures that no changes can be made that could impact the integrity or functionality of the design (such as adding clandestine components).
Under-resourced teams, however, heavily leverage their manufacturing partners to do a lot of the design and layout for them. Higher-complexity products and external pressure to ship fast have accelerated this trend in the last decade. When a company wants to build something like a server, the team might even just ask the manufacturer to use an off-the-shelf reference design with a few modifications, and not get very involved at all. Most of the time this works well, but it doesn’t mean that it’s without risk from bad actors. Clandestine parts or unintended circuitry would be incredibly easy to inject and unlikely to be caught by harried teams who don’t have dedicated electrical engineering resources – which is common at small companies and even at well-resourced companies if hardware is not a core competency of the business, such as an IT team building a server farm. “If you treat hardware like a commodity, and go with whatever’s cheapest, you lose control and oversight of the design,” says Saket Vora, a hardware leader who has designed circuit boards at Apple and multiple startups.
If an exploit is part of the original design, all bets are off for detecting it – because it’s now considered part of the golden reference. Even if the exploit isn’t in the official drawing provided by the manufacturer, parts could be made according to a similar, but different, internal-only design. This leads to a reality where you have a “customer-facing drawing” and a “factory drawing”. Quality engineers who have built with a variety of manufacturers have probably encountered this shady practice – whether it’s on circuit boards or enclosure parts – as a bit of an open secret of manufacturers who change specifications in an attempt to improve their razor-thin margins. These same engineers have been fighting counterfeit parts in their supply chains for years, oftentimes not knowing if the parts their supplier purchased have all of the intended circuitry. While it’s possible to manually spot a difference between a “customer-facing drawing” and what was actually built, Saket Vora said he thought it incredibly unlikely that anyone would, given the complexity of the boards and, again, limited team and engineering resources.
If the exploit is introduced after the design phase, the situation isn’t much better. Current automated optical inspection (AOI) technology used on circuit boards is designed to check the quality of parts that are supposed to be there. It is explicitly not looking for extra parts, so such parts would most likely remain undetected. The manufacturer is typically responsible for programming and maintaining this inspection equipment, so it could also easily update the test list to skip certain areas of the design.
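To make the gap concrete, here is a minimal sketch of what an inspection step would have to do to catch extra parts: compare everything found on the board against the placement list, and report what is left over rather than only what is missing. The `Placement` type, the `audit_board` function, the coordinate format, and the 0.5 mm tolerance are all hypothetical, chosen for illustration. This is not how any particular AOI system works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One expected part from the design's placement list (hypothetical format)."""
    ref: str      # reference designator, e.g. "U12"
    x_mm: float   # expected X position on the board
    y_mm: float   # expected Y position on the board


def audit_board(expected, detected, tol_mm=0.5):
    """Compare expected placements against detected part positions.

    expected: list of Placement
    detected: list of (x_mm, y_mm) tuples found on the built board
    Returns (missing_refs, extra_positions).
    """
    unmatched = list(detected)  # detections not yet tied to an expected part
    missing = []
    for exp in expected:
        match = next(
            (d for d in unmatched
             if abs(d[0] - exp.x_mm) <= tol_mm and abs(d[1] - exp.y_mm) <= tol_mm),
            None,
        )
        if match is None:
            missing.append(exp.ref)   # expected part was not found
        else:
            unmatched.remove(match)   # consume the detection
    # Whatever is left over was on the board but not in the design.
    return missing, unmatched


expected = [Placement("U1", 10.0, 10.0), Placement("C1", 20.0, 5.0)]
detected = [(10.1, 9.9), (42.0, 17.0)]
missing, extra = audit_board(expected, detected)
print("missing:", missing)
print("extra:", extra)
```

A check that only loops over the expected parts would report the missing capacitor and stop there. The leftover detection is the part nobody asked for, and it only shows up if the inspection is written to look for it.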
While the clandestine chips were reportedly installed in 2015, the manufacturing world is still incredibly under-equipped to prevent or even to detect this kind of issue. The main method companies use is to build relationships with trusted suppliers, wherever they may be, but there are two other steps companies can take today to reduce the likelihood of something like this happening to them. The first is to think twice about how much of their design and oversight they outsource. Not everything is equally sensitive or requires the same level of oversight, but the things that do should be resourced appropriately, and that will cost money. Frankly, these costs may be untenable for most companies, especially for what still looks like a “theoretical” issue. That makes the second step even more important: build digital transparency and traceability into their supply chains. This means more than audits, site visits, and spreadsheets: it means installing technology that captures a digital 100% track-and-trace record, via images and in some cases x-rays, of every sensitive circuit board, component, or finished unit that is built. Digital traceability with image data is the only way to have a record that is quickly searchable for parts (or issues) that weren’t part of the original plan. Only with a digital record can you go back in time. Only with a digital record can you find things that were added, versus just the stuff that is missing.
Regardless of whether the allegations against Supermicro are true, I bet that every customer they’ve sold to for the last couple of years is wondering: “Do we have clandestine parts in our server racks? Which ones?” If those companies had an image and x-ray of every server board they had made, they could quickly identify differences between boards and, if something suspicious were found, quickly know which other boards were affected. This technology exists today and is used by leaders in quality, particularly in the consumer electronics manufacturing world – but it is not yet considered a standard across the industry. Certainly for high-sensitivity electronics, like those destined for governmental or national security purposes, 100% traceability should be an outright requirement.
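As a rough illustration of the “which ones?” question, the sketch below assumes each board’s inspection record has already been reduced to a set of anomaly tags keyed by serial number. The record format, the serial numbers, and the function names `build_index` and `boards_sharing` are hypothetical. A real track-and-trace system would derive these tags from the stored images and x-rays.

```python
from collections import defaultdict


def build_index(records):
    """Invert per-board records into anomaly -> set of serial numbers.

    records: dict mapping serial number -> set of anomaly tags, where each
    tag is any hashable value (here, a tuple of kind and rounded position).
    """
    index = defaultdict(set)
    for serial, anomalies in records.items():
        for anomaly in anomalies:
            index[anomaly].add(serial)
    return index


def boards_sharing(index, anomaly):
    """Return the sorted serial numbers of every board with this anomaly."""
    return sorted(index.get(anomaly, set()))


# Hypothetical records: two boards show an unexpected part at the same spot.
records = {
    "SN001": {("extra_part", 42, 17)},
    "SN002": set(),
    "SN003": {("extra_part", 42, 17), ("solder_bridge", 5, 9)},
}
index = build_index(records)
print(boards_sharing(index, ("extra_part", 42, 17)))
```

Once one suspicious board is found, the same lookup answers how far the problem spreads, without pulling a single server out of a rack. That is only possible if the record was captured for every unit at build time.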
For years, the consumer electronics industry has been plagued by counterfeit parts (accidental or intentional), but the addition of intentional clandestine parts elevates this potential attack vector to a new level of seriousness. There needs to be a solution to identify differences, intentional and unintentional, in goods that are built. If companies are using manufacturing partners to build their units, they need to care about and control this information – independently from their manufacturers (although sharing it is a best practice). There needs to be a way to not just look at what is supposed to be there, but to look at what’s not supposed to be there. Machine learning and other artificial intelligence technologies will likely be part of the solution, but can only work in combination with the standardization of 100% track-and-trace practices. It’s my hope that these allegations bring to light the very real risks inside of the supply chains of everyday products, so that we can work together to solve them.