It’s been a while. Actually, “a while” is a bit of an understatement: in the context of the typical rapid-fire product cycles for CPUs, it’s been more like an eternity. For years, Intel maintained those product cycles like a metronome, regular and precise in delivering new processors and platforms, so much so that it even had a name for its planning and scheduling of new generations: Tick-Tock. But in the case of its premier CPU line, Xeon, recent delays meant that until this month, major OEMs like Dell, HP, and Lenovo hadn’t been able to upgrade processor options for either their Premium single-socket workstations (1S, with Xeon W) or their dual-socket models (2S, with Xeon Scalable) for three full years. Finally, the wait is over: this month, March 2023, Intel unveiled the anticipated Sapphire Rapids generation of Xeon for workstations, split into two lines, the Xeon W-2400 and Xeon W-3400 series.
The delay meant that the upper end of the fixed workstation market, historically built almost 100% on Xeon, lost its luster on the very criterion by which the segment is primarily judged: performance. Worse, since Intel’s Core brand kept pumping out new generations of processors over the same period (11th Generation Rocket Lake, 12th Generation Alder Lake, and most recently 13th Generation Raptor Lake), more economical Core-based Entry class workstations were outperforming some Premium 1S models based on Xeon W, and making the vaunted 2S models based on Xeon Scalable look less impressive by comparison. The combination of factors has cut the market share of Xeon’s primary domain, the combined Premium 1S and 2S segments, in half since the start of 2019, from nearly 50% of all fixed workstation units sold to just 25%. Clearly, Xeon workstations were no longer delivering premium performance, making the eventual arrival of Sapphire Rapids critical to shoring up the upper tiers of the fixed workstation market.
Fixed workstation market segmentation. Image source: Jon Peddie Research.
(Worth noting for those who’ve paid close attention to Intel’s Xeon W line: there was a 2021 release of the W-3300 Ice Lake generation, intended to be the successor to 2019’s W-2200, but for several reasons it was a non-starter in the market, at least for major workstation vendors like Dell, HP, and Lenovo.)
First appearing in its 12th Generation Core (code-named Alder Lake), Intel’s reworked hybrid approach to CPU design builds on a common foundation of two fundamental core types: the P-Core and the E-Core. The P-Core pushes to maximize performance, trimming power where possible but without sacrificing throughput. The E-Core prioritizes computational efficiency, trading off some performance to minimize power consumption. In this first pairing of P-Core and E-Core microarchitectures, the P-Core is called Golden Cove; introduced with 12th Generation Core, it now appears in Xeon W.
With this approach, Intel can produce a range of CPU parts: some with more E-Cores for lower-demand applications where minimal power consumption is critical, like mobile devices, and some with more P-Cores for applications where performance is more critical and power less so, as in line-powered fixed desktops and workstations. Built from the same two core flavors, Intel’s CPUs can serve applications with power budgets anywhere from 9 W up to 125 W high-end desktops and beyond. Check out this previous column for a deeper look at Alder Lake and Intel’s new hybrid designs mixing P-Cores and E-Cores.
Targeting fixed workstations, the Xeon W-2400 and W-3400 lines slant priorities so heavily in favor of performance that they don’t merely incorporate more P-Cores than E-Cores; they don’t include E-Cores at all. And that’s the right call, as Xeon W targets one thing only: the maximum possible performance from a single socket in a line-powered fixed workstation. There are plenty of great applications for E-Cores, but this isn’t one of them, so while Sapphire Rapids takes advantage of the Golden Cove P-Core, it’s no hybrid.
| Series | Xeon W9 | Xeon W7 | Xeon W5 |
|---|---|---|---|
| W-2400 series | 20 to 24 | 10 to 16 | 6 to 8 |
| W-3400 series | 36 to 56 | 20 to 28 | 12 to 16 |
Core counts for Intel’s Sapphire Rapids generation Xeon W-2400 and W-3400 series … and they’re all P-Cores. Data source: Intel.
Core count is not the only criterion separating the W-3400 from the W-2400 (in fact, the W-3400’s W5 SKUs overlap in core count with the W-2400’s W7 SKUs, and its W7 SKUs with the W-2400’s W9 SKUs). While the W-2400 supports four channels of memory and 56 lanes of PCI Express 5.0 I/O, the W-3400 doubles both, with eight channels and 112 lanes. The bump makes sense to feed the W-3400’s higher processing headroom.
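For a rough sense of what doubling channels and lanes buys, the peak theoretical figures can be worked out directly. This is a minimal back-of-envelope sketch, assuming DDR5-4800 on every channel (the column doesn’t state memory speeds) and PCIe 5.0’s 32 GT/s per lane with 128b/130b encoding; the channel and lane counts are the ones given above, and sustained real-world throughput will be lower.

```python
# Back-of-envelope peak bandwidth for the W-2400 vs. W-3400 platforms.
# Assumptions (not stated in the column): every memory channel runs DDR5-4800
# with a 64-bit (8-byte) data path, and PCIe 5.0 signals at 32 GT/s per lane
# with 128b/130b encoding. These are theoretical peaks, not measurements.

DDR5_MT_PER_S = 4800          # assumed memory transfer rate (MT/s)
BYTES_PER_TRANSFER = 8        # 64-bit data path per channel
PCIE5_GT_PER_S = 32           # PCIe 5.0 raw signaling rate per lane
PCIE_ENCODING = 128 / 130     # 128b/130b line-code efficiency


def memory_bandwidth_gbs(channels: int) -> float:
    """Peak theoretical DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * DDR5_MT_PER_S * 1e6 * BYTES_PER_TRANSFER / 1e9


def pcie_bandwidth_gbs(lanes: int) -> float:
    """Peak one-direction PCIe 5.0 bandwidth in GB/s summed over all lanes."""
    per_lane = PCIE5_GT_PER_S * 1e9 * PCIE_ENCODING / 8 / 1e9
    return lanes * per_lane


# (memory channels, PCIe 5.0 lanes) as given in the column
platforms = {"W-2400": (4, 56), "W-3400": (8, 112)}
for name, (channels, lanes) in platforms.items():
    print(f"{name}: ~{memory_bandwidth_gbs(channels):.1f} GB/s memory, "
          f"~{pcie_bandwidth_gbs(lanes):.0f} GB/s PCIe (each direction)")
```

Under those assumptions, the W-3400’s eight channels work out to roughly 307 GB/s of peak memory bandwidth versus about 154 GB/s for the W-2400, with PCIe scaling the same 2:1.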
For decades, the silicon industry (and Intel especially) has benefited from the long ride of Moore’s Law. Simply put, Moore’s Law held that shrinking silicon dimensions would allow transistor counts to grow geometrically, doubling about every two years, at roughly the same cost. It let a vendor count on a budget of twice as many transistors to gin up performance for the next generation. That progression proved incredibly fruitful, arguably the single most significant technological trend to influence progress across virtually every industry in the past fifty years. But while there is disagreement on whether and when Moore’s Law ends (at least in the context of today’s silicon processes), few would deny it’s slowing. And that has made it no longer feasible to simply count on twice the number of transistors to, say, double the core count of a high-performance CPU from one generation to the next.
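Because the growth is geometric, even a modest stretch in the doubling period compounds into a much smaller transistor budget. The sketch below is purely illustrative: the doubling periods are assumptions chosen for comparison, not measured industry data.

```python
# Illustrative only: how the doubling period changes a compounded transistor
# budget. The periods compared below are assumptions, not measured data.


def growth_factor(years: float, doubling_period_years: float) -> float:
    """Transistor-budget multiplier after `years` of geometric growth."""
    return 2 ** (years / doubling_period_years)


for period in (2.0, 3.0, 4.0):
    print(f"doubling every {period:.0f} years -> "
          f"{growth_factor(10, period):.1f}x in a decade")
```

A two-year cadence compounds to 32x over a decade, while a three-year cadence yields only about 10x, which is why simply waiting on the next process node no longer delivers the core counts a chiplet package can.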
Though not giving up on a Moore’s Law progression, Intel is as aware as any silicon vendor of the need to seek out new avenues for increasing compute density beyond just stuffing more transistors onto one silicon die. Integrating multiple silicon dies, more commonly referred to as chiplets, into a single chip package accomplishes much the same end goal, and accordingly, every vendor of high-performance processors is adopting, or at least considering, chiplet approaches to maximize compute density per CPU socket.
Arch-rival AMD is one of those vendors, and its chiplet approach, dubbed the Infinity Architecture, was the linchpin in producing Threadripper PRO, a line of high-core-count workstation CPUs first launched in 2020 that took the single-socket core count crown away from Intel. As covered in previous columns, Threadripper PRO scales up to 64 cores per socket, extending aggregate compute well beyond that aging 2019-generation Xeon W; it could legitimately challenge a pair of Intel’s Xeon Scalable processors in a dual-socket workstation.
Intel has several technologies at its disposal to integrate multiple chiplets in a package. Given the high core counts Sapphire Rapids was targeting, matching or at least approaching Threadripper PRO’s, it was a foregone conclusion that Sapphire Rapids would leverage a multi-chiplet design to get there. It did so with Intel’s embedded multi-die interconnect bridge (EMIB) technology, aggregating up to four tiles in its multi-tile implementation. (In the context of this discussion, it’s safe to equate a tile with a chiplet.)
Intel harnessed its proven EMIB technology to integrate up to four tiles (aka chiplets) to scale up to Xeon W9’s top-end 56 cores. Image source: Intel.
With EMIB enabling the pinnacle of the new Sapphire Rapids Xeon line, the W9-3495X, to scale up to 56 cores, Intel is now able to do to AMD what AMD did to Xeon in workstations back in 2020: provide a credible competitor at the top end of the market. The core count is close to Threadripper PRO’s maximum 64 cores (the 5995WX), but whether the W9-3495X could meet or exceed Threadripper PRO’s performance — or fall short — is ultimately a matter of real-world usage, or in my case, benchmarking.
Toward that end, I ran the most relevant heavily threaded benchmark for high-performance workstation CPUs, SPECworkstation, using the older 3.0.4 version to allow comparison with past results from not only comparable AMD Threadripper PRO SKUs, but also a pair of previous-generation 28-core Xeon Scalable Platinum 8280 CPUs in a dual-socket configuration. That pair should represent the most aggregate computational performance Xeon could deliver in a workstation prior to the arrival of Sapphire Rapids. I aggregated the results from nearly every SPECworkstation CPU-specific test, including many common CAD-relevant workloads supporting applications like rendering, computational fluid dynamics (CFD), and finite element analysis (FEA), averaging the scores normalized to the 56C Xeon W9-3495X. System configurations, with respect to supporting memory and storage, were comparable, if not exact matches.
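Normalizing to a baseline means dividing each system’s per-test score by the baseline’s score on that same test, then collapsing those ratios into one number. The column doesn’t say which kind of average it used, so this minimal sketch shows both an arithmetic mean and a geometric mean (the convention SPEC CPU uses for its composite ratios). The test names and scores below are hypothetical placeholders, not benchmark results.

```python
from statistics import fmean, geometric_mean


def normalize(scores: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Express each per-test score as a ratio to the baseline system's score
    on the same test (the baseline itself therefore scores 1.0 everywhere)."""
    return {test: scores[test] / baseline[test] for test in baseline}


def aggregate(ratios: dict[str, float], geometric: bool = False) -> float:
    """Collapse per-test ratios into one relative score."""
    values = list(ratios.values())
    return geometric_mean(values) if geometric else fmean(values)


# Hypothetical placeholder scores for illustration only, NOT benchmark results.
baseline_scores = {"render": 10.0, "cfd": 4.0, "fea": 2.0}
candidate_scores = {"render": 12.0, "cfd": 4.0, "fea": 3.0}

ratios = normalize(candidate_scores, baseline_scores)
print(f"arithmetic mean: {aggregate(ratios):.3f}")
print(f"geometric mean:  {aggregate(ratios, geometric=True):.3f}")
```

The two averages diverge when per-test ratios vary widely; a geometric mean keeps one outsized win from dominating the composite.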
Multi-threaded performance (SPECworkstation 3.0.4) for comparable AMD Threadripper PRO SKUs, as well as a pair of previous-generation Xeon Scalable Platinum CPUs, all normalized to the top-end Xeon W9-3495X. Data source: Jon Peddie Research.
In the end, AMD’s 64C Threadripper PRO (with eight more cores) scored 19% higher than the 56C Xeon W9-3495X, while the 32C Threadripper PRO (with 24 fewer cores) scored 6% lower. Perhaps most revealing is that even the previous-generation dual-socket pair of Xeon Scalable Platinum 8280s fell short of the new single-socket W9-3495X by 14%.
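Dividing those aggregate results by total core count gives a rough per-core view. By this arithmetic, the 64C Threadripper PRO’s lead shrinks to about 4% per core, while the 32C part delivers roughly 1.6x the W9-3495X’s per-core throughput. This is a back-of-envelope sketch built only on the rounded percentages above; reading “fell short by 14%” as a relative score of 0.86 is an assumption, and per-core figures blend clock speeds, memory, and power limits, so they are not an architectural (IPC) comparison.

```python
# Relative aggregate scores (56-core Xeon W9-3495X = 1.00) and total core
# counts, taken from the rounded percentages in the text. Treating "fell short
# by 14%" as a relative score of 0.86 is an assumption.
RESULTS = {
    "Xeon W9-3495X": (1.00, 56),
    "Threadripper PRO 64C": (1.19, 64),
    "Threadripper PRO 32C": (0.94, 32),
    "2 x Xeon Scalable Platinum (prev. gen)": (0.86, 2 * 28),
}


def per_core_relative(score: float, cores: int,
                      base_score: float = 1.00, base_cores: int = 56) -> float:
    """Per-core throughput relative to the baseline's per-core throughput."""
    return (score / cores) / (base_score / base_cores)


for name, (score, cores) in RESULTS.items():
    print(f"{name:40s} aggregate {score:.2f}x  "
          f"per core {per_core_relative(score, cores):.2f}x")
```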
With respect to its marketing targets for Sapphire Rapids Xeon, Intel says the W-3400 series serves “Expert” workstations while the W-2400 serves “Mainstream” models. Remember what fills in to serve the rest of the market below the W-2400: those aforementioned Core brand CPUs, the same ones that serve high-end PCs for gaming and commercial applications and that have outshone Xeon lately at lower price points. The two new Xeon lines will again set OEMs’ Premium workstation towers apart from the Entry class, pushing their multi-threaded performance well beyond the reach of Core. As a result, the erosion of Premium 1S Xeon workstations by Entry 1S workstations should come to an end, and I’d expect some degree of rebound in the Premium 1S segment.
But what happens to the very top end of the fixed market, the dual-socket (2S) segment, remains to be seen. Like the Threadripper PRO 5995WX, the Xeon W9-3495X reaches core counts that the previous generation of Intel CPUs could only match in a more expensive 2S machine. Scaling processing power so much higher in a single socket via chiplets raises the question of how motivated a vendor will be to keep a 2S workstation platform going, or whether it should keep one going at all.
Workstation OEMs and Intel’s rival IHV are already making product decisions that signal their directions on the issue. AMD clearly believes high-core-count, chiplet-based CPUs in a single socket are the right call. While the vendor’s multi-socket, server-focused EPYC CPUs can be used in dual-socket workstations (which some smaller vendors are marketing), there are no signs the company is looking to aggressively market that configuration; instead, it positions a single-socket Threadripper PRO as the ultimate workstation solution. And as of this writing, we’ve already seen HP and Lenovo adjust their fixed workstation positioning, trimming their 2S offerings in favor of additional Premium 1S models based on the new W-3400 Xeons.
Lenovo took the logical step of cutting its two 2S models down to one, the ThinkStation PX. Furthermore, to keep models from stepping on each other’s toes, it made sure the PX could handle a lot more Xeon Scalable cores (2 × 60, for 120 total Golden Cove cores) than its 1S ThinkStation P7, which supports Xeon W-3400 up to 56 cores and sits alongside the Xeon W-2400-based ThinkStation P5.
HP did something similar, turning one of its 2S workstations into a W-3400 Premium 1S model like the ThinkStation P7. But while the ThinkStation P7 supports the full W-3400 range, up to the 56C W9-3495X, HP capped the Z6 at 36 cores and added a separate 1S Z8 Fury to showcase the highest core counts, up to the 56C W9-3495X. As such, there’s not a lot of daylight, at least with respect to core counts, between HP’s three Premium 1S workstation SKUs: the W-2400-based Z4, the W-3400 (up to 36 cores) Z6, and the W-3400 (up to 56 cores) Z8 Fury. Possibly confusing matters on the branding front, HP kept a 2S Z8 (minus the Fury suffix) with Xeon Scalable SKUs of up to 32 cores. That means a fully configured 2S Z8 with both sockets filled would offer only 8 more cores (64 versus 56) than the 1S Z8 Fury can manage alone.
Lastly, we have workstation market leader Dell, the last to announce its plans for Sapphire Rapids Xeon. Dell is also trimming its 2S lineup, converting one of its two 7000 series dual-Xeon Precisions into a 1S model based on Xeon W-3400, the Precision 7960 (up to 56 cores). That conversion naturally pairs the 7960 with the previously announced 1S 7865, Dell’s first workstation built around Threadripper PRO. But what remains a question mark is what Dell plans, if anything, for the 2S segment, as it has not yet announced any Sapphire Rapids Xeon Scalable model at all.
How Dell, Lenovo, and HP’s fixed workstations (with maximum CPU core counts) now fit into market segments, and the primary CPUs that power them. Data sources: Intel, AMD, Jon Peddie Research.
As covered in several previous articles addressing workstation form factors, power consumption and form factor go hand in hand. That is, if you want more aggregate performance, the machine needs more power, all else being equal. As supporting evidence, I keep a running record of a range of workstation-caliber CPUs: how much power each is allowed to consume (as set by the system’s ability to dissipate those watts as heat), and what level of performance it can produce with that power budget. With the chance to benchmark the 350 W Xeon W9-3495X, I can add another datapoint to that spectrum. Really, it’s more of an endpoint, as it represents the highest power consumption of any CPU I’ve had the chance to review. Including it along with a range of currently marketed workstation CPUs, together with their respective SPECworkstation aggregate scores, further illustrates the performance-to-power correlation. The chart below shows the full range of CPUs available across the workstation market today. Consider that moving up along (or near) the interpolated curve implies, and often requires, a different, larger form factor.
MT performance versus power across a range of workstation class CPUs. Data sources: Intel, AMD, Jon Peddie Research.
Note that the power region designated for mobile workstations indicates their CPUs are all capped at 55 W (it used to be 45 W, but Intel recently created a Core HX series specifically targeting maximum-performance mobile workstations). Moving beyond 55 W is the exclusive domain of fixed, line-powered workstations, starting with new ultra-small form factors like HP’s Z2 Mini, Lenovo’s Tiny and Ultra ThinkStations, and Dell’s Precision Compact, which match up with a 65 W Intel Core (or AMD Ryzen Pro) CPU or, with more effective cooling (and commensurately more fan noise), stretch up to a 125 W SKU. Moving up, more traditional small form factor (SFF) and mini-tower machines are generally matched with 125 W CPUs. And then we jump to 200+ W workstation-exclusive CPUs, like Xeon and Threadripper PRO. The point is clear: if you need higher performance for optimal execution of modern multi-threaded workstation-caliber workloads, you’ll need a bigger, line-powered fixed machine.
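The power tiers described above amount to a simple lookup from CPU power budget to the smallest class of machine that can host it. This toy sketch encodes the column’s round-number cutoffs (55 W, 65 W, 125 W); grouping everything above 125 W into a single tower class is an assumption, the tier labels are paraphrases rather than vendor terminology, and real products blur these boundaries.

```python
def form_factor_for(cpu_watts: float) -> str:
    """Return the smallest machine class the column pairs with a CPU power budget.

    Cutoffs are the column's round numbers (55 W, 65 W, 125 W); lumping
    everything above 125 W into one tower class is an assumption.
    """
    if cpu_watts <= 55:
        return "mobile workstation"
    if cpu_watts <= 65:
        return "ultra-small fixed workstation"
    if cpu_watts <= 125:
        return "ultra-small with extra cooling, SFF, or mini-tower"
    return "full tower for workstation-exclusive CPUs (Xeon W, Threadripper PRO)"


for watts in (45, 55, 65, 125, 350):
    print(f"{watts:>4} W -> {form_factor_for(watts)}")
```

Plugging in the 350 W W9-3495X lands squarely in the full-tower class, consistent with HP building the Z8 Fury around it.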
Read part 2 of this column, where I dig further into Sapphire Rapids Xeon capabilities, as well as HP’s Z8 Fury, a beast of a workstation optimized to house and showcase the top-end 56C Xeon W9-3495X … and much more. In May 2023’s column, I’ll cover more details of the HP Z8 Fury and take advantage of its capabilities to check out another high-performance component, Nvidia’s new top-of-the-line RTX 6000 Ada Generation GPU.