Cerebras Systems Launches ‘World’s Most Powerful’ AI Inference Platform
Cerebras claims its new system is the world’s fastest for AI inference, delivering up to 20x the performance of comparable NVIDIA-based solutions.
Cerebras Systems has announced Cerebras Inference, which the company bills as the world’s fastest AI inference solution.
The Cerebras Inference cloud service is built on the company’s WSE-3 accelerators. These wafer-scale chips, manufactured on TSMC’s 5 nm process, pack 4 trillion transistors, 900,000 cores, and 44 GB of on-chip SRAM. Aggregate memory bandwidth reaches 21 PB/s, and the on-chip interconnect fabric carries 214 Pbit/s. For comparison, the entire HBM3e memory subsystem of an NVIDIA H200 delivers “only” 4.8 TB/s.
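To put those figures in perspective, a quick back-of-the-envelope calculation, using only the bandwidth numbers quoted above, shows the size of the gap between the WSE-3’s on-chip SRAM and the H200’s HBM3e:

```python
# Back-of-the-envelope comparison of the bandwidth figures quoted above.
WSE3_SRAM_BW_TBPS = 21_000   # 21 PB/s expressed in TB/s
H200_HBM3E_BW_TBPS = 4.8     # total HBM3e bandwidth of an H200, in TB/s

ratio = WSE3_SRAM_BW_TBPS / H200_HBM3E_BW_TBPS
print(f"WSE-3 SRAM bandwidth is ~{ratio:,.0f}x that of an H200's HBM3e")
# -> WSE-3 SRAM bandwidth is ~4,375x that of an H200's HBM3e
```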
According to Cerebras, the new inference platform delivers up to 20x the performance of comparable NVIDIA-based solutions offered by hyperscalers: up to 1,800 tokens per second per user for the Llama 3.1 8B model, and up to 450 tokens per second per user for Llama 3.1 70B.
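As a rough illustration of how a throughput claim like this could be checked, here is a minimal sketch that streams a completion and counts chunks per second. It assumes an OpenAI-compatible endpoint; the base URL, model identifier, and API key variable are assumptions for illustration, not details confirmed by this article.

```python
# Hypothetical throughput check against an OpenAI-compatible endpoint.
# The base URL, model name, and env var below are assumptions.
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # assumed env var
)

start = time.monotonic()
stream = client.chat.completions.create(
    model="llama3.1-8b",  # assumed model identifier
    messages=[{"role": "user", "content": "Explain wafer-scale chips."}],
    stream=True,
)

# Count streamed chunks; each chunk is roughly one token.
chunks = sum(
    1 for chunk in stream if chunk.choices and chunk.choices[0].delta.content
)
elapsed = time.monotonic() - start
print(f"~{chunks / elapsed:.0f} chunks/s (each chunk is roughly one token)")
```

Note that chunk counts only approximate token counts, and a single request measures per-user latency rather than aggregate system throughput.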
Cerebras Inference is available in three tiers: Free, Developer, and Enterprise. The arrival of specialized inference providers such as Cerebras and Groq could change the dynamics of the AI industry.