www.design-reuse-embedded.com
Rebellions Builds Chiplet Roadmap, Merges with Sapeon
www.eetimes.com, Oct. 17, 2024
South Korean AI chip startup Rebellions has partnered with ADTechnology, Samsung Foundry and Arm to build chiplet-based data center AI accelerator products. Rebellions will integrate its forthcoming chiplet-based AI accelerator, Rebel, with ADTechnology's Arm-based CPU chiplet.
The ADTechnology CPU chiplet is based on Arm Neoverse Compute Subsystems (CSS) V3. ADTechnology will design and implement the CPU chiplet on Samsung's 2-nm process.
Rebellions is already part of the Arm ecosystem, Rebellions CTO Jinwook Oh told EE Times. "We already have Arm IP [in existing chips]," he said. "Arm has helped us a lot with the Atom project."
The ease of emulating Arm CPU IP for validation and software stack development was another reason to choose Arm, Oh said.
Third generation
Rebellions, founded in 2020, has two generations of its chips already on the market.
The company's first-generation chip, Ion, is aimed at low-latency inference for the financial trading segment. Implemented in TSMC 7 nm, it has a single instance of Rebellions' CGRA-like accelerator, offering 4 TFLOPS of FP16 compute.
Rebellions' second-generation chip, Atom, was designed as a power-efficient inference engine for the data center. Atom is on the market today, offered as cards, servers or rack-scale systems.
Developed before the explosion in LLM popularity, Atom uses GDDR6 external memory but can nevertheless be used to serve power-efficient LLM inference, Oh said.
"Atom is actively being used for computer vision and LLMs," he said, noting that a 7B LLM can be spread across eight cards to provide 40 tokens per second per user (1,000 to 2,000 tokens per second total throughput). "We are working on another card with four Atom chips on one card so we can build denser, more compute-capable servers for customers."
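The figures Oh quotes imply a range of concurrent users, which a back-of-envelope calculation makes explicit. The sketch below is only arithmetic on the numbers in the quote. The function name is hypothetical, and it assumes every user is served at the same 40 tokens per second, which the article does not state.

```python
def implied_concurrent_users(total_tokens_per_s: float,
                             per_user_tokens_per_s: float) -> float:
    """Divide aggregate throughput by the per-user rate.

    Assumption (not from the article): all users receive the same rate.
    """
    if per_user_tokens_per_s <= 0:
        raise ValueError("per-user rate must be positive")
    return total_tokens_per_s / per_user_tokens_per_s


PER_USER_TOKENS_PER_S = 40            # per-user rate, from the quote
TOTAL_TOKENS_PER_S = (1_000, 2_000)   # total throughput range, from the quote

low, high = (implied_concurrent_users(total, PER_USER_TOKENS_PER_S)
             for total in TOTAL_TOKENS_PER_S)
print(f"Implied concurrent users: {low:.0f} to {high:.0f}")
```

Under that assumption, the quoted range corresponds to roughly 25 to 50 simultaneous users on the eight-card setup.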
The eight-Atom configuration can support Llama-70B. Oh said Atom customers, mainly Korean telcos, cloud service providers (CSPs), enterprises and the Korean public sector, are using it for custom retrieval-augmented generation (RAG), agentic AI and chatbots.
Each Atom chip has eight neural engines offering 32-TFLOPS FP16 compute with 64-MB SRAM. The neural engines are arranged in two clusters, each with a local network-on-chip and L1 cache. Atom can be configured to consume 75 to 130 W (equivalent to 0.5 INT8 TOPS/W).
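As a rough sanity check, the FP16 numbers above can be turned into per-watt and per-engine figures. This is a sketch under stated assumptions: that 32 TFLOPS is the whole-chip total, that it is available at both ends of the power range, and that it divides evenly across the eight engines. The article confirms none of these. It also gives no INT8 throughput, so the quoted INT8 TOPS/W figure cannot be rederived here.

```python
FP16_TFLOPS = 32           # chip-level FP16 compute, from the article
POWER_RANGE_W = (75, 130)  # configurable power envelope, from the article
NEURAL_ENGINES = 8         # engines per chip, from the article


def tflops_per_watt(tflops: float, watts: float) -> float:
    """FP16 efficiency, assuming full throughput at the given power."""
    return tflops / watts


# Assumption: peak FP16 throughput is reachable at both power settings.
at_min_power = tflops_per_watt(FP16_TFLOPS, POWER_RANGE_W[0])
at_max_power = tflops_per_watt(FP16_TFLOPS, POWER_RANGE_W[1])

# Assumption: compute is split evenly across the neural engines.
per_engine_tflops = FP16_TFLOPS / NEURAL_ENGINES

print(f"FP16 TFLOPS/W at 75 W:  {at_min_power:.2f}")
print(f"FP16 TFLOPS/W at 130 W: {at_max_power:.2f}")
print(f"FP16 TFLOPS per engine: {per_engine_tflops:.0f}")
```

The even split gives 4 TFLOPS per engine, the same figure quoted for Ion's single accelerator instance, although the article does not say the two chips use identical engines.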