Sharing Inference Engines at Scale - Elastic Inference
I don't know what physical technology underpins Elastic Inference, but it's one of the cooler services I've reviewed.
One of the essential attributes of the public cloud is optimizing the sharing of specialized resources. In the data center, we have resources such as GPUs that are difficult to share across multiple systems. AWS Elastic Inference is a production use case of Amazon sharing inference engines at scale.
Once you select Elastic Inference as an option for an instance, AWS monitors that instance for one of the supported ML frameworks. When a framework is detected, AWS attaches an inference engine to the instance across the customer's VPC network.
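From the API side, the attachment is requested at launch time. A minimal sketch with boto3 might look like the following; the AMI ID, instance type, and `eia2.medium` accelerator size are illustrative assumptions, not values from this article:

```python
# Sketch: launching an EC2 instance with an Elastic Inference accelerator.
# The AMI ID, instance type, and accelerator size below are assumptions
# for illustration only.
launch_params = {
    "ImageId": "ami-0123456789abcdef0",  # hypothetical Deep Learning AMI
    "InstanceType": "c5.large",          # modest CPU host; inference offloads to the accelerator
    "MinCount": 1,
    "MaxCount": 1,
    "ElasticInferenceAccelerators": [
        {"Type": "eia2.medium", "Count": 1}  # accelerator attached over the VPC network
    ],
}

def request_instance(ec2_client, params):
    """Launch an instance with an Elastic Inference accelerator attached."""
    return ec2_client.run_instances(**params)

print(launch_params["ElasticInferenceAccelerators"][0]["Type"])
```

The interesting part is that the accelerator is not hardware in the host: it shows up as a network-attached resource, which is why the VPC detail above matters.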
This sounds a lot like the promise of CXL.