inferentia

Here are 9 public repositories matching this topic...

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda inference pytorch transformer llama gpt rocm model-serving tpu hpu mlops xpu llm inferentia llmops llm-serving trainium

Updated Nov 14, 2024
Python

PygmalionAI / aphrodite-engine

Large-scale LLM inference engine

machine-learning cuda intel api-rest lora rocm inference-engine tpu inferentia speculative-decoding

Updated Nov 13, 2024
Python

aws-samples / foundation-model-benchmarking-tool

Foundation model benchmarking tool. Run any model on any AWS platform and benchmark for performance across instance type and serving stack options.

benchmarking benchmark p5 bedrock evaluation-metrics sagemaker g6 p4d g5 foundation-models inferentia generative-ai llama2 trainium llama3 g6e

Updated Nov 14, 2024
Jupyter Notebook

aws-solutions-library-samples / guidance-for-machine-learning-inference-on-aws

This Guidance demonstrates how to deploy a machine learning inference architecture on Amazon Elastic Kubernetes Service (Amazon EKS). It addresses the basic implementation requirements as well as ways you can pack thousands of unique PyTorch deep learning (DL) models into a scalable architecture and evaluate performance

ml eks-cluster mlops-workflow do-framework inferentia graviton3

Updated Nov 6, 2024
Shell

aws-samples / aws-inferentia-huggingface-workshop

CMP314 Optimizing NLP models with Amazon EC2 Inf1 instances in Amazon SageMaker

nlp sagemaker inferentia

Updated Dec 20, 2023
Jupyter Notebook

aws-samples / awsome-fmops

Collection of best practices, reference architectures, examples, and utilities for foundation model development and deployment on AWS.

kubernetes gpu terraform pytorch eks kserve karpenter inferentia generative-ai llm-training llm-inference

Updated Oct 31, 2024
HCL

daekeun-ml / aws-inferentia

This repository provides an easy, hands-on way to get started with AWS Inferentia. A demonstration of this hands-on lab can be seen in the AWS Innovate 2023 - AIML Edition session.

inferentia