This repository benchmarks Meta's Llama3 language model running on Groq's specialized Language Processing Units (LPUs) against OpenAI's GPT-3.5 Turbo model running on GPUs. By comparing response times for the same prompt, it highlights Groq's speed advantage for low-latency large language model inference.
Before running the notebook, ensure you have the following:
- Access to Google Colab or a Jupyter Notebook environment
- Groq and OpenAI API keys
- Open the notebook from this repository.
- Follow the instructions in the notebook to install the required Python packages.
- When prompted, enter your Groq and OpenAI API keys (see the setup sketch after this list).
- Run the notebook cells sequentially.
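
For reference, here is a minimal sketch of the kind of setup cell the notebook runs. It assumes the `groq` and `openai` Python packages are the client libraries in use; the exact package list may differ from what the notebook installs.

```python
# In a notebook cell: install the client libraries
# (package names are assumptions based on the two APIs being benchmarked).
%pip install groq openai

import getpass

# Prompt for the API keys without echoing them to the screen;
# the notebook itself may collect them differently.
groq_api_key = getpass.getpass("Groq API key: ")
openai_api_key = getpass.getpass("OpenAI API key: ")
```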
- The notebook initializes an API client for Groq's Llama3 model and one for the OpenAI model.
- It then sends the same prompt to both models and measures the response times (see the sketch after this list).
- Optionally, change the prompts or supply your own text to see how the response times vary with different inputs.
- The response times for Groq's Llama3 model and the OpenAI model are displayed, letting you directly compare inference speed.
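
For illustration, a minimal timing sketch along the lines of what the notebook does, reusing the API keys from the setup sketch above. The model IDs and prompt below are placeholder assumptions, not necessarily the ones the notebook uses; both SDKs expose the same `chat.completions.create` interface, which keeps the comparison symmetric.

```python
import time
from groq import Groq
from openai import OpenAI

# Assumed model IDs; substitute whichever the notebook actually specifies.
GROQ_MODEL = "llama3-8b-8192"
OPENAI_MODEL = "gpt-3.5-turbo"

groq_client = Groq(api_key=groq_api_key)
openai_client = OpenAI(api_key=openai_api_key)

# Placeholder prompt; any text works and can be swapped to explore latency.
prompt = "Explain the difference between a CPU and an LPU in two sentences."

def time_completion(client, model, prompt):
    """Send one chat completion and return (response_text, elapsed_seconds)."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.perf_counter() - start
    return response.choices[0].message.content, elapsed

_, groq_seconds = time_completion(groq_client, GROQ_MODEL, prompt)
_, openai_seconds = time_completion(openai_client, OPENAI_MODEL, prompt)

print(f"Groq ({GROQ_MODEL}): {groq_seconds:.2f} s")
print(f"OpenAI ({OPENAI_MODEL}): {openai_seconds:.2f} s")
```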
Contributions to this project are welcome. If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.