This repository contains an experiment that tests Gemini 1.5 Flash's ability to answer questions over up to 1 million tokens of context. The experiment uses a Kaggle dataset of App Store data to systematically evaluate the model's performance at increasing context lengths and check whether accuracy degrades as the context grows.
Unlike traditional Needle-in-a-Haystack (NIAH) tests, this experiment uses a real-world dataset in which all of the information is potentially relevant. It tests the model's ability to synthesize and reason across a large, cohesive body of information, rather than simply retrieving planted pieces of data. When the needles are irrelevant to the haystack, the challenge becomes more of an anomaly detection problem than a comprehension one. This experiment is meant to give a sense of how confidently we can trust answers from an LLM in long-context use cases, such as asking questions about your company's data.
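As a reference point, here is a minimal sketch of how contexts of increasing token length can be carved out of a dataset like this one. The names (`build_context`, `max_tokens`) are illustrative rather than the notebook's actual code, and tiktoken is used only as a rough token-count proxy, since Gemini has its own tokenizer.

```python
# Illustrative sketch, not the notebook's exact code: serialize dataset
# rows into text until a target token budget is reached.
import pandas as pd
import tiktoken

# cl100k_base is a proxy tokenizer; Gemini's own tokenization differs.
encoding = tiktoken.get_encoding("cl100k_base")

def build_context(df: pd.DataFrame, max_tokens: int) -> str:
    """Concatenate rows as "column: value" lines up to the token budget."""
    pieces, used = [], 0
    for _, row in df.iterrows():
        text = ", ".join(f"{col}: {row[col]}" for col in df.columns)
        n = len(encoding.encode(text))
        if used + n > max_tokens:
            break
        pieces.append(text)
        used += n
    return "\n".join(pieces)
```

Calling `build_context(df, 1_000_000)` with successively smaller budgets yields the family of contexts the experiment sweeps over.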
The main notebook, Context_Length_AppStoreV2.ipynb, includes:
- Data collection and preparation using the App Store Apple Data Set
- Implementation of evaluation and prediction functions (a sketch follows this list)
- Experiment setup to test Gemini 1.5 Flash across increasing context lengths
- Visualization of test results
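As a rough illustration of the prediction and evaluation functions listed above, the sketch below uses langchain-google-genai to query Gemini 1.5 Flash. `predict` and `evaluate` are hypothetical names, and the notebook's real grading may use LangSmith or an LLM-as-judge rather than the naive string match shown here. It assumes `GOOGLE_API_KEY` is set in the environment.

```python
# Hypothetical sketch of the prediction/evaluation step; the notebook's
# actual functions may differ.
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)

def predict(context: str, question: str) -> str:
    """Ask the model a question grounded in the provided context."""
    prompt = (
        "Answer the question using only the App Store data below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt).content

def evaluate(prediction: str, reference: str) -> bool:
    # Naive containment check; an LLM-as-judge is a common alternative.
    return reference.strip().lower() in prediction.strip().lower()
```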
To run this experiment, you'll need:
- A LangSmith account and API key
- A Google AI Studio API key
- An OpenAI API key

Setup:
- Clone this repository
- Create a copy of the `.env.sample` file and save it as `.env`
- Add your API keys to the `.env` file
- Install the required libraries:

```
pip install -qU pandas tiktoken langchain langchain-openai langchain-google-genai matplotlib langsmith python-dotenv seaborn
```
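Once the keys are in place, a quick sanity check like the one below can confirm they load correctly. The environment variable names shown are common defaults and may differ from those in your `.env.sample`.

```python
# Illustrative check that the API keys load from .env before running
# the notebook. Variable names are assumptions, not the notebook's spec.
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory
for key in ("LANGCHAIN_API_KEY", "GOOGLE_API_KEY", "OPENAI_API_KEY"):
    assert os.getenv(key), f"Missing {key} in .env"
```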
Open the Context_Length_AppStoreV2.ipynb notebook and run the cells sequentially. The notebook will guide you through:
- Data preparation
- Setting up the evaluation dataset
- Implementing the evaluation and prediction functions
- Running the experiment across different context lengths
- Visualizing the results (a plotting sketch follows this list)
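The final visualization can be produced along these lines. Here `results` is a hypothetical stand-in for the notebook's output DataFrame, and the accuracy values are placeholders consistent with the result reported below.

```python
# Illustrative plot of accuracy vs. context length; column names and
# values are assumptions, not the notebook's actual output schema.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

results = pd.DataFrame({
    "context_tokens": [1_000, 10_000, 100_000, 500_000, 1_000_000],
    "accuracy": [1.0, 1.0, 1.0, 1.0, 1.0],  # placeholder values
})
sns.lineplot(data=results, x="context_tokens", y="accuracy", marker="o")
plt.xscale("log")  # context lengths span several orders of magnitude
plt.ylim(0, 1.05)
plt.xlabel("Context length (tokens)")
plt.ylabel("Accuracy")
plt.title("Gemini 1.5 Flash accuracy vs. context length")
plt.show()
```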
Gemini 1.5 Flash achieved 100% accuracy across all context lengths tested!