template.html

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8" />
        <meta name="viewport" content="width=device-width, initial-scale=1.0" />
        <title>GPT-4o Checkup</title>

        <link rel="icon" href="./assets/logomark.png" />
        <link rel="stylesheet" type="text/css" href="./styles.css" media="screen" />
        <link href="https://fonts.googleapis.com/css2?family=Space+Mono" rel="stylesheet" type="text/css" />

        <meta name="description" content="A collection of experiments measuring the performance of GPT-4o on vision tasks over time." />
        <meta property="og:title" content="GPT-4o Checkup" />
        <meta property="og:description" content="A collection of experiments measuring the performance of GPT-4o on vision tasks over time." />
        <meta property="og:image" content="https://gptcheckup.com/banner.png" />

        <meta name="twitter:card" content="summary_large_image" />

        <!-- Google tag (gtag.js) -->
        <script async src="https://www.googletagmanager.com/gtag/js?id=G-S0F5Y25KSC"></script>
        <!-- Font Awesome Icons -->
        <script src="https://kit.fontawesome.com/1728f0d465.js" crossorigin="anonymous"></script>

        <script>
            window.dataLayer = window.dataLayer || [];
            function gtag() {
                dataLayer.push(arguments);
            }
            gtag("js", new Date());
            gtag("config", "G-S0F5Y25KSC");
        </script>
    </head>
    <body>
        <div class="graph_paper">
            <header>
                <h1>How's GPT-4o Doing?</h1>
                <div class="header_text">
                    <p>This website measures how <a href="https://platform.openai.com/docs/models/gpt-4o">GPT-4o</a> performs across a range of experiments.</p>
                    <p>We test tasks we know GPT-4o performs well at (e.g. classification) to catch regressions, as well as tasks GPT-4o struggles with (e.g. odometer OCR) to track improvements and other changes.</p>
                    <p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
                </div>
                <div class="header_subtitle">
                    <p>Tests are run every day at 1am PT. Last updated {{ date }}.</p>
                    <p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
                </div>
                <div class="header_cta">
                    <div class="button_row">
                        <a href="#methodology" class="button"><i class="far fa-tools"></i>Learn about our methodology</a>
                        <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute" class="button"><i class="far fa-plus"></i>Contribute a test</a>
                    </div>
                    <a class="github-button" href="https://github.com/roboflow/gpt-checkup" data-icon="octicon-star" aria-label="Star roboflow/gpt-checkup on GitHub">Star</a>
                </div>
            </header>
            <main>
                <div class="main_content">
                    <section class="feature_card_wide">
                        <img src="./assets/lenny.svg" id="lenny" alt="" />
                        <div class="feature_header" style="min-height: auto">
                            <div class="feature_header_text" style="gap: var(--spacing-sizing-4)">
                                <h2>Response Time</h2>
                                <p style="font-size: 16px; color: var(--gray-700)">Today, requests made by our tests took an average of <b>{{info['average_time']}} seconds</b> each to return a result.</p>
                                <p class="subtitle">This number only accounts for requests made by this application.</p>
                            </div>
                            <div class="chart">
                                <div class="chart_box chart_box_green">
                                    <p>{{info["average_time"]}} s</p>
                                </div>
                            </div>
                        </div>
                    </section>
                    <section class="tests_failing">
                        <div class="test_group_header">
                            <h1><i class="fad fa-exclamation-circle fa-spin" style="--fa-primary-color: #ef4444; --fa-secondary-color: #ef4444; --fa-secondary-opacity: 0.3"></i> Today's Failing Tests</h1>
                        </div>
                        <section class="feature_cards" id="failing_cards">
                            {% for test_id, test_data in results.items() %} {% if current_results[test_id].success == False %}
                            <div class="feature_card">
                                <div class="feature_header">
                                    <div class="feature_header_text">
                                        <h2>{{ test_data.name }}</h2>
                                        <p>{{ test_data.question }}</p>
                                    </div>
                                    <div class="chart">
                                        <div class="chart_box chart_box_red">
                                            <p>Fail</p>
                                        </div>
                                    </div>
                                </div>
                                <div class="result_summary">
                                    <div class="summary_row">
                                        <b class="summary_title">Last {{test_data["seven_day"]["score"]|length}}-Day Performance</b>
                                        <div class="summary_squares">
                                            {% for item in test_data["seven_day"]["success"] %}
                                            <div class="summary_square {% if item %}summary_square_green{% else %}summary_square_red{% endif %}"></div>
                                            {% endfor %}
                                        </div>
                                    </div>
                                    <p class="result_text">Of the last {{test_data["seven_day"]["score"]|length}} tests, conducted daily, this test has passed <b>{{ test_data["seven_day"]["success_percent"] }}%</b> of the time.</p>
                                    <p class="request_price"><i class="far fa-coins"></i>Today's request cost ${{current_results[test_id].price|round(3)}}</p>
                                </div>
                                <div class="explainer_dropdown">
                                    <button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
                                    <div class="explainer">
                                        <h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
                                        <pre class="test_method">{{ test_data.method }}</pre>
                                        <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
                                        <pre class="prompt">{{ test_data.prompt }}</pre>
                                        <h3><span class="explainer_icon far fa-image"></span>Image</h3>
                                        <img class="test_image" src="{{ test_data.image }}" alt="Input image sent to GPT-4o" />
                                        <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
                                        <pre>{{current_results[test_id].result}}</pre>
                                        <p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="{{ test_data.author_url }}" target="_blank" rel="noopener noreferrer">{{ test_data.author_name }}</a></p>
                                    </div>
                                </div>
                            </div>
                            {% endif %} {% endfor %}
                        </section>
                    </section>
                    <section class="tests_passing">
                        <div class="test_group_header">
                            <h1><i class="fad fa-check-circle" style="--fa-primary-color: #10b981; --fa-secondary-color: #10b981; --fa-secondary-opacity: 0.3"></i> Today's Passing Tests</h1>
                            <button type="button" id="show_passing_btn">Hide</button>
                        </div>
                        <section class="feature_cards" id="passing_cards">
                            {% for test_id, test_data in results.items() %} {% if current_results[test_id].success == True %}
                            <div class="feature_card">
                                <div class="feature_header">
                                    <div class="feature_header_text">
                                        <h2>{{ test_data.name }}</h2>
                                        <p>{{ test_data.question }}</p>
                                    </div>
                                    <div class="chart">
                                        <div class="chart_box chart_box_green">
                                            <p>Pass</p>
                                        </div>
                                    </div>
                                </div>
                                <div class="result_summary">
                                    <div class="summary_row">
                                        <b class="summary_title">Last {{test_data["seven_day"]["score"]|length}}-Day Performance</b>
                                        <div class="summary_squares">
                                            {% for item in test_data["seven_day"]["success"] %}
                                            <div class="summary_square {% if item %}summary_square_green{% else %}summary_square_red{% endif %}"></div>
                                            {% endfor %}
                                        </div>
                                    </div>
                                    <p class="result_text">Of the last {{test_data["seven_day"]["score"]|length}} tests, conducted daily, this test has passed <b>{{ test_data["seven_day"]["success_percent"] }}%</b> of the time.</p>
                                    <p class="request_price"><i class="far fa-coins"></i>Today's request cost ${{current_results[test_id].price|round(3)}}</p>
                                </div>
                                <div class="explainer_dropdown">
                                    <button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
                                    <div class="explainer">
                                        <h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
                                        <pre class="test_method">{{ test_data.method }}</pre>
                                        <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
                                        <pre class="prompt">{{ test_data.prompt }}</pre>
                                        <h3><span class="explainer_icon far fa-image"></span>Image</h3>
                                        <img class="test_image" src="{{ test_data.image }}" alt="Input image sent to GPT-4o" />
                                        <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
                                        <pre>{{current_results[test_id].result}}</pre>
                                        <p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="{{ test_data.author_url }}" target="_blank" rel="noopener noreferrer">{{ test_data.author_name }}</a></p>
                                    </div>
                                </div>
                            </div>
                            {% endif %} {% endfor %}
                        </section>
                    </section>
                    <section class="feature_card_wide" id="methodology">
                        <div class="feature_header_text">
                            <h2>Methodology</h2>
                            <p>How we built this project</p>
                        </div>
                        <div class="methodology_text">
                            <p>Every day, we run a set of tests to track how well GPT-4o, a multimodal model with vision capabilities, performs on vision tasks over time. These tests are designed to monitor core features of GPT-4o.</p>
                            <p>Each test sends the same prompt and image to GPT-4o and compares the response to a human-written expected answer. While building this website, we experimented with prompts and chose the one that gave the most accurate results.</p>
                            <p>Other prompts may also solve a given task, but we cannot test every possible prompt. Treat this site as a reference; different prompts may achieve better or worse results.</p>
                            <p>Tests are run at 1am PT every day. This site is updated when all tests are complete.</p>
                            <h3>Model Changes</h3>
                            <p>From December 2023 to July 8th, 2024, this project tracked GPT-4 with Vision (GPT-4V). On July 8th, 2024, we switched to GPT-4o so that the site follows the latest multimodal model from OpenAI.</p>
                        </div>
                    </section>
                    <section class="feature_card_wide">
                        <div class="feature_header_text">
                            <h2>Related Links</h2>
                            <p>Want to see more interesting projects using GPT-4o and its predecessor, GPT-4 with Vision?</p>
                        </div>
                        <div class="link_cards">
                            <a href="https://blog.roboflow.com/gpt-4o-vision-use-cases/" target="_blank" class="link_card">
                                <img src="https://blog.roboflow.com/content/images/size/w2000/2024/05/Blog-Image-Template--75-.jpg" alt="GPT-4o: The Comprehensive Guide and Explanation" />
                                <h3>GPT-4o: The Comprehensive Guide and Explanation</h3>
                                <p class="subtitle">Learn what GPT-4o is, how it differs from previous models, how it performs, and what it can be used for.</p>
                                <div class="tag"><i class="fad fa-star fa-xs"></i>Start Here</div>
                            </a>
                            <a href="https://blog.roboflow.com/gpt-4-vision/" target="_blank" class="link_card">
                                <img src="./images/GPT-4_with_Vision.jpeg" alt="GPT-4 with Vision: Complete Guide and Evaluation" />
                                <h3>GPT-4 with Vision: Complete Guide and Evaluation</h3>
                                <p class="subtitle">In this guide, we share our first impressions of the GPT-4 image input feature and vision API. We run through a series of experiments to test the functionality of GPT-4 with Vision, showing where the model performs well and where it struggles.</p>
                            </a>
                            <a href="https://blog.roboflow.com/GPT-4o-object-detection/" target="_blank" class="link_card">
                                <img src="./images/GPT-4V Object Detection.jpg" alt="Experiments with GPT-4o for Object Detection" />
                                <h3>Experiments with GPT-4o for Object Detection</h3>
                                <p class="subtitle">In this guide, we show our results experimenting with GPT-4o for object detection. We also talk about why fine-tuned models are more appropriate for object detection, providing more context into the question “how will GPT impact object detection?”</p>
                            </a>
                        </div>
                    </section>
                </div>
            </main>
            <footer>
                <p>This project is not affiliated with OpenAI.</p>
            </footer>
            <script async defer src="https://buttons.github.io/buttons.js"></script>
            <script type="text/javascript" src="index.js"></script>
        </div>
    </body>
</html>