An application leveraging Generative AI (Large Language Models) to assist users in finding relevant information within large documents.
The AIE_demo_playground project demonstrates how Generative AI (Large Language Models) can help users navigate and extract information from comprehensive documents. Through multiple approaches, the application supports both pre-loaded reference documents and user-uploaded materials, making document exploration more intuitive and efficient.
This project explores how Generative AI technology can assist in navigating and extracting information from large documents. Through Large Language Models (LLMs), we investigate methods for enhancing content discovery within comprehensive documents, showcasing how AI can identify and present relevant information in accessible formats. By grounding the LLM's responses in the source document's content, we demonstrate an approach that maintains accuracy while exploring new ways to interact with extensive documentation.
Four demos were created to showcase different capabilities:

1. RAG Mistral 7B: A chatbot that answers questions about the sample dataset by grounding an open-source LLM, Mistral 7B, with Retrieval-Augmented Generation (RAG); a minimal sketch of this approach appears after this list.
2. Custom Document RAG: Identical to the first demo, except that users can upload their own documents for RAG instead of relying only on the sample dataset.
3. Accessibility Features: A variant that leverages AI to enable accessibility through text-to-speech and speech-to-text functionality, allowing more people to interact with the model.
4. LLM without Refinement: An LLM operating without RAG on documents, providing a baseline for comparison.
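As a rough illustration of the first demo's approach, the sketch below grounds a Mistral 7B answer in chunks retrieved from a document. It assumes `sentence-transformers` and `transformers` are installed and a Hugging Face token is configured; the file path, chunking strategy, and prompt wording are hypothetical and not the project's actual implementation.

```python
# Minimal RAG sketch: ground Mistral 7B answers in chunks retrieved from a document.
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline

# 1. Split the source document into fixed-size chunks (naive, for illustration).
with open("data/sample_document.txt", encoding="utf-8") as f:  # hypothetical path
    text = f.read()
chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]

# 2. Embed every chunk once; embed the question at query time.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question (cosine similarity)."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vectors @ q
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# 3. Ask Mistral 7B to answer using only the retrieved context.
generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generator(prompt, max_new_tokens=256, return_full_text=False)[0]["generated_text"]
```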
The UI was built with Gradio and Plotly Dash, and evaluation metrics on the quality and speed of the model's responses were collected with TruLens, an LLM-assisted evaluation tool. All of this was done while getting familiar with developing applications in the new KMP AWS environment, so that the team would be better prepared for future projects. Collaborating with KMP also created opportunities to improve the instances being used on AWS.
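For the UI layer, a chat front end similar to the demos' can be assembled with very little Gradio code. The snippet below is a self-contained sketch using `gr.ChatInterface`; the real application wires the callback to the RAG pipeline rather than the placeholder reply shown here.

```python
# Minimal Gradio chat UI sketch (not the project's actual entry point).
import gradio as gr

def respond(message, history):
    # `history` is the list of prior turns maintained by Gradio.
    # In the real app this would call the RAG pipeline; echoed here to stay runnable.
    return f"(placeholder) You asked: {message}"

demo = gr.ChatInterface(fn=respond, title="AIE Demo Playground (sketch)")

if __name__ == "__main__":
    demo.launch()
```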
The Centers for Medicare and Medicaid Services (CMS) has a mission to provide quality health care coverage and promote effective care for Medicare beneficiaries.
AI Explorers aims to: Up-skill and Inform CMS Personnel on AI Capabilities, Create Opportunities for CMS Components to Explore AI, Create Shared Repositories and Best Practices, and Deliver Proof-of-Concept Implementations to Validate Business Use Cases.
A full list of contributors can be found at https://github.com/CMS-Enterprise/AIEresearch/graphs/contributors.
See the README file in the `web_app` directory for instructions on how to run the models.
The project is organized into folders that encapsulate different functionalities, as illustrated below.

- `chat_logs` captures JSON files created when users interact with the chatbot.
- `data` includes the sample dataset used in the RAG implementation and can hold other files as well.
- `models` includes files to run the models saved in `saved_models`.
- `web_app` includes the Plotly Dash website and its associated pages and components.
├── chat_logs
├── data
├── models
├── saved_models
└── web_app
    ├── assets
    └── pages
        ├── chatbot
        ├── doc_upload
        ├── qa
        └── tts
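As an example of the `chat_logs` folder's role, a chat turn could be persisted along the following lines. This is a hedged sketch of one plausible schema; the project's actual log format may differ.

```python
# Sketch: append one chatbot interaction to a date-stamped JSON file under chat_logs/.
import json
import time
from pathlib import Path

def log_chat_turn(question: str, response: str, log_dir: str = "chat_logs") -> Path:
    """Append one question/response pair to today's JSON log file."""
    Path(log_dir).mkdir(exist_ok=True)
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "question": question,
        "response": response,
    }
    path = Path(log_dir) / f"chat_{time.strftime('%Y%m%d')}.json"
    history = json.loads(path.read_text()) if path.exists() else []
    history.append(record)
    path.write_text(json.dumps(history, indent=2))
    return path
```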
Please see `aie_demo_playground/requirements.txt` for library requirements and `aie_demo_playground/web_app/README` for detailed instructions on running the application. The application was run on the following operating system:
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"
Each application has its own linting and testing guidelines. Lint and code tests are run on each commit, so linters and tests should be run locally before committing. We have followed the Google Python Style Guide (https://google.github.io/styleguide/pyguide.html) in spirit but have not used a linter to enforce adherence.
Thank you for considering contributing to an Open Source project of the US Government! For more information about our contribution guidelines, see CONTRIBUTING.md.
The contents of this repository are managed by OIT\AI Explorers. Those responsible for the code and documentation in this repository can be found in CODEOWNERS.md.
The AIE_demo_playground team is taking a community-first and open source approach to the product development of this tool. We believe government software should be made in the open and be built and licensed such that anyone can download the code, run it themselves without paying money to third parties or using proprietary software, and use it as they will.
We know that we can learn from a wide variety of communities, including those who will use or will be impacted by the tool, who are experts in technology, or who have experience with similar technologies deployed in other spaces. We are dedicated to creating forums for continuous conversation and feedback to help shape the design and development of the tool.
We also recognize capacity building as a key part of involving a diverse open source community. We are doing our best to use accessible language, provide technical and process documents, and offer support to community members with a wide variety of backgrounds and skillsets.
Principles and guidelines for participating in our open source community can be found in COMMUNITY_GUIDELINES.md. Please read them before joining or starting a conversation in this repo or one of the channels listed below. All community members and participants are expected to adhere to the community guidelines and code of conduct when participating in community spaces including: code repositories, communication channels and venues, and events.
We adhere to the CMS Open Source Policy. If you have any questions, just shoot us an email.
Submit a vulnerability: Vulnerability reports can be submitted through Bugcrowd. Reports may be submitted anonymously. If you share contact information, we will acknowledge receipt of your report within 3 business days.
For more information about our Security, Vulnerability, and Responsible Disclosure Policies, see SECURITY.md.
A Software Bill of Materials (SBOM) is a formal record containing the details and supply chain relationships of various components used in building software.
In the spirit of Executive Order 14028 - Improving the Nation’s Cybersecurity, an SBOM for this repository is provided here: https://github.com/CMS-Enterprise/AIEresearch/network/dependencies.
For more information and resources about SBOMs, visit: https://www.cisa.gov/sbom.
This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication as indicated in LICENSE.
All contributions to this project will be released under the CC0 dedication. By submitting a pull request or issue, you are agreeing to comply with this waiver of copyright interest.
- This project used Python version 3.11.7
- This project was run on AWS using an Amazon Linux instance. OpenSSL 1.1 had to be installed.
- An OpenAI API key is needed to run TruLens as well as the text-to-speech and speech-to-text models. There are sections in the scripts for users to input their own key.
- A Hugging Face token is needed to use the Mistral model.
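One way to supply these credentials without editing the scripts is through environment variables, as in the sketch below. The variable names are conventional assumptions rather than names required by the project.

```python
# Sketch: read credentials from environment variables (hypothetical variable names).
import os

openai_api_key = os.environ.get("OPENAI_API_KEY")   # used for TruLens and the speech models
hf_token = os.environ.get("HUGGINGFACE_TOKEN")      # used to download the Mistral model

if not openai_api_key or not hf_token:
    raise RuntimeError("Set OPENAI_API_KEY and HUGGINGFACE_TOKEN before running the demos.")
```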