This project automates human-like interactions with web pages using Node.js and Google Chrome. It simulates user behavior to navigate pages, enter data, and handle clicks, and it interprets each page's accessibility tree to build a representation that language models can consume.
Install Dependencies:
npm install
Run the Script:
node index.js
Requirements:
- Google Chrome installed on your system.
- Text Input Simulation: Inputs text character by character, mimicking human typing patterns for natural interaction.
- Standard Click Events: Performs clicks without artificial delays, since a human click is effectively instantaneous.
- Preventing New Tabs: Intercepts attempts to open new tabs or windows, forcing all URLs to load in the current tab for a linear navigation flow.
- DevTools Accessibility Tree: Uses the Chrome DevTools Protocol to read the page's accessibility tree.
- LLM-friendly Representation: Parses the tree to extract a structured, semantic representation of the page suitable for language models.
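The character-by-character typing described in the list above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the 40 to 160 ms delay range, and the `sendChar` callback are all assumptions made for the example.

```javascript
// Returns one delay (in ms) per character, drawn uniformly from [minMs, maxMs].
// `rng` is injectable so the timing can be made deterministic in tests.
// The default range is an assumption, not a value taken from the project.
function typingDelays(text, { minMs = 40, maxMs = 160, rng = Math.random } = {}) {
  return Array.from(text, () => Math.round(minMs + rng() * (maxMs - minMs)));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `sendChar` is a hypothetical callback that delivers one character to the page.
// Each character is sent, then the loop pauses before sending the next one.
async function typeLikeHuman(sendChar, text, options) {
  const chars = Array.from(text);
  const delays = typingDelays(text, options);
  for (let i = 0; i < chars.length; i += 1) {
    await sendChar(chars[i]);
    await sleep(delays[i]);
  }
}
```

In practice `sendChar` would wrap whatever keyboard call the automation layer exposes. If Chrome were driven through Puppeteer (which this README does not state), `page.keyboard.sendCharacter` would be one candidate. Clicks need no equivalent helper because they are issued without delay.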
An excerpt demonstrating the accessibility tree structure:
RootWebArea[1](focusable, url=https://roame.travel/) Roame.Travel | Limited time award travel deals
  generic
    main
      link[60](focusable, url=https://roame.travel/)
      link[106](focusable, url=https://roame.travel/skyview) SkyView Pro
        StaticText SkyView
        StaticText Pro
      link[3](focusable, url=https://roame.travel/discover) Discover
        StaticText Discover
...
- Roles and Properties: Each element has a role such as link or StaticText, plus attributes such as focusable and url.
- Hierarchy: Indentation reflects parent-child relationships between elements.
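The notation above (role, optional bracketed id, optional attribute list, optional name, children indented beneath their parent) can be produced by a small recursive formatter. The sketch below assumes a simplified, hypothetical node shape rather than the raw DevTools payload, and the two-space indent is also an assumption.

```javascript
// Hypothetical simplified node shape (not the raw DevTools data):
// { role, id?, name?, props?: { focusable?: boolean, url?: string }, children?: [] }
function formatNode(node, depth = 0) {
  const indent = "  ".repeat(depth); // two spaces per level (assumed width)
  const id = node.id !== undefined ? `[${node.id}]` : "";
  // Boolean attributes print as a bare key; others print as key=value.
  const props = Object.entries(node.props ?? {})
    .filter(([, value]) => value !== false && value !== undefined)
    .map(([key, value]) => (value === true ? key : `${key}=${value}`));
  const propText = props.length > 0 ? `(${props.join(", ")})` : "";
  const name = node.name ? ` ${node.name}` : "";
  const line = `${indent}${node.role}${id}${propText}${name}`;
  // Children follow their parent, one level deeper.
  const childLines = (node.children ?? []).flatMap((child) => formatNode(child, depth + 1));
  return [line, ...childLines];
}

const tree = {
  role: "link",
  id: 106,
  name: "SkyView Pro",
  props: { focusable: true, url: "https://roame.travel/skyview" },
  children: [
    { role: "StaticText", name: "SkyView" },
    { role: "StaticText", name: "Pro" },
  ],
};

const rendered = formatNode(tree).join("\n");
console.log(rendered);
```

Keeping the formatter separate from tree retrieval means the text format can be adjusted without touching the browser-facing code.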
- Assistant Role: Navigates web pages and performs actions based on the accessibility tree, simulating user interactions to achieve specific tasks.
- Moderator Role: Critiques each action and the overall interaction trajectory, providing feedback to improve the Assistant's performance.
  - Message Analysis: Reviews the Assistant's inputs and actions for correctness and efficiency.
  - Trajectory Evaluation: Assesses the logic and efficiency of navigation paths.
  - Feedback Loop: Communicates improvements to refine future interactions.
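One way to picture how the two roles interact is a loop in which the Assistant proposes an action, the Moderator critiques it against the trajectory so far, and the critique is fed into the next step. The sketch below is illustrative only: `runEpisode`, `observe`, the action and critique shapes, and the step limit are hypothetical. The callbacks are synchronous stubs, whereas real model calls would be asynchronous.

```javascript
// Hypothetical interfaces, none taken from the project:
//   observe(step)                    -> text representation of the current page
//   assistant(observation, feedback) -> { type, ... } action to perform
//   moderator(action, trajectory)    -> { ok, feedback } critique of that action
function runEpisode({ assistant, moderator, observe, maxSteps = 5 }) {
  const trajectory = [];
  let feedback = null; // no critique exists before the first action
  for (let step = 0; step < maxSteps; step += 1) {
    const observation = observe(step);
    const action = assistant(observation, feedback);
    // The moderator sees the new action plus all earlier steps.
    const critique = moderator(action, trajectory);
    trajectory.push({ step, observation, action, critique });
    feedback = critique.feedback; // carried into the next assistant call
    if (action.type === "done") break;
  }
  return trajectory;
}

// Stubbed roles: click once, then stop after receiving feedback.
const trajectory = runEpisode({
  observe: () => "link[3](focusable, url=https://roame.travel/discover) Discover",
  assistant: (observation, feedback) =>
    feedback === null ? { type: "click", target: 3 } : { type: "done" },
  moderator: (action) => ({
    ok: true,
    feedback: action.type === "click" ? "Clicked the expected link." : "Task complete.",
  }),
});

console.log(trajectory.map((entry) => entry.action.type));
```

Recording the critique alongside each action keeps the whole trajectory available for later evaluation, not just the most recent step.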