O.P.E.R.A.T.O.R
O.P.E.R.A.T.O.R. is Dexter's forthcoming browser and device automation agent, slated for launch in Q3 2025.
The Future of Task Automation & Computer Interaction
Table of Contents
Introduction
Core Concepts
Getting Started
User Guide
Core Architecture
Features
Data Flow
Troubleshooting
FAQ
Roadmap
SDK Preview
Why It Matters
1. Introduction
What is O.P.E.R.A.T.O.R.?
O.P.E.R.A.T.O.R. is a pioneering device- and browser-use task automation agent that enables cross-platform tasking through natural language.
Why O.P.E.R.A.T.O.R.?
Task automation in our daily lives is now inevitable. In today's digital landscape, we face several challenges:
Application Overload: Dozens of specialized tools, each with its own daunting navigation
Cognitive Overhead: Constant context switching between applications
Automation Complexity: Traditional automation requires programming knowledge
Data Silos: Information trapped in separate applications and devices
O.P.E.R.A.T.O.R. solves these challenges by leveraging the visual grounding capabilities of Visual Language Models, letting users go beyond the familiar AI chat box. It provides a unified, intelligent interface to your entire digital world and lets you write repeatable scripts that carry out tasks regardless of the platform or device they interact with.
All of this is powered by the groundbreaking advancement of computer vision models like UI-TARS and CUA, the foundations on which our agent was built.
2. Core Concepts
Natural Language Versus GUI Interfacing
O.P.E.R.A.T.O.R. relies on both LLMs (Large Language Models) and VLMs (Visual Language Models) to run tasks. Both model classes accept natural language input and turn it into actionable commands behind the scenes.
LLMs are models like GPT-4o that accept large inputs, including visual data, and support reasoning.
VLMs are visually grounded models that drive automated GUI interactions. A popular example is UI-TARS, developed by ByteDance: a free, open-source model designed for computer control and automation (per its official site).
This technology is new and improving by the day, but there are already enough tools to incorporate into the development process and to experiment with building production-ready products.
Example Commands:
"Schedule a meeting with the design team tomorrow at 2pm"
"Find all documents related to Project Phoenix from last month"
"Create a presentation about our Q2 results using the latest sales data"
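Behind the scenes, a command like these would be parsed into a structured intent before planning begins. A minimal sketch of that step, assuming a hypothetical intent schema; the field names and the keyword heuristic are illustrative, not the actual O.P.E.R.A.T.O.R. internals (a real parser would call an LLM):

```javascript
// Hypothetical shape of a parsed command -- illustrative only,
// not the real O.P.E.R.A.T.O.R. schema.
function parseCommand(text) {
  // A real parser would call an LLM; a trivial keyword heuristic
  // stands in here just to show the intent structure.
  const intent = /schedule|meeting/i.test(text) ? 'calendar.create'
    : /find|search/i.test(text) ? 'documents.search'
    : 'general.task';
  return { intent, originalText: text, createdAt: new Date().toISOString() };
}

console.log(parseCommand('Schedule a meeting with the design team tomorrow at 2pm').intent);
// → calendar.create
```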
Neural Flow Visualization
The Neural Flow interface provides a real-time, interactive visualization of task execution. Users can:
Monitor task progress visually
Understand how different components interact
Debug and optimize workflows
Gain insights into system performance
Multi-Model AI Integration
O.P.E.R.A.T.O.R. leverages multiple AI models, each selected based on the specific requirements of the task at hand:
GPT-4o: General reasoning and creative tasks; advanced comprehension and code generation
Gemini: Multimodal tasks and visual understanding; image and text processing
Claude: Complex analysis and document processing; large context windows, strong reasoning
Qwen: Multilingual support and coding tasks; strong performance in non-English languages
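The selection above implies a routing step that picks a model per task category. A minimal sketch, assuming hypothetical category names and a default fallback (not the actual routing logic):

```javascript
// Illustrative model router: maps a task category to a preferred model.
// Categories and choices mirror the list above but are assumptions.
const MODEL_ROUTES = {
  reasoning: 'gpt-4o',
  multimodal: 'gemini',
  'long-context': 'claude',
  multilingual: 'qwen',
};

function routeModel(category) {
  // Fall back to a general-purpose model for unknown categories.
  return MODEL_ROUTES[category] ?? 'gpt-4o';
}

console.log(routeModel('multimodal')); // → gemini
```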
3. Getting Started
O.P.E.R.A.T.O.R. is launching a demo app to showcase the platform. This is a semi-production-ready AI agent designed to offer a familiar interface while incorporating browser and device use.
Check out the Pitch Deck for a great visual outline.
To use the app, access the link directly from our main website when it becomes available:
To use our hosted app: you just need an email and password to create an account
To run from local hosting: download the package, run npm install, then npm run dev --force
4. User Guide
O.P.E.R.A.T.O.R. comes with an extensive Guide modal in the app. Go check it out.
5. Core Architecture
System Overview
O.P.E.R.A.T.O.R. is built on a microservices architecture designed for scalability and reliability. The system is composed of several interconnected components that work together to process user requests and execute tasks.
Component Diagram
┌─────────────────────────────────────────────────────────────┐
│ User Interface Layer │
│ ┌─────────────┐ ┌──────────────┐ ┌────────────────┐ │
│ │ Command │ │ Neural Flow │ │ Settings & │ │
│ │ Center │ │ Visualization│ │ Configuration │ │
│ └─────────────┘ └──────────────┘ └────────────────┘ │
└───────────────┬───────────────────────┬─────────────────────┘
│ │
v v
┌─────────────────────────────────────────────────────────────┐
│ Application Layer │
│ ┌─────────────┐ ┌─────────────┐ ┌────────────────┐ │
│ │ Task │ │ Context │ │ Model │ │
│ │ Orchestrator│ │ Manager │ │ Router │ │
│ └─────────────┘ └─────────────┘ └────────────────┘ │
└───────────────┬───────────────────────┬─────────────────────┘
│ │
v v
┌─────────────────────────────────────────────────────────────┐
│ Service Layer │
│ ┌─────────────┐ ┌─────────────┐ ┌────────────────┐ │
│ │ AI │ │ Storage │ │ Integration │ │
│ │ Services │ │ Services │ │ Services │ │
│ └─────────────┘ └─────────────┘ └────────────────┘ │
└─────────────────────────────────────────────────────────────┘
6. Features
Core functionality
1. Chat
Basic chat support
2. Task Execution
Handles task orchestration, turning a natural-language command into a Task with a goal and trackers for: the steps planned, the current intent, the current page and resources, the navigable elements and information currently in view, the last step's results, and more.
Beyond orchestration, it keeps track of all data and saves it, including sending the user websocket events on task execution in real time.
Here, the user settings for "Preferred Execution Mode" and "Preferred Model", which you set in the settings modal, are respected.
3. Browser Control
It integrates Puppeteer and Chromium to enable remote browser use driven by Visual Language Models.
It uses visual understanding to operate whatever page or window is open in its browser sessions, issuing browser actions and queries, then passing the results back to the Task Execution layer (the orchestrator).
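One way to picture this loop: the VLM returns a grounded action (click, type, navigate), and a thin adapter translates it into the corresponding Puppeteer call. A sketch of just that translation step; the action shape is an illustrative assumption, and the real integration is more involved:

```javascript
// Translate a VLM-predicted action into a Puppeteer invocation plan.
// The action shape ({ type, x, y, text, url }) is an assumption
// for illustration, not the actual internal format.
function toPuppeteerCall(action) {
  switch (action.type) {
    case 'click':
      return { method: 'page.mouse.click', args: [action.x, action.y] };
    case 'type':
      return { method: 'page.keyboard.type', args: [action.text] };
    case 'navigate':
      return { method: 'page.goto', args: [action.url] };
    default:
      throw new Error(`Unsupported action: ${action.type}`);
  }
}

console.log(toPuppeteerCall({ type: 'click', x: 120, y: 300 }));
// → { method: 'page.mouse.click', args: [ 120, 300 ] }
```

Keeping the translation pure like this makes it easy to test without launching a browser.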
4. Android Control
Connects any Android phone through Android ADB (or the Android Studio SDK) and, combined with JS, allows the agent to send commands to and receive feedback from the connected device.
Connections are possible over a USB cable or the network (a remote or local IP address).
The connection runs from O.P.E.R.A.T.O.R. to the Android device, not the other way around. This feature is currently in beta and supported on localhost only, due to sandboxed-environment issues and security concerns. We are working on a local app to enable public remote access through our web app.
5. PC Control
Connects to your local PC to access natively installed applications and run commands in them. Support is limited to a few programs and the technology is currently highly unstable, so this will ship last.
No connection setup is needed; users simply run it on localhost or in dev mode for beta testing.
6. Reporting
There is a dedicated layer for action logs, query logs, plan logs, step logs, screenshot logs, result logs, and more.
All this data is collected per task in real time until execution is complete, then two final reports are generated from the compiled logs.
Reports are made available to users on the task-completion event, and both can be downloaded to your local machine.
7. YAML Maps
The use cases for cross-platform automation are vast, and each platform has specific needs that the agent normally handles on its own. For more precise and less costly results, users can create simple files that outline the steps the agent should execute, save them, and run them. A map always produces the same outcome, offering high reliability when automating a specific task.
The market for automation scripts will be big: both corporations and individuals can benefit from purchasing ready-made maps to feed their agents and get things done with little need to program or explain much.
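As an illustration, a YAML map for the agent might look like the sketch below; the file layout and field names are assumptions, not the actual map schema:

```yaml
# Hypothetical YAML map: repeatable steps for one specific task.
name: amazon-go-kart-search
platform: browser
steps:
  - navigate: "https://www.amazon.com"
  - type: { selector: "#twotabsearchtextbox", text: "go kart" }
  - click: { selector: "input[type=submit]" }
  - extract: { items: 3, fields: [title, price, rating] }
  - export: { format: xlsx, file: "go-karts.xlsx" }
```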
8. History
All chat and task history between the AI agent and the user is saved, with a limit on chat size.
All completed task reports are also saved and are easily available from the History Modal. From there, users can scroll their entire report history and choose which one to open, download or re-run.
7. Data Flow
User input (text/voice)
Intent recognition and parsing
Task planning and model selection
Execution across integrated services
Result processing and presentation
Feedback collection and learning
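The stages above can be sketched as a small async pipeline, with each stage enriching a shared context object. The stage bodies here are stubs standing in for the real model and service calls; names and payload fields are illustrative assumptions:

```javascript
// Sketch of the data flow as a tiny async pipeline. Each stage
// receives the context produced so far and returns an enriched copy.
const stages = [
  async (ctx) => ({ ...ctx, intent: `intent(${ctx.input})` }),          // intent recognition and parsing
  async (ctx) => ({ ...ctx, plan: ['step-1', 'step-2'] }),              // task planning and model selection
  async (ctx) => ({ ...ctx, results: ctx.plan.map((s) => `${s}:ok`) }), // execution across services
  async (ctx) => ({ ...ctx, report: ctx.results.join(', ') }),          // result processing and presentation
];

async function runPipeline(input) {
  let ctx = { input };
  for (const stage of stages) ctx = await stage(ctx);
  return ctx;
}

runPipeline('book a flight').then((ctx) => console.log(ctx.report));
// → step-1:ok, step-2:ok
```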
8. Troubleshooting
Common Issues
Authentication Problems
Verify your API keys are correctly configured
Check your internet connection
Ensure your account has the necessary permissions
Task Failures
Check the task logs for error messages
Verify all required parameters were provided
Ensure you have the necessary permissions for the requested operation
Performance Issues
Check system resource usage
Verify your internet connection speed
Clear the application cache if needed
9. Frequently Asked Questions
General
Q: Is my data secure? A: Yes, we use industry-standard encryption and security practices to protect your data.
Q: Can I use O.P.E.R.A.T.O.R. offline? A: No.
Q: Is the code base or GitHub repo public? A: Not yet, full code-base will be available early Q3
Q: How do we know it's NOT vaporware? A: Run any task in the app and wait for the reports. Open a report to analyze the run, or watch a first-person view of the automation.
Q: Why can't I see a browser page opening? A: In production, Puppeteer runs in headless mode (headless: 'new') because the agent runs inside a Docker container, which cannot access the display hardware it would have on a local PC. To see a browser pop up and navigate like magic, download the full app and run it locally. This has no effect on the outcome of results; it only affects the user experience.
Q: Can I use O.P.E.R.A.T.O.R. on my phone? A: Yes, you can run the web app from your mobile browser to execute automated browsing tasks, with very limited functionality. The web app interface is meant for PC with full functionality unlocked in localhost currently.
Technical
Q: What language is it written in? A: NodeJs server side and React frontend.
Q: How do I integrate with my existing tools? A: Google integrations and some of the useful service plugins from the KATZ! Telegram bot are coming to the Command Center.
10. Roadmap
Q3 2024
Token Launch, Web App launch.
Package and SDK release
Plugins - External Tools from KATZ!
Android device connection Integration completed
Map Store Release
Q4 2024
PC Control local agent
O.P.E.R.A.T.O.R Android application building
Collaboration features
2025
Android App Google Store listing
Mainstream marketing for everyday use
Expanded language support
Major post-release bug squashing
Native Windows App Building
11. SDK Preview
Core Features
Task Execution API: Programmatic access to automation capabilities.
Workflow Builder: Tools for creating and managing custom workflows.
Integration Framework: APIs for connecting third-party applications.
Monitoring & Analytics: Tools for tracking and analyzing automation performance.
A simple interface avoids rewriting complex engine layers and lets you focus on simple calls like those below.
Sample Code
// Initialize the O.P.E.R.A.T.O.R. SDK
const operator = new OperatorClient({
  apiKey: 'your-api-key',
  models: ['QwenVL', 'gemini'],
  context: {
    userPreferences: {
      language: 'en',
      timezone: 'UTC+2'
    }
  }
});

// Create a new task
const task = await operator.createTask({
  description: "Go Kart shopping",
  task: [
    { command: "go to Amazon and shop for cheap go karts with good reviews. Save 3 products in excel" },
    { mode: "auto-plan" }
  ]
});

// Monitor task progress
operator.on('taskUpdate', (update) => {
  console.log(`Task progress: ${update.progress}%`);
});
12. Why It Matters
Competitive Advantage
Time Efficiency: Reduces manual task time by 70%
Error Reduction: Minimizes human errors through automated execution.
Scalability: Handles complex workflows across multiple applications.
Enterprise Ready: Built with security and compliance in mind.
Market Opportunity
Business Users: Automated workflows for repetitive tasks.
Developers: Streamlined development and deployment processes.
Enterprise: Custom automation solutions for complex operations.
Creative Professionals: Automated asset management and processing. Imagine scheduling a Video editing job.