LeonGaoHaining/opencowork


OpenCowork

An open-source desktop AI work system for browser automation, reusable task runs, templates, MCP-native tooling, and real local execution.

Why OpenCowork

OpenCowork is built for people who want an agent that does more than chat. It can open websites, operate a headed browser, call CLI tools, run reusable skills, persist task history, and now connect to or expose standard MCP servers.

It is designed for fast iteration on real desktop workflows: research, operations, internal tools, demos, browser automation, and repeatable task execution.

Compared with many "chat-first" agent demos, OpenCowork is moving toward a result-first workflow:

  • every serious task should produce a reusable run record,
  • successful work should be reviewable as a result,
  • useful work should become a template,
  • repeated work should be schedulable or triggerable from IM.

Current Product Direction

The current work stream is converging around a result-centric task model:

  • task runs are recorded as reusable TaskRun records,
  • completed work persists into TaskResult,
  • history is shifting toward outcomes, artifacts, and rerun links,
  • templates can be created from successful runs and executed with parameters,
  • scheduler and IM surfaces now reuse the same task/result semantics.
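The run-to-template flow described above can be sketched in a few types. This is a minimal illustration, not OpenCowork's actual schema: the README names TaskRun, TaskResult, and templates but does not document their fields, so every field name, the `{{param}}` placeholder syntax, and the helpers `templateFromRun` and `renderTemplate` are assumptions.

```typescript
// Hypothetical shapes; only the names TaskRun and TaskResult come from the README.
interface TaskResult {
  summary: string;
  artifacts: string[]; // e.g. paths of generated files (assumed)
}

interface TaskRun {
  id: string;
  prompt: string;
  status: "running" | "succeeded" | "failed";
  result?: TaskResult;
}

interface TaskTemplate {
  name: string;
  promptTemplate: string; // uses {{param}} placeholders (assumed syntax)
  sourceRunId: string;
}

// Only successful runs become templates, mirroring
// "templates can be created from successful runs".
function templateFromRun(run: TaskRun, name: string, promptTemplate: string): TaskTemplate {
  if (run.status !== "succeeded") {
    throw new Error(`run ${run.id} did not succeed; refusing to create a template`);
  }
  return { name, promptTemplate, sourceRunId: run.id };
}

// Fill {{param}} placeholders and fail loudly when a parameter is missing.
function renderTemplate(template: TaskTemplate, params: Record<string, string>): string {
  return template.promptTemplate.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    const value = params[key];
    if (value === undefined) {
      throw new Error(`missing template parameter: ${key}`);
    }
    return value;
  });
}
```

Keeping `sourceRunId` on the template is one way to preserve the rerun link back to the run that produced it.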

What's New in v0.12.5

  • Added a first working Hybrid CUA browser runtime with explicit visual execution support.
  • Added a dedicated visual_browser agent tool for complex UI tasks that are not stable with DOM selectors alone.
  • Added approval-aware visual execution with approve-and-continue and takeover flows.
  • Added a visual debug entry point in the desktop UI.
  • Added visual trace review in execution steps, result delivery, task run details, and history.
  • Added regression tests for visual routing, approval continuation, and visual trace rendering.

Highlights in v0.10.10

  • Standard MCP client support for remote streamable-http endpoints such as LangChain Docs MCP.
  • Standard MCP server mode with a /mcp endpoint, while keeping legacy /tools compatibility.
  • A clearer MCP UI split into Clients and Server Mode.
  • Better follow-up continuity across agent turns using thread reuse.
  • Safer long-running conversations by preventing screenshot payloads from blowing up model context.
  • Improved browser search flows with pressEnter support for input actions.
  • Stronger memory, task history, and restore foundations for real multi-step work.
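One common way to keep screenshot payloads from inflating model context is to keep only the most recent screenshots and replace older ones with a short text marker. The sketch below illustrates that general idea; it is not taken from the OpenCowork source, and the `AgentMessage` shape and `trimScreenshots` helper are hypothetical.

```typescript
// Hypothetical message shape; OpenCowork's real message types are not shown in this README.
interface AgentMessage {
  role: "user" | "assistant" | "tool";
  text: string;
  screenshotBase64?: string;
}

// Keep the screenshot on the newest `keepLast` messages that carry one;
// older screenshots are dropped and replaced with a textual marker.
function trimScreenshots(messages: AgentMessage[], keepLast: number): AgentMessage[] {
  let remaining = keepLast;
  const trimmed: AgentMessage[] = [];
  // Walk from newest to oldest so the most recent screenshots survive.
  for (const message of [...messages].reverse()) {
    const shot = message.screenshotBase64;
    if (shot === undefined) {
      trimmed.push(message);
    } else if (remaining > 0) {
      remaining -= 1;
      trimmed.push(message);
    } else {
      trimmed.push({
        role: message.role,
        text: `${message.text} [screenshot omitted, ${shot.length} base64 chars]`,
      });
    }
  }
  // Restore the original chronological order.
  return trimmed.reverse();
}
```

The input array is not mutated, so the full history can still be persisted while only the trimmed copy is sent to the model.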

Core Capabilities

| Capability | What it enables |
| --- | --- |
| Desktop Agent | Multi-step task execution through a ReAct-style agent |
| Browser Automation | Navigate, click, type, extract, wait, and capture screenshots |
| Skills | Install and run reusable capabilities like ppt-creator |
| MCP Client | Connect external MCP tools and use them inside the agent |
| MCP Server | Expose OpenCowork capabilities to other MCP clients |
| Task History | Persist task results, steps, and recovery state |
| Task Templates | Save successful work as reusable, parameterized task flows |
| IM File Workflow | Send tasks and files through Feishu and receive result files |
| Vision Analysis | OCR and multimodal understanding for local images |
| Human-in-the-loop | Pause, resume, interrupt, and take over tasks |
| International UI | English-first UI with Chinese support |

Who This Is For

OpenCowork is a good fit if you are:

  • building a desktop AI copilot with real browser and local execution,
  • evaluating MCP-native agent UX beyond CLI-only demos,
  • automating recurring research, operations, or reporting workflows,
  • experimenting with reusable agent templates and result-centric history,
  • contributing to an open-source desktop agent stack that is still moving fast.

Quick Start

Requirements

  • Node.js 18+
  • npm 9+
  • Python 3.8+ for selected skills
  • A valid LLM API configuration in config/llm.json

Install

git clone https://github.com/LeonGaoHaining/opencowork.git
cd opencowork
npm install

Configure your model

Create config/llm.json:

{
  "provider": "openai",
  "model": "gpt-5.4-mini",
  "apiKey": "your-api-key",
  "baseUrl": "https://api.openai.com/v1",
  "timeout": 60000,
  "maxRetries": 3
}

For image analysis through IM, use a model deployment that supports image input on chat/completions.
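Before launching the app, it can help to sanity-check config/llm.json. The sketch below validates the fields shown in the example above. Treating every field as required, reading `timeout` as milliseconds, and the `parseLlmConfig` helper itself are assumptions, since the README does not specify how OpenCowork validates this file.

```typescript
// Field names mirror the config/llm.json example; the validation rules are assumptions.
interface LlmConfig {
  provider: string;
  model: string;
  apiKey: string;
  baseUrl: string;
  timeout: number; // assumed to be milliseconds
  maxRetries: number;
}

function parseLlmConfig(jsonText: string): LlmConfig {
  const raw: unknown = JSON.parse(jsonText);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("llm.json must contain a JSON object");
  }
  const record = raw as Record<string, unknown>;

  // Read a required non-empty string field.
  const str = (key: string): string => {
    const value = record[key];
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`llm.json: "${key}" must be a non-empty string`);
    }
    return value;
  };

  // Read a required non-negative finite number field.
  const num = (key: string): number => {
    const value = record[key];
    if (typeof value !== "number" || !Number.isFinite(value) || value < 0) {
      throw new Error(`llm.json: "${key}" must be a non-negative number`);
    }
    return value;
  };

  return {
    provider: str("provider"),
    model: str("model"),
    apiKey: str("apiKey"),
    baseUrl: str("baseUrl"),
    timeout: num("timeout"),
    maxRetries: num("maxRetries"),
  };
}
```

In a Node.js script, the input string would typically come from reading config/llm.json with the standard `fs` module.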

Local config safety

  • Keep config/ local to your machine.
  • config/ is git-ignored and should never be committed.
  • Feishu credentials such as config/feishu.json must not be pushed to GitHub.

Run the desktop app

npm run electron:dev

Example Prompts

Open Baidu, search for a company, and summarize what it does.
Create a company overview PPT from the information on the page.
Connect an MCP tool and use it to fetch LangChain docs examples.
Open the generated PPT file.
Turn the successful task into a reusable template and schedule it weekly.

MCP Support

OpenCowork now supports both sides of MCP:

  • As an MCP client, it can connect to standard remote MCP servers.
  • As an MCP server, it can expose tools through a standard /mcp endpoint.

Examples:

  • Connect to https://docs.langchain.com/mcp from the MCP client panel.
  • Enable server mode and expose selected OpenCowork tools to external clients.
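MCP messages are JSON-RPC 2.0, and the streamable HTTP transport sends them as HTTP POST requests. The sketch below builds, but does not send, such a request, to show roughly what a client puts on the wire. The `buildMcpRequest` helper and `HttpRequestSpec` type are hypothetical and not part of OpenCowork. A real session also begins with an `initialize` exchange before calls such as `tools/list`.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Plain description of an HTTP request; pass it to any HTTP client to send it.
interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: wraps one JSON-RPC call for an MCP streamable HTTP endpoint.
function buildMcpRequest(
  endpoint: string,
  id: number,
  method: string,
  params?: Record<string, unknown>,
): HttpRequestSpec {
  const payload: JsonRpcRequest =
    params === undefined
      ? { jsonrpc: "2.0", id, method }
      : { jsonrpc: "2.0", id, method, params };
  return {
    url: endpoint,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // The server may answer with plain JSON or with an SSE stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(payload),
  };
}

// Example: ask the LangChain Docs MCP server (URL from this README) for its tool list.
const listToolsRequest = buildMcpRequest("https://docs.langchain.com/mcp", 1, "tools/list");
```

The same request shape applies when another client talks to OpenCowork's own /mcp endpoint in server mode; the host and port depend on your local setup.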

Documentation

  • CHANGELOG.md — release history
  • USER_GUIDE.md — product usage guide
  • docs/ARCHITECTURE.md — architecture overview
  • docs/ROADMAP.md — product direction
  • CONTRIBUTING.md — contribution workflow
  • SECURITY.md — security reporting policy

Development

# Main desktop development flow
npm run electron:dev

# Build all targets
npm run build

# Test
npm run test:run

# Lint and format
npm run lint
npm run format

Open Source Status

OpenCowork is moving from an internal fast-iteration agent into a stronger open-source developer product. The current release is best suited for builders who want:

  • a desktop automation foundation,
  • an MCP-native local agent shell,
  • a skill-based extensibility layer,
  • a result-centric task system with reusable templates,
  • and a project that is actively shipping core agent infrastructure.

Current Release Notes

v0.12.5 is the current recommended tag.

  • v0.12.0 introduced the task-result-template workflow convergence.
  • v0.12.1 fixed the missing overview panel files from that release.
  • v0.12.2 added follow-up stabilization for result delivery, i18n, run scoping, overview safety, and reusable workflow UX.
  • v0.12.3 added bidirectional Feishu file workflows and real image analysis for IM-driven tasks.
  • v0.12.4 fixed Feishu IM reply routing.
  • v0.12.5 introduced the first working Hybrid CUA feature slice with explicit visual browser execution and persisted visual trace review.

License

Apache-2.0. See LICENSE.
