AI Agent Infrastructure · 2025

Thirteen Win32 actions over a private Tailscale tunnel — so AI agents can operate a Windows desktop.

Hermes Agent

13 desktop actions, from screenshot to drag-and-drop

0 public ports exposed

Challenge

Cloud AI agents have no native way to interact with a Windows desktop. Tasks that require a GUI are completely out of reach for server-side agents — filling in an enterprise portal, reading a locally-installed dashboard, or clicking through a Windows-only workflow. The macOS computer-use driver existed; Windows had no equivalent.

Exposing a local Windows machine to a remote EC2 server is a networking problem with no clean solution. Port forwarding requires router access and a static IP. SSH tunnels are fragile. Any approach that opened a port directly on the public internet was a non-starter. The connection layer needed to be encrypted, authenticated, and NAT-traversal capable with zero router configuration.

AI computer-use is only as good as its targeting precision. A screenshot plus a best-guess pixel coordinate fails on dense, dynamically-laid-out UIs. The agent needed to see interactive elements by name and role — not just pixels — so it could click a button by label rather than by coordinate, and handle layouts that change between runs.

Before the Windows agent

AI agent runs on EC2 server
Cannot interact with Windows desktop at all
GUI tasks completely out of reach
No secure tunnel to the local machine
Only API-accessible software automatable

0 Windows tasks any AI agent could complete

With the Hermes Windows agent

13 desktop actions as tool calls
Encrypted Tailscale tunnel, zero public ports
Click by label, not pixel coordinate

13 actions across any enrolled Windows desktop

Solution

Any Hermes agent conversation can now reach a Windows desktop. The agent issues a tool call — screenshot, click, type, scroll, focus window — and the action executes on the target machine within milliseconds, routed through a Tailscale tunnel that requires no router configuration, no port forwarding, and exposes zero public ports. The Windows machine is completely unreachable from the public internet; only enrolled Tailscale peers can connect.
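One way to keep the agent off the public internet is to bind its listener only to the machine's Tailscale address. The sketch below assumes the tailnet uses Tailscale's default IPv4 range, the 100.64.0.0/10 carrier-grade NAT block. The function names, the sample addresses, and the port number are illustrative and not taken from the Hermes codebase.

```python
import ipaddress

# Tailscale hands out IPv4 addresses from the CGNAT block by default.
TAILSCALE_V4 = ipaddress.ip_network("100.64.0.0/10")


def pick_tailscale_address(candidates):
    """Return the first candidate inside the Tailscale range, or None."""
    for raw in candidates:
        try:
            addr = ipaddress.ip_address(raw)
        except ValueError:
            continue  # skip anything that is not a valid IP literal
        if addr.version == 4 and addr in TAILSCALE_V4:
            return str(addr)
    return None


def bind_target(candidates, port=8765):
    """Build the (host, port) pair to listen on.

    The port is a hypothetical default. There is deliberately no
    fallback to 0.0.0.0, which would listen on every interface.
    """
    host = pick_tailscale_address(candidates)
    if host is None:
        raise RuntimeError("no Tailscale address found; refusing to bind")
    return (host, port)
```

Refusing to start when no tunnel address is present is the design choice that matters here: a silent fallback to all interfaces would undo the zero-public-ports guarantee.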

The agent sees what a user would see. Every screenshot request can return a full list of interactive UI elements (their names, roles, and on-screen positions), so the agent clicks a button by its label rather than hunting for pixel coordinates. Non-ASCII text types correctly. Windows that are minimised or hidden behind other windows can still be captured. Drags animate across multiple steps to prevent frame-skip failures.
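Clicking by label can be pictured as a lookup over the returned element list followed by a click at the centre of the matching bounding box. This is a minimal sketch: the `UIElement` fields, the `find_by_label` helper, and the sample elements are assumptions made for illustration, not the actual Hermes response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    name: str    # accessible name, e.g. the button label
    role: str    # control type, e.g. "Button"
    left: int
    top: int
    right: int
    bottom: int

    def center(self):
        """Pixel at the middle of the bounding box."""
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def find_by_label(elements, label, role=None):
    """Return the first element whose name matches label, ignoring case."""
    wanted = label.casefold()
    for el in elements:
        if el.name.casefold() == wanted and (role is None or el.role == role):
            return el
    return None


# Hypothetical element list, as it might accompany a screenshot.
elements = [
    UIElement("Username", "Edit", 100, 80, 300, 110),
    UIElement("Submit", "Button", 120, 200, 220, 240),
]
target = find_by_label(elements, "submit", role="Button")
click_point = target.center()
```

Because the coordinate is derived from the element's current bounding box on every run, a layout that shifts between sessions does not break the click.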

Thirteen actions are available as first-class tools in any Hermes conversation: capture, screenshot, click, double-click, right-click, type, key, scroll, drag, list apps, list windows, focus app, and set-of-marks element targeting. The agent can open applications, navigate interfaces, fill forms, and extract information from Windows desktop software with no public API — making the class of automatable tasks dramatically larger.
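Exposing the thirteen actions as tools implies some form of validation and dispatch on the Windows side. The sketch below shows one plausible shape. The snake_case identifiers are assumed spellings of the thirteen actions listed above, and the `{"action": ..., "params": ...}` call format is hypothetical.

```python
# Assumed identifiers for the thirteen actions named in the text.
ACTIONS = (
    "capture", "screenshot", "click", "double_click", "right_click",
    "type", "key", "scroll", "drag", "list_apps", "list_windows",
    "focus_app", "set_of_marks",
)


def validate_call(call):
    """Check a tool-call dict and return (action, params)."""
    action = call.get("action")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    params = call.get("params", {})
    if not isinstance(params, dict):
        raise TypeError("params must be a JSON object")
    return action, params


def dispatch(call, handlers):
    """Route a validated call to its handler and return the result."""
    action, params = validate_call(call)
    handler = handlers.get(action)
    if handler is None:
        raise NotImplementedError(f"no handler registered for {action!r}")
    return handler(**params)
```

A caller would register one handler per action, for example `dispatch({"action": "click", "params": {"x": 170, "y": 220}}, handlers)`, and anything outside the fixed list is rejected before it reaches the desktop.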

In any Hermes agent conversation, the tool enables

Capture & See

Full-screen screenshot
UI element tree (14 types)
Element names & bounding boxes
Works on hidden windows

Click & Interact

Click, double-click, right-click
Type text (Unicode-safe)
Key combos (Ctrl+C, Alt+Tab)
Scroll and drag-and-drop
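Two of the items above come down to small pure computations. Win32 `SendInput` with the `KEYEVENTF_UNICODE` flag carries one UTF-16 code unit per key event, so characters outside the Basic Multilingual Plane have to be sent as a surrogate pair. A multi-step drag needs a list of intermediate pointer positions. The sketch below covers only those computations and makes no Win32 calls. Both helper names and the default step count are hypothetical, and whether the Hermes agent works exactly this way is an assumption.

```python
def utf16_units(text):
    """Split text into UTF-16 code units, one per synthetic key event."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def drag_path(start, end, steps=20):
    """Linearly interpolate pointer positions from start to end, inclusive."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(steps + 1)
    ]
```

Each position from `drag_path` would be sent as its own mouse-move event while the button is held, so the target application receives a continuous motion rather than a single jump.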

Navigate & Manage

List running applications
Focus any window by title
Set-of-Marks element targeting
Tailscale-only — zero public ports
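Set-of-Marks targeting can be sketched as a depth-limited walk over the UI element tree that numbers each interactive element, so the agent can refer to "mark 3" instead of a coordinate. The nested-dict tree shape and the role names below are assumptions: the source mentions 14 control types without listing them, so `INTERACTIVE_ROLES` is a hypothetical subset. Treating the root as depth 0 is also an assumption.

```python
# Hypothetical subset of interactive control types.
INTERACTIVE_ROLES = frozenset(
    {"Button", "Edit", "CheckBox", "ComboBox", "Hyperlink", "MenuItem"}
)
MAX_DEPTH = 15  # depth limit stated in the text


def collect_marks(root, max_depth=MAX_DEPTH):
    """Walk a nested element tree and number every named interactive element."""
    marks = []

    def walk(node, depth):
        if depth > max_depth:
            return  # stop descending past the depth limit
        if node.get("role") in INTERACTIVE_ROLES and node.get("name"):
            marks.append({
                "mark": len(marks) + 1,   # 1-based label for the overlay
                "name": node["name"],
                "role": node["role"],
                "bbox": node.get("bbox"),
            })
        for child in node.get("children", []):
            walk(child, depth + 1)

    walk(root, 0)
    return marks
```

The depth limit bounds the cost of walking very deep trees, and the role filter keeps the mark list short enough for the agent to choose from.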

Results

13 Desktop actions: screenshot · click · type · scroll · drag · window management · element targeting

0 Public ports: agent binds to Tailscale interface only, with no firewall rules and no port forwarding

SOM Element targeting: 14 UI control types, 15 levels deep; click by label, not coordinate

2.2k Lines of Python: Win32 input, UI Automation, Tailscale, HTTP server, and CLI

What we built

Python 3.11+ · Win32 SendInput (ctypes) · Microsoft UI Automation (COM) · mss (screen capture) · Pillow (JPEG encoding) · Tailscale / WireGuard · psutil · Click (CLI framework) · Hermes Agent (tool registry)
