AI Agent Infrastructure · 2025

Thirteen Win32 actions over a private Tailscale tunnel — so AI agents can operate a Windows desktop.

Hermes Agent

13 desktop actions, from screenshot to drag-and-drop
0 public ports exposed

Cloud AI agents have no native way to interact with a Windows desktop. Tasks that require a GUI are completely out of reach for server-side agents — filling in an enterprise portal, reading a locally-installed dashboard, or clicking through a Windows-only workflow. The macOS computer-use driver existed; Windows had no equivalent.

Exposing a local Windows machine to a remote EC2 server is a networking problem with no clean solution. Port forwarding requires router access and a static IP. SSH tunnels are fragile. Any approach that opened a port directly on the public internet was a non-starter. The connection layer needed to be encrypted, authenticated, and NAT-traversal capable with zero router configuration.

AI computer-use is only as good as its targeting precision. A screenshot plus a best-guess pixel coordinate fails on dense, dynamically-laid-out UIs. The agent needed to see interactive elements by name and role — not just pixels — so it could click a button by label rather than by coordinate, and handle layouts that change between runs.

Before the Windows agent
  • AI agent runs on EC2 server
  • Cannot interact with Windows desktop at all
  • GUI tasks completely out of reach
  • No secure tunnel to the local machine
  • Only API-accessible software automatable
0 Windows tasks any AI agent could complete
With the Hermes Windows agent
  • 13 desktop actions as tool calls
  • Encrypted Tailscale tunnel with zero public ports
  • Click by label, not pixel coordinate
13 actions across any enrolled Windows desktop

Any Hermes agent conversation can now reach a Windows desktop. The agent issues a tool call — screenshot, click, type, scroll, focus window — and the action executes on the target machine within milliseconds, routed through a Tailscale tunnel that requires no router configuration, no port forwarding, and exposes zero public ports. The Windows machine is completely unreachable from the public internet; only enrolled Tailscale peers can connect.
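
One way to keep the agent off the public internet is to bind its HTTP server only to the machine's Tailscale address. The sketch below assumes the Tailscale IPv4 range 100.64.0.0/10 and uses hypothetical helper names (`is_tailscale_address`, `pick_bind_address`); it is an illustration of the idea, not the Hermes agent's actual code.

```python
import ipaddress

# Tailscale hands out IPv4 addresses from the shared CGNAT range 100.64.0.0/10.
# Assumption: no other interface on the machine uses this range.
TAILSCALE_V4 = ipaddress.ip_network("100.64.0.0/10")


def is_tailscale_address(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside the Tailscale range."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # not an IP address at all
    return ip.version == 4 and ip in TAILSCALE_V4


def pick_bind_address(candidates: list[str]) -> str:
    """Pick the first Tailscale address and never fall back to 0.0.0.0."""
    for addr in candidates:
        if is_tailscale_address(addr):
            return addr
    raise RuntimeError("no Tailscale address found; refusing to bind")
```

In practice the candidate list could come from `psutil.net_if_addrs()`. Failing hard when no Tailscale address is present avoids accidentally listening on a LAN or public interface.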

The agent sees what a user would see. Every screenshot request can return a full list of interactive UI elements (their names, roles, and on-screen positions), so the agent clicks a button by its label rather than hunting for pixel coordinates. Non-ASCII text is typed correctly. Windows that are minimised or hidden behind other windows can still be captured. Drags are animated across multiple steps to prevent frame-skip failures.
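
The multi-step drag can be pictured as linear interpolation between the start and end points, with one mouse-move event per intermediate point. This is a minimal sketch under that assumption. The function name `drag_path` and the default of 20 steps are illustrative and not taken from the Hermes agent.

```python
def drag_path(
    start: tuple[int, int],
    end: tuple[int, int],
    steps: int = 20,
) -> list[tuple[int, int]]:
    """Return steps + 1 points from start to end, evenly spaced."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(steps + 1)
    ]


# Usage: press the button at the first point, move through the middle
# points, and release at the last one.
points = drag_path((0, 0), (100, 40), steps=4)
```

Sending the intermediate moves gives the target application a chance to register the drag as a gesture rather than a single jump.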

Thirteen actions are available as first-class tools in any Hermes conversation: capture, screenshot, click, double-click, right-click, type, key, scroll, drag, list apps, list windows, focus app, and set-of-marks element targeting. The agent can open applications, navigate interfaces, fill forms, and extract information from Windows desktop software that has no public API, which extends automation beyond API-accessible software.

In any Hermes agent conversation, the tool enables
Capture & See
  • Full-screen screenshot
  • UI element tree (14 types)
  • Element names & bounding boxes
  • Works on hidden windows
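
Given element names and bounding boxes, clicking by label reduces to a lookup followed by a centre-point calculation. The sketch below assumes a hypothetical `UIElement` record and `find_by_label` helper; the real element schema is not given in this write-up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UIElement:
    # Hypothetical shape for one entry of the element list.
    name: str
    role: str
    left: int
    top: int
    right: int
    bottom: int

    def center(self) -> tuple[int, int]:
        """Midpoint of the bounding box, used as the click target."""
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def find_by_label(
    elements: list[UIElement], label: str, role: Optional[str] = None
) -> UIElement:
    """Return the single element whose name matches label (case-insensitive)."""
    wanted = label.casefold()
    matches = [
        e for e in elements
        if e.name.casefold() == wanted and (role is None or e.role == role)
    ]
    if len(matches) != 1:
        raise LookupError(f"expected one match for {label!r}, found {len(matches)}")
    return matches[0]
```

Because the coordinates are recomputed from the current element list on every call, the same label keeps working when the layout shifts between runs.
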
Click & Interact
  • Click, double-click, right-click
  • Type text (Unicode-safe)
  • Key combos (Ctrl+C, Alt+Tab)
  • Scroll and drag-and-drop
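
Unicode-safe typing through Win32 `SendInput` is commonly done with the `KEYEVENTF_UNICODE` flag, where each keyboard event carries one UTF-16 code unit. The sketch below shows only the platform-independent part, turning a string into key-down and key-up pairs. It assumes this is the approach taken; the helper names are hypothetical.

```python
import struct

KEYEVENTF_KEYUP = 0x0002    # Win32 flag: the key is being released
KEYEVENTF_UNICODE = 0x0004  # Win32 flag: wScan holds a UTF-16 code unit


def utf16_code_units(text: str) -> list[int]:
    """Split text into UTF-16 code units (astral characters yield two)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def unicode_key_events(text: str) -> list[tuple[int, int]]:
    """Return (code_unit, flags) pairs: one key-down and one key-up per unit."""
    events = []
    for unit in utf16_code_units(text):
        events.append((unit, KEYEVENTF_UNICODE))
        events.append((unit, KEYEVENTF_UNICODE | KEYEVENTF_KEYUP))
    return events
```

On Windows, each pair would populate the `wScan` and `dwFlags` fields of a `KEYBDINPUT` structure passed to `SendInput` via ctypes. Characters outside the Basic Multilingual Plane are sent as a surrogate pair, which is why the split works on code units rather than code points.
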
Navigate & Manage
  • List running applications
  • Focus any window by title
  • Set-of-Marks element targeting
  • Tailscale-only — zero public ports
13 desktop actions: screenshot · click · type · scroll · drag · window management · element targeting
0 public ports: agent binds to Tailscale interface only, with no firewall rules and no port forwarding
SOM element targeting: 14 UI control types, 15 levels deep; click by label, not coordinate
2.2k lines of Python: Win32 input, UI Automation, Tailscale, HTTP server, and CLI
Python 3.11+ · Win32 SendInput (ctypes) · Microsoft UI Automation (COM) · mss (screen capture) · Pillow (JPEG encoding) · Tailscale / WireGuard · psutil · Click (CLI framework) · Hermes Agent (tool registry)
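
The element-targeting figures above (14 control types, 15 levels deep) suggest a bounded walk over the UI Automation tree that keeps only interactive controls. The sketch below shows that idea on a plain in-memory tree. The `Node` class, the three-item type set, and the exact meaning of "15 levels" are assumptions; the write-up does not list the 14 types.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a UI Automation element.
    control_type: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


# Illustrative subset only; the real agent is described as using 14 types.
INTERACTIVE_TYPES = frozenset({"Button", "Edit", "CheckBox"})
MAX_DEPTH = 15  # assumed to count levels below the root


def collect_interactive(
    root: Node,
    max_depth: int = MAX_DEPTH,
    types: frozenset[str] = INTERACTIVE_TYPES,
) -> list[Node]:
    """Pre-order walk that stops descending once max_depth is reached."""
    found = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node.control_type in types:
            found.append(node)
        if depth < max_depth:
            # Reverse so children are visited in their original order.
            for child in reversed(node.children):
                stack.append((child, depth + 1))
    return found
```

Capping the depth keeps the walk bounded on deeply nested windows, and an explicit stack avoids recursion limits.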
