swift-mcp-gui

Swift MCP GUI Server: Control macOS with AI models via mouse, keyboard, and scrolling automation.

swift-mcp-gui Solution Overview

swift-mcp-gui is an MCP (Model Context Protocol) server that lets AI models interact with macOS applications. It is built on SwiftAutoGUI and exposes tools for programmatically controlling the mouse and keyboard, so an AI model can automate tasks in the macOS environment.

Its key features are mouse movement and clicks, keyboard input, and scrolling, each exposed as a simple MCP tool call. Developers can use these tools to add AI-driven automation to macOS workflows. For example, an AI model can use swift-mcp-gui to navigate menus, fill out forms, or operate other graphical interfaces.

The server follows the standard MCP client-server architecture, so it works with any MCP-compliant client. It is built and installed with Swift Package Manager. By exposing GUI control through a small set of MCP tools, swift-mcp-gui opens up new possibilities for AI-powered automation on macOS.
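
Under the MCP client-server model, a client invokes a tool by sending a JSON-RPC 2.0 request with the method "tools/call". The sketch below builds such a request in Swift. The method name and the name/arguments fields follow the MCP specification; the helper makeToolCall is hypothetical, and the "x"/"y" argument keys for moveMouse are an assumption about this server's tool schema, not taken from its source.

```swift
import Foundation

// Hypothetical helper: builds the JSON-RPC 2.0 message an MCP client sends
// to invoke a tool. "tools/call" and the name/arguments params follow the MCP
// specification; the argument keys used below are assumptions.
func makeToolCall(id: Int, tool: String, arguments: [String: Any]) throws -> Data {
    let request: [String: Any] = [
        "jsonrpc": "2.0",
        "id": id,
        "method": "tools/call",
        "params": ["name": tool, "arguments": arguments] as [String: Any],
    ]
    // .sortedKeys makes the output deterministic, which helps with logging and tests.
    return try JSONSerialization.data(withJSONObject: request, options: [.sortedKeys])
}

// Ask the server to move the cursor to (200, 300); "x" and "y" are assumed key names.
let data = try makeToolCall(id: 1, tool: "moveMouse", arguments: ["x": 200, "y": 300])
print(String(decoding: data, as: UTF8.self))
```

The same helper works for mouseClick, sendKeys, and scroll; only the tool name and the arguments dictionary change.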

swift-mcp-gui Key Capabilities

Programmatic Mouse Control

The swift-mcp-gui server gives AI models precise programmatic control over the mouse cursor on macOS. With the moveMouse tool, a model specifies exact x and y coordinates, and the cursor moves to that location on the screen. This is essential for tasks that require accurate pointing and clicking, such as operating graphical user interfaces or manipulating objects within applications. The server passes these coordinates to SwiftAutoGUI, which generates the corresponding native macOS mouse events.
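
A client may want to keep requested coordinates on screen before it calls moveMouse. The following is a minimal client-side sketch, assuming a single display of known size with its origin at the top-left corner; ScreenPoint and clampToScreen are hypothetical names, and the server itself may treat out-of-range coordinates differently.

```swift
// Hypothetical point type for a coordinate the model wants to pass to moveMouse.
struct ScreenPoint: Equatable {
    var x: Double
    var y: Double
}

// Clamps a point to a single display of the given size (assumed top-left origin).
// Valid coordinates run from 0 up to, but not including, the screen edge.
func clampToScreen(_ p: ScreenPoint, width: Double, height: Double) -> ScreenPoint {
    ScreenPoint(
        x: min(max(p.x, 0), width - 1),
        y: min(max(p.y, 0), height - 1)
    )
}

// A request that overshoots the right edge and sits above the top edge.
let target = clampToScreen(ScreenPoint(x: 2000, y: -15), width: 1440, height: 900)
print(target)
```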

For example, an AI model could use this feature to automate clicking a specific button in an application. The model first analyzes the screen to determine the button's coordinates, positions the cursor there with the moveMouse tool, and then clicks with the mouseClick tool. This enables end-to-end automation of tasks that would otherwise require manual user interaction. Because the implementation is written in Swift and calls native macOS APIs directly, control is efficient and low-latency.
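
The move-then-click sequence above can be sketched as a small planning function. It assumes that some earlier screen-analysis step has already produced the button's bounding box, and that mouseClick clicks at the current cursor position without further arguments; BoundingBox, GUIAction, and planClick are hypothetical names used only for illustration.

```swift
// Hypothetical bounding box produced by a screen-analysis step.
struct BoundingBox {
    var x: Double, y: Double, width: Double, height: Double
}

// The two tool invocations the model issues, in order.
enum GUIAction: Equatable {
    case moveMouse(x: Double, y: Double)
    case mouseClick  // assumed to click at the current cursor position
}

/// Plans a click on the center of a detected UI element:
/// move the cursor there first, then click.
func planClick(on box: BoundingBox) -> [GUIAction] {
    let centerX = box.x + box.width / 2
    let centerY = box.y + box.height / 2
    return [.moveMouse(x: centerX, y: centerY), .mouseClick]
}

let plan = planClick(on: BoundingBox(x: 100, y: 40, width: 80, height: 24))
for action in plan {
    print(action)
}
```

Targeting the center rather than a corner leaves some margin if the detected box is slightly off.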

Automated Keyboard Input

The sendKeys tool lets AI models send keystrokes to the operating system, simulating keyboard input. It supports a wide range of keys, including alphanumeric characters, modifier keys such as "command", "control", and "option", and function keys. The tool takes its input as an array of strings, so complex key combinations and shortcuts can be executed programmatically. This is essential for tasks such as filling out forms, entering commands, or working with applications that rely heavily on keyboard input.
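
One way a key array such as ["command", "c"] might be interpreted is to separate the modifiers from the keys pressed while they are held. The sketch below does that. The accepted modifier names, the case normalization, and the rule that rejects modifier-only input are assumptions made for illustration; they are not swift-mcp-gui's documented behavior.

```swift
// Assumed modifier vocabulary; the real tool may accept different names.
let modifierNames: Set<String> = ["command", "control", "option", "shift"]

// Hypothetical parsed form of one sendKeys array.
struct KeyChord: Equatable {
    var modifiers: [String]
    var keys: [String]
}

enum KeyParseError: Error, Equatable {
    case empty
    case modifiersOnly  // design choice of this sketch: a chord needs a non-modifier key
}

func parseChord(_ input: [String]) throws -> KeyChord {
    guard !input.isEmpty else { throw KeyParseError.empty }
    // Normalize case so "Command" and "command" are treated the same.
    let names = input.map { $0.lowercased() }
    let modifiers = names.filter { modifierNames.contains($0) }
    let keys = names.filter { !modifierNames.contains($0) }
    guard !keys.isEmpty else { throw KeyParseError.modifiersOnly }
    return KeyChord(modifiers: modifiers, keys: keys)
}

// Command+C (copy): one modifier held while "c" is pressed.
let copy = try parseChord(["command", "c"])
print(copy.modifiers, copy.keys)
```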

Consider an AI model that automates data entry in a spreadsheet application. With the sendKeys tool, the model can type data into specific cells, move between cells with the arrow keys, and run commands such as save or copy and paste through their keyboard shortcuts. This removes the need for manual data entry, saving time and reducing the risk of human error. Internally, SwiftAutoGUI translates the key sequences into native macOS keyboard events, so the input works with a wide range of applications.
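
The data-entry scenario can be sketched as a function that turns one row of values into a list of sendKeys arrays. This assumes that each sendKeys call carries a single key or key combination, so a value is typed one character per call, "tab" moves to the next cell, "return" ends the row, and ["command", "s"] saves. The key names "tab" and "return" and the function planRowEntry are hypothetical, and uppercase characters, which would need "shift", are not handled.

```swift
// Hypothetical planner: each inner array is the argument for one sendKeys call.
func planRowEntry(_ values: [String], saveAfter: Bool) -> [[String]] {
    var calls: [[String]] = []
    for (index, value) in values.enumerated() {
        // One call per character of the cell's value.
        calls += value.map { [String($0)] }
        // Move right to the next cell, or finish the row with "return".
        calls.append(index < values.count - 1 ? ["tab"] : ["return"])
    }
    if saveAfter {
        // Command+S saves the document in most macOS applications.
        calls.append(["command", "s"])
    }
    return calls
}

let calls = planRowEntry(["42", "ok"], saveAfter: true)
print(calls.count)
```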

Programmatic Scrolling Control

The scroll tool lets AI models control scrolling on macOS programmatically. It supports four directions (up, down, left, and right) and takes the number of scroll-wheel "clicks" to scroll. This is particularly useful in applications that display large amounts of content, such as web browsers, document editors, or code editors. By controlling scrolling, an AI model can move through content, locate specific information, and reach elements that are not currently visible on the screen.
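
A direction plus a click count can be reduced to signed vertical and horizontal wheel deltas. The sketch below shows one plausible mapping. The sign convention (positive means up or left), the type names, and parsing the direction from a raw string are all assumptions for illustration, not the server's confirmed implementation.

```swift
// The four directions the scroll tool accepts, parsed from their string names.
enum ScrollDirection: String, CaseIterable {
    case up, down, left, right
}

// Hypothetical signed deltas, in scroll-wheel clicks.
struct ScrollDelta: Equatable {
    var vertical: Int
    var horizontal: Int
}

// Assumed sign convention: positive = up / left, negative = down / right.
func scrollDelta(_ direction: ScrollDirection, clicks: Int) -> ScrollDelta {
    precondition(clicks >= 0, "click count must be non-negative")
    switch direction {
    case .up:    return ScrollDelta(vertical: clicks, horizontal: 0)
    case .down:  return ScrollDelta(vertical: -clicks, horizontal: 0)
    case .left:  return ScrollDelta(vertical: 0, horizontal: clicks)
    case .right: return ScrollDelta(vertical: 0, horizontal: -clicks)
    }
}

// Parsing a direction as it might arrive in the tool's arguments.
if let dir = ScrollDirection(rawValue: "down") {
    print(scrollDelta(dir, clicks: 3))
}
```

Using a String-backed enum means an unknown direction fails to parse (rawValue returns nil) instead of scrolling unpredictably.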

For instance, an AI model could use this feature to search a long document for a specific keyword. The model scrolls through the document and analyzes each visible section for the target keyword. Once it finds the keyword, it can use other tools, such as mouse movement and clicks, to interact with the surrounding content. The implementation uses SwiftAutoGUI to simulate scroll events, giving consistent scrolling behavior across different applications.
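
The keyword search can be sketched as a bounded loop that checks the visible content, then scrolls one step, until the keyword appears or a step limit is reached. The closures readVisibleText and scrollDown are hypothetical placeholders: the tools described here do not include a screen-reading tool, so the client would have to supply that capability some other way. The page list in the usage example simulates a document and is not real data.

```swift
import Foundation

// Hypothetical search loop. Returns the number of scroll steps taken before
// the keyword became visible, or nil if it was not found within maxSteps.
func scrollUntilFound(
    keyword: String,
    maxSteps: Int,
    readVisibleText: () -> String,
    scrollDown: () -> Void
) -> Int? {
    guard maxSteps >= 0 else { return nil }
    for step in 0...maxSteps {
        // Check the current viewport before scrolling any further.
        if readVisibleText().localizedCaseInsensitiveContains(keyword) {
            return step
        }
        if step < maxSteps {
            scrollDown()  // e.g. one scroll tool call with direction "down"
        }
    }
    return nil  // keyword not found within the step budget
}

// Simulated document: each entry is what is visible after that many scroll steps.
let pages = ["Introduction", "Background", "Results: keyword found here", "Appendix"]
var position = 0
let found = scrollUntilFound(
    keyword: "KEYWORD",
    maxSteps: 5,
    readVisibleText: { pages[min(position, pages.count - 1)] },
    scrollDown: { position += 1 }
)
if let steps = found {
    print("found after \(steps) scroll steps")
} else {
    print("not found")
}
```

The step limit guarantees the loop ends even when the keyword never appears, for example after reaching the bottom of the document.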