The first agent-readable screen recorder

Screen recordings your AI agent can act on

Someone records a bug, a request, a walkthrough — and shares one link. A teammate watches it. Their agent reads it: the summary, the exact moments they pointed at (with frames), and the transcript. Then it fixes the thing.

1. Record
EXPORT

“when I click this, it breaks”

2. Share one link

clipy.online/video/abc123

clipy.online/video/abc123.md

Same link. Humans get the video, agents get the document.

3. Agent reads it
Summary — export spinner never resolves
Frame 0:06 — the exact button, click at 63%, 41%
[0:06] “for example this section…”
4. Fix ships

$ agent: located ExportButton.tsx

$ fixed stuck spinner state

✓ PR #124 opened — you review

Record → share → agent acts → you verify. No tickets, no re-explaining.

Zero setup: every link works with agents

Append .md to any public Clipy watch link and it serves a markdown context document — no MCP install, no API key. Paste it into Claude, Cursor, ChatGPT, or any agent that can fetch a URL.

https://clipy.online/video/<id>.md

# UI Recording Feedback
Screen recording · 1:13 · watch: https://clipy.online/video/cey8aix0…

## Summary
The speaker reviews a recording interface, pointing out a duplicated
timer and a settings section that should be collapsed…

## Key moments
```json
[{"t_ms":6400,"caption":"points to 'this section' on screen",
  "frame_url":"https://cdn.clipy.online/key-moments/…/6400.jpg",
  "x":0.63,"y":0.41,"source":"fused","confidence":0.9}, …]
```
### 0:06 — points to 'this section' on screen
![Frame at 0:06](https://cdn.clipy.online/key-moments/…/6400.jpg)

## Transcript
```
[0:01] Hey great. So there are a few things I'd like you to update.
[0:06] For example this section…
```
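Because the key moments arrive as a JSON array inside the markdown, a script can pull them out without any model in the loop. The following is a minimal sketch that assumes the document is laid out like the sample above (a "## Key moments" heading followed directly by a json fence); the helper names extract_key_moments and format_timestamp are hypothetical, not part of Clipy.

```python
import json
import re

# Built at runtime so this snippet can itself sit inside a fenced block.
FENCE = "`" * 3

# Assumption: the JSON fence directly follows the "## Key moments" heading,
# as in the sample document.
KEY_MOMENTS_RE = re.compile(
    r"^## Key moments\s*\n" + FENCE + r"json\s*\n(.*?)\n" + FENCE,
    re.DOTALL | re.MULTILINE,
)


def extract_key_moments(markdown: str) -> list[dict]:
    """Return the key-moment objects from a context document, or [] if absent."""
    match = KEY_MOMENTS_RE.search(markdown)
    if match is None:
        return []
    return json.loads(match.group(1))


def format_timestamp(t_ms: int) -> str:
    """Render milliseconds as m:ss, matching the headings in the sample."""
    total_seconds = t_ms // 1000
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"
```

Each returned object carries the fields shown in the sample (t_ms, caption, frame_url, x, y, source, confidence), so an agent can sort or filter moments before looking at any frame.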

Key moments: what the speaker actually pointed at

Transcripts alone can't resolve “when I click this button”. Clipy finds every spoken pointer, extracts the video frame at that instant, and — when a click track exists (Mac app recordings, and Chrome-extension tab recordings) — fuses the real click coordinates. Agents see the frame as an image and know exactly which button “this” was.

  • What was said

    “clicks the export button; the spinner never stops” — the caption tells the agent what to look for.

  • What was shown

    The frame at that moment, delivered as an actual image the agent's vision reads.

  • Where they pointed

    Click coordinates as frame fractions (Mac app + extension tab recordings) — no ambiguity, even with three Export buttons.

The MCP server: full access for wired agents

For Claude Code, Claude Desktop, Cursor, Cline, and any MCP client — search your library, read any recording, and get frames inline. Read-only, authenticated with a personal API key.

npx -y @clipy/mcp   # CLIPY_API_KEY from clipy.online/settings/api-keys
| Tool | What it returns |
|---|---|
| get_agent_context | One-call bundle: metadata + summary + key moments (frames inline as images) + transcript. Start here. |
| get_key_moments | The visual pointers: what the speaker pointed at, the frame at that instant, click coordinates when a click track exists (Mac app + Chrome-extension tab recordings). |
| get_transcript | Full timestamped transcript (segments + plaintext). |
| get_summary | AI summary: TL;DR, key points, action items. |
| search_recordings | Find recordings by keyword. |
| list_recordings | List recent recordings. |
| get_recording | Metadata + processing status for one recording. |
| wait_for_artifacts | Poll until the transcript/summary are ready. |
| download_recording | Pull the MP4 locally to clip or extract frames with your own tools. |

Full setup + reference at clipy.online/docs/mcp. Package: @clipy/mcp on npm.
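Several MCP clients register servers through a JSON map keyed by "mcpServers". The sketch below generates such an entry for the npx command shown above. The exact file location and schema vary by client, so treat this as an assumption and defer to clipy.online/docs/mcp; the server label "clipy" and the function names are arbitrary choices for illustration.

```python
import json


def build_mcp_config(api_key: str) -> dict:
    """Build a client config entry that launches the Clipy MCP server via npx.

    Assumption: the client reads an "mcpServers" map of command/args/env
    entries. Check your client's documentation before relying on this shape.
    """
    return {
        "mcpServers": {
            "clipy": {  # arbitrary label chosen for this sketch
                "command": "npx",
                "args": ["-y", "@clipy/mcp"],
                "env": {"CLIPY_API_KEY": api_key},
            }
        }
    }


def render_mcp_config(api_key: str) -> str:
    """Serialize the config as indented JSON, ready to paste into a config file."""
    return json.dumps(build_mcp_config(api_key), indent=2)
```

Keeping the key in the env map rather than in args avoids putting it on the command line, where it could show up in process listings.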

Why not just send the agent the video file?

Because today's agents don't watch video. Claude, GPT, and most agent frameworks accept text and still images — not hour-long MP4s. A raw recording is a dead end in an agent's context window. Clipy does the translation server-side: speech becomes a timestamped transcript, the visual pointers become frames (which agents can see), and the whole thing is packaged as text + images — the two formats every agent understands. That's the entire trick, and it's why a plain Clipy link works where a video file doesn't.

What teams use it for

  • Bug reports that fix themselves

    QA records “when I click THIS, it breaks”. The developer's agent reads the frames, greps the codebase for the button it can literally see, and opens the fix — no ticket-writing round trip.

  • PM feedback → implemented changes

    A product manager talks through the UI pointing at what to change. The agent classifies each request, produces a spec, and implements the mechanical parts directly.

  • Customer support escalations

    A customer records their problem once. Support's agent reads the recording, checks it against known issues, and drafts the response — with the exact moment referenced.

  • Code reviews & async standups

    Walkthrough recordings become searchable, quotable artifacts. An agent can answer “what did she say about the auth flow?” with the timestamp and the frame.
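The "searchable, quotable" part can be sketched without an agent at all. The code below parses transcript lines and finds the segments that mention a keyword, assuming every line follows the "[m:ss] text" shape from the sample transcript (an optional hours field is accepted as a guess). Segment, parse_transcript, and search_transcript are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumption: lines look like "[0:06] text"; "[1:02:03] text" is also accepted.
LINE_RE = re.compile(r"^\[(?:(\d+):)?(\d+):(\d{2})\]\s*(.*)$")


class Segment(NamedTuple):
    seconds: int
    text: str


def parse_transcript(transcript: str) -> list[Segment]:
    """Parse timestamped lines into segments; lines without a timestamp are skipped."""
    segments = []
    for line in transcript.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        hours, minutes, seconds, text = match.groups()
        total = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
        segments.append(Segment(total, text))
    return segments


def search_transcript(segments: list[Segment], keyword: str) -> list[Segment]:
    """Return the segments whose text contains keyword, ignoring case."""
    needle = keyword.lower()
    return [segment for segment in segments if needle in segment.text.lower()]
```

Each hit keeps its offset in seconds, which is enough to quote the line together with its timestamp.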

Common questions

How do I let Claude or Cursor read a screen recording?

Two ways. Zero-setup: record with Clipy, share the link, and append .md — any agent that can fetch a URL gets the summary, key moments with frames, and transcript. Full access: install the Clipy MCP server (npx -y @clipy/mcp with an API key from clipy.online/settings/api-keys) and the agent can search your whole library, with frames delivered as inline images.
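For the zero-setup route, a script only needs to append .md to the watch link's path and fetch the result. This is a minimal sketch using the Python standard library; to_markdown_url and fetch_context are hypothetical helpers, and keeping the query string while decoding the response as UTF-8 are assumptions about how Clipy serves the document.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def to_markdown_url(watch_url: str) -> str:
    """Append .md to the path of a watch link, leaving any query string intact."""
    parts = urlsplit(watch_url)
    path = parts.path.rstrip("/")
    if not path.endswith(".md"):
        path += ".md"
    # The fragment is never sent to the server, so it is dropped here.
    return parts._replace(path=path, fragment="").geturl()


def fetch_context(watch_url: str, timeout: float = 10.0) -> str:
    """Fetch the markdown context document for a public recording.

    Assumption: the response body is UTF-8 text.
    """
    with urlopen(to_markdown_url(watch_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Appending to the path instead of to the raw string keeps a link such as clipy.online/video/abc123?t=6 from turning into a broken query value.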

Does this work with private recordings?

Yes — with the right credentials. The .md document follows the exact same access rules as the watch page: public recordings are readable by anyone with the link; private and restricted ones require a signed-in session with access. The MCP server is authenticated with your personal API key and only ever sees your own recordings.

Which agents and tools are supported?

The .md link works with anything that can fetch a URL — Claude, ChatGPT, Perplexity, custom scripts. The MCP server works with every Model Context Protocol client: Claude Code, Claude Desktop, Cursor, Cline, Windsurf, and the growing MCP ecosystem.

What exactly are “key moments”?

Timestamped instants where the speaker referenced something visible — “this button”, “this error”, “watch what happens”. For each one, Clipy extracts the video frame at that moment; on Mac app recordings (and Chrome-extension tab recordings) it also fuses the real click coordinates. It's how an agent resolves “this” to actual pixels.

How much does it cost?

The recorder, share links, transcripts, summaries, key moments, the .md documents, and the MCP server are all free. No watermark, no signup wall for viewers.

Is my recording used to train AI models?

No. Recordings are processed to produce your transcript, summary, and key moments — that's it. See the Clipy Pledge at clipy.online/pledge.

The workflow this unlocks

  1. Anyone records. A PM, a QA, a customer: “when I click THIS, it breaks”. No ticket-writing, no screenshots, no repro steps.
  2. One link travels. The same URL works in Slack for humans and in an agent's context window as .md.
  3. The agent acts. It reads the summary, looks at the pointed-at frames, greps the codebase for the button it can literally see, and ships the fix.