
The integration tax on agent media generation, and how MCP unmakes it
The MCP server I shipped this week is small: around 800 lines of TypeScript. It exposes three tools: music generation, image generation, and video generation. Each one accepts a prompt and a few common parameters and returns a hosted URL when the artifact is ready. From the agent's perspective there is no provider concept at all. The agent says "generate a cinematic 5-second video of a dragon in a city street" and gets a video. It doesn't pick Kling versus Hailuo versus Seedance, it doesn't choose between fast and quality variants, and it doesn't deal with polling.
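To make the "no provider concept" idea concrete, here is a minimal sketch of what sits behind one of those tools. It is not the server's actual code: every name in it (`Provider`, `fakeProvider`, `pickProvider`, `generateMedia`, the `example.invalid` URLs) is hypothetical, the provider is an in-memory stand-in, and the MCP wiring itself is left out. It only illustrates the shape: the caller passes a media kind and a prompt, and provider selection and polling happen inside.

```typescript
// Hypothetical sketch: none of these names come from the real server.
type MediaKind = "music" | "image" | "video";

interface GenerateRequest {
  prompt: string;
  durationSeconds?: number;
}

type JobStatus =
  | { state: "pending" }
  | { state: "done"; url: string }
  | { state: "failed"; reason: string };

interface Provider {
  name: string;
  kinds: MediaKind[];
  submit(kind: MediaKind, req: GenerateRequest): Promise<string>; // resolves to a job id
  status(jobId: string): Promise<JobStatus>;
}

// In-memory stand-in for a real provider: a job reports "pending" for
// `checksUntilDone` status calls, then reports "done" with a URL.
function fakeProvider(name: string, kinds: MediaKind[], checksUntilDone: number): Provider {
  const remaining = new Map<string, number>();
  let nextId = 0;
  return {
    name,
    kinds,
    async submit(kind, _req) {
      const id = `${name}-${kind}-${nextId++}`;
      remaining.set(id, checksUntilDone);
      return id;
    },
    async status(jobId) {
      const left = remaining.get(jobId);
      if (left === undefined) return { state: "failed", reason: "unknown job" };
      if (left > 0) {
        remaining.set(jobId, left - 1);
        return { state: "pending" };
      }
      return { state: "done", url: `https://example.invalid/${jobId}` };
    },
  };
}

// Provider choice is made here, not by the agent. This sketch simply takes
// the first provider that supports the requested kind.
function pickProvider(providers: Provider[], kind: MediaKind): Provider {
  const match = providers.find((p) => p.kinds.indexOf(kind) !== -1);
  if (!match) throw new Error(`no provider for ${kind}`);
  return match;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// What a tool handler would call: submit the job, poll until it finishes,
// and return only the hosted URL.
async function generateMedia(
  providers: Provider[],
  kind: MediaKind,
  req: GenerateRequest,
  pollMs = 10,
  maxPolls = 100,
): Promise<string> {
  const provider = pickProvider(providers, kind);
  const jobId = await provider.submit(kind, req);
  for (let i = 0; i < maxPolls; i++) {
    const s = await provider.status(jobId);
    if (s.state === "done") return s.url;
    if (s.state === "failed") throw new Error(s.reason);
    await sleep(pollMs);
  }
  throw new Error(`timed out waiting for ${jobId}`);
}
```

A caller would write something like `await generateMedia(providers, "video", { prompt: "a dragon in a city street", durationSeconds: 5 })` and get back a URL string. A real selection policy would presumably weigh cost, latency, or quality rather than taking the first match; that part is an assumption left open here.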

















