OpenClaw v2026.4.23: Native Image Generation and Reference-Image Editing Arrive for AI Agents

OpenClaw v2026.4.23 introduces native image generation and reference-image editing via OpenAI and OpenRouter, eliminating API key friction for AI agents.

OpenClaw v2026.4.23 shipped on April 23, 2026, bringing native image generation and reference-image editing directly into the agent framework through OpenAI and OpenRouter providers. You no longer need separate API keys or external service wrappers to generate images. The update enables agents to call openai/gpt-image-2 via Codex OAuth without an OPENAI_API_KEY environment variable, and OpenRouter image models work natively with your existing OPENROUTER_API_KEY. Beyond the multimodal capabilities, this release introduces forked context inheritance for subagent sessions, configurable timeout controls for generation tools, and tunable local embedding context sizes. These changes eliminate friction between text reasoning and visual creation, letting your agents generate UI mockups, data visualizations, and edited assets as part of their standard tool chain.

What Exactly Changed in OpenClaw v2026.4.23?

The changelog for v2026.4.23 focuses heavily on visual capabilities. OpenAI provider integration now supports image generation and reference-image editing through Codex OAuth, fixing issue #70703. This means you can configure openai/gpt-image-2 as a model, and the framework handles authentication automatically if you have Codex access enabled. OpenRouter received similar treatment, with image generation and editing support added via the image_generate tool, resolving issues #55066 and #67668. The tool itself gained new parameters for quality hints, output formats, and OpenAI-specific options like background transparency and moderation levels. Agents can now spawn subagents with forked context to inherit conversation history while running parallel image workflows. Memory-constrained hosts benefit from configurable memorySearch.local.contextSize settings, defaulting to 4096 tokens but tunable downward.

Taken together, these changes move OpenClaw past text-centric operation: visual content becomes part of the agent's reasoning and interaction loop rather than a bolt-on output, and the unified authentication path for OpenAI and OpenRouter keeps deployment and key management simple for developers integrating visual tasks into new or existing agent applications.

Why Native Image Generation Changes Your Agent Architecture

Before this release, agents that needed images had to shell out to Python scripts, manage separate HTTP clients, or integrate third-party services outside the OpenClaw ecosystem. This created context fragmentation. Your agent would lose state when handing off to external image generators, and you’d need custom code to feed those images back into the conversation. Now the image_generate tool operates within the same session context as your text-based reasoning. An agent can generate a chart, analyze it, request modifications, and store the final result without leaving the OpenClaw runtime. This tight integration enables workflows like automated UI testing screenshots, dynamic social media asset creation, and iterative design refinement where the agent acts as both creative director and production assistant.
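The iterate-within-one-session loop described above can be sketched in Python. The `call_tool` helper below is a stand-in for OpenClaw's actual tool dispatch, whose exact API this article does not specify; only the control flow is the point.

```python
# Sketch of a generate -> critique -> regenerate loop inside one session.
# `call_tool` is a stub for the runtime's tool dispatch (hypothetical).

def call_tool(name, **params):
    # A real agent would dispatch to the OpenClaw runtime here.
    return {"tool": name, "params": params}

def refine_until(prompt, critique, max_rounds=3):
    """Generate an image, ask a critique function for feedback, and
    regenerate with the feedback folded into the prompt until it passes."""
    result = call_tool("image_generate", prompt=prompt, quality="standard")
    for _ in range(max_rounds):
        feedback = critique(result)
        if feedback is None:  # critique passed, keep this render
            return result
        prompt = f"{prompt}. Revision note: {feedback}"
        result = call_tool("image_generate", prompt=prompt, quality="hd")
    return result

# A critique that accepts the first render immediately.
final = refine_until("A bar chart of Q3 revenue", critique=lambda r: None)
```

The same structure extends naturally to the UI-testing and asset-refinement workflows mentioned above; only the critique function changes.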

Native generation also collapses workflows that previously spanned several tools and manual handoffs. A marketing agent, for instance, can generate several ad-banner variations, score their likely engagement with a vision model, and refine the strongest candidates, all in one continuous session. The result is lower latency, no glue code between steps, and agents that act as creative partners rather than pure text processors.

How OpenAI Codex OAuth Eliminates API Key Management

The most significant technical shift involves authentication. Previously, using OpenAI’s image models required an OPENAI_API_KEY environment variable with proper billing setup. OpenClaw v2026.4.23 leverages the existing Codex OAuth flow, meaning if your agent already uses Codex for code execution, you can generate images through the same authentication channel. The framework exchanges tokens internally, refreshing them as needed. This reduces credential surface area and removes the need to manage multiple API keys for different OpenAI capabilities. You configure the provider once in your claw.config.json or environment, and both text and image models work through that single authentication pathway. This fixes the long-standing issue #70703 that blocked teams from using image generation in Codex-enabled deployments.
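The internal refresh behavior can be illustrated with a small expiry-aware token cache. This is a sketch of the general OAuth caching pattern, not OpenClaw's implementation; `fake_refresh` stands in for the real Codex token exchange.

```python
import time

class OAuthTokenCache:
    """Cache an access token and refresh it shortly before expiry.
    Illustrative only: OpenClaw handles the Codex exchange internally."""

    def __init__(self, refresh_fn, skew_seconds=60):
        self._refresh_fn = refresh_fn  # returns (token, expires_at_epoch)
        self._skew = skew_seconds      # refresh this early, to be safe
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = time.time()
        if self._token is None or now >= self._expires_at - self._skew:
            self._token, self._expires_at = self._refresh_fn()
        return self._token

# Fake refresh standing in for the real OAuth exchange.
calls = []
def fake_refresh():
    calls.append(1)
    return f"token-{len(calls)}", time.time() + 3600

cache = OAuthTokenCache(fake_refresh)
first = cache.get()
second = cache.get()  # still within the token lifetime: served from cache
```

The point of the skew window is that a token is replaced slightly before it expires, so an in-flight image request never races the expiry boundary.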

This unified authentication approach cuts security and operational overhead: there are fewer API keys to store and rotate, which shrinks the risk of credential compromise, and because the configuration matches existing Codex deployments, onboarding new projects or scaling existing ones requires no extra authentication setup. That consistency is particularly valuable for enterprise users who prioritize streamlined security practices.

OpenRouter Integration: One Key for Text and Images

OpenRouter users get equal treatment in this release. The image_generate tool now works with your existing OPENROUTER_API_KEY, supporting any image model available through the OpenRouter platform. This includes GPT-Image-2, Stability AI’s models, and various open-source diffusion models. The implementation respects OpenRouter’s unified API structure, meaning you can switch between image providers by changing the model string in your agent configuration without touching authentication logic. Contributor @notamicrodose pushed the changes through PR #67668, ensuring that OpenRouter’s image endpoints integrate cleanly with OpenClaw’s tool schema. You get the same quality hints and format controls regardless of which underlying provider OpenRouter routes to.
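The practical consequence is that switching image providers is a one-string change. The payload shape below is illustrative, not OpenRouter's exact wire format; it only shows that authentication and prompt handling stay fixed while the model string varies.

```python
def build_image_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble a provider-agnostic request; only `model` changes when
    swapping between models routed through OpenRouter. The field names
    here are illustrative, not OpenRouter's actual schema."""
    return {
        "headers": {"Authorization": f"Bearer {api_key}"},
        "body": {"model": model, "prompt": prompt},
    }

# Same key, same prompt, different underlying provider.
a = build_image_request("openai/gpt-image-2", "a red square", "example-key")
b = build_image_request("stabilityai/sd3", "a red square", "example-key")
```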

For developers, the main benefit is portability. Agents can draw on a diverse ecosystem of image models, from proprietary high-fidelity options to cost-effective open-source alternatives, all under a single API key, and select among them per call based on cost, style, or capability without touching the agent's core logic or infrastructure.

Reference-Image Editing: Technical Implementation Details

Reference-image editing allows agents to upload existing images and request modifications. Technically, this works by base64-encoding the source image and passing it alongside text prompts to the provider’s editing endpoint. OpenClaw handles MIME type detection, ensuring PNGs and JPEGs route correctly. The agent can specify operations like style transfer, object replacement, or background removal through natural language prompts. For OpenAI, this uses the GPT-Image-2 editing capabilities. OpenRouter passes the request to whatever editing endpoint the selected model supports. The tool returns either a URL to the hosted image or base64 data depending on your output_format hint. This enables workflows like brand asset variation generation, where an agent takes a logo and produces sized variants for different platforms while maintaining color accuracy.
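A minimal sketch of the encoding and MIME-detection step, assuming magic-byte sniffing for the two formats named above (the framework's actual detection logic may differ):

```python
import base64

# Map file-signature bytes to MIME types for the supported formats.
_MAGIC = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}

def sniff_mime(data: bytes) -> str:
    for magic, mime in _MAGIC.items():
        if data.startswith(magic):
            return mime
    raise ValueError("unsupported image format")

def encode_reference(data: bytes) -> dict:
    """Package a source image the way an editing request needs it:
    a detected MIME type plus the base64-encoded bytes."""
    return {
        "mime_type": sniff_mime(data),
        "reference_image_base64": base64.b64encode(data).decode("ascii"),
    }

# A stub PNG: the 8-byte signature followed by padding.
png_stub = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
ref = encode_reference(png_stub)
```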

This matters most when agents should modify existing assets rather than regenerate them from scratch, as in graphic design, content localization, and personalized marketing, where the source image's visual identity has to survive the edit. Because the original image travels with the prompt, the provider can apply precise, consistent modifications instead of approximating the asset anew.

Comparing Provider Support: OpenAI vs OpenRouter

Choosing between OpenAI and OpenRouter for image generation depends on specific project requirements, cost considerations, and desired control levels. Both offer robust capabilities, but with distinct advantages. OpenAI provides a premium, tightly integrated experience, especially for users already within the Codex ecosystem, while OpenRouter offers broader model access and greater cost flexibility.

| Feature | OpenAI (Codex OAuth) | OpenRouter (API Key) | Considerations |
| --- | --- | --- | --- |
| Authentication | Codex OAuth flow | OPENROUTER_API_KEY | OpenAI offers single sign-on for existing Codex users, simplifying credential management; OpenRouter requires a dedicated API key. |
| Primary Model | gpt-image-2 | Multiple (gpt-image-2, SD3, etc.) | OpenAI focuses on its flagship model; OpenRouter provides access to a wider array, including open-source options. |
| Quality Hints | standard, hd | Provider-dependent | OpenAI offers distinct quality tiers; OpenRouter's quality options vary by the selected model. |
| Format Support | png, jpeg, webp | Varies by model | Basic formats are universally supported; some OpenRouter models offer more niche formats. |
| Background Control | transparent, opaque, auto | Limited | OpenAI provides fine-grained transparency control; OpenRouter's capabilities depend on the chosen model. |
| Moderation | Built-in levels | Varies | OpenAI includes robust, configurable content moderation; OpenRouter is model-dependent and may require additional filtering. |
| Reference Editing | Full support | Model-dependent | Both support reference editing, but OpenAI's implementation is consistent while OpenRouter's varies by underlying model. |
| Pricing | $0.02-$0.08/image | Varies (often lower) | OpenAI has a clear per-image structure; OpenRouter's aggregation can be cheaper, especially with open-source models. |
| Ease of Use | High | Medium to High | OpenAI is straightforward if already using Codex; OpenRouter requires careful model selection. |
| Flexibility | Medium | High | OpenAI offers a consistent experience; OpenRouter allows experimenting with different models and pricing structures. |

OpenAI offers tighter integration with background and moderation controls, while OpenRouter provides flexibility and potentially lower costs through open-source alternatives. Choose OpenAI for production workflows requiring consistent quality and safety filters and where existing Codex integration is a factor. Use OpenRouter when experimenting with different artistic styles, exploring a broader range of models, or when cost optimization matters more than uniform output across all generations.

The image_generate Tool: Parameter Deep Dive

The updated image_generate tool accepts several new parameters that give agents fine-grained control over the output. Understanding these parameters is crucial for maximizing the utility of the new image generation capabilities. The quality hint accepts “standard” or “hd” (high definition), affecting inference steps and detail level, with “hd” typically yielding more intricate and visually appealing results at a higher cost and longer generation time. The format parameter specifies “png”, “jpeg”, or “webp” output, allowing agents to optimize for quality, file size, or web compatibility.

OpenAI-specific options include background for transparency control, allowing images to be generated with transparent backgrounds suitable for overlays or compositing. The moderation parameter enables agents to specify content safety strictness, which is crucial for public-facing applications. The compression option can be used for file size optimization, balancing image quality with storage and bandwidth requirements. Agents can also pass a user identifier for abuse tracking when required by provider terms, enhancing accountability and compliance. Here’s how an agent might call the tool for a complex visual asset:

{
  "tool": "image_generate",
  "parameters": {
    "prompt": "A minimalist dashboard UI with blue accent colors, dark mode, showing real-time stock market data with a subtle glow effect.",
    "quality": "hd",
    "format": "png",
    "background": "transparent",
    "moderation": "strict",
    "compression": "high",
    "user": "design_team_agent_001",
    "timeoutMs": 60000
  }
}

Each parameter plays a role in customizing the generated image to meet precise requirements, enabling agents to produce highly specific and production-ready visual content.

Handling Timeouts for Long-Running Generations

Image generation takes longer than text completion. Complex prompts, higher quality settings, and larger resolutions can significantly extend processing times. OpenClaw v2026.4.23 introduces optional timeoutMs support for image, video, music, and TTS generation tools. The default system timeout remains 30000ms (30 seconds), but you can extend specific calls to 300000ms (5 minutes) for high-quality generations that require extensive diffusion steps. This prevents agents from hanging indefinitely while waiting for complex renders, but gives you flexibility when quality matters more than speed.

The timeoutMs parameter applies per-call, so you can judiciously use shorter timeouts for quick thumbnails or preliminary sketches and longer ones for detailed illustrations or final production assets within the same agent session. When timeouts occur, the tool returns a specific error code that agents can catch and handle through retry logic, reducing quality settings, or falling back to alternative providers. This robust error handling mechanism ensures that agents can operate reliably even when interacting with potentially long-running external services, improving the overall resilience of your OpenClaw applications.
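A retry ladder like the one just described might look as follows. The `generate` callable is a stub for the real tool call, and the exception type is hypothetical; the tool actually returns a specific error code rather than raising this class.

```python
class GenerationTimeout(Exception):
    """Hypothetical stand-in for the tool's timeout error code."""

def generate_with_fallback(generate, prompt):
    """Try HD with a long timeout, then standard quality with a short
    one, then a fallback provider, per the degradation strategy above."""
    attempts = [
        {"quality": "hd", "timeoutMs": 300_000, "provider": "openai"},
        {"quality": "standard", "timeoutMs": 30_000, "provider": "openai"},
        {"quality": "standard", "timeoutMs": 30_000, "provider": "openrouter"},
    ]
    last_error = None
    for opts in attempts:
        try:
            return generate(prompt, **opts)
        except GenerationTimeout as err:
            last_error = err
    raise last_error  # every rung of the ladder timed out

# Stub generator that times out on the HD attempt only.
def flaky_generate(prompt, quality, timeoutMs, provider):
    if quality == "hd":
        raise GenerationTimeout("render exceeded timeoutMs")
    return {"provider": provider, "quality": quality}

result = generate_with_fallback(flaky_generate, "detailed city map")
```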

Forked Context for Parallel Image Workflows

Subagents in OpenClaw traditionally run with isolated contexts to prevent prompt injection and maintain clean state boundaries. Version 2026.4.23 introduces optional forked context for sessions_spawn runs. When enabled, child agents inherit the requester’s transcript history, allowing them to understand the full conversation context while operating in parallel. This is particularly valuable for image workflows where you want one subagent generating a hero image while another creates thumbnails from the same brief, or perhaps generates variations in different styles or aspect ratios.

The parent agent can spawn multiple children with forked context, each handling different aspects of a visual project, then compare results or integrate them into a composite output. For example, a design agent could spawn subagents to generate a product image, a background scene, and a text overlay, all informed by the same initial design brief. The default behavior remains isolated sessions for security and independence, but the fork option opens new patterns for batch visual asset generation, parallel creative exploration, and complex multimodal project management, significantly enhancing the collaborative potential of OpenClaw agents.
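Reduced to transcript handling, the difference between forked and isolated spawns looks like the sketch below. This is a simplification of whatever sessions_spawn actually passes; the inheritance semantics are the point.

```python
import copy

def spawn(parent_transcript, fork_context=False):
    """Return a child session's starting transcript. A fork gets a deep
    copy of the parent's history; the default is an isolated clean slate."""
    return copy.deepcopy(parent_transcript) if fork_context else []

parent = [{"role": "user", "content": "Design brief: dark-mode dashboard"}]

hero_agent = spawn(parent, fork_context=True)   # sees the full brief
thumb_agent = spawn(parent, fork_context=True)  # parallel, same brief
isolated = spawn(parent)                        # default: no inheritance

# Children work independently; the parent transcript is untouched.
hero_agent.append({"role": "assistant", "content": "hero image plan"})
```

Because each fork is a copy rather than a shared reference, parallel children cannot pollute one another's context, which preserves the original isolation guarantees.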

Memory Optimization for Constrained Hosts

Local embedding models in OpenClaw previously used fixed context windows, which could lead to out-of-memory errors on resource-constrained devices like Raspberry Pi or older laptops. Release v2026.4.23 addresses this by adding memorySearch.local.contextSize configuration. This setting defaults to 4096 tokens but is now tunable downward to as low as 512 or 1024 tokens as needed. This enhancement, contributed by @aalekh-sarvam in issue #70544, makes it possible to run memory-enabled agents on edge hardware without requiring extensive system modifications or patching the memory host.

When you reduce the context size, the system intelligently truncates long documents, preserving the most relevant chunks for retrieval. This involves a trade-off between recall accuracy and RAM usage, but it makes OpenClaw viable for a broader range of deployments, including IoT devices, embedded systems, and local-first architectures where cloud dependency is not an option or is undesirable due to latency or privacy concerns. This optimization extends OpenClaw’s reach and enables more distributed and versatile AI agent applications.
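The truncation trade-off can be sketched as a budgeted chunk selection: keep the highest-relevance chunks until the token budget is spent. The whitespace token counter and relevance scores below are simplified stand-ins for what an embedding host actually computes.

```python
def truncate_to_context(chunks, scores, context_size,
                        count_tokens=lambda s: len(s.split())):
    """Keep the highest-scoring chunks that fit within context_size
    tokens, preserving the original document order."""
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    kept, budget = set(), context_size
    for i in ranked:
        cost = count_tokens(chunks[i])
        if cost <= budget:          # greedily spend the token budget
            kept.add(i)
            budget -= cost
    return [chunks[i] for i in range(len(chunks)) if i in kept]

chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
scores = [0.2, 0.9, 0.7]
# With a 6-token budget, the lowest-relevance chunk is dropped.
kept = truncate_to_context(chunks, scores, context_size=6)
```

Shrinking `context_size` tightens the budget, which is exactly the recall-versus-RAM trade-off described above.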

Pi 0.70.0 and gpt-5.5 Catalog Updates

The bundled Pi packages updated to version 0.70.0, incorporating upstream metadata for the GPT-5.5 model family. OpenClaw now uses Pi’s official catalog definitions for OpenAI and OpenAI Codex providers rather than maintaining separate local mappings. This standardization ensures greater consistency and reduces the potential for discrepancies between OpenClaw’s internal model definitions and the actual capabilities offered by providers. The framework keeps only local forward-compatibility handling for gpt-5.5-pro to ensure existing configurations don’t break during the transition, providing a smooth upgrade path.

This dependency update brings several benefits, including performance improvements to the context engine, better token counting accuracy for newer model variants, and enhanced compatibility with future OpenAI releases. If you maintain custom Pi configurations or frequently interact with the model catalog, it is recommended to review the new catalog structure to ensure your overrides align with the updated schema and to take advantage of the latest features and optimizations.

Codex Harness Improvements and Debugging

The Codex harness, a critical component for code execution within OpenClaw, gains structured debug logging for embedded harness selection decisions. While the /status endpoint remains simple for health checks, gateway logs now provide detailed explanations of why the system chose a particular harness or fell back to Pi. This granular logging is invaluable for diagnosing complex authentication issues or model routing problems in production environments, significantly reducing troubleshooting time.

The release also fixes routing for native request_user_input prompts, ensuring they return to the originating chat rather than getting lost in subagent contexts, improving the experience for interactive agents. Queued follow-up answers are now preserved correctly, making multi-turn conversations more reliable, and the system honors newer app-server command approval amendments for compliance with updated security protocols. Additionally, context-engine assembly now redacts sensitive data, preventing Personally Identifiable Information (PII) from being accidentally logged during image generation workflows that include user-uploaded content.

What This Means for Existing OpenClaw Projects

If you run OpenClaw agents in production, v2026.4.23 introduces no breaking changes for existing functionality. Your current text-based agents will continue working unchanged, allowing for a phased adoption of the new visual capabilities. The image generation features are opt-in, requiring explicit tool configuration and provider setup, so they will not inadvertently impact existing workflows.

However, you should audit any external image generation scripts or custom integrations you’ve written. Native integration offers significant advantages, including better error handling, automatic retries, unified logging, and a more consistent API compared to ad-hoc shell scripts or external service calls. Consider migrating these external image calls to the image_generate tool to reduce external dependencies, simplify your codebase, and leverage OpenClaw’s robust framework features. Teams using Codex for other functionalities should also verify their OAuth scopes include image generation permissions, as some older Codex tokens may need regeneration to access the new GPT-Image-2 capabilities. This ensures a smooth transition and full utilization of the expanded multimodal features.

Migration Guide: Enabling Image Generation

Enabling image generation in your OpenClaw agents involves a few straightforward steps, ensuring a smooth integration of these powerful new capabilities.

  1. Update OpenClaw: First, update your OpenClaw installation to v2026.4.23 through your preferred package manager or container registry (e.g., npm update openclaw or docker pull clawbot/openclaw:v2026.4.23). This ensures you have access to all the new features and bug fixes.
  2. Configure Your Provider:
    • For OpenAI: Ensure your Codex OAuth is enabled and that the associated scopes include permissions for image generation. This might require re-authenticating or adjusting settings in your OpenAI developer dashboard. OpenClaw handles the token exchange automatically.
    • For OpenRouter: Verify that your OPENROUTER_API_KEY is correctly set in your environment variables or claw.config.json and that it has the necessary permissions for image generation models.
  3. Add image_generate Tool to Agent: Include the image_generate tool in your agent’s tool registry within your claw.config.json file. This makes the tool available for your agent to call. An example configuration might look like this:
    {
      "agent": {
        "tools": [
          {
            "name": "image_generate",
            "description": "Generates or edits images based on a text prompt or reference image.",
            "parameters": {
              "type": "object",
              "properties": {
                "prompt": { "type": "string", "description": "The text description of the image to generate or edit." },
                "quality": { "type": "string", "enum": ["standard", "hd"], "default": "standard" },
                "format": { "type": "string", "enum": ["png", "jpeg", "webp"], "default": "png" },
                "background": { "type": "string", "enum": ["transparent", "opaque", "auto"] },
                "moderation": { "type": "string", "enum": ["low", "medium", "strict"] },
                "compression": { "type": "string", "enum": ["none", "low", "medium", "high"] },
                "user": { "type": "string", "description": "An identifier for the user or agent making the request." },
                "timeoutMs": { "type": "number", "description": "Maximum time in milliseconds to wait for image generation." },
                "reference_image_url": { "type": "string", "description": "URL of an image to use as a reference for editing." },
                "reference_image_base64": { "type": "string", "description": "Base64 encoded image to use as a reference for editing." }
              },
              "required": ["prompt"]
            }
          }
        ]
      },
      "providers": {
        "openai": {
          "models": {
            "gpt-image-2": {
              "type": "image_generation",
              "endpoint": "https://api.openai.com/v1/images/generations",
              "authentication": "oauth" 
            }
          }
        },
        "openrouter": {
            "apiKey": "${OPENROUTER_API_KEY}",
            "models": {
                "stabilityai/stable-diffusion-xl": {
                    "type": "image_generation",
                    "endpoint": "https://openrouter.ai/api/v1/chat/completions"
                }
            }
        }
      }
    }
  4. Test with a Simple Prompt: Before implementing complex workflows, test the image_generate tool with a simple prompt to confirm it’s working correctly and that your authentication is configured properly.
  5. Implement Reference-Image Editing: If using reference-image editing, ensure your agent handles base64 encoding or URL passing correctly for the input image. The agent needs to pass the reference_image_url or reference_image_base64 parameter along with the editing prompt.
  6. Set Appropriate Timeouts: Configure timeoutMs values based on your expected quality and complexity of generated images. Start with standard quality and shorter timeouts for initial testing (e.g., 30000ms), then scale to HD and longer timeouts (e.g., 300000ms) for production assets or highly detailed generations.

By following these steps, you can effectively integrate OpenClaw’s new image generation and editing capabilities into your AI agents, unlocking a new realm of multimodal possibilities.
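As a lightweight companion to step 4, an agent or CI job could pre-validate calls against the tool schema from step 3 before hitting a paid endpoint. The schema excerpt below is condensed from the configuration example; the validator itself is a sketch, not part of OpenClaw.

```python
# Condensed from the tool schema in the configuration example above.
SCHEMA = {
    "required": ["prompt"],
    "enums": {
        "quality": {"standard", "hd"},
        "format": {"png", "jpeg", "webp"},
        "moderation": {"low", "medium", "strict"},
    },
}

def validate_call(params, schema=SCHEMA):
    """Return a list of problems; an empty list means the call looks valid."""
    problems = [f"missing required field: {f}"
                for f in schema["required"] if f not in params]
    for field, allowed in schema["enums"].items():
        if field in params and params[field] not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    return problems

ok = validate_call({"prompt": "a test square", "quality": "standard"})
bad = validate_call({"quality": "ultra"})  # no prompt, invalid enum value
```

Catching a malformed call locally is free; catching it at the provider costs a round trip and, for some failure modes, an image generation charge.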

Security Considerations for Visual Agents

The introduction of image generation and editing capabilities into AI agents, while powerful, also presents new security considerations that developers must address. These concerns extend beyond traditional text-based AI risks and require a proactive approach to mitigate potential vulnerabilities.

  1. Prompt Injection and Inappropriate Content: Visual agents are susceptible to prompt injection attacks, where malicious prompts can lead to the generation of inappropriate, offensive, or harmful content.
    • Mitigation: Configure strict moderation levels for user-facing agents, especially when using OpenAI. Implement robust output filtering mechanisms that analyze generated images for objectionable content before displaying them to users or integrating them into other systems. Consider using external content moderation APIs or human review for critical applications.
  2. Cost Attacks: Image generation is significantly more resource-intensive and costly than text token generation. Malicious actors could exploit this by submitting numerous complex image generation requests, leading to unexpected and substantial billing charges.
    • Mitigation: Implement per-user rate limits on image generation requests. Set maximum generation quotas per user or per agent session. Monitor API usage closely for anomalous patterns and set up billing alerts with your cloud provider or OpenRouter.
  3. Data Privacy and Sensitive Information: Generated images, or reference images used for editing, may contain sensitive data, including Personally Identifiable Information (PII) or confidential business information.
    • Mitigation: Ensure your storage backend encrypts generated and reference assets at rest. Implement strict access controls for stored images. Consider automatic expiration and deletion policies for temporary images. Before passing reference images to external providers, implement client-side or server-side redaction of any sensitive information within the image if it’s not essential for the editing task.
  4. Supply Chain Risks: Relying on external image generation models introduces dependencies on third-party services. Vulnerabilities or compromises in these services could impact your agent’s functionality or data security.
    • Mitigation: Choose reputable providers with strong security track records. Stay informed about security updates and patches from your chosen providers. Implement robust error handling and fallback mechanisms in your agents to gracefully manage service outages or security incidents.
  5. Image Parsing Vulnerabilities (Reference-Image Editing): When using reference-image editing, ensure that input images are validated for malicious content or malformed data before being passed to the provider’s editing endpoint. Exploits could potentially leverage vulnerabilities in image parsers.
    • Mitigation: Use secure image processing libraries for any pre-processing of reference images. Implement strict validation of image file types and sizes. Consider running input images through antivirus or malware scanners if the source is untrusted.
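As one concrete mitigation for the cost attacks in item 2, a per-user sliding-window rate limiter might look like the sketch below. The thresholds are illustrative, and a production deployment would persist counters outside process memory.

```python
import time

class ImageRateLimiter:
    """Allow at most `max_per_window` image requests per user within a
    sliding time window. Counters live in process memory for simplicity."""

    def __init__(self, max_per_window, window_seconds):
        self.max = max_per_window
        self.window = window_seconds
        self._events = {}  # user -> list of recent request timestamps

    def allow(self, user, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        recent = [t for t in self._events.get(user, []) if now - t < self.window]
        if len(recent) >= self.max:
            self._events[user] = recent
            return False          # over quota: reject before spending money
        recent.append(now)
        self._events[user] = recent
        return True

limiter = ImageRateLimiter(max_per_window=3, window_seconds=60)
# Five rapid requests from one user: the first three pass, the rest are cut off.
results = [limiter.allow("agent-001", now=i) for i in range(5)]
```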

By proactively addressing these security considerations, developers can build more resilient, secure, and trustworthy AI agents that leverage the full potential of visual content generation and editing.

The Roadmap: What’s Next for Multimodal Agents

Version 2026.4.23 establishes a robust foundation for fully multimodal agents within the OpenClaw ecosystem. The current release is a stepping stone towards even more sophisticated visual and auditory capabilities. The timeout infrastructure added for images also inherently supports video and music generation tools, strongly indicating that these modalities will receive similar native treatment in upcoming releases. This suggests a future where agents can not only create static images but dynamically compose entire multimedia experiences.

Expect to see the emergence of advanced cross-modal reasoning capabilities, where agents can analyze generated images, videos, or audio and iterate on them without constant human intervention. This could lead to agents designing, producing, and refining entire advertising campaigns, film sequences, or musical compositions autonomously. The forked context feature, already introduced, hints at sophisticated workflows where agents manage entire creative pipelines, spawning specialized subagents for asset creation, review, and optimization, behaving like a distributed creative studio.

Community plugins for local image models like Stable Diffusion XL and Flux are highly anticipated now that the image_generate tool interface is standardized. This will democratize access to powerful generative AI and enable offline or privacy-sensitive applications. Furthermore, expect tighter integration with vector databases for semantic image search and retrieval-augmented generation using visual content. This would allow agents to understand and retrieve images based on their conceptual meaning, rather than just keywords, opening doors for visual knowledge bases and more intelligent content recommendation systems. The future of OpenClaw is undeniably multimodal, promising agents that can interact with and create content across all sensory dimensions.

Frequently Asked Questions

Do I need an OpenAI API key to use image generation in OpenClaw v2026.4.23?

No. OpenClaw v2026.4.23 uses Codex OAuth to authenticate with OpenAI’s image generation APIs. You can run openai/gpt-image-2 without setting OPENAI_API_KEY in your environment. The framework handles token exchange internally when you use the image_generate tool with the OpenAI provider configured for Codex access. This streamlines authentication and reduces the overhead of managing multiple API keys for different OpenAI services within your OpenClaw agents.

Can I use local image models with the new image_generate tool?

Not yet. The v2026.4.23 release focuses on cloud providers (OpenAI via Codex OAuth and OpenRouter via API key) to ensure broad compatibility and access to powerful models. Local image generation support is on the roadmap but requires additional infrastructure for GPU scheduling and model management, which is a complex endeavor. For now, if you wish to use open-source image models like Stable Diffusion 3, you can access them via OpenRouter’s API, which aggregates various models, providing a flexible alternative to direct local execution.

How does reference-image editing work technically?

Reference-image editing functions by passing a base64-encoded image alongside your text prompt to the provider’s dedicated editing endpoint. OpenClaw intelligently handles the technical details, including MIME type detection for various image formats (like PNG, JPEG) and constructing multipart forms required by the API. The agent can specify desired edits such as style transfer, object removal, or variation generation through natural language, and the provider’s model applies these modifications while maintaining visual consistency with the source image. Results are returned either as a URL to the hosted edited image or as base64 data, depending on your specified output_format hint.

What happens if an image generation times out?

OpenClaw v2026.4.23 introduces optional per-call timeoutMs for generation tools, including image_generate. If an image generation request exceeds this specified limit, the tool will return a timeout error to the agent. The agent can then programmatically decide on a course of action, such as retrying the request, reducing the quality settings to potentially speed up generation, or falling back to an alternative provider if configured. The default system timeout for OpenClaw remains 30 seconds, but you have the flexibility to extend it up to 300 seconds (5 minutes) for high-quality or complex generations that inherently require more processing time.

Is there a cost difference between OpenAI and OpenRouter for image generation?

Yes, there is a notable cost difference. OpenAI charges per image generated, with pricing typically ranging from $0.02 to $0.08 per image, depending on the resolution and quality tier (standard vs. HD). OpenRouter, on the other hand, acts as an aggregator for multiple providers and models, often including open-source options. This aggregation means OpenRouter’s pricing varies significantly by the chosen model, with some offerings potentially providing lower rates for comparable quality. OpenRouter’s support for open-source models can also leverage cheaper infrastructure, leading to more cost-effective solutions. It is crucial to check the current pricing details on both platforms before deploying high-volume agents, as image generation costs can accumulate much faster than text token costs.
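A back-of-envelope comparison at the prices quoted above makes the gap concrete. The OpenRouter figure is a hypothetical placeholder, since its pricing varies by model.

```python
def monthly_cost(images_per_day, price_per_image, days=30):
    """Naive monthly spend estimate at a flat per-image price."""
    return round(images_per_day * price_per_image * days, 2)

# 200 images/day at the per-image prices cited in the answer above.
openai_hd = monthly_cost(200, 0.08)       # OpenAI HD tier, upper bound
openai_std = monthly_cost(200, 0.02)      # OpenAI standard tier, lower bound
openrouter_est = monthly_cost(200, 0.01)  # hypothetical cheaper OpenRouter model
```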

Conclusion

OpenClaw v2026.4.23 makes image generation and reference-image editing first-class agent capabilities: Codex OAuth removes the OPENAI_API_KEY requirement, OpenRouter support rides on your existing key, and per-call timeouts, forked subagent context, and tunable embedding context sizes round out the release. The upgrade is non-breaking and the new features are opt-in, so existing deployments can adopt visual workflows at their own pace.