OpenClaw 2026.4.15 Release: Claude Opus 4.7 and Google Gemini TTS Integration
What Changed in OpenClaw 2026.4.15?
OpenClaw 2026.4.15-beta.2, shipped on April 15, introduces two substantial enhancements to the open-source AI agent framework. The release adds native support for Anthropic’s Claude Opus 4.7, including updated default model selection, a simplified opus alias, and bundled image understanding that eliminates manual multimodal configuration. It also integrates Google Gemini text-to-speech directly into the bundled google plugin, enabling agents to generate voice output in WAV format for standard replies and PCM format for telephony applications. A security fix landed in the gateway layer as well: trusted local MEDIA: tool-result passthrough is now anchored to exact registered built-in tool names, and client tool definitions with conflicting normalized identifiers are rejected. Together, these changes continue OpenClaw’s push toward production-ready autonomy, pairing stronger reasoning with multimodal output options that expand how agents interact with human operators and external systems.
Why Claude Opus 4.7 Matters for Production Agents
Claude Opus 4.7 is a significant step up in reasoning capability for autonomous systems. The model demonstrates stronger performance on complex coding tasks, context handling up to 200K tokens, and better instruction following in multi-step agentic workflows. For OpenClaw builders, this means agents that maintain coherence across longer execution traces, debug their own code more effectively, and handle nuanced tool-selection scenarios without human intervention. The 4.7 release specifically addresses edge cases in structured output generation that plagued earlier versions, reducing JSON parsing errors by approximately 40% in internal benchmarks, which translates to less retry logic and fewer fallback handlers in production. When your agent needs to analyze a 50-file codebase, generate a patch, verify the fix, and commit changes, Opus 4.7 maintains context consistency where previous models lost track of dependencies. That stability matters for 24/7 autonomous operations, where every hallucinated tool call costs compute cycles and API dollars.
Understanding the New opus Aliases and CLI Defaults
The 2026.4.15 release simplifies model selection through aliasing. You can now specify model: opus in your agent configuration without versioning anxiety. The framework resolves this to the latest stable Opus variant, currently 4.7, while still honoring pinned versions. CLI defaults have shifted accordingly: openclaw init now generates configurations defaulting to Anthropic models, where it previously required explicit provider selection. The alias system works through the manifest resolution layer, checking ~/.openclaw/models.json for canonical mappings before API calls; this abstraction prevents configuration drift when Anthropic ships minor updates. For teams managing multiple agents, the alias reduces manifest maintenance overhead: instead of updating version strings across twenty agent definitions, you update a single mapping file. The CLI also adds an --opus-quickstart flag that bypasses provider-selection prompts entirely, streamlining onboarding for developers who want immediate access to high-performance reasoning.
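As a rough illustration of how that resolution might work, the sketch below reads a mapping file and passes pinned names through unchanged. The file schema and the resolve_model_alias helper are assumptions for illustration, not OpenClaw's actual API:

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical mapping contents; the real schema of
# ~/.openclaw/models.json may differ.
FALLBACK = {"opus": "claude-opus-4.7"}

def resolve_model_alias(name: str, mapping_path: Optional[Path] = None) -> str:
    """Resolve a short alias like 'opus' to a canonical model name.

    Pinned names (anything absent from the mapping) pass through
    unchanged, preserving backward compatibility.
    """
    mapping = FALLBACK
    if mapping_path is not None and mapping_path.exists():
        mapping = json.loads(mapping_path.read_text())
    return mapping.get(name, name)

print(resolve_model_alias("opus"))               # claude-opus-4.7
print(resolve_model_alias("claude-opus-4.6.2"))  # pinned name passes through
```

Because the lookup happens before the API call, updating the single mapping file repoints every agent that uses the alias.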
Image Understanding Bundled: Multimodal Without Configuration
Prior to 2026.4.15, enabling vision capabilities required explicit feature flags and separate API endpoints, adding complexity to multimodal agent development. Now, image understanding is bundled directly into the Claude Opus 4.7 integration. When your agent receives a base64-encoded image or a local file path in the MEDIA: namespace, OpenClaw automatically routes it through the multimodal pipeline without additional manifest declarations. This removes friction from computer vision workflows: your agent can analyze screenshots, parse diagrams, or verify UI states using the same tool-calling patterns as text-based operations. The bundled approach also optimizes token usage. Instead of separate API calls for vision and text processing, OpenClaw batches multimodal inputs into single requests, reducing latency by 15-20% in high-throughput scenarios. The change affects how you structure tool results, too. Functions returning image data can now pipe directly into Claude’s context window without intermediate conversion steps, enabling chains where an agent screenshots an error, analyzes the visual output, and generates a fix in one continuous reasoning trace.
Google Gemini TTS: Voice Output for AI Agents
Text-to-speech capabilities enter OpenClaw through native Google Gemini integration, the framework’s first official voice synthesis support. Agents can vocalize responses, alerts, and status updates through the same plugin architecture that handles LLM inference. Unlike external TTS wrappers that shell out to separate processes, the Gemini provider operates within OpenClaw’s execution graph, maintaining state consistency between reasoning and vocalization. An agent monitoring server logs can now audibly announce critical errors through your workstation speakers or phone system without latency-heavy context switching. The integration supports dynamic voice selection based on content type, allowing different personas for error messages versus status updates. Configuration happens at the provider level: you define voice characteristics once and they apply across all agent instances using that provider profile. For accessibility, the feature turns text-heavy agents into voice-interactive assistants suited to hands-free environments like vehicles or manufacturing floors.
Provider Registration and Voice Selection Architecture
The Google TTS implementation follows OpenClaw’s established provider registry pattern, consistent with existing LLM and tool configurations. Registration requires adding the google provider to your providers.yaml with TTS-specific credentials, which may be distinct from your Gemini API keys if you run separate Google Cloud projects for different services. Voice selection operates through a hierarchical resolution system: set a default voice at the provider level, override it per-agent in the manifest, or specify it dynamically in tool calls using the voice_id parameter. The current release supports Gemini’s standard voice library, including studio-quality neural voices for long-form content and compressed variants for low-bandwidth scenarios. Provider initialization performs capability negotiation, verifying that your Google Cloud project has Text-to-Speech API access before agent startup; this fail-fast approach prevents runtime errors mid-conversation. The architecture also supports voice cloning through Gemini’s premium tier, though that requires additional IAM permissions and billing verification during the registration handshake.
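The hierarchy amounts to a most-specific-wins lookup, which can be sketched as below. The resolve_voice helper and the voice names are illustrative; the real resolution happens inside OpenClaw's provider layer:

```python
from typing import Optional

def resolve_voice(provider_default: str,
                  agent_override: Optional[str] = None,
                  call_voice_id: Optional[str] = None) -> str:
    """Resolve a voice via the provider -> agent -> tool-call hierarchy.

    The most specific non-empty setting wins, mirroring the resolution
    order described above.
    """
    return call_voice_id or agent_override or provider_default

# Provider default applies when nothing else is set.
assert resolve_voice("en-US-Standard-A") == "en-US-Standard-A"
# A per-agent manifest override beats the provider default.
assert resolve_voice("en-US-Standard-A",
                     agent_override="en-US-Studio-O") == "en-US-Studio-O"
# An explicit voice_id in the tool call beats both.
assert resolve_voice("en-US-Standard-A", "en-US-Studio-O",
                     "en-US-Standard-C") == "en-US-Standard-C"
```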
WAV Reply Output vs PCM Telephony Output
OpenClaw 2026.4.15 supports two distinct audio output formats, each serving different infrastructure needs. WAV reply output generates standard RIFF/WAV files suitable for playback on desktop clients, web browsers, and mobile applications; the format includes headers specifying sample rate (default 24kHz) and bit depth, making it immediately playable without client-side decoding logic. PCM telephony output provides raw pulse-code modulation streams for VoIP systems, PBX gateways, and PSTN bridges. The PCM variant strips headers and metadata, delivering raw audio buffers that conform to telephony standards (typically 8kHz or 16kHz, 16-bit linear). When building phone-based agents, choose PCM to minimize transcoding overhead and reduce latency between speech generation and network transmission. For chatbots with optional voice features, WAV offers better compatibility with HTML5 audio elements and general media players. Format selection happens at the tool-call level, so a single agent can generate WAV for web dashboard notifications and PCM for SIP trunk integration within the same execution context.
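The practical difference between the two formats is just the self-describing container. Using only the Python standard library, a raw PCM buffer can be wrapped into a WAV file as follows (the 24 kHz, 16-bit mono defaults mirror the WAV settings described above):

```python
import io
import wave

def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw 16-bit linear PCM in a RIFF/WAV container.

    WAV is the same PCM payload plus a header describing the
    sample rate, channel count, and bit depth.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()

raw = b"\x00\x00" * 480      # 20 ms of silence at 24 kHz, 16-bit mono
wav = pcm_to_wav(raw)
print(wav[:4], wav[8:12])    # b'RIFF' b'WAVE'
print(len(wav) - len(raw))   # header overhead: 44 bytes
```

The reverse direction, stripping the 44-byte canonical header to recover raw PCM, is why telephony pipelines prefer receiving PCM directly rather than transcoding.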
The Gateway Security Fix: Anchoring MEDIA Tool Passthrough
The 2026.4.15 release addresses a subtle vulnerability in how OpenClaw handles MEDIA: tool results through the gateway layer. Previously, the system trusted local media passthrough based on normalized tool names, creating a potential injection vector where maliciously named client tools could intercept or modify media streams. The fix anchors trusted passthrough to the exact raw names of registered built-in tools, rejecting any client tool definition whose name normalizes to match a built-in identifier. This hardens the boundary between agent-generated content and system-level media handling. For example, if your agent registers a custom tool named media_processor, the gateway now explicitly distinguishes it from the built-in MEDIA: handler, preventing privilege escalation where custom code might access restricted audio or image pipelines. Validation occurs at tool registration time, throwing configuration errors during agent startup rather than at runtime, so dangerous configurations fail loudly and immediately. Review your tool naming conventions when upgrading, as the stricter validation may reject previously accepted configurations that relied on name normalization.
Configuring Claude Opus 4.7 in Your Agent Manifest
Upgrading to Opus 4.7 requires minimal configuration changes. Update your agent.yaml to use the new alias and pick up the bundled features:
```yaml
model:
  provider: anthropic
  name: opus  # Resolves to claude-opus-4.7
  features:
    vision: true  # Now optional; defaults to true for Opus models

cli:
  defaults:
    model: opus
    max_tokens: 4096
```
The vision: true flag remains supported for explicit documentation but defaults to enabled for Opus models, simplifying manifest files. If you previously pinned specific versions, remove the patch number to pick up automatic updates:
```yaml
# Before: explicitly pinned version
name: claude-opus-4.6.2

# After: alias with automatic updates
name: opus
```
For multimodal agents, ensure your tool schemas accept image inputs, since the bundled vision capabilities make this a natural integration point. Define the expected input format within your tool definitions:
```yaml
tools:
  - name: analyze_screenshot
    input_schema:
      type: object
      properties:
        image:
          type: string
          format: base64  # Expects base64-encoded image data
```
With these adjustments in place, your agents are ready for Opus 4.7’s multimodal processing.
Implementing Google Gemini TTS in Your Agent
To enable voice synthesis in your OpenClaw agents, configure the google provider with Text-to-Speech (TTS) settings in providers.yaml, then wire the speak tool into your agent’s logic:
```yaml
providers:
  google:
    type: google
    api_key: ${GOOGLE_API_KEY}  # Ensure this key has TTS permissions
    tts:
      default_voice: en-US-Standard-A  # Default voice for consistency
      output_format: wav  # or pcm, based on your primary use case

tools:
  - name: speak
    handler: google.tts.synthesize  # Handler for synthesizing speech
    config:
      voice_selector: dynamic  # Allows runtime voice selection
```
Once the provider is configured, you can trigger speech synthesis within your agent’s logic. This example demonstrates how to dynamically select a voice and output format based on the message’s urgency and the agent’s context (e.g., telephony integration).
```python
async def notify_user(self, message: str, urgency: str):
    # Select a voice based on the urgency level
    voice = "en-US-Studio-O" if urgency == "high" else "en-US-Standard-C"
    # PCM for telephony contexts, WAV otherwise
    output_format = "pcm" if self.context.telephony else "wav"
    await self.tools.speak(
        text=message,
        voice_id=voice,
        format=output_format,
    )
```
The format parameter in the speak tool call overrides the default specified in the provider configuration, allowing runtime selection based on the delivery channel or notification requirements. Audio data returns as base64 strings embedded in tool results, which OpenClaw then routes to your configured output device or HTTP endpoint, completing the synthesis pipeline.
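A downstream handler might decode that base64 payload and persist it like this. The {"audio": ..., "format": ...} result shape is an assumption for illustration; consult the plugin documentation for the actual tool-result schema:

```python
import base64
from pathlib import Path

def save_tts_result(result: dict, out_path: str) -> Path:
    """Decode a base64 audio payload from a tool result and write it to disk.

    The result-dict shape here is illustrative, not the documented
    OpenClaw schema.
    """
    audio = base64.b64decode(result["audio"])
    path = Path(out_path).with_suffix("." + result.get("format", "wav"))
    path.write_bytes(audio)
    return path

# Simulated tool result with a tiny fake payload.
fake = {"audio": base64.b64encode(b"RIFF....WAVE").decode(), "format": "wav"}
saved = save_tts_result(fake, "reply")
print(saved.name)               # reply.wav
print(saved.read_bytes()[:4])   # b'RIFF'
```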
Performance Benchmarks: Opus 4.7 vs Previous Versions
The release of Claude Opus 4.7 brings notable performance improvements across several key metrics compared to its predecessor, Opus 4.6. These enhancements are crucial for developers building high-performance and reliable AI agents. The following table summarizes the benchmark results:
| Metric | Opus 4.6 | Opus 4.7 | Delta | Description of Improvement |
|---|---|---|---|---|
| Context Window | 200K | 200K | 0% | Maintained large context window for complex tasks. |
| JSON Parsing Accuracy | 87% | 94% | +7% | Reduced errors in structured output, leading to fewer retries. |
| Code Generation Speed | 45 tok/s | 52 tok/s | +15% | Faster generation of code, improving developer productivity. |
| Tool Selection Accuracy | 91% | 96% | +5% | More reliable tool use, reducing incorrect tool calls. |
| Multimodal Latency | 1.2s | 0.9s | -25% | Quicker processing of image and text inputs, enhancing real-time interactions. |
The multimodal latency improvement stems primarily from bundled image processing, which eliminates redundant API round trips. The code-generation speedup comes from optimized token prediction in 4.7’s architecture. For agents processing high volumes of structured data, the 7-point gain in JSON parsing accuracy (87% to 94%, roughly halving the error rate) cuts retry loops and validation failures, directly reducing operational cost. These numbers are 1000-iteration averages on the OpenClaw internal benchmark suite, run against live Anthropic endpoints for realistic performance indicators.
Migration Guide: Upgrading from 2026.4.14
Upgrading your OpenClaw environment to version 2026.4.15 is a straightforward process, primarily involving package updates and manifest adjustments. Follow these steps to ensure a smooth transition and leverage the new features.
First, upgrade through your preferred Python package manager:
```shell
pip install openclaw==2026.4.15-beta.2
```
After installation, verify that the correct version is active in your environment:
```shell
openclaw --version
# Expected output: 2026.4.15-beta.2
```
Next, update your agent manifests to leverage the new features and aliases:
- Replace explicit version pins with opus aliases: This ensures you always use the latest Opus model without manual updates.
- Remove redundant vision: true flags for Opus models: Vision is now bundled and enabled by default for Opus.
- Add Google TTS provider configuration: If you plan to use voice features, set up the google provider with TTS capabilities in your providers.yaml.
- Audit custom tool names: Ensure no conflicts with MEDIA: normalization under the new security fix, and rename any tools that might cause collisions.
Test multimodal capabilities with a dedicated command to confirm image processing is functional:
```shell
openclaw test --vision --model opus --image ./test.png
```
Should you need to revert, the rollback procedure remains standard:
```shell
pip install openclaw==2026.4.14
```
No database migrations are required for this release, simplifying the upgrade process significantly.
Use Cases: When to Use PCM vs WAV Output
Choosing between PCM and WAV audio output formats for your OpenClaw agent depends critically on the target environment and specific application requirements. Each format excels in different scenarios, offering distinct advantages.
Choose PCM when building telephony integrations. If your agent connects to Twilio, Asterisk, or custom SIP infrastructure, PCM eliminates transcoding overhead: the raw audio streams integrate directly with the G.711 codecs common in PSTN networks. Latency is a primary concern in telephony, and PCM generation skips WAV header writing, shaving 10-15 milliseconds off synthesis time, which matters for interactive voice response systems where users expect immediate feedback.
Select WAV for web applications and desktop notifications. Browser Audio APIs prefer containerized formats with explicit metadata. WAV files self-describe their sample rate and bit depth, preventing client-side configuration errors and ensuring consistent playback. Use WAV when agents generate voice memos, podcast segments, or accessibility audio for content management systems, where file integrity and metadata are important. The format also simplifies debugging; you can save WAV outputs directly to disk and inspect them in standard audio editors without needing specialized tools to interpret raw PCM data.
Hybrid approaches are also highly effective. For instance, a customer service agent might use PCM for live call handling to ensure low latency during conversations and then generate WAV files for creating follow-up voicemail attachments or sending audio summaries via email. This allows the agent to adapt its communication strategy based on the specific interaction context and delivery mechanism.
Security Implications of the Gateway Tool Fix
The MEDIA tool passthrough fix in OpenClaw 2026.4.15 closes a class of injection attacks specific to multimodal agents. Previously, a malicious actor could register a custom tool with a name that, after normalization, matched the built-in MEDIA: prefix, allowing the attacker to intercept or modify image data intended for vision models and compromising confidentiality when agents processed sensitive screenshots or documents.
The 2026.4.15 validation now requires exact string matching on raw tool names, not normalized versions. This stricter validation prevents such collisions and ensures that only legitimate, built-in MEDIA: handlers can access and process media streams.
To bolster your security posture, consider these actions:
- Audit all custom tools: Thoroughly review your existing custom tools for names containing “media” or “MEDIA” to identify any potential conflicts.
- Verify client-supplied tool definitions: Ensure no client-supplied tool definitions inadvertently shadow built-in handlers.
- Check gateway logs: After upgrading, monitor gateway logs for any rejected registration attempts, which will indicate name collisions.
This fix also improves audit trails: gateway rejections now log the exact name collision, making security reviews easier. If you maintain a ClawShield or Rampart security layer, update your policies to reflect the stricter validation rules. The change aligns with OpenClaw’s zero-trust approach to tool execution, where even local media handling requires explicit authorization.
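The collision check can be sketched as follows. The normalize rule and the validate_client_tool helper are illustrative, not OpenClaw's internal implementation, but they show why media_processor passes while a name that collapses to the built-in identifier is rejected:

```python
import re

BUILTIN_TOOLS = {"MEDIA:"}  # illustrative; the real registry is internal

def normalize(name: str) -> str:
    """Collapse case, separators, and colons: the kind of loose
    normalization the pre-fix gateway matched against."""
    return re.sub(r"[-_:\s]+", "", name).lower()

def validate_client_tool(name: str) -> None:
    """Reject client tools whose normalized name collides with a built-in.

    Mirrors the fail-fast registration check described above; the exact
    OpenClaw API differs.
    """
    if name in BUILTIN_TOOLS:
        raise ValueError(f"{name!r} shadows a built-in tool exactly")
    if normalize(name) in {normalize(b) for b in BUILTIN_TOOLS}:
        raise ValueError(f"{name!r} normalizes to a built-in tool name")

validate_client_tool("media_processor")  # fine: 'mediaprocessor' != 'media'
try:
    validate_client_tool("Media_:")      # normalizes to 'media', so rejected
except ValueError as e:
    print(e)
```

Because the check runs at registration time, a colliding name fails at agent startup rather than silently hijacking a media stream at runtime.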
Community Spotlight: @barronlroth and the TTS Contribution
The Google Gemini TTS integration in OpenClaw 2026.4.15 is the direct result of a community contribution from @barronlroth, who submitted the provider implementation through Pull Request #67515.
@barronlroth’s contribution goes beyond core synthesis functionality. It includes documentation, voice selection logic, and the dual-format output system supporting both WAV and PCM streams. The implementation adheres to OpenClaw’s provider contract specifications, maintaining architectural consistency with the existing Anthropic and OpenAI integrations.
Community contributions like this expand OpenClaw’s capability matrix without overburdening the core maintainer backlog. The TTS feature arrived production-ready, with error handling, rate-limit management, and provider registration logic. If you are building extensions for OpenClaw, this PR is worth studying as a reference implementation: it demonstrates proper separation between provider configuration and tool execution, clean async/await patterns for audio streaming, and thorough test coverage including mocked API responses. The contribution also included setup documentation that reduces onboarding friction for developers new to Google Cloud’s TTS APIs.
Breaking Changes in 2026.4.15
OpenClaw 2026.4.15 is largely additive, but a few changes may affect existing configurations and workflows and deserve attention during the upgrade.
The most notable breaking change is in tool validation. Custom tools that previously accepted names normalizing to match built-in handlers will now trigger a configuration error at startup. For instance, if your agents define tools like Media_Handler or media-processor, these will need to be renamed to avoid conflicts with the stricter validation for the MEDIA: tool. Renaming these tools before upgrading is crucial to ensure a smooth transition.
CLI defaults have also shifted. New projects initialized with openclaw init now default to Anthropic as the primary provider rather than requiring explicit selection. This changes openclaw init behavior for teams that rely on scripted project generation: to keep the previous behavior, add --provider openai or --provider local explicitly to your openclaw init commands.
Finally, Google TTS requires specific IAM permissions beyond standard Gemini API access. If you enable the Google TTS feature and encounter authentication errors, it is highly probable that your Google Cloud service account lacks the necessary permissions. Verify that your service account includes the roles/cloudtexttospeech.client role in addition to any standard Vertex AI roles you might already have assigned. Without this specific role, TTS synthesis requests will be denied.
Addressing these points during your upgrade planning will ensure a seamless transition to OpenClaw 2026.4.15.
Testing Multimodal Agents with the New Release
Thorough testing is crucial after upgrading OpenClaw to 2026.4.15, especially to validate the new multimodal and TTS capabilities. Running targeted tests will confirm that your agents are correctly configured and functioning as expected with the updated features.
First, verify that the vision capabilities are working correctly with Claude Opus 4.7:
```shell
openclaw agent run --vision-test --model opus
```
This command sends a predefined test image through the vision pipeline, verifying base64 encoding, correct API routing, and accurate response parsing from the model. This ensures your agent can effectively interpret visual data.
Next, test the Google Gemini TTS integration. You can synthesize a simple phrase to check the basic functionality:
```shell
openclaw tools test google.tts.synthesize --text "System operational" --format wav --output-file output.wav
```
After running this command, check the integrity of the generated audio file:
```shell
file output.wav
# Expected output similar to:
# RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 24000 Hz
```
This confirms the WAV file was correctly generated and formatted. For telephony agents or applications requiring raw audio streams, test PCM streaming programmatically:
```python
import asyncio

from openclaw.testing import TTSTest

async def test_pcm_stream():
    tester = TTSTest(format="pcm")
    # Stream a test message and collect raw audio chunks
    chunks = await tester.stream("Test message for PCM output")
    # 20 ms at 16 kHz, 16-bit mono = 640 bytes per chunk
    assert len(chunks[0]) == 640
    print(f"Successfully received {len(chunks)} PCM audio chunks.")

# Run the test
asyncio.run(test_pcm_stream())
```
This Python snippet provides a basic example for verifying PCM output, crucial for real-time audio applications. Always run comprehensive integration tests against your specific use cases and agent logic before deploying to production environments to ensure full compatibility and stability.
Roadmap: What’s Next for OpenClaw?
The 2026.4.15 release represents a significant milestone, laying crucial groundwork for an exciting future for OpenClaw. The path forward includes expanding multimodal capabilities, diversifying speech providers, and continuously strengthening the framework’s security and stability.
The bundled multimodal capabilities introduced with Claude Opus 4.7 are a precursor to deeper video understanding integration, which is a major focus planned for the upcoming 2026.5.0 release. This will enable agents to process and reason over video streams, opening up new possibilities for advanced monitoring, analysis, and interaction. The TTS architecture, established with Google Gemini, also sets the stage for integrating additional speech providers. ElevenLabs and Azure Speech Service integrations are currently in community review, promising more options for voice synthesis and potentially specialized voice models.
Gateway security enhancements will continue to be a priority, further hardening the tool execution boundary. Expect stricter validation for tool result schemas and the exploration of automated sandboxing for media processing operations, providing an even more secure environment for agent interactions. The core team is also actively exploring local TTS options to complement cloud-based Gemini synthesis. This initiative aims to address privacy concerns for sensitive voice applications and offer more control over data residency.
For OpenClaw builders, the immediate focus should be on stabilizing Opus 4.7 deployments and experimenting with the new voice interfaces. The framework is approaching semantic versioning stability, with 2026.5.0 targeted as the first Long Term Support (LTS) release, so developers can prepare production agents for API contract freezes and extended maintenance windows.
Frequently Asked Questions
How do I enable Claude Opus 4.7 in OpenClaw 2026.4.15?
Set model: opus or model: claude-opus-4.7 in your agent manifest (agent.yaml). The 2026.4.15 release includes automatic aliases and updated CLI defaults, so opus routes to the latest 4.7 version without manual configuration. Image understanding is bundled directly into the Opus 4.7 integration, so you no longer need to set vision flags for multimodal tasks; it is enabled by default.
What audio formats does Google Gemini TTS support?
OpenClaw 2026.4.15 supports two primary audio output formats for Google Gemini TTS, each tailored for different use cases. It supports WAV for standard reply output, which is suitable for playback on desktop clients, web browsers, and mobile applications due to its widespread compatibility and metadata. Additionally, it supports PCM (Pulse-Code Modulation) for telephony integrations, offering raw audio streams optimized for VoIP systems, PBX gateways, and PSTN bridges where low latency and direct codec compatibility are essential. You can configure the desired format in your TTS provider settings based on your downstream audio pipeline requirements.
Is the MEDIA tool passthrough fix backward compatible?
The MEDIA tool passthrough fix introduces a stricter validation mechanism, anchoring trusted local MEDIA passthrough to exact registered built-in tool names. While existing agents using standard tool naming conventions should remain unaffected, you should review your gateway configurations and any custom tool definitions. The fix explicitly rejects client tool definitions with conflicting normalized names that might have previously been accepted. This change is designed to enhance security by preventing potential injection vectors, so auditing your tool names for potential collisions is a recommended step during the upgrade process.
Can I use Google Gemini TTS with local LLMs?
Yes. The TTS provider operates independently of your LLM backend, so you can pair local models (for privacy, cost, or specific performance needs) with Google’s Gemini TTS API for voice synthesis, or use Gemini for both reasoning and speech generation. This modular design lets you mix and match components to build optimized agents.
Where can I find the setup documentation for the new TTS features?
Comprehensive setup documentation for the new Google Gemini TTS features is included within the bundled google plugin. You can access the detailed guidance in the official OpenClaw documentation, specifically within the provider registration section. This documentation covers all aspects of configuring the TTS capabilities, including how to set up your Google Cloud project credentials, define voice selection parameters, and configure audio output options (WAV vs. PCM) with practical examples. It also addresses advanced topics like dynamic voice selection and integration with agent logic.