If you’re working with Amazon Bedrock in the us-east-1 region and notice that your models aren’t behaving as expected, here’s a straightforward guide to troubleshoot and find a solution.
Recently, some users have reported a problem where their models, specifically moonshotai.kimi-k2.5 and moonshot.kimi-k2-thinking, return only a string of exclamation marks (“!!!!…”). Whatever prompt they send, the models exhaust their entire token limit producing padding characters instead of a meaningful response. The issue is especially frustrating because the same models were working fine earlier in the day.
To best understand the problem, try running a simple test directly through the AWS CLI without integrating into your application. Use the converse command with the following parameters:
- Model ID: `moonshotai.kimi-k2.5` or `moonshot.kimi-k2-thinking`
- Message: ask the model to return valid JSON, for example `{"word": "HELLO"}`
- Inference settings: set `maxTokens` to around 200 and keep the temperature low (e.g., 0.1)
When you do this, observe the response. Normally you should get structured JSON output such as `{"word": "HELLO"}`. Instead, you’ll likely see a response filled with exclamation points, with the API reporting that generation stopped because it reached the maximum token limit. This indicates the model is running but not producing any useful text.
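The CLI test above can also be scripted. Here is a minimal sketch using the boto3 SDK’s `converse` call, the programmatic equivalent of the `aws bedrock-runtime converse` command; it requires AWS credentials with Bedrock access in us-east-1, and the model ID is taken from the reports above:

```python
# Minimal reproduction sketch of the reported failure (requires live AWS
# credentials; output comments describe the reported behavior, not a
# guaranteed result).
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="moonshotai.kimi-k2.5",
    messages=[
        {"role": "user",
         "content": [{"text": 'Return only this JSON: {"word": "HELLO"}'}]},
    ],
    inferenceConfig={"maxTokens": 200, "temperature": 0.1},
)

# During the incident, users reported stopReason "max_tokens" and a text
# field consisting entirely of "!" characters.
print(response["stopReason"])
print(response["output"]["message"]["content"][0]["text"])
```

A healthy model should instead return the requested JSON with a stop reason of `end_turn` well before the 200-token cap.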
Your first instinct might be to change settings to fix this, for example disabling “thinking” or reducing the reasoning effort. In practice, these tweaks don’t help: users who disabled features or set parameters like reasoning_effort to lower levels still received only padding characters. The parameter changes are accepted without error, but they do not solve the issue.
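For reference, such provider-specific tweaks are typically passed through the Converse API’s `additionalModelRequestFields`, which Bedrock forwards to the model provider. Whether this model family honors a `reasoning_effort` key there is an assumption carried from the user reports; the hypothetical helper below only builds the request body so its shape can be inspected:

```python
# Hypothetical helper: build a Converse request that tries to lower the
# reasoning effort. The "reasoning_effort" key is an assumption taken from
# user reports, not a documented Bedrock parameter.
def build_converse_request(model_id: str, prompt: str,
                           reasoning_effort: str = "low") -> dict:
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 200, "temperature": 0.1},
        # Provider-specific fields pass through Bedrock to the model.
        "additionalModelRequestFields": {"reasoning_effort": reasoning_effort},
    }

request = build_converse_request("moonshot.kimi-k2-thinking", "Say hello")
print(request["additionalModelRequestFields"])  # {'reasoning_effort': 'low'}
```

As noted above, requests shaped like this were accepted without error during the incident but still produced only padding characters.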
There’s evidence that the models were functioning correctly earlier on the same day, with successful outputs recorded at around 09:53 UTC. This suggests a regression or bug appeared later, causing the models to malfunction.
The impact of this issue is severe for production applications. It can lead to user-facing errors, such as “The model couldn’t produce a valid response,” and cause timeout errors in your Lambda functions because Bedrock hangs before returning a proper reply. All requests to kimi-k2 and kimi-k2.5 models are affected, and fallback to alternative models is necessary until the problem is resolved.
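When wiring up that fallback, it helps to detect the failure signature programmatically: a stop reason of `max_tokens` combined with output that is nothing but exclamation marks. A minimal sketch (the dictionary shape follows the Converse API response; the sample payloads are illustrative, and the actual fallback routing is left to your application):

```python
def is_padding_failure(response: dict) -> bool:
    """True when a Converse response matches the '!!!!' failure signature."""
    if response.get("stopReason") != "max_tokens":
        return False
    parts = response.get("output", {}).get("message", {}).get("content", [])
    text = "".join(p.get("text", "") for p in parts)
    # All characters are '!' and there is at least one of them.
    return bool(text) and set(text) == {"!"}

# Illustrative payloads, not captured API output:
bad = {"stopReason": "max_tokens",
       "output": {"message": {"content": [{"text": "!" * 200}]}}}
good = {"stopReason": "end_turn",
        "output": {"message": {"content": [{"text": '{"word": "HELLO"}'}]}}}
print(is_padding_failure(bad), is_padding_failure(good))  # True False
```

A check like this lets your Lambda route the request to an alternative model immediately instead of surfacing the broken reply or waiting out a timeout.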
In summary, if your models return only exclamation points and stop at maximum tokens, it’s likely a bug or regression on the service side. Currently, typical parameter adjustments don’t fix the problem. Keeping an eye on updates from AWS or reaching out to support for assistance is recommended until a fix is deployed.




