Adjust LLM Node Model Parameters

This guide walks you through adjusting the LLM model parameters on an LLM node in your chatbot pipeline. It assumes you already have a chatbot or pipeline open for editing.

Before you start

Choose your LLM model first — see Choose an LLM Model. The parameters available in the settings panel depend on the model you select.

Step 1 — Open the node settings

  1. Open your pipeline or chatbot for editing.
  2. Click the LLM node you want to configure.
  3. Select "Advanced" to expand the settings panel.

Step 2 — Select a model

  1. In the node settings, locate the LLM Model dropdown.
  2. Select the model you want to use.

Step 3 — Adjust temperature or effort

Once you select an LLM model, Open Chat Studio shows only the parameter settings that apply to it. There may be one or more settings.

If the model shows a Temperature setting

Temperature controls how creative or varied the responses are.

  • Slide toward 0 for more consistent, predictable answers (good for factual Q&A or classification).
  • Slide toward 1 for more varied, creative answers (good for creative writing or friendly conversation).
  • The default value of 0.7 is a reasonable starting point for most chatbots.

If the model shows an Effort setting

Effort (or Reasoning Effort) controls how much internal reasoning the model does before answering. It is available on reasoning models.

  • Select a level from the dropdown — low, medium, high, or max.
  • For a description of each level, see Effort in the parameter reference below.

Tip

Start with medium and raise it only if answers are not thorough enough. Higher effort levels use more tokens and take longer to respond.

Step 4 — Adjust max output tokens (if shown)

The Max output tokens field caps how long the model's response can be. Leave it at the default unless you have a specific reason to limit response length. If you are using a reasoning model, see Max output tokens in the parameter reference below.

Step 5 — Configure adaptive thinking (if shown)

See Adaptive thinking in the parameter reference below for details.

Step 6 — Save your changes

  1. Click outside the edit settings dialog to save.
  2. Test your chatbot to confirm the output looks as expected.

Troubleshooting

The model I want does not appear in the list. Your team may not have that provider configured. Ask your team administrator or see LLM Providers.

Responses are being cut off. The max output token limit may be too low. Raise it in the node settings. This is especially common with reasoning models; see Max output tokens in the parameter reference below.


Parameter reference

This section covers the precise behaviour of each parameter. It is intended for advanced users.

Temperature

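Value      Behaviour
0.0        Most deterministic. The model picks the most likely token at each step, though providers may still return occasional variation.
0.1–0.4    Low randomness. Consistent, predictable outputs.
0.7        Default. Balanced between coherence and variety.
0.8–1.0    High randomness. More varied and creative outputs.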

Provider notes: Temperature is supported by most general-purpose chat models.
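
To build intuition for the table above, here is a minimal sketch of temperature-scaled softmax, the textbook way temperature reshapes a probability distribution over candidate tokens. It is an illustration only: the function name and the example scores are made up, and it does not claim to show how any particular provider or Open Chat Studio implements sampling.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability goes to the top score.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens (not real model output).
logits = [2.0, 1.0, 0.5]
for t in (0.0, 0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, which is why answers become more consistent; higher temperatures spread it across the alternatives.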

Effort (reasoning effort)

Level     What it does
low       Fastest and cheapest. Short reasoning, good for routine questions.
medium    Balanced default for most tasks.
high      More thorough reasoning, useful for complex analysis or multi-step problems.
max       Maximum reasoning budget. Slowest and most expensive.

Set the effort level to guide how much the model reasons; set Max output tokens to enforce a hard token cap.

Provider notes: Each provider exposes this setting slightly differently, and the details can vary between model versions, so check the provider's documentation.
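
If you keep parameter values outside the UI, for example in notes or a review checklist, a small sanity check can catch out-of-range values before you enter them. The sketch below is hypothetical: NodeParams and validate are illustrative names, not Open Chat Studio classes or APIs, and the ranges simply mirror the values described in this guide (temperature from 0 to 1; effort of low, medium, high, or max).

```python
from dataclasses import dataclass
from typing import List, Optional

# Effort levels as listed in this guide.
EFFORT_LEVELS = ("low", "medium", "high", "max")


@dataclass
class NodeParams:
    """Hypothetical container for LLM node settings (not an OCS class)."""
    temperature: Optional[float] = None
    effort: Optional[str] = None
    max_output_tokens: Optional[int] = None


def validate(params: NodeParams) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if params.temperature is not None and not 0.0 <= params.temperature <= 1.0:
        problems.append("temperature must be between 0 and 1")
    if params.effort is not None and params.effort not in EFFORT_LEVELS:
        problems.append("effort must be one of: " + ", ".join(EFFORT_LEVELS))
    if params.max_output_tokens is not None and params.max_output_tokens <= 0:
        problems.append("max_output_tokens must be a positive integer")
    return problems


print(validate(NodeParams(effort="medium", max_output_tokens=2048)))  # []
print(validate(NodeParams(temperature=1.5, effort="extreme")))
```

Every field is optional because a given model shows only the settings that apply to it.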

Max output tokens

Distinct from the max token limit

This is a different limit from the model's max token limit. Both limits apply simultaneously.

This is a hard cap on generated output tokens only; it does not limit input tokens. If the cap is reached, the output is truncated, possibly mid-sentence, and Open Chat Studio (OCS) may not display an explicit error.

OCS sets the default from the LLM provider's own default, which varies by model and may be conservative.

Reasoning models — shared budget

On reasoning models, thinking tokens and visible-reply tokens draw from the same max output tokens budget. If the thinking phase exhausts the budget, the model silently produces no visible output.

For the high or max effort levels, a safe starting point for Max output tokens is 2–4× your expected reply length.
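
The shared budget and the 2–4× rule of thumb come down to simple arithmetic. The sketch below works through it with made-up numbers; both function names are hypothetical helpers for illustration, and real thinking-token usage varies by model and by message.

```python
def visible_reply_budget(max_output_tokens, thinking_tokens):
    """Tokens left for the visible reply after thinking, never below zero."""
    return max(0, max_output_tokens - thinking_tokens)


def suggested_max_output_tokens(expected_reply_tokens):
    """Return the (low, high) starting range: 2x to 4x the expected reply."""
    return 2 * expected_reply_tokens, 4 * expected_reply_tokens


# Example: you expect replies of roughly 500 tokens.
low, high = suggested_max_output_tokens(500)
print(low, high)  # 1000 2000

# If thinking uses 1200 tokens, a 1000-token cap leaves nothing visible,
# while a 2000-token cap leaves 800 tokens for the reply.
print(visible_reply_budget(1000, 1200))  # 0
print(visible_reply_budget(2000, 1200))  # 800
```

A result of 0 corresponds to the case described above, where the thinking phase exhausts the budget and no visible reply is produced.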

Adaptive thinking

When enabled, the model dynamically allocates its reasoning budget per message rather than spending a fixed amount every time — guided by the effort level you set. Easy turns finish quickly; complex ones get more thinking tokens.

Refer to the Claude documentation for more on adaptive thinking and which models support it.