Adjust LLM Node Model Parameters

This guide walks you through adjusting the model parameters of an LLM node in your chatbot pipeline. It assumes you already have a chatbot or pipeline with an LLM node that you can edit.

Before you start

Choose your LLM model first — see Choose an LLM Model. The parameters available in the settings panel depend on the model you select.

Step 1 — Open the node settings

Open your pipeline or chatbot for editing.
Click the LLM node you want to configure.
Select "Advanced" to expand the settings panel.

Step 2 — Select a model

In the node settings, locate the LLM Model dropdown.
Select the model you want to use.

Step 3 — Adjust temperature or effort

Once you select an LLM model, Open Chat Studio shows only the parameter settings that apply to it. Depending on the model, you may see a single setting or several.

If the model shows a Temperature setting

Temperature controls how creative or varied the responses are.

Slide toward 0 for more consistent, predictable answers (good for factual Q&A or classification).
Slide toward 1 for more varied, creative answers (good for creative writing or friendly conversation).
The default value of 0.7 is a reasonable starting point for most chatbots.

If the model shows an Effort setting

Effort (or Reasoning Effort) controls how much internal reasoning the model does before answering. It is available on reasoning models.

Select a level from the dropdown: low, medium, high, or max. The exact levels offered can vary by model.
For a description of each level, see Effort in the parameter reference below.

Tip

Start with medium and only raise it if answers are not thorough enough. Higher effort levels use more tokens, which increases both cost and response time.

Step 4 — Adjust max output tokens (if shown)

The Max output tokens field caps how long the model's response can be. Leave it at the default unless you need to limit response length or responses are being cut off. If you are using a reasoning model, its thinking tokens also count toward this limit; see Max output tokens in the parameter reference below.

Step 5 — Configure adaptive thinking (if shown)

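This option lets the model vary how much it reasons on each message, guided by the effort level you set. See Adaptive thinking in the parameter reference below for details.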

Step 6 — Save your changes

Click outside the edit settings dialog to save.
Test your chatbot to confirm the output looks as expected.

Troubleshooting

The model I want does not appear in the list. Your team may not have that provider configured. Ask your team administrator or see LLM Providers.

Responses are cut off or empty. The Max output tokens limit may be too low. Raise it in the node settings. This is especially common with reasoning models, whose thinking tokens count against the same limit; see Max output tokens in the parameter reference below.

Related pages

Choose an LLM Model — guidance on picking the right model for your use case
Large Language Models — conceptual overview of temperature and effort
LLM Providers — configuring provider credentials

Parameter reference

This section covers the precise behaviour of each parameter. It is intended for advanced users.

Temperature

Value	Behaviour
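`0.0`	Near-deterministic. The model almost always picks the most likely token, though output can still vary slightly between runs.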
`0.1–0.4`	Low randomness. Consistent, predictable outputs.
`0.7`	Default. Balanced between coherence and variety.
`0.8–1.0`	High randomness. More varied and creative outputs.

Provider notes: Temperature is supported by most general-purpose chat models. If the selected model does not support it, the Temperature setting is not shown.
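To make the table more concrete: a model generates text by repeatedly choosing the next token from a probability distribution, and temperature rescales that distribution. The sketch below is a simplified, provider-independent illustration of the standard temperature-scaled softmax. It is not OCS or provider code; the function name and the example token scores are hypothetical, and real providers apply this inside their own sampling pipelines.

```python
import math


def temperature_softmax(logits, temperature):
    """Convert raw token scores into probabilities, scaled by temperature.

    Hypothetical helper for illustration. A temperature of 0 is treated as
    greedy selection: all probability goes to the highest-scoring token.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the largest value for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for t in (0.0, 0.2, 0.7, 1.0):
    probs = temperature_softmax(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the top token's probability shrinking as temperature rises, which is why higher values produce more varied responses.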

Effort (reasoning effort)

Level	What it does
`low`	Fastest and cheapest. Short reasoning — good for routine questions.
`medium`	Balanced default for most tasks.
`high`	More thorough reasoning — useful for complex analysis or multi-step problems.
`max`	Maximum reasoning budget. Slowest and most expensive.

Effort guides how much the model reasons but is not a hard limit. To enforce a hard cap on tokens, set Max output tokens.

Provider notes: Providers implement this setting differently, and it can vary between versions of the same model. Check the provider's documentation for details.

Max output tokens

Distinct from the max token limit

This is a different limit from the model's max token limit. Both limits apply simultaneously.

This is a hard cap on generated output tokens only; it does not limit input tokens. If the cap is reached, the output is cut off, often mid-sentence, and OCS may not display an explicit error.

OCS sets a default based on the LLM provider's default. This value varies by model and may be conservative.

Reasoning models — shared budget

On reasoning models, thinking tokens and visible-reply tokens draw from the same max output tokens budget. If the thinking phase exhausts the budget, the model silently produces no visible output.

For high or max effort, a safe starting point is a Max output tokens value of 2–4× your expected reply length.
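The shared budget can be illustrated with simple arithmetic. The sketch below uses hypothetical helper names and is not OCS or provider code. It assumes the model spends its thinking tokens before writing the visible reply, and that the reply only gets whatever budget is left over.

```python
def visible_reply_tokens(max_output_tokens, thinking_tokens, reply_tokens_needed):
    """Simplified model of a reasoning model's shared output budget.

    Hypothetical helper. Assumes thinking tokens are spent first and the
    visible reply gets the remaining budget. Returns (tokens shown, truncated).
    """
    remaining = max(0, max_output_tokens - thinking_tokens)
    shown = min(remaining, reply_tokens_needed)
    return shown, shown < reply_tokens_needed


def suggested_budget(expected_reply_tokens, multiplier=3):
    """Hypothetical helper: a starting Max output tokens value of 2-4x the reply."""
    if not 2 <= multiplier <= 4:
        raise ValueError("multiplier should be between 2 and 4")
    return int(expected_reply_tokens * multiplier)


# Hypothetical numbers: the reply needs about 400 tokens.
print(visible_reply_tokens(1000, 900, 400))   # thinking leaves only 100 tokens
print(visible_reply_tokens(1000, 1200, 400))  # thinking used everything: empty reply
budget = suggested_budget(400)                # 3x the expected reply
print(budget, visible_reply_tokens(budget, 700, 400))
```

Actual thinking usage varies from message to message, so even a budget of 4× the reply can occasionally be exhausted. Treat the multiplier as a starting point and raise it if replies are still cut off or empty.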

Adaptive thinking

When enabled, the model decides how much of its reasoning budget to use on each message, guided by the effort level you set, rather than spending a fixed amount every time. Simple turns finish quickly; complex ones get more thinking tokens.

Refer to Anthropic's Claude documentation for more on adaptive thinking and which models support it.