Configuration params

Influence your model's output with configuration parameters

Each LLM exposes a set of configuration parameters that affect the model's output. These parameters are set at inference time and give you control over things such as output creativity, the maximum number of tokens in the completion, the model's likelihood to repeat the same line, and more.

Let's check a few of these parameters and their usage:

  • max_tokens - The maximum number of tokens in the completion

  • temperature - This affects the model's creativity. The value is a number between 0 and 2. Higher values like 0.9 make the output more random, while lower values like 0.1 generate more focused and deterministic completions. Changing the temperature actually alters the token probabilities the model samples from.

  • top_p - An alternative to sampling with temperature, known as nucleus sampling. It limits sampling to the tokens comprising the top p probability mass - e.g. 0.1 means only the tokens in the top 10% probability mass are considered.

* Usually, adjusting either temperature or top_p is recommended, but NOT both.

  • n - The number of chat completion choices to generate for each prompt (see the sketch after this list).
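As a quick illustration, here is a minimal sketch that samples with top_p and requests several choices instead of tuning temperature. It assumes the same client setup shown in the full example below; the prompt is just a placeholder.

# Restrict sampling to the top 10% probability mass and generate 2 completions
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Suggest a name for a travel blog"},
    ],
    top_p=0.1,
    n=2,
)

# Each of the n generated choices is a separate entry in response.choices
for choice in response.choices:
    print(choice.message.content)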

These are only a few of the most frequently used parameters. A full list and detailed explanation of all parameters exposed by OpenAI may be found in the official docs - https://platform.openai.com/docs/api-reference/chat/create.

Check out an example of how to adjust your request and start making changes at inference:

# Request to OpenAI, making the output more creative and setting a max_tokens limit
# This example is for v1+ of the openai package: https://pypi.org/project/openai/
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://turbo.gptboost.io/v1",
    api_key=os.getenv("OPENAI_API_KEY"),
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Tell me an interesting fact about zebras"},
    ],
    temperature=0.9,  # higher temperature -> more random, creative output
    max_tokens=256,   # cap the completion at 256 tokens
)

print(response.choices[0].message.content)
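When a completion hits the max_tokens limit, the text is simply cut off. Here is a minimal sketch of how you might detect that, using the finish_reason and usage fields returned with every chat completion:

# finish_reason is "length" when the completion was truncated by max_tokens,
# and "stop" when the model finished on its own
if response.choices[0].finish_reason == "length":
    print("Warning: the completion was cut off by the max_tokens limit")

# The usage object reports how many tokens the request consumed
print(response.usage.completion_tokens)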
