Configuration params

Influence your model's output with configuration parameters

Each LLM exposes a set of configuration parameters that affect the model's output. These parameters are set at inference time and give you control over things such as the output's creativity, the maximum number of tokens in the completion, the model's likelihood to repeat the same line, and more.

Let's check a few examples and their usage:

  • max_tokens - The maximum number of tokens in the completion

  • temperature - This affects the model's creativity. The value is a number between 0 and 2. Higher values like 0.9 make the output more random, while lower values like 0.1 generate more focused and deterministic completions. Changing the temperature actually alters the probability distribution the model samples its predictions from.

  • top_p - An alternative to sampling with temperature, also known as nucleus sampling. It limits sampling to the most probable tokens - e.g. 0.1 means only the tokens comprising the top 10% probability mass are considered.

* Usually it is recommended to adjust either temperature or top_p, but NOT both.

  • n - Specifies how many chat completion choices to generate for each prompt (see the sketch right after this list).
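For illustration, here is a minimal sketch of how top_p and n might be combined in a single request. It assumes the same GPTBoost proxy base URL and OPENAI_API_KEY environment variable used in the examples further down; the prompt itself is just a placeholder.

# Sketch: nucleus sampling (top_p) plus multiple completion choices (n) at inference time
# Assumes the GPTBoost proxy base URL and an OPENAI_API_KEY env variable, as in the examples below
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://turbo.gptboost.io/v1",
    api_key=os.getenv("OPENAI_API_KEY"),
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Suggest a name for a hiking club"},
    ],
    top_p=0.1,      # consider only the tokens in the top 10% probability mass
    n=3,            # generate three completion choices for the same prompt
    max_tokens=64,
)

# Each element of response.choices is a separate completion for the same prompt
for choice in response.choices:
    print(choice.message.content)

Note that the sketch leaves temperature at its default and steers randomness through top_p alone, in line with the recommendation above.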

These are only a few of the most frequently used parameters. A full list and a detailed explanation of all parameters exposed by OpenAI can be found in the official docs: https://platform.openai.com/docs/api-reference/chat/create

Below are examples of how to transform your request and start making changes at inference.

# Request to OpenAI, modifying output at inference to be more creative and setting a max_tokens limit
# This example is for v1+ of the openai Python package: https://pypi.org/project/openai/
import os

from openai import OpenAI

client = OpenAI( 
    base_url = "https://turbo.gptboost.io/v1",
    api_key = os.getenv("OPENAI_API_KEY"),
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Tell me an interesting fact about zebras"},
    ], 
    temperature=0.9,
    max_tokens=256
)

print(response.choices[0].message.content)

curl --location 'https://turbo.gptboost.io/v1/chat/completions' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "How to cheerfully greet a girl in Spanish!"
        }
    ],
    "temperature": 0.9,
    "max_tokens": 33,
    "frequency_penalty": 0,
    "presence_penalty": 0
}'

// This example is for v4 of the openai package: https://www.npmjs.com/package/openai
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://turbo.gptboost.io/v1",
});


async function ask_gpt() {
    const response = await openai.chat.completions.create({
        model: "gpt-3.5-turbo",
        messages: [{ role: "user", content: "Get me 3 inspirational quotes" }],
        temperature: 0,
        max_tokens: 67,
        frequency_penalty: 0,
        presence_penalty: 0
    });
    console.log(response.choices[0].message.content);
}

ask_gpt();