Configuration params
Influence your model's output with configuration parameters
Each LLM exposes a set of configuration parameters that can affect the model's output. These parameters are applied at inference time and give you control over aspects such as output creativity, the maximum number of tokens in the completion, the model's likelihood to repeat the same line, and so on.
Let's check a few examples and their usage:
max_tokens - The maximum number of tokens in the completion
temperature - This affects the model's creativity. The value is a number between 0 and 2. Higher values like 0.9 will make the output more random, while lower values like 0.1 will produce more focused and deterministic completions. Changing the temperature actually alters the token probabilities the model samples from.
top_p - An alternative to sampling with temperature, also known as nucleus sampling. It restricts sampling to the tokens that make up a given probability mass - e.g. 0.1 means only the tokens comprising the top 10% probability mass are considered.
* Usually, it is recommended to adjust either temperature or top_p, but NOT both.
n - The number of chat completion choices to generate for each prompt.
These are only a few of the most frequently used parameters. A full list and detailed explanation of all parameters exposed by OpenAI can be found in the official docs - https://platform.openai.com/docs/api-reference/chat/create.
Check the example below to see how to pass these parameters in your request and start making changes at inference.
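Here is a minimal sketch, assuming you use the official openai Python SDK with an OPENAI_API_KEY set in your environment; the model name and prompt are just placeholders:

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "user", "content": "Suggest a name for a coffee shop."}
    ],
    max_tokens=50,    # cap the number of tokens in the completion
    temperature=0.9,  # higher value -> more random, "creative" output
    # top_p=0.1,      # alternative to temperature; adjust one, NOT both
    n=3,              # generate three completion choices for the prompt
)

# Each choice is a separate completion for the same prompt.
for choice in response.choices:
    print(choice.message.content)
```

With temperature=0.9 and n=3, the three suggestions will typically differ noticeably; lowering the temperature towards 0.1 would make them nearly identical.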