Context Window Size and Randomness of Output in Gen AI

When working with Generative AI, there are a few terms worth knowing to use it effectively. In this article, we will discuss two common ones: context window size and randomness in output.

Context window size

Imagine you have a book of about 300 pages. Can AI read it all in one pass and give you a summary? The answer is probably yes, as long as the total content fits within the model's context window. Think of the human brain: when you read a 10-digit phone number, by the time you reach the end you might have forgotten the first digits, because the number exceeds your short-term memory and your brain automatically drops some information. Similarly, an AI model can only hold a maximum amount of content (counted in tokens), which we call the context window size. If the input exceeds this size, the model will throw an error. You might notice a number appended to some model names, like gpt-3.5-turbo-16k, which means the model has a 16k-token context window. There is a clear trend toward, and demand for, larger context sizes; some models now support up to 1M tokens.
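To make this concrete, here is a minimal sketch using OpenAI's tiktoken library to count tokens and check whether a text fits a window before sending it. The window size and the output budget are illustrative assumptions, not values from any specific model card.

```python
# Minimal sketch: count tokens with tiktoken and check against a context window.
import tiktoken

CONTEXT_WINDOW = 16_000  # assumed: e.g., a 16k model such as gpt-3.5-turbo-16k

def fits_in_window(text: str, max_output_tokens: int = 1_000) -> bool:
    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/4 models
    input_tokens = len(enc.encode(text))
    # Input plus the room we reserve for the answer must both fit in the window.
    return input_tokens + max_output_tokens <= CONTEXT_WINDOW

print(fits_in_window("Hello, world!"))  # True: a short text easily fits
```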

Let’s do a small test with an A4 page full of text in font size 13. That’s about 700 tokens, so if we reserve as many tokens again for the output, a 1M-token window can read a book of about 1,000,000 / (700 × 2) ≈ 714 pages. Most books nowadays are shorter than 714 pages, so ideally AI can read any book in one pass. Please note that the context window covers both input and output, so the input must always be smaller than the window size; otherwise, the output will be cut off. (Calculation based on this page https://platform.openai.com/tokenizer)
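Spelled out in code, the estimate looks like this. The factor of 2 reflects one reading of the calculation above: reserving as many tokens again for the output and overhead.

```python
# The same back-of-the-envelope estimate, spelled out.
tokens_per_a4_page = 700        # rough figure from the OpenAI tokenizer page
context_window = 1_000_000      # a 1M-token model
budget_per_page = tokens_per_a4_page * 2  # reserve as much again for output/overhead

pages = context_window // budget_per_page
print(pages)  # 714
```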

Most of the time, we use Gen AI via services such as Google's and OpenAI's, and the pricing is based on tokens, so larger inputs cost more money. By nature, AI models can only attend to a limited number of tokens at a time. Techniques such as chunking the input, or retrieving only the relevant passages, help them consume larger amounts of text; however, accuracy tends to drop as the input grows.
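As a sketch of how token-based pricing adds up, consider the function below. The per-token prices are placeholders, not real rates; check your provider's current price list.

```python
# Hypothetical pricing sketch -- the rates below are placeholders only.
PRICE_PER_1K_INPUT = 0.0005   # hypothetical USD per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.0015  # hypothetical USD per 1,000 output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000 * PRICE_PER_1K_INPUT
            + output_tokens / 1000 * PRICE_PER_1K_OUTPUT)

# A 700-token page in, a 200-token summary out:
print(f"${estimate_cost(700, 200):.6f}")
```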

Randomness in output and temperature

In RPA (Robotic Process Automation), a fixed input reliably produces the same programmed output. AI applications, by contrast, can react differently on every call, which makes their answers feel more like a human's than a machine's. Because of this randomness, when we need an AI response in a specific format, a detailed prompt is necessary to constrain it.
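For example, with the OpenAI Python SDK, sending the same prompt twice can return two different answers. The model name and prompt here are just illustrative; adapt them to your provider.

```python
# Same prompt, two calls -- the answers will likely differ between runs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

for _ in range(2):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Name a famous scientist."}],
    )
    print(resp.choices[0].message.content)
```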

In many language models, the temperature setting is a critical parameter influencing the generated text’s balance between predictability and creativity. The value typically ranges from 0 to 2. Lower temperatures give more deterministic answers, while higher temperatures give more creative ones. Technically, the model assigns a probability to each candidate next token; temperature rescales those probabilities before sampling, so a higher temperature spreads probability across more candidates, while a lower one concentrates it on the most likely token. Personal recommendation for temperature: 0.3 for predictable applications and 0.7 – 1 for more creative applications.
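To see what temperature does mechanically, here is a small self-contained demo of softmax with temperature. The logits (raw scores for three candidate tokens) are made up for illustration.

```python
# Temperature divides the raw scores (logits) before softmax, flattening or
# sharpening the probability distribution over the next token.
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    exps = [math.exp(x) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

for t in (0.3, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 2) for p in probs])
# Low temperature concentrates probability on the top token (more deterministic);
# high temperature spreads it out (more varied output).
```

Note that a temperature of exactly 0 is usually special-cased by APIs to always pick the most likely token, since dividing by zero is undefined.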


In conclusion, understanding context window size and output randomness is vital for using Generative AI effectively. The context window limits how much information the model can process at once, while the temperature setting balances predictability and creativity in responses. Mastering these concepts helps you balance cost, accuracy, and creativity to meet your specific needs.