A Gemini 2.5 Flash model optimized for cost efficiency and low latency.
The company that provides the model
The maximum number of tokens you can send in a single prompt (the context window)
The maximum number of tokens a model can generate in one request
The cost of prompt tokens sent to the model
The cost of output tokens generated by the model
The date after which the model has no built-in knowledge of events (knowledge cutoff)
When the model was launched
Capability for the model to use external tools
Ability to process and analyze visual inputs, like images
Support for multiple languages
Whether the model supports fine-tuning on custom datasets
Gemini 2.5 Flash Lite is Google’s cost- and latency-optimized version of the hybrid reasoning Gemini 2.5 Flash model, letting you balance speed, quality, and expense.
It’s free to use while in the experimental stage. On the paid tier, pricing is $0.10 per million input tokens and $0.40 per million output tokens.
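As a rough illustration of those paid-tier rates, the Python sketch below estimates the cost of one request; the token counts in the example are hypothetical.

```python
# Rough cost estimate at the paid-tier rates quoted above
# ($0.10 per 1M input tokens, $0.40 per 1M output tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical example: a 200k-token prompt with an 8k-token response.
print(f"${request_cost(200_000, 8_000):.4f}")  # -> $0.0232
```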
It supports a context window of up to 1,048,576 tokens (roughly 1M), making it well suited to very large or complex inputs.
It can generate up to 65,536 tokens in a single response.
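As a minimal sketch of putting those limits to work, the example below assumes the google-genai Python SDK and the model ID gemini-2.5-flash-lite; check the Gemini API docs for the current identifier.

```python
# Minimal sketch using the google-genai Python SDK (assumed); the model ID
# "gemini-2.5-flash-lite" is an assumption -- confirm it in the Gemini API docs.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Summarize this 500-page report: ...",  # large prompts fit in the ~1M-token window
    config=types.GenerateContentConfig(
        max_output_tokens=65536,  # cap the response at the model's stated output limit
    ),
)
print(response.text)
```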
Gemini 2.5 Flash Lite launched on June 17, 2025.
Its knowledge cut-off date is January 1, 2025.
Yes. It can process and analyze visual inputs like images.
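A minimal sketch of passing an image alongside text, again assuming the google-genai Python SDK and the same model ID; the local file name is hypothetical.

```python
# Sketch of sending an image plus a text question, assuming the google-genai SDK;
# the model ID and the local file "chart.png" are illustrative assumptions.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("chart.png", "rb") as f:  # hypothetical local image
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "What trend does this chart show?",
    ],
)
print(response.text)
```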
Yes. It supports function-calling to integrate with external tools.
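A minimal sketch of function calling using the google-genai SDK's automatic tool-calling support; get_weather is a hypothetical stub, and the model ID is assumed as above.

```python
# Sketch of function calling via the google-genai SDK (assumed); get_weather
# is a stubbed local function used only for illustration.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def get_weather(city: str) -> str:
    """Return the current weather for a city (stubbed for illustration)."""
    return f"It is sunny and 22 C in {city}."

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="What's the weather in Paris right now?",
    config=types.GenerateContentConfig(tools=[get_weather]),
)
print(response.text)  # the SDK calls get_weather and feeds the result back to the model
```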
Yes. It handles multiple languages for both input and output.
No. Fine-tuning is not supported for the Flash Lite variant.
See the Gemini API docs for details.