Google is making waves with Gemini, its flagship suite of generative AI models, apps, and services.
So, what exactly is Google Gemini? How can you use it? And how does Gemini stack up against the competition?
To help you keep up with the latest Gemini developments, we’ve put together this handy guide. We’ll keep it updated as new models, features, and news about Google’s plans for Gemini are released.
What is Gemini?
Gemini is Google’s next-gen generative AI model family, developed by Google’s AI research labs DeepMind and Google Research. It comes in four versions:
- Gemini Ultra: The most advanced and powerful model.
- Gemini Pro: A lightweight alternative to Ultra.
- Gemini Flash: A speedier, distilled version of Pro.
- Gemini Nano: Two small models, Nano-1 and Nano-2, designed to run offline on mobile devices.
All Gemini models are natively multimodal, meaning they can work with and analyze more than just text. They were pre-trained and fine-tuned on a variety of public, proprietary, and licensed audio, images, videos, codebases, and text in different languages.
This multimodal training sets Gemini apart from models like Google’s own LaMDA, which was trained exclusively on text data.
Note: The ethics and legality of training models on public data, sometimes without the data owners’ consent, are complex. Google has an AI indemnification policy to protect certain Google Cloud customers from lawsuits, but this policy has exceptions. Proceed with caution, especially if using Gemini commercially.
Gemini Apps vs. Gemini Models
Google, a company with a history of confusing branding, has clarified that Gemini, the model family, is separate and distinct from the Gemini apps on the web and mobile (formerly Bard).
The Gemini apps are clients that connect to various Gemini models—Gemini Ultra (with Gemini Advanced) and Gemini Pro so far—and provide chatbot-like interfaces. Think of them as front ends for Google’s generative AI, similar to OpenAI’s ChatGPT and Anthropic’s Claude family of apps.
- On the web: Gemini is accessible at gemini.google.com.
- On Android: The Gemini app replaces the existing Google Assistant app.
- On iOS: The Google and Google Search apps serve as the platform’s Gemini clients.
Gemini apps can accept images, voice commands, text, and soon videos (uploaded or imported from Google Drive) and generate images. Conversations with Gemini apps on mobile carry over to Gemini on the web and vice versa if you’re signed into the same Google Account.
Gemini in Gmail, Docs, Chrome, Dev Tools, and More
The Gemini apps aren’t the only way to use Gemini models. Gemini features are gradually being integrated into staple Google apps and services like Gmail and Google Docs.
To access most of these, you’ll need the Google One AI Premium Plan. This plan costs $20 per month and provides access to Gemini in Google Workspace apps like Docs, Slides, Sheets, and Meet. It also enables Gemini Advanced, which brings Gemini Ultra to the Gemini apps and supports analyzing and answering questions about uploaded files.
Gemini Advanced users get additional perks, like:
- Trip Planning in Google Search: Creates custom travel itineraries from prompts, considering flight times, meal preferences, and local attractions.
- Gmail and Docs Integration: A side panel in Gmail and Docs that helps write emails, summarize threads, refine content, and brainstorm ideas.
- Slides and Sheets: Generates slides and custom images in Slides, and tracks and organizes data in Sheets.
- Google Drive: Summarizes files and provides quick facts about projects.
- Google Meet: Translates captions into additional languages.
Gemini recently came to Google’s Chrome browser as an AI writing tool, helping users write new content or rewrite existing text. It also appears in Google’s database products, cloud security tools, app development platforms (including Firebase and Project IDX), and apps like Google TV, Google Photos, and the NotebookLM note-taking assistant.
Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, leverages Gemini. Google’s security products, like Gemini in Threat Intelligence, also use Gemini to analyze potentially malicious code and allow users to perform natural language searches for ongoing threats or indicators of compromise.
Gemini Gems: Custom Chatbots
Announced at Google I/O 2024, Gems are custom chatbots powered by Gemini models that Gemini Advanced users will soon be able to create. Gems can be generated from natural language descriptions (e.g., “You’re my running coach. Give me a daily running plan”) and shared with others or kept private.
In the future, Gems will integrate with Google services like Google Calendar, Tasks, Keep, and YouTube Music to complete various tasks.
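Google hasn’t published how Gems work under the hood, but you can approximate one today with the Gemini API’s system instructions. The sketch below is an unofficial stand-in, not the Gems feature itself; the API key, model name, and persona prompt are all placeholders.

```python
# A rough, unofficial approximation of a Gem: a Gemini model pinned to a
# persona via a system instruction. Not how Gems are implemented -- just
# a sketch using the public google-generativeai SDK.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; keep real keys out of source

running_coach = genai.GenerativeModel(
    "gemini-1.5-pro",
    system_instruction="You're my running coach. Give me a daily running plan.",
)

# Gems are conversational, so use a chat session rather than one-off calls.
chat = running_coach.start_chat()
print(chat.send_message("I have 45 minutes today and sore calves.").text)
```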
Gemini Live: In-Depth Voice Chats
Exclusive to Gemini Advanced subscribers, Gemini Live will soon let users have “in-depth” voice chats with Gemini. Users will be able to interrupt Gemini while it’s speaking to ask clarifying questions, and it will adapt to their speech patterns in real time. Gemini will also be able to see and respond to users’ surroundings via photos or video captured by their smartphones.
Gemini Live can serve as a virtual coach, helping users rehearse for events, brainstorm ideas, and more. For instance, it can suggest which skills to highlight in an upcoming job or internship interview and provide public speaking advice.
What Can the Gemini Models Do?
Because Gemini models are multimodal, they can perform a range of tasks, from transcribing speech to captioning images and videos in real time. Many of these capabilities have reached the product stage, with more promised in the near future.
However, it’s worth noting that Google has underdelivered in the past, as seen with the original Bard launch and a video showcasing Gemini’s capabilities that turned out to be more aspirational than live. Also, Google has not addressed some underlying issues with generative AI tech, such as encoded biases and the tendency to hallucinate (i.e., make things up).
Assuming Google’s recent claims are accurate, here’s what the different tiers of Gemini can do now and their potential capabilities:
Gemini Ultra
- Helps with tasks like physics homework, solving problems step-by-step, and identifying possible mistakes.
- Identifies relevant scientific papers and extracts information to update charts with timely data.
- Supports image generation natively, without an intermediary step.
Gemini Pro
- An improvement over LaMDA in reasoning, planning, and understanding capabilities.
- Processes up to 1.4 million words, two hours of video, or 22 hours of audio, and can reason across or answer questions about all that data.
- Available through Vertex AI and AI Studio, with fine-tuning and grounding capabilities for specific contexts and use cases (see the API sketch below).
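To make the AI Studio route concrete, here is a minimal sketch of a text-generation call using the `google-generativeai` Python SDK. The API key and prompt are placeholders, and the SDK surface may have changed since this was written; treat it as an illustration rather than a definitive integration.

```python
# Minimal text-generation call via Google AI Studio's Python SDK
# (pip install google-generativeai). Assumes an API key from AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

model = genai.GenerativeModel("gemini-1.5-pro")  # model name current as of mid-2024
response = model.generate_content(
    "Summarize the plot of Hamlet in three sentences."
)
print(response.text)
```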
Gemini Flash
- Small and efficient, designed for narrow, high-frequency generative AI workloads.
- Analyzes audio, video, images, and text but only generates text.
- Suitable for summarization, chat apps, image and video captioning, and data extraction from long documents and tables (see the captioning sketch below).
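The “multimodal in, text out” shape is easy to see in code. Here’s a hedged sketch of image captioning with Flash using the same SDK as above; the image file and API key are placeholders.

```python
# Image captioning with Gemini 1.5 Flash: multimodal input, text-only output.
# Same google-generativeai SDK as the earlier sketch; file name is a placeholder.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder

image = PIL.Image.open("vacation_photo.jpg")  # any local image
model = genai.GenerativeModel("gemini-1.5-flash")

# A content list can mix text and images; the model responds with text only.
response = model.generate_content(
    ["Write a one-sentence caption for this image.", image]
)
print(response.text)
```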
Gemini Nano
- A much smaller version of the Gemini Pro and Ultra models, efficient enough to run directly on some phones.
- Powers features like Summarize in Recorder and Smart Reply in Gboard on the Pixel 8 Pro, Pixel 8, and Samsung Galaxy S24.
- Future versions of Android will use Nano to alert users to potential scams during calls, and TalkBack will employ Nano to create aural descriptions of objects for low-vision and blind users.
Is Gemini Better Than OpenAI’s GPT-4?
Google has claimed that Gemini Ultra exceeds current state-of-the-art results on 30 of the 32 widely used academic benchmarks for large language models. However, benchmarks may not be the best indicator of model performance. OpenAI’s GPT-4o outperforms Gemini 1.5 Pro on text evaluation, visual understanding, and audio translation, while Anthropic’s Claude 3.5 Sonnet surpasses both.
How Much Do the Gemini Models Cost?
Gemini 1.0 Pro, 1.5 Pro, and Flash are available through Google’s Gemini API for building apps and services, all with free tiers. However, the free tiers impose usage limits and leave out some features, like context caching.
Here’s the base pricing as of June 2024:
- Gemini 1.0 Pro: 50 cents per 1 million input tokens; $1.50 per 1 million output tokens.
- Gemini 1.5 Pro: $3.50 per 1 million input tokens for prompts up to 128,000 tokens, $7 per 1 million for longer prompts; $10.50 per 1 million output tokens for prompts up to 128,000 tokens, $21 per 1 million for longer prompts.
- Gemini 1.5 Flash: 35 cents per 1 million input tokens for prompts up to 128,000 tokens, 70 cents per 1 million for longer prompts; $1.05 per 1 million output tokens for prompts up to 128,000 tokens, $2.10 per 1 million for longer prompts.
Tokens are subdivided bits of raw data, like the syllables “fan,” “tas,” and “tic” in the word “fantastic”; 1 million tokens is equivalent to about 700,000 words. “Input” refers to tokens fed into the model, while “output” refers to tokens that the model generates.
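To make the tiered pricing concrete, here is a back-of-the-envelope cost calculator using the 1.5 Pro rates quoted above. The function is illustrative, not part of any Google SDK, and real billing may differ (free-tier limits, context caching, and so on).

```python
# Rough cost estimate for a single Gemini 1.5 Pro request, using the
# June 2024 rates quoted above. Illustrative only -- not a Google API.

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated charge for one request, in US dollars."""
    long_prompt = input_tokens > 128_000  # rates double past 128K input tokens
    input_rate = 7.00 if long_prompt else 3.50     # $ per 1M input tokens
    output_rate = 21.00 if long_prompt else 10.50  # $ per 1M output tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a 10,000-token prompt that yields a 1,000-token answer.
print(f"${estimate_cost_usd(10_000, 1_000):.4f}")  # -> $0.0455
```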
Ultra pricing has yet to be announced, and Nano is still in early access.
Is Gemini Coming to the iPhone?
It might be! Apple and Google are reportedly in talks to put Gemini to use for several features in an upcoming iOS update later this year. Nothing is definitive yet, as Apple is also in talks with OpenAI and has been developing its own generative AI capabilities.
Following a keynote presentation at WWDC 2024, Apple SVP Craig Federighi confirmed plans to work with additional third-party models, including Gemini, but didn’t provide additional details.
This post was originally published on February 16, 2024, and has been updated to include new information about Gemini and Google’s plans for it.