Google announced the General Availability (GA) of its Gemini Embedding 2 through the Gemini API and Gemini Enterprise Agent Platform. Launched last week, this sophisticated model allows developers to map diverse inputs—text, images, video, audio, and documents—into a single embedding space, supporting over 100 languages. This capability opens new avenues for applications ranging from multimodal retrieval-augmented generation (RAG) to visual search.
Gemini Embedding 2 is designed to handle an extensive variety of inputs with a single call, accommodating up to 8,192 text tokens, six images, 120 seconds of video, 180 seconds of audio, and six pages of PDFs. By integrating different modalities into one semantic space, developers can create nuanced experiences that interpret proprietary data in more meaningful ways.
One of the model’s standout features is its ability to process interleaved inputs, allowing for combinations of text and images in a single request. This enhances the model’s understanding of complex, real-world data. For developers needing separate embeddings for distinct inputs, the Batch API will soon offer that capability on the Agent Platform.
Applications of Gemini Embedding 2 are already being realized in various sectors. For instance, the legal research platform Harvey reported a 3% increase in Recall@20 precision on legal benchmarks after implementing the model, thereby providing more accurate citations and answers for law firms. Similarly, Supermemory has developed a “vector database for memory” that enables conceptual searching across disjointed memos. Since integrating the model, it has achieved a 40% increase in search Recall@1 accuracy.
The model also serves as a powerful tool for multimodal search. Nuuly, a clothing rental company owned by URBN, has utilized Gemini Embedding 2 for a visual search tool that matches images taken on the warehouse floor against their catalog. This implementation has dramatically improved their Match@20 accuracy from 60% to nearly 87% and boosted their overall product identification rate from 74% to over 90%.
In addition to visual search, the model is adept at enhancing retrieval pipelines. Embeddings can be recalibrated to rerank initial search results, ensuring that users receive the most relevant answers. For instance, developers can calculate distance metrics, such as cosine similarity or dot product scores, between embedded search results and user queries. This approach allows for a more refined selection of the best match based on contextual relevance.
Applications extend to clustering, classification, and anomaly detection as well. By creating clusters based on similarities, users can quickly identify hidden trends or outliers, making this feature ideal for sentiment analysis. The same task prefix can be used for both the query and document, which simplifies the embedding process.
Efficient storage and usage of these embeddings is another key aspect. They can be stored in vector databases such as Agent Platform Vector Search, Pinecone, Weaviate, Qdrant, or ChromaDB. The embeddings generated by Gemini Embedding 2 utilize Matryoshka Representation Learning (MRL), allowing for dimensional reduction to enhance storage efficiency without compromising accuracy. The default 3072-dimensional vectors can be truncated to dimensions of 1536 or 768 for optimal performance.
Gemini Embedding 2 marks a significant advancement in the realm of data interpretation and machine learning, promising to improve how businesses and developers approach complex datasets. As more organizations explore its capabilities, the model is poised to set new benchmarks in the fields of AI-driven search and data retrieval.
For developers eager to implement this groundbreaking model, the Gemini API and Agent Platform provide the necessary tools for diving into multimodal embeddings that enhance understanding across various industries.
See also
Sam Altman Praises ChatGPT for Improved Em Dash Handling
AI Country Song Fails to Top Billboard Chart Amid Viral Buzz
GPT-5.1 and Claude 4.5 Sonnet Personality Showdown: A Comprehensive Test
Rethink Your Presentations with OnlyOffice: A Free PowerPoint Alternative
OpenAI Enhances ChatGPT with Em-Dash Personalization Feature



















































