from ai_infra.embeddings import MultimodalEmbeddings

Provider-agnostic embeddings for mixed text and image inputs.

Generates a single embedding vector from an ordered sequence of text strings and/or images. Supports interleaved content (e.g. caption + image + follow-up text) where the provider supports it.

Supported providers:
- voyage: Voyage AI voyage-multimodal-3.5 (single-backbone, best RAG)
- cohere: Cohere embed-v4.0 (128K context, multilingual)
- google_vertexai: Google multimodalembedding@001 (Vertex AI)
- amazon: Amazon Titan image embeddings (AWS Bedrock)

Requires at least one of: VOYAGE_API_KEY, COHERE_API_KEY, GOOGLE_APPLICATION_CREDENTIALS, or AWS_ACCESS_KEY_ID.
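Before relying on auto-detection, it can help to check which of the credential variables listed above are actually set. The following is a minimal sketch using only the standard library. The helper name `configured_providers` and the mapping `PROVIDER_ENV_VARS` are hypothetical and not part of ai_infra, and the order in which the library itself tries providers is not documented here, so the order below should not be read as its priority.

```python
import os

# Credential variables named in the requirements above, keyed by provider.
# This mapping and its ordering are assumptions made for illustration only.
PROVIDER_ENV_VARS = {
    "voyage": "VOYAGE_API_KEY",
    "cohere": "COHERE_API_KEY",
    "google_vertexai": "GOOGLE_APPLICATION_CREDENTIALS",
    "amazon": "AWS_ACCESS_KEY_ID",
}


def configured_providers(env=None):
    """Return the providers whose credential variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured providers:", ", ".join(found) if found else "none")
```

An empty result suggests that none of the four variables is visible to the process, in which case auto-detection would presumably have nothing to select.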
from pathlib import Path

from ai_infra import MultimodalEmbeddings

emb = MultimodalEmbeddings()  # auto-detects provider

# Embed a single image
vector = emb.embed([Path("photo.jpg")])

# Embed image + caption together
vector = emb.embed([Path("photo.jpg"), "a picture of a mountain"])

# Batch embedding
vectors = emb.embed_batch([
    [Path("img1.jpg"), "caption one"],
    [Path("img2.png"), "caption two"],
])

# Async (call from inside a coroutine)
vector = await emb.aembed([Path("photo.jpg")])

Providers:
- voyage / voyage_ai: Voyage AI (VOYAGE_API_KEY)
- cohere: Cohere (COHERE_API_KEY)
- google / google_vertexai / vertexai: Google Vertex AI (GOOGLE_APPLICATION_CREDENTIALS or GOOGLE_CLOUD_PROJECT)
- amazon / bedrock / aws: Amazon Bedrock (AWS_ACCESS_KEY_ID)
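Several providers above accept more than one name. If application code needs to compare or log provider names consistently, a small lookup table can collapse the aliases. This is a sketch under stated assumptions: `canonical_provider` is a hypothetical helper and not part of ai_infra, the choice of canonical names simply follows the supported-providers listing, and the case-insensitive, whitespace-tolerant matching is an illustrative choice rather than documented library behavior.

```python
# Alias table transcribed from the provider list above. Which name counts as
# "canonical" is an assumption; the library may normalize names differently.
_ALIASES = {
    "voyage": "voyage",
    "voyage_ai": "voyage",
    "cohere": "cohere",
    "google": "google_vertexai",
    "google_vertexai": "google_vertexai",
    "vertexai": "google_vertexai",
    "amazon": "amazon",
    "bedrock": "amazon",
    "aws": "amazon",
}


def canonical_provider(name: str) -> str:
    """Map a provider alias to a single canonical name (hypothetical helper)."""
    key = name.strip().lower()
    try:
        return _ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None
```

For example, `canonical_provider("bedrock")` and `canonical_provider("aws")` both map to `"amazon"`, while an unlisted name raises `ValueError` instead of silently passing through.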