Building Multimodal Search with Milvus: Combining Images and Text for Better Search Results

Published Feb 20, 2025

Learn how to build a multimodal search application using open-source tools and models. This tutorial shows how to combine image and text search using Milvus, an open-source vector database. Follow along as we build a three-step pipeline of indexing, retrieval, and reranking with Hugging Face models, PyTorch, and large vision-language models (LVLMs). We'll show you how to:

- Set up multimodal search using only open-source tools
- Implement image and text embedding with Visualized BGE
- Create and query a vector database using Milvus
- Improve search results with reranking based on a large vision-language model (LVLM)
- Build a practical example searching Amazon product images

This tutorial is aimed at developers who want to implement search that understands both visual and textual content.
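
To make the three stages concrete, here is a minimal, self-contained sketch of an index, retrieve, and rerank pipeline. It is not the tutorial's code: `toy_embed` is a hypothetical stand-in for a multimodal encoder such as Visualized BGE, the in-memory list stands in for a Milvus collection, the `rerank` rule stands in for a vision-language model, and captions stand in for product images.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a multimodal encoder: a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(items):
    """Step 1, indexing: embed every item once and keep the vectors.
    A vector database would store these; here it is a plain list."""
    return [(item, toy_embed(item["caption"])) for item in items]


def retrieve(index, query, top_k=3):
    """Step 2, retrieval: return the top_k items most similar to the query."""
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, vec), item) for item, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_k]]


def rerank(candidates, query):
    """Step 3, reranking: reorder the shortlist with a second scoring pass.
    This word-overlap rule is only a placeholder for a model-based reranker."""
    query_words = set(query.lower().split())

    def overlap(item):
        return len(query_words & set(item["caption"].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)


# Toy catalog: the captions are made-up examples, not tutorial data.
catalog = [
    {"id": 1, "caption": "phone case with leopard print"},
    {"id": 2, "caption": "laptop sleeve in grey felt"},
    {"id": 3, "caption": "phone case in plain black"},
    {"id": 4, "caption": "leopard print scarf"},
]

index = build_index(catalog)
shortlist = retrieve(index, "leopard print phone case", top_k=3)
results = rerank(shortlist, "leopard print phone case")
print([item["id"] for item in results])
```

The split matters because retrieval runs a cheap similarity search over the whole collection, while reranking applies a slower, more careful scorer to only a handful of candidates.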

Complete code and instructions are available on the Zilliz site.