Meet Kate: Your AI-Powered, Live Multimodal Website Assistant 🤖

Médéric Hurier (Fmind)
4 min read · Feb 15, 2025

Imagine a world where you can interact with websites using just your voice, having a conversation with an AI assistant that understands your needs and retrieves the information you’re looking for. That’s the world Kate is building.

Kate is an open-source (MIT), cutting-edge multimodal live assistant that leverages the power of Gemini 2.0 and Vertex AI Search to provide a seamless and engaging user experience. She listens to your questions, understands your intent, and responds with relevant information from the website you’re browsing.

How Kate Works ⚙️

Kate’s magic lies in her ability to combine several powerful technologies:

[Figure: Architecture of Kate: A Live Multimodal Website Assistant]

  • Multimodal Interaction: Kate uses Gemini 2.0’s live multimodal capabilities to process your voice input, generate natural language responses, and potentially add visual elements such as talking animations. This makes the interaction more natural and engaging than a traditional text-based interface.
  • Real-Time Communication: Built on the Pipecat framework, Kate integrates with platforms like Daily.co, so you can talk to her during live meetings or calls. Imagine getting instant answers to your questions without interrupting the flow of the conversation.
  • Website Search: Kate uses Vertex AI Search to search the website’s content and answer your queries precisely, so you get the most relevant information quickly.
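
To make the flow above more concrete, here is a minimal sketch of one turn: a transcribed question goes in, a search step runs over the website's content, and a grounded reply comes out. This is not Kate's actual code. The names `Page`, `search_site`, and `answer` are hypothetical, a naive keyword match stands in for Vertex AI Search, and a plain string template stands in for the Gemini response.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One page of website content (hypothetical structure)."""
    url: str
    text: str


def search_site(pages, query, top_k=2):
    """Stand-in for the search step: rank pages by keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = []
    for page in pages:
        words = set(page.text.lower().split())
        score = len(terms & words)
        if score > 0:
            scored.append((score, page))
    # Highest overlap first; ties keep their original order (stable sort).
    scored.sort(key=lambda item: item[0], reverse=True)
    return [page for _, page in scored[:top_k]]


def answer(pages, transcript):
    """Stand-in for the model turn: reply only with content found on the site."""
    hits = search_site(pages, transcript)
    if not hits:
        return "I could not find that on this website."
    sources = ", ".join(page.url for page in hits)
    return f"Here is what I found: {hits[0].text} (sources: {sources})"
```

In a real assistant, the transcript would come from the live audio stream and the search function would be exposed to the model as a tool, but the shape of the turn stays the same: listen, search, answer from what was found.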

The Potential of Live Assistants 🔮

Kate is a glimpse into the future of human-computer interaction. Live assistants like her have the potential to:

  • Make websites more accessible: People with disabilities or those who prefer voice interaction can easily access and navigate websites.
  • Enhance productivity: Quickly find information without typing or clicking through multiple pages.
  • Personalize the browsing experience: Kate can learn your preferences and provide tailored recommendations.
  • Revolutionize customer service: Imagine getting instant support from a knowledgeable AI assistant while browsing a website.
  • Transform education: Students can interact with educational materials in a more engaging and interactive way.

Kate in Action: Real-World Examples 🌍

  • E-commerce: Kate can help you find the perfect product, answer questions about shipping and returns, and even provide personalized recommendations based on your browsing history.
  • Healthcare: Kate can assist patients in finding information about their conditions, scheduling appointments, and accessing medical records.
  • Finance: Kate can help you manage your finances, track your investments, and get answers to your financial questions.

Check out this playlist for a demo of Kate on Open Textbook Library:

YouTube Playlist: https://www.youtube.com/playlist?list=PLPCnNL6Y2PbTzUxmsFICoQj0rx_PmVnk-

Lessons Learned from Building Kate 🎓

Building Kate was an exciting journey, but it also came with its challenges:

  1. Building Live Multimodal Assistance is Hard: Dealing with audio/video codecs, live transmissions, and browser permissions is tricky and requires solid knowledge of how media and network protocols work.
  2. Live Interaction is Dynamic: Conversations can be interrupted at any moment, so the assistant must adapt and keep its context, and the developer must work in an asynchronous paradigm.
  3. Toolkits Need to Improve: Building Kate was possible, but combining all the necessary tools required perseverance and custom development. Hopefully, new frameworks will be released to improve the developer experience.
  4. The Magic of Live Interaction: Interacting with Kate feels incredibly natural and removes the friction of typing, which makes the experience truly mesmerizing. There is great potential for organizations willing to invest in this new way of interacting with computers.
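
The second lesson, interruptions in an asynchronous paradigm, can be illustrated with a small sketch based only on Python's standard `asyncio` module. It is an assumption about how such logic could look, not how Kate or Pipecat implements it: `speak` and `run_turn` are hypothetical names, and appending words to a list stands in for streaming audio.

```python
import asyncio


async def speak(words, spoken):
    """Pretend to stream a reply word by word; cancellation stops it mid-sentence."""
    for word in words:
        spoken.append(word)
        # Yield to the event loop, as real audio I/O would between chunks.
        await asyncio.sleep(0)


async def run_turn(words, interrupt_after):
    """Start a reply and cut it off once the user barges in.

    The user is simulated: they interrupt after hearing `interrupt_after` words.
    Returns the words actually spoken and whether the reply was interrupted.
    """
    spoken = []
    reply = asyncio.create_task(speak(words, spoken))
    # Let the reply run until the simulated interruption point or its natural end.
    while len(spoken) < interrupt_after and not reply.done():
        await asyncio.sleep(0)
    interrupted = not reply.done()
    reply.cancel()  # no effect if the reply has already finished
    try:
        await reply
    except asyncio.CancelledError:
        pass  # expected when the reply was cut off
    return spoken, interrupted
```

Calling `asyncio.run(run_turn(["Kate", "is", "speaking", "right", "now", "ok"], 2))` stops the reply early and reports the words spoken so far, which is the context the assistant needs to keep in order to resume the conversation sensibly.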

Appreciating the Progress of Generative AI 🚀

Just a couple of years ago, building a live assistant like Kate seemed like a distant dream. Today, thanks to the rapid advancements in generative AI, it’s a reality. Kate is a testament to the progress we’ve made and a reminder that we’re still just scratching the surface of what’s possible.

Kate is open source and available on GitHub. If you’re interested in exploring the future of human-computer interaction, I encourage you to check out the project and contribute to its development. If you want to build a new solution, feel free to reach out to me on LinkedIn or through my website: https://www.fmind.dev/

Photo by Andy Kelly on Unsplash

Written by Médéric Hurier (Fmind)

Freelancer: AI/ML/MLOps/LLMOps/AgentOps Engineer | Data Scientist | Python Developer | MLOps Community Organizer | MLOps Coding Course | MLflow Ambassador | PhD
