Ever wanted to ask a question on a Youtube video and didn’t want to watch the whole 1hr video? Same here. With Open-WebUI and Ollama - you can. Even better: It can be done locally and fairly efficiently.

When I first saw this functionality mentioned in the documentation, I was eager to try it out. Unfortunately, the documentation wasn’t very verbose on getting this to work. It didn’t explain how to prompt with a Youtube video context, and it didn’t explain the settings that you had to set. The internet didn’t provide a very good reference to get this to work either.

This post should serve as a means to explain that. Also, there’s pictures!

Assumption Here

The assumption for this blog post is that you already have Ollama, Gemma3 (or another model), and Open-WebUI already setup. There’s a lot to get to this point, so if you haven’t…go ahead and do that first and come back here.

Setting up your environment (One time only)

Enable the option Hybrid Search under the Admin Panel->Settings->Documents settings page.

Going about prompting on a Youtube video

The video I used was a flight review of China Southern Airlines

Start your chat off with the # character in the chat, copy and paste the YouTube link and wait for it to pop up an option to select the video:

PromptingForYoutube

This will take a minute or two for the UI option to become available. The video is being downloaded during this delay.

After that, ask away. I asked about the video: "What was the narrator shocked about?"

Results

After the response from Gemma, I got the answer back:

“The narrator was shocked to discover free inflight Wi-Fi, as well as to realize that a meal was being served as breakfast at 10:00 p.m. when the local time at their destination would be 7:30 p.m”

Not a bad answer.

To be able to interact with the video was pretty exciting. There are a lot of videos out there that are a lot longer than they should be. This is a great way to ask information and to extract summaries.

In the end I got this kind of response out of Gemma3-12b. (That’s a pretty small model)

More Questions that were asked of this video

My Impression

I’m geniunely impressed. While it’s not perfect, but it’s a great way to interact with informationally dense videos and to keep your data local and private.