Hey, just want to thank you for this suggestion. Spent this morning swapping to OpenRouter and changing all my prompts to use tools instead of XML. Not only are Gemma and Gemini much cheaper, the tool calls use far fewer output tokens too. Cost to analyse one 20 minute video with 10 snapshots went from $0.21 to $0.009, and I'm even sending full HD snapshots instead of the 960x540 ones I was sending before (to save costs). The results so far are pretty good. It looks like the larger images are giving the model more context, so in some cases the cheaper models' results are better than the expensive models'. I'm going to run this over a few hundred videos today and see how it goes in bulk!
Yeah, it's been awesome! I'm so excited about tool calls and function use, the possibilities are huge. I ran it over 1494 videos that range in length from a few seconds to over 3 hours. Total duration was 260 hours, with a total size of 3795 GB. I don't know exactly how long it took to run, as I found some bugs I needed to fix when processing mkv files, but it was probably around 24 hours in total. That wasn't all LLM requests; it also includes the local Whisper transcription and frame extraction / analysis. I used gemini-3.1-flash-lite-preview for the content analysis and tagging. Analysis cost $9.22 and tagging cost $2.72, and the results seem great (for comparison, I did 885 videos a few weeks ago with Sonnet and it cost $130 in total). Gemini seems much less verbose than Sonnet, even with the same prompt, so the descriptions are much shorter, but they seem very good. The tagging is great. Another added bonus is that with the larger screenshots being sent, the LLM can now read much more of the text it sees on screen. Some of my videos are top-down showing me drawing and writing, and now it picks that up, so it's all indexed and searchable. I tested a few models with the RAG Chat feature, and the best one so far is GPT4.1-Mini. Before, when asking questions about the library or a video, it was around 4 cents per query; now it's averaging about half a cent.
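For a rough sense of what those totals mean per video, here's a quick back-of-the-envelope sketch using only the figures quoted above. The two runs covered different sets of videos, so treat the ratio as approximate; the variable names are just for illustration.

```python
# Figures quoted in the comment above.
gemini_total = 9.22 + 2.72   # analysis + tagging cost in USD
gemini_videos = 1494

sonnet_total = 130.0         # earlier run with Sonnet, in USD
sonnet_videos = 885

# Average cost per video for each run.
gemini_per_video = gemini_total / gemini_videos
sonnet_per_video = sonnet_total / sonnet_videos

# How many times cheaper the newer run was, per video.
ratio = sonnet_per_video / gemini_per_video

print(f"Gemini run: ${gemini_per_video:.4f} per video")
print(f"Sonnet run: ${sonnet_per_video:.4f} per video")
print(f"Roughly {ratio:.0f}x cheaper per video")
```

That works out to a bit under a cent per video versus roughly 15 cents, so somewhere around 18x cheaper on average.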
I.e., instead of telling it to generate XML, give it a function. That is actually a JSON schema, and the models do great at it. Here are the Claude docs, but they are all similar: https://platform.claude.com/docs/en/agents-and-tools/tool-us...
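To make "give it a function" a bit more concrete, here's a minimal sketch of what a tool definition can look like in the OpenAI-style format that OpenRouter accepts. The tool name `tag_video` and its fields are made up for illustration, and the response at the bottom is simulated rather than a real API call. Claude's native API uses the same idea but calls the schema field `input_schema` instead of `parameters`.

```python
import json

# Hypothetical tool definition: the name and fields are assumptions,
# not taken from anyone's actual project.
TAG_VIDEO_TOOL = {
    "type": "function",
    "function": {
        "name": "tag_video",
        "description": "Record a short description and tags for one video.",
        "parameters": {  # plain JSON Schema describing the arguments
            "type": "object",
            "properties": {
                "description": {"type": "string"},
                "tags": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["description", "tags"],
        },
    },
}


def parse_tool_call(tool_call: dict) -> dict:
    """Decode an OpenAI-style tool call and check the required fields exist."""
    fn = tool_call["function"]
    if fn["name"] != TAG_VIDEO_TOOL["function"]["name"]:
        raise ValueError(f"unexpected tool: {fn['name']}")
    # The arguments arrive as a JSON string, not a dict.
    args = json.loads(fn["arguments"])
    required = TAG_VIDEO_TOOL["function"]["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return args


# Simulated tool call, shaped like an entry in message["tool_calls"].
example_call = {
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "tag_video",
        "arguments": '{"description": "Top-down drawing session", '
                     '"tags": ["drawing", "handwriting"]}',
    },
}

result = parse_tool_call(example_call)
print(result["tags"])
```

You'd pass `TAG_VIDEO_TOOL` in the `tools` list of the request, and the model replies with arguments matching the schema instead of free-form markup you have to parse yourself.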