Use function calling/tool use, not XML output. The models are all trained for th...

carpo · 2026-05-23T00:26:34 1779495994

Hey, just want to thank you for this suggestion. Spent this morning swapping to open router and changing all my prompts to use tools instead of XML. Not only is Gemma and Gemini much cheaper, the output tokens from the tool call are much less too. Cost to analyse one 20 minute video with 10 snapshots went from $0.21 to $0.009, and I'm even sending full HD snapshots instead of the 960x540 ones I was sending before (to save costs). The results so far are pretty good. It looks like the larger images are giving the model more context, so in some cases making the cheaper models results better than the expensive models. I'm going to run this over a few hundred videos today and see how it goes in bulk!

nl · 2026-05-25T02:32:21 1779676341

Ha! So glad it helped you!

Very interested in the full run details.

carpo · 2026-05-26T23:49:21 1779839361

Yeah, it's been awesome! I'm so excited about tool calls and function use, the possibilities are huge. I ran it over 1494 videos that range in length from a few seconds to over 3 hours. Total duration 260 hours and a total size of 3795 GB. I don't know exactly how long it took to run, as I found some bugs I needed to fix when processing mkv files, but it was probably around 24 hours in total. That wasn't all LLM requests, but also the local Whisper transcription and frame extraction / analysis. I used gemini-3.1-flash-lite-preview for the content analysis and tagging. Analysis cost $9.22 and Tagging cost $2.72 and the results seem great (for comparison, I did 885 videos a few weeks ago with Sonnet and it cost $130 in total). Gemini seems much less verbose than Sonnet, even with the same prompt, so the descriptions are much shorter, but they seem very good. The tagging is great. Another added bonus has been that with the larger screenshots being sent, the LLM can now read much more of the text it sees on screen. Some of my videos are top-down showing me drawing and writing, and now it picks that up, so it's all indexed and searchable. I tested a few models with the RAG Chat feature, and the best one so far is GPT4.1-Mini. Before, when asking questions about the library or a video it was around 4 cents each query, now its averaging about half a cent.

carpo · 2026-05-22T05:39:11 1779428351

Very interesting. Thank you!