Another thing you’re running into is the context window. Ollama sets a low context window by default, like 4096 tokens IIRC. The reasoning process can easily take more than that, at which point it is forgetting most of its reasoning and any prior messages, and it can get stuck in loops. The solution is to raise the context window to something reasonable, such as 32k.
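
For example, if you are calling Ollama's HTTP API directly, you can override the context length per request with the num_ctx option instead of editing the model. A minimal sketch, assuming a default install listening on localhost:11434 and a model you have already pulled:

  curl http://localhost:11434/api/generate -d '{
    "model": "deepseek-r1:14b",
    "prompt": "Why is the sky blue?",
    "stream": false,
    "options": { "num_ctx": 32768 }
  }'

Per-request options save you from baking a new model tag just to change the window, though clients that don't expose options still need the saved-model approach described below.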

Instead of this very high latency remote debugging process with strangers on the internet, you could just try out properly configured models on the hosted Qwen Chat. Obviously the privacy implications are different, but running models locally is still a fiddly thing even if it is easier than it used to be, and configuration errors are often mistaken for bad model performance. If the models meet your expectations in a properly configured cloud environment, then you can put in the effort to figure out local model hosting.



I can't believe Ollama hasn't fixed the default context window limit yet.

I wrote a step-by-step guide on how to set up Ollama with a larger context length a while ago: https://prompt.16x.engineer/guide/ollama

TLDR

  # start an interactive session with the base model
  ollama run deepseek-r1:14b
  # inside the session, raise the context window and save it as a new model tag
  /set parameter num_ctx 8192
  /save deepseek-r1:14b-8k
  # exit the session, then serve the saved model to API clients
  ollama serve
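
If you only want an interactive chat rather than an API server, you can also just run the saved tag directly (the -8k name here is whatever you passed to /save):

  ollama run deepseek-r1:14b-8k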



