
I find this approach of mixing prompts, code, and dependency loading slows down development. Product people should iterate and test various prompts, and developers should focus on code. Am I wrong to expect this?


Not wrong. I think you are probably right. One reason we aren't seeing that is that the space is constantly evolving: we're in an era before best practices solidify and become obvious.

Also, langchain is, at best, not that useful and silly.


What if you're depending on aspects of the LLM output that are caused by the prompt? How can you make sure that the product person doesn't cause the output to lose some of its more machine-readable aspects, while still giving the product person leeway to improve the prompts?

Maybe there is a way to do this, but my toy fiddlings would encounter issues if I tried to change my prompt in total isolation from caring about the formatting of the output.

To give a concrete example, I've been using local CPU bound LLMs to slowly do basic feature extraction of a very long-running (1000+ chapters) niche fan fiction-esque story that I've been reading. Things like "what characters are mentioned in this chapter?", features which make it easier to go back and review what a character actually did if we haven't been following them for a while.

To get my data from my low-rent LLMs in a nice and semi-machine-readable format, I've found it best to ask for the response to be formatted in a bulleted list. That way I can remove the annoying intro prefix/postfix bits that all LLMs seem to love adding ("sure, here's a list..." or "... hopefully that's what you're looking for").
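The strip-the-chatter step can be a simple line filter: keep only lines that look like bullet items and drop everything else. A minimal sketch (the function name and the sample text are illustrative, not from my actual pipeline):

```python
import re

def extract_bullets(llm_output: str) -> list[str]:
    """Keep only lines that look like bullet items, dropping the
    chatty intro/outro text the model wraps around the list."""
    items = []
    for line in llm_output.splitlines():
        # Match lines starting with -, *, or a Unicode bullet.
        m = re.match(r"^\s*[-*\u2022]\s+(.*)", line)
        if m:
            items.append(m.group(1).strip())
    return items

raw = (
    "Sure, here's a list of characters mentioned:\n"
    "- Alice\n"
    "* Bob\n"
    "Hopefully that's what you're looking for!"
)
print(extract_bullets(raw))  # ['Alice', 'Bob']
```

This obviously breaks when the model stops emitting bullets, which is exactly the coupling problem: the parser silently returns an empty list, so it's worth asserting the list is non-empty and flagging chapters where parsing failed.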

I've found that innocent changes to the prompt, unrelated to my instructions to use a bulleted list, can sometimes cause the result formatting to become spotty, even though the features are being extracted better (e.g. it stops listing in bullets but it starts picking up on there being "unnamed character 1").

I've only been fiddling with things for about a week though, so maybe there's some fundamental knowledge I'm missing about the "LLM app pipeline architecture" which would make it clear how to solve this better; as it is now, I'm basically just piping things in and out of llama.cpp.

If folks have thoughts on addressing the prompt-to-output-format coupling, I'd love to hear about it!


I'm using GPT and its functions, which are basically JSON schemas for its output. It makes the formatting a lot more stable, but even then it's still doing tokenized completion and the function definition is just another aspect of the prompt.
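For the character-extraction use case upthread, such a function definition is just a JSON Schema describing the shape you want back. A minimal sketch (the function name, description, and field names are hypothetical, chosen for illustration):

```python
# A hypothetical function definition (i.e., output schema) for pulling
# character names out of a chapter. The model is steered to "call" this
# function, so its output arrives as JSON matching the schema instead of
# free-form prose with a bulleted list buried in it.
extract_characters_fn = {
    "name": "record_characters",
    "description": "Record every character mentioned in the chapter.",
    "parameters": {
        "type": "object",
        "properties": {
            "characters": {
                "type": "array",
                "description": "Character names exactly as they appear in the text.",
                "items": {"type": "string"},
            }
        },
        "required": ["characters"],
    },
}
```

You'd pass this alongside the prompt in the API call. As noted below, the `description` strings and parameter names are themselves prompt material, so changing them is prompt engineering too.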

I've done some productive collaborating with someone who only works at the prompt level, but you can't really hand off in my experience. You can do some things with prompts, but pretty soon you are going to want to change the pipeline or rearrange how data is output. Sometimes you just won't get a good response that is formatted the way you want, and have to accept a different function output and write a bit of code to turn it into the representation you want.

Also the function definition (i.e., output schema) looks separate from the prompt, but you absolutely shouldn't treat it like that; every description in that schema matters, as do parameter names and order. You can't do prompt engineering without being able to change those things, but now you will find yourself mucking in the code again. (Though I make the descriptions overridable without changing code.)

Anyway, all that just to say that I agree that code and prompt can't be well separated, nor can pipeline and prompt.


You can use function definitions and few-shot prompting (examples) together to great effect.

E.g. I was trying to build a classifier for game genres from descriptions. I could use the function definitions to easily define all the main genres and add subgenres as enums. That ensured the taxonomy was pretty much always followed. But then I used examples to coerce the model into outputting in {'strategy':'turn-based'} format rather than {'genre':'strategy', 'subgenre':'turn-based'} format. The tokens saved on that could then be used to do more classifications per GPT call, making the whole thing cheaper.
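The combination described above can be sketched as an enum-constrained schema plus a few-shot message pair nudging the model toward the compact output shape. All names, genres, and example text here are hypothetical placeholders:

```python
# Hypothetical enum-constrained function definition: the enums pin down
# the taxonomy so the model can't invent genres.
classify_genre_fn = {
    "name": "classify_genre",
    "description": "Classify a game description into genre and subgenre.",
    "parameters": {
        "type": "object",
        "properties": {
            "genre": {"type": "string", "enum": ["strategy", "rpg", "shooter"]},
            "subgenre": {"type": "string", "enum": ["turn-based", "real-time", "tactical"]},
        },
        "required": ["genre"],
    },
}

# A few-shot example pair steering the model toward the shorter
# {"strategy": "turn-based"} shape, saving output tokens per call.
few_shot = [
    {"role": "user", "content": "Command armies across hex maps, one turn at a time."},
    {"role": "assistant", "content": '{"strategy": "turn-based"}'},
]
```

The schema enforces the vocabulary while the examples enforce the layout; each handles the part the other can't.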




