I like that you go beyond just prompt engineering and "LLM as a judge" and use finetuned (?) ModernBERT and Llama models.
In your previous post you mentioned that you "score 20+ dimensions". Are these generic dimensions for all use cases / users, or do you finetune individually for each user?
This is much needed on macOS. But why? Back in the day, macOS was supposed to "just work", and Linux was for weirdos who had too much time and customized everything. Now it's the opposite: vanilla Ubuntu just works out of the box, while macOS needs tons of third-party customization. Same with external monitors: on a Mac you need third-party software just to properly scale the UI on a 4K monitor. That's core OS functionality, not something that should require an extension.
> Emoji suggestion: Slack might suggest emoji reactions to messages using the content and sentiment of the message, the historic usage of the emoji and the frequency of use of the emoji in the team in various contexts. For instance, if [PARTY EMOJI] is a common reaction to celebratory messages in a particular channel, we will suggest that users react to new, similarly positive messages with [PARTY EMOJI].
Finally someone has figured out a sensible application for "AI". This is the future. Soon "AI" will carry the same connotation as "NFT".
"leadership" at my company tallies emoji reactions to their shitty slack messages and not reacting with emojies over a period of time is considered a slight against them.
I had to up my Slack emoji game after joining my current employer.
> To do this while protecting Customer Data, we might use an external model (not trained on Slack messages) to classify the sentiment of the message. Our model would then suggest an emoji only considering the frequency with which a particular emoji has been associated with messages of that sentiment in that workspace.
This is so stupid and needlessly complicated. And all it does is remove personality from messages, suggesting everyone conforms to the same reactions.
Finally. I am all for this AI if it is going to learn and suggest my passive-aggressive "here" emoji that I use when someone @here's a public channel with hundreds of people for no good reason.
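The quoted approach upthread (external sentiment classifier plus per-workspace emoji frequency) could be sketched roughly like this. All names here are hypothetical; the sentiment model is stubbed out, since the quote only says it's an external model not trained on Slack messages:

```python
from collections import Counter

def suggest_emoji(message, sentiment_model, workspace_history, top_k=3):
    """Suggest emojis for a message.

    sentiment_model: callable mapping message text to a sentiment label
                     (stand-in for the external classifier).
    workspace_history: list of (sentiment, emoji) pairs observed in this
                       workspace -- the only per-customer data consulted.
    """
    sentiment = sentiment_model(message)  # e.g. "positive" / "negative"
    # Rank emojis by how often they co-occurred with this sentiment here.
    counts = Counter(e for s, e in workspace_history if s == sentiment)
    return [emoji for emoji, _ in counts.most_common(top_k)]

# Toy workspace history and a stub classifier for illustration.
history = [("positive", "🎉"), ("positive", "🎉"),
           ("positive", "👍"), ("negative", "😢")]
print(suggest_emoji("We shipped it!", lambda m: "positive", history))
# → ['🎉', '👍']
```

Note how the message content only ever reaches the external classifier; the workspace data contributes nothing but co-occurrence counts, which is presumably the privacy argument in the quote.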
I agree that sampling only valid tokens is a very promising approach.
I experimented a bit with finetuning open-source LLMs for JSON parsing (without guided token sampling). Depending on the use case, 70B parameters might be overkill; I've seen promising results with much smaller models. Finetuning a small model combined with guided token sampling would be an interesting combination.
Then again, finetuning is perhaps not ideal for very general applications: when you get input you didn't anticipate in your training dataset, you're in trouble.
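For anyone unfamiliar with guided (constrained) token sampling, here is a toy sketch of the core idea: at each step, mask out every token that would make the output an invalid prefix of the target format, then sample only from what remains. The vocabulary, "model", and validity check below are all made-up stand-ins, not a real library API:

```python
def is_valid_prefix(text):
    # Toy grammar: output must be a prefix of one specific JSON object.
    # A real implementation would check against a JSON grammar or schema.
    target = '{"ok": true}'
    return target.startswith(text)

def constrained_decode(vocab, logits_fn, max_len=20):
    """Greedy decoding restricted to grammar-valid tokens.

    logits_fn: stand-in for the LLM -- maps the prefix so far to one
               logit per vocab entry.
    """
    out = ""
    while len(out) < max_len:
        logits = logits_fn(out)
        # The key step: drop any token that would break the grammar.
        allowed = [(tok, lg) for tok, lg in zip(vocab, logits)
                   if is_valid_prefix(out + tok)]
        if not allowed:
            break  # nothing valid left; output is complete
        tok, _ = max(allowed, key=lambda p: p[1])  # greedy over valid tokens
        out += tok
    return out

vocab = ['{', '}', '"ok"', ': ', 'true', 'false', 'hello']
# A toy "model" that strongly prefers an invalid token ('hello');
# the mask keeps the output well-formed anyway.
print(constrained_decode(vocab, lambda prefix: [1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 9.0]))
# → {"ok": true}
```

This is also why it pairs well with a finetuned small model: the mask guarantees syntactic validity, so the model only has to get the content right.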
So sad that we need third-party apps for even the most basic functionality in macOS. It used to be that Linux was a hassle to set up and macOS worked out of the box. Now it's quite the opposite.
Similar thing with using an external 4K monitor. By default it's blurry (looks like 720p, awful), and you need a third-party app (BetterDisplay) just to get a decent image. I've had this issue with several Macs, displays, and cables. Terrible user experience. It obviously works out of the box with Ubuntu, Debian, and Windows.
I wish it were easy & reliable to run Ubuntu on an M2 MacBook. That would be perfect - Ubuntu is a much more capable & convenient OS at this point, but nothing comes close to the MacBook in terms of hardware.