Oras's comments | Hacker News

Nice! I've always looked for a way to animate diagrams, as it helps a lot in visualising a workflow.

Feedback:

1. I tried different mermaid diagrams from https://mermaid.live/, and your animation only works with class diagrams and flowcharts. It didn't work with the sequence diagram (which is the most interesting one to me).

2. It would be great to be able to control the animation as a sequence instead of one animation for all arrows at once. What I would like to do is show fellow devs the workflow from start to finish, according to the spec.

I appreciate that this is just a start, but it looks promising and has great potential. Good luck!


Thanks for the valuable feedback. I've already covered most of the issues you described. They should be good to go in the next release.

AI Agent for Kafka Consumer group

/s


How does it detect subscriptions? Or do I have to enter them manually?

They are agile

Went to Ahrefs to check a domain, saw a 500, and came here to check.

I have a few domains on Cloudflare and all of them are working fine, so it might not be a global issue.


People who use these services most likely don’t use ChatGPT

Why? Did I miss something? There's no indication that OpenAI has been collecting personal information about me (other than the typical name, payment info, and email) for reasons other than providing the actual service.

The hardest part of RAG is document parsing. If you only consider text, it should be OK, but once you have tables, tables spanning multiple pages, charts, a TOC to skip when present, footnotes, etc., that part becomes really hard, and the accuracy of the retrieved context suffers regardless of which chunking you use.

There are some patterns that help, such as RAPTOR, where you make ingestion content-aware: instead of just ingesting the raw content, you use LLMs to generate questions about and summaries of the content and save those to the vector database too (see the sketch below).
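
A minimal sketch of that pattern in Python (untested; `summarise`, `generate_questions`, and `embed` are stand-ins for LLM and embedding-model calls, and the real RAPTOR work goes further by clustering chunks and summarising them recursively into a tree):

    def summarise(text: str) -> str:
        return text[:200]                    # stand-in for an LLM summary call

    def generate_questions(text: str) -> list[str]:
        return ["What is this about? " + text[:40]]   # stand-in for an LLM call

    def embed(text: str) -> list[float]:
        return [float(len(text))]            # stand-in for a real embedding model

    def ingest(document: str, chunk_size: int = 1000) -> list[dict]:
        chunks = [document[i:i + chunk_size]
                  for i in range(0, len(document), chunk_size)]
        records = []
        for chunk in chunks:
            # Index the raw chunk itself...
            records.append({"text": chunk, "embedding": embed(chunk)})
            # ...plus an LLM summary and synthetic questions, so retrieval
            # can match broader queries that no raw chunk answers verbatim.
            for derived in [summarise(chunk), *generate_questions(chunk)]:
                records.append({"text": derived,
                                "embedding": embed(derived),
                                "parent": chunk})
        return records   # upsert these into the vector database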

But the reality is, a one-size-fits-all solution for RAG is not an easy task.


Super noob in vector embeddings here: I never considered that tables would be a complicating factor (beyond defining a parseable format for ingestion).

Do vector databases do better with long grouped text than with table formats?


The issue is the ingestion (extracting the right data in the right format). This is mainly a problem with PDFs, and sometimes with tables embedded as images in Docx files too. You need a mix of text extraction and OCR to get the data out correctly before you start chunking and adding embeddings.
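
A rough sketch of that mixed approach in Python (untested; pdfplumber and pytesseract are one possible pairing I'm assuming here, not the only option):

    import pdfplumber        # embedded-text and table extraction
    import pytesseract       # OCR for scanned / image-only pages

    def extract_pages(path: str) -> list[str]:
        """Extract text per page, falling back to OCR when a page
        has no embedded text layer (e.g. a scan or an image)."""
        texts = []
        with pdfplumber.open(path) as pdf:
            for page in pdf.pages:
                text = page.extract_text() or ""
                if not text.strip():
                    # Rasterise the page and OCR it instead.
                    image = page.to_image(resolution=300).original
                    text = pytesseract.image_to_string(image)
                texts.append(text)
        return texts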

I really like the idea. I would love a feature to add keywords and see related news.

I can believe that many startups are doing prompt engineering and agents, but in a sense this is like saying 90% of startups are using cloud providers, mainly AWS and Azure.

There is absolutely no point in reinventing the wheel to create a generic LLM and spending a fortune on GPUs while there are providers offering this power cheaply.


In addition, there may be value in getting to market quickly with existing LLM providers, proving out the concept, then building / training specialized models if needed once you have traction.

See: https://en.wikipedia.org/wiki/Lean_startup


I got excited reading the article about releasing the training data, went to their HF account to look at the data (dolma3), and the first rows? Text scraped from porn websites!

https://huggingface.co/datasets/allenai/dolma3


Isn’t this before any curation has happened? I looked at it, and I can see why it looks bad, but if they’re really being open about the whole pipeline, they have to include everything. Giving them a hard time for it only promotes keeping models closed.

That said, I like to think that if it were my dataset, I would have shuffled that part down the list so it didn’t show up in the HF preview.


Hard time? What value do adult video descriptions, view counts, and comments add to small (7B / 32B) models?


It says it’s Common Crawl; I interpret that to mean this is a generic web-scrape dataset, and presumably they filter out the stuff they don’t want before pretraining. You’d have to do some ablation testing to know what value it adds.


Common Crawl is a particular dataset. commoncrawl.org


what if that's where they learned how to utilize the double entendre? hard times indeed.


Erotic fiction is one of the main use cases of such models.

