Nice! I've been looking for a way to animate diagrams, as it would help a lot in visualising workflows.
Feedback:
1. I tried different mermaid diagrams from https://mermaid.live/, and your animation is only working with classes and flowcharts. It didn't work with the sequence diagram (which is the most interesting to me).
2. It would be great to control the animation to be a sequence instead of one animation for all arrows at once. What I would like to do is show fellow devs the workflow from start to finish, according to the spec.
I appreciate that this is just a start, but it looks promising and has great potential. Good luck!
Why? Did I miss something? There's no indication that OpenAI has been collecting personal information about me (other than typical name, payment info, email) for reasons other than the actual service.
The hardest part of RAG is document parsing. If you only consider plain text it should be fine, but once you have tables, tables spanning multiple pages, charts, a table of contents to skip when present, footnotes, etc., that part becomes really hard, and the accuracy of the retrieved context suffers regardless of which chunking strategy you use.
There are patterns that help, such as RAPTOR, where you make ingestion content-aware: instead of just ingesting the raw content, you use LLMs to question and summarise it, and save those summaries to the vector database alongside the chunks.
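A minimal sketch of that RAPTOR-style idea: index raw chunks as usual, but also group them and index LLM-generated summaries at a higher level. The `summarize` function here is a placeholder for a real LLM call, and the grouping-by-position is a simplification (RAPTOR proper clusters by embedding similarity and builds a tree of summaries).

```python
def summarize(texts):
    # Placeholder: a real system would call an LLM here.
    # This stub just keeps the first sentence of each chunk.
    return " ".join(t.split(".")[0] for t in texts)[:200]

def raptor_ingest(chunks, group_size=3):
    """Return (text, level) records to index: raw chunks at level 0,
    group summaries at level 1. Retrieval can then hit either level."""
    records = [(chunk, 0) for chunk in chunks]
    for i in range(0, len(chunks), group_size):
        group = chunks[i:i + group_size]
        records.append((summarize(group), 1))
    return records

chunks = [
    "Revenue grew 10%. Details follow.",
    "Costs fell 5%. See table 2.",
    "Margins improved. Outlook stable.",
]
for text, level in raptor_ingest(chunks):
    print(level, text)
```

The point is that a query like "how did the quarter go overall?" can match the level-1 summary even when no single raw chunk answers it.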
But the reality is that a one-size-fits-all solution for RAG is not an easy thing to build.
The issue is ingestion (extracting the right data in the right format). This is mainly a problem with PDFs, and sometimes with DOCX files when tables are embedded as images. You need a mix of text extraction and OCR to get the data out correctly before you start chunking and adding embeddings.
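That mixed text/OCR pass can be sketched like this: use the text layer where a page actually has one, and fall back to OCR where it doesn't (scanned pages, or tables embedded as images). The extractors here are stand-ins; a real pipeline might use something like pdfplumber for the text layer and Tesseract or a vision model for the OCR branch.

```python
def ocr(image_bytes):
    # Placeholder: a real system would run Tesseract or a vision model.
    return f"<ocr output of {len(image_bytes)} bytes>"

def extract_page(page):
    """page: dict with 'text' (str from the PDF text layer, may be empty)
    and 'image' (raw page image for the OCR fallback).
    Returns (content, source) so downstream chunking knows the origin."""
    if page["text"].strip():            # usable text layer: trust it
        return page["text"], "text"
    return ocr(page["image"]), "ocr"    # scanned/image page: OCR fallback

pages = [
    {"text": "Q3 revenue was $10m.", "image": b""},
    {"text": "", "image": b"\x89PNG..."},   # e.g. a table saved as an image
]
for page in pages:
    content, source = extract_page(page)
    print(source, content)
```

Tagging each page with its extraction source also helps later, since OCR output usually needs extra cleanup before chunking.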
I can believe that many startups are doing prompt engineering and agents, but in a sense this is like saying 90% of startups are using cloud providers, mainly AWS and Azure.
There is absolutely no point in reinventing the wheel to create a generic LLM and spending a fortune on GPUs while providers offer that power cheaply.
In addition, there may be value in getting to market quickly with existing LLM providers, proving out the concept, then building / training specialized models if needed once you have traction.
I got excited reading the article about releasing the training data, went to their HF account to look at it (dolma3), and the first rows? Text scraped from porn websites!
Isn’t this from before any curation has happened? I looked at it, and I can see why it looks bad, but if they’re really being open about the whole pipeline, they have to include everything. Giving them a hard time for it only encourages keeping models closed.
That said, if it were my dataset I would have shuffled that part further down the list so it didn’t show up in the HF preview.
It says it’s Common Crawl, which I interpret to mean it’s a generic web-scrape dataset; presumably they filter out what they don’t want before pretraining. You’d have to do some ablation testing to know what value it adds.