Hacker News | _009's comments

Back in 2005, I remember working at startups running on Scrum principles. It worked well at the time: we were able to ship, grow the team, and move forward with a nice few-features-per-week cadence, working remotely on a small team of fewer than 10. It always worked fine, but it was very centralized and slow, as all things dev were at the time.

I worked with ActiveColab in 2007, Skype in 2007, Yammer in 2009, Trello in 2011, Pivotal Tracker in 2013, Trello again in 2016, Confluence in 2022, Slack in 2013, and Google Meet, and sometimes I think Scrum became _less relevant_ over the years as more advanced product management tools became the norm and the product manager role matured by leveraging them.

These days, it's not rare to see lead developers manage kanban-like boards very effectively, releasing on time, with grace, without the need of a scrum master to coordinate efforts.

I do like asynchronous Scrum daily standups using http://geekbot.com on Slack, whether on-site and/or distributed, when doing sprints. I've seen this work well at startups going from pre-seed to Series B.

Personally, I am fascinated by team dynamics and how they've changed over the years. We are definitely living in the best of times as developers, and I still see sparks of well-applied Scrum every now and then that work nicely.


>These days, it's not rare to see lead developers manage kanban-like boards very effectively, releasing on time, with grace, without the need of a scrum master to coordinate efforts.

Sadly, it's also common to see such kanban teams endlessly winging it and slowly losing sight of what they were trying to accomplish, while burning out their members on an endless stream of tickets and testing without ever taking time to reflect and course-correct on their goals.


It's almost as if process were not the most important thing (not even close) for shipping value consistently.


True. But given a team of average ability, good process (ie, good habits and conventions) can definitely mean the difference between success and failure.


I'm sorry if I'm splitting hairs a bit here, but I'd argue 'good enough' process is all you need even with average teams. Keep them focused, limit their in-flight work, unblock them, run iterations with feedback, etc. I just feel some people spend inordinate amounts of time trying to optimize process when process hits diminishing returns pretty quickly.


>I'd argue 'good enough' process is all you need even with average teams.

That's sort of a tautology, right? If it's 'good enough', that implies it's a good process. In my experience, Scrum is a good enough process, with very little wasted overhead. It keeps the team focused, limits their in-flight work, unblocks them and offers regular iterations with feedback.

I'd agree that over-optimization is sometimes a problem, but when something as simple as scrum fails, it's usually down to the basics, like poor meeting practices, or micromanagement, or something outside of the development process entirely, like badly underbidding the project. No amount of process will save a project that was doomed from the start due to poor budgeting of time or money.


I think 'good enough' is just an expression meaning it doesn't need to be perfect or very elaborate. Unfortunately, the term Scrum these days is far from precise and doesn't guarantee a lightweight process, but I definitely agree with the principles. I've seen all sorts of things, including people over-focusing on the Scrum process and nitpicking about all sorts of irrelevant details. I've worked with and without Scrum Masters in teams, as an IC and as a manager. I think having a Scrum Master is often unjustified overhead, but an experienced SM can help fill the gaps in new teams, or in teams with an inexperienced manager or one lacking soft skills.

And yeah, you said it well at the end. There are many other things that I believe are more relevant than the process, but you do need a process teams can follow.


For me, one of the coolest things about the RoboCop movie series is that each RoboCop movie is about the next RoboCop. RoboCop 1 is about Murphy becoming the first RoboCop, RoboCop 2 is about Cain becoming the second RoboCop, and RoboCop 3 is about the Asian-looking RoboCops -- the third generation.

> my friends call me Murphy, you call me RoboCop


AWS is the next Oracle


This is like saying "the market will crash". Anyone can predict that eventually the AWS value proposition will turn sour. What's valuable (and hard) is to predict _when_ this will happen.


Big, expensive, boondoggly, stranglehold on legacy workloads, friends with your C-suite, and engineers and ICs hate it?

What is the opposite?


Cloudflare, HashiCorp, Fly.io, Oxide Computer, Vercel


Render.com too


Heroku* and DigitalOcean too. Hell, even Hostgator.

*Heroku is expensive, though it is nice to use


And DigitalOcean now supports deploying Docker containers, so there is no reason not to migrate to it.


> so there is no reason not to migrate to it.

Sure, "no reason" if literally the only problem you have to solve is deploying docker containers. While I'm not taking anything away from DO -- it's actually not a bad alternative for people who don't want nor need the complexity of enterprise cloud solutions -- it's still a huuuge exaggeration to say "no reason" when Docker containers is just a drop in the ocean (pun intended) of the scale of problems that AWS (attempts to) solve.


Heroku runs on AWS, doesn’t it?


Heroku:

* Big NO

* expensive YES

* boondoggly NO

* stranglehold on legacy workloads NOT SURE

* friends with your C-suite NO

* engineers and ICs hate it NO

It hits many of the pain points, not all.

Heroku is non-leaky in its use of AWS. If they switched to something else under the hood, you wouldn't know and wouldn't need to change code. The only leaky part is the pricing of someone who is buying wholesale and selling retail.


AI doesn't scale well. Problems get worse as you make your model bigger and more generalized. To make things worse, data, model architecture, precision, and hardware all affect your model's performance in ways that are hard or impossible to anticipate.

If you watch Tesla's AI presentation, https://www.youtube.com/watch?v=HUP6Z5voiS8, you will notice that they have multiple AIs stacked on each other, which IMO is a step back from a truly e2e multimodal AI system. So even with their custom fancy hardware, multimodal is too hard.

I wonder, wouldn't it be better to use geofencing (using H3) and have the car download a model depending on the zone where it is driving? And optimize multiple models based on "driver engagements"? This could address zones where there are particularities in the driving, roads, or human activity, and allow model optimization to happen in a smaller vector space than the whole world. For example, why not have a model for US highways, LA, New Delhi, the UK, and so on.
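As a toy sketch of that zone-based model selection: the zone names, bounding boxes, and model file names below are all made up, and a real implementation would use Uber's H3 library (e.g. `h3.latlng_to_cell`) rather than crude bounding boxes.

```python
# Toy sketch of per-zone model selection. Everything here (zones, model
# names, the resolve_zone helper) is hypothetical; a real system would map
# GPS fixes to H3 cells and look models up by cell id.

# Ordered most-specific first, so a downtown zone shadows a country-wide one.
ZONES = [
    {"name": "la_urban",   "bbox": (33.7, -118.7, 34.3, -117.6), "model": "fsd-la.onnx"},
    {"name": "us_highway", "bbox": (24.0, -125.0, 49.0, -66.0),  "model": "fsd-us-highway.onnx"},
]

def resolve_zone(lat: float, lon: float) -> str:
    """Return the model for a GPS fix; first (most specific) match wins."""
    for zone in ZONES:
        south, west, north, east = zone["bbox"]
        if south <= lat <= north and west <= lon <= east:
            return zone["model"]
    return "fsd-global.onnx"  # fallback generalist model

# A car in downtown LA picks the LA-specialized model...
print(resolve_zone(34.05, -118.25))   # fsd-la.onnx
# ...while one on a Nebraska interstate falls back to the US highway model.
print(resolve_zone(41.25, -100.76))   # fsd-us-highway.onnx
```

The car would then download and cache only the model for its current zone, and swap when it crosses a boundary.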

Tesla also knows where the cars are, and controls its expansion plans worldwide, which could inform the model prioritization roadmap.

In my mind, it would be easier to test, debug, label, optimize, and guarantee quality for users who, at the end of the day, and without knowing the exact statistics, I dare say spend more than 70% of their time driving around the same county, city, area, or town.


One of the most brilliant engineers out there. A true madman with an old hacker mentality that is nowhere to be seen these days, except for maybe George Hotz...

The old days were different. Today, it's about Leetcode, being overly happy on Zoom calls, and playing along with investors' playbooks... Capitalists only left us the hoodies, and that's because another $100M-funded startup from their portfolio is selling them.

Feeling nostalgic...


While garbage in, garbage out may seem like a bad policy to the user, to the AI system it means that it can have a closed feedback loop, where the final code (the solution) can be linked to the initial input, regardless of whether the input was garbage or not.

I would say that anything that can be stated as a large-scale supervised or reinforcement learning problem is a gold mine -- if, of course, the output has value and supervision is free.

Tesla self-driving and Comma.ai, from an eagle's eye view, exploit the same concept.


Fascinating stuff. There are nice trajectory animations on Wikipedia: https://en.wikipedia.org/wiki/OSIRIS-REx

From Wikipedia:

> Such asteroids are considered "primitive", having undergone little geological change from their time of formation. In particular, Bennu was selected because of the availability of pristine carbonaceous material, a key element in organic molecules necessary for life as well as representative of matter from before the formation of Earth. Organic molecules, such as amino acids, have previously been found in meteorite and comet samples, indicating that some ingredients necessary for life can be naturally synthesized in outer space.


Great!

Wow, the hardware has already made it to low orbit: https://upsat.gr/?p=418 (2017-04-20)

> At April 18th 11:11 EDT at Cape Canaveral in Florida, an Atlas-V rocket launched a Cygnus cargo spacecraft to dock to the Internation Space Station with supplies and scientific experiments. Among its cargo UPSat, the first open source hardware and software satellite bound to be released in orbit by the NanoRacks deployment system on-board ISS in the coming weeks.


Apart from search, ANNs can be used for recommendations, classification, and other information retrieval problems.

Currently, ES and Solr, both based on Lucene, can't really manage vector representations, as they are mainly based on inverted indexes to n-grams.

ANNs' potential applications extend to audio, bioinformatics, video, and any modality that can be represented as a vector. All you need is an encoder! How nice.

Faiss is definitely powerful. I have been running experiments using 80 million vectors that map to legal documents, and vectorizing protein folds (using AlphaFold). While it is an interesting technology, at this moment, perhaps for my use cases, I see it more as a lib or tool than a full-featured product like ES or Solr.

For instance, ATM, updating a Faiss index is a non-trivial process, with many of the workflow tools you would expect in ES missing. There is also the problem of encoding the input into vectors, which takes a few milliseconds (do you batch, do you parallelize, are you OK with eventual consistency?).

I recently found pgvector (Postgres + vector support): https://github.com/ankane/pgvector. Perhaps less performant, but easier to work with for teams, with support for migrations, ORMs, sharding, and all the other Postgres goodies.

Another interesting/product-ready alternative is https://jina.ai.

And Google's ScaNN, https://www.youtube.com/watch?v=0SvrDtnUgV4


> "Currently, ES and Solr, both based on Lucene, can't really manage vector representations"

Lucene does have an ANN implementation due in 9.0, based on HNSW - see https://issues.apache.org/jira/browse/LUCENE-9004 for details. See also https://issues.apache.org/jira/browse/SOLR-12890 and https://issues.apache.org/jira/browse/SOLR-14397 for Solr.



> Apart from search, ANNs can be used for recommendations, classification, and other information retrieval problems.

Yeah, those are all good use cases. I was wondering about a different thing: will the demand for a distributed vector search service concentrate among a few big companies, since smaller companies can use a simpler solution and don't really need to pay for the technology?

> Currently, ES and Solr, both based on Lucene, can't really manage vector representations, as they are mainly based on inverted indexes to n-grams.

ES has a kNN plugin, which stores vectors separately in each segment of the Lucene index. Plus, they can also use better storage formats and algorithms.


Google has a distributed embedding matching service in preview: https://cloud.google.com/vertex-ai/docs/matching-engine/over...

I guess it depends on what you mean by "simple". The algorithms are complex but there are good tools that implement them. I would imagine smaller companies would use off the shelf tooling, and I would argue that is simpler. Vector embeddings are so unbelievably powerful and often yield better results than classical methods with one of the good tools + pretrained embeddings.

Specifically for search, I use them to completely replace stemming, synonyms, etc in ES. I match the query's embedding to the document embeddings, find the top 1000 or so. Then I ask ES for the BM25 score for that top 1000. I combine the embedding match score with BM25, recency, etc for final rank. The results are so much better than using stemming, etc and it's overall simpler because I can use off the shelf tooling and the data pipeline is simpler.
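A rough sketch of that blended ranking. The weights, toy vectors, and toy BM25 scores below are made up stand-ins for real embeddings and real scores fetched from ES:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def hybrid_rank(query_vec, doc_vecs, bm25_scores, top_k=1000, w_vec=0.7, w_bm25=0.3):
    """Take the top_k docs by embedding similarity, then re-rank them by a
    weighted blend of embedding score and BM25 (weights are hypothetical)."""
    sims = {doc_id: cosine(query_vec, v) for doc_id, v in doc_vecs.items()}
    candidates = sorted(sims, key=sims.get, reverse=True)[:top_k]
    # Normalize BM25 over the candidate set so the two scores are comparable.
    max_bm25 = max(bm25_scores.get(d, 0.0) for d in candidates) or 1.0
    blended = {
        d: w_vec * sims[d] + w_bm25 * bm25_scores.get(d, 0.0) / max_bm25
        for d in candidates
    }
    return sorted(blended, key=blended.get, reverse=True)

# Toy data: doc "a" matches the query semantically, "b" matches lexically.
docs = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
bm25 = {"a": 1.0, "b": 12.0, "c": 0.5}
print(hybrid_rank([1.0, 0.1], docs, bm25))  # ['b', 'a', 'c']
```

In practice you would also fold in recency or other signals as extra weighted terms, as described above.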


> I match the query's embedding to the document embeddings,

I assume the doc size is relatively small, otherwise a document may contain too many different topics that make it hard to differentiate different queries?


For my search use case, documents are mostly single topic and less than 10 pages. However I have found embeddings still work surprisingly well for longer documents with a few topics in them. But yes, multi-topic documents can certainly be an issue. Segmentation by sentence, paragraph, or page can help here. I believe there are ML-based topic segmentation algorithms too, but that certainly starts making it less simple.
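A toy illustration of that paragraph-level segmentation, with a bag-of-words counter standing in for a real sentence-embedding model:

```python
from collections import Counter
import math

def embed(text):
    """Stand-in encoder: a bag-of-words Counter. A real system would call a
    sentence-embedding model here instead."""
    return Counter(text.lower().replace(".", "").split())

def cosine(a, b):
    """Cosine similarity between two sparse Counter 'vectors'."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_segment(document, query):
    """Split a multi-topic document into paragraphs and embed each one
    separately, so a query matches the relevant segment rather than one
    blurred whole-document vector."""
    segments = [p.strip() for p in document.split("\n\n") if p.strip()]
    scored = [(cosine(embed(s), embed(query)), s) for s in segments]
    return max(scored)[1]

doc = ("The quarterly revenue grew by ten percent.\n\n"
       "The new search engine uses vector embeddings for ranking.")
print(best_segment(doc, "vector search ranking"))
# The new search engine uses vector embeddings for ranking.
```

With a real encoder, the same split-embed-match loop is what lets a query land on the right paragraph of a multi-topic document.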


The moment you cross 10M items or 100 QPS then scaling such a system becomes non-trivial. That's not a high threshold for any enterprise software company handling customer data or any consumer tech company with >10M users. Once you add other requirements to the mix, such as index freshness and metadata filtering, the managed options where this is already built-in start to become compelling even at lower volumes.

Also, Pinecone (disclosure: I work there) has usage-based billing that starts at $72/month, so "paying for the technology" is not that scary.


Well, OpenSearch already has an ANN search plugin. The implementation currently uses nmslib. I am using it and it works fine.

https://opensearch.org/docs/search-plugins/knn/index/


I've been looking at Faiss, it's on its way to Debian ...

https://packages.qa.debian.org/f/faiss.html


I remember meeting KimDotCom's lawyer a few years ago in SF. A very flamboyant person, one of those guys who makes a big impression on you at first sight.

While KimDotCom is an impressive hacker, I am not sure that I would love to be the target of the feds around the world, as he still is in the eyes of the US.

Fun story about MegaUpload.

I once lived in Argentina back in the MegaUpload days. At the time, piracy was the norm (not only in Argentina), the government didn't care, and people were selling pirated, burned DVDs on the streets. This was downtown, a highly transited area.

Then MegaUpload started to grow like fire, and I remember that starting at 4 pm, the internet would get awfully slow as people got off their jobs to download the latest movie or episode out there. Then came PopcornTime, and things got even worse. Can't find the stat, but I remember something along the lines of 60% of Buenos Aires's traffic being MegaUpload's at peak time (4-10 pm), which caused a lot of controversy at the time.

Old days...


> Old days…

Here in New Zealand we are only just finishing tidying up the fallout from the Dotcom fiasco now. Shame on us.

https://en.m.wikipedia.org/wiki/Kim_Dotcom

