Hacker News | JVerstry's comments

Hands down the best post in years... Tears, tears...


Any serious investor (retail or not) must read 'Statistical Consequences of Fat Tails' by Nassim Taleb. Period.


I have a CS degree and pursued a marketing MBA. To keep it short: i) a great product with basic marketing trumps a bad product with "fantastic marketing" in the long run; ii) the product makes the money, not the marketing; iii) you can't hold both a CS role and a marketing role at the same time as the business grows, so you will have to choose one or the other. As for learning marketing, practice trumps theoretical knowledge. IMHO: build a great product and team up with a competent marketer.


This is a great piece of advice; I think I will be doing this in the long run.


Thanks, really helpful advice :)


Many times over, I have worked at companies that had no option but to call coders back from retirement to keep migrations and operational activities running. There are plenty of aging systems requiring coding skills and experience that freshmen can't deliver. Tech can be a lifetime career for sure; you just need to figure out where the demand is and navigate toward it...


I was on assignment in south-west France at the time, and around February the local pharmacist told me she was convinced Covid had been there since November 2019... She had seen some of her customers with strong cases of flu that would not go away...


I bet this discovery is related to another recent discovery: "Blocking IgSF9b in pathologically anxious mice has an anxiolytic effect and normalises anxiety behaviour in these animals." (see https://www.mpg.de/12620765/anxiety-protein-amygdala)


Not so sure Docker is the only way forward when it comes to cloud scaling or deployment. VM templates are a very good alternative: they are more stable, more flexible/customizable, and they integrate more smoothly with CI.


Docker, cloud-hosted VMs from templates, Vagrant, Azure, AWS: it doesn't really matter much to me. The important thing to me is that I no longer have any need for anyone else doing company-wide infrastructure. We have several department heads engaged in a battle over who's going to "own" our "cloud infrastructure". They seem oblivious to the fact that the only thing we need from them is to negotiate an Azure or AWS subscription, after which they will lose any utility.

A lot of talented people are going to find themselves in a problematic situation, because the area in which their talents lie is only going to be handled by low-wage jobs at Google, Azure and Amazon. Even big companies won't bother setting up their own hardware, because the people are costly even if there are slight savings on private hardware.


> they seem oblivious to the fact that the only thing we need from them is to negotiate an Azure or AWS subscription, after which they will lose any utility.

I've seen something similar, and they tried the following:

- We'll provision VMs for you. Raise a ticket.

- We're doing "Hub & Spoke". You're not allowed to route any internet traffic except through our inspection proxies.

- We've disabled the API. You can only use the Console.

Basically, a couple of old-school guys will do anything they can to disable automation, because otherwise they'd be admitting they can't really contribute anymore.


The old-school guys also think (rightly in some cases) that they have an added value. 10 years ago I was building a cloud platform and explaining to the security team that they would no longer receive tickets to manually configure routes on firewalls, the customers would do it from a console. I thought they’d be happy to be relieved of a menial, boring task but their reaction was “when we receive a ticket requesting to open all ports from any IP address, we can explain to the customer that it’s a dangerous idea. If they can configure it themselves, who will tell them?”


I lived through an "empower the developers with DevOps", "free you from menial tasks" project a while back, and it ended up with:

- A mail server which was an open relay, promptly shutdown for abuse

- Every single internal server on an external IP address with an allow any/any ACL

- Brand new environments built with PHP 5.0 in 2018 to run new development projects (EOL over ten years ago)

- Managers patting themselves on the back about the power of Devops


I'll second this. My company does everything via AWS, and the personnel overhead for X00,000 users and regular major updates to everything is... maybe half of a full-time position, and it will be less than that when we finish overhauling the hard-to-scale legacy parts of our system.


I have the feeling that you have literally no idea what you are talking about. Like, at all. This usually comes from self-entitled, semi-decent full-stack developers building the same old crappy systems that break apart as soon as they get any decent usage.

Guys, not all of you are mythical 10x engineers working on groundbreaking stuff. Deal with it.

We had most of these technologies long before they became commoditized. What we have done is make them cheaper and more accessible to your average Joe.

Containers? Give me a break. We had Solaris Zones provisioning mechanisms at large telcos before any of you even knew what a container was. People have been provisioning jails/zones with the click of a button for ages.

It's funny to me, because just ten years ago people like you were yelling and screaming about losing jobs to offshore developers in India and Eastern Europe. There's no apocalypse coming anytime soon.

Things are getting revamped; they are better, faster and, more importantly, accessible. Just because you know how to use Docker does not mean you are able to manage production-ready infrastructure. AWS and the other big providers are not a silver bullet, and never will be. At the end of the day, they are very costly services that are not suitable for every business.


Cloud doesn't actually have to be better, cheaper, or faster to consolidate and reduce ops, sysadmin and network jobs. It just has to be highly popular.


So the jobs change. They move into the "Developer" category, where we hire developers with domain knowledge of "networks", "systems" or "operations".

I have seen it many times: with big data, with ChatOps ("we won't ever need to log in to the system! we'll do everything via HipChat/Slack/whatever"), with OpenStack ("who even needs AWS?!").

Yadda yadda.


The work of managing infrastructure is smaller if most of it is centralized at Amazon or similar. Yes, there's still client-side work, but the overall pattern is shift, consolidate, reduce.


> low wage jobs at google, Azure and Amazon

I have no idea what makes you say that. As far as I'm familiar with these roles, none of them are low paying. These companies tend to pay very well because of the amazing scalability involved in these roles. Any engineering work that scales linearly with the number of users is automated almost immediately and the focus is generally on very high-level work.


When you go from 100 corporations, each with an infrastructure team of 50+ people, to one provider running the same thing with 20 people drawn from that pool, wages go down. While there are certainly well-paid individuals behind Azure and AWS, they are rare compared to "the people on the floor", and those people used to be able to rise to senior specialist inside companies; now they just order services from the big players.

Now don't get me wrong, there will always be avenues for the most talented players, but the crisis will come when 45 out of the 50-strong infrastructure team at every larger corporation are no longer needed, and the last 5 end up doing work that's completely unrelated to their prior expertise.


This year, it has been: R, Azure, Power BI, deep SQL Server, T-SQL, Machine Learning (+ Data Governance, but it is not a tech). I think next year, I'll go deeper with C# and start F#...


I learned R coming from Java, Node, PHP and Python, and I love it !!! It is awful as an application-development language, but it was never designed for that purpose. It was designed for STATISTICS. Try to do advanced statistics in your traditional software engineer's preferred language and see which language you hate then. The only tricky R concepts for newbies are recycling, formulas and vectorized functions. Add RevoScaleR to R and it kicks major ass for big data manipulation. Oh yes, big time !!!
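To give a feel for "recycling" to people who haven't touched R: when two vectors of different lengths meet, R silently repeats the shorter one. In R, c(1, 2, 3, 4) + c(10, 20) gives 11 22 13 24. A rough pure-Python illustration of the same rule (just a sketch of the concept, not how R implements it):

```python
from itertools import cycle

def recycle_add(long_vec, short_vec):
    # R-style recycling: repeat the shorter vector until it
    # matches the length of the longer one, then add element-wise.
    return [a + b for a, b in zip(long_vec, cycle(short_vec))]

print(recycle_add([1, 2, 3, 4], [10, 20]))  # [11, 22, 13, 24]
```

Vectorized functions follow the same spirit: you write the operation once and it applies across the whole vector, no explicit loop.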


I'll take python over R any time.


Any recommended resources for learning those concepts?


Hadley Wickham's R for Data Science[1] is a generally good starting reference.

[1] http://r4ds.had.co.nz/


I have doubts about the "We're actually spending some time on every row" claim. Columnstore storage often keeps metadata at the page level, such as min, max and count values for the data in that page. These values are used to filter and optimize the flow of data processed for a query (I mean whole pages can be skipped). If you run a 'count' query 10 times, it's very unlikely the DB will count rows 10 times; it will rely on the page's existing metadata when available (i.e., already computed). The tests described in the post are misleading IMHO.

EDIT: This comes on top of the fact that DBs can cache query results too. Moreover, the post does not say whether they implemented clustered or filtered indexes on the columns in question, nor how partitioning was performed. All of this has a big impact on execution time.
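The page-metadata trick described above can be sketched generically (this is a hypothetical illustration, not any particular DB's implementation): a count with a range predicate resolves fully-included and fully-excluded pages from (min, max, count) metadata alone, and only scans pages the predicate partially overlaps.

```python
# Each segment is (metadata, rows) with metadata = (min, max, count).
def count_where_between(segments, lo, hi):
    total = 0
    for (mn, mx, cnt), rows in segments:
        if mx < lo or mn > hi:
            continue                    # fully excluded: skip, zero row work
        if lo <= mn and mx <= hi:
            total += cnt                # fully included: answered from metadata
        else:
            total += sum(lo <= v <= hi for v in rows)  # partial: scan the rows
    return total

segments = [
    ((1, 5, 3), [1, 3, 5]),
    ((10, 20, 4), [10, 12, 15, 20]),
    ((4, 12, 3), [4, 8, 12]),
]
print(count_where_between(segments, 0, 9))  # 5; only the last segment is scanned
```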


The query in the example is not a plain "count" but a count + group by. While it is possible to precompute group-by results for every possible column per page, we don't do that, and this query does touch every row. The article touches on how this is possible: operations on encoded data, AVX2, and not using a hash table for the intermediate result of the group by. And we certainly don't fake it by storing query results.

We guarantee that the result is legit.
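The "no hash table" point can be illustrated with a minimal sketch (my own illustration, not MemSQL's actual code): with a dictionary-encoded column, the codes are small dense integers, so the group-by aggregates can live in a flat array indexed by code, which is much cheaper per row than hashing.

```python
# Group-by count over a dictionary-encoded column: one array index per row,
# no hash table for the intermediate result.
def group_count(codes, dict_size):
    counts = [0] * dict_size
    for c in codes:          # still touches every row, but only integer ops
        counts[c] += 1
    return counts

dictionary = ["AAPL", "GOOG", "MSFT"]
codes = [0, 2, 2, 1, 0, 2]   # the encoded column values
counts = group_count(codes, len(dictionary))
print(dict(zip(dictionary, counts)))  # {'AAPL': 2, 'GOOG': 1, 'MSFT': 3}
```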


What does the columnar format look like? In particular, is the group-by column compressed with RLE? That's kind of a precomputed group-by + count that would make this kind of query very, very fast :)


This is an astute observation. Here is the accurate answer from the author of the feature: “We do have special handling for RLE for filters and group by column but in both cases there would be per row component (updating selection vector for filter on RLE, updating aggregates for group by on RLE) and so far I saw RLE encoding usually slower than dictionary encoding for both these cases”
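To make the RLE case concrete (a hypothetical sketch of the general idea, which ignores the filter interaction the author describes): for a pure group-by count over an RLE-encoded column, each (value, run length) pair updates the aggregate once per run rather than once per row.

```python
# Group-by count over an RLE-encoded column: runs are (group_value, length).
def group_count_rle(runs):
    counts = {}
    for value, run_len in runs:
        # One aggregate update per run, regardless of how many rows it covers.
        counts[value] = counts.get(value, 0) + run_len
    return counts

runs = [("AAPL", 3), ("MSFT", 2), ("AAPL", 1)]
print(group_count_rle(runs))  # {'AAPL': 4, 'MSFT': 2}
```

Once filters or per-row aggregates enter the picture, each run has to be re-expanded to some degree, which is the per-row component the author mentions.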


There is a clustered columnstore index on the table. MemSQL doesn't support "filtered" indexes (indexes that let you specify a WHERE clause, matched by queries with an equal or stricter WHERE clause). Since the query needs a full scan, those wouldn't help anyway.

We partition data on stock symbol, but this doesn't make a difference for this query either. It would if we had filters.


I wouldn't be so sure. If data is partitioned on stock symbol, then a grouping by subset is already happening at the partition level, which may considerably ease the job of aggregation (it depends on the number of symbols, and it seems you have about 1500-1600). Since there is a clustered index, values are saved in order. A page could then easily end up with min value = max value, in which case the count saved in the metadata is valid and there is no need to scan all the values in the page again. Should the index use a dictionary, one only has to count the number of items in the corresponding entry without scanning the rows; such information is stored in the index, not in the rows.

I have no doubt about anyone's intentions. It's just that I think more could be done to support the claim. If your WHERE clause contained something like (shares mod 3) = 0, then I would be pretty sure all rows were scanned, because such information is not aggregated at the page level. If possible, I would also check the execution plan for any incongruent values.


That's right: partitioning on stock symbol allows pushing the LIMIT 10 down to each partition, although in this case, with 1500 stock symbols, it doesn't buy us much. It's actually possible to compute a full group by (without a limit) on every partition and merge the results at the end. Merging 1500 groups is computationally trivial.

Yes, shares mod 3 or another such predicate would make it impossible to answer this query in O(metadata). It would of course burn more instructions, so we would need a bigger cluster to hit a trillion rows a second, as well as a complex explanation in the blog post of why this matters.
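The per-partition strategy described above can be sketched as follows (a generic illustration with made-up data, not MemSQL internals): each partition computes its own small group-by result, and the final step just merges those partial counts.

```python
from collections import Counter

def partial_group_by(rows):
    # Per-partition group-by count on (symbol, shares) rows.
    return Counter(symbol for symbol, _shares in rows)

def merge(partials):
    # Merging ~1500 small result sets is trivial next to the full scan.
    total = Counter()
    for p in partials:
        total.update(p)
    return total

partitions = [
    [("AAPL", 100), ("MSFT", 50)],
    [("AAPL", 10)],
    [("MSFT", 5), ("GOOG", 1)],
]
result = merge(partial_group_by(p) for p in partitions)
print(dict(result))  # {'AAPL': 2, 'MSFT': 2, 'GOOG': 1}
```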

