Hacker News | dsacco's comments

> These guys averaged a ~9% return over the last few years.

Who are "these guys"? The funds discussed in the article have average annual returns well above 9%.


Overdeck and Siegel.

“The firm’s biggest fund, Spectrum, has earned an annual average return of 9.4% net of fees since 2004.”

The S&P 500 has averaged 9% for 80 years. Joe Blow can buy an index fund and do just as well as the quant clients. Sure, these guys are averaging 14% before fees, and it’s a great way to get rich. But after fees the client might as well just passively invest.


You need to consider the following elements:

- You compare the S&P and the Two Sigma funds over different timelines. The index yielded much less than 9% over 2004-2015 (you can halve the performance).

- What’s the common feature between the S&P, the formal idea of passive indexing, and the emergence of passive investment funds? None of them is 80 years old.

- Retail customers (Joe Blow) rarely have access to hedge fund products directly (they might still be exposed through pension funds, sovereign wealth funds, etc., but that puts the allocation decision almost entirely out of their hands).

- Stating performance figures alone is mostly a Bloomberg-reader thing. In practice, there is usually at least an attempt to incorporate some kind of risk measure when selecting hedge funds.

I am not even anti-passive funds but you are memeing a bit too hard.


> Could somebody explain why so much effort is being put into quant strategies, when it seems that real-world information gathering would be a much easier way to gain an edge over others?

I used to be part of a research group that sold the so-called "alternative data" you're describing to 30 or so hedge funds in the NYC area, including several of the largest. The example I like to give is that we knew well ahead of time that Tesla would miss on the Model 3 because we knew every vehicle they were selling by model, year, configuration, date and price with <99% accuracy. I still occasionally sell forecasts like this and the methodology is straightforward enough that even a solo investor can consistently beat the market if they know how to source the data. But I've mostly lost faith in this technique as the sole differentiator of a fund's alpha.

Some funds, like Two Sigma, have large divisions with a very sophisticated pipeline for this kind of analysis. They do exactly what you describe. For the most part it works, but there are several obstacles that keep this from being the holy grail of successful trading:

1. First and foremost, this analysis is fundamentally incomplete. You are not forecasting market movements, you're forecasting singular features of market movements. What I mean is that you aren't predicting the future state of a price: if the price of a security is a vector representing many dimensions of inputs, you're predicting one dimension. As a simple example, if I know precisely how many vehicles Tesla has sold, I don't know how the market will react to this information, which means I have some nontrivial amount of error to account for.

2. This analysis doesn't generalize well. If I have a bunch of information about the number of cars in Walmart parking lots, the number of vehicles sold by Tesla (with configurations), the number of online orders sold by Chipotle, etc., how should I design a data ingestion and processing pipeline to deal with all of this in a unified way? In other words, my analysis is dependent upon the kind of data I'm looking at, and I'll be doing a lot of different munging to get what I need. Each new hypothesis will require a lot of manual effort. This is fundamentally antagonistic to classification, automation and risk management. (See the sketch after this list.)

3. It's slow. Under this paradigm you're coming up with hypotheses and seeking out unique and exclusive data to test those hypotheses. That means you're missing a lot of unknown unknowns and increasing the likelihood of finding things that other funds will also be able to find pretty easily. You are only likely to develop strategies which can have somewhat straightforward and intuitive explanations for their relationship with the data.
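
To make point 2 concrete, here's a rough sketch (in Python, with every name and data shape hypothetical) of what a unified ingestion layer has to look like. The unification itself is trivial; the pain is that every dataset needs its own bespoke adapter behind the interface:

    from abc import ABC, abstractmethod
    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Observation:
        as_of: date      # observation date
        entity: str      # e.g. a ticker
        metric: str      # e.g. "vehicles_sold"
        value: float

    class DataSource(ABC):
        # Every dataset hides its own bespoke munging behind this.
        @abstractmethod
        def observations(self) -> list[Observation]: ...

    class VinRegistrations(DataSource):
        # Hypothetical adapter: raw (date, model, config, price)
        # rows become daily unit counts.
        def __init__(self, raw_rows):
            self.raw_rows = raw_rows

        def observations(self):
            counts = {}
            for d, _model, _config, _price in self.raw_rows:
                counts[d] = counts.get(d, 0) + 1
            return [Observation(d, "TSLA", "vehicles_sold", float(n))
                    for d, n in counts.items()]

    class ParkingLotCounts(DataSource):
        # Hypothetical adapter: satellite (date, store_id, car_count)
        # rows, already aggregated per store.
        def __init__(self, rows):
            self.rows = rows

        def observations(self):
            return [Observation(d, "WMT", "cars_in_lot", float(n))
                    for d, _store, n in self.rows]

    def ingest(sources):
        # The "unified" step is a one-liner; the cost lives in the
        # hand-written adapters above, one per hypothesis.
        return [obs for src in sources for obs in src.observations()]

Every new hypothesis means writing another one of those adapters by hand, which is exactly the manual effort I'm describing.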

This is not to say the system doesn't work - it very clearly works. But it's also easy to hit relatively low capacity constraints, and it's imperfect for the reasons I've outlined. You might think exclusive data gives you an edge, but for the most part it does not (except for relatively short horizons). It's actually extremely difficult to have data which no other market participant has, and information diffusion happens very quickly. Ironically, in one of the very few times my colleagues and I had truly exclusive data (Tesla), the market did not react in a way that could be predicted by our analysis.

The most successful quantitative hedge funds focus on the math, because most data has a relatively short half-life for secrecy. They don't rely on the exclusivity of the data, they rely on superior methods for efficiently classifying and processing truly staggering amounts of it. They hire people who are extraordinarily talented at the fundamentals of mathematics and computer science because they mostly don't need or want people to come up with unique hypotheses for new trading strategies. They look to hire people who can scale up their research infrastructure even more, so that hypothesis testing and generation is automated almost entirely.
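
As a toy illustration of that kind of automation (nothing like a production system, and certainly not anyone's actual trade secret), imagine screening every pair of normalized signals for lagged correlations:

    import numpy as np

    def screen_pairs(series, lag=1, threshold=0.1):
        # Flag pairs where one series' past correlates with another's
        # future. A real pipeline adds multiple-testing corrections,
        # out-of-sample validation, transaction costs, capacity, etc.
        hits = []
        names = list(series)
        for a in names:
            for b in names:
                if a == b:
                    continue
                x, y = series[a][:-lag], series[b][lag:]
                c = float(np.corrcoef(x, y)[0, 1])
                if abs(c) > threshold:
                    hits.append((a, b, round(c, 3)))
        return hits

    rng = np.random.default_rng(0)
    data = {f"signal_{i}": rng.standard_normal(500) for i in range(20)}
    print(screen_pairs(data))  # even pure noise produces a few "hits"

Even pure noise clears a loose threshold somewhere, which is why the hard, well-guarded part is the validation and scaling machinery, not the screening loop itself.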

This is why I've said before that the easiest way to be hired by RenTech, DE Shaw, etc. is to be on the verge of re-discovering and publishing one of their trade secrets. People like Simons never really cared about how unique or informative any particular dataset is. They cared about how many diverse sets of data they could get and how efficiently they could find useful correlations between them. The more seemingly disconnected and inexplicable, the better.

Now with all of that said, I would still wholeheartedly recommend this paradigm for anyone with technical ability who wants to beat the market on $10 million or less (as a solo investor). A single creative and competent software engineer can reproduce much of this strategy for equities with only one or two revenue streams. You can pour capital into earnings positions for which your forecast predicts an outcome significantly at odds with the analyst consensus. You can also use your data to forecast volatility on a per-equity basis and sell options on those which do not indicate much volatility in the near term. Both of these are competitive for holding times ranging from days to months and, with the exception of some very real risk management complexity, do not require a large investment in research infrastructure.
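
As a sketch of what I mean (simplified to the point of caricature, with made-up thresholds), the two strategies reduce to something like:

    import numpy as np

    def earnings_signal(my_forecast, consensus, min_edge=0.10):
        # Trade only when your data-driven forecast diverges materially
        # from the analyst consensus; min_edge is an invented threshold.
        edge = (my_forecast - consensus) / abs(consensus)
        if edge > min_edge:
            return "long into earnings"
        if edge < -min_edge:
            return "short into earnings"
        return "no trade"

    def annualized_realized_vol(closes):
        # Close-to-close realized volatility from daily prices,
        # annualized with the usual sqrt(252) convention.
        rets = np.diff(np.log(np.asarray(closes, dtype=float)))
        return float(rets.std(ddof=1) * np.sqrt(252))

    # If your per-equity volatility forecast sits well below what the
    # options market implies, selling premium is the trade, along with
    # all of the very real risk management complexity that entails.

The hard part is everything this leaves out: position sizing, hedging, and the tail risk of short options.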


> The example I like to give is that we knew well ahead of time that Tesla would miss on the Model 3 because we knew every vehicle they were selling by model, year, configuration, date and price with <99% accuracy.

Is the way in which you got that information something you can divulge? I mean, was it talking to an employee or was it something exciting and far fetched? By the way, I presume you meant ">99%" or something similar.

> A single creative and competent software engineer can reproduce much of this strategy

By "this strategy", do you mean prediction based on a source of "alternative data"?

Interesting comment, in any case.


Wow! Thanks for the detailed answer. You introduced a lot of issues I hadn’t thought of, and your last paragraph gave me some ideas.

Also...generally speaking, what does this type of information sell to hedge funds for? For something like the Tesla information for example? I would assume it's probably not millions, but somewhere in the 5-6 figures?


Good example: Tesla had a miss on Model 3 production for Q1, yet the stock rose significantly. And the miss was predicted by both the fan-maintained VIN tracker and Bloomberg's VIN tracker.

I used to work for D. E. Shaw & Co., now I work in Silicon Valley and invest my money in index funds. Much better that way.


Any tips/starting point for the uninitiated?


This is precisely the kind of question for which you won’t find any meaningful, public answer. I’d be thoroughly shocked if you could find someone in the know to give you an answer even anonymously.


This has been the case for a long time in applied mathematics and computer science (not so much pure mathematics). There are hedge funds using work that is not only unpublished, but also unknown to research labs like FAIR and Google Brain. The easiest way to be scouted by one of those funds is to publish research that looks like you’re on the verge of re-discovering their work.


Do you have any proof of this, or is it just your opinion that the comparatively smaller groups of researchers at hedge funds are well ahead of academia and the rest of industry?


1. I don't have proof I can share publicly,

2. It's not just my opinion, and

3. I didn't say they're "well ahead" across the board.

This isn't unique to finance; industry labs in tech also often have novel results in applied mathematics and computer science that are ahead of academia and other industry labs. You don't have to believe me but it's not exactly a controversial topic. Not everything is published or patented.


I mean I have little doubt that there are trade secrets that these companies have. Specific algorithms and models. And yeah, industry labs are often ahead here.

But I read your claim as saying that there are broad methods and approaches that they hide. And that's, while possible, more peculiar. Most of the tech industry labs don't keep their theoretical research secret. Practically anything that could be published is.

As for 3, the way you described the "rediscovery" made it sound like those labs were a number of steps ahead, so I hope you pardon my misunderstanding.


At the highest level there are broad approaches which are kept secret in the financial industry, but the reason that's peculiar is that their efficacy is inherently antagonistic to publicity. Tech firms (mostly) don't lose the utility of their trade secrets if they're exposed; they just lose first-mover advantage on those techniques. But if everyone is aware of your techniques in finance, your techniques cease to have an edge.

Like I said in the original comment: this isn't (to my knowledge at least) pure mathematics that's being kept secret. But there are absolutely families of techniques and algorithms whose applications to finance are nontrivial, non-incremental and very well guarded.


I guess my only followup would be are these "techniques" more akin (in broadness) to ResNet, or to Dropout? (to use an area that I believe we're both familiar with)

In other words, techniques that are broadly applicable to the field, or techniques that maybe spawn a family of related techniques, but appear to be useful only in a specific subdomain.


That's a good comparison. In general, closer to Dropout.


Offtopic, but I have a really difficult time reading articles like this. I don’t know if this reflects a problem with the style or my ability to focus, but I find it really annoying:

> “SANDHOGS,” THEY CALLED THE LABORERS who built the tunnels leading into New York’s Penn Station at the beginning of the last century. Work distorted their humanity, sometimes literally. Resurfacing at the end of each day from their burrows beneath the Hudson and East Rivers, caked in the mud of battle against glacial rock and riprap, many sandhogs succumbed to the bends. Passengers arriving at the modern Penn Station—the luminous Beaux-Arts hangar of old long since razed, its passenger halls squashed underground—might sympathize. Vincent Scully once compared the experience to scuttling into the city like a rat. Zoomorphized, we are joined to the earlier generations.

This goes on for about seven paragraphs before I have any idea what the article is about. I understand “setting the scene” but I can’t tell whether or not to care about an article if it meanders about with this flowing exposition before indicating what its central thesis is.

It seems like a popular style in thinkpieces and some areas of journalism. The author writes a semi-relevant title, a provocative subtitle, and five to ten paragraphs of “introduction” that throw you right into the thick of a story whose purpose doesn’t seem clear unless you already know what the article is about. Rather than capturing my attention with engaging exposition, I find it takes me out of it. But it must work if it’s so ubiquitous; presumably their analytics have confirmed this style is engaging.


It's not the style; it's just not good writing, though it's trying so hard to be. It's the kind of thing that would show up in a college writing workshop and hopefully get workshopped into something more intelligible. As they say, "Show, don't tell." The passage describes a lot, but not in a way that helps you actually visualize any of it, so it's really hard to follow.


Right, any individual sentence is fine and the idea is probably usable, yet it's not clear how each statement relates to those that came before it.


The next sentence afterwards is a monstrosity:

"But, I explained to my work colleagues as the Princeton local pulled out from platform eight and late-arriving passengers swished up through the carriages in search of empty seats, both the original Penn Station and its unlovely modern spawn were seen at their creation as great feats of engineering."

I had to highlight between the commas to get through that one.


It seems growing up with German is a great preparation for such sentences :)


These complaints about Penn station are also a well-worn cliché.


Just imagine ... that some day in the future journalism will be AI based, and will generate entire articles tailored to your viewing habits based on extensive psych profiling and AB testing to maximize clicks and screen time!

The content need not be true, but at least everyone will be happy with their preferred writing styles....

http://karpathy.github.io/2015/05/21/rnn-effectiveness/


What if I don't want the content I'm most interested in? It'd probably give me the computer-scientist version of a tabloid, but I prefer making myself read things that I don't fully grasp.


There will be a slider for that.


That actually sounds quite appealing to me. In recent years I noticed that the writing style of a book played a much bigger part in my ability to derive value from it than its content did.

If I could read fiction that is written exactly for me I would love it. And as for "non-fiction", I reference check any particularly interesting claim anyway, so I'd be happy to try and use the AI for that too. The way I see it, reading is much more about exercising the brain in thinking about new things than about learning new facts.


The piece is just overwrought at the sentence level, as in the example below. I think it's partially inspired by trying to sound like an old-style important newspaper columnist, and partially by David Foster Wallace. DFW's long sentences are very readable, though, because they are conversational; you can understand them perfectly if you read them as though hearing them aloud.


The existence of writing like this is why analytics (and attention) are not good ways of deciding if style and subject are "working". Clearly many people hate it, just as many people hate garlic; garlic fails the attention/analytics test. Pop Tarts pass!

And yet a world of Pop Tarts is sooooooo boring... And no one makes heart-stoppingly good fish stew using Pop Tarts.

This fella may not have written the best piece of the week, and we may not remember this piece tomorrow, but I think the fact that he's attempting to create something gives him a chance of actually getting there. Looking at a dashboard completely kills that, in my opinion.

Screw the stats! Make what you think is good!


It’s a remnant from the time when we paid by the bundle for longform content and trusted the issuing brand not to waste our time.


Some people enjoy writing and some people enjoy reading. It need not be "to the point" all of the time.


I should clarify: I don't mind "unfocused" writing like this. I can definitely appreciate a creative take on exposition. But I think the introduction of an article is not the most appropriate place to do it. An upfront paragraph - even a few sentences - explaining what is happening would basically resolve this for me.


Would you expect the same of a novel? Why not similarly temper your expectation given the source; some pieces are simply more literary. (NB: I'm only speaking generally because I haven't read the article.) And hey, at least we have the comments of HN to scan for the tl;dr :)


It still applies to creative writing. IIRC, I heard it called a promise in a creative writing course. E.g. if you open with an action scene and the rest of the story has no action, that's a broken promise to the reader. It's useful to give the reader an indication of what they're starting in any piece.


Most novels come with a back cover blurb that tells you what the book is about. And I don't know many people who read novels without first reading the cover.


I actually like The Baffler, but I 100% agree. The New Yorker and the LRB are similar, with The New Yorker being far, far worse. It's distracting and takes away from the story for me as well, and I love long-form journalism.


I appreciate the irony of assuming that the writing style of an article about how data-obsessive engineering cultures strip away the capacity for creative thought and engagement must have been determined by profit-maximizing analytics, especially when the article in question was written for a nonprofit leftist magazine.


These convoluted forms of passive voice do make it harder to parse. It’s almost like the polar opposite of Hemingway’s journalistic style.


> If there are a dozen investors willing to buy stock at $x it’s worth $x.

This is not a correct representation of liquidity, and thinking about it under this definition can be very dangerous. You need to consider:

1. How many shares are there outstanding?

2. What is the ask price of those shares on paper?

3. What is the bid price of those shares by investors willing to purchase them on the private market?

4. How many owners are allowed to sell their shares at the same time?

5. How many owners could realistically find a buyer at the paper ask price of the shares?

6. How many owners could sell their shares before the existing deviation (spread) between the paper ask price and available bid prices changed?

This is not to say your overall point is wrong; it's to say that it can't be defended this way. More importantly, we really shouldn't reduce our discussion's definition of liquidity to the one you've presented here, which is too simplistic. There is a lot of nuance about price discovery between public and private valuation that's missing here. For (one) example, you can maintain an artificially inflated valuation of a private company if there are fewer owners willing/able to sell than there are buyers, even while a relatively larger set of potential sellers is either not allowed to sell or not conveniently capable of selling. This scenario presents an asymmetry between the weighting and availability of positive vs. negative price sentiment, an asymmetry that is much more easily resolved in the public market.
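
To put toy numbers on points 5 and 6 (every figure here is invented for illustration), a holder's "paper" value and what they could actually realize against a thin bid stack can diverge badly, and that's before anyone else tries to sell into the same bids:

    shares_outstanding = 100_000_000
    last_round_price = 10.00                      # the "paper" ask
    paper_valuation = shares_outstanding * last_round_price  # $1.0B

    # A thin secondary-market bid stack: (price, shares wanted).
    bids = [(9.00, 500_000), (8.50, 750_000), (7.00, 1_000_000)]

    def proceeds_if_sold(n_shares):
        # Walk down the bid stack; anything left over is unsellable
        # at any quoted price, whatever the paper valuation says.
        total, remaining = 0.0, n_shares
        for price, size in bids:
            fill = min(remaining, size)
            total += fill * price
            remaining -= fill
            if remaining == 0:
                break
        return total

    stake = int(0.02 * shares_outstanding)   # a 2% holder
    print(stake * last_round_price)          # $20.0M "on paper"
    print(proceeds_if_sold(stake))           # ~$16.1M at best, and only
                                             # if no one sells ahead of you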


Of course there’s a lot of nuance there. But certainly we can all agree on two things:

1. YC has not had good returns because it has only had one IPO (apparently selling Twitch and Cruise for $1B each don't count as wins?)

2. While it’s difficult to know what the true value of YC companies is, the fact that there are nearly 100 companies valued at $100m+ is not just a “vanity metric.” Especially at the later stages of more mature companies, there are real dollars trading hands and there’s more liquidity available on secondary markets.


Sure, I agree with those two points.


Can you clarify how the terms changed, or direct me to further reading about that?

Otherwise I think the point about the $80B not being liquid is a good one. It's not a dishonest figure, but it's clearly inaccurate and inappropriate for the purpose of estimating returns. The real answer is going to be far more nuanced than simply stating the aggregate value of all YC companies on paper.


It used to be $20k for 7%; now it’s basically $120k for 7% with pro rata rights (that’s again a simplification, but it’s close enough for this conversation).

I agree that n companies valued at over x isn’t a great metric, except that the number is so huge and the valuations so high that a single company could return the entire amount YC has invested, and what we’re debating is whether YC are “kingmakers.” By nearly any possible measure they are.


An acquihire is not always a positive exit. More generally, not all liquidity events are positive outcomes for all parties involved, including investors. Many exits are agreed to by all parties to cut their losses and recoup some amount of the original investment.


I think the fact that they're phenomenal is pretty clear, but I think a more important question is how reliable the returns are. They might be reliable, but that doesn't seem obvious to me from the comments in this thread, and it's still not clear to me that YC is capable of catapulting an arbitrary company in their set to wild success ("kingmaking") or that they're capable of repeating those successes over the long term.


They have consistently pumped out many of the top companies in Silicon Valley for over 10 years now. Most of the companies are newer, as always, and I don’t think YC would claim they can pick any random company and make them successful, but if YC isn’t a “kingmaker” then there’s no such thing in Silicon Valley.


It was found that 90 percent of small businesses fail within one year.

http://smb-trends.com/2011/02/smb-failure-rate-us/

Out of 1280 YC companies, only 139 are listed as dead. That's roughly an 11% failure rate (139/1280), which flips that figure from a 90% failure rate to a near-90% success rate. And the contrast is actually starker than that, because the 90% failure rate is within the first year, while the near-90% success rate for YC is an "all time" figure for the history of YC.

http://yclist.com


I think you should clarify what liquidity means in your anecdote, because liquidity is a function of time and volume. Definitionally speaking, shares in a private company are not as liquid as shares in a public company. So what does "completely liquid" mean? Could every owner of private shares find a buyer if they wanted to? If not, what subset could?

That these figures are not public is a very important discussion point, because it does introduce some level of anecdata and arbitrary speculation into the discussion. That's not to say you're wrong, but it's certainly imprecise and questionable.


It means I had shares that I could sell at a moment's notice on a secondary market, and so far as I could tell the market for those shares was enormous. I’d guess anyone, including VCs and founders, at the top 50 or so YC companies could sell their shares at any time to a large number of willing buyers, given the ability to do so. Therefore I’d argue that the value of those shares is far different from a “vanity metric.”


> anyone, including VCs and founders

What about non-founder employees?


Yes them too (I wasn’t a founder)


Are there any limitations to who can do what in the secondary markets? (Thanks for your answers to my questions - it's not easy to find information on this and I don't know anybody at a company like this to ask.)

