As someone who's worked in time series forecasting for a while, I haven't yet found a use case for these "time series" focused deep learning models.
On extremely high dimensional data (I worked at a credit card processor company doing fraud modeling), deep learning dominates, but there's simply no advantage in using a designated "time series" model that treats time differently from any other feature. We've tried most time series deep learning models that claim to be SoTA - N-BEATS, N-HiTS, every RNN variant that was popular pre-transformers - and they don't beat an MLP that just uses lagged values as features (a rough sketch of that baseline is at the end of this comment). I've talked to several others in the forecasting space and they've found the same result.
On mid-dimensional data, LightGBM/XGBoost is by far the best and generally performs as well as or better than any deep learning model, while requiring much less fine-tuning and a tiny fraction of the computation time.
And on low-dimensional data, (V)ARIMA/ETS/Factor models are still king, since without adequate data, the model needs to be structured with human intuition.
As a result I'm extremely skeptical of any of these claims about a generally high performing "time series" model. Training on time series gives a model very limited understanding of the fundamental structure of how the world works, unlike a language model, so the amount of generalization ability a model will gain is very limited.
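For the curious, here is a minimal sketch of the lagged-value MLP baseline I mean (synthetic data and scikit-learn, not our actual fraud pipeline):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # toy univariate series; in practice this is whatever history you have
    rng = np.random.default_rng(0)
    y = np.sin(np.arange(1000) / 10) + rng.normal(0, 0.1, 1000)

    n_lags = 28
    # each row is [y_{t-28}, ..., y_{t-1}] and the target is y_t;
    # time is just another set of columns, nothing special
    X = np.stack([y[t - n_lags:t] for t in range(n_lags, len(y))])
    target = y[n_lags:]

    model = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
    model.fit(X[:-100], target[:-100])   # hold out the last 100 points
    preds = model.predict(X[-100:])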
Great write-up, thank you. Do you have rough measures for what constitutes high-/mid-/low-dimensional data? And how do you use XGBoost et al. for multi-step forecasting, i.e. in scenarios where you want to predict multiple time steps into the future?
The added benefit is that you optimize each regressor towards its own target timestep t+1 ... t+n. A single loss on the aggregate of all timesteps is often problematic
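Something like this, as a minimal sketch of the direct strategy (lightgbm and the synthetic series are just for illustration):

    import numpy as np
    from lightgbm import LGBMRegressor

    rng = np.random.default_rng(0)
    y = np.sin(np.arange(2000) / 20) + rng.normal(0, 0.1, 2000)

    n_lags, horizon = 28, 7
    origins = range(n_lags, len(y) - horizon)
    X = np.stack([y[t - n_lags:t] for t in origins])

    models = []
    for h in range(1, horizon + 1):
        target = np.array([y[t + h - 1] for t in origins])
        m = LGBMRegressor(n_estimators=200)
        m.fit(X, target)                    # each model optimizes its own t+h loss
        models.append(m)

    last_window = y[-n_lags:].reshape(1, -1)
    forecast = [m.predict(last_window)[0] for m in models]   # t+1 ... t+7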
I've found that it works well to add the prediction horizon as a numerical feature (e.g. # of days), and then replicate each row for many such horizons, while ensuring that all such rows go to the same training fold.
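Roughly like this, as a sketch (again lightgbm and made-up data, just to show the shape of it):

    import numpy as np
    from lightgbm import LGBMRegressor
    from sklearn.model_selection import GroupKFold

    rng = np.random.default_rng(0)
    y = np.sin(np.arange(2000) / 20) + rng.normal(0, 0.1, 2000)
    n_lags, max_h = 28, 7

    rows, targets, groups = [], [], []
    for t in range(n_lags, len(y) - max_h):
        window = y[t - n_lags:t]
        for h in range(1, max_h + 1):          # replicate the row per horizon
            rows.append(np.append(window, h))  # horizon as a numeric feature
            targets.append(y[t + h - 1])
            groups.append(t)                   # same origin -> same CV fold

    X, target, groups = np.array(rows), np.array(targets), np.array(groups)
    model = LGBMRegressor(n_estimators=300)
    train_idx, val_idx = next(GroupKFold(n_splits=5).split(X, target, groups))
    model.fit(X[train_idx], target[train_idx])
    val_preds = model.predict(X[val_idx])

Grouping folds by origin timestamp is just one way to honour the "all replicas in the same fold" constraint.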
Thanks for this write up. Your comment clears up a lot of the confusion I've had around these time series transformers.
How does lagged features for an MLP compare to longer sequence lengths for attention in Transformers? Are you able to lag 128 time steps in a feed forward network and get good results?
I agree that conventional (numeric) forecasting can hardly benefit from the newest approaches like transformers and LLMs. I came to that conclusion while working on the intelligent trading bot [0] and experimenting with many ML algorithms. Yet there exist some cases where transformers might provide significant advantages. They could be useful where (numeric) forecasting is augmented with discrete event analysis and where sequences of events are important. Another use case is where certain patterns are important, like those detected in technical analysis. For these cases, though, much more data is needed.
Foundation models could work where "needs human intuition" has so far been the state of things. I can picture a time series model with a large enough training corpus being able to deal quite well with the typical quirks of seasonalities, shocks, outliers, etc.
I fully agree regarding how things have been so far, but I’m excited to see practitioners try out models such as the one presented here — it might just work.
Reminds me a bit how in psychology you have ANOVA, MANOVA, ANCOVA, MANCOVA etc etc but really in the end we are just running regressions—variables are just variables.
My read on this was that you can just dump the lagged values as inputs and let the network figure it out just as well as the other, time series specific models do, not that time doesn't matter.
I assume the time series modelling is used to predict normal non-fraud behaviour. And then simpler algorithms are able to highlight deviations from the norm?
As much as Transformers feel like the state of the art universal function approximators, people need to realize why they work so well for language and vision.
Transformers parallelize incredibly well, and they learn sophisticated intermediate representations. We start seeing neat separation of different semantic concepts in space. We start seeing models do delimiter detection naturally. We start seeing models reason about lines, curves, colors, dog ears etc. The final layers of a Transformer are then putting these sophisticated concepts together to learn high level concepts like dog/cat/blog etc.
Transformers (and deep learning methods in general) do not work for time series data because they have yet to extract any novel intermediate representations from said data.
At face value, how do you even work with a 'token window'? At the simplest level, time series modelling is about identifying repeating patterns over very different lifecycles, conditioned on certain observations about the world. You need a model that can natively reason over years, days and seconds all at the same time to even be able to reason about the problem in the first place. Hilariously, last week's streaming LLM paper from MIT might actually help here.
Secondly, the improvements appear marginal at best. If you're proposing a massive architecture change and removing observability and explainability... then you'd better have some incredible results.
Truth is, if someone identifies a groundbreaking technique for timeseries forecasting, then they'd be an idiot to tell anyone about it before making their first $Billion$ on the market. Hell, I'd say they'd be an idiot for stopping at a billion. Time series forecasting is the most monetarily rewarding problem you could solve. If you publish a paper, then by implication, I expect it to be disappointing.
> Truth is, if someone identifies a groundbreaking technique for timeseries forecasting
It’s really quite simple. Just iterate through all possible programs for a monotone universal Turing machine, where the input tape consists of all the data we can possibly collect concatenated with the time series of interest. Skip the programs that take too long to halt, keep the remaining ones that reproduce the input sequence, then form a probability distribution over the next output bits, weighted by 2^-(program size).
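Spelled out, that's roughly the textbook Solomonoff predictor:

    M(x_{1:n}) = \sum_{p \,:\, U(p) \text{ starts with } x_{1:n}} 2^{-|p|}

    P(x_{n+1} = b \mid x_{1:n}) = \frac{M(x_{1:n} b)}{M(x_{1:n})}

where U is a fixed monotone universal machine and |p| is the program length in bits. The "skip programs that take too long" step is the usual dodge around the fact that none of this is computable.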
Not higher than having to perpetually secure a network by computation, which, taken to the extreme, is essentially a sure-footed path to causing a black hole, by the argument that all available space ends up being used for computational security and incentives.
That is how it works, precisely. You secure the network with compute. A computer requires physical space to run a computation. Thus it maximizes towards using all physical space for incentives driven by network security.
Are you telling me that there is not already physical evidence of this? I assure you there is plenty of evidence of physical space being assimilated by the incentive structures related to Bitcoin and its progeny.
Taken across time, for a civilization that grows into further complexity, there is a limit on how much space can be used to secure the network, and the system most likely even incentivizes maximizing the capture of space for computational security; it therefore accelerates our civilization towards creating a black hole. I couldn't come up with a better way to fast-track our way towards a cosmic environmental disaster. It's a pretty bad incentive structure long-term.
Are you technically competent? Have you read the whitepaper? It’s the fundamental theory of the paper. I am not sure how I can describe it better than the whitepaper itself. The network necessarily depends on its substrate, a substratum that provides compute, which necessarily implies concrete material in physicality. Thus, physical material acts as the mechanism for computation, which Bitcoin depends on for network security, incentivized by the value of the distributed nature of the network, thus requiring an ever greater need for compute.
Even in a world where computation doesn’t become more efficient, it still eventually takes up all the space available, due to the incentives of protecting against network failure.
Thanks for telling me that computation requires physical matter. That sure will help get to the bottom of this.
Now could you answer the question? What is it about the Bitcoin blockchain that requires EVER INCREASING compute?
Network security. What says it doesn’t? The whitepaper specifically points to computation for security. Computation is not evenly distributed and changes with time in allocation. As a generality.
> Computation is not evenly distributed and changes with time in allocation. As a generality.
I will NOT grant you this. Please, give me actual technical details on WHY it requires ever increasing compute. You've said network security; what about it requires ever increasing compute?
Your stubbornness or pedantry is not my concern. You have nothing to grant me. I require no grant of you. You offer me nothing, for you display nothing I lack. You already provide me with what I crave: my own self-amusement; so thank you.
You can read the paper and understand the principles it is based on, which are rooted in balancing computational asymmetry, amongst other concerns, across a network of computers. At the most simple level, if you are aware of hashcash and sybil resistance you should be able to figure it out.
If you're still confused, then ask yourself: why does the Bitcoin algorithm adjust to computational power?
You are unable to explain it, which is a clear sign of a lack of understanding.
But maybe you have links to others who are able to explain it, rather than the Bitcoin paper, which obviously does not lead one to think network security will subsume all available matter for compute.
> Truth is, if someone identifies a groundbreaking technique for timeseries forecasting, then they'd be an idiot to tell anyone about it before making their first $Billion$ on the market.
This is correct.
I work in HFT and the industry has been successfully applying deep learning to market data for a while now. Everything from pcaps/ticks to candles.
Why publish your method when it generates $1B+/year in profit for a team of 50 quants/SWEs/traders?
Are you at liberty to say how high the frequency gets in connection with these models?
I assume the latency is comparatively much higher, but I also wouldn't be surprised if microseconds generally aren't a problem, e.g. because the patterns detected are on a much larger scale.
Re candles - even longer term, hourly/daily? Are there actually strategies out there that deliver great Sharpe over many years with just time series forecasting? Most hedge funds don't beat the index, afaik.
Time series prediction is always about using the particular features of your distribution of time series. In standard time series prediction the features of the distribution are mostly things like "periodic patterns are continued" or "growth patterns are continued". A transformer that is trained on language data essentially learns time series prediction in which a large variety of complex features influence the continuation. Language data is so complex and diverse that continuing a text necessitates in-context learning: being able to find some common features in any kind of string of symbols, and using those to continue the text. Just think that language data could contain huge Excel tables of various data, like stock market prices or weather recordings. It is therefore plausible that in-context learning can be powerful enough to perform zero-shot time series continuation.

Moreover, I believe that due to in-context learning, language data plus the transformer architecture has the potential to obtain genuinely general-intelligence-like behaviour: general pattern recognition. Language data is complex enough that SGD must lead to general pattern recognition and continuation. We are only at the beginning, and right now we are focused on finetuning, which destroys in-context learning. But we will soon train giant transformers on every modality, every string of symbols we can find.
The reality is that the market has inefficiencies like human emotion and bot/algorithmic trading which absolutely can be exploited by AI. You just need to train an AI to recognize the inefficiencies, which is exactly what neural networks excel at.
> people need to realize why they work so well for language and vision.
I agree with your entire post; however, this sentence made me think: well, video is just layered vision. Why couldn't frames over time work similarly to vision? We know the current answer is that it doesn't, but is it a matter of NNs can't, or of us not yet having figured out the correct way to model it?
edit: I'm not sure what jdkwkbdbs (dead-banned) means by "LLMs don't. ML works pretty well." (well, I do); LLMs solve certain tasks -- and really interesting ones at that -- at certain average cross-entropy loss levels, and typically form the Pareto front of models of all sizes in terms of parameter count, at least once you start looking at the biggest that we have, i.e. above a certain parameter count; and they typically increase in accuracy with respect to parameter count in a statistically significant fashion.
In essence, they represent the state of the art with respect to those specific tasks, as measured at the current time. Though you may wish for there to be a better (cross-entropy loss, accuracy percentage) tuple at any given instant (e.g. (optimal expected CE loss, an actual proof of correctness)), for the current time it seems like they not only do the best as a class of models on these sets of tasks, but have also lately been improving the fastest in accuracy as a class of models. They're quite noteworthy in that regard, fundamentally, imo. Just my 2c.
“Some popular models like Prophet [Taylor and Letham, 2018] and ARIMA were excluded from the analysis due to their prohibitive computational requirements and extensive training times.”
Can anyone who works a lot in time series forecasting explain this in some more detail?
I’ve def used ARIMA, but only for simple things. Not sure why this would be more expensive to train and run than a Transformer model, and even if true, ARIMA is so ubiquitous that comparing resources and time would be enlightening. Otherwise it just sounds like a sales pitch, throwing around more obscure acronyms for a bit of "I'm the expert, abc xyz industry letters" marketing.
We love ARIMAs. That is why we put so much effort into creating fast and scalable ARIMAs and AutoARIMA in Python [1].
Regarding your valid concern: there are several reasons for the high computational costs. First, ARIMA and other "statistical" methods are local, so they must train a different model for each time series. (ML and DL models are global, so you have 'one' model for all the series.) Second, the ARIMA model usually performs poorly on a diverse set of time series like the one considered in our experiments. AutoARIMA is a better option, but its training time is considerably longer given the number and length of the series. AutoARIMA also tends to be very slow for long series.
In short: for the 500k series we used for benchmarking, ARIMA would have taken literally weeks and would have been very expensive. (A toy illustration of the per-series fitting is sketched at the end of this comment.)
That is why we included many well-performing local "statistical" models, such as the Theta and CES. We used the implementations on our open-source ecosystem for all the baselines, including StatsForecast, MLForecast, and Neuralforecast. We will release a reproducible set of experiments on smaller subsets soon!
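For a sense of scale, a toy example of why local models get expensive: every unique_id below gets its own AutoARIMA search (made-up data; API details may differ slightly between versions):

    import numpy as np
    import pandas as pd
    from statsforecast import StatsForecast
    from statsforecast.models import AutoARIMA

    # long-format frame: one row per (series, timestamp)
    ds = pd.date_range("2023-01-01", periods=200, freq="D")
    df = pd.concat([
        pd.DataFrame({"unique_id": uid, "ds": ds,
                      "y": np.sin(np.arange(200) / 7) + i})
        for i, uid in enumerate(["series_a", "series_b"])
    ])

    sf = StatsForecast(models=[AutoARIMA(season_length=7)], freq="D", n_jobs=-1)
    fcst = sf.forecast(df=df, h=14)   # one AutoARIMA search per series; now scale to 500k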
I immediately tried to find a comparison with ARIMA as well and was disappointed. It's difficult to take this paper seriously when they dismiss a forecasting technique from the 70's because of "extensive training times".
Even then, 500 years of daily data is less than 200k observations, most of which are meaningless for predicting the future. That's less than 16B seconds of data. Regression might not handle it directly, but linear algebra tricks are still available.
While I could find some excuses to exclude ARIMA, notably that in practice you need to supply some important priors about your time series (periodicity, refinements for turning points, etc.) for it to work decently (small sketch at the end of this comment), "prohibitive compute and extensive training time" are just not applicable.
That part is a bit wanky, but the rest of the paper, notably the zero-shot capability, is very interesting if confirmed. I look forward to it being more accessible than a "contact us" API, so I can compare it to ARIMA and others myself.
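On the priors point, this is the kind of thing I mean; with statsmodels you hand the periodicity to the model yourself (orders picked by hand, synthetic weekly-seasonal data):

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(0)
    y = np.sin(np.arange(400) * 2 * np.pi / 7) + rng.normal(0, 0.2, 400)

    # (p, d, q) plus a weekly (P, D, Q, s=7) seasonal component: the human prior
    res = SARIMAX(y, order=(1, 0, 1), seasonal_order=(1, 0, 1, 7)).fit(disp=False)
    forecast = res.forecast(steps=14)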
I have been doing time series forecasting professionally. ARIMA is computationally one of the cheapest forecasting models out there, both in training and in inference. It suffers from many deficiencies and shortcomings, but computational efficiency is not one of them.
> “Some popular models like Prophet [Taylor and Letham, 2018] and ARIMA were excluded from the analysis due to their prohibitive computational requirements and extensive training times.”
Yes, I've done some work in time series forecasting. The above sentence is the one that tipped me off to this paper being BS, so I stopped reading after that. :) I can't take any paper about time series forecasting seriously when the author isn't familiar with the field.
Eh, it's not as if you could just project the 300k time series down to something lower dimensional for forecasting. TimeGPT would have to do something similar to avoid the same problem.
Though I can't quite figure out how the prediction works exactly: they have a lot of test series, but do they input all of them simultaneously?
If true, then beating it and looking good will be easy.
Having trained ARIMA models in my day, I will say that long training times and training cost -- compared to any deep learning model -- is not something that ever crossed my mind.
High training times could be cost prohibitive. Currently, it's over $100M to train GPT-4 from scratch (which possibly includes other costs related to RLHF and data acquisition). Not sure how this model compares, but it's likely not cheap.
This is an extremely content-light paper. There's basically zero information on anything important. Just hand-waving about the architecture and the data. Instead it spends its space on things like the equation for MAE and a diagram depicting the concept of training and inference. Red flags everywhere.
Max from Nixtla here. We are surprised that this has gained so much attention and are excited about both the positive and critical responses.
Some important clarifications:
The primary goal of this first version of the paper is to present TimeGPT-1 and showcase our preliminary findings from a large-scale experiment, demonstrating that transfer learning at this scale is indeed possible in time series. As mentioned in the paper, we deeply believe that pre-trained models can represent a very cost-effective solution (in terms of computational resources) for many applications. Please also consider that this is a pre-print version. We are working on releasing a reproducible set of experiments on a subset of the data, so stay tuned!
All previous work of Nixtla has been open source, and we believe TimeGPT could be a viable commercial product, offering forecasting and anomaly detection out of the box for practitioners. Some interesting details were omitted because they represent a competitive advantage that we hope to leverage in order to grow the company, keep providing better solutions, and continue building our ecosystem.
As some others have mentioned in the thread, we are working to onboard as many people as possible into a free trial so that more independent practitioners can validate the accuracy for their particular use cases. You can read some initial impressions from the creators of Prophet [1] and GluonTS [2], or listen to an early test by the people from H2O [3]. We hope to see some more independent benchmarks soon.
This is exactly the kind of thing the academics are warning you about when they say things like "peer review is important" and "don't read arxiv preprints if you're not a subject matter expert"
Time series works amazing until a fundamental assumption breaks or the time frame extends too far. It's pretty much just drawing the existing pattern out further with mathematical precision. It only works till the game changes.
Yeah, that’s part of the game though. You can’t get perfection from modelling a complex system with the (relatively) few variables that you can actually measure. Assumptions are always evolving, and are always going to be broken at some point or another.
That’s the opening line, right? Uncertainty is a fact of life. With time series forecasts, the best you can ever hope to do is give probability bounds, and even then you can only really do so by either:
- limiting by the rules of the game (e.g. the laws of physics, or the rules of a stock exchange)
- using past data
The former is only useful if you’re the most risk averse person on the planet, and the latter is only useful if you are willing to assume the past is relevant.
Good response. People seem to think that what I call "single pass" inference is the only thing that matters: a monolithic, single-process system.
When in fact the world, and the intelligent agents inside it, are ensembles of ensembles of systems with varying and changing confidence that flow and adjust as the world does.
A personal note here: they could have done a better job with the tokens, because the announcement was so grandiose and maybe they underestimated people with legit interest.
I’m using all the libs from Nixtla and actively advocating for them, yet I did not get a token; meanwhile lots of people are posting their usage on Twitter.
Aren't LLMs already zero-shot time series predictors? Predicting the next token and forecasting seem like the exact same problem. I will admit some small tweaks in tokenization could help, but it seems like we're just pretraining on a different dataset.
One idea I was interested in, after reading the paper on introducing pause tokens [1], was a multimodal architecture that generalizes everything to parallel time series streams of tokens in different modalities. Pause tokens make even more sense in that setup.
I agree - You could frame LLMs this way. Tokens over "time" where time just happens to be represented by discrete, sequential memory.
Each token could encode a specific amplitude of the signal. You could literally just have tokens [0, 1, ..., MAX_AMPLITUDE] and map your input signal to this range (rough sketch at the end of this comment).
In the most extreme case, you could have 2 tokens - zero and one. This is the scheme used in DSD audio. The only tradeoff is that you need way more samples per unit time to represent the same amount of information, but there are probably some elegant perf hacks for having only 2 states to represent per sample.
There are probably a lot of variations on the theme where you can "resample" the input sequences to different token rate vs bits per token arrangements.
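A rough sketch of that quantization idea (the 8-bit vocabulary and min-max scaling here are arbitrary choices):

    import numpy as np

    MAX_AMPLITUDE = 255                      # 256-token vocabulary

    def to_tokens(signal):
        lo, hi = signal.min(), signal.max()
        scaled = (signal - lo) / (hi - lo + 1e-12)
        return np.round(scaled * MAX_AMPLITUDE).astype(int), (lo, hi)

    def from_tokens(tokens, lo, hi):
        return tokens / MAX_AMPLITUDE * (hi - lo) + lo

    t = np.linspace(0, 1, 1000)
    signal = np.sin(2 * np.pi * 5 * t)
    tokens, (lo, hi) = to_tokens(signal)     # sequence of ints to feed a sequence model
    recon = from_tokens(tokens, lo, hi)      # max error is about half a quantization step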
Training an LLM on time series feels limited, unless I'm missing something fundamental. If LLMs are basically prediction machines, and I have an LLM trained on cross-industry time series data and want to predict orange futures, how much more effective can it be? (Genuine question.) Secondly, isn't context hyper-important? Such as weather, political climate, etc.
A long time ago, when I was a grad student, I got a consulting job with a radiologist who thought that he could use digital processing techniques to predict the options market well enough to make money. He didn't want to shell out for a real quant; he asked my prof if he knew anyone, and I decided I could use the extra money. I came up with some techniques that appeared to produce a small profit; unfortunately it was a hair less than what he'd have to pay in commissions. He wanted to keep pushing, but I decided to hang it up. I'm sure that there are people here who know far more about this than my decades-old experiments taught me.
So in principle it could work, but the problem is that these days the big players are all doing high-frequency trading with algorithms that try to predict market swings. And the big guys have an advantage: they are closer to the stock exchanges. They trade so fast that speed-of-light limitations affect who gets their trades in first. So I think the only people who could win with an LLM technique are those who don't need to pay commissions (a market maker, Goldman Sachs or similar), with access to real-time data, very close to the exchange so they get it fast.
Where's the dataset? Without the dataset it's impossible to back-test. For finance, for example, I'm assuming a large part of the training data is US stock tickers or FRED public data, so it's almost certain the model has seen the data people would want to back-test on.
The authors could probably make more impact if they had open-sourced their models; the way it is presented looks like the ClosedAI sort of pathway, meaning papers are used as a way to advertise their model to developers.
Perhaps a stupid question, but why train it only on time series data and not in conjunction with, e.g., news sources like the Financial Times? LLMs are good at language, so why not use that?
Not sure why this is getting so many upvotes. There is no concrete information in the paper about how the model actually works or what differentiates it from other models.
The M7 forecasting challenge makes this goal explicit. It's not the only use of forecasting, though, and IMO it would be good to have other time series data to present to models.
This is my thought as well. Reading the paper now... who would have thought that the quote "those who don't learn from history are doomed to repeat it" might be useful in predictions.
No, because when every player is using it, they will need something else to give them an edge. Anyways, quants do a lot more than time-series forecasting.
Most quants work on the sell-side where direct forecasting is almost irrelevant (to an academic, a trader perhaps not...), in that they are usually attempting to "interpolate" market prices to be able to price derivatives.
> Most quants work on the sell-side where direct forecasting is almost irrelevant
Those aren’t real quants. Even the sell-side quants know they aren’t real quants. For those unfamiliar, sell-side quants typically work at banks like Goldman, HSBC, JPMorgan, etc.
The real quants are buy-side quants/traders: prop shops, hedge funds, endowment/pensions funds, etc.
Out come the fingers made of foam
They finally open-sourced Rehoboam
Our societies have been thusly blessed
What it predicts could have anyone guessed
Targeted advertising wherever you roam