Maybe not a good idea to link a page that runs source code created by random people. Granted, CSS is relatively safe, but still.


You can do quite a bit of tracking with CSS by conditionally loading third-party resources. Tracking pixels, or loading different images on hover, active, focus, etc., can effectively track users.

For example, frameworks that mirror input values into DOM attributes (controlled components) can even be keylogged with CSS alone: https://css-tricks.com/css-keylogger/

The correct solution is to enable a strict Content Security Policy (CSP), so that even when an attacker compromises your website with injected XSS/CSS, they cannot exfiltrate any data they obtain. Note: this website has not configured a Content Security Policy :(
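
As a minimal sketch (assuming a Flask app purely for illustration; the exact directives depend on what the site legitimately needs to load), a strict policy could be set like this:

    # Minimal sketch of a strict CSP; Flask is an assumed framework, for illustration only.
    # The directive list is an example; adjust it to what the site actually loads.
    from flask import Flask

    app = Flask(__name__)

    @app.after_request
    def set_csp(response):
        response.headers["Content-Security-Policy"] = (
            "default-src 'self'; "  # only load resources from our own origin
            "img-src 'self'; "      # blocks CSS-triggered requests to third-party trackers
            "style-src 'self'; "    # no inline or third-party stylesheets
            "form-action 'self'"    # forms can only post back to our own origin
        )
        return response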


Not if you ask first.


Neural nets often fail with (repetitive) gibberish output when the input is too different from the training data. This model appears to take in the entire text input at once or look ahead at the next input letters, so the unusual "bla bla" at the end can mess up outputs near the beginning.


The "bla bla" actually doesn't do much, that's the "My first" that triggers it most of the time. I only added the "bla bla" in the end to make the line longer because it looks better that way, but just writing "My first" or even "My f" is enough.

It is described as "Realistic handwriting generator. Convert text to handwriting using an in-browser recurrent neural network", so, unlike GPT, it is not a transformer, and it is small, so it most likely doesn't take in the entire text input at once. Most likely, it simply overshoots the previous stroke and decides that a loop is the most appropriate way to continue, then it overshoots that loop, and again, and again, until by chance it stops overshooting and proceeds to the rest of the text. A cursive style like #2, the need for precise strokes (high legibility), and specific letter transitions seem to exacerbate the problem.


Can DRIZZLE help to achieve higher resolution? Though with hundreds of photos this would imply a lot of work:

https://en.wikipedia.org/wiki/Drizzle_(image_processing)


Geoffrey Hinton has recently been talking about how analog and "imperfect" computing with specialized hardware/circuitry may yield much cheaper neural nets that could easily be as large as human brains, yet would only cost a few dollars and would be extremely cheap to run. Not a new idea, but it is a fairly promising outlook, I think.

https://www.zdnet.com/article/we-will-see-a-completely-new-t...


The model with the most similar name in this list is code-cushman-001, which is described as "Codex model that is a stronger, multilingual version of the Codex (12B) model in the paper".

https://crfm-models.stanford.edu/static/help.html

The next-strongest Codex model is called code-davinci-001, which appears to be a fine-tuned version of the GPT-3 Davinci model, known to have 175B parameters. The model naming is alphabetical in order of model size:

https://blog.eleuther.ai/gpt3-model-sizes/

See also A.2 here: https://arxiv.org/pdf/2204.00498.pdf#page=6


The code model (Codex) is the base model in more recent iterations [0]

[0] https://beta.openai.com/docs/model-index-for-researchers


Amazing if this is only a 12B model. If this already increases coding productivity by up to 50% (depending on the kind of work), imagine what a 1T model will be capable of! I do wonder if some programmers at FAANG already have access to way more powerful coding assistants, and whether they code much at all at this point, or only write high-level code specifications and then fix up the automatically generated code.


> If this already increases coding productivity by up to 50% (depending on kind of work)

Does anyone believe that?

edit: I'm surprised to see that (so far) 3 replies actually agree with the statement. Is there a video that you'd recommend that shows realistic usage and gain from copilot? Maybe a livestream or something.


On menial tasks, it's way more than 50%. For quick scripting, dirty parsing, PoCs and plumbing, it's about 300% for me.

However, for anything that requires me to think, it's 5% at best.

Don't take the 50% figure as anything serious; I think it's just a way to say "if it is such a meaningful boost in productivity".

Which it is, for a lot of tasks, because the vast majority of programming jobs are boring stuff outside of the HN bubble.

It's amazing how much of the world economy runs on csv uploaded to ftp servers.


Agreed with this. If the main bottleneck is typing, then Copilot can dramatically speed up the process. If the bottleneck is thinking, it doesn't help out nearly as much unfortunately.


I'd add that for me at least it's quite good at some small specific subsets of "requires me to think". For example, I do a lot of 3d rotations & transformations, and it's very good at figuring out the math of that based on the function name I chose etc. Most of those would take me a piece of paper and 5-10 mins, but it usually gets it in 1 or 2 tries.

But yes, mundane work it is best at. Some things I have found it made particularly easy:

- scraping websites

- file i/o

- "mirroring" things (I write a bunch of code for doing something on x axis, it automatically replicates it for y and z etc with the right adjustments, or cardinal directions, or arrow keys, etc etc etc)


It is indeed a cheap script boy for me as well

It does mundane work exceptionally well


Sure. I'm way more productive with Copilot. I haven't been coding much lately but I could imagine it would double my productivity with regards to the actual "get an implementation of a thing done" bit of the work.

In terms of design, I had a long conversation with ChatGPT the other day about designing a database, including optimizations that could be made given certain requirements and constraints, etc. It was a big productivity boost, like rubber ducking on steroids.


I tried it to help me optimize some SQL, but even after many attempts it didn't really do anything useful for me. The best thing was really to show how the syntax works for features that I rarely use - so in that sense it's a better Stack Overflow.


Can you give us an example of how it helped design the database?

I could not think of how it would have helped me, but maybe I'm limited in my imagination or don't know how to ask.


I told it I was designing a database. I told it that my database could tolerate failure levels where more than a quorum of nodes failed at a given time. I then asked it about different algorithms for consensus: Raft, Paxos, swarm-based, etc. It described the algorithms for me. I told it that in my database I could guarantee certain things, like that every operation commutes, and I asked how that would let me optimize things - it explained that I could parallelize certain parts of those algorithms.

At one point I told it to name the algorithm we had been discussing something like "OptSwim" and we just kept iterating on the idea.


But aren't you afraid that whenever you veer the discussion away from Wikipedia/Stack Overflow-type explanations, it's likely lying to you? This was my general experience -- it's great at querying for stuff that already exists and is popular on the internet, and for conversing at a surface or broad level, but as soon as you delve into details it starts confidently lying and/or hallucinating things, which undermines my trust in it, which in turn means I need to verify what it says, which means it did not increase my productivity that much after all.

It routinely invents arguments, functions or concepts which don't exist in reality or don't apply to the current context, but look like they could, so you are even more likely to get caught by this.


Haha, yes, it indeed invents arguments that aren't part of specific APIs and would offer to do something that you'd like to do in a very easy way, but since they actually aren't part of the API, well, you're out of luck.

It's just taking the "I wish they'd thought of my use case when designing that API" feeling to the next level by simply pretending, in a very sincere and convincing way, that your wish came true, then writing a usually-pretty-correct program around that assumption that would actually work _if that wish had come true_ - but unfortunately that API doesn't really accept this convenient parameter, so...it's not that easy in reality.


Well then. The singularity is here. Almost no humans understand these things.


I think people may be downvoting you because technically, neither does the AI.


I used CoPilot last Advent of Code and really liked it.

This year I recorded most of my days and uploaded them to YouTube. So if you want to get a realistic view, take a look here: https://www.youtube.com/channel/UCOqPGQCzgieAOL6iOJjj8hg.

On the earlier days you can see it speeds you up a lot. On the later days (such as today) you still want to wrap your own head around difficult computer science concepts, so it is kind of useless.

Let me know if you have any questions!


I finally got it to do something useful for me the other day. I got it to invert the rendering of rows and columns in a React widget I was writing.

It wasn’t something I actually needed help on, though. When I tried to go further with it and complete more of the task, it got stuck in a loop of just suggesting more and more comments but never offering more code, and then it mysteriously stopped responding at all.

This is the best experience with it I’ve had so far.


Absolutely. 50% feels conservative. The thing is that Copilot becomes so ingrained in your workflow that you don't notice it until the internet goes down and you feel completely handicapped. Only then do you realize how much you rely on it.


I haven’t tried Copilot but I’ve used ChatGPT to help with doing Advent of Code in Python (which I don’t use regularly so I forget bits of syntax).

At first I found it very useful to ask it to parse the input. Much faster than looking up three separate docs to piece together what I had in mind.
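
For example (a made-up input format, just to show the kind of boilerplate it saves me from writing by hand):

    # Made-up Advent of Code-style input: blank-line-separated groups of integers.
    from pathlib import Path

    def parse(path: str) -> list[list[int]]:
        groups = Path(path).read_text().strip().split("\n\n")
        return [[int(line) for line in group.splitlines()] for group in groups]

    totals = [sum(group) for group in parse("input.txt")]
    print(max(totals))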

But then I asked it to parse a more complex input and it just kept failing badly even when I gave it sample inputs and outputs.

I’d say it definitely offers some productivity gains and is worth trying.


A 1T model would be capable of much more than the current version of Copilot in terms of autocompletion and even code correction. However, at that point, even with a lot of model parallelism to speed up inference, it's likely to be at least 10x slower on the generation side. From my experience working on Codeium, a Copilot alternative, this would be too frustrating for users. It could be useful as a tool that runs asynchronously and modifies all your code at scale.


Given how fast Copilot is (a few seconds), I wouldn't mind waiting 10x longer. I also wouldn't mind letting it run overnight for some tasks (i.e. write documentation, write tests, suggest bug fixes, etc.). I'll check on my buddy the next morning.


I think the UX of large suggestions will require a lot of thinking and experimentation. That's because the longer the output of such a model, the higher the risk of it making a mistake. For short completions, it's often easy to tell mistakes apart from useful suggestions (though sometimes subtle bugs slip in). But for longer completions, it'll get tedious and we might start accepting wrong suggestions.


That sounds like modern day outsourcing


It could be interesting if it was an alternative that a user could query. I could imagine someone starting to write a new function might be willing to wait 10x more time to get something better.


Very true. I think the issue, though, is that unless the output is very likely to be 100% correct, a user would always prefer something that is incomplete but quicker to iterate on. It would be interesting to see if we can get to a paradigm like that.


Though isn't it highly likely that core devs working at the big tech giants have access to 10x-100x faster compute, e.g. some secret TPU successor at Google?


The magic number for performance is actually memory bandwidth, which is lower for TPUs compared to A100s. They have more aggregate compute, but it's not trivial to use that to get very low latency on a per-request basis.


But they quite likely have internal prototypes with higher bandwidth and lower latency. Also, with distilled latent diffusion one could probably generate text(-images) much faster anyway, as it could produce long chunks of text at once rather than needing to recurrently feed each new token back into the inputs.


In my eyes, the limitation of these models is that they only fit a limited amount of context. Not the complete API of your code base, or the latest version of the libraries you are using. I also don't believe a bigger model would resolve these limitations.

However, I do believe there could be a meta model that can query code and libraries.


Presumably, if you had access to them, you could fine-tune them on your codebase.


Yeah, continuous online learning by fine-tuning seems like an obvious way of making these models recall information from outside the perceptible context. One could also prompt the model to (recursively) summarize code and prepend this summary to each prompt, and/or enable the model to interactively query function definitions or code summaries before outputting a final answer (trained by RLHF). But any such tricks might also quickly be outcompeted by an even more general model, e.g. one that directly controls the GUI and can communicate with coworkers...
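
A rough sketch of that "summarize and prepend" idea (here `llm` is a hypothetical text-in/text-out completion function standing in for whatever API you'd actually use):

    # Hypothetical sketch: summarize each file, then summarize the summaries,
    # and prepend the repo-level summary to every completion prompt.

    def summarize(text: str, llm) -> str:
        return llm(f"Summarize this code in a few sentences:\n{text}")

    def repo_summary(files: dict[str, str], llm) -> str:
        # Recursive step: per-file summaries, then a summary of the summaries.
        per_file = [f"{path}: {summarize(src, llm)}" for path, src in files.items()]
        return summarize("\n".join(per_file), llm)

    def complete_with_context(prompt: str, files: dict[str, str], llm) -> str:
        return llm(f"Project overview:\n{repo_summary(files, llm)}\n\n{prompt}")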


It doesn't work like this. A 1T model without architectural changes would not perform substantially better unless it has been trained on a lot more code. The original Codex was trained on 100B tokens, so you could possibly get some gains by increasing the model size but only up to a point. See the Chinchilla paper for reference.


Not necessarily: https://arxiv.org/abs/2206.14486

Also, even with "Chinchilla laws", you still gain performance with a larger model; you just need a lot more data (even if just as noisy) to reach the same level of convergence, but a larger model will have already partially converged to a superior model with the same amount of data.


I've actually seen this paper before, but I don't think it's helpful. If the entirety of GitHub is 100B tokens and you prune it down properly, then fine, you can get equal performance with fewer tokens. However, if you want improved performance, you still need more data, not just a larger model size, and that's hard to obtain. I don't think it's a lost cause or that we'll be stuck with current performance by any means, though - there are other ways to go.


> if you want improved performance, you still need more data

Not true. See figure 2: https://arxiv.org/pdf/2203.15556.pdf#page=5

The loss decreases with greater model size at the same compute budget (i.e. stopping sooner in terms of training data). Also, some rehearsal/multi-epoch training improves the forgetting rate (thereby improving performance substantially), which hasn't been taken into account by Chinchilla et al. because they train for <1 epoch.

https://arxiv.org/abs/2205.12393


No. It shows the opposite. All model sizes converged to a similar loss as the compute increased towards maximum. But larger models had larger loss for a given compute budget.

Their text about Figure 3 confirms what I'm saying: "We find a clear valley in loss, meaning that for a given FLOP budget there is an optimal model to train"


Yes, but the losses in Figure 3 increase because the larger models see less data to keep the FLOP budget constant, not because of overfitting. Large models do not overfit very much, so the loss of a larger model will still be better compared to a smaller model when you keep dataset size constant.
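
As a rough illustration using the parametric loss fit from the Chinchilla paper, L(N, D) = E + A/N^alpha + B/D^beta (the constants below are the paper's fitted values, so treat the exact numbers as ballpark):

    # Predicted loss at a fixed dataset size D for increasing model sizes N,
    # using the Chinchilla parametric fit; constants from the paper, values approximate.
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

    def loss(n_params: float, n_tokens: float) -> float:
        return E + A / n_params**alpha + B / n_tokens**beta

    D = 100e9  # fixed dataset, e.g. ~100B tokens of code
    for name, N in [("12B", 12e9), ("175B", 175e9), ("1T", 1e12)]:
        print(f"{name} params -> predicted loss {loss(N, D):.3f}")

At a fixed 100B tokens, the predicted loss still drops as the parameter count grows, just with diminishing returns.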


Original Codex is Python only.


True. I think they're counting duplicated code though. I don't see any mention of de-duplication in their paper.


'fix up generated code' but do you agree that finding a mistake (without even knowing if it's there) might be even harder than writing from scratch?


It's likely that programmers have this skill somewhere. We all make mistakes when typing in code, and many of them do get found. Some of them don't; that's what we call a bug. So AI isn't exactly breaking any ground here.

I played with ChatGPT and asked it interview questions, and I thought it was a pretty interesting exercise to find its mistakes and get it to fix them. Good tool for training interviewers, perhaps.


We are doing this all the time anyway during code reviews.


Microsoft is FAANG level and beyond.


Microsoft paid for early exclusive access to GPT-3 internals. They're using it to develop things like Power Apps. The FAANG companies are all doing similar things, and Google in particular at least purports to have models that outperform what OpenAI is doing.


> Plus you don't need to be on the same network

You mean in the case of the Wireless Display Adapter? For Miracast you do need to be on the same LAN, right?


No, it makes a Wi-Fi Direct connection to the TV, concurrent with your normal Wi-Fi.


Can you use a different Wi-Fi network at the same time for internet access or does Wi-Fi Direct block any other Wi-Fi access?


Yes, it works simultaneously; that's what I meant by concurrent.


Google will not disappear. They already have much larger neural nets almost ready for deployment, and they will be able to afford even larger ones in the future. And size is all that matters, while the techniques are mostly trivial.


How will we justify our existence if we're unable to contribute meaningfully to the economy?

