The biggest challenge is figuring out how to store the text document in memory. My first thought was to use an array, but that has horrible performance if the user inserts text anywhere other than the end of the document.
The counter-argument to that is that processors are ridiculously fast in human timescales --- copying memory at gigabytes per second --- so unless you're focusing your use-case on editing correspondingly huge files, there's no real need to make your implementation more complicated than a single array. Even when DOS machines with <640K of memory and memcpy() speeds in the low MB/s were the norm, people edited text files of similar sizes, with editors that used a single array buffer, and for that purpose they weren't noticeably slower than ones today.
My ideas for challenging projects are a little less open-ended, so they exercise a different set of skills: being able to implement a specification correctly and efficiently.
- TTF renderer
- GIF or JPEG en/decoder
- Video decoder (start with H.261)
IMHO being able to consume existing content with code you wrote is very rewarding.
>> A rope, or cord, is a data structure composed of smaller strings that is used to efficiently store and manipulate a very long string. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently.
> there's no real need to make your implementation more complicated than a single array
Yeah, good luck enabling line numbers in such an editor.
In Emacs, which uses a gap buffer for storing text, line numbers have been notoriously slow. It's gotten a bit better lately, but suffice it to say, a naïve flat array / gap buffer approach is not good enough for some relatively common scenarios even on modern hardware.
I don't think there should be a problem with line numbers. I would make two helper arrays containing the indices of the new-line characters, corresponding to the two gap buffer text arrays (new-line positions are sorted ascending for the first array, descending for the second array).
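Roughly what I mean, as a C sketch (made-up names, fixed capacities, single-byte encoding, no bounds checks):

```c
#include <stddef.h>

#define CAP (1 << 20)

struct gapbuf {
    char   text[CAP];
    size_t gap_start, gap_end;   /* text[gap_start..gap_end) is the gap */
    size_t nl_before[CAP];       /* newline offsets < gap_start, ascending */
    size_t nl_after[CAP];        /* newline offsets >= gap_end, descending */
    size_t n_before, n_after;
};

/* The cursor sits at the gap, so its 1-based line number is O(1). */
size_t cursor_line(const struct gapbuf *b) { return b->n_before + 1; }

/* Inserting at the cursor appends at most one entry to the "before" stack. */
void insert_char(struct gapbuf *b, char c)
{
    b->text[b->gap_start] = c;
    if (c == '\n')
        b->nl_before[b->n_before++] = b->gap_start;
    b->gap_start++;
}

/* Moving the gap right transfers at most one newline index between the two
   stacks, so line tracking stays in step with the gap at no extra cost. */
void move_gap_right(struct gapbuf *b)
{
    char c = b->text[b->gap_end];
    b->text[b->gap_start] = c;
    if (c == '\n') {
        b->n_after--;                               /* nearest newline after the gap */
        b->nl_before[b->n_before++] = b->gap_start; /* its new offset */
    }
    b->gap_start++;
    b->gap_end++;
}
```

The cursor's line is then n_before + 1, and the total line count is n_before + n_after + 1, without ever rescanning the text.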
Speaking as someone who's gone all the way from implementing a Red-black tree to making a rope data structure using the RB tree, to making a text editor that can edit almost arbitrarily large text files (dozens of gigabytes) without user-perceivable latency ;-)
But line numbers are trivial to update when a gap buffer needs a move.
A list of strings is more elegant, of course, where only the line being edited becomes a gap buffer. It taxes the allocator a bit more, though, which might be a concern on computers of the time when Emacs was born.
While that is true to an extent, I've made a lot of money cleaning up after people who didn't architect and design their code to cleanly grow into a fairly obvious potential use case, requiring major rewrites. It isn't a premature optimization to avoid walling yourself into a corner.
I agree strongly with designing your code so it's easily changeable into whatever new features are needed. This is much easier said than done, and I don't know if anyone has written well about the tricks of that trade.
But anyway, if you have that kind of code, swapping out whatever you need to make line numbers happen is no more work later than sooner.
Code bases with features implemented that are never used, but you still have to keep working through all changes, because someone imagines it will be a real requirement someday, are what my nightmares are made of.
It's all about the interfaces. More performant solutions require (in general) more complex interfaces.
If your application has grown as long as it could with the simple implementation, and now it is all too slow, chances are there's a lot of code depending on the interface. If your interface (and the implementation) is too simplistic, then all of that code will need rearchitecting, too.
IIRC many DOS-era editors used an array/list of line buffers. Which to me seems like a good middle-ground solution. Certainly for today's typical use case when you care about performance, i.e. editing a >100MB text data file by hand, which is a giant pain in Emacs because the gap buffer is simply not a good structure for doing a few simple edits across three places 10MB apart (you spend most of the time moving the gap around, while touching essentially all of the memory).
As far as the Emacs API goes, it should not be an issue, but on the other hand there is a bunch of elisp code that really expects a gap-buffer-based implementation, and that code is in turn depended on by who knows what...
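For what it's worth, the line-buffer middle ground can be sketched like this (made-up names, no error handling): the document is an array of independently allocated lines, so a keystroke only touches one line, and inserting a line shifts an array of pointers rather than megabytes of text.

```c
#include <stdlib.h>
#include <string.h>

struct line {
    char  *text;
    size_t len, cap;
};

struct doc {
    struct line *lines;
    size_t nlines, cap;
};

/* Insert a character within one line: memmove over that line's tail only. */
void line_insert_char(struct line *l, size_t col, char c)
{
    if (l->len == l->cap) {
        l->cap = l->cap ? l->cap * 2 : 32;
        l->text = realloc(l->text, l->cap);
    }
    memmove(l->text + col + 1, l->text + col, l->len - col);
    l->text[col] = c;
    l->len++;
}

/* Insert an empty line: shift pointers, not the text itself. */
void doc_insert_line(struct doc *d, size_t at)
{
    if (d->nlines == d->cap) {
        d->cap = d->cap ? d->cap * 2 : 64;
        d->lines = realloc(d->lines, d->cap * sizeof *d->lines);
    }
    memmove(d->lines + at + 1, d->lines + at,
            (d->nlines - at) * sizeof *d->lines);
    d->lines[at] = (struct line){0};
    d->nlines++;
}
```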
I don't edit really huge files that often (maybe a couple of times a week), but when I do I want to be able to use the same editor I use for everything else. A really great text editor is fast and flexible and powerful regardless of the size of the file you're trying to edit.
That's one issue I have with these proposed programming tasks.
You are not going to write a really great text editor as a learning exercise. It has been done by better programmers who had a better overview of the problems, over thousands of man-hours.
This automatically means the task is as useless as a gameboy emulator or basic compiler. The underlying "Things to learn" points are good, but the tasks themselves are not.
Writing experimental text editors for fun in various programming languages has been one of the most rewarding learning exercises of my life.
It's not really clear what your point is. You say the task is "useless"—what does that mean? Personally I can say that you are categorically wrong, because the skills I gained building things that are not completely new ideas fueled my passion for programming and opened up doors for me that otherwise would have remained closed. Even if I didn't still use a lot of these projects myself (because I built them to fit me), the value I derived from them would still be significant in the "grand" scheme of my life.
If a programmer is excited about the idea of writing her own text editor, what would you suggest she build instead that will sustain that same excitement and offer exploration into the same diverse subject matter but also satisfy your nebulous criterion of not being "useless"?
> there's no real need to make your implementation more complicated than a single array.
I think you have misunderstood the concept of an array. An array gives you 1) an interface that is easy to use; on the other hand, by definition, an array is 2) contiguous in memory. Property 1 is good, but 2 can cause problems. I think you want only 1.
The solution is to create a data type that has the interface of an array but a different implementation under the hood. You can have a linked-list of arrays, a tree of strings, etc.
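A rough sketch of what that could look like (made-up names): a linked list of fixed-size chunks that still exposes an array-style indexed read.

```c
#include <stddef.h>

#define CHUNK_SIZE 4096

struct chunk {
    char   data[CHUNK_SIZE];
    size_t len;                  /* bytes in use in this chunk */
    struct chunk *next;
};

struct textbuf { struct chunk *head; };

/* Same signature you'd expect from a flat array: get the i-th character. */
int textbuf_char_at(const struct textbuf *t, size_t i)
{
    for (const struct chunk *c = t->head; c != NULL; c = c->next) {
        if (i < c->len)
            return (unsigned char)c->data[i];
        i -= c->len;
    }
    return -1; /* out of range */
}
```

Insertion then only has to shift or split within one chunk, while callers keep indexing as if it were one flat array.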
I think the original commenter knows full well what an array is.
Vague justifications like "can cause problems" are probably exactly what he's referring to, in fact --- people who know that inserting elements into an array is "slow" and end up writing large and complex code as a result. Yes, it's O(N) on the length of your text, but the point is that for a couple of megs of text, O(N) is perfectly acceptable.
At least on a desktop, that'll fit in L3 cache, which these days runs at around 175GB/sec. Or to put it another way, inserting that single char can probably be done around 40,000 times per second. Which is faster than I can type, at any rate.
You'd be correct if people used editors only for opening source code files. The problem is that people usually open data files too, which can be not only larger than the L3 cache but larger than the entire system memory. The magic of a good editor like Vim is the capability to handle such files.
The other problem with your comment is support for the undo operation. Even if you use a flat array, you need a more sophisticated data structure for storing previous changes. Storing a separate copy of the array for every single change is not an option.
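One common way to do that (sketched here with hypothetical names, no redo and no grouping of keystrokes) is to store deltas instead of snapshots: each record remembers what changed and where, so undoing is just applying the inverse edit.

```c
#include <stdlib.h>
#include <string.h>

enum edit_kind { EDIT_INSERT, EDIT_DELETE };

struct edit {
    enum edit_kind kind;
    size_t pos;      /* offset in the document where the change happened */
    char  *bytes;    /* the inserted or deleted text, owned by this record */
    size_t len;
};

struct undo_stack {
    struct edit *edits;
    size_t count, cap;
};

/* The buffer code calls this right after performing an edit. */
void undo_record(struct undo_stack *u, enum edit_kind kind,
                 size_t pos, const char *s, size_t len)
{
    if (u->count == u->cap) {
        u->cap = u->cap ? u->cap * 2 : 64;
        u->edits = realloc(u->edits, u->cap * sizeof *u->edits);
    }
    struct edit *e = &u->edits[u->count++];
    e->kind  = kind;
    e->pos   = pos;
    e->len   = len;
    e->bytes = malloc(len);
    memcpy(e->bytes, s, len);
}

/* Undo pops the most recent record and applies the inverse edit through
   whatever insert/delete functions the buffer exposes (not shown here). */
```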
Whether an array is contiguous in memory depends on the language (and the specific implementation of that language). JavaScript arrays, for example, are really objects, and engines may not store them contiguously at all (sparse arrays can degrade to hash-table-like storage).
Good point! Dynamic languages are different in their terminology. AFAIK statically typed languages have a clear definition of arrays. The OP was talking about arrays having "horrible performance if the user inserts text anywhere other than the end of the document". I think this statement has an implicit assumption that arrays are contiguous, which is not guaranteed in JavaScript.
But did you try editing a multi-megabyte file with that method? Because I have seen enough editors struggle with big files (especially if the file is a single line and you're going through it with word-wrap on) that I think the basic straightforward approach already isn't sufficient on such workloads.
from the article:
> Luckily, there are some nice data structures to learn to solve this.
You could have also learned a new data structure!
I mean, it should be obvious that "this thing that the cursor does when moving lines" isn't the big takeaway from this challenge. It's almost cute that the author never noticed it (as a programmer), because I actually use that behaviour to navigate code sometimes. Who hasn't done a quick arrow-left/right to make the cursor lose its memory of which column it used to be on?
> Even when DOS machines with <640K of memory and memcpy() speeds in the low MB/s were the norm, people edited text files of similar sizes, with editors that used a single array buffer, and for that purpose they weren't noticeably slower than ones today.
No way. Every reasonably performant text editor in those days used special data structures and not just an array. Imagine having to copy the entire buffer on each key press (say, when inserting at the start of the file). Believe me, on a 640K DOS machine you'll feel that.
This isn't new stuff, I learned about these data structures in uni -- except I don't remember them because back then I was young and arrogant and didn't think you'd need these fancy data structures for something as simple as an editor :) :)
... but if you never tried to write one, it's hard to see in what ways these editors are not as easy as you think.
I don't see any problem with an array. Make it huge so you only have to reallocate every megabyte or so. Keep track of the document length and only move as much as needed. Your processor can do this for every character faster than you can type. No need for fancy data structures, and it's trivial to load and save files. The interesting part then becomes formatting for the display.
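Something like this, as a sketch (hypothetical names; the megabyte-step growth via realloc is omitted):

```c
#include <string.h>

struct flatbuf {
    char  *data;   /* allocated in ~1 MB steps, so realloc is rare */
    size_t len;    /* bytes actually used */
    size_t cap;    /* bytes allocated */
};

/* Assumes len < cap; a real version would grow the allocation first. */
void flatbuf_insert_char(struct flatbuf *b, size_t pos, char c)
{
    memmove(b->data + pos + 1, b->data + pos, b->len - pos);  /* shift the tail */
    b->data[pos] = c;
    b->len++;
}
```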
Imagine inserting text in the middle of 1GB file. Moving 500MB of data will definitely take longer than 18ms, and thus will cause at least some visible lag.
Takes less than 100 ms to read in the file (calling realloc in a loop), insert a byte in the middle (realloc + memmove), and write the modified file out on stdout. The byte insertion amounts to about 4ms.
That's hardly fast, yet still a lot snappier than most modern editors' UI or the web, where apparently achieving 60 fps for a few hundred dynamic DOM nodes is some kind of an achievement.
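The insertion part of that measurement is essentially just the following (a sketch assuming the buffer was loaded with at least one spare byte of capacity; uses POSIX clock_gettime):

```c
#include <string.h>
#include <time.h>

double time_mid_insert(char *buf, size_t len)
{
    struct timespec t0, t1;
    size_t mid = len / 2;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    memmove(buf + mid + 1, buf + mid, len - mid);  /* shift the tail right by one */
    buf[mid] = 'x';                                /* the inserted byte */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```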
This approach really starts to suck when you implement macros that are going to perform a lot of one-char inserts quickly. Or when you're editing multi-gigabyte files.
I must admit I was surprised, although I shouldn't be. Are we at > 10GB/s memory bandwidth now?
> This approach really starts to suck when you implement macros that are going to perform a lot of one-char inserts quickly. Or when you're editing multi-gigabyte files.
I'm working on an editor that I've optimized for such cases. In a test it made random edits to a 4GB file in < 50 microseconds. But, it cost a load of sweat and blood to get that rope data structure right. And it loads files only at about 100MB/s (should optimize for bulk inserts). https://github.com/jstimpfle/astedit
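Not the code from that repo --- just the textbook rope lookup as a sketch, to show why random access stays logarithmic in a balanced rope: each internal node stores the total length of its left subtree, so indexing is a walk down the tree rather than a scan (assumes internal nodes always have two children).

```c
#include <stddef.h>

struct rope {
    struct rope *left, *right;   /* both NULL for leaves */
    const char  *str;            /* leaf payload */
    size_t       weight;         /* leaf: strlen(str); internal: length of left subtree */
};

char rope_char_at(const struct rope *r, size_t i)
{
    while (r->left != NULL) {    /* internal node */
        if (i < r->weight) {
            r = r->left;
        } else {
            i -= r->weight;
            r = r->right;
        }
    }
    return r->str[i];            /* leaf */
}
```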
Yes, I make sure it compiles on both platforms from time to time. The current commit should compile with MSVC, GCC, and Clang, I believe. I'm happy to fix any issues if you find them :-)
>> This approach really starts to suck when you implement macros that are going to perform a lot of one-char inserts quickly.
What operation is that? Search and replace might have that effect but could be done by copying the entire buffer with replacement happening along the way.
> The counter-argument to that is that processors are ridiculously fast in human timescales
Until you actually have to implement your algorithm on a mobile device that is both memory- and power-constrained and that doesn't have a swap file. The OS will either kill your program for being too memory- or power-inefficient, kill another program running in the background (not a great user experience), and/or force the use of the high-power cores, burning battery life unnecessarily when a more efficient algorithm could have used the lower-power cores.
Attitudes like this also explains why developers don’t think twice about delivering battery consuming Electron apps.
Umm, the DOS 640K was paged and non-contiguous. Additionally, smoothly editing larger texts back then required some clever linked lists of blocks to give a truly instantaneous editing experience when inserting text at the beginning of a large text. Those were the 286/386 days.
Today you have fancy rendering, and for that reason an instantaneous editing experience again suggests a more sophisticated data structure for the editor.
Which all text editors have, when you look inside vi/emacs/nano/whatever...
> so unless you're focusing your use-case on editing correspondingly huge files, there's no real need to make your implementation more complicated than a single array.
I think the other major corner case is when you need concurrent, distributed editing (although that's not popular or anything these days), in which case an array is a very poor datastructure.
For TIFF (and most formats) that's heavily dependent on if you're talking about implementing a reader or a writer for the format. TIFF readers need to handle JPEG streams, so in that sense implementing a general purpose TIFF reader is a superset of implementing a general purpose JPEG reader.
On the other hand, TIFF writers can (very conveniently!) be almost as simple as you want, including no compression at all, just blobs of raw pixel values, and a smattering of tags for width, height, pixel format, and that's it. The only thing simpler to output IMO would be uncompressed ASCII formats like XPM.
So in that sense you're correct --- the simplest possible JPEG writer is much more complicated than the simplest possible TIFF writer, but TIFF in general is extensible to a fault (arguably), in the sense that the number of possible combinations of pixel and metadata encodings you have to prepare yourself for when opening arbitrary .tif files is far greater than when opening arbitrary .jpg files, including JPEGs within TIFFs.
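To make the "almost as simple as you want" point concrete, here's a rough sketch of a minimal TIFF writer --- uncompressed 8-bit grayscale, one strip, little-endian, no resolution tags (strict baseline asks for those too, but most readers cope). Names are made up.

```c
#include <stdint.h>
#include <stdio.h>

static void put16(FILE *f, uint16_t v) { fputc(v & 0xff, f); fputc(v >> 8, f); }
static void put32(FILE *f, uint32_t v) { put16(f, v & 0xffff); put16(f, v >> 16); }

/* One 12-byte IFD entry: tag, type, count, value. In a little-endian file a
   SHORT (type 3) count-1 value lands in the correct low bytes via put32. */
static void entry(FILE *f, uint16_t tag, uint16_t type, uint32_t count, uint32_t value)
{
    put16(f, tag); put16(f, type); put32(f, count); put32(f, value);
}

int write_gray_tiff(const char *path, const uint8_t *pixels, uint32_t w, uint32_t h)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;

    uint32_t data_offset = 8;           /* pixel data right after the 8-byte header */
    uint32_t ifd_offset  = 8 + w * h;   /* IFD after the single strip (assumes w*h fits) */

    fputc('I', f); fputc('I', f);       /* "II" = little-endian */
    put16(f, 42);                       /* TIFF magic */
    put32(f, ifd_offset);

    fwrite(pixels, 1, (size_t)w * h, f);

    put16(f, 8);                        /* number of IFD entries, ascending by tag */
    entry(f, 256, 4, 1, w);             /* ImageWidth */
    entry(f, 257, 4, 1, h);             /* ImageLength */
    entry(f, 258, 3, 1, 8);             /* BitsPerSample = 8 */
    entry(f, 259, 3, 1, 1);             /* Compression = none */
    entry(f, 262, 3, 1, 1);             /* PhotometricInterpretation = BlackIsZero */
    entry(f, 273, 4, 1, data_offset);   /* StripOffsets */
    entry(f, 278, 4, 1, h);             /* RowsPerStrip = whole image in one strip */
    entry(f, 279, 4, 1, w * h);         /* StripByteCounts */
    put32(f, 0);                        /* no next IFD */

    fclose(f);
    return 0;
}
```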
Back in the day, TIFF was just a large uncompressed file.
The initial format is older than GIF87a (which had no animation, the feature people associate GIF with nowadays). It had a header, but that was pretty much it. Of course the format developed with time and even added LZW once the patent expired. Currently TIFF is all kinds of things, so writing a fully featured reader is a proper challenge (perhaps not coding-wise, but understanding it and implementing the myriad of types/extensions, etc.)
I'll also link Cristi Cuturicu's "A note about the JPEG decoding algorithm", which is where I started my decoder implementation from, and it was indeed a ton of fun.
"no real need to make your implementation more complicated than a single array"
That's our industry in a nutshell. Our computers, instead of becoming more capable over time, can barely keep pace with the increasing naivety of our programmers.
Yes, this is engineering in a nutshell: determining a course of action within a set of constraints that meets your objectives. Where constraints can be time, cost, physical limitations (processor speed, memory size, disk space), etc; and objectives can be functional (user can edit files), nonfunctional (user can edit large files in < X seconds, energy usage), personal learning, or any number of other requirements.
The GP offered a valid decision point to consider based upon what an engineer is solving for. I don’t think he said that an array was the solution he’d ship in a production text editor to millions of end-users.
Actually, that's exactly what I was saying --- plenty of existing text editors use the "stupid" single array, yet no one complains about their performance.
Just because people tend not to edit large files in Notepad doesn't mean they'll complain about it when they do. Actual complaints are of course sparse because hardly anyone uses Notepad for anything serious if they can use an alternative. BUT when they do, oh they will complain.
I believe an older version of Notepad even had a (fairly low) limit on file size it would open.
I mean that's the reverse argument, computers have gigabytes of memory today, and are super fast, so you should be able to load a multi gigabyte text file and edit it, on a single line, with word wrapping.
In webdev, increasingly often, the expectation is that programmers not only do the backend, but also the database management, the frontend (which used to be graphic design, CSS/HTML, and JS, separately), and everything devops.
Outside of webdev, Unity springs to mind, as another great example of this: The stuff you can do as a single game developer is mind boggling, or at least used to be, until indie devs everywhere started boggling our minds on a daily basis and thus raising the standard of what consumers expect an indie game to be.
This is, of course, not because within 50 years humans evolved to be a lot better or smarter or faster than their predecessors. It is made possible through more flexible, higher-level tooling that you don't have to understand the inner workings of to take advantage of, and through more abundant computing resources, which in tandem enable work that will be in the "good enough" territory for most use cases.
This is also not a choice that programmers as individuals or even a group make. It's a choice that the market makes.
There is nothing naive about it. Naive is assuming it would be any other way.
In web dev I have observed the opposite trend: when I first started my career, everyone was expected to be full stack and know how to deploy a thing. Nowadays devs tend to be strictly front end, back end, devops, etc. Devs who can optimize a SQL query, model a DB schema, and then write a well-organized React or Angular front end seem to be the exception, not the rule.
> In webdev, increasingly often, the expectation is that programmers not only do the backend, but also the database management, the frontend (which used to be graphic design, CSS/HTML, and JS, separately), and everything devops.
What do you mean, increasingly often? This was the case 15 years ago already and I see only examples that it has gotten less, because of all the frameworks that exist.
Also it's exactly what I liked about webdev. When your existing talents for graphics design and explainer-of-technical-things shine in a tech context, that feels good. A lot of programmers have no feel for this, and a lot of designers write awful code. Which could have, but historically did NOT improve at all with higher level tooling, mainly because of this "good enough" attitude. Feel free to prove me otherwise, but what did happen: Thanks to things like Bootstrap, now programmers can avoid the worst design mistakes without having to learn design. Graphics Designers, however, well .. I don't know? Are there tools that allow them to write or generate code that doesn't suck? (Without programming skills, like the coders without design skills).
> This is also not a choice that programmers as individuals or even a group make. It's a choice that the market makes.
> There is nothing naive about it. Naive is assuming it would be any other way.
I don't know ... Do you believe there no longer exist people that deliver quality over this entire skill set? Or that they somehow exist outside of the market?
People jump for complex and overly optimized solutions too quickly, IMHO. From a conceptual perspective, I enjoy these sort of challenges but that's where it ends.
For product demands where deadlines are constantly unrealistic, underfunded, underscoped and demands are ever changing, I'm a fan of providing the simplest conceptual solution to the task at hand and not focusing on developing complex abstractions and optimizations too early.
From my experience, that time is typically wasted until functionality is zeroed in and real money is available to pay for the work, as the early complex abstractions typically fail to keep pace with demands and the optimizations break when the ever-changing requirements... change. That's just my experience, YMMV.
Are you suggesting that it’s bad to use the simple uncomplicated approach because it’s inefficient, or that it’s bad to add layers upon layers of complexity which end up bringing modern computers to their knees?
Personally I’m in the latter camp. There’s so many layers of abstraction nowadays which each in theory make programming better/safer/easier which in practice end up creating an incredibly inefficient mess.
Today's software suffers from too many layers of complexity that are each pretty dumb and serve mostly bookkeeping. The result looks like an overinflated bureaucracy. In the example above, using a more efficient data structure for text representation will add at most one layer of abstraction (but there's a good chance you'd create that layer to hide the array anyway), but offer significant benefits in terms of performance, at a cost of little and well-isolated complexity.
This is the best kind of abstraction: complex, deep behavior hidden behind simple interface.
Same. I generally write in C without too many layers between my code and the CPU, and it is just incredible how fast modern CPUs are with naive code that doesn't even attempt to be optimal.
I wish others understood that, because the things I work on are losing performance (and a massive amount of developer time, which could be used for optimization or other useful work) to excess complexity, not too simplistic code.
Yet if you read the rest of the comment you would realize this specific use case (editing text) was handled fine with a single array buffer when computers had less than 1MB of memory to work with.
This is a perfect example of when it’s stupid to keep optimizing.
I wish that were our industry. Instead, we make things super complicated and make them slower at the same time.
Let's take the text editor example. Let's say we use it to edit a large document. Is Moby Dick large enough? It's around a megabyte of (pure) text. Let's figure out a persistence solution. How about "we save the entire text to disk"? So a megabyte to disk. My laptop's SSD does (large) writes at 2GB/s. So the ultra simple solution could save the entire text around 2000 times per second.
Your laptop's SSD, sure, 2GB/s --- but that 5400 RPM laptop hard disk that your user has is writing at a measly 1MB/s because the disk is also being accessed by 5 other programs.
Now the user is either queuing up a bunch of background saves leading to overload or is forced to wait 1s per keystroke.
Well done!
I guess the simple solution then is to tell the user to buy a $3000 laptop just so it's capable of running notepad.
Hmm...Mac laptops have been SSD-only for how many years?
Anyway, even laptop drives are well over 40-50 MB/s these days, and any disk scheduler worth its salt will schedule this kind of write (one contiguous chunk) near optimally, so still 40-50 writes/s.
And of course, you queue these writes asynchronously, dropping as needed, so if you actually manage to out-type your disk, all that happens is that your save-rate drops to every couple of characters. Big whoop.
Also remember that this is Moby Dick we're talking about. 700+ pages, so something that vastly exceeds the size of the kinds of documents people are likely to attempt with a Notepad class application.
Last but not least, this is a thought experiment to demonstrate just how incredibly fast today's machines are, and that if something is slow, it is almost certainly because someone did something stupid, often in the name of an "optimization" that turned into a pessimization, because Doing the Simplest Thing that Could Possibly Work™, i.e. brute-forcing, would have been not only simpler but significantly faster.
Raise your hand if just running your web browser has pegged your top-end, multiprocessor, high-mem system in the last month. Both Firefox and Chrome have for me.
I think that's largely due to JavaScript and its ecosystem of abstraction-bloat that has been mentioned in another comment here, along with the trend of "appifying" what should really be static sites. A static page that contains even dozens of MB of content won't stress a browser as much as a "web app" containing only a few hundred KB of countless JavaScript frameworks glued together --- despite the latter presenting a fraction of the actual content.
I used to read the blog on virtualdub.org (video capture and processing) and enjoy his rants on bundled library bloat. VirtualDub was small in footprint and great to use. So do programmers become too reliant on scaffolding, or is it a necessity as you learn?
Why not both? I mean, I wouldn't say that it's actually necessary, but scaffolding exists to hide away the incidental complexities of the problem being solved, revealing the problem for what it is. Demonstrations of recursion and pattern matching tend to use the same problems because they're such a good fit that there's a very close correspondence between the high-level explanation of how to solve the problem and the code itself.
At the same time we ought to be aware of that scaffolding and how it works (or could work), and how to build such abstractions ourselves. Not just because all abstractions leak[1][2] and potentially introduce bloat, but also because it means I don't have to pull in another dependency to save me a page (or three lines) of trivial code. Or maybe because the "standard" solution doesn't quite support your use case (I can't count the number of times that I've rewritten python's lru_cache[3] because of it not accepting lists and dicts).
I'm not sure that GBs of throughput make editor memory representations unimportant. Just this week my friend said he killed Emacs with a few-MB text file. I was astonished that software of that esteem would struggle with that.
In my first job I would routinely open 10-20 MB files in Emacs. It handled it just fine. I mean, it gives a warning that this is considered big, but I ignored it.
Now if you open a large text file in something other than text mode, it could bring it to its knees depending on the mode. As an example, opening an XML file in the nXML mode is quite expensive, because nXML mode is powerful and utilizes your XML structure. I just tried a 12 MB XML file and told it to go to the end of the file. It's taking Emacs forever to do it (easily over 30s). But if I switch to text mode for that same file, it handles it just fine.
I just tried an 800 MB text file. It handled it fine.
The one thing where you can easily get in trouble: Long lines. Emacs cannot handle long lines well. Kinda sad.
But that's massively better than just an array. It changes the time complexity from O(char_inserts x file_len) to O(cursor_moves x file_len), which is likely a couple orders of magnitude better.