What I find most annoying about tutorials on pointers and memory management (as someone relatively new to C, picking it up for work and fun) is that every single one of them seems to discuss the trivial case of a hard-coded char or int array.
Which, like, is great for the absolute basics, but not for just about any serious piece of software, where in practice what you need is to build stuff like dynamic accumulators of strings -- and virtually no tutorials or books I've read talk through best practices for doing this.
So I've sort of kludged together my own, probably wrong, set of patterns for this. Reading other people's source code on GitHub helps a bit, but I have yet to find a good example of how to, e.g., read strings from a CSV file and parse them into an array of arrays of strings.

Does anyone have any pointers here? (sorry)
Perhaps that's because a C pointer is usually (always?) exactly that - an index into one big hard-coded char array that starts at memory offset 0.

Depending on the circumstances, it may be easier to think of it as an int array, or some other type, but the easiest on-ramp for pointers is still char[] starting at 0x0.
> how to e.g. read strings from say a CSV file and parse it into an array of arrays of strings
>
> Does anyone have any pointers here? (sorry)
I think this kind of thing isn't discussed too often because there are so many different choices that you can make (and once you understand the language & your constraints, you can certainly make those choices). I agree it would be useful to highlight all the possibilities. Let me try to illuminate:
First, do you really need an array of arrays of strings? That's possible, but not all that common. If you don't really need such a thing, then you do something like strtok or strsep: implement a function that you call in a loop to extract fields one by one. Now you just need a single buffer large enough to hold a single field (or one row if you prefer to make it a little simpler). Your function locates the terminating comma, replaces it with a NUL, and returns a pointer to the start of the buffer (and stores the start of the next field -- that is, the address of the byte after the NUL -- somewhere for the next call).
This approach is popular among C programmers because it usually means you don't need dynamic allocation, as long as there's a reasonable maximum field size. Or if a maximum field size cannot be imposed, then you can get away with dynamic allocation (and re-allocation, when needed to grow) of a single buffer.
What next? Ok, maybe you really do require "all fields at once." Your average C programmer still won't allocate arrays of arrays of strings. Instead, they read the entire row, extract fields out of it (as above), and store the pointers to the start of each field somewhere. If the caller knows how many fields they want to deal with (very often they do!), then they can provide you a fixed-length array for these pointers. Otherwise, you can dynamically allocate a buffer to hold the pointers. Either way, now you need two buffers (and depending on your needs, one or both or neither may be statically sized): one for the row, one for the pointers. Now the buffer containing pointers is exactly like argv in main(), while the fields are in one contiguous buffer spliced by NUL bytes.
Ok, maybe you really do require all rows and all fields at once. At this point C programmers will hate you because you're forcing them into dynamic allocation (or an unreasonable fixed size limit on the file). Otherwise, you do as above, but you also terminate lines with NULs, and now you have some choices as to how to lay out your row + field pointers. For example, you could allocate one array for row pointers, which point to field pointers (which could be allocated separately for each row, or, perhaps preferably, in a single flat array containing pointers to all the fields). If you expect the caller to iterate (rather than random-access) rows, you could use a single mixed-type array containing row metadata + field pointers.
Another very common approach would be to allocate field pointers + data for each row separately and then link rows in a linked list. Again this favors iteration but you're less likely to need to resize huge arrays.
Sometimes you deal with memory that is best kept immutable; in that case, splitting the data into rows and fields with NUL bytes is not an option and you must take a different approach. Either duplicate the data (more mallocs, and required anyway if you also need to do things like unescaping) or keep it where it is and store pointer + length tuples.
See, there are lots of choices, and lots of variables: how much memory you want to use, whether you can use fixed-size buffers, what kind of access pattern the caller expects, whether the caller expects to be able to free individual rows, etcetera.
There is no best practice but generally C programmers gravitate towards using iteration & fixed size buffers unless more is required (this is simple and lets the caller persist data if they want to, but doesn't force them to deal with dynamic memory if they don't want or need to). If something more specific is required, then you'll know what is required and implement something that supports just that.
The Python style divide-and-conquer where you first read an entire file, then chop it into lines, then chop lines into columns, etc. is not very popular.