> Maybe I'll post more tomorrow, if anyone is still around.
I can only speak for myself but I will definitely be around to read another post if you write one.
The design is fascinating and I'm very excited to see a new language that combines functional, relational, and reactive elements.
One design choice that I'm particularly interested in is your choice to fix the set of possible values in a way that supports generic value introspection and orthogonal persistence. There are clear benefits but some costs include possible obstacles to abstraction / modularity and also extensibility. I'm wondering how you see these trade-offs.
(Replying to my own comment because I don't see a way to edit it.)
Another question I have is about recursion. Browsing through the code a little bit, I see loops and comprehensions are often used but I'm not sure I've seen a single recursive function. Is recursion something you've consciously tried to avoid and if so can you speak about the rationale on that topic?
What the parent is saying is that no, the struct will be loaded into registers and passed there. Here is a simpler example: https://godbolt.org/g/DcHgMr
As you can see, structs used to be passed on the stack on x86-32. Most other/newer architectures use ABIs that pass the initial struct elements in registers.
There are two orthogonal things going on. And I was actually mistaken as to certain specifics. So if only to clear up my own confusion, let me go into a somewhat unnecessary level of detail. TL;DR, though:
- To answer ericbb's concern, the compiler will make copies as necessary to preserve pass-by-value semantics; it just sometimes looks like pass-by-reference at the assembly level. (But "pass on the stack" isn't quite the right wording either; I was referring to a special case of that.)
- Only very small structs are worth passing or returning by value, with the exact limit depending on the platform (and often unnecessarily low).
Anyway.
First, as you (tom_mellior) mention, most ABIs have a list of registers used for function arguments; if there are too many arguments (or on 32-bit x86, any arguments), the remainder are placed on the stack, starting at a fixed offset from where the stack pointer points at function entry.
Second, a struct argument can be passed to a function in three different ways.
The fastest way is to pass each struct field in order as if it were a separate function argument.[1] Thus each field could go in a register or, if all the registers have been used, on the stack. As I said, this is often done only for structs under a certain size: 1 word on Windows x86-64, 2 words on ARM64 and non-Windows x86-64. But actually, some ABIs have no limit, like 32-bit ARM and some less popular architectures. (Thus I shouldn't have said ARM's limit was 4. 4 is the number of registers available for arguments in total, but that's different. The ABI doesn't change behavior for individual arguments based on the struct size. And even a huge struct can have the first few fields passed in registers.)
Another way is to pass a pointer to the struct in place of the struct itself. This is what happens on Windows x86-64 (for structs larger than 1 word) and ARM64 (for structs larger than 2 words). The semantics are still pass-by-value, so the caller generally needs to make a copy of the struct on the stack (at least if the original value is used afterward), but that's its own business; the callee just sees a pointer.
This is what I meant by "implicitly pass by reference", at least for arguments (see below about returns). At best, it's identical at the assembly level to using a real pointer argument, and thus equally efficient; at worst, it's less efficient, since the compiler may emit a copy when it could have just passed a pointer to an existing copy. This might be because the language semantics don't guarantee the existing copy won't be modified, but it can happen even if they do: struct arguments/returns aren't that popular in C and C++, so even modern compilers don't always optimize them as well as they could. This is a problem for Rust.
The third way - which I've only seen on non-Windows x86-64, and which I had confused with the previous one - is to pass the struct like excess arguments, except forced to be on the stack, skipping over any registers that may be available. Thus, if you have
void func(int x, struct foo foo, int y);
then 'x' and 'y' would be put in the first two registers, while 'foo' would go on the stack, at the aforementioned fixed offset. On the other hand, if there were a lot of preceding arguments that already filled up all the registers:
void func(int a, int b, int c, int d, int e, int f, int x, struct foo foo, int y);
...then 'foo' would have to be on the stack anyway, so the two-word limit makes no difference. The stack (starting at the fixed offset) will contain 'x', then 'foo', then 'y'.
Anyway, in this case too it's usually equally efficient to use a real pointer, though it's not identical at the assembly level. If you use a real pointer, and the pointer fits into a register, then to load a field of a struct, the function has to load from that register, which is one instruction:
mov 0(%rdi), %rax
If you pass by value, and it's forced onto the stack, then the function has to load from the stack, which is still one instruction:
mov 8(%rsp), %rax
Though if the pointer itself ends up on the stack, then of course it needs two instructions.
Separately, performance can also differ greatly due to the cache. In general, the top of the stack is almost certainly in L1 cache, while some random pointer might not be. But if the choice is between the callee loading the pointer versus the caller loading the same pointer to copy it to the stack, then that doesn't really matter.
...Then there's return values, which are a bit simpler, since each function has only one return value. Every ABI I know of uses either one or two registers for return values. (Even legacy x86 uses registers for returns, rather than the stack.) The limit for struct return values is zero, one, or two words. Larger return values are done by having the caller pass an out pointer, usually treated like an extra function argument. So here too, for large structs it's at most equally efficient to use an explicit out pointer, often more efficient.
Note that annoyingly, the limit is often smaller than you'd expect from the number of registers available. On 32-bit x86, struct return values always go by out pointer, even if the struct contains a single int. On 32-bit ARM, single-field structs are OK, but a struct containing two 32-bit values cannot be returned in two registers, even though two registers are used in other cases - namely, to return a single 64-bit value.
[1] Alternatively, the struct may have its original layout divided into words, and have those words passed as separate arguments. The difference comes when struct fields are smaller than machine words. If you have
struct example { uint32_t a, b; };
then on non-Windows x86-64, passing 'struct example' as an argument will pack both 'a' and 'b' into a single 64-bit register (rdi), whereas if you have two uint32_ts as standalone arguments, they will be passed in separate registers (rdi and rsi).
Instead of passing the struct in registers (rdi, rsi, rdx) or passing a pointer to the object (pass by reference), the caller places the object on the stack and the callee knows where to find it just based on the type of the function (it's at [rbp+0x10]).
On the other hand, if you delete the c field from the struct type, then the beginning of foo() looks like this:
In that case, the struct was passed in via registers rdi, rsi (similar to tom_mellior's example) and subsequently copied to the stack (at [rbp-0x10] this time).
So, in neither case would I consider it "pass by reference" because neither leads to a pointer to the x object being passed in as an argument.
It's interesting to learn that Windows does it differently and does in fact use what I'd call "pass by reference". The only ABI I'm (somewhat) familiar with is Linux x86-64.
Your point about the annoying limitations on return values is a good one too. I had the thought at one point that it'd be possible to uniformly use structs with just one field instead of typedefs but the limitations on return value calling conventions mean that the two alternatives do not lead to equivalent code, surprisingly.
For people reading this thread who want to better understand the distinctions between files, file descriptions, and file descriptors, I recommend Michael Kerrisk's book The Linux Programming Interface, which has great coverage of this topic.
With the C-c issue specifically, I don't think that emulator throughput is the most likely culprit.
It could be that the emulator is not handling inputs fairly: maybe it tries to process all available input from the pseudoterminal before processing the next batch of keyboard input. Or it could be that the pseudoterminal (kernel driver) is not configured to send the signal as expected. Or it could be that the process you're trying to interrupt is not responsive to the signal.
I maintain a terminal emulator that is not optimized at all and I just tested interrupting a process that was dumping one gigabyte of text to the screen. The interrupt was handled instantaneously.
It isn't the most likely culprit. The mosh people point out that the place where people hit this is an SSH session to another machine; the data are actually building up in the SSH connection. It's not a problem with the terminal emulators at all.
> `lambda` and function application alone are Turing-complete, as McCarthy would have known. The credit here belongs with Turing and Church, not McCarthy. `atom`, `cons`, `car` and all the rest are just icing on the cake of the lambda calculus when it comes to computability.
This point reminds me of something interesting that I noticed recently despite having been familiar with basic LISP ideas for a long time.
If you carefully read McCarthy's original presentation of LISP, you can see that he seemed to be influenced at least as much by Kurt Gödel's work as by Alonzo Church's work.
In Church's lambda calculus, functions are abstract values that can only be used by means of function application. LISP evolved toward this style over time, but in the early history of LISP, functions were not abstract values but were represented by encoding into S-expressions. This encoding process resembles Gödel numbering.
Also, recursion arises in a different form in lambda calculus than it does in the approach to computability that is based on general recursive functions. Again, I think that LISP is closer to Gödel here than to Church.
> Also, recursion arises in a different form in lambda calculus than it does in the approach to computability that is based on general recursive functions. Again, I think that LISP is closer to Gödel here than to Church.
I think I understand most of the rest of your post, but I got lost here. Could you describe what difference you see here, and how it applies to LISP?
When I mentioned "general recursive functions", I was thinking about systems that define functions using a system of functional equations rather than using a lambda term.
Example:
odd 0 = False
odd n = even (n - 1)
even 0 = True
even n = odd (n - 1)
I think "rewriting system" is something similar. (I'm not an expert in this stuff so I'm providing hints for further reading more than anything else).
In these systems, recursion is "built in" to the formal language itself.
On the other hand, lambda calculus does not have recursion "built in". All variables are bound by function application so it is not possible to have recursive bindings like you see in these recursive function systems. You can define recursive functions using mechanisms like the Y combinator but these work by reconstructing something over and over as the computation evolves. It's different.
In McCarthy's paper, there is a system of toplevel recursive functions and there is a LABELS form that behaves in a similar way. Lambda calculus has neither of these.
One resource that describes recursive function systems of the kind I'm thinking of is the book "Lambda-Calculus and Combinators, an Introduction" by Hindley and Seldin (especially chapter 4).
I always use dynamic linking but I thought it'd be fun to try making a small, static hello-world. The following assumes x86-64 Linux (and is a total hack that "works for me" NO WARRANTY!).
$ cat test.c
static int sys_write(int fd, const void *buf, unsigned long n)
{
	/* HACK: assumes fd/buf/n are still sitting in %rdi/%rsi/%rdx
	   (System V AMD64 ABI) when the asm runs, and that the caller
	   picks the syscall result up out of %rax as the "return value"
	   despite the missing return statement. */
	asm("mov $1,%rax; syscall");
}

static void sys_exit(int status)
{
	asm("mov $60,%rax; syscall");	/* status is already in %rdi */
}

void _start(void)
{
	char s[] = "Hello, World\n";
	int r = sys_write(1, s, sizeof(s) - 1);
	sys_exit(r == -1 ? 1 : 0);
}
$ gcc -nostdlib -static -o test test.c && ls -lgG test
-rwxr-xr-x 1 1480 Sep 28 11:47 test
More to the point of your question, if you want to see the effect you're looking for, I think you need to link to a libfoo.a library that includes some module.o that your program doesn't use.
No, the parent said "the linker can throw the unnecessary parts away". There is plenty of stuff in libc that my simple "Hello World" program doesn't use, yet gcc does not throw those parts away.
Theoretically I can see that it's possible, I just wonder how to actually do this in practice.