
Early C largely inherited the data model and semantics of BCPL, which was built around "machine words" and not byte-addressable memory. The introduction of the C "char" allowed them to take advantage of the PDP-11's byte addressability for special occasions, but "int" was still the default choice for storing everything from sizes to file descriptors to pointers.

In that sense, it was not built exclusively for a byte-addressable machine with registers of varying sizes.



there are an awful lot of chars in the v6 unix code; i don't think it's really 'for special occasions'

also pointer arithmetic on a byte-addressed machine is different from int arithmetic, so you have to know if something is an int or an int pointer if you want to increment it

from the horse's mouth in https://www.bell-labs.com/usr/dmr/www/chist.html, dmr's hopl ii paper, with the advantage of 20 years of hindsight

> The machines on which we first used BCPL and then B were word-addressed, and these languages' single data type, the `cell,' comfortably equated with the hardware machine word. The advent of the PDP-11 exposed several inadequacies of B's semantic model. First, its character-handling mechanisms, inherited with few changes from BCPL, were clumsy: using library procedures to spread packed strings into individual cells and then repack, or to access and replace individual characters, began to feel awkward, even silly, on a byte-oriented machine.

> Second, although the original PDP-11 did not provide for floating-point arithmetic, the manufacturer promised that it would soon be available. Floating-point operations had been added to BCPL in our Multics and GCOS compilers by defining special operators, but the mechanism was possible only because on the relevant machines, a single word was large enough to contain a floating-point number; this was not true on the 16-bit PDP-11.

> Finally, the B and BCPL model implied overhead in dealing with pointers: the language rules, by defining a pointer as an index in an array of words, forced pointers to be represented as word indices. Each pointer reference generated a run-time scale conversion from the pointer to the byte address expected by the hardware.

> For all these reasons, it seemed that a typing scheme was necessary to cope with characters and byte addressing, and to prepare for the coming floating-point hardware. Other issues, particularly type safety and interface checking, did not seem as important then as they became later.


This confirms that it was conceived pretty much as a typeless language. The purpose of "char" was to be able to process byte-addressed strings character by character, and to do byte-sized pointer arithmetic without any runtime conversions. For most other purposes, "int" was the machine word. Hence the weak typing and the odd conversion and promotion rules between "char" and "int".


yes, b (like bcpl) was completely untyped, and in c both int and pointers were machine words (and in 6th edition c typing was very weak indeed), but you have the pointer arithmetic thing backwards

on a byte-addressed machine, byte pointer arithmetic works fine if you treat the byte pointers as integers and don't do any conversions at dereference time; that's what it means to be a byte-addressed machine usually (certainly in the case of the pdp-11)

it's pointers to larger-than-byte things (ints, pointers, and later floats, structs, and arrays) where runtime conversions rear their head. if you try not to distinguish between ints and pointers to ints, then for *(p+1) to refer to the int after *p (instead of one overlapping it, giving a bus error), you need to shift p left by one bit at dereference time, or two bits on a 32-bit machine (if its memory addresses identify 8-bit bytes, as on the 360, pdp-11, vax, and 8086). no such conversion is required for char pointers

hope this clarifies


Also worth noting: BCPL was originally designed as a means to Bootstrap CPL, not for writing full systems, which is why it was so basic.


I think on x86 loading a char from memory took one more clock cycle than loading an int.


You mean one less cycle? I think that was only ever the case on the 8088, with its 8-bit bus.



