
Letting userspace know the page size was IMO a design mistake.

Imagine a world where the page size is kept secret from userspace. Anything that needs page-size alignment would be chosen by the kernel.

That in turn allows mixed page sizes, variable page sizes, hierarchical pages, etc.



The underlying problem is that there are several different things, and many of them currently don't have APIs:

* the known-constant page size that must be passed so you won't get a failure from `mmap(MAP_FIXED)`, `mprotect`, etc. (even if you don't call these directly, ELF relies on them, and traditionally 4K was used on many platforms; raising it requires a version of ld.so with proper support). There is no macro for this. On platforms I know about, it varies from 4K to 64K. Setting it to a higher power of two is by design always safe.

* the known-constant page size that unconstrained `mmap` is guaranteed to return a multiple of. There is no macro for this. This is 4K on every platform I know about, but I'm not omniscient. Setting it to a lower power of two is by design always safe.

* the dynamic page size that the current kernel is actually using. You can get this using `getpagesize()` or `sysconf(_SC_PAGESIZE)` (incidentally, the man page for `getpagesize(2)` is outdated in that it assumes the page size only varies by machine, not at boot time).
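For that last one, a minimal sketch of querying the dynamic value at runtime (plain POSIX, nothing platform-specific assumed):

  #include <stdio.h>
  #include <unistd.h>

  int main(void) {
      /* page size the running kernel actually uses; may differ from build-time assumptions */
      long ps = sysconf(_SC_PAGESIZE);
      if (ps < 0) { perror("sysconf"); return 1; }
      printf("page size: %ld bytes\n", ps);
      return 0;
  }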

The macro `PAGESIZE` is traditionally provided if the upper/lower bounds are identical, and very many programs rely on it. Unfortunately, there's no way to ask the kernel to increase the alignment that unconstrained `mmap` returns, which would safely allow defining PAGESIZE to the largest possible value.

Note that it is possible to get link-time errors for incompatible definitions of constants (or ranges thereof) by relying on `ld` magic (each value pulls in a variable defined in a distinct object file that also defines an additional shared variable, causing multiple-definition errors), but this would need to be done by whoever provides the macros (which should be libc).
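To make the `ld` trick concrete, here is a rough sketch of how it could look; every name here (`__assume_pagesize_*`, `__assumed_pagesize`, `ASSERT_PAGESIZE`) is made up for illustration, and the per-value objects would need to live in a static archive so the linker only pulls in members that are actually referenced:

  /* assume_pagesize_4096.c -- one archive member per possible value */
  char __assume_pagesize_4096;    /* unique symbol the macro references */
  char __assumed_pagesize = 1;    /* shared name, strong definition */

  /* assume_pagesize_16384.c -- another archive member */
  char __assume_pagesize_16384;
  char __assumed_pagesize = 1;    /* collides if both members get pulled in */

  /* hypothetical libc-provided macro: using it with two different values in
     one program drags in both archive members, and the duplicate definition
     of __assumed_pagesize becomes a link-time error */
  #define ASSERT_PAGESIZE(n) \
      extern char __assume_pagesize_##n; \
      __attribute__((used)) static char *__pagesize_ref_##n = &__assume_pagesize_##n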


It's unavoidable. APIs like mprotect() operate on pages; there's no way to hide that from them.


Also, various performance-characteristic graphs will show steps around multiples of the page size, so people would find out anyway. (Although maybe runtime detection is not a bad idea.)


Just like cache size.

But when someone releases a new CPU with a larger or smaller cache, all old software continues to work.

Secret page size would offer the same benefit.


Cache line size would be a better example.


Yup.

And like with page sizes, the big problem with cache line sizes is not people designing things against a specific line size, but the (much more numerous) cases where they design something that only works well for one fixed size without even knowing it, because they literally never thought about it, and it worked fine because every machine they used was similar.

It doesn't matter what APIs you were provided or what analysis you did; you don't know your software works on different hardware until you test it on that hardware.


The mprotect() API could have been designed more like malloc(): i.e. you don't protect a pre-existing memory range; instead the API returns a new memory range with the protections you've asked for, possibly copying a load of data into it for you in case you asked for a read-only range.

And that 'copy' might be zero-overhead remapping of the original pages.
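On Linux today you could approximate that shape with plain mmap()/mprotect(); a rough sketch (the name protected_copy is made up, and a real implementation might remap the pages rather than copy):

  #include <string.h>
  #include <sys/mman.h>

  /* Return a new mapping with the requested protection, initialized from src.
     The copy here could in principle be a zero-overhead remap of the same pages. */
  void *protected_copy(const void *src, size_t len, int prot) {
      void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED) return NULL;
      memcpy(p, src, len);
      if (mprotect(p, len, prot) != 0) { munmap(p, len); return NULL; }
      return p;
  }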


That API basically exists (you can use mmap() with a dummy backing file), and it is not useful for any of the things you use mprotect() for.


There's some important use cases for mprotect(), like toggling W^X on JIT pages, which that wouldn't work for.


You can probably replace some of those use cases by mapping the same physical page in two places with different permissions (W^X is per-address, not per physical page).
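A sketch of that double-mapping approach on Linux, using memfd_create() so the same physical pages show up once writable and once executable (error handling trimmed; the function name is just illustrative):

  #define _GNU_SOURCE
  #include <sys/mman.h>
  #include <unistd.h>

  /* Map the same pages twice: *rw for the JIT to write through, *rx to execute.
     Writes through the RW view are immediately visible through the RX view. */
  int jit_dual_map(size_t len, void **rw, void **rx) {
      int fd = memfd_create("jit", 0);
      if (fd < 0 || ftruncate(fd, len) != 0) return -1;
      *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      *rx = mmap(NULL, len, PROT_READ | PROT_EXEC,  MAP_SHARED, fd, 0);
      return (*rw == MAP_FAILED || *rx == MAP_FAILED) ? -1 : 0;
  }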


If you had a way of allocating pages of at least N bytes it would work.

  uint8_t* mem = malloc_pages(1024);   /* at least 1024 bytes, backed by whole pages */
  mprotect_paged(mem, 1024);           /* range rounded up to those pages internally */


That's how mmap and mprotect work today. Sizes get rounded up to the next multiple of the page size. Many applications using mmap do not need to know about the page size.

I think it's even part of POSIX: “The system performs mapping operations over whole pages. Thus, while the parameter len need not meet a size or alignment constraint, the system shall include, in any mapping operation, any partial page specified by the address range starting at pa and continuing for len bytes.”
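For example, something like this works without the caller ever knowing the page size; the kernel rounds both calls up to whole pages (a minimal sketch):

  #include <sys/mman.h>

  int main(void) {
      /* 1000 is not a multiple of any page size; the kernel rounds the mapping up */
      void *buf = mmap(NULL, 1000, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (buf == MAP_FAILED) return 1;
      return mprotect(buf, 1000, PROT_READ);   /* also applied to whole pages */
  }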


W^X would be supported by the API that they suggest:

"possibly copying a load of data into it for you incase you asked for a readonly range."


For JIT, you specifically need to be able to write some code to a page, then switch that page from writable to executable in place. Having the page move to a new location isn't acceptable, as the generated code will contain relative calls to other JIT generated code in other pages you've previously written to.
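A bare-bones sketch of that in-place flip (Linux-flavoured; a real JIT also has to flush the instruction cache on some architectures):

  #include <string.h>
  #include <sys/mman.h>

  /* Emit code into an RW mapping, then flip the same address range to RX,
     so relative calls into previously emitted pages stay valid. */
  void *emit_and_seal(const void *code, size_t len) {
      void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED) return NULL;
      memcpy(p, code, len);
      if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) return NULL;
      return p;   /* same address before and after the permission change */
  }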


Oh right, good point!


Yes, but that is a platform implementation detail, not something NDK apps should mess with.


You can simulate arbitrary-sized mprotect() by having the kernel do the closest it can with the hardware, and then handle any pages crossing the boundary via page faults. The performance hit should be small as long as most mprotect regions are large (which they typically are).


That'd fail horrifically for guard pages around a main region where the whole region is reasonably expected to be touched.

Considering that it'd be completely unacceptable to ever hit such an emulated range in code of any significance, as it'd be a slowdown in the thousands of times, it'd make mprotect entirely pointless and effectively broken. Unless, of course, you add "padding" of the expected page size (not applicable for guard pages though; those are just SOL), which is basically the status quo, except that instead of apps with a hard-coded page size crashing on an unexpected size, they just become potentially 1000x slower; and in practice there's no difference between a crash and 1000x slower. I'd even say a crash is better: it means a bug report can't be dismissed as just a perf issue, and it's clear that something's going wrong in the first place.

The size of the mprotect region does not matter. What matters is worst-case behavior: regardless of the size, if a single critical byte of memory ends up in the emulated region, your program is dead to the user, with data loss or whatever other consequences follow from being, practically speaking, completely frozen.


Emulating it would work quite terribly. With a 4 KB page size you can mark 4 KB of memory as RW, then an adjacent 4 KB as RX, and have them interact.

Trying to emulate that under 16 KB pages would take a fault on every memory operation.


That isn't an officially supported API, so if any NDK app is making use of it, tough luck.

https://developer.android.com/ndk/guides/stable_apis


I'm not sure how you get that idea - as a Linux system call, mprotect() is part of "the standard C11 library headers", and it's used internally by the dynamic linker. Android doesn't need to take any steps to support it, and there's little they could do to remove support for it.


I got that idea from knowing the ISO C standard and POSIX, and not mixing the two.

By not being part of the official stable APIs, Google is free to break whatever userspace application they feel like.

There are seccomp and sandboxing filters in place for what is exposed to non-system apps, aka regular PlayStore apps.


Could not possibly disagree more. Trying to hide details from developers causes immense pain and suffering.


Absolutely. Breaking apps for all users when the system's page size changes is much better than inconveniencing the handful of developers that would have to work around a more abstracted page size.

Same reason why I think electron is great. Devex for 3 people is so much more important than ux for the 30M users.


> why I think electron is great

I take it back. It is possible for me to disagree more strongly.

You don't need to abstract away the page size. Abstraction isn't the solution to all problems. You just have to expose it. Devs shouldn't assume a page size. They simply need to be able to query whether it's 4 or 16 or 64 or however large, and voila.


That exists via sysconf(_SC_PAGESIZE). The problem is that hard coding the page size to 4k has worked just fine for much of the software written in the last few decades, which means that the 4k size has ossified to some extent (Hyrum’s Law).


> Devex for 3 people is so much more important than ux for the 30M users.

I think they were being sarcastic, and you might agree more than you realize.


Ahh you are probably right. I was confused by that phrase. I am not a clever man.


This change doesn't break all apps. It breaks a small subset. Just like you wanted.


It's hard to hide all the details, especially when you involve varied permissions over different chunks of memory.


> Letting userspace know the page size was IMO a design mistake.

How would you have had users handle SIMD tails?


Fixed-length SIMD was a design mistake too.

But realistically, you only need to know the lower bound for the page size, so pages larger by an unknown multiple are not a problem. Or use masked loads, and don't even worry about pages.
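For example, the usual over-read check for a vector tail only needs that lower bound; assuming pages are at least 4K (the smallest size in practice), every real page boundary is also a 4K boundary, so staying inside a 4K-aligned window is sufficient (a sketch):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  #define MIN_PAGE_SIZE 4096u   /* lower bound; actual pages may be 16K or 64K */

  /* Safe to over-read `width` bytes (width <= MIN_PAGE_SIZE) starting at p:
     the load cannot cross a 4K boundary, so it cannot cross a real page
     boundary either, and stays within the page p is known to live in. */
  static bool overread_ok(const void *p, size_t width) {
      return ((uintptr_t)p & (MIN_PAGE_SIZE - 1)) <= MIN_PAGE_SIZE - width;
  }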



