I agree, but I think such information should be available to all people.
Some people will do this anyway, and they can hide it by using a transparent pixel.
I did some research before writing the article. GitHub started proxying images in 2014, and a lot of repositories use this technique to track their stats. I think GitHub is OK with it.
>Unlike GitHub, most of them don't even bother proxying the image to hide the IP, referrer, and user agent. If you want to allow external images on your site, you must proxy them and hide everything about the person who requested them.
> A person with bad intentions can trick a victim into opening a profile that looks completely legit and learn their IP and browser.
Can you explain this in more detail? Given a profile host that doesn't proxy, how does that attack work?
1. Your browser requests the image from the external server (at this step the server gets your IP and, typically, your user agent, since that's just how browsers talk to servers)
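A minimal sketch of that step, using Python's stdlib `http.server` (the endpoint and file name are made up for illustration): any server behind an unproxied `<img>` tag sees all of this for free.

```python
# Hypothetical tracking endpoint: serves a 1x1 transparent GIF and logs
# everything an ordinary <img> request reveals about the visitor.
from http.server import BaseHTTPRequestHandler, HTTPServer

# A 1x1 transparent GIF -- the classic "tracking pixel".
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
         b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01"
         b"\x00\x00\x02\x02D\x01\x00;")

class Tracker(BaseHTTPRequestHandler):
    def do_GET(self):
        # All of this arrives with a plain image request -- no JS needed.
        print("ip:        ", self.client_address[0])
        print("user-agent:", self.headers.get("User-Agent"))
        print("referer:   ", self.headers.get("Referer"))
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

# To run: HTTPServer(("127.0.0.1", 8000), Tracker).serve_forever()
```

A proxy like GitHub's camo sits between the visitor and this server, so the server only ever sees the proxy's IP and headers.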
I don't think system allocators are clever enough to handle allocating 100-500k very small objects per minute when Python is doing something intensive.
It's a pretty standard way to speed up allocation for dynamic languages. Game developers use similar techniques as well.
Can confirm for perl. We are doing the very same. It's a huge win.
Differences:
We never free empty pools.
Our arenas are just singly linked lists, so there's no need for the prev pointer.
Notes:
For a statically compiled perl the biggest win is to avoid arena allocation (mmap) at all. Data and code are made static. That's around 10-20% of the runtime (for short-running programs).
Also, we rarely free at the end; the OS does it much better than free(). Only the mandatory DESTROY calls and FileIO finalizers are executed.
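Neither CPython's nor perl's actual code, but the free-list idea both comments describe can be sketched in a few lines of Python (all names here are made up):

```python
class Pool:
    """Toy object pool: freed nodes go on a singly linked free list
    (threaded through the nodes themselves, no prev pointer) and are
    reused before asking the allocator for fresh memory."""

    def __init__(self):
        self._free = None          # head of the singly linked free list

    def alloc(self):
        node = self._free
        if node is None:
            return [None, None]    # fresh node: [next, payload]
        self._free = node[0]       # pop the head of the free list
        node[0] = None
        return node

    def free(self, node):
        node[0] = self._free       # push onto the free list
        self._free = node

pool = Pool()
a = pool.alloc()
pool.free(a)
b = pool.alloc()
print(b is a)                      # True: the freed node was reused
```

The point is that free() never returns memory anywhere; it just makes the node the next candidate for alloc(), which is why allocation becomes a couple of pointer moves instead of a trip into malloc.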
Can Python actually return memory pages to the OS, e.g. via sbrk() with a negative argument?
I'm currently having a problem with this, where I load a large deep learning model into "CPU" memory then move it to the GPU, but I can't get rid of the memory reserved by the process.
I can't answer your exact question, but any large allocations/deallocations should be handled by mmap under the hood, and in those cases the memory should be returned.
In your case, you should first consider the possibility that there is a pointer to your model's objects that is for some reason not being released. Even though you are moving your model to the GPU and perhaps removing your own references, there might be internal references to your model's data that are hidden from you. At least something to consider.
edit: To add to this, I'm now quite sure (though I could be wrong!) that whether Python uses sbrk with a negative value is out of Python's hands. Python is using malloc/free under the hood:
There's some flexibility for wrapping free in different ways in that file, but it basically always ends up calling free at the core; at least on my system I just verified that in a debugger. So if Python does default to malloc/free, then whether sbrk with a negative number ever comes into play is really a question of how your libc implements malloc/free.
Of course I might be wrong, but I think you should stop worrying about it at that level and instead look into object references first, as I detailed above.
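As a sketch of that first check (the `model` dict below is just a stand-in for a real framework object, and `cache` plays the role of a forgotten internal reference):

```python
import gc
import sys

model = {"weights": [0.0] * 1_000}   # stand-in for a large model object
cache = [model]                      # a "hidden" reference you forgot about

# getrefcount reports one extra reference for its own argument, so a
# value above 2 here means something besides the `model` name is holding on.
print(sys.getrefcount(model))

# get_referrers lists the objects keeping `model` alive -- the forgotten
# `cache` list shows up among them.
print(any(r is cache for r in gc.get_referrers(model)))   # True
```

If a real framework object still has referrers after you think you've dropped it, no amount of allocator tuning will get the memory back.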
Which framework do you use for deep learning? It can allocate some object on its own.
Can you share some stats from while the model is in use and from after it's no longer accessible? You can get them by calling the sys._debugmallocstats() function.
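For reference, `sys._debugmallocstats()` is CPython-specific and prints its report straight to stderr rather than returning anything, so the usual pattern is a before/after pair:

```python
import gc
import sys

# CPython-only; not part of the language spec. Dumps pymalloc's
# arena/pool/block statistics to stderr.
sys._debugmallocstats()   # snapshot while your objects are alive

# ... drop your references to the model here ...
gc.collect()              # make sure unreachable objects are collected

sys._debugmallocstats()   # second snapshot -- diff the two reports
```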
My article describes the tricks which are used inside the Python interpreter. Every improvement in the interpreter saves an insane amount of computing power considering its popularity.
Maybe I've missed something, but you can also get the same id (it's basically an address in memory) because of how memory allocation works. CPython has a special allocator that preallocates big chunks of memory and constantly reuses them, avoiding allocation overhead. I have an article on this too.
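A sketch of how that reuse can surface as a "duplicate" id. This is CPython-specific behavior (its per-size tuple free lists), not anything the language guarantees:

```python
# Build the tuple at runtime so it isn't a cached constant of the code
# object (a literal like (1, 2, 3) may never actually be freed by `del`).
a = tuple([1, 2, 3])
addr = id(a)
del a                     # the 3-tuple lands on CPython's size-3 free list
b = tuple([4, 5, 6])      # a new 3-tuple will often pop that same slot
print(id(b) == addr)      # usually True on CPython, but not guaranteed
```

So a matching id() only proves two objects are the same if both are alive at the moment you compare.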
Ahh.. interesting.
Then I guess using the id of the tuples to show that they are using the same object doesn't exactly prove the point.
To your original point (that tuples reallocate based on length): if I delete a tuple of length 3 and then create a tuple of length 5, I see the id change immediately. So that's correct.
Lists, on the other hand, seem to keep getting the same address in my limited test.
Strings seem to behave like tuples. When I delete a string and create a new one, it creates a new object with a new address... unless the strings are of the same length.
Perhaps this is no real revelation; I'm rather new to Python and spending my time poking around to see how it works. :)
> Then I guess using the id of the tuples to show that they are using the same object doesn't exactly prove the point
Without the `del a`, it does, because they both have active references. If they were unique objects, we'd see a unique ID for `b` as long as a reference to `a` is active.
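That guarantee is easy to check directly: id() values are only unique among objects that are alive at the same time.

```python
a = tuple([1, 2, 3])   # built at runtime to avoid constant caching
b = tuple([1, 2, 3])
print(a == b)          # True  -- equal values
print(a is b)          # False -- two live objects
print(id(a) != id(b))  # True  -- ids can't collide while both are alive
```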
> Strings seem to behave like tuples. When I delete a string and create a new one it creates a new object with a new address.... unless the strings are of the same length.
String _objects_ (as opposed to the variables referring to them) are immutable in Python. They tend to be allocated anew, but for optimization reasons you can end up with cases where the string objects have the same ID. Like here:
>>> a = 'asdf'
>>> id(a)
4389881424
>>> a = 'qwerty'
>>> id(a)
4395015672
>>> a = 'asdf'
>>> id(a)
4389881424
'asdf' has the same ID with no `del` involved because CPython interned the literal, so the first object was never freed.
Below are some relevant links if you want to knock yourself out (some do), but I write an awful lot of Python, and even for me this is well into the realm of "what happens when..." trivia for after a few beers, or for job interviews.
Most of my posts are about Python's internals and some security stuff