I quoted the section from the Python module here. [1]
If you do not specify multi-line, bar$ matches a line ending in bar, either foobar\n or foobar if the terminating newline has been removed or does not exist. If you specify multi-line, then it will also match at every bar\n within the string. So it either treats your input as a single line or as multiple lines. You can of course not specify multi-line and still pass in a string with additional newlines within the string, but then those newlines will be treated more or less like any other character, so bar$ will not match bar\n\n. The exception is that dot will not match them unless you set the single-line/dot-all flag: bar\n$ will match bar\n\n, but bar.$ will not unless you specify that flag.
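For anyone who wants to check these cases, here is a minimal sketch using Python's `re` module. The helper name `found` is made up for illustration; everything else is the standard library.

```python
import re

def found(pattern, text, flags=0):
    """Return True if `pattern` matches somewhere in `text`."""
    return re.search(pattern, text, flags) is not None

# Default mode: $ matches at the end, or just before one trailing newline.
print(found(r"bar$", "foobar"))                        # True
print(found(r"bar$", "foobar\n"))                      # True
print(found(r"bar$", "bar\n\n"))                       # False
print(found(r"bar$", "foo\nbar\nbaz"))                 # False

# Multi-line mode: $ also matches before every newline inside the string.
print(found(r"bar$", "bar\n\n", re.MULTILINE))         # True
print(found(r"bar$", "foo\nbar\nbaz", re.MULTILINE))   # True

# An explicit \n consumes the first newline; dot needs DOTALL to do the same.
print(found(r"bar\n$", "bar\n\n"))                     # True
print(found(r"bar.$", "bar\n\n"))                      # False
print(found(r"bar.$", "bar\n\n", re.DOTALL))           # True
```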
I would even agree with you that it seems a bit weird. If you have a proper line without additional newlines in the middle, then multi-line behaves exactly like not multi-line. Not multi-line only behaves differently if you confront it with multiple lines, and I have no good idea how you would end up in a situation where you have multiple lines and want to treat them as one unit but still treat the entire thing as if it were a line.
The docs do not say what you're saying. Your phrasing is completely different, and the part where "if ^/$ are in the pattern then the haystack is treated as a single line" is completely made up. As far as I can tell, that's your rationalization for how to make sense of this behavior. But it is not a story supported by the actual regex engine docs. The actual docs say, "^ matches only at the beginning of the string, and $ matches only at the end of the string and immediately before the newline (if any) at the end of the string." The docs do not say, "the string is treated as a single line when ^/$ are used in the pattern." That's your phrasing, not anyone else's. That's your story, not theirs.
I still have not seen anything from you that makes sense of the behavior that `cat$` does not match `cat\n\n`. Like, I realize you've tried to explain it. But your explanation does not make sense. That's because the behavior is strange.
The only actual way to explain the behavior of $ is what the `re` docs say: it either matches at the end of the string or just before a `\n` that appears at the end of the string. That's it.
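One way to make that rule concrete: in default mode, `$` should behave like a lookahead for "an optional `\n`, then the end of the string". The sketch below compares the two on a few inputs. The names `DOLLAR`, `SPELLED_OUT` and `positions` are made up for illustration, and the equivalence is only checked on these samples, not proven.

```python
import re

DOLLAR = re.compile(r"$")
# \Z matches only at the very end of the string, so this lookahead says:
# "what remains is either nothing or exactly one newline".
SPELLED_OUT = re.compile(r"(?=\n?\Z)")

def positions(regex, text):
    """Return every offset in `text` where the zero-width `regex` matches."""
    return [m.start() for m in regex.finditer(text)]

for text in ["cat", "cat\n", "cat\n\n", ""]:
    print(repr(text), positions(DOLLAR, text), positions(SPELLED_OUT, text))
```

For `cat\n\n` both report offsets 4 and 5, so neither has a match position directly after `cat` (offset 3).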
You are right, it is my wording. I replaced "end of string or before a newline as the last character" with "end of line" because that is what it means. You could also write that into the documentation, but then you would also have to explain what "end of line" means. And I will grant you that I might be wrong: the behavior may only accidentally be identical to matching the end of a line, and the true reason for it may be different.
Take cat$ against cat\n\n: the $ matches at the end of the line, just before the second \n, and cat is not directly before that position. I guess you want the regex engine to first treat the input as a multi-line input, extract cat\n as the first line, and then have cat$ match successfully in that single line? What about cat$ and dog$ against cat\ndog\n?
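For reference, this is how Python's `re` module actually handles that last input, as a small sketch (the variable names are made up for illustration):

```python
import re

text = "cat\ndog\n"

# Default mode: only the last line's end is visible to $.
cat_default = re.search(r"cat$", text) is not None
dog_default = re.search(r"dog$", text) is not None

# Multi-line mode: $ matches at the end of every line.
cat_multi = re.search(r"cat$", text, re.MULTILINE) is not None
dog_multi = re.search(r"dog$", text, re.MULTILINE) is not None

print(cat_default, dog_default)  # False True
print(cat_multi, dog_multi)      # True True
```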
Ignoring compatibility concerns, I would want the regex engine to behave the same way RE2, Go's regexp package and Rust's regex engine behave. I remember specifically considering Cox's decision ~10 years ago when writing the initial implementation of the regex crate. I thought Perl's (and Python's) behavior on this point was whacky then and I still think it's whacky now. So I followed RE2's semantics.
The OP is right to be surprised by this. And folks will continue to be surprised by it for eternity because it's an extremely subtle corner case that doesn't have a consistent story explaining its behavior. (I know you have proffered one, but I don't find it consistent in the context of a general purpose regex engine that searches arbitrary strings and not just lines.)
Of course, compatibility is a trump card here. I've acknowledged that. Changing this behavior now would be too hard. The best you could probably do is some kind of migration, where you provide the more "sensible" behavior behind an opt-in flag. And then maybe Python 4 enables it by default. But it's a lot of churn, and while people will continue to be confounded by this so long as the behavior exists, it probably isn't a Huge & Common Deal In Practice. So it may not be worth fixing. But if you're starting from scratch? Yes, please don't implement $ this way. It should match the end of the string when 'm' is disabled and the end of any line (including end of string and possibly being Unicode aware, depending on how much you care about that) when 'm' is enabled.
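To approximate those semantics in today's Python without changing the engine, `\Z` can stand in for "end of string only" and `$` under `re.MULTILINE` for "end of any line". This is a rough sketch of the idea, not a drop-in for RE2 or the regex crate; the helper `found` is made up for illustration.

```python
import re

def found(pattern, text, flags=0):
    """Return True if `pattern` matches somewhere in `text`."""
    return re.search(pattern, text, flags) is not None

# 'm' disabled: \Z matches only at the true end of the string.
print(found(r"cat\Z", "cat"))                   # True
print(found(r"cat\Z", "cat\n"))                 # False
print(found(r"cat$", "cat\n"))                  # True (Python's default $)

# 'm' enabled: $ matches at the end of any line, including end of string.
print(found(r"cat$", "cat\n\n", re.MULTILINE))  # True
print(found(r"cat$", "cat", re.MULTILINE))      # True
```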
[1] https://news.ycombinator.com/item?id=39765086