Python violates that definition however, by allowing internal newlines in string...

danbruc · on March 20, 2024

Python's behavior is not a hack; it is the common behavior. $ matches at the end of the string, or before the last character if that character is a newline, which is logically the same as the end of a single line. But as you said, you can have additional newlines inside the string, which is also common behavior and not specific to Python. Personally, I think of it this way: you assume the string is a single line and match $ accordingly, either at the end of the string or before a terminating newline. If there are additional newlines, you treat them mostly as normal characters, with the exception that dot does not match newlines unless you set the single-line/dot-all flag.
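
For anyone who wants to check this in Python's `re` module, here is a minimal sketch. The test strings and variable names are only illustrative.

```python
import re

pattern = re.compile(r"^cat$")

# $ matches at the very end of the string...
matches_plain = pattern.search("cat") is not None                # True
# ...and also just before a single trailing newline.
matches_trailing_newline = pattern.search("cat\n") is not None   # True
# A newline that is not the last character is an ordinary character here.
matches_internal_newline = pattern.search("cat\ndog") is not None  # False
# Only one trailing newline is tolerated.
matches_two_newlines = pattern.search("cat\n\n") is not None     # False

# Dot does not match a newline unless re.DOTALL is set.
dot_default = re.search(r"a.b", "a\nb") is not None              # False
dot_dotall = re.search(r"a.b", "a\nb", re.DOTALL) is not None    # True
```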

sltkr · on March 20, 2024

> Python's behavior [..] is the common behavior.

The very post we're commenting on shows that that's not true: PHP, Python, Java and .NET (C#) share one behavior (accept "\n" as $), and ECMAScript (JavaScript), Golang, and Rust share another behavior (do not accept "\n" as $).

Let's not argue about which is “the most common”; all of these languages are sufficiently common to say that there is no single common behavior.

> $ matches at the end of the string or before the last character if that is a newline, which is logically the same as the end of a single line.

Yes, that is Python's behavior (and PHP's, Java's, etc.). You're just describing it; not motivating why it has to work that way or why it's more correct than the obvious alternative of only matching the end of the string.
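
To make the alternative concrete: in Python you can already ask for "only the end of the string" with `\Z` or `re.fullmatch`. This is just a small sketch with illustrative names, not a claim about how the other languages implement it.

```python
import re

# \Z matches only at the very end of the string,
# never before a trailing newline.
strict = re.compile(r"^cat\Z")

strict_plain = strict.search("cat") is not None               # True
strict_trailing_newline = strict.search("cat\n") is not None  # False

# re.fullmatch requires the match to span the whole string.
full_plain = re.fullmatch(r"cat", "cat") is not None               # True
full_trailing_newline = re.fullmatch(r"cat", "cat\n") is not None  # False
```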

Subjectively, I find it odd that /^cat$/ matches not just the obvious string "cat" but also the string "cat\n". And I think historically, it didn't. I tried several common tools that predate Python:

  - awk 'BEGIN { print ("cat\n" ~ /^cat$/) }' prints 0
  - in GNU ed, /^M/ does not match any lines
  - in vim, /^M/ does not match any lines
  - sed -n '/\n/p' does not print any lines
  - grep -P '\n' does not match any lines
  - (I wanted to try `grep -E` too but I don't know how to escape a newline)
  - perl -e 'print ("cat\n" =~ /^cat$/)' prints 1

So the consensus seems to be that the classic UNIX line-based tools match the regex against the line excluding the newline terminator (which makes sense since it isn't part of the content of that line) and therefore $ only needs to match the end of the string.

The odd one out is Perl: it seems to have introduced the idea that $ can match a newline at the end of the string, probably for reasons similar to Python's. All of this suggests to me that allowing $ to match both "\n" and "" at the end of the string was a hack designed to make it easier to deal with both strings without control characters and strings that end with a single newline.

danbruc · on March 21, 2024

> So the consensus seems to be that the classic UNIX line-based tools match the regex against the line excluding the newline terminator (which makes sense since it isn't part of the content of that line) and therefore $ only needs to match the end of the string.

If you read a line, you usually remove the newline at the end, but you could also keep it, as Python does. If you remove the newline, then a line can never contain a newline, so the case cat\n can never occur. If you keep the newline, there will be exactly one newline as the last character, and you arguably want cat$ to match cat\n because that newline is the end of the line but not part of the content. It makes perfect sense that $ matches at the end of the string or before a newline as the last character, as it will do the right thing whether or not you strip the newline.
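
A quick sketch of what I mean, using an in-memory file so it is self-contained. The sample text and names are made up for illustration.

```python
import io
import re

text = "cat\ndog\ncat\n"

# Iterating over a file object keeps the newline terminator on each line.
kept = list(io.StringIO(text))   # ['cat\n', 'dog\n', 'cat\n']
# splitlines() drops the terminators.
stripped = text.splitlines()     # ['cat', 'dog', 'cat']

pattern = re.compile(r"cat$")

# The same pattern gives the same answer either way.
hits_kept = [pattern.search(line) is not None for line in kept]
hits_stripped = [pattern.search(line) is not None for line in stripped]
```

Both lists come out as `[True, False, True]`.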

If you want cat$ to not match cat\n, then you are obviously not dealing with lines: you have a string with a newline at the end, but you consider this newline part of the content instead of a line terminator. But ^ and $ are made for lines, so they do not work as expected. I also get what people are complaining about: if you are not in multi-line mode and have a proper line with at most one newline at the end, then it behaves exactly as if you were in multi-line mode, which raises the question of why you would have those two modes to begin with. Non-multi-line mode only behaves differently if you have additional newlines or a single newline that is not at the end, that is, if you do not have a proper line, so why should $ still behave as if you were dealing with a line?
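
Here is a small Python sketch of that comparison. The strings are only examples.

```python
import re

def matches(s, flags=0):
    """Return True if cat$ matches somewhere in s."""
    return re.search(r"cat$", s, flags) is not None

# Proper lines: at most one newline, and only at the end.
proper = ["cat", "cat\n"]
proper_default = [matches(s) for s in proper]                  # [True, True]
proper_multiline = [matches(s, re.MULTILINE) for s in proper]  # [True, True]

# Not proper lines: a newline that is not the last character.
improper = ["cat\ndog", "cat\n\n"]
improper_default = [matches(s) for s in improper]                  # [False, False]
improper_multiline = [matches(s, re.MULTILINE) for s in improper]  # [True, True]
```

The two modes only disagree on the inputs that are not proper lines.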

Izkata · on March 20, 2024

> - in vim, /^M/ does not match any lines

But /\n/ does

sltkr · on March 21, 2024

Thanks for the correction! That's interesting.