Odd that this list made you so angry that you had to post about how boring the people who follow it must be. I feel sorry for you, honestly. Judgement of others tends to make people miserable.
Features:
- Connect to Kafka Brokers: You can manage connections to different Kafka brokers all in one place.
- Send Messages: Sending messages to Kafka topics is just a few clicks away.
- Listen to Topics: Listen to messages from Kafka topics in real time, right in your browser.
- Filter Topics: Use regular expressions to filter and find exactly what you’re looking for in your topics.
- Responsive Design: The interface is clean and looks great on any device, thanks to Tailwind CSS.
- Persistent Connections: Your Kafka connections are saved in an SQLite database, so you don’t have to re-enter them every time.
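For a rough sense of what those features correspond to at the Kafka client level, here is a sketch using the kafka-python library. This is not the tool's actual code; the broker address, topic name, and regex below are placeholders.

# Hedged sketch: what "send", "listen", and "filter topics" look like with
# kafka-python. Broker address, topic, and regex are made-up placeholders.
import re
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("demo-topic", b"hello from the browser UI")   # "Send Messages"
producer.flush()

consumer = KafkaConsumer("demo-topic",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest")

# "Filter Topics": match topic names against a regular expression
matching = [t for t in consumer.topics() if re.search(r"^demo-", t)]
print("matching topics:", matching)

for message in consumer:                                     # "Listen to Topics" (blocks)
    print(message.topic, message.value)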
import math
import random

def streaming_algorithm(A, epsilon, delta):
    # Initialize parameters
    p = 1
    X = set()
    thresh = math.ceil((12 / epsilon ** 2) * math.log(8 * len(A) / delta))
    # Process the stream
    for ai in A:
        if ai in X:
            X.remove(ai)
        if random.random() < p:
            X.add(ai)
        if len(X) == thresh:
            X = {x for x in X if random.random() >= 0.5}
            p /= 2
            if len(X) == thresh:
                return '⊥'
    return len(X) / p
# Example usage
A = [1, 2, 3, 1, 2, 3]
epsilon = 0.1
delta = 0.01
output = streaming_algorithm(A, epsilon, delta)
print(output)
I don't think there is a single variable name or comment in this entire code block that conveys any information. Name stuff well! Especially if you want random strangers to gaze upon your code in wonder.
Well, the paper also contains the code, so I doubt anyone who read the paper cares about this paste. For folks who did not read the paper, though, this is not very readable.
OP is following the same variable names as the article. I prefer that over changing the variable names and then having to figure out which name in the code maps to which in the article.
Speaking of which, one of my favorite discoveries about Unicode is that there are a ton of code points acceptable in identifiers in various languages that I just can't wait to abuse.
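For instance, Python 3 accepts most Unicode letters in identifiers (PEP 3131), so the paper's Greek symbols can be typed verbatim:

# Python 3 allows Unicode letters in identifiers (PEP 3131).
ε = 0.1   # the paper's epsilon
δ = 0.01  # the paper's delta
print(ε, δ)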
Besides the ideas from istjohn, empath-nirvana, and rcarmo, you can also just "flip the script": solve for epsilon and report that as a (1 - delta) confidence interval for the worst-case data distribution, as here: https://news.ycombinator.com/item?id=40388878
The best-case error is of course zero, but if you look at my output you will see, as I did, that the worst case is a very conservative bound (i.e. 15X bigger than what might "tend to happen"). That matters a lot for space usage, since error =~ 1/sqrt(space), which implies you need a lot more space for lower errors: 15^2 = 225X more space. Space optimization usually gets a lot of attention in this kind of problem. And, hey, maybe you know something about the input data distribution?
So, in addition to the worst-case bound, average-case errors under various distributional scenarios would be very interesting. Or, even better, "measuring as you go" enough distributional metadata to get a tighter error bound. That latter starts to sound like Knuth's hard-questions-which-if-you-solve-he'll-sign-your-PhD-thesis territory, though. Maybe a starting point would be some kind of online entropy (of the distribution) estimation, perhaps inspired by https://arxiv.org/abs/2105.07408 . And sure, maybe you need to bound the error ahead of time instead of inspecting it at any point in the stream.
You would want to calculate the threshold by choosing your target epsilon and delta and an 'm' equal to the largest conceivable size of the stream. Fortunately, the threshold increases with log(m), so it's inexpensive to anticipate several orders of magnitude more data than necessary. If you wanted, you could work backwards to calculate the actual 'epsilon' and 'delta' values for the actual 'm' of the stream after the fact.
You actually don't need to do that part in the algorithm. If you don't know the length of the list, you can just choose a threshold that seems reasonable and calculate the margin of error after you're done processing (or, I guess, at whatever checkpoints you want if it's continuous).
In this example, they have the length of the list and choose the threshold to give them a desired margin of error.
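A small sketch of that back-of-the-envelope calculation, just rearranging the same thresh = ceil((12 / epsilon^2) * log(8m / delta)) expression used in the code above (the numbers are made up):

import math

def threshold_for(epsilon, delta, m):
    # Threshold from the paper's bound, sized for a conservative upper
    # estimate m of the stream length.
    return math.ceil((12 / epsilon ** 2) * math.log(8 * m / delta))

def epsilon_for(thresh, delta, m):
    # Work backwards: the epsilon actually achieved for the true stream
    # length m at the same delta, given the threshold that was used.
    return math.sqrt((12 / thresh) * math.log(8 * m / delta))

# e.g. size the threshold for up to a billion items, then see what error
# bound that threshold gives on the ten million items actually processed.
thresh = threshold_for(0.05, 0.01, 10**9)
print(thresh, epsilon_for(thresh, 0.01, 10**7))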
An error condition. I decided to do away with it and take a small hit on the error, assuming that the chance of the trimmed set still sitting at the threshold is very small, so the error condition is effectively doing nothing.
I also changed the logic from == to >= so it triggers unfailingly, and passed in the "window"/threshold so the code works without internal awareness of the length of the iterable:
from random import random

def estimate_uniques(iterable, window_size=100):
    p = 1
    seen = set()
    for i in iterable:
        # Make sure the current item is in the set, then keep it with probability p.
        if i not in seen:
            seen.add(i)
        if random() > p:
            seen.remove(i)
        # When the set reaches the window size, drop each element with
        # probability 1/2 and halve the sampling probability.
        if len(seen) >= window_size:
            seen = {s for s in seen if random() < 0.5}
            p /= 2
    return int(len(seen) / p)
I also didn't like the possible "set thrashing" when an item is removed and re-added for high values of p, so I inverted the logic. This should work fine for any iterable.
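A quick usage example (the data here is made up, just to show the call):

# Made-up example: estimate the distinct values in a stream of a million
# random integers drawn from a range of 50,000.
from random import randint

stream = (randint(0, 49_999) for _ in range(1_000_000))
print(estimate_uniques(stream, window_size=1000))  # should land near 50,000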
My point is that there is a difference between a Python function returning false and the function raising an error, and sometimes that difference really matters. So it would be regrettable if logic teachers actually did use ⊥ to mean false, given that programming-language theorists use it to mean something whose only reasonable translation into practical programming is raising an error.
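To make that concrete with the code above (illustrative only, not from the paper): a sentinel return can slip through arithmetic unnoticed, while an exception cannot be ignored.

# Sentinel style: nothing stops a caller from using '⊥' as if it were a number.
estimate = streaming_algorithm(A, epsilon, delta)
doubled = estimate * 2                # silently becomes '⊥⊥' if the trim failed

# Error style: convert the sentinel into an exception the caller cannot miss.
if estimate == '⊥':
    raise RuntimeError("trim failed to shrink the sample; rerun or raise thresh")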
Yes, this seems like ChatGPT’s style of writing. Another post has a conclusion that matches ChatGPT’s conclusion style:
> Remember, namespaces are a powerful feature that requires careful configuration and management. With proper knowledge and implementation, you can harness the full potential of Linux namespaces to create robust and secure systems.
A clue from this post itself is that all the links were added to the intro because GPT won’t intersperse links throughout.
Edit: softened my language since there's no way to know, and whatever, ChatGPT is smart anyway. Better to judge content on its merits anyway, imo.
ChatGPT is amazing if you haven't mastered the language you're writing in; that's what it's for. Give it some text and have it rewrite it. The fact that the text was generated by it doesn't mean the content was produced by it; often people just use it to rephrase. IMHO, that's what it's for. (Hard to tell which is which, though :D)
Yep, I do use GPT as one of the tools in my workflow. I write these blogs in Markdown locally and have a helper script that takes the raw content and, with a prompt, generates a title, summary, intro, and conclusion (a personal preference, to keep these consistent across all blogs) and proofreads the whole raw content for any mistakes (it has completely replaced Grammarly for me now).
Quite happy with this workflow, because it lets me publish articles more frequently and I don't have to worry about anything other than dumping my thoughts in raw form.
It's similar to how I use Astro as a tool to generate static pages from these Markdown files for easy deployment on the web, or Tailwind CSS, etc.; you get the point.
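For what it's worth, a minimal sketch of what such a helper script might look like; this is a guess, not the author's actual script, and the model name, prompt, and file handling are assumptions. It uses the OpenAI Python client:

# Hypothetical sketch of a blog helper script: model, prompt, and paths are
# assumptions, not the author's actual setup.
import sys
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def polish(markdown_text: str) -> str:
    prompt = (
        "Generate a title, a one-paragraph summary, an intro, and a conclusion "
        "for the following blog post, then proofread the body for mistakes:\n\n"
        + markdown_text
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        print(polish(f.read()))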
Noted. That said, adding it on top of each blog would be very repetitive and not specifically related to the blog itself. I will find a better place to add this info, but will definitely add it by today, most likely under the `/uses/` page.
I would still put it in some prominent place, similar to how newspapers mark articles as "sponsored". That is, I wouldn't want to unknowingly read a blog written (in large part) by ChatGPT, and I would feel deceived if that wasn't clear from the start.
I got really frustrated reading the article because most of the text felt like padding without actually explaining why these specific commands were necessary. Definitely felt like LLM output.
Back when I didn't even know what PHP was, I dreamed of creating a Facebook-like social network. Now, seven years later, I have the knowledge to build it, but I'm no longer interested in creating another social network, simply because I know I am nobody: I am not an MIT graduate, I don't have a network, and my social and communication skills are terrible. Now I am settled into a regular day job, and those big ambitions were buried once I fully grasped the reality that I am not going to be the lucky one who launches a successful billion-dollar social networking app.
This method used to work for me, but I don't know; lately it barely even touches my feelings about the vastness of the universe and how tiny and insignificant we are.
Because the universe around us is unreachable, and will be for a long time to come. It simply does not matter to our daily lives that we are insignificant. We can observe, explore, and expand our knowledge, but never touch the surface of another planet or feel the warmth of another star. Your neighbor matters more to your life than the Andromeda galaxy.
Not even the closest star changes or affects life on Earth on time scales relevant to us. For all intents and purposes we are an isolated microcosm, save maybe for some astronomical catastrophes unlikely to happen within humanity's span of existence (which isn't even a single million years yet).
Not what I expected, but great work; keep going. Maybe in the future it won't just be a display of only five different things that I already know, one of which isn't clickable.