Hacker News | st553's comments

My dad passed away from glioblastoma 2 weeks ago. My mom and I spent every day with him before his passing, which helped us not live with any regrets after. After he died, my mom and I would do things he loved to do. It was painful to do those things knowing he wasn't physically there... but it was comforting to do them as a way to feel his presence. My dad loved camping with his trailer (I know nothing about trailers). I spent a couple of days going through all his tools, learning how to set up a trailer and properly hitch it, then going on a camping trip for a few days with my mom. I don't think we will ever get over his loss, but it's nice to do things that you think would make him proud.


I've been using PIA for a few years and have been disappointed to see an increasing number of websites blocking VPN access.


That and CloudFlare ... more and more frequently I've been asked to solve those really annoying "pick seven of these sixteen pictures that have X in them" captchas. Those take way too long and I'll often just leave the site instead of solving them.


Just to expand on this, maybe a job board should tell you what the interview process is like: whether it's a take-home test, whiteboard, in-person project, etc.


Yes, and this should be a field that you can filter by.


My recommendation for preparing for tech interviews at the big tech companies is to answer as many questions as you can from here:

http://www.programcreek.com/2012/11/top-10-algorithms-for-co...

Use coderpad.io to write code for practice passing the phone screen.


My recommendation is to write a polite reply saying why these are a waste of time and why you won't be doing it, and ask them if they are still interested in interviewing.


I actually did that a few times, and most of them ran away from me. I said I don't have the patience for linked lists, "determine if 4 points form a square" type questions, etc., as that does not prove anything. It is easy to game the system to pass these questions, but then most people who pass write the most horrible, unmaintainable code.


>Not everything needs a schema.

Any good examples?


One example would be ingesting structured or semi-structured data from sources that you don't control.

You may know some invariants, but much can change without notice. So you want to be able to work with the structure you have without preventing non-conforming data from entering your system.

In some cases schema conformance is just delayed, in other cases it is never achieved completely or not even a goal.
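A minimal sketch of what "delayed conformance" could look like, assuming Python and a made-up set of invariants (the `REQUIRED` mapping, `StoredRecord` and `ingest` are hypothetical names, not from any particular system). Every record is accepted and stored as-is; violations are only recorded so they can be dealt with later, or never.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical invariants: the few fields we believe we know, and their types.
REQUIRED = {"source": str, "value": float}


@dataclass
class StoredRecord:
    raw: Dict[str, Any]                      # the data exactly as it arrived
    problems: List[str] = field(default_factory=list)

    @property
    def conforms(self) -> bool:
        return not self.problems


def ingest(raw: Dict[str, Any]) -> StoredRecord:
    """Accept every record; note violated invariants instead of rejecting."""
    problems = []
    for name, expected in REQUIRED.items():
        if name not in raw:
            problems.append(f"missing {name}")
        elif not isinstance(raw[name], expected):
            problems.append(f"{name} is not {expected.__name__}")
    return StoredRecord(raw=raw, problems=problems)
```

Unknown extra fields pass through untouched, and `conforms` can be used later to decide which records are ready for stricter processing.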


> one example would be ingesting data from structured or semi-structured sources that you don't control

Can you give a more specific example?


For instance, we need to retrieve statistical data on various macro economic indicators from various statistics offices and international organisations. There is considerable overlap in the fields they use but it's rarely exact and often you can't merge them because they do not refer to the exact same entity or the data uses incompatible units. It's impossible to properly model all of it before storing it because so much changes all the time and it's all noisy and partly broken.

A similar thing happens when you retrieve data on securities and companies from various exchanges, from the SEC, from national registries all over the world or you try to include XBRL from different countries.

And then you often have documents (like quarterly reports) that contain structured fields and tables but not in a formally specified syntax. You don't know exactly what fields will be in those documents before you parse them. So you parse the documents, store key/value pairs, and then you clean them up gradually.
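The parse, store, clean-up-gradually loop described above might look roughly like this. It is only a sketch using Python's built-in `sqlite3`; the table name `raw_fields`, the `CANONICAL_KEYS` mapping and the sample field names are all invented for illustration.

```python
import sqlite3

# Hypothetical mapping from noisy field names to cleaned-up ones.
# In practice it would grow as more documents are inspected.
CANONICAL_KEYS = {"Net Revenue": "revenue", "Revenues": "revenue"}


def store_pairs(conn, doc_id, pairs):
    """Store whatever fields the parser found, without fixing a schema first."""
    conn.executemany(
        "INSERT INTO raw_fields (doc_id, key, value) VALUES (?, ?, ?)",
        [(doc_id, key, value) for key, value in pairs.items()],
    )


def clean_keys(conn):
    """One gradual cleanup pass: rename the keys we have learned to recognise."""
    for noisy, clean in CANONICAL_KEYS.items():
        conn.execute("UPDATE raw_fields SET key = ? WHERE key = ?", (clean, noisy))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_fields (doc_id TEXT, key TEXT, value TEXT)")
store_pairs(conn, "q1-report", {"Net Revenue": "120", "Headcount": "35"})
store_pairs(conn, "q2-report", {"Revenues": "130"})
clean_keys(conn)
```

Keys that are not yet understood (`Headcount` here) simply stay in the table until a later pass knows what to do with them.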

There are tons of situations like this in data integration. It's a never-ending cleanup and merge process. You can use an RDBMS for all of that, but it's not always the best tool for the job (though it is still my preferred tool most of the time).


Having worked on that sort of process many times, I'm of the opinion that a message queue is the ideal solution there, not a database. If you're storing the data for the purpose of processing it again later, it should probably be ephemeral and fast, rather than long-lived and flexible.


That doesn't work for us (beyond the first stage), because the fields we extract from the original source are not ephemeral.

We need to store the key/value pairs and explore them in a reasonably productive fashion (i.e., using queries) in order to come up with machine learning algorithms. And any new algorithms we write need access to all historical data.


Metrics. A metric has a known source, a timestamp, a name and a value. It can also ship with any arbitrary number of descriptive fields.

Similarly, events.
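As a rough illustration of that shape in Python (the class name, the epoch-seconds timestamp and the sample values are assumptions): four fixed fields plus an open-ended dictionary for the descriptive ones.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Metric:
    source: str
    timestamp: float                 # seconds since the epoch (an assumption)
    name: str
    value: float
    # Any number of descriptive fields; no schema is imposed on these.
    fields: Dict[str, str] = field(default_factory=dict)


m = Metric("web-01", 1700000000.0, "cpu.load", 0.42, {"region": "eu", "rack": "b7"})
```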


The known source might very well be expressed as a relation between two entities: a metric entity and a source entity.

The source entity, more often than not, is also complemented with other data that needs to remain a part of the persistence layer.


Metadata can be quite variable. Library, catalogue, picture tags. The majority of terms are common, but some can be pretty specific and (as a developer) you'd need to store them. You might not have control over the schema or even have a "finite" set of possibilities.

Imagine you want to store random metadata from a digital camera picture, or perhaps even XML/HTML attributes. You can create another table and add each new attribute – join on query – but if you don't plan to search for that data directly, it's easier to skip normalisation and dump the original set into a JSON(B) or HStore field. You don't have to add every possible attribute to your data model or schema, you can carry data along and not analyse it if it's not relevant to you.
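A small sketch of the "dump the original set into one field" approach. It uses Python's `sqlite3` with a plain TEXT column holding JSON as a stand-in for a PostgreSQL JSON(B) or HStore column; the `pictures` table and the helper names are made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pictures (id INTEGER PRIMARY KEY, filename TEXT, metadata TEXT)"
)


def add_picture(conn, filename, metadata):
    """Store the whole metadata dict as one JSON document, unnormalised."""
    cur = conn.execute(
        "INSERT INTO pictures (filename, metadata) VALUES (?, ?)",
        (filename, json.dumps(metadata)),
    )
    return cur.lastrowid


def get_metadata(conn, picture_id):
    """Return the metadata dict for a picture, or None if the id is unknown."""
    row = conn.execute(
        "SELECT metadata FROM pictures WHERE id = ?", (picture_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Nothing about the camera's attributes has to be known when the table is created; the data is just carried along until someone needs it.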


At a previous company I worked at, there was a table that maxed out PostgreSQL's column limit. It did not need to be that wide. It was much better suited to a ~30-column table with a single hstore column (if the key exists, return the value; otherwise, null), as 99% of the rows were completely empty for those columns, and PGSQL does not support sparse tables (the "right" solution here).
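To make the idea concrete, here is a sketch in Python with a plain dict standing in for the hstore column. `split_row` and `lookup` are hypothetical helpers, and the column names in the usage note are invented.

```python
def split_row(row, core_columns):
    """Split a wide row into its core columns and a sparse key/value map.

    Empty (None) values are dropped from the sparse part, so a mostly
    empty row costs almost nothing to store.
    """
    core = {col: row.get(col) for col in core_columns}
    sparse = {
        key: value
        for key, value in row.items()
        if key not in core_columns and value is not None
    }
    return core, sparse


def lookup(sparse, key):
    """If the key exists, return the value; otherwise None (i.e. null)."""
    return sparse.get(key)
```

For example, `split_row({"id": 1, "name": "x", "opt_a": None, "opt_b": "y"}, ["id", "name"])` keeps `id` and `name` as core columns and stores only `opt_b` in the sparse map.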


If 3/4 of a software engineer's day is communicating, I hope they aren't staying at work till 9pm doing actual work.


>Once you have some solid offers lined up

Does this actually work for people? My experience interviewing for software engineering roles is that it's a time-consuming and tedious process. I can't imagine juggling a full-time job while interviewing with more than one company at a time.


My one experience doing it has only involved one job change/interview/offer, so I'm not sure what the typical experience is.

After I grew dissatisfied with my first job out of college, partially because I'd just been there five years and wanted to try something else, partially for ethical reasons (the new owners were hosting fundraisers for Jenny McCarthy), I sent resumes to three companies, got a callback from one, did a phone screen, took a day off work for the interview, and then got the offer a few weeks later. Gave three weeks notice, took a month off, was at a new job about three months after I first decided to leave the old.


> Gave three weeks notice, took a month off

This is absolutely not a typical experience. In most cases, software companies are very hesitant to give you more than 3 weeks.


Same here


Anyone know if it's possible or makes sense to use something like t-SNE for dimensionality reduction? If so, could the reduced data set be used to build a classifier?


This hasn't been my experience. I've been referred a few times before and still been thrown to the whiteboard to reverse a linked list.

