My dad passed away from glioblastoma 2 weeks ago. My mom and I spent every day with him before his passing, which helped us not live with any regrets after. After he died, my mom and I would do things he loved to do. It was painful to do those things knowing he wasn't physically there... but it was comforting to do them as a way to feel his presence. My dad loved camping with his trailer (I know nothing about trailers). I spent a couple of days going through all his tools, learning how to set up a trailer and properly hitch it, then going on a camping trip for a few days with my mom. I don't think we will ever get over his loss, but it's nice to do things that you think would make him proud.
That and CloudFlare ... more and more frequently I've been asked to solve those really annoying "pick seven of these sixteen pictures that have X in them" captchas. Those take way too long and I'll often just leave the site instead of answering them.
Just to expand on this, maybe a job board should tell you what the interview process is like: whether it's a take-home test, whiteboard, in-person project, etc.
My recommendation is to write a polite reply saying why these are a waste of time and why you won't be doing it, and ask them if they are still interested in interviewing.
I actually did that a few times, and most of them ran away from me. I said I don't have the patience for linked lists, "determine if 4 points form a square" type questions, etc., as that does not prove anything. It is easy to game the system to pass these questions, but then most people who pass write the most horrible, unmaintainable code.
One example would be ingesting structured or semi-structured data from sources that you don't control.
You may know some invariants, but much can change without notice. So you want to be able to work with the structure you have without preventing non-conforming data from entering your system.
In some cases schema conformance is just delayed, in other cases it is never achieved completely or not even a goal.
For instance, we need to retrieve statistical data on various macro economic indicators from various statistics offices and international organisations. There is considerable overlap in the fields they use but it's rarely exact and often you can't merge them because they do not refer to the exact same entity or the data uses incompatible units. It's impossible to properly model all of it before storing it because so much changes all the time and it's all noisy and partly broken.
A similar thing happens when you retrieve data on securities and companies from various exchanges, from the SEC, from national registries all over the world or you try to include XBRL from different countries.
And then you often have documents (like quarterly reports) that contain structured fields and tables but not in a formally specified syntax. You don't know exactly what fields will be in those documents before you parse them. So you parse the documents, store key/value pairs, and then you clean them up gradually.
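A rough sketch of that parse-then-clean loop, in Python. Everything here is made up for illustration: the "Key: value" line format, the field names, and the `KEY_ALIASES` table are assumptions, not anything from a real report format. The point is only that unknown fields get stored as-is and cleanup rules get added over time.

```python
import re

# Hypothetical cleanup rules, added one at a time as patterns are discovered.
KEY_ALIASES = {
    "rev": "revenue",
    "total revenue": "revenue",
    "net inc": "net_income",
}

def parse_fields(text):
    """Extract raw (key, value) pairs from lines shaped like 'Key: value'."""
    pairs = []
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs

def clean(pairs):
    """Normalise the keys we recognise; keep unrecognised ones rather than dropping them."""
    cleaned = {}
    for key, value in pairs:
        norm = key.strip().lower()
        cleaned[KEY_ALIASES.get(norm, norm)] = value
    return cleaned

# Made-up document: we don't know in advance which fields it contains.
report = """Quarterly Report
Total Revenue: 1,200
Net Inc: 300
Some New Field: n/a
"""

raw = parse_fields(report)   # store these as-is
print(clean(raw))            # re-run whenever the cleanup rules improve
```

Because the raw pairs are kept, `clean` can be re-run over all historical data each time a new alias or rule is added.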
There are tons of situations like this in data integration. It's a never ending cleanup and merge process. You can use RDBMS for all of that but they're not always the best tool for the job (but they are still my preferred tool most of the time).
Having worked on that sort of process many times, I'm of the opinion that a message queue is the ideal solution there, not a database. If you're storing the data for the purpose of processing it again later, it should probably be ephemeral and fast, rather than long-lived and flexible.
That doesn't work for us (beyond the first stage), because the fields we extract from the original source are not ephemeral.
We need to store the key/value pairs and explore them in a reasonably productive fashion (i.e. using queries) in order to come up with machine learning algorithms. And any new algorithms we write need access to all historical data.
Metadata can be quite variable. Library, catalogue, picture tags. The majority of terms are common, but some can be pretty specific and (as a developer) you'd need to store them. You might not have control over the schema or even have a "finite" set of possibilities.
Imagine you want to store random metadata from a digital camera picture, or perhaps even XML/HTML attributes. You can create another table and add each new attribute – join on query – but if you don't plan to search for that data directly, it's easier to skip normalisation and dump the original set into a JSON(B) or HStore field. You don't have to add every possible attribute to your data model or schema, you can carry data along and not analyse it if it's not relevant to you.
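Something like this, as a minimal sketch. To keep it self-contained I'm using Python's built-in `sqlite3` with the metadata serialised into a plain TEXT column, which is only a stand-in for a real Postgres JSON(B) or HStore field. The table, column, and tag names are invented for the example.

```python
import json
import sqlite3

# In-memory SQLite standing in for Postgres; the TEXT column plays the role
# of a JSON(B)/HStore field in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pictures ("
    " id INTEGER PRIMARY KEY,"
    " filename TEXT NOT NULL,"
    " metadata TEXT NOT NULL)"
)

def add_picture(filename, metadata):
    """Store the modelled column plus whatever metadata came with the file."""
    cur = conn.execute(
        "INSERT INTO pictures (filename, metadata) VALUES (?, ?)",
        (filename, json.dumps(metadata)),
    )
    return cur.lastrowid

def get_metadata(picture_id):
    """Return the stored metadata dict, or None if the row does not exist."""
    row = conn.execute(
        "SELECT metadata FROM pictures WHERE id = ?", (picture_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None

# Hypothetical tags: none of these needed a schema change to be stored.
pid = add_picture(
    "beach.jpg",
    {"Make": "ExampleCam", "ISO": 200, "VendorSpecificTag": "abc"},
)
meta = get_metadata(pid)
print(meta.get("ISO"))
```

The trade-off is the one described above: the blob is carried along cheaply, but anything you later want to search on directly is better promoted to a real column (or, in Postgres, indexed inside the JSONB field).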
At a previous company I worked at, there was a table that maxed out PostgreSQL's column limit. It did not need to be that wide. It was much better suited to being a ~30 column table with a single hstore column (does the key exist? return the value; otherwise, null), as 99% of the rows were completely empty in those columns, and PostgreSQL does not support sparse tables (the "right" solution here).
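To illustrate the "key exists? value, otherwise null" idea in plain Python (no database involved; the column names and the `CORE_COLUMNS` set are hypothetical):

```python
# Hypothetical set of columns that stay as real, always-populated columns.
CORE_COLUMNS = {"id", "name"}

def to_sparse(row):
    """Split a wide row into core columns and a sparse map of non-null extras."""
    core = {k: v for k, v in row.items() if k in CORE_COLUMNS}
    extras = {
        k: v
        for k, v in row.items()
        if k not in CORE_COLUMNS and v is not None
    }
    return core, extras

# A made-up wide row where most of the optional columns are empty.
wide_row = {
    "id": 1,
    "name": "widget",
    "attr_001": None,
    "attr_002": "blue",
    "attr_003": None,
}

core, extras = to_sparse(wide_row)
print(extras.get("attr_002"))  # key exists, so the value comes back
print(extras.get("attr_003"))  # key absent, so None, like a NULL column
```

Only the populated attributes are stored, and a missing key reads back the same way an empty column would.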
Does this actually work for people? My experience interviewing for software engineering roles is that it's a time-consuming and tedious process. I can't imagine juggling a full-time job while interviewing with more than one company at a time.
My one experience doing it has only involved one job change/interview/offer, so I'm not sure what the typical experience is.
After I grew dissatisfied with my first job out of college, partially because I'd been there five years and wanted to try something else, partially for ethical reasons (the new owners were hosting fundraisers for Jenny McCarthy), I sent resumes to three companies, got a callback from one, did a phone screen, took a day off work for the interview, and then got the offer a few weeks later. Gave three weeks' notice, took a month off, and was at a new job about three months after I first decided to leave the old one.
Anyone know if it's possible or makes sense to use something like t-SNE for dimensionality reduction? If so, could the reduced data set be used to build a classifier?