
I realise the article was written for a specific audience for which this may be obvious, but what is the difference between a data scientist and a data engineer (in terms of what their jobs are)?


Generally speaking:

"Data engineering" means building systems that can manipulate data (e.g. storing, retrieving, and delivering it). There are usually fairly well-defined functional requirements about what the system is supposed to do, plus goals about performance and reliability that might be slightly more nebulous.

"Data science" means building systems that can draw conclusions from data. The functional requirement is usually some form of "accuracy", as measured somehow against some kind of human evaluation of the same conclusion.

Concretely: a data engineer might be asked to build a system that can ingest every tweet posted to Twitter, and return the 10 most widely-used hashtags in the last hour. A data scientist might be asked to build a system that looks at a tweet and figures out what language it's written in, or whether it's spam, or whether an attached image is pornographic.
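
To make the data engineering half of that concrete, here is a rough single-process sketch of the "top hashtags in the last hour" problem. Everything in it is an assumption for illustration: the `TrendingHashtags` class, its `ingest`/`top` methods, the simple `#\w+` hashtag pattern, and the requirement that tweets arrive in timestamp order. At real tweet volumes you would probably need something distributed and likely approximate, which is where the performance and reliability goals come in.

```python
import re
from collections import Counter, deque

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


class TrendingHashtags:
    """Hypothetical sliding-window hashtag counter (illustrative only)."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, hashtag), oldest first
        self.counts = Counter()  # hashtag -> occurrences inside the window

    def ingest(self, timestamp, text):
        # Assumes timestamps arrive in non-decreasing order.
        for tag in HASHTAG_RE.findall(text):
            tag = tag.lower()
            self.events.append((timestamp, tag))
            self.counts[tag] += 1

    def _expire(self, now):
        # Drop every event that is a full window or more older than `now`.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, tag = self.events.popleft()
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]

    def top(self, now, n=10):
        self._expire(now)
        return self.counts.most_common(n)
```

You would call `ingest(timestamp, text)` for each tweet and `top(now)` whenever you want the current ranking. Note there is nothing to "get right" statistically here: the answer is well defined, and the hard part is doing it fast and reliably. The data science tasks (language, spam, image classification) are the opposite: easy to wire up, hard to make accurate.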


Data scientists actually cook up and run the statistical/ML models on data and write reports about their "findings".

However, the data that data scientists want to use is often messy and comes from varied sources. Hence, data engineers do supporting infra work like cleaning/loading data from different databases, etc.



