But when you create a DataFrame in Spark, its schema needs to be defined, or, if it comes from SQL, it takes the form of the columns returned.
Using Python can create hotspots where data is transferred between Spark and the Python gateway. Python UDFs are a common culprit.
Either way, my point is that the architecture and design of your data solution can cause far more problems than your choice of language.