DerbyPy Intro to PySpark
This month at DerbyPy I gave a high-level introduction to PySpark. For this
talk I went over the Spark execution model, discussed the differences between
the PySpark DataFrame and RDD APIs, and provided examples of how to use both.
As part of this I put together a Jupyter notebook and some scripts that can be
run via spark-submit, along with instructions on how to run them.
If you're interested in the material and the presentation, they can be found here.