HOWTO: A Minimalist PySpark Setup for Getting Things Done

Having tried dozens of other frameworks, I have settled on PySpark.

Requirements

  • A computer 💻

Steps:

Install Miniconda3 (optional; any Python 3.8+ should be fine)

Download the appropriate version for your OS.

For example, on my Mac I would do

Install Python Packages
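
The install command itself is not shown here. Assuming the two packages this guide relies on are `pyspark` and `jupyterlab` (their PyPI names; the package list is an assumption, not something stated above), running `pip install pyspark jupyterlab` inside the active environment would be one way to get them. The sketch below checks the result from Python; the helper names `python_is_supported` and `missing_packages` are hypothetical.

```python
import importlib.util
import sys

# Assumed package list: the guide does not name the packages explicitly.
REQUIRED_PACKAGES = ["pyspark", "jupyterlab"]


def python_is_supported(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if the interpreter is at least `minimum` (3.8 per this guide)."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python OK:", python_is_supported())
print("Missing packages:", missing_packages(REQUIRED_PACKAGES))
```

An empty "Missing packages" list suggests the environment is ready for the next step.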

Launch Jupyter Lab

Create a new Spark session and load your data
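
A minimal sketch of this step, assuming a local (single-machine) session and a data file whose format can be guessed from its extension. The helper names `make_session`, `reader_format_for`, and `load_table`, the app name, and the file path are all hypothetical; `SparkSession.builder`, `master`, `appName`, `getOrCreate`, and `spark.read.format(...).load(...)` are standard PySpark calls.

```python
import os

# Map file extensions to built-in Spark reader formats.
_FORMATS = {".csv": "csv", ".json": "json", ".parquet": "parquet", ".orc": "orc"}


def reader_format_for(path):
    """Guess the Spark reader format from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in _FORMATS:
        raise ValueError(f"Unrecognized file extension: {ext!r}")
    return _FORMATS[ext]


def make_session(app_name="minimal-pyspark"):
    """Create (or reuse) a Spark session that runs locally on all cores."""
    # Imported here so the helpers above work even before pyspark is installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.master("local[*]").appName(app_name).getOrCreate()


def load_table(spark, path):
    """Load a file into a DataFrame; CSV files get a header row and inferred types."""
    fmt = reader_format_for(path)
    reader = spark.read.format(fmt)
    if fmt == "csv":
        reader = reader.option("header", True).option("inferSchema", True)
    return reader.load(path)
```

In a notebook cell, `spark = make_session()` followed by `df = load_table(spark, "data.csv")` and `df.show(5)` would display the first few rows, where `data.csv` stands in for your own file.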