After trying dozens of other frameworks, I've found that PySpark is the way to go.

  • A computer 💻

Install Miniconda3 (optional; any Python 3.8+ should be fine)

Download the appropriate version for your OS.

python -m webbrowser https://docs.conda.io/en/latest/miniconda.html

For example, on my Mac I would run:

#!/usr/bin/env bash
curl -O https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-MacOSX-x86_64.sh
bash ./Miniconda3-py39_4.9.2-MacOSX-x86_64.sh

Install Python Packages

# jupyterlab is listed explicitly because it provides the `jupyter lab` command
conda install --yes pyspark jupyter jupyterlab openjdk

Launch Jupyter Lab

jupyter lab

Create a new Spark session and load your data

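from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # start (or reuse) a local Spark session
# header=True takes column names from the first row; values load as strings unless inferSchema=True
df = spark.read.csv("data.csv", header=True)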

Originally published at kite.com.

Introduction to Pandas

So, what is Pandas, practically speaking? In short, it’s the major data analysis library for Python. For scientists, students, and professional developers alike, Pandas is a central reason to learn or work with Python rather than a statistics-specific language like R or a proprietary academic package like SPSS or Matlab. (Fun fact: Pandas is named after the term Panel Data, and was originally created for…

Developing a web application with Python can seem like a daunting task. At first glance, the language’s strengths appear to be geared more toward scripting than full-blown application development. Building a production-grade web application using only the Standard Library would be a grueling exercise. However, Python’s modular import system, pip, and a handful of external packages can make application development a much more palatable process, maybe even "fun" at times.

Configuring your environment

For Python purists, I highly recommend pipenv. For those looking for maximum flexibility and development platform independence, the only way to go is Docker. …

The easiest way to convert a PDF to EPUB is to use an application called Calibre.

If you want more control over the process, the following provides step by step instructions using a handful of command line tools.

UNIX command line tools

Homebrew can be used to install the above programs.

Some useful regex commands for cleaning up files converted from PDF to Markdown using:

  1. pdftohtml; then
  2. pandoc
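The two-step conversion named in the list above could be scripted roughly as follows. This is only a sketch: the original does not show which options were used, so the flags here (`-stdout`, `-noframes`, and `-i` for pdftohtml; `--from`, `--to`, and `--output` for pandoc), the function names, and the file names are all assumptions chosen for illustration.

```python
import subprocess


def build_commands(pdf_path, md_path):
    """Return the argv lists for the two steps: PDF -> HTML, then HTML -> Markdown."""
    # Assumed flags: write one frameless HTML document to stdout and skip images.
    to_html = ["pdftohtml", "-stdout", "-noframes", "-i", pdf_path]
    # Assumed flags: read HTML from stdin and write Markdown to md_path.
    to_markdown = ["pandoc", "--from", "html", "--to", "markdown", "--output", md_path]
    return to_html, to_markdown


def convert(pdf_path, md_path):
    """Run pdftohtml, then feed its HTML output to pandoc."""
    to_html, to_markdown = build_commands(pdf_path, md_path)
    html = subprocess.run(to_html, check=True, capture_output=True).stdout
    subprocess.run(to_markdown, input=html, check=True)


if __name__ == "__main__":
    # Hypothetical file names; both tools must already be installed and on PATH.
    convert("book.pdf", "book.md")
```

Keeping the command construction separate from execution makes it easy to print and inspect the exact commands before running them on a large file.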

The regex commands below execute as described when using find and replace in Sublime Text 3.

FIND: (^\*\*\s+)(CHAPTER\s+\d+ )(\s?\*?\*?)

REPLACE: ## \2

Find all instances of the word “CHAPTER” (case insensitive…
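For batch work outside the editor, the same chapter-heading substitution can be approximated with Python's `re` module. This is a sketch, not the author's workflow: the function name and sample text are made up for illustration, `re.MULTILINE` is added so that `^` matches at the start of every line, and matching is case-sensitive here unless `re.IGNORECASE` is also passed.

```python
import re

# Pattern and replacement copied from the FIND/REPLACE pair.
# re.MULTILINE lets ^ anchor at the start of each line, not just the whole text.
CHAPTER_PATTERN = re.compile(
    r"(^\*\*\s+)(CHAPTER\s+\d+ )(\s?\*?\*?)", flags=re.MULTILINE
)
REPLACEMENT = r"## \2"


def promote_chapter_headings(markdown_text):
    """Rewrite bold chapter markers such as '** CHAPTER 1 **' as level-2 headings."""
    return CHAPTER_PATTERN.sub(REPLACEMENT, markdown_text)


if __name__ == "__main__":
    # Hypothetical sample of pandoc output.
    sample = "** CHAPTER 1 **\nIt was a dark and stormy night.\n"
    print(promote_chapter_headings(sample))
```

Because the second capture group ends with a literal space, the rewritten heading keeps one trailing space, just as it would in the editor.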

Zachary Wilson

Software Engineer

