Phil's BigData Recipes


Most recent articles:


Apache Spark in Python and Scala (work in progress)


§ 0.1 Preamble: The bdrecipes codebase
§ 0.2 Preamble: Setting up Apache Spark

§1 Apache Spark on Docker



§9 A deep dive into the Catalyst Optimizer


§10.1 Query Plan Hermeneutics
§10.2 Query Plan Exercises




Miscellaneous

Spark memory compartments and formulas


4 BigData Riddles and profiling Spark / PySpark


Benchmarks, Graal, Apache Spark, JMH, Scala


Data structure zoo: Runtime complexities of data structures in Scala, Java, Python


Spark APIs: DataFrame, Dataset, RDD examples for Apache Spark in Java, Scala, Python