Quiz: Spark basics
Below are a few questions that should come in handy on a first pass:
- Spark architecture? Cluster types, modes, and spot instances? Mounting storage? Job vs stage vs task?
- Actions vs transformations? Directed acyclic graphs (DAGs)? Lazy evaluation?
- RDD vs DataFrame vs Dataset? Parquet file vs Avro file?
- StructType vs StructField? Delta Lake? Time travel?
- Syntax errors vs exceptions?
- startsWith() vs endsWith()? withColumn vs select vs withColumnRenamed? map vs flatMap? Why use ‘literals’?
- .collect()? show vs display? How to display the full values of a column?
- Create an RDD from a list? Create an RDD from a text file? current_date vs current_timestamp?
- Reading and writing a file? Create an empty DataFrame?
- Convert a DataFrame to an RDD and an RDD to a DataFrame?
- Broadcast variables, explode, coalesce, and repartition?
- Merge or union two DataFrames with different numbers of columns?
- Iterate through each row of a DataFrame in PySpark?
- How to handle NULL values?
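
As a warm-up for the map vs flatMap question above, the difference can be sketched in plain Python, with no cluster needed. This is only an analogy and not Spark code: `map_like` and `flat_map_like` are hypothetical helper names made up for illustration, standing in for what `rdd.map(f)` and `rdd.flatMap(f)` do to the elements of an RDD.

```python
from itertools import chain


def map_like(func, items):
    """Return one output element per input element (analogous to rdd.map)."""
    return [func(item) for item in items]


def flat_map_like(func, items):
    """Apply func, then flatten the results one level (analogous to rdd.flatMap)."""
    return list(chain.from_iterable(func(item) for item in items))


# Hypothetical sample data: two lines of text.
lines = ["hello spark", "lazy evaluation"]

mapped = map_like(str.split, lines)          # one list of words per line
flattened = flat_map_like(str.split, lines)  # a single flat list of words

print(mapped)     # [['hello', 'spark'], ['lazy', 'evaluation']]
print(flattened)  # ['hello', 'spark', 'lazy', 'evaluation']
```

The mapped result keeps one entry per input line, while the flattened result has one entry per word, which is why flatMap is the usual choice for splitting lines into words.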