Batch vs Incremental
The idea behind incremental processing is simple: it brings the semantics of stream processing to batch pipelines by processing only the data that is new since the previous run and then incrementally updating the existing results. Because each run touches far less data, pipelines finish much sooner, which cuts cost, and they can be scheduled much more frequently, which improves data freshness.
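To make the contrast concrete, here is a minimal sketch in plain Python, not tied to any particular engine. The names `Event`, `IncrementalState`, `full_batch`, and `incremental_run` are hypothetical, and the sketch assumes every record carries a monotonically increasing commit time that can serve as a checkpoint.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    commit_time: int  # assumed: monotonically increasing commit timestamp
    key: str
    amount: float


@dataclass
class IncrementalState:
    checkpoint: int = 0  # highest commit_time already processed
    totals: dict = field(default_factory=dict)  # running result per key


def full_batch(events):
    """Batch style: recompute every total from scratch over the whole input."""
    totals = {}
    for e in events:
        totals[e.key] = totals.get(e.key, 0.0) + e.amount
    return totals


def incremental_run(events, state):
    """Incremental style: fold in only events newer than the checkpoint.

    Returns the number of events processed in this run.
    """
    # A list filter stands in for the storage layer here; in practice the
    # table format would ideally hand back only the changed records.
    new_events = [e for e in events if e.commit_time > state.checkpoint]
    for e in new_events:
        state.totals[e.key] = state.totals.get(e.key, 0.0) + e.amount
    if new_events:
        state.checkpoint = max(e.commit_time for e in new_events)
    return len(new_events)


# Two runs over a growing table: the second run touches only the new event.
table = [Event(1, "a", 10.0), Event(2, "b", 5.0)]
state = IncrementalState()
first = incremental_run(table, state)
table.append(Event(3, "a", 2.5))
second = incremental_run(table, state)
```

After both runs, `state.totals` matches what `full_batch(table)` would compute, but the second run processed one event instead of three. This sketch only handles appends to an additive aggregate; updates, deletes, and late-arriving data need more machinery than shown here.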
👓 Case Study: Setting Uber’s Transactional Data Lake in Motion with Incremental ETL Using Apache Hudi