(1/2) As promised, @RTeveth and I just published the second #blog post in our series about @ApacheSpark Dynamic Partition Inserts, based on our production experience at @nielsen!
In part 2, we dive deep into how Dynamic Partition Inserts work, the different S3 connectors used...
(1/2) @RTeveth and I just published a post (the first part in a series) about @ApacheSpark Dynamic Partition Inserts, and we think you'll find it interesting:
https://t.co/zL2lNPdZn2
Our #bigdata group at @nielsen uses #ApacheSpark to process tens of TBs of raw data...
(2/3) Luckily, my colleague @RTeveth (@nielsen) is contributing his work to integrate the aforementioned operator into #ApacheAirflow, and with the generous help of some of its committers (e.g. @CzerwonyElmo and @kaxil), we’re hoping to get it merged before #Airflow 2.0!
(1/3) If you want to run @ApacheSpark on @kubernetesio, you have a few alternatives, e.g. Spark-on-K8s-operator by @GCPcloud (https://t.co/Z7ZriCgbNy).
But what if you want to schedule your jobs using @ApacheAirflow?
That narrows down your options.