DatabricksSpark @DatabricksSpark - Twitter Profile

DatabricksSpark

@DatabricksSpark

10 days ago

Why should we avoid using Interactive Clusters for production workloads?

Interactive Clusters seem convenient, but convenience can become a risk. Because Interactive Clusters are long-running and often shared:

• library changes can affect other workloads
• cluster state may persist between runs
• debugging code and production code can coexist
• resource contention becomes common

That’s why most prod ETL pipelines move toward:

→ Job Clusters
→ Serverless Jobs
→ isolated execution environments

Data engineering is not only about making pipelines work. It’s about making them predictable, reproducible, and stable at scale.

#Databricks #ApacheSpark #DataEngineering #Lakehouse #DatabricksInterviewPrep

DatabricksSpark retweeted

Apache Spark

@ApacheSpark

13 days ago

Apache Spark 4.1 is out today. 🚀

AI data agents are now common in data engineering. They're also a real risk in production: tool sprawl and the glue code required to run real pipelines create a huge surface area for silent errors. The cost is wasted time and wasted compute on jobs you only notice are broken three hours into a four-hour run.

Three architectural changes in 4.1 shrink that surface area.

1️⃣ Spark Declarative Pipelines (SDP)
2️⃣ Real-Time Mode
3️⃣ Spark Connect + Project Feather

Three architectural changes. One platform shape. Fewer surfaces for the agent to drift on. Less technical debt as you ship.

👉 Get started: https://t.co/dnakBRz8IE

#ApacheSpark #DataEngineering #OSS #AIagents

DatabricksSpark

@DatabricksSpark

14 days ago

Can we use Serverless and Classic compute together in Databricks?

For a single task, you usually pick one. But in a multi-task Databricks Job, yes, each task can use different compute.

Example:
Bronze load takes 2 mins → use Serverless
Heavy transformation takes longer → use Classic job compute for more control

That flexibility is actually useful when designing real pipelines.

#Databricks #DataEngineering #AzureDatabricks #Lakehouse
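
A rough sketch of what that mixed-compute job could look like as a Jobs API-style payload, written as a plain Python dict. The job name, notebook paths, runtime version, node type, and worker count are all placeholders, and the assumption that a notebook task with no cluster fields runs on serverless only holds in a workspace where serverless jobs are enabled. The helper `compute_kind` is hypothetical and only classifies tasks locally.

```python
# Hypothetical job definition: every name, path, and size below is a placeholder.
job_spec = {
    "name": "bronze_to_silver_demo",
    "job_clusters": [
        {
            "job_cluster_key": "heavy_classic",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # placeholder runtime
                "node_type_id": "Standard_D8ds_v5",   # placeholder VM size
                "num_workers": 4,
            },
        }
    ],
    "tasks": [
        {
            # No cluster fields here: assumed to run on serverless compute
            # in a workspace where serverless jobs are enabled.
            "task_key": "bronze_load",
            "notebook_task": {"notebook_path": "/Workspace/demo/bronze_load"},
        },
        {
            # Points at the classic job cluster defined above.
            "task_key": "heavy_transform",
            "depends_on": [{"task_key": "bronze_load"}],
            "job_cluster_key": "heavy_classic",
            "notebook_task": {"notebook_path": "/Workspace/demo/heavy_transform"},
        },
    ],
}

# Task-level fields that pin a task to classic compute.
CLASSIC_FIELDS = ("job_cluster_key", "new_cluster", "existing_cluster_id")


def compute_kind(task: dict) -> str:
    """Return 'classic' if the task names a cluster, otherwise 'serverless'."""
    return "classic" if any(field in task for field in CLASSIC_FIELDS) else "serverless"


def compute_plan(spec: dict) -> dict:
    """Map each task_key to the kind of compute it would use."""
    return {task["task_key"]: compute_kind(task) for task in spec["tasks"]}


if __name__ == "__main__":
    print(compute_plan(job_spec))
```

Printing the plan before deploying is a cheap way to confirm which tasks pay for classic cluster startup and which do not.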

DatabricksSpark

@DatabricksSpark

14 days ago

Serverless is great when I want Databricks to handle the compute side and let me focus more on the pipeline logic. Classic / job compute still makes sense when I need more control over cluster config, libraries, runtime, sizing, or cost behavior. Both have their place. The key is not to treat one as a replacement for everything. #Databricks #DataEngineering #PySpark #Lakehouse

DatabricksSpark

@DatabricksSpark

14 days ago

External Locations in Unity Catalog made more sense to me when I looked at them from the Azure side. It is not about giving every Databricks user an ADLS key. You create a Storage Credential using an Azure Managed Identity, map it to an ADLS path through an External Location, and then control access through Unity Catalog permissions. Azure RBAC controls the identity. Unity Catalog controls the user access. That separation is what makes it cleaner. #Databricks #UnityCatalog #AzureDatabricks #DataGovernance #Lakehouse
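
To make that separation concrete, here is a minimal sketch that builds the Unity Catalog SQL for the last two steps. It assumes the Storage Credential (backed by the Azure Managed Identity) already exists; the location name, storage account, credential name, and group name in the usage example are made up. In a Databricks notebook the resulting strings would typically be passed to `spark.sql`.

```python
def external_location_sql(name: str, url: str, credential: str, principal: str) -> list[str]:
    """Build SQL that maps a storage path to an External Location and grants read access.

    Assumes the storage credential already exists in the metastore.
    """
    return [
        # Map the ADLS path to a governed External Location.
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' WITH (STORAGE CREDENTIAL `{credential}`)",
        # User access is controlled in Unity Catalog, not with storage keys.
        f"GRANT READ FILES ON EXTERNAL LOCATION `{name}` TO `{principal}`",
    ]


if __name__ == "__main__":
    statements = external_location_sql(
        name="raw_landing",  # hypothetical location name
        url="abfss://raw@examplestorage.dfs.core.windows.net/landing",  # hypothetical path
        credential="mi_credential",  # hypothetical credential name
        principal="data_engineers",  # hypothetical group
    )
    for statement in statements:
        print(statement)
```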

DatabricksSpark

@DatabricksSpark

15 days ago

Unity Catalog is often explained as a security layer, but I think that undersells it. For me, the bigger value is how it brings structure to the whole Databricks environment. Data organization, permissions, lineage, auditing, storage governance, and cross-workspace consistency all start coming together in one place. That is when Databricks starts feeling much easier to manage at scale. #Databricks #UnityCatalog #DataGovernance #Lakehouse

DatabricksSpark

@DatabricksSpark

16 days ago

Volumes in Unity Catalog helped me understand that not everything in Databricks has to be a table. Sometimes you need governed access to files:

• CSV
• JSON
• Images
• ML artifacts
• Config files
• Raw landing data

That’s where Volumes fit nicely.

#Databricks #UnityCatalog #DataEngineering #PySpark
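
Files in a Volume are addressed by a path of the form `/Volumes/<catalog>/<schema>/<volume>/...`. A small sketch of a path helper follows; the catalog, schema, volume, and file names in the usage example are hypothetical.

```python
from pathlib import PurePosixPath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build the path of a file inside a Unity Catalog Volume."""
    return str(PurePosixPath("/Volumes", catalog, schema, volume, *parts))


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(volume_path("main", "landing", "raw_files", "orders", "2024-01.csv"))
```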

DatabricksSpark

@DatabricksSpark

16 days ago

External Locations in Unity Catalog are underrated. They make the connection between cloud storage and Databricks much more controlled. You are not just pointing to an S3/ADLS/GCS path randomly. You define who can access which storage location and under what governance boundary. #Databricks #UnityCatalog #DataLakehouse

DatabricksSpark

@DatabricksSpark

17 days ago

One thing I like about Unity Catalog is that access control becomes much easier to reason about. Instead of managing permissions separately across workspaces, storage paths, and random tables, you can centralize governance at the catalog/schema/table level. #Databricks #UnityCatalog #DataGovernance #BigData
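
As an illustration of governance at the catalog/schema/table level, the sketch below generates the three grants a group would typically need to read a single table. The object and group names in the usage example are invented, and the helper `read_grants` is hypothetical.

```python
def read_grants(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """Build the grants needed to read one table: one per level of the hierarchy."""
    return [
        f"GRANT USE CATALOG ON CATALOG `{catalog}` TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA `{catalog}`.`{schema}` TO `{principal}`",
        f"GRANT SELECT ON TABLE `{catalog}`.`{schema}`.`{table}` TO `{principal}`",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for statement in read_grants("main", "sales", "orders", "analysts"):
        print(statement)
```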

DatabricksSpark

@DatabricksSpark

17 days ago

Unity Catalog finally made Databricks feel like a proper governed data platform to me. Earlier, it was easy to create tables, jobs, notebooks, and access patterns everywhere. But UC forces you to think clearly:

Catalog → Schema → Table / View / Volume

That structure is how control and governance become manageable.

#Databricks #UnityCatalog #DataEngineering #Lakehouse

DatabricksSpark

@DatabricksSpark

18 days ago

One big mental shift with Databricks: Where does the hardware actually live? In a Hybrid setup, your VMs and resources sit under your own Azure/AWS/GCP subscription. But with Serverless, that infrastructure lives in the Databricks subscription instead. It’s the difference between managing the plumbing yourself vs. just turning on the tap. #Databricks #Azure #DataEngineering #CloudComputing

DatabricksSpark

@DatabricksSpark

18 days ago

Nothing kills a budget faster than "mystery" cloud costs because someone forgot to label their resources. Stop manually policing cluster usage. Use Azure Policy to enforce tagging: if the ProjectID and Environment tags aren't there, the cluster simply doesn't start. Guardrails always beat "asking nicely" when the bill comes due. #Azure #CloudGovernance #DataEngineering #FinOps #Databricks
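
The guardrail logic itself is simple. The sketch below is not Azure Policy; it is a plain Python illustration of the same check (deny when a required tag is missing or empty) that could run in a pre-deployment script. The function names are hypothetical.

```python
REQUIRED_TAGS = ("ProjectID", "Environment")


def missing_tags(tags: dict, required: tuple = REQUIRED_TAGS) -> list[str]:
    """Return the required tag names that are absent or blank."""
    return [key for key in required if not str(tags.get(key) or "").strip()]


def may_start(tags: dict) -> bool:
    """Allow the cluster to start only when every required tag is present."""
    return not missing_tags(tags)


if __name__ == "__main__":
    print(missing_tags({"ProjectID": "p-123"}))
    print(may_start({"ProjectID": "p-123", "Environment": "dev"}))
```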

DatabricksSpark

@DatabricksSpark

19 days ago

Moving a production pipeline to Databricks Serverless is a total shift for CI/CD. Honestly, it's such a relief to stop obsessing over whether the cluster config is "perfect" and just focus on the logic instead. The only caution: Serverless removes infra friction, not cost discipline. You still need to know your processes and the resources they can consume, and back that up with strong monitoring. We’re finally getting close to that "Write Code, Run Data" dream at scale. #PySpark #SoftwareEngineering #DataEngineering #Databricks

DatabricksSpark

@DatabricksSpark

19 days ago

The biggest mental shift with Databricks Serverless is accepting that the compute plane no longer lives inside your Azure subscription. At first, it feels strange not seeing those VMs in the Azure Portal. But not having to troubleshoot “Subscription quota exceeded” errors anymore is a massive productivity win. #Databricks #DataOps #BigData

DatabricksSpark

@DatabricksSpark

19 days ago

A slightly controversial opinion: Serverless isn’t just about speed; it’s a total shift in unit economics. For bursty, 2-minute ETL jobs, paying the DBU premium can actually be cheaper than keeping an idle Classic cluster alive or paying for the 5-minute startup time of a cold VM. In the end, it is all about the "Total Cost of Ownership." #DataStrategy #TechStack #Databricks
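
A back-of-the-envelope version of that argument follows. The hourly rates are invented for illustration and are not real Databricks or Azure prices; the model also assumes, as the tweet does, that the classic run pays for its startup minutes.

```python
def run_cost(run_minutes: float, hourly_rate: float, startup_minutes: float = 0.0) -> float:
    """Cost of one run when both startup and run time are billed at hourly_rate."""
    return (run_minutes + startup_minutes) / 60.0 * hourly_rate


# Hypothetical all-in rates per hour (compute plus DBUs), for illustration only.
SERVERLESS_RATE = 3.00    # premium rate, no startup wait assumed
CLASSIC_RATE = 1.50       # cheaper rate, but a cold start is paid for

serverless_cost = run_cost(run_minutes=2, hourly_rate=SERVERLESS_RATE)
classic_cold_cost = run_cost(run_minutes=2, hourly_rate=CLASSIC_RATE, startup_minutes=5)

if __name__ == "__main__":
    print(f"serverless:   {serverless_cost:.3f}")
    print(f"classic cold: {classic_cold_cost:.3f}")
```

With these made-up numbers the short job is cheaper on serverless even at twice the hourly rate. The advantage shrinks as the run gets longer, because the fixed startup overhead is spread over more minutes.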

DatabricksSpark

@DatabricksSpark

19 days ago

Autoscaling on Hybrid clusters always felt a bit reactive: you’re constantly waiting for Azure to provision hardware during peak loads. Serverless compute hits different. The "warm pool" architecture means you actually get the elasticity that cloud marketing has been promising for years. #Scaling #FinOps #Databricks

DatabricksSpark

@DatabricksSpark

19 days ago

In Azure Databricks, you can choose where your data sits. The "Hybrid" model lets you use your own private network (VNet) for maximum security. The catch? You are now the "landlord." You have to manage the security rules (NSGs) and ensure you don’t run out of IP addresses. Great for security, but heavy on manual maintenance. #AzureDatabricks #DataEngineering #CloudSecurity
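
The "running out of IP addresses" part can be estimated up front. The sketch below uses the fact that Azure reserves 5 addresses in every subnet, and assumes each cluster node consumes one address in each of the two subnets used for VNet injection. The CIDR ranges are examples, and `max_nodes` is a hypothetical helper.

```python
import ipaddress

# Azure reserves 5 IP addresses in every subnet.
AZURE_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Number of addresses left for workloads in an Azure subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED_PER_SUBNET


def max_nodes(host_subnet: str, container_subnet: str) -> int:
    """Upper bound on cluster nodes, assuming one IP per node in each subnet."""
    return min(usable_ips(host_subnet), usable_ips(container_subnet))


if __name__ == "__main__":
    # Example ranges only; substitute the subnets of your own VNet.
    print(max_nodes("10.0.0.0/26", "10.0.0.64/26"))
```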

DatabricksSpark

@DatabricksSpark

20 days ago

Why pay for a platform? Because of Photon Engine. Running Spark tasks up to 20x faster because of a C++ vectorized engine isn't just a "nice to have"; it’s a massive cost saver for enterprise-scale workloads. #PySpark #TechStack #Databricks

DatabricksSpark

@DatabricksSpark

20 days ago

Why Databricks? There comes a point in every data project where manual tuning doesn't scale. If you're tired of OOM errors and wasted spend on idle nodes, it’s time. Features like Serverless and Autoscaling make sure you only pay for what you actually compute. Efficient scaling is the goal. #FinOps #DataPipelines #Databricks

DatabricksSpark

@DatabricksSpark

20 days ago

Why Databricks? The "plumbing" of Big Data is the biggest time-sink. Setting up Spark on-prem or on manually managed VMs is a headache. Moving to a managed SaaS like Databricks means spinning up clusters in seconds across AWS, Azure, or GCP. #CloudComputing #DataArchitecture

DatabricksSpark

@DatabricksSpark

20 days ago

Why Databricks? Stop thinking of Databricks as just "Cloud Spark." It’s the difference between buying an engine and buying a car. With the optimized runtime and Delta Lake, you’re getting a Lakehouse architecture that handles ACID transactions on top of your raw data. #BigData #Lakehouse #Databricks
