CodeBarbarian @SoftwareSiege - Twitter Profile

CodeBarbarian

@SoftwareSiege

2 months ago

What is the difference between a quiet NaN and a signaling NaN?

CodeBarbarian

@SoftwareSiege

2 months ago

Ans: false

- instanceof returns false if the LHS operand is null, regardless of the type on the RHS
- It doesn't throw an NPE (null-safe by design) or cause a CTE; it simply evaluates to false because a null ref isn't an instance of any class
- The only time instanceof causes a CTE is if the types are inconvertible (instanceof compiles only if a cast between the 2 types is theoretically possible; the compiler can apply this check only when it has enough static type info)
E.g. "10" instanceof Integer --> CTE

---

*Static analysis tools (IDEs, linters) may flag a null check before instanceof as redundant, but that warning is NOT part of the JLS
E.g. if (obj != null && obj instanceof String) --> the != null check is functionally redundant

In modern #Java (16+), Pattern Matching capitalizes on this exact logic:
E.g. if (obj instanceof String s) --> 's' is bound only if obj is a non-null String, already cast

---

Reification:
*Before Java 16, instanceof required a reifiable type on the RHS; since Java 16, a parameterized type is also accepted when the cast from the LHS type is statically safe (not unchecked)
- Reifiable type = type info is fully available at RT
E.g. String, String[], int[], List<?>, raw List
--> With an Object or raw-typed LHS, instanceof cannot test a specific generic type because of Type Erasure
E.g. Non-Reifiable (CTE when list is declared as Object):
- list instanceof List<String>
- list instanceof List<? super String>
- list instanceof List<? extends String>
--> use list instanceof List<?> (unbounded wildcard)
⚠️ avoid raw types like list instanceof List

---

Workaround for Generics (instanceof T):
- Since obj instanceof T is a CTE (T is erased at RT), use a Class Token
- Store the class: private final Class<T> type;
- Check dynamically: type.isInstance(obj);

Example:
public class Validator<T> {
    private final Class<T> type; // capture class at construction

    public Validator(Class<T> type) {
        this.type = type;
    }

    public boolean isValid(Object obj) {
        // dynamic equivalent of instanceof
        return type.isInstance(obj);
    }
}

Validator<String> val = new Validator<>(String.class);
System.out.println(val.isValid("Hello")); // true
--> Functional equivalent of instanceof for generic types
*isInstance(null) also returns false
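The null-handling rules in this answer can be checked with a small runnable sketch. It assumes Java 16+ for the pattern-matching form; the class and method names (`InstanceofNullDemo`, `describe`, `isInstance`) are made up for illustration and are not from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: names are illustrative only.
final class InstanceofNullDemo {

    // Pattern matching (Java 16+): 's' is bound only when obj is a non-null String.
    static String describe(Object obj) {
        if (obj instanceof String s) {
            return "String of length " + s.length();
        }
        return "not a String";
    }

    // Class-token check: the dynamic counterpart of 'obj instanceof T'.
    static <T> boolean isInstance(Class<T> type, Object obj) {
        return type.isInstance(obj);
    }

    public static void main(String[] args) {
        String nullText = null;
        System.out.println(nullText instanceof String);     // false
        System.out.println(describe(null));                  // not a String
        System.out.println(describe("Hello"));               // String of length 5

        Object list = new ArrayList<String>();
        System.out.println(list instanceof List<?>);         // true (unbounded wildcard is reifiable)
        System.out.println(isInstance(String.class, null));  // false
    }
}
```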

CodeBarbarian

@SoftwareSiege

2 months ago

What will be the output of the following #Java code?
String nullText = null;
System.out.println(nullText instanceof String);

CodeBarbarian

@SoftwareSiege

2 months ago

What will be the output of the following #Java code?
var v1 = Integer.toString(-1, 2);
var v2 = Integer.toBinaryString(-1);
System.out.println(v1.equals(v2));
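For readers who want to try the quiz, here is a minimal sketch of the two calls side by side. The wrapper class and method names are assumptions added for illustration. `Integer.toString(value, 2)` formats a signed value (a minus sign followed by the magnitude), while `Integer.toBinaryString(value)` formats the 32-bit two's-complement pattern as an unsigned number.

```java
// Hypothetical demo class: names are illustrative only.
final class BinaryStringDemo {

    // Signed base-2 text: a leading '-' followed by the magnitude.
    static String signed(int value) {
        return Integer.toString(value, 2);
    }

    // Unsigned view of the two's-complement bits, without leading zeros.
    static String unsigned(int value) {
        return Integer.toBinaryString(value);
    }

    public static void main(String[] args) {
        String v1 = signed(-1);   // "-1"
        String v2 = unsigned(-1); // 32 ones
        System.out.println(v1);
        System.out.println(v2);
        System.out.println(v1.equals(v2)); // false
    }
}
```

For non-negative inputs the two methods agree, so the difference only shows up for negative values.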

CodeBarbarian

@SoftwareSiege

2 months ago

@javarevisited Blue-Green Deployment:
- Two full ENVs: Blue=old, Green=new
- Flip 100% traffic at once
- Fast rollback, zero downtime
- Not gradual https://t.co/HyKhMIc1B5

SoftwareSiege retweeted

CodeBarbarian

@SoftwareSiege

2 months ago

Remembered preference = state placed at the correct persistence layer
*Client-side = fast, device-scoped persistence (no NW)
*Server-side = durable, user-scoped persistence (cross-device)
*SSR Boundary = consistency between server render & client state
*Zero-effort/Native = OS-driven default, overridden by app logic
*Choose storage based on lifespan (how long it must survive) + scope (who/where should see it)

---

Client-side (device-scoped persistence):
*Memory (JS state): short-lived (per render/tab)
*sessionStorage: survives reloads, dies with tab
*localStorage: survives browser restarts
--> Fast, no NW, limited to same device

*Cookie (hybrid layer)
--> sent with every HTTP request
--> enables SSR to render correct theme immediately (avoids "flash of wrong theme")

*Multi-tab sync (Storage Events): window.addEventListener('storage', ...)
--> keeps multiple tabs consistent on same device

***

Server-side (user-scoped persistence):
Stored in DB against user profile
--> survives logout, device change
--> enables consistent UX across devices

*optimistic sync = write to localStorage instantly (snappy UI) + async sync to server (eventual consistency)

***

SSR/Client Boundary (Consistency layer):
Hydration mismatch:
- Server renders theme (from cookie)
- Client overrides (from localStorage)
--> mismatch (flicker/inconsistent UI)
Fix: resolve theme before first paint (inline script in <head>) OR defer override until mount (useEffect)

***

Zero-effort/Native Way:
prefers-color-scheme (OS-level preference)
--> browser reads OS theme, no app storage needed

*override precedence (source of truth):
User Override (localStorage) > Server Pref (DB) > System Pref (OS / prefers-color-scheme)
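The override precedence at the end of this thread can be expressed as a simple fallback chain. The sketch below is written in Java to match the rest of the feed, not in the browser-side JavaScript the thread describes. `ThemeResolver`, `Theme` and `resolve` are hypothetical names, and the three inputs stand in for the localStorage value, the DB value and the OS preference.

```java
import java.util.Optional;

// Hypothetical resolver: models "user override > server preference > system preference".
final class ThemeResolver {

    enum Theme { LIGHT, DARK }

    // Returns the first preference that is present, falling back to the system default.
    static Theme resolve(Optional<Theme> userOverride,
                         Optional<Theme> serverPref,
                         Theme systemPref) {
        return userOverride
                .or(() -> serverPref)   // Optional.or requires Java 9+
                .orElse(systemPref);
    }

    public static void main(String[] args) {
        System.out.println(resolve(Optional.of(Theme.DARK), Optional.of(Theme.LIGHT), Theme.LIGHT)); // DARK
        System.out.println(resolve(Optional.empty(), Optional.of(Theme.DARK), Theme.LIGHT));         // DARK
        System.out.println(resolve(Optional.empty(), Optional.empty(), Theme.LIGHT));                // LIGHT
    }
}
```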

SoftwareSiege retweeted

CodeBarbarian

@SoftwareSiege

3 months ago

It's all about layered, memory-first gatekeeping + smart request handling

***

*Memory-first layers = latency hiding
- L0: Reserved Word Filter = static list/regex, O(1) local check
E.g. "admin", "support"
- L1: Bloom Filter = probabilistic, in-memory, tiny footprint (~1-2GB for billions of usernames)
- L2: Redis = exact hot handles + Negative Cache (taken names cached with TTL), RAM lookup < 1ms
- L3: DB = authoritative truth, sharded + indexed
--> Most requests never reach DB
--> scales to hundreds of millions of users

*Debouncing prevents unnecessary pipeline execution

Each layer adds a tradeoff:
- Reserved Words = limited, static
- Bloom = tiny false positives, requires counting/Cuckoo filters for deletions
- Redis = memory cost + cache TTL tuning
- DB = authoritative but slow

***

Handle check = conveyor belt with VIP gates:
- Reserved Word Filter = forbidden? Stop!
- Bloom filter = Definitely free? Go!
- Redis = Hot VIP names, check here
- DB = Final authority
--> By the time "Taken" is displayed, the request has passed through up to 4 optimized layers in ms

---

Username Lookup = Multi-Layer Pipeline
*L0 – Reserved Word Filter (Static Gatekeeper):
- Purely local O(1) string check
- Blocks reserved/profane handles immediately
--> Avoids NW / memory cost

***

*L1 - Bloom Filter (Probabilistic Gatekeeper):
*Answers "Definitely No"
--> stop immediately

*Answers "Maybe Yes"
--> go to next layer

⚡ Avoids DB reads for ~99% of free usernames

⚠️ Deletions/Staleness: Standard Bloom Filters cannot remove items
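--> Counting Bloom Filters replace each bit with a small counter that is incremented on add & decremented on remove
--> Cuckoo Filters store short fingerprints in bucket slots; existing fingerprints are relocated to make room on insert, and a delete simply removes the matching fingerprint
--> both have a cost: counters take several bits each, so a counting filter needs a multiple of the plain filter's memory; cuckoo inserts can fail once the table is nearly full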
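The "Definitely No" / "Maybe Yes" behaviour described for L1 can be sketched with a plain (non-counting) Bloom filter. This is a minimal illustration, not the thread author's implementation: the class name, the bit-array size, the number of hashes and the double-hashing scheme are all assumptions, and a production filter would use stronger hash functions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Minimal Bloom filter sketch; sizes and hash functions are illustrative assumptions.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Marks a handle as taken by setting every bit it hashes to.
    void add(String handle) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(handle, i));
        }
    }

    // false = definitely never added; true = maybe added (false positives are possible).
    boolean mightContain(String handle) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(handle, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: combines two base hashes to derive the i-th bit position.
    private int index(String handle, int i) {
        byte[] data = handle.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a, used here only as a second hash.
    private static int fnv1a(byte[] data) {
        int hash = 0x811c9dc5;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 16777619;
        }
        return hash;
    }

    public static void main(String[] args) {
        SimpleBloomFilter taken = new SimpleBloomFilter(1 << 16, 3);
        System.out.println(taken.mightContain("alice")); // false: nothing added yet
        taken.add("alice");
        System.out.println(taken.mightContain("alice")); // true: added items are never missed
    }
}
```

There is no `remove` method, which is the limitation the thread points out: clearing a bit could also erase evidence of other handles that share it.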

***

*L2 - Redis/In-Memory Cache:
Stores exact hot handles + Negative Cache for taken names (short TTL)
--> RAM lookup < 1ms
--> Repeated "taken" queries are answered from cache & skip the DB until the TTL expires

⚠️ Cache hit ratio matters: keeping hot handles in RAM = massive performance win

***

*Smart Request Handling/Debouncing:
Client waits 300-500ms after last keystroke before sending request
--> Reduces redundant queries/char
--> smooth UX + lower system load

***

*L3 - Distributed DB (Source of Truth):
- Only accessed if cache cannot confirm availability
- Sharded + indexed
--> fast O(log N) lookup
--> Last-resort guarantee; optimized pipeline rarely touches DB!

⚠️ Write-Through Consistency:
- When a handle is claimed, the Bloom Filter and Redis must be updated as part of the same write path
--> Bloom Filter: set the handle's bits to 1; deletions require Counting/Cuckoo filters OR periodic full rebuilds
--> Redis: write the new "taken" entry (or invalidate any stale cached entry) to maintain correctness
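The four layers can be put together in one short-circuiting check. The sketch below uses in-memory stand-ins for every layer, all of them assumptions: a `Set` for the reserved words, a `Predicate` for the Bloom filter, a `Set` for the Redis cache of taken names (no TTL handling), and a `Set` for the database. The names `HandleChecker`, `Result` and `check` are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical pipeline: each layer is an in-memory stand-in for the real component.
final class HandleChecker {

    enum Result { RESERVED, AVAILABLE, TAKEN }

    private final Set<String> reservedWords;           // L0: static list
    private final Predicate<String> bloomMightContain; // L1: Bloom filter stand-in
    private final Set<String> takenCache;              // L2: cache of known-taken handles
    private final Set<String> database;                // L3: authoritative store stand-in

    HandleChecker(Set<String> reservedWords, Predicate<String> bloomMightContain,
                  Set<String> takenCache, Set<String> database) {
        this.reservedWords = reservedWords;
        this.bloomMightContain = bloomMightContain;
        this.takenCache = takenCache;
        this.database = database;
    }

    Result check(String rawHandle) {
        String handle = rawHandle.toLowerCase(Locale.ROOT);
        if (reservedWords.contains(handle)) {
            return Result.RESERVED;            // L0: forbidden, stop
        }
        if (!bloomMightContain.test(handle)) {
            return Result.AVAILABLE;           // L1: definitely not taken
        }
        if (takenCache.contains(handle)) {
            return Result.TAKEN;               // L2: cached answer, DB is skipped
        }
        boolean taken = database.contains(handle); // L3: final authority
        if (taken) {
            takenCache.add(handle);            // remember the answer for next time
        }
        return taken ? Result.TAKEN : Result.AVAILABLE;
    }

    public static void main(String[] args) {
        Set<String> db = new HashSet<>(Set.of("alice"));
        // db::contains is an exact stand-in for the Bloom filter (no false positives).
        HandleChecker checker = new HandleChecker(Set.of("admin", "support"), db::contains, new HashSet<>(), db);
        System.out.println(checker.check("admin")); // RESERVED
        System.out.println(checker.check("alice")); // TAKEN
        System.out.println(checker.check("bob"));   // AVAILABLE
    }
}
```

Only taken names are cached in this sketch, so a handle reported as available is always re-checked against the later layers on the next request.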

https://t.co/MnO3SI7UHq

SoftwareSiege retweeted

CodeBarbarian

@SoftwareSiege

3 months ago

Because they solve different bottlenecks:
*Replication = same data, multiple copies
--> scales reads + availability (writes still hit primary)

*Sharding = horizontal partitioning across nodes (rows split across machines)
--> scales writes + data size (load distributed across shards)

---

*All sharding is horizontal partitioning, but NOT all horizontal partitioning is sharding

*Horizontal partitioning (Local single DB) = split rows across partitions (range/hash/list)
--> improves manageability + query performance, but NOT true horizontal scaling

*Sharding (Distributed) = horizontal partitioning across machines
--> enables real scalability (compute + storage)
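To make "rows split across machines" concrete, here is a minimal hash-based routing sketch. It is one possible strategy (range and directory-based routing are others). `ShardRouter`, `shardFor` and the shard names are hypothetical.

```java
import java.util.List;

// Hypothetical router: picks a shard by hashing the partition key.
final class ShardRouter {
    private final List<String> shards;

    ShardRouter(List<String> shards) {
        this.shards = List.copyOf(shards);
    }

    // The same key always maps to the same shard while the shard list is unchanged.
    String shardFor(String userId) {
        int index = Math.floorMod(userId.hashCode(), shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("shard-0", "shard-1", "shard-2"));
        System.out.println(router.shardFor("a")); // shard-1 ("a".hashCode() is 97, and 97 mod 3 is 1)
    }
}
```

Plain modulo routing remaps most keys when the number of shards changes, which is one reason resharding is hard in practice. Consistent hashing is a common way to reduce that movement.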

---

Vertical partitioning = split columns
E.g. user_core vs user_profile

Used when:
- Some columns are rarely accessed (cold data)
- Reduce row size / I/O cost
- Improve cache efficiency
- Isolate sensitive/large fields
E.g. BLOBs

---

⚠️ Replication is simple but limited; sharding scales but introduces system-level complexity:
*Cross-shard joins:
extremely expensive/avoided (data must be co-located/denormalized)

*Distributed transactions:
costly & complex
--> often replaced with eventual consistency (ACID to BASE - Basically Available, Soft state, Eventual consistency)

*Resharding complexity:
redistributing data when shards grow is operationally hard (far more complex than adding replicas)

SoftwareSiege retweeted

CodeBarbarian

@SoftwareSiege

3 months ago

⚠️ Why even use async if we're going to block immediately?
--> BUT it depends where/who blocks

🤔 Are we blocking at the right layer or too early?

*Async isn't about avoiding blocking - it's about deferring it (where we block + how long + what + how failures propagate)

*get() is fine at the right boundary, but if used inside flow, it collapses async into sync & kills concurrency

***

*Async = non-blocking composition + latency hiding
--> deferring the wait, not eliminating it

*get() = blocking sync point
--> decides where the wait happens

***

*It's not about syntax - it's about execution model + arch
- Async = order food --> keep working --> eat later
- get() at boundary = wait when food arrives 👍
- get() in middle = stand at door immediately 🤦‍♂️

---

CompletableFuture = Non-blocking pipelines
- supplyAsync() = offloads work
- thenApply()/thenCompose() = defines continuations (callbacks)
- get() = introduces blocking barrier

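Loosely similar to streams:
- Streams: terminal op triggers execution
- CompletableFuture: get() forces sync (unlike a stream, the work starts as soon as supplyAsync() is called; get() only waits for it)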

---

*get() must be at the end of the system (boundary)

Execution Models:
*Blocking Model (Servlet/ Thread-per-request):
HTTP response is sync
--> Thread must eventually produce result
--> Blocking is inevitable, BUT defer get() as late as possible (at boundary)

Example:
public String controller() throws Exception {
    return CompletableFuture.supplyAsync(...)
        .thenApply(...)
        .get();
}
--> OK: async/sync boundary (needed to return a value, no further async composition expected)
--> Blocking is intentional & contained
--> Gains: overlapping I/O, better utilization

*Modern blocking frameworks (e.g. Spring MVC) can return CompletableFuture, effectively behaving non-blocking at the boundary

***

*Non-Blocking Model (Reactive/Event-loop):
No thread-per-request (e.g. WebFlux, Netty)
--> Framework handles callbacks/continuation
--> Never call get()

Example:
public CompletableFuture<String> controller() {
    return CompletableFuture.supplyAsync(...)
        .thenApply(...);
}
--> No blocking
--> Framework: Registers callback + Releases thread + Writes response on completion

***

⚠️ *Blocking Too Early (Problem):
Example:
public String service() throws Exception {
    return CompletableFuture.supplyAsync(...)
        .thenApply(...)
        .get();
}
--> BAD: Async/Sync collapse, no latency hiding
--> Breaks composition + introduces hidden blocking upstream

What breaks:
*Concurrency collapse
--> caller thread blocks

*Thread inefficiency: Worker thread does work & caller thread waits
--> 2 threads for 1 task!

*Breaks composability: Cannot chain further async ops
--> Forces sync model upstream

*Thread pool starvation risk: Blocking inside async flows can exhaust thread pools
--> Leads to throughput collapse under load
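A runnable sketch of the "compose inside, block only at the boundary" idea: the service-level method returns a future and the only wait happens in `main`, which plays the role of the boundary. The two fetch methods, their return values and the class name are hypothetical stand-ins for real I/O calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: both lookups run concurrently and are combined without blocking.
final class BoundaryBlockingDemo {

    static CompletableFuture<String> fetchUser(ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> "user-42", pool);
    }

    static CompletableFuture<Integer> fetchOrderCount(ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> 3, pool);
    }

    // Service layer: returns a future, never calls get()/join().
    static CompletableFuture<String> summary(ExecutorService pool) {
        return fetchUser(pool).thenCombine(fetchOrderCount(pool),
                (user, count) -> user + " has " + count + " orders");
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Boundary: the single place where the caller waits.
            System.out.println(summary(pool).join()); // user-42 has 3 orders
        } finally {
            pool.shutdown();
        }
    }
}
```

Had `fetchUser` called `get()` internally, the second lookup could not start until the first finished, and callers could no longer chain onto the result.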

---

*Blocking systems: defer get() to boundary (or preferably return CompletableFuture)
*Non-blocking systems: never call .get() - return the future & let the framework handle it

---

⚠️ Timeouts: Never use indefinite .get()
--> prefer failing the pipeline itself via .orTimeout(...) (Java 9+) over just timing out the blocking thread .get(timeout) - propagates failure early & keeps the async flow consistent
E.g. future.orTimeout(5, TimeUnit.SECONDS) (better) vs future.get(5, TimeUnit.SECONDS)
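A small sketch of failing the pipeline itself with `orTimeout` (Java 9+) and recovering inside it. The helper name `callWithTimeout` and the "fallback"/"error" strings are assumptions chosen for the demo.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo: the timeout becomes a failure of the future, handled in the pipeline.
final class TimeoutDemo {

    static String callWithTimeout(CompletableFuture<String> future, long millis) {
        return future
                .orTimeout(millis, TimeUnit.MILLISECONDS)
                .exceptionally(ex -> {
                    // Later stages may see the failure wrapped in CompletionException.
                    Throwable cause = (ex instanceof CompletionException && ex.getCause() != null)
                            ? ex.getCause() : ex;
                    return cause instanceof TimeoutException ? "fallback" : "error";
                })
                .join(); // the wait is bounded by the timeout above
    }

    public static void main(String[] args) {
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();
        System.out.println(callWithTimeout(neverCompletes, 100));                          // fallback
        System.out.println(callWithTimeout(CompletableFuture.completedFuture("ok"), 100)); // ok
    }
}
```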

⚠️ Common Pool Trap:
Default ForkJoinPool.commonPool() is JVM-wide
--> blocking it can starve unrelated tasks (incl. parallel streams)
--> prefer custom executors (requires handling ThreadLocal context propagation explicitly)
E.g. CompletableFuture.supplyAsync(task, customExecutor);

*Exception Handling (Pipeline): Prefer handling errors inside the async pipeline (exceptionally, handle) OR use join() to avoid checked exception noise & keep composition clean
E.g. future.exceptionally(ex -> fallback);

⚠️ Sync Boundary Exception Trap: get() wraps failures in ExecutionException
--> when crossing async/sync boundary, always unwrap via e.getCause() to avoid masking real errors
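The unwrapping step can be sketched as follows. `UnwrapDemo` and `rootCauseMessage` are hypothetical names, and the demo uses the default pool only for brevity.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Hypothetical demo: get() reports failures as ExecutionException; the real error is its cause.
final class UnwrapDemo {

    static String rootCauseMessage(CompletableFuture<?> future) {
        try {
            future.get();
            return "no failure";
        } catch (ExecutionException e) {
            Throwable cause = e.getCause(); // the exception thrown inside the pipeline
            return cause.getClass().getSimpleName() + ": " + cause.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> failing = CompletableFuture.supplyAsync(() -> {
            throw new IllegalStateException("boom");
        });
        System.out.println(rootCauseMessage(failing)); // IllegalStateException: boom
    }
}
```

With `join()` the same failure arrives as an unchecked `CompletionException`, and `getCause()` again yields the original exception.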

SoftwareSiege retweeted

CodeBarbarian

@SoftwareSiege

3 months ago

401 Unauthorized = Who are you? Prove it (Unauthenticated)
--> Invitation to try again with credentials

403 Forbidden = I know you, but you're not allowed (Authenticated but forbidden)
--> A hard "No" based on policy

404 Not Found = I know you, but I'm not telling you if this exists (High-security masking/BOLA mitigation)
--> Blinding the attacker

---

401 Unauthorized (Unauthenticated/Authn Issue):
- Misnomer: "Unauthorized" in HTTP spec actually means Unauthenticated
- Server doesn't know who the client is OR credentials are invalid/expired
--> Client must authenticate (login, refresh token)
E.g. Accessing /user/profile without a valid token
- The HTTP spec requires a WWW-Authenticate header on a 401 response to indicate how to authenticate

---

403 Forbidden (Authz Issue):
- Server knows the client, but policy denies access (RBAC/ABAC)
--> Authn won't help; access is explicitly forbidden
- Some systems also use 403 for rate-limiting or IP blocks (WAF - Web App Firewall), although 429 Too Many Requests is the dedicated status for rate-limiting

Example:
Token decoded
-- User "Bob" authenticated
-- But policy says "Bob does not have scope: admin"

---

404 Not Found (High-Security Variant):
- Used to mask existence of resources & mitigate Broken Object Level Authz (BOLA) attacks
- Returning 403 could leak resource existence; returning 404 hides it
--> Prevents attackers from enumerating resource IDs/mapping DB
- Common in APIs handling sensitive objects (docs, user data, payment info)

Example:
User tries GET /api/documents/123
-- User is authenticated but does not own document 123
*403: "You are not allowed to access document 123" --> leaks existence
*404: "Resource not found" --> hides existence
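The decision order described above can be sketched as a single function. The boolean inputs and the `maskExistence` switch are hypothetical simplifications; a real API would derive them from its auth and policy layers.

```java
// Hypothetical decision helper: returns the HTTP status for a resource request.
final class AccessDecision {

    static int status(boolean authenticated, boolean resourceExists,
                      boolean ownsResource, boolean maskExistence) {
        if (!authenticated) {
            return 401; // unknown caller: ask for credentials
        }
        if (!resourceExists) {
            return 404; // nothing to protect
        }
        if (!ownsResource) {
            // 403 admits the resource exists; 404 hides that fact from the caller.
            return maskExistence ? 404 : 403;
        }
        return 200;
    }

    public static void main(String[] args) {
        System.out.println(status(false, true, false, true)); // 401
        System.out.println(status(true, true, false, false)); // 403
        System.out.println(status(true, true, false, true));  // 404
        System.out.println(status(true, true, true, true));   // 200
    }
}
```

With masking enabled, a missing document and someone else's document produce the same response, which is what blocks ID enumeration.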

SoftwareSiege retweeted

CodeBarbarian

@SoftwareSiege

3 months ago

IllegalStateException on 2nd forEach()

Streams = single-pass, lazy, memory-efficient pipelines
--> Any intermediate or terminal operation consumes the original stream

***

*Streams = single-pass pipelines
--> A terminal operation consumes the stream
--> Creating a derived stream (e.g. via filter, map, flatMap) also consumes the original stream (becomes upstream for the new pipeline)
--> The linkedOrConsumed flag + spliterator ensures single-use semantics
*Multi-pass traversal is not supported without materializing elements/recreating the stream
*Streams do not store data
--> They operate lazily, processing elements ONLY as they flow through the pipeline
*Terminal operations are mandatory to trigger computation
--> Without them, the stream pipeline does NOTHING!
*Design rationale: ensures memory efficiency + avoids hidden state + prevents accidental reprocessing of non-repeatable sources

*Stream = conveyor belt, Spliterator = motor that moves elements along
--> Once the motor runs or is linked to another belt, the original belt cannot be replayed

---

Streams vs Collections:
- Streams are not containers. They define a sequence of operations on elements
- Elements are processed once, triggered by a terminal operation

***

Lazy evaluation & efficiency:
- Intermediate operations (e.g. map, filter) do not execute immediately; they just define transformations
- Computation happens only when a terminal operation runs
--> Enables efficiency:
- On-demand processing: no unnecessary computation
- Short-circuiting: operations like anyMatch/allMatch/findFirst can stop processing as soon as result is determined

Example:
boolean hasEven = Stream.of(1, 3, 5, 6, 7).anyMatch(x -> x % 2 == 0);
--> stops at 6, does NOT process 7
--> If streams were eager, all elements would have to be processed regardless
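Laziness and short-circuiting can be observed directly by recording which elements a sequential pipeline actually visits. The class and method names below are assumptions for the demo; `peek` is used only to log visits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: records the elements that flow through a sequential pipeline.
final class LazyStreamDemo {

    // anyMatch stops at the first even number, so later elements are never visited.
    static List<Integer> visitedUntilFirstEven(List<Integer> input) {
        List<Integer> visited = new ArrayList<>();
        input.stream().peek(visited::add).anyMatch(x -> x % 2 == 0);
        return visited;
    }

    // Without a terminal operation the pipeline is only a description: nothing runs.
    static List<Integer> visitedWithoutTerminalOp(List<Integer> input) {
        List<Integer> visited = new ArrayList<>();
        input.stream().peek(visited::add).map(x -> x * 2);
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(visitedUntilFirstEven(List.of(1, 3, 5, 6, 7)));    // [1, 3, 5, 6]
        System.out.println(visitedWithoutTerminalOp(List.of(1, 3, 5, 6, 7))); // []
    }
}
```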

***

Multi-pass is difficult:
- Supporting multiple passes would require buffering all elements
--> breaks laziness + increases memory usage
- Streams may come from non-repeatable sources (I/O, NW, DB cursors)
--> replaying may be impossible/ unsafe

***

Terminal operation & source consumption:
- After a terminal operation (e.g. forEach, collect) executes, the stream is marked as consumed
- Using a stream to create a derived stream (any intermediate operation) also consumes the original stream, which now acts as upstream
--> Any further attempt to use it throws IllegalStateException

*Internally, this is tracked via the linkedOrConsumed flag in AbstractPipeline
--> Monitors whether the spliterator (underlying element provider of the stream) has been linked/consumed
--> Once set, the original stream ref is effectively unusable

Example:
Stream<Integer> s = Stream.of(1, 2, 3);
Stream<Integer> s2 = s.filter(n -> n > 1);
s.forEach(System.out::println);
--> Throws IllegalStateException

---

Handling Multiple Passes:
*Materialize into a collection:
List<Integer> list = Stream.of(1,2,3).collect(Collectors.toList());
list.forEach(System.out::println);
list.forEach(System.out::println);

*Recreate the stream from the source:
Stream.of(1,2,3).forEach(System.out::println);
Stream.of(1,2,3).forEach(System.out::println);
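One way to package "recreate the stream from the source" is a `Supplier` that hands out a fresh stream on every call. This is a sketch under the assumption that the source is repeatable (here an in-memory list); the class and method names are hypothetical.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical demo: each call to source.get() creates a new, unconsumed stream.
final class StreamSupplierDemo {

    static long countAbove(Supplier<Stream<Integer>> source, int threshold) {
        return source.get().filter(n -> n > threshold).count();
    }

    static int sum(Supplier<Stream<Integer>> source) {
        return source.get().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3);
        Supplier<Stream<Integer>> source = data::stream;
        System.out.println(countAbove(source, 1)); // 2
        System.out.println(sum(source));           // 6
    }
}
```

This does not help with non-repeatable sources such as a network response; those still need to be materialized first.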

---

*Collectors.teeing() (Java 12+):
- Feeds each element of a single-pass stream to two downstream collectors, then merges their two results
- Useful for two derived results from one traversal without manual buffering

Example - compute sum & count in one pass:
Stream<Integer> numbers = Stream.of(1, 2, 3, 4);

Map<String, Number> result = numbers.collect(
    Collectors.teeing(
        Collectors.summingInt(n -> n),   // downstream 1: sum
        Collectors.counting(),           // downstream 2: count
        (sum, count) -> Map.of("sum", sum, "count", count)   // merge both results
    )
);

System.out.println(result);
--> {sum=10, count=4}

⚠️ Even with teeing(), the original stream is consumed - it's just that both downstream collectors are applied in one pass


CodeBarbarian

@SoftwareSiege

3 months ago

It's all about layered, memory-first gatekeeping + smart request handling

***

*Memory-first layers = latency hiding
- L0: Reserved Word Filter = static list/regex, O(1) local check
E.g. "admin", "support"
- L1: Bloom Filter = probabilistic, in-memory, tiny footprint (~1.2GB per billion usernames at a 1% false positive rate)
- L2: Redis = exact hot handles + Negative Cache (taken names cached with TTL), RAM lookup < 1ms
- L3: DB = authoritative truth, sharded + indexed
--> Most requests never reach DB
--> scales to hundreds of millions of users

*Debouncing prevents unnecessary pipeline execution

Each layer adds a tradeoff:
- Reserved Words = limited, static
- Bloom = small false positive rate, requires Counting/Cuckoo filters for deletions
- Redis = memory cost + cache TTL tuning
- DB = authoritative but slow

***

Handle check = conveyor belt with VIP gates:
- Reserved Word Filter = forbidden? Stop!
- Bloom filter = Definitely free? Go!
- Redis = Hot VIP names, check here
- DB = Final authority
--> By the time "Taken" is displayed, the request has passed through up to 4 optimized layers in ms

---

Username Lookup = Multi-Layer Pipeline

*L0 - Reserved Word Filter (Static Gatekeeper):
- Purely local O(1) string check
- Blocks reserved/profane handles immediately
--> Avoids NW / memory cost

***

*L1 - Bloom Filter (Probabilistic Gatekeeper):
*Answers "Definitely No" --> handle is free, stop immediately
*Answers "Maybe Yes" --> go to next layer
⚡ Avoids DB reads for ~99% of free usernames (at a 1% false positive rate)

⚠️ Deletions/Staleness:
Standard Bloom Filters cannot remove items
--> Counting Bloom Filters replace each bit with a small counter, incremented on add & decremented on remove
--> Cuckoo Filters store short fingerprints in bucket slots; a delete removes the matching fingerprint (relocation happens on insert, to make room)
--> both incur extra memory overhead

***

*L2 - Redis/In-Memory Cache:
Stores exact hot handles + Negative Cache for taken names (short TTL)
--> RAM lookup < 1ms
--> Ensures repeated "taken" queries never hit DB
⚠️ Cache hit ratio matters: keeping hot handles in RAM = massive performance win

***

*Smart Request Handling/Debouncing:
Client waits 300-500ms after last keystroke before sending request
--> Reduces redundant queries (no longer one per character)
--> smooth UX + lower system load

***

*L3 - Distributed DB (Source of Truth):
- Only accessed if cache cannot confirm availability
- Sharded + indexed --> fast O(log N) lookup
--> Last-resort guarantee; optimized pipeline rarely touches DB!

⚠️ Write-Through Consistency:
- When a handle is claimed, Bloom Filter and Redis must be updated
--> Bloom Filter: set bits to 1; deletions require Counting/Cuckoo filters OR periodic full rebuilds
--> Redis: update/invalidate the cached entry to maintain correctness

https://t.co/MnO3SI7UHq
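
The L0 to L3 flow can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: the class name, the Predicate used in place of a real Bloom filter, the Set used in place of Redis (no TTL), and the Set used in place of the sharded DB.

```java
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

class HandleChecker {
    enum Result { RESERVED, AVAILABLE, TAKEN }

    private final Set<String> reserved;            // L0: static reserved words
    private final Predicate<String> mightBeTaken;  // L1: Bloom filter stand-in (false = definitely free)
    private final Set<String> takenCache = ConcurrentHashMap.newKeySet(); // L2: cache stand-in, no TTL
    private final Set<String> database;            // L3: source-of-truth stand-in
    int dbLookups = 0;                             // how often L3 was reached (not thread-safe, demo only)

    HandleChecker(Set<String> reserved, Predicate<String> mightBeTaken, Set<String> database) {
        this.reserved = reserved;
        this.mightBeTaken = mightBeTaken;
        this.database = database;
    }

    Result check(String handle) {
        String h = handle.toLowerCase(Locale.ROOT);
        if (reserved.contains(h)) return Result.RESERVED;     // L0: forbidden, stop
        if (!mightBeTaken.test(h)) return Result.AVAILABLE;   // L1: "definitely no", stop
        if (takenCache.contains(h)) return Result.TAKEN;      // L2: cached "taken" answer
        dbLookups++;
        boolean taken = database.contains(h);                 // L3: authoritative check
        if (taken) takenCache.add(h);                         // cache only "taken" names
        return taken ? Result.TAKEN : Result.AVAILABLE;
    }
}
```

A Bloom false positive (the filter says "maybe" for a free name) still ends with the correct answer; it only costs one extra DB lookup.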


CodeBarbarian

@SoftwareSiege

3 months ago

@javarevisited https://t.co/Clz4g0jQmN

CodeBarbarian

@SoftwareSiege

4 months ago

true (both references point to the same cached Integer instance)

Integer Caching:
- Integer objects between -128 and 127 are cached by default
- The cache lives in java.lang.Integer.IntegerCache (class library, objects on the heap)
- Autoboxing and Integer.valueOf() reuse the same cached instance in this range
- new Integer(...) always allocates a fresh object (bypasses the cache, deprecated since Java 9)

Cache Configuration:
- Upper bound can be extended via: -XX:AutoBoxCacheMax=<value>
- Lower bound (-128) is fixed
- Use .equals() for value comparison

---

Other Wrapper Caches:

Byte, Short, Long, Character:
- All have caches for a similar small range (Byte/Short/Long: -128 to 127, Character: 0 to 127)
- Unlike Integer, their upper bounds are not configurable via JVM flags

Boolean:
- Only two instances exist: Boolean.TRUE and Boolean.FALSE
- Always cached

Float and Double:
- No caching in OpenJDK (valueOf() allocates a new object)
- The floating-point value space has no small set of hot values worth caching
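
A small sketch of the caching behaviour described above. The class name is made up for illustration; the result for 128 assumes the default cache upper bound (no -XX:AutoBoxCacheMax), so it is labelled as typical rather than guaranteed.

```java
class IntegerCacheDemo {
    // True when two separately autoboxed copies of the value are the same object.
    static boolean sameInstance(int value) {
        Integer a = value; // autoboxing compiles to Integer.valueOf(value)
        Integer b = value;
        return a == b;     // reference comparison, not value comparison
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));  // true: inside the guaranteed cache range
        System.out.println(sameInstance(128));  // typically false with the default upper bound
        Integer x = 1000, y = 1000;
        System.out.println(x.equals(y));        // true: value comparison always works
    }
}
```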


CodeBarbarian

@SoftwareSiege

3 months ago

Because they solve different bottlenecks:

*Replication = same data, multiple copies
--> scales reads + availability (writes still hit primary)

*Sharding = horizontal partitioning across nodes (rows split across machines)
--> scales writes + data size (load distributed across shards)

---

*All sharding is horizontal partitioning, but NOT all horizontal partitioning is sharding

*Horizontal partitioning (Local single DB) = split rows across partitions (range/hash/list)
--> improves manageability + query performance, but NOT true horizontal scaling

*Sharding (Distributed) = horizontal partitioning across machines
--> enables real scalability (compute + storage)

---

Vertical partitioning = split columns
E.g. user_core vs user_profile

Used when:
- Some columns are rarely accessed (cold data)
- Reduce row size / I/O cost
- Improve cache efficiency
- Isolate sensitive/large fields
E.g. BLOBs

---

⚠️ Replication is simple but limited; sharding scales but introduces system-level complexity:

*Cross-shard joins: extremely expensive/avoided (data must be co-located/denormalized)

*Distributed transactions: costly & complex
--> often replaced with eventual consistency (ACID to BASE - Basically Available, Soft state, Eventual consistency)

*Resharding complexity: redistributing data when shards grow is operationally hard (far more complex than adding replicas)
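
One way to see why resharding is hard is a naive hash-modulo router. This is only a sketch: modulo routing is an assumption chosen for brevity (real systems often use consistent hashing or range maps), and the class, method and key names are hypothetical.

```java
class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) throw new IllegalArgumentException("shardCount must be positive");
        this.shardCount = shardCount;
    }

    // floorMod keeps the index non-negative even when hashCode() is negative.
    int shardFor(String userId) {
        return Math.floorMod(userId.hashCode(), shardCount);
    }

    // Counts how many of the keys "user-0".."user-(keys-1)" land on a different shard
    // after the shard count changes.
    static int movedKeys(int keys, int fromShards, int toShards) {
        ShardRouter before = new ShardRouter(fromShards);
        ShardRouter after = new ShardRouter(toShards);
        int moved = 0;
        for (int i = 0; i < keys; i++) {
            String id = "user-" + i;
            if (before.shardFor(id) != after.shardFor(id)) moved++;
        }
        return moved;
    }
}
```

With plain modulo routing, going from 4 to 5 shards remaps a large share of keys, and every remapped row has to be copied between machines. Adding a replica moves no keys at all.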


CodeBarbarian

@SoftwareSiege

3 months ago

⚠️ Why even use async if we're going to block immediately?
--> BUT it depends where/who blocks

🤔 Are we blocking at the right layer or too early?

*Async isn't about avoiding blocking - it's about deferring it (where we block + how long + what + how failures propagate)

*get() is fine at the right boundary, but if used inside the flow, it collapses async into sync & kills concurrency

***

*Async = non-blocking composition + latency hiding
--> deferring the wait, not eliminating it

*get() = blocking sync point
--> decides where the wait happens

***

*It's not about syntax - it's about execution model + arch
- Async = order food --> keep working --> eat later
- get() at boundary = wait when food arrives 👍
- get() in middle = stand at door immediately 🤦‍♂️

---

CompletableFuture = Non-blocking pipelines
- supplyAsync() = offloads work
- thenApply()/thenCompose() = defines continuations (callbacks)
- get() = introduces blocking barrier

Loosely similar to streams:
- Streams: terminal op triggers execution
- CompletableFuture: get() forces sync (unlike streams, the work starts eagerly; get() only waits for it)

---

*get() must be at the end of the system (boundary)

Execution Models:

*Blocking Model (Servlet/Thread-per-request):
HTTP response is sync
--> Thread must eventually produce result
--> Blocking is inevitable, BUT defer get() as late as possible (at boundary)

Example:
public String controller() throws Exception {
    return CompletableFuture.supplyAsync(...)
        .thenApply(...)
        .get();
}
--> OK: async/sync boundary (needed to return a value, no further async composition expected)
--> Blocking is intentional & contained
--> Gains: overlapping I/O, better utilization

*Modern blocking frameworks (e.g. Spring MVC) can return CompletableFuture, effectively behaving non-blocking at the boundary

***

*Non-Blocking Model (Reactive/Event-loop):
No thread-per-request (e.g. WebFlux, Netty)
--> Framework handles callbacks/continuation
--> Never call get()

Example:
public CompletableFuture<String> controller() {
    return CompletableFuture.supplyAsync(...)
        .thenApply(...);
}
--> No blocking
--> Framework: Registers callback + Releases thread + Writes response on completion

***

⚠️ *Blocking Too Early (Problem):

Example:
public String service() throws Exception {
    return CompletableFuture.supplyAsync(...)
        .thenApply(...)
        .get();
}
--> BAD: Async/Sync collapse, no latency hiding
--> Breaks composition + introduces hidden blocking upstream

What breaks:
*Concurrency collapse --> caller thread blocks
*Thread inefficiency: Worker thread does work & caller thread waits --> 2 threads for 1 task!
*Breaks composability: Cannot chain further async ops --> Forces sync model upstream
*Thread pool starvation risk: Blocking inside async flows can exhaust thread pools --> Leads to throughput collapse under load

---

*Blocking systems: defer get() to boundary (or preferably return CompletableFuture)
*Non-blocking systems: never call .get() - return the future & let the framework handle it

---

⚠️ Timeouts:
Never use indefinite .get()
--> prefer failing the pipeline itself via .orTimeout(...) (Java 9+) over just timing out the blocking thread with .get(timeout)
--> propagates failure early & keeps the async flow consistent
E.g. future.orTimeout(5, TimeUnit.SECONDS) (better) vs future.get(5, TimeUnit.SECONDS)

⚠️ Common Pool Trap:
Default ForkJoinPool.commonPool() is JVM-wide
--> blocking it can starve unrelated tasks (incl. parallel streams)
--> prefer custom executors (requires handling ThreadLocal context propagation explicitly)
E.g. CompletableFuture.supplyAsync(task, customExecutor);

*Exception Handling (Pipeline):
Prefer handling errors inside the async pipeline (exceptionally, handle) OR use join() to avoid checked exception noise & keep composition clean
E.g. future.exceptionally(ex -> fallback);

⚠️ Sync Boundary Exception Trap:
get() wraps failures in ExecutionException
--> when crossing async/sync boundary, always unwrap via e.getCause() to avoid masking real errors
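
The recommendations above (custom executor, orTimeout, in-pipeline fallback, one blocking point at the boundary) can be combined in a small sketch. The class and method names and the greeting logic are invented for illustration; a real service would share a long-lived executor instead of creating one per request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncBoundaryDemo {
    // Service layer: composes and returns the future, never blocks.
    static CompletableFuture<String> greet(String name, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> name.trim(), pool) // offload to a custom executor
                .thenApply(n -> "Hello, " + n)                          // continuation
                .orTimeout(5, TimeUnit.SECONDS)                         // fail the pipeline itself on timeout (Java 9+)
                .exceptionally(ex -> "Hello, stranger");                // fallback handled inside the pipeline
    }

    // Boundary: the single place that waits for the result.
    static String handleRequest(String name) {
        ExecutorService pool = Executors.newFixedThreadPool(2); // per-call pool only to keep the demo self-contained
        try {
            return greet(name, pool).join(); // join() throws unchecked CompletionException on failure
        } finally {
            pool.shutdown();
        }
    }
}
```

Because greet() returns the future, callers in a non-blocking framework can return it directly and skip handleRequest() entirely.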


SoftwareSiege retweeted

CodeBarbarian

@SoftwareSiege

3 months ago

*Big-O = asymptotic growth, ignores constants
*Real systems = constants matter (CPU, RAM, I/O, NW)
*Algorithm choice should balance asymptotics + constants
*A theoretically faster algorithm can actually be slower if the hidden constant is large

---

*Real-world latency dominates small datasets
E.g. Linear scan in RAM O(n) is faster than binary search on disk O(log n) for moderate n because disk I/O constant is huge

*HW effects:
Memory hierarchy, CPU cache, disk I/O & NW latency multiply the cost per operation
E.g. Sequential RAM access vs random disk access
--> same O(n), but runtime can differ by ~100,000x

*Parallelism & CPU efficiency:
Algorithms with lower asymptotic complexity may underperform if they cannot leverage CPU pipelines, vectorization or threading efficiently

*Cost in production:
Larger constants
--> more CPU, memory & NW usage
--> higher cost, even if Big-O is better
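
A back-of-the-envelope model makes the RAM-scan vs disk-binary-search example concrete. The two latency constants are assumed order-of-magnitude figures (about 1 ns per element scanned sequentially in RAM, about 10 ms per random seek on a spinning disk), not measurements, and the class name is made up.

```java
class LatencyModel {
    // Assumed order-of-magnitude costs, not benchmarks.
    static final double RAM_SEQ_NS_PER_ELEMENT = 1.0;   // ~1 ns per element, sequential RAM scan
    static final double DISK_SEEK_NS = 10_000_000.0;    // ~10 ms per random disk seek

    // O(n) with a tiny constant.
    static double linearScanInRamNs(long n) {
        return n * RAM_SEQ_NS_PER_ELEMENT;
    }

    // O(log n) with a huge constant: one seek per probe.
    static double binarySearchOnDiskNs(long n) {
        double probes = Math.ceil(Math.log(n) / Math.log(2)); // about log2(n) probes
        return probes * DISK_SEEK_NS;
    }
}
```

Under these assumptions, n = 1,000,000 costs about 1 ms for the O(n) scan versus about 200 ms for the O(log n) search (20 seeks). The asymptotically better algorithm only wins at much larger n, for example n = 1,000,000,000 (about 1 s versus about 0.3 s).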


SoftwareSiege retweeted

CodeBarbarian

@SoftwareSiege

3 months ago

@javarevisited @override *equals(): 1. Check this == o first (performance) 2. Be null-safe + avoid blind casts (prevent NPE, ClassCastException) *Not overriding hashCode() --> User objs that are equal can end up in different buckets (duplicates in HashSet) https://t.co/QbKUOdGiEr
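
A minimal sketch of the equals()/hashCode() points above. The User class in the original thread is not shown here, so the fields below (id, email) are hypothetical.

```java
import java.util.Objects;

final class User {
    private final long id;       // hypothetical field
    private final String email;  // hypothetical field

    User(long id, String email) {
        this.id = id;
        this.email = email;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;               // 1. cheap identity check first
        if (!(o instanceof User)) return false;   // 2. null-safe, no blind cast
        User other = (User) o;
        return id == other.id && Objects.equals(email, other.email);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, email);           // equal objects produce equal hashes
    }
}
```

With hashCode() overridden consistently, two equal User objects land in the same bucket, so a HashSet keeps only one of them.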


SoftwareSiege retweeted

CodeBarbarian

@SoftwareSiege

3 months ago

a == b is true:
"Ja" + "va" is CT constant
--> auto-interned in SCP (same ref)

c == b is false:
part + "va" is RT concatenation
--> creates new heap obj (different ref)

***

*If part were declared final (with a constant initializer)
--> part + "va" becomes CT constant expression
--> auto-interned
--> c == b true

---

*Constant Folding:
Compiler evaluates CT constant expressions & replaces them with result in bytecode
E.g. "Ja" + "va" = "Java" at CT

*String Interning:
CT constants are auto-interned in SCP
--> == can return true for CT constant Strings pointing to the same pool object

*RT Concatenation:
Any concatenation involving vars or non-final values is done at RT
--> Creates new heap obj, not interned
--> == returns false; .equals() still works

---

*CT constants + constant folding
--> interned + ref equality possible

*RT concatenation
--> new object + ref equality fails unless .intern() is used

*Applies Beyond Strings:
Enums: each constant is a single instance (behaves like an interned object)
--> == works

Primitives (int, boolean, etc.):
--> == compares values, always works

Wrapper classes (Integer, Long, etc): small values (-128 to 127) are cached (like SCP; only Integer's upper bound is configurable), beyond that == can fail

---

- Use .equals() for content comparison; == for ref
- To make an RT-concatenated String match a pooled literal, call .intern()
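
The cases above can be reproduced in a few lines. The variable names a, b, c and part follow the thread; d and finalPart are added here to illustrate the "final" case, and the class name is made up.

```java
class StringInterningDemo {
    // Returns { a == b, c == b, d == b, c.intern() == b, c.equals(b) }.
    static boolean[] compare() {
        String b = "Java";
        String a = "Ja" + "va";         // compile-time constant, folded to "Java"
        String part = "Ja";             // non-final local variable
        String c = part + "va";         // runtime concatenation: newly created String
        final String finalPart = "Ja";  // final + constant initializer = constant variable
        String d = finalPart + "va";    // constant expression, folded at compile time
        return new boolean[] { a == b, c == b, d == b, c.intern() == b, c.equals(b) };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(compare())); // [true, false, true, true, true]
    }
}
```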


SoftwareSiege retweeted

CodeBarbarian

@SoftwareSiege

3 months ago

Bloom filter = Probabilistic, memory-efficient data structure (bit array + hash functions) for testing set membership

*Purpose: Quickly identify what is definitely NOT in a set
--> avoids unnecessary DB queries/expensive searches

*Guarantees:
- No false negatives (an element that exists is never missed)
- Controlled false positives (may say "possibly present" for elements not actually in the set)

*Efficiency: Minimal memory, scales to millions/billions of elements

---

*Structure:
- Bit array (m bits): stores compact representation
- Hash functions (k): map elements to multiple bits

*Name:
- Named after Burton Bloom (1970)
- Filter = filters out elements definitely NOT present
- Mnemonic: the bit array progressively "blooms" as elements are added

***

Adding elements:
Compute k hashes -- set the corresponding bits
--> creates a probabilistic fingerprint

***

Checking membership:
Compute k hashes -- check bits

*Any bit = 0
--> definitely not present

*All bits = 1
--> possibly present, check further

***

Example (username check):
*dave123 -- some bits 0
--> definitely available, skip DB query

*alice -- all bits 1
--> might exist, check DB to confirm

---

Parameter Tuning (n, p, m, k):
n = expected elements
p = desired false positive probability
m = #bits in array
k = #hash functions

*Goal: balance memory usage vs false positive rate (p) for expected # elements (n)

*More elements n
--> larger array needed

*Lower desired false positive rate p
--> larger array needed

m = -(n * ln(p)) / (ln(2))^2
k = (m / n) * ln(2)

Example:
*Expected usernames: n = 1_000_000
*Desired false positive rate: p = 0.01 (1%)
--> m = 9_585_059 bits = ~1.2 MB
--> k = 6.64, rounded to 7 hash functions
--> we need a bit array of ~1.2 MB and 7 hash functions to store 1 million usernames with only 1% false positives & no false negatives

---

*Double Hashing: compute two base hashes & combine them to simulate k hashes
--> avoids computing many independent hashes
--> improves performance

*Storage: in RAM for fast access or on disk/distributed memory for very large sets

*Silent False Positives: a "possibly present" answer is NOT guaranteed correct; the filter trades exactness for memory efficiency + speed
--> app must handle false positives gracefully

*Practical Implementation: use battle-tested libraries/frameworks (Guava, Redis, Cassandra) for hashing, concurrency & persistence

*Use Case: pre-filter for expensive DB/NW lookups, NOT a replacement for the DB
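
The sizing formulas and the double-hashing trick can be sketched in one small class. This is an illustrative, single-threaded toy: the class name is made up, and the two base hashes (String.hashCode() and an FNV-1a variant) are assumptions chosen for brevity. Production code should use one of the libraries mentioned above.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class SimpleBloomFilter {
    private final BitSet bits;
    private final int m; // number of bits
    private final int k; // number of hash functions

    SimpleBloomFilter(long expectedElements, double falsePositiveRate) {
        this.m = optimalBits(expectedElements, falsePositiveRate);
        this.k = optimalHashes(expectedElements, m);
        this.bits = new BitSet(m);
    }

    // m = -(n * ln(p)) / (ln(2))^2, rounded up (int range only: fine for a demo)
    static int optimalBits(long n, double p) {
        return (int) Math.ceil(-(n * Math.log(p)) / (Math.log(2) * Math.log(2)));
    }

    // k = (m / n) * ln(2), rounded to the nearest integer, at least 1
    static int optimalHashes(long n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    void add(String item) {
        int h1 = hash1(item), h2 = hash2(item);
        for (int i = 0; i < k; i++) bits.set(index(h1, h2, i));
    }

    // false = definitely not present; true = possibly present
    boolean mightContain(String item) {
        int h1 = hash1(item), h2 = hash2(item);
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(h1, h2, i))) return false;
        }
        return true;
    }

    // Double hashing: the i-th index is (h1 + i * h2) mod m
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, m);
    }

    private static int hash1(String s) {
        return s.hashCode();
    }

    // FNV-1a over the UTF-8 bytes, forced odd so the step is never zero
    private static int hash2(String s) {
        int h = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h | 1;
    }
}
```

For n = 1_000_000 and p = 0.01, optimalBits and optimalHashes reproduce the worked example: roughly 9.585 million bits and 7 hash functions.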
