Luigi @fundyc - Twitter Profile

fundyc retweeted

2 days ago


crunchydata

But the query was fast in development and staging!?!

What are the signs to look for that you've created a query that is fast locally, but needs performance considerations in production?

There are two inflection points where a simple query gets meaningfully slower:

1) When the sort spills out of memory
2) When the table scan moves to disk

The examples below come from resource-constrained environments, but the pattern of behavior and output is similar to what you'd see in larger environments, only with far larger numbers of rows.

A simple query

```sql
SELECT user_id, event_type, created_at
FROM events
WHERE user_id = 1
ORDER BY created_at DESC
```

Without an index, Postgres must scan the entire table to find the user's events, even if the user has only a few rows. (Narrator: if there are zero rows for a user, Postgres may choose not to scan any rows due to table and column statistics, which we talked about a while back.)
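
Plans like the ones in this thread can be produced with `EXPLAIN (ANALYZE, BUFFERS)`. A minimal sketch for trying it locally, assuming a hypothetical `events` table definition and synthetic data (the original post shows neither) shaped like Phase 1:

```sql
-- Hypothetical schema: column types are an assumption, not taken from the post.
CREATE TABLE events (
    user_id    integer     NOT NULL,
    event_type text        NOT NULL,
    created_at timestamptz NOT NULL
);

-- Phase 1 shape: 10,000 rows, half of them for user_id = 1.
INSERT INTO events (user_id, event_type, created_at)
SELECT CASE WHEN g % 2 = 0 THEN 1 ELSE 2 END,
       'click',
       now() - g * interval '1 second'
FROM generate_series(1, 10000) AS g;

-- Refresh planner statistics after the bulk load.
ANALYZE events;

-- Match the memory limit used in the examples (session-level only).
SET work_mem = '1MB';

-- ANALYZE runs the query for real; BUFFERS adds the shared/temp page counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, event_type, created_at
FROM events
WHERE user_id = 1
ORDER BY created_at DESC;
```

Exact costs, timings, and page counts will differ from the ones quoted here; the lines worth comparing are `Sort Method`, `Rows Removed by Filter`, and `Buffers`.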

Phase 1) everything fits in memory

With 10,000 rows, of which 5,000 belong to user_id=1, and `work_mem = 1MB`:

```
Sort (cost=506.19..518.69 rows=5000 width=20) (actual time=1.422..1.587 rows=5000 loops=1)
  Sort Key: created_at DESC
  Sort Method: quicksort Memory: 427kB
  Buffers: shared hit=74
  -> Seq Scan on events (cost=0.00..199.00 rows=5000 width=20) (actual time=0.006..0.599 rows=5000 loops=1)
        Filter: (user_id = 1)
        Rows Removed by Filter: 5000
        Buffers: shared hit=74
```

• `Sort Method: quicksort Memory: 427kB`: the 5,000 rows are sorted in RAM
• `Rows Removed by Filter: 5000`: the other 5,000 rows were scanned and discarded
• `Buffers: shared hit=74`: all 74 table pages were in shared_buffers

Phase 2) sort spills to disk

With 200,000 rows, of which 100,000 now belong to user_id=1, and `work_mem = 1MB`, the 100k rows to be sorted no longer fit in memory:

```
Sort (cost=14314.82..14564.48 rows=99867 width=20) (actual time=25.069..29.488 rows=100000 loops=1)
  Sort Key: created_at DESC
  Sort Method: external merge Disk: 3352kB
  Buffers: shared hit=1471, temp read=836 written=843
  -> Seq Scan on events (cost=0.00..3971.00 rows=99867 width=20) (actual time=0.005..8.135 rows=100000 loops=1)
        Filter: (user_id = 1)
        Rows Removed by Filter: 100000
        Buffers: shared hit=1471
```

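• `Sort Method: external merge Disk: 3352kB`: about 3.3MB spilled to temp files because 100k rows × ~20 bytes (roughly 2MB of row data before per-row overhead) exceeds the 1MB `work_mem`
• `temp read=836 written=843`: 843 temp pages were written to disk and 836 were read back during the merge
• `Rows Removed by Filter: 100000`: the other 100,000 rows were still scanned, then discarded before the sort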
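
One way to confirm that `work_mem` is what pushes this sort to disk is to raise it for the current session only and rerun the plan. A sketch; the `16MB` value is an arbitrary assumption for the experiment, not a setting from the post:

```sql
-- Session-level change only; 16MB is an arbitrary trial value.
SET work_mem = '16MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, event_type, created_at
FROM events
WHERE user_id = 1
ORDER BY created_at DESC;

-- Return to the configured default afterwards.
RESET work_mem;
```

If `Sort Method` switches from `external merge` back to `quicksort` and the `temp read/written` counters disappear, the spill was a memory limit rather than a plan problem. Keep in mind that `work_mem` applies per sort or hash operation, so a high value can be multiplied many times over across concurrent queries.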

Phase 3) table scan moves to disk

600,000 rows total. user_id=1 still has 100,000 events. The other 500,000 rows belong to other users and Postgres scans all of them anyway. The table now overflows shared_buffers (16MB = 2,048 pages; table is ~4,412 pages).

```
Gather Merge (cost=12663.14..22545.49 rows=84700 width=19) (actual time=13.577..23.650 rows=100000 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=1477 read=3049 written=109, temp read=422 written=429
  -> Sort (cost=11663.11..11768.99 rows=42350 width=19) (actual time=12.298..14.186 rows=33333 loops=3)
        Sort Key: created_at DESC
        Sort Method: external merge Disk: 1416kB
        Buffers: shared hit=1477 read=3049 written=109, temp read=422 written=429
        Worker 0: Sort Method: external merge Disk: 968kB
        Worker 1: Sort Method: external merge Disk: 992kB
        -> Parallel Seq Scan on events (cost=0.00..7537.00 rows=42350 width=19) (actual time=0.008..7.273 rows=33333 loops=3)
              Filter: (user_id = 1)
              Rows Removed by Filter: 166667
              Buffers: shared hit=1373 read=3039 written=99
```

The planner launched 2 parallel workers, so the seq scan and sort were split across 3 processes (the leader plus 2 workers) and the query ran faster. Each process sorted ~33k rows instead of 100k, so each spill was smaller.

The things to be concerned about here are:

• `read=3049` on the `Buffers: shared` line: the table overflowed shared_buffers, so 3,049 pages had to be read in from outside it (the OS cache or disk)
• `Rows Removed by Filter: 166667` is a per-process average over `loops=3`: 166,667 × 3 processes ≈ 500,000 rows scanned and discarded
• All three sort operations still spilled: parallelism didn't eliminate the sort, it just distributed it
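
To see ahead of time whether a table is approaching this point, its size can be compared with shared_buffers. A small sketch using built-in size functions; the `events` table name follows the example above:

```sql
-- Compare the table's size in pages against the shared_buffers capacity.
SELECT pg_size_pretty(pg_relation_size('events')) AS table_size,
       pg_relation_size('events')
           / current_setting('block_size')::bigint AS table_pages,
       pg_size_bytes(current_setting('shared_buffers'))
           / current_setting('block_size')::bigint AS shared_buffer_pages;
```

When `table_pages` is larger than `shared_buffer_pages`, a full sequential scan cannot be served entirely from shared_buffers, which is the situation shown in Phase 3.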

Solution

The solution to this problem should be driven by the needs of the application it serves. In the simplest terms, an index on `events (user_id, created_at DESC)` can let Postgres find one user's rows without scanning everyone else's and read them already in order, which reduces both the scan and the sort load. However, solutions may also include application-level caching, a materialized view, or table partitioning.
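
A sketch of the index option; the index name is a hypothetical choice, and whether the planner actually uses the index depends on the data and statistics:

```sql
-- Hypothetical index name; columns match the WHERE and ORDER BY of the query.
CREATE INDEX events_user_id_created_at_idx
    ON events (user_id, created_at DESC);

-- Rerun the plan and check whether the Sort and Seq Scan nodes are gone.
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, event_type, created_at
FROM events
WHERE user_id = 1
ORDER BY created_at DESC;
```

On a busy production table, `CREATE INDEX CONCURRENTLY` is usually the safer variant, since a plain `CREATE INDEX` blocks writes to the table while it builds.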


fundyc retweeted

elhacker.NET @elhackernet

3 days ago

OWASP CVE Lite CLI: a new vulnerability scanning tool. CVE Lite CLI is a free, open-source vulnerability scanner, officially recognized as an OWASP incubator project https://t.co/nqMctqsHj8


Luigi @fundyc

12 days ago

@david_bonilla @tarugoconf It is simple, but I bet it ends up getting played a lot with kids, family edition


Luigi @fundyc

16 days ago

@oyabun Checkmate!! Was it easy?


Luigi @fundyc

about 1 month ago

@fjahijado @LaLiga @movistar_es It's surely a Docker image for pirating football. There's only shady stuff on Docker


fundyc retweeted

Miguel Ángel Durán

@midudev

about 1 month ago

Google has just open-sourced DESIGN.md!

A format for telling the AI how it should design your UI.

Colors, typography, spacing, components and visual rules...

So the AI generates interfaces that follow your style:
→ https://t.co/rCZWdEux27 https://t.co/WhLDmwMp5s


fundyc retweeted

Vlad Mihalcea

@vlad_mihalcea

about 2 months ago

Agile and Scrum evolved in an era when software development took significant time, and two-week sprints enabled us to gather feedback and adapt the product to meet customer demands. Nowadays, AI compresses the development time so much that it makes more sense to use Kanban instead. The larger the company, the more difficult the AI adoption will be since it will become political, rather than practical.


Luigi @fundyc

2 months ago

@root_rat TurquIA


fundyc retweeted

Erick

@ErickSky

2 months ago

🚨 Git diff is officially dead.

Instead of spitting out 300 lines for you to guess what changed…

[sem] tells you the truth:

→ Function login() was modified.
→ Class UserService renamed.
→ Method validateToken() moved.

Diffs at the level of real functions, classes, and methods.
No more noise. No more wasted time.
21 languages.
Integrates directly with Git.
Code review 10x faster.
This is the future of version control.

REPOOO👇


Luigi @fundyc

2 months ago

@flopezluis The hierarchy of a 50-person company is not the same as that of a 5,000-person one


Luigi @fundyc

3 months ago

@AjarePink @miquelroig Because then you read the article, and the headline and the bold text have been chosen with real malice


Luigi @fundyc

3 months ago

@ignacio_arriaga If, thanks to AI, engineers are getting ever closer to product and its problem-solving capacity, will that push product experts out, or will the two converge into the same role?


fundyc retweeted

sudox

@kmcnam1

3 months ago


Luigi @fundyc

3 months ago

@flopezluis @dei_biz @psluaces It also shows that during the hiring process the right questions weren't asked, and nobody explained which tools the work would be done with.


Luigi @fundyc

3 months ago

@ferrenet Or just an exercise bike


Luigi @fundyc

3 months ago

I get the feeling that every AI platform is focused on getting you to burn tokens nonstop so they can squeeze you later. Leave X working all night, leave Y working all weekend. Instead of focusing on what matters.


Luigi @fundyc

3 months ago

@dani_avila7 @thejsnation Good luck!


Luigi @fundyc

3 months ago

@jcesarperez There was already a Black Mirror episode along these lines. The future keeps getting darker


Luigi @fundyc

3 months ago

@david_bonilla With that name the "trick" was obvious
