The Go runtime includes a scheduler that multiplexes many goroutines onto a small number of OS threads. This M:N scheduling (M goroutines on N OS threads) is a large part of what makes goroutines cheap and lets a Go program run very many of them concurrently. Understanding the scheduler helps explain how goroutines perform in practice.
The G-M-P model
G (Goroutine) — your concurrent task (lightweight, ~2KB stack to start)
M (Machine) — an OS thread (the actual thread the OS schedules)
P (Processor) — a logical processor / scheduling context; holds a queue of runnable Gs
(the number of P's = GOMAXPROCS; by default, the number of logical CPUs available to the process, which can exceed the number of physical cores)
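The quantities in this list can be inspected from a running program. The following is a minimal sketch using the standard runtime package; the schedulerInfo type and the snapshot helper are hypothetical names introduced only for this illustration. The runtime does not expose the number of M's through a simple call, so only P's and G's are shown.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerInfo is a hypothetical helper type for this sketch.
type schedulerInfo struct {
	LogicalCPUs int // logical CPUs usable by this process
	Ps          int // current GOMAXPROCS value, i.e. the number of P's
	Goroutines  int // goroutines (G's) that currently exist
}

// snapshot reads the values without changing any scheduler setting.
func snapshot() schedulerInfo {
	return schedulerInfo{
		LogicalCPUs: runtime.NumCPU(),
		// Passing 0 queries GOMAXPROCS without modifying it.
		Ps:         runtime.GOMAXPROCS(0),
		Goroutines: runtime.NumGoroutine(),
	}
}

func main() {
	s := snapshot()
	fmt.Printf("logical CPUs: %d, P's: %d, G's: %d\n",
		s.LogicalCPUs, s.Ps, s.Goroutines)
}
```

The printed numbers depend on the machine and on any GOMAXPROCS environment setting, so no fixed output is shown here.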
The scheduler runs G's on M's, coordinated through P's:
Each P has a local run queue of goroutines; an M must hold a P to run G's.
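To see the multiplexing in action, the sketch below caps the number of P's at 2 and then starts 10,000 goroutines. It assumes nothing beyond the standard library; runMany is a hypothetical helper name, and the values 2 and 10,000 are arbitrary choices for the illustration. All goroutines still finish, because the scheduler queues the runnable G's and runs them a few at a time on the M's that hold a P.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runMany limits the scheduler to procs P's, starts n goroutines,
// waits for all of them, and returns how many ran to completion.
func runMany(procs, n int) int64 {
	// GOMAXPROCS returns the previous setting; restore it on exit.
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev)

	var done int64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			// Atomic increment: several goroutines may run in parallel.
			atomic.AddInt64(&done, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&done)
}

func main() {
	// At most 2 goroutines execute Go code at any instant,
	// yet every one of the 10000 gets scheduled eventually.
	fmt.Println(runMany(2, 10000)) // prints 10000
}
```

Restoring the previous GOMAXPROCS value keeps the helper from leaving a process-wide setting changed after it returns.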
