Describe your role in a production incident.

Question

Accepted Answer

They want to see that you stay **calm, methodical, and blameless** under pressure: restore service first, diagnose second, prevent recurrence third. Structure the answer with **STAR** (Situation, Task, Action, Result).

## How to approach it

```text
INCIDENT ORDER
1. Stabilize — stop the bleeding (rollback, failover, mitigate)
2. Communicate — keep stakeholders updated on a clear channel
3. Diagnose — root cause once it's stable, not during
4. Prevent — a blameless post-mortem with action items
```

## Worked example

```text
S: A deploy caused checkout errors for ~15% of users.
T: I was on call and had to restore service fast.
A: I rolled back the deploy first (service recovered in minutes), posted updates
   every 10 minutes, then traced the cause to an unhandled null from a new API
   field. I added a guard and a contract test.
R: Downtime stayed under 20 minutes. The post-mortem added the missing test to
   CI so it can't recur.
```
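If the interviewer asks what "a guard and a contract test" looked like, a small sketch helps. The one below is only an illustration under stated assumptions: the field name `discount_cents`, the other payload fields, and all function names are hypothetical and do not come from the incident described above.

```python
from typing import Any, List, Mapping

# Hypothetical contract: fields the checkout code assumes are always present.
REQUIRED_FIELDS = {"order_id": str, "total_cents": int}


def parse_discount_cents(payload: Mapping[str, Any]) -> int:
    """Guard: treat a missing or null `discount_cents` as zero instead of failing."""
    value = payload.get("discount_cents")
    if value is None:
        return 0
    return int(value)


def checkout_total_cents(payload: Mapping[str, Any]) -> int:
    """Compute the amount to charge, never going below zero."""
    return max(payload["total_cents"] - parse_discount_cents(payload), 0)


def contract_violations(payload: Mapping[str, Any]) -> List[str]:
    """Contract check: list every way the payload breaks the assumed shape."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"wrong type for field: {name}")
    # The new field is optional and nullable, but must be an int when set.
    discount = payload.get("discount_cents")
    if discount is not None and not isinstance(discount, int):
        problems.append("wrong type for field: discount_cents")
    return problems


def test_null_discount_is_handled() -> None:
    """The kind of test a post-mortem might add to CI."""
    payload = {"order_id": "a1", "total_cents": 1000, "discount_cents": None}
    assert contract_violations(payload) == []
    assert checkout_total_cents(payload) == 1000
```

The guard fixes the immediate crash, while the contract check runs in CI against sample responses so a changed or null field is caught before deploy rather than in production.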

## Strong vs. weak

```text
✓ Mitigate first, communicate, blameless follow-up
✗ Debugging live while users are down
✗ Blaming the person who deployed
```

## Why it matters

Incidents test composure: when things are broken, the team needs steady hands, not panic.

A blameless approach keeps people honest about causes, which is the only way to prevent a recurrence.

How you handle your worst day says more about your work than a polished résumé does.