The app works on your machine, but in Production, the CPU is at 100% and memory is climbing forever. Here is the Architect's guide to saving the day.
Don't guess. Use the tools.

- **dotnet-counters:** Shows live runtime metrics such as CPU usage, heap size, and GC counts per generation. If Gen 2 collections are frequent and the heap keeps growing between them, you likely have a memory problem.
- **dotnet-dump:** Captures a snapshot of the process memory. You can then use `dotnet-dump analyze`, Visual Studio, or WinDbg to see exactly which objects (often strings, or objects kept alive by leaked event handlers) are taking up the space.

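
To get a feel for what the GC counters are reporting, it can help to read the same kind of numbers from inside the process. The following is a minimal sketch, not a replacement for the tools: `GcProbe` and its workload loop are hypothetical names made up for this illustration, while `GC.CollectionCount` and `GC.GetTotalMemory` are standard runtime APIs.

```csharp
using System;

public static class GcProbe
{
    // Snapshot of per-generation collection counts and the managed heap size.
    public static (int Gen0, int Gen1, int Gen2, long HeapBytes) Sample() =>
        (GC.CollectionCount(0),
         GC.CollectionCount(1),
         GC.CollectionCount(2),
         GC.GetTotalMemory(forceFullCollection: false));

    public static void Main()
    {
        var before = Sample();

        // Hypothetical workload: lots of short-lived allocations.
        for (int i = 0; i < 10_000; i++)
        {
            var buffer = new byte[10_000];
            buffer[0] = 1;
        }

        var after = Sample();
        Console.WriteLine($"Gen 0 collections: +{after.Gen0 - before.Gen0}");
        Console.WriteLine($"Gen 1 collections: +{after.Gen1 - before.Gen1}");
        Console.WriteLine($"Gen 2 collections: +{after.Gen2 - before.Gen2}");
        Console.WriteLine($"Managed heap: {after.HeapBytes / 1024} KiB");
    }
}
```

A healthy service usually shows many Gen 0 collections and few Gen 2 collections. A Gen 2 count that climbs steadily while the heap never shrinks is the pattern worth taking a dump for.
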
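The most infamous leak in .NET history (strictly a socket leak rather than a memory leak): creating a new `HttpClient` inside a `using` block for every request. Disposing the client closes its connections, but the sockets linger in the `TIME_WAIT` state, and under load this eventually exhausts the server's available ports. We solve this by using **IHttpClientFactory**, which pools and recycles the underlying message handlers so that connections are reused.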
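
A minimal sketch of the factory approach follows. It assumes the `Microsoft.Extensions.Http` NuGet package is referenced; the client name `"catalog"`, the base address, and the `HttpClientFactoryDemo` class are placeholders invented for this example.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

public static class HttpClientFactoryDemo
{
    // Registers a named client. "catalog" and the URL are placeholders.
    public static ServiceProvider BuildProvider()
    {
        var services = new ServiceCollection();
        services.AddHttpClient("catalog", client =>
        {
            client.BaseAddress = new Uri("https://example.invalid/");
            client.Timeout = TimeSpan.FromSeconds(10);
        });
        return services.BuildServiceProvider();
    }

    public static void Main()
    {
        using ServiceProvider provider = BuildProvider();
        var factory = provider.GetRequiredService<IHttpClientFactory>();

        // Each call returns a new, cheap HttpClient instance; the factory
        // pools the handlers (and their connections) underneath.
        HttpClient first = factory.CreateClient("catalog");
        HttpClient second = factory.CreateClient("catalog");

        Console.WriteLine($"Same instance: {ReferenceEquals(first, second)}");
        Console.WriteLine($"Base address: {first.BaseAddress}");
    }
}
```

In an ASP.NET Core app the same `AddHttpClient` registration would normally live in the startup code, with `IHttpClientFactory` (or a typed client) injected into the classes that need it.
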
Use **dotnet-trace**. It collects a trace from the running process; with its CPU sampling profile it periodically captures call stacks, so you can see which methods the time is actually going to. Often, a CPU spike is just a single inefficient LINQ query running inside a loop, or a regular expression suffering from catastrophic backtracking.
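
Once a trace points at a regular expression, one common mitigation is to give every match a timeout so a pathological input cannot pin a core indefinitely. The sketch below is illustrative only: `RegexGuard` and `TryIsMatch` are hypothetical names, and the pattern `^(a+)+$` is a textbook example of nested quantifiers rather than anything from a real codebase. Whether the demo input actually times out depends on the runtime version, so the code handles both outcomes.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexGuard
{
    // Returns true or false for a completed match attempt,
    // or null when the attempt was abandoned after the timeout.
    public static bool? TryIsMatch(string pattern, string input, TimeSpan timeout)
    {
        try
        {
            return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }

    public static void Main()
    {
        // Nested quantifiers plus a non-matching tail invite heavy backtracking.
        string input = new string('a', 35) + "!";

        var stopwatch = Stopwatch.StartNew();
        bool? result = TryIsMatch(@"^(a+)+$", input, TimeSpan.FromMilliseconds(250));
        stopwatch.Stop();

        Console.WriteLine(result is null
            ? $"Gave up after {stopwatch.ElapsedMilliseconds} ms (timeout)"
            : $"Completed in {stopwatch.ElapsedMilliseconds} ms, match = {result}");
    }
}
```

On .NET 7 and later, `RegexOptions.NonBacktracking` is another option for patterns that do not need backreferences or lookarounds, and rewriting the pattern to remove the nested quantifier (here, `^a+$`) is usually the real fix.
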
Q: "What is your process for root-cause analysis of a production crash?"
Architect Answer: "First, I check the **Observability** dashboard (OpenTelemetry) to see if it's a spike or a slow crawl. Second, if it's a memory issue, I grab a Heap Dump. Third, I analyze the dump to find the 'Root' object that is not being collected. Finally, I write a **Regression Test** that mimics the leak condition to ensure the fix actually works and the bug never comes back. An architect doesn't just fix the bug; they fix the process."
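
The regression-test step in that answer can be sketched with a `WeakReference`: create the suspect object, drop every strong reference, force a collection, and check whether the object is gone. Everything here (`Publisher`, `Subscriber`, `LeakRegression`) is a hypothetical stand-in for the real types involved in a leak, and GC-based checks like this can be sensitive to build configuration, so treat it as a starting point rather than a guaranteed-stable test.

```csharp
using System;
using System.Runtime.CompilerServices;

public sealed class Publisher
{
    public event EventHandler? Changed;
    public void Raise() => Changed?.Invoke(this, EventArgs.Empty);
}

public sealed class Subscriber : IDisposable
{
    private readonly Publisher _publisher;

    public Subscriber(Publisher publisher)
    {
        _publisher = publisher;
        _publisher.Changed += OnChanged; // the publisher now references this object
    }

    private void OnChanged(object? sender, EventArgs e) { }

    // The fix under test: unsubscribing releases the publisher's reference.
    public void Dispose() => _publisher.Changed -= OnChanged;
}

public static class LeakRegression
{
    // Kept in a separate, non-inlined method so no local variable
    // keeps the subscriber alive in the caller's stack frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference Subscribe(Publisher publisher, bool unsubscribe)
    {
        var subscriber = new Subscriber(publisher);
        if (unsubscribe) subscriber.Dispose();
        return new WeakReference(subscriber);
    }

    public static bool IsCollectedAfter(bool unsubscribe)
    {
        var publisher = new Publisher();
        WeakReference weak = Subscribe(publisher, unsubscribe);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        bool collected = !weak.IsAlive;
        GC.KeepAlive(publisher); // the publisher must outlive the check
        return collected;
    }

    public static void Main()
    {
        Console.WriteLine($"Collected without unsubscribe: {IsCollectedAfter(false)}");
        Console.WriteLine($"Collected with unsubscribe:    {IsCollectedAfter(true)}");
    }
}
```

The leaking case is expected to report that the subscriber is still alive, and the fixed case that it was collected. Wrapped in the test framework of your choice, the second check becomes the regression test.
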
You are now a .NET Elite. Build something legendary.