AI & LLM Engineering for .NET Architects

Memory Management: Ephemeral vs Long-term Semantic memory

Updated 5/4/2026

Architecting AI Memory

A truly smart AI doesn't just respond to a prompt; it remembers you. Managing memory is the key to building personal assistants that genuinely understand their users.

1. Ephemeral (Chat) Memory

Stored only for the current session — the "recent" conversation. It is usually managed by passing the last 10-20 messages back with every request. This lets the AI resolve references like "it" when you say "Tell me more about it."
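The sliding-window approach described above can be sketched in a few lines of C#. The `ChatMessage` record and the default window size are illustrative assumptions, not the types of any specific SDK:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical message shape; real SDKs (OpenAI, Semantic Kernel) have their own.
public record ChatMessage(string Role, string Content);

public class EphemeralMemory
{
    private readonly List<ChatMessage> _history = new();
    private readonly int _windowSize;

    public EphemeralMemory(int windowSize = 20) => _windowSize = windowSize;

    public void Add(string role, string content) =>
        _history.Add(new ChatMessage(role, content));

    // Return only the most recent messages to send with the next request,
    // so references like "it" can be resolved from recent context.
    public IReadOnlyList<ChatMessage> GetWindow() =>
        _history.Skip(Math.Max(0, _history.Count - _windowSize)).ToList();
}
```

On every user turn you would call `Add(...)`, then send `GetWindow()` plus the new prompt to the model; anything older than the window is simply forgotten.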

2. Long-term (Semantic) Memory

Stored in a vector database. When a user says "Remember my wife's birthday is June 5th," we save that fact as a vector. When the user later asks "When should I buy a gift?", we search the vector DB for "birthday", find the June 5th fact, and feed it into the prompt. This gives the AI effectively infinite recall.
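A minimal sketch of this store-and-recall loop in C#, assuming an embedding function is injected (in practice you would call an embedding model, e.g. via Azure OpenAI, to produce the `float[]` for each text); the in-memory list stands in for a real vector database:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SemanticMemory
{
    private readonly List<(string Fact, float[] Vector)> _store = new();
    private readonly Func<string, float[]> _embed; // stand-in for a real embedding model

    public SemanticMemory(Func<string, float[]> embed) => _embed = embed;

    // "Remember my wife's birthday is June 5th" -> embed the fact and store it.
    public void Remember(string fact) => _store.Add((fact, _embed(fact)));

    // "When should I buy a gift?" -> embed the query, rank stored facts by
    // cosine similarity, and return the top matches to inject into the prompt.
    public IEnumerable<string> Recall(string query, int topK = 3)
    {
        var queryVec = _embed(query);
        return _store
            .OrderByDescending(m => Cosine(queryVec, m.Vector))
            .Take(topK)
            .Select(m => m.Fact);
    }

    private static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-9);
    }
}
```

In production the list and cosine loop would be replaced by a vector database query (Azure AI Search, Pinecone, Milvus, etc.), but the contract is the same: embed, store, search by similarity.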

3. Interview Mastery

Q: "What is the 'Fog of Memory' in LLMs?"

Architect Answer: "The 'Fog of Memory' (or Reordering Bias) refers to the fact that LLMs struggle to recall information buried in the middle of a very long prompt — often called the 'lost in the middle' problem. As architects, we solve this with **Summarization Chains**. We don't just dump all 50 memories into the prompt; we use a separate 'Memory Manager' agent to pick the 3 most relevant memories and present them clearly at the end of the prompt, where attention is strongest."
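The 'Memory Manager' step in the answer above can be sketched as a small C# helper. The relevance function is an assumption — in practice it would be a vector similarity search or a separate LLM call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MemoryManager
{
    // Score all stored memories against the question, keep the top three,
    // and place them at the END of the prompt, where recall is strongest.
    public static string BuildPrompt(
        string userQuestion,
        IEnumerable<string> allMemories,
        Func<string, string, double> relevance) // hypothetical scoring function
    {
        var topMemories = allMemories
            .OrderByDescending(m => relevance(userQuestion, m))
            .Take(3);

        return $"User question: {userQuestion}\n\n" +
               "Relevant memories:\n" +
               string.Join("\n", topMemories.Select(m => $"- {m}"));
    }
}
```

The point of the pattern is the filtering: three well-chosen memories near the end of the prompt beat fifty unsorted ones scattered through the middle.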
