AI & LLM Engineering for .NET Architects

Memory Management: Ephemeral vs Long-term Semantic memory

Updated 5/4/2026

Architecting AI Memory

A truly smart AI doesn't just respond to a prompt; it remembers you. Managing memory is the key to building personal assistants that genuinely understand their users.

1. Ephemeral (Chat) Memory

Stored only for the current session — the "recent" conversation. It is usually managed by passing the last 10-20 messages back with every request. This lets the AI resolve references like "it" when you say "Tell me more about it."
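The sliding-window approach described above can be sketched in a few lines of C#. The `ChatMessage` record and the default window size are illustrative assumptions, not the types of any specific SDK:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical message shape; real SDKs (OpenAI, Semantic Kernel) have their own.
public record ChatMessage(string Role, string Content);

public class EphemeralMemory
{
    private readonly List<ChatMessage> _history = new();
    private readonly int _windowSize;

    public EphemeralMemory(int windowSize = 20) => _windowSize = windowSize;

    public void Add(string role, string content) =>
        _history.Add(new ChatMessage(role, content));

    // Return only the most recent messages to send with the next request,
    // so references like "it" can be resolved from recent context.
    public IReadOnlyList<ChatMessage> GetWindow() =>
        _history.Skip(Math.Max(0, _history.Count - _windowSize)).ToList();
}
```

On every user turn you would call `Add(...)`, then send `GetWindow()` plus the new prompt to the model; anything older than the window is simply forgotten.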

2. Long-term (Semantic) Memory

Stored in a vector database. When a user says "Remember my wife's birthday is June 5th," we save that fact as a vector. When the user later asks "When should I buy a gift?", we search the vector DB for "birthday", find the June 5th fact, and feed it into the prompt. This gives the AI effectively infinite recall.
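A minimal sketch of this store-and-recall loop in C#, assuming an embedding function is injected (in practice you would call an embedding model, e.g. via Azure OpenAI, to produce the `float[]` for each text); the in-memory list stands in for a real vector database:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SemanticMemory
{
    private readonly List<(string Fact, float[] Vector)> _store = new();
    private readonly Func<string, float[]> _embed; // stand-in for a real embedding model

    public SemanticMemory(Func<string, float[]> embed) => _embed = embed;

    // "Remember my wife's birthday is June 5th" -> embed the fact and store it.
    public void Remember(string fact) => _store.Add((fact, _embed(fact)));

    // "When should I buy a gift?" -> embed the query, rank stored facts by
    // cosine similarity, and return the top matches to inject into the prompt.
    public IEnumerable<string> Recall(string query, int topK = 3)
    {
        var queryVec = _embed(query);
        return _store
            .OrderByDescending(m => Cosine(queryVec, m.Vector))
            .Take(topK)
            .Select(m => m.Fact);
    }

    private static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-9);
    }
}
```

In production the list and cosine loop would be replaced by a vector database query (Azure AI Search, Pinecone, Milvus, etc.), but the contract is the same: embed, store, search by similarity.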

3. Interview Mastery

Q: "What is the 'Fog of Memory' in LLMs?"

Architect Answer: "The 'Fog of Memory' (or Reordering Bias) refers to the fact that LLMs struggle to recall information buried in the middle of a very long prompt — often called the 'lost in the middle' problem. As architects, we solve this with **Summarization Chains**. We don't just dump all 50 memories into the prompt; we use a separate 'Memory Manager' agent to pick the 3 most relevant memories and present them clearly at the end of the prompt, where attention is strongest."
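The 'Memory Manager' step in the answer above can be sketched as a small C# helper. The relevance function is an assumption — in practice it would be a vector similarity search or a separate LLM call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MemoryManager
{
    // Score all stored memories against the question, keep the top three,
    // and place them at the END of the prompt, where recall is strongest.
    public static string BuildPrompt(
        string userQuestion,
        IEnumerable<string> allMemories,
        Func<string, string, double> relevance) // hypothetical scoring function
    {
        var topMemories = allMemories
            .OrderByDescending(m => relevance(userQuestion, m))
            .Take(3);

        return $"User question: {userQuestion}\n\n" +
               "Relevant memories:\n" +
               string.Join("\n", topMemories.Select(m => $"- {m}"));
    }
}
```

The point of the pattern is the filtering: three well-chosen memories near the end of the prompt beat fifty unsorted ones scattered through the middle.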
