Processing 1 million documents is an engineering problem, not just an AI one. You need a robust pipeline that can handle failures, rate limits, and document updates.
Don't index documents on the UI thread. Use a **Background Worker** (Azure Functions / Hangfire) and a **Message Queue** that stores only document IDs. If the embedding API is down or rate-limited, you can then retry individual documents instead of re-running the whole batch.
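A minimal sketch of the consumer side in Python, assuming hypothetical `fetch_document` and `embed` stand-ins for your document store and embedding API (the rate-limit failure here is simulated). Because the queue carries only IDs, a failed document can be retried or re-enqueued on its own:

```python
import queue
import random
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error your embedding SDK would raise."""

def fetch_document(doc_id: str) -> str:
    # Placeholder: load the document body from your store by ID.
    return f"contents of {doc_id}"

def embed(text: str) -> list[float]:
    # Placeholder: call your embedding API here.
    if random.random() < 0.3:  # simulate intermittent rate limiting
        raise RateLimitError
    return [0.0] * 1536

MAX_ATTEMPTS = 5

def worker(q: "queue.Queue[str]") -> None:
    while True:
        doc_id = q.get()
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                vector = embed(fetch_document(doc_id))
                print(f"indexed {doc_id} ({len(vector)} dims)")
                break
            except RateLimitError:
                # Exponential backoff, then retry just this document.
                time.sleep(2 ** attempt)
        else:
            # All attempts failed: re-enqueue it. In production you would
            # route it to a dead-letter queue instead of looping forever.
            q.put(doc_id)
        q.task_done()
```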
You don't want to re-index 1 million documents when only one of them changed. Use **Hashes**. Before indexing, compare the hash of the current document to the one stored in your SQL DB, and only generate new embeddings if the hashes differ.
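A sketch of that check, using SQLite as a stand-in for your SQL DB; the `doc_hashes` table and `needs_reindex` helper are illustrative names, not a prescribed schema:

```python
import hashlib
import sqlite3

# Hypothetical table: doc_hashes(doc_id TEXT PRIMARY KEY, sha256 TEXT)
def needs_reindex(conn: sqlite3.Connection, doc_id: str, content: str) -> bool:
    new_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT sha256 FROM doc_hashes WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    if row is not None and row[0] == new_hash:
        return False  # unchanged: skip the embedding call entirely
    # New or changed: record the fresh hash, then let the caller re-embed.
    conn.execute(
        "INSERT INTO doc_hashes(doc_id, sha256) VALUES (?, ?) "
        "ON CONFLICT(doc_id) DO UPDATE SET sha256 = excluded.sha256",
        (doc_id, new_hash),
    )
    conn.commit()
    return True
```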
Q: "How do you handle 'Large Document' RAG where the answer is scattered across 10 pages?"
Architect Answer: "We use a **Two-Stage Retrieval** or **Map-Reduce** pattern. First, we summarize each page/chunk. Then, we use the summaries to find the relevant chunks. Finally, we pass the *full* text of only those specific chunks to the model. This allows us to handle documents that are physically larger than the LLM's context window."
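A toy sketch of the two-stage pattern: a cheap lexical scorer stands in for real embedding similarity over the summaries, and only the top chunks' *full* text is returned for the prompt. `Chunk`, `overlap_score`, and `two_stage_retrieve` are illustrative names:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: str
    summary: str    # produced offline by an LLM summarization pass
    full_text: str  # the original page/chunk, stored separately

def overlap_score(query: str, text: str) -> float:
    # Toy lexical scorer standing in for embedding similarity.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def two_stage_retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[str]:
    # Stage 1: rank cheaply over the short summaries.
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c.summary),
                    reverse=True)
    # Stage 2: pass the FULL text of only the top-k chunks to the model,
    # so the prompt stays within the context window.
    return [c.full_text for c in ranked[:k]]
```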