Troubleshooting: ChatGPT losing context on long docs
In documents of 50k+ tokens, ChatGPT summarizes the early sections fine but then drops references and misattributes quotes. I'm looking for strategies or tool combinations that preserve attribution.
Best tools for this use case
Based on the workflow in this discussion, these tools are useful starting points to review.
ChatGPT
Best all-round AI assistant for broad knowledge work and workflow acceleration.
Claude
Excellent for careful reasoning, long-form thinking and structured analysis.
Gemini
Strong AI assistant for users already working inside Google's ecosystem.
Answers
Don't load the whole document at once. Split it into chunks of 500–1,000 tokens with 100–200 tokens of overlap, and store each chunk's embedding alongside metadata (doc ID, section, offsets). Build section-level summaries and an index map, then summarize hierarchically (section → chapter → full document). For quotes, always retrieve the original chunk and paste the exact excerpt with its source ID; don't rely on the model's memory. For the best attribution, use a long-context model such as Claude, or a RAG pipeline.
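A minimal Python sketch of the chunking and quote-lookup steps, under some loud assumptions: whitespace splitting stands in for a real tokenizer, the `Chunk` fields and the `chunk_document` / `find_quote` helpers are hypothetical names, and embeddings and summaries are left out entirely. It only shows how overlap and offsets can travel with each chunk so that a quote can be traced back to a source ID.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    doc_id: str
    chunk_id: str   # hypothetical source ID format: "<doc_id>#<index>"
    start: int      # offset of the first token (inclusive)
    end: int        # offset just past the last token (exclusive)
    text: str


def chunk_document(doc_id: str, text: str,
                   size: int = 800, overlap: int = 150) -> List[Chunk]:
    """Split text into overlapping chunks and record token offsets."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    # Assumption: whitespace tokens approximate model tokens.
    tokens = text.split()
    step = size - overlap
    chunks: List[Chunk] = []
    start = 0
    while start < len(tokens):
        end = min(start + size, len(tokens))
        chunks.append(Chunk(
            doc_id=doc_id,
            chunk_id=f"{doc_id}#{len(chunks)}",
            start=start,
            end=end,
            text=" ".join(tokens[start:end]),
        ))
        if end == len(tokens):
            break  # the last chunk reached the end of the document
        start += step
    return chunks


def find_quote(chunks: List[Chunk], quote: str) -> Optional[Chunk]:
    """Return the first chunk containing the quote verbatim, else None."""
    # Normalize whitespace the same way the chunk text was built.
    needle = " ".join(quote.split())
    for chunk in chunks:
        if needle in chunk.text:
            return chunk
    return None
```

In use, a quote the model produces would be passed through `find_quote` before it goes into the summary: if a chunk comes back, cite its `chunk_id` and offsets next to the excerpt; if `None` comes back, treat the quote as unverified rather than trusting the model's recall. The overlap is what lets a quote that straddles a chunk boundary still be found whole in one chunk, provided it is shorter than the overlap.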