MarkTechPost: DFlash Speculative Decoding Drafts Whole Token Blocks in Parallel for Up to 15x Higher Throu…
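Original summary: UC San Diego's DFlash replaces autoregressive drafting with a lightweight block diffusion model for speculative decoding. It drafts whole token blocks in a single forward pass and … Source: MarkTechPost. Read the original article next, and check in particular which tool entry points it affects, what it costs, what risks it carries, and how it holds up in real usage scenarios.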
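
To make the draft-then-verify idea in the summary concrete, here is a minimal toy sketch of speculative decoding with block drafts. It is not DFlash's implementation: the block diffusion drafter is replaced by a hypothetical stand-in function (`toy_draft_block`), the target model is a hypothetical deterministic rule (`toy_target_next`), and verification uses simple greedy matching, which is an assumption (production systems typically verify the whole block in one batched target forward pass and may use rejection sampling instead).

```python
from typing import Callable, List

Tokens = List[int]


def verify_block(prefix: Tokens, draft: Tokens,
                 target_next: Callable[[Tokens], int]) -> Tokens:
    """Greedy verification: keep drafted tokens while they match the target.

    On the first mismatch, emit the target's own token and stop. If every
    drafted token matches, emit one extra "bonus" token from the target.
    A real system would score all positions in one batched forward pass;
    this toy version calls the target position by position.
    """
    accepted: Tokens = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # correction from the target model
            return accepted
        accepted.append(tok)
    accepted.append(target_next(prefix + accepted))  # bonus token
    return accepted


def speculative_decode(prompt: Tokens,
                       draft_block: Callable[[Tokens, int], Tokens],
                       target_next: Callable[[Tokens], int],
                       max_new_tokens: int,
                       block_size: int = 4) -> Tokens:
    """Alternate between drafting a whole block and verifying it."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        draft = draft_block(out, block_size)  # one call drafts the whole block
        out.extend(verify_block(out, draft, target_next))
    return out[: len(prompt) + max_new_tokens]


def greedy_decode(prompt: Tokens, target_next: Callable[[Tokens], int],
                  max_new_tokens: int) -> Tokens:
    """Baseline: plain token-by-token greedy decoding with the target."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


# Hypothetical toy models, for illustration only.
def toy_target_next(tokens: Tokens) -> int:
    """Target model stand-in: count upward modulo 10."""
    return (tokens[-1] + 1) % 10


def toy_draft_block(tokens: Tokens, k: int) -> Tokens:
    """Drafter stand-in: usually right, but deliberately wrong on token 5."""
    block: Tokens = []
    last = tokens[-1]
    for _ in range(k):
        last = (last + 1) % 10
        block.append(0 if last == 5 else last)
    return block
```

With greedy verification, the speculative output matches plain greedy decoding of the target model token for token; the drafter only changes how many target steps are needed, not what is generated. Any speedup in practice depends on how often the drafted block is accepted.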