Mooncake in SGLang PD Disaggregation
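The Prefill and Decode nodes receive a request at the same time.
The Decode node first reserves space for the KV cache and then bootstraps the request on the Prefill node. Once prefill finishes, the Prefill node sends the KV cache directly to the Decode node.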
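The ordering described above (reserve, bootstrap, then transfer) can be illustrated with a toy model. This is only a sketch under stated assumptions: `DecodeNode`, `PrefillNode`, and all of their methods are hypothetical names invented for illustration, not SGLang or Mooncake APIs, and the "KV cache" is just a list of strings.

```python
from dataclasses import dataclass, field


# All classes and methods here are hypothetical, for illustration only.
@dataclass
class DecodeNode:
    """Toy decode worker that owns a fixed pool of KV cache slots."""
    free_slots: int
    reserved: dict = field(default_factory=dict)  # request id -> slots reserved
    received: dict = field(default_factory=dict)  # request id -> KV entries received

    def preallocate(self, rid: str, num_tokens: int) -> bool:
        # Step 1: reserve KV cache space before any KV data is sent.
        if num_tokens > self.free_slots:
            return False
        self.free_slots -= num_tokens
        self.reserved[rid] = num_tokens
        self.received[rid] = []
        return True

    def on_chunk(self, rid: str, chunk: list) -> None:
        # Step 3: incoming KV chunks land in the space reserved in step 1.
        self.received[rid].extend(chunk)

    def ready(self, rid: str) -> bool:
        # Decoding can start once every reserved slot has been filled.
        return len(self.received[rid]) == self.reserved[rid]


@dataclass
class PrefillNode:
    """Toy prefill worker that only sends KV to a bootstrapped destination."""
    destinations: dict = field(default_factory=dict)  # request id -> DecodeNode

    def bootstrap(self, rid: str, decode: DecodeNode) -> None:
        # Step 2: the decode side tells prefill where the KV cache should go.
        self.destinations[rid] = decode

    def run_prefill(self, rid: str, prompt: list, chunk_size: int = 2) -> None:
        if rid not in self.destinations:
            raise RuntimeError(f"request {rid} has not been bootstrapped")
        decode = self.destinations[rid]
        kv = [f"kv({tok})" for tok in prompt]  # stand-in for real KV tensors
        for i in range(0, len(kv), chunk_size):
            decode.on_chunk(rid, kv[i:i + chunk_size])


decode = DecodeNode(free_slots=8)
prefill = PrefillNode()
prompt = ["a", "b", "c", "d", "e"]

assert decode.preallocate("req0", len(prompt))  # reserve first
prefill.bootstrap("req0", decode)               # then bootstrap
prefill.run_prefill("req0", prompt)             # prefill pushes KV chunks
print(decode.ready("req0"))
```

Reserving space before bootstrapping means the Prefill side never has to wait for memory on the Decode side once the KV cache is ready to send.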
sequenceDiagram
    autonumber
    box Prefill
        participant forward_A
        participant forward_B
        participant CPULoop
        participant waiting_queue_p as waiting_queue
        participant sender
    end
    box Decode
        participant receiver
        participant transfer_queue
        participant prealloc
        participant waiting_queue_d as waiting_queue
        participant scheduler
    end
    Note over forward_A,scheduler: requests arrived, TTFT start
    activate scheduler
    activate scheduler
    activate scheduler
    loop
        prealloc->>prealloc: alloc for KV cache
    end
    prealloc->>transfer_queue: pop_prealloc
    activate transfer_queue
    activate transfer_queue
    activate transfer_queue
    Note over transfer_queue,receiver: init receiver
    activate receiver
    activate receiver
    activate receiver
    receiver->>waiting_queue_p: bootstrap prealloc
    waiting_queue_p->>CPULoop: new-seqs
    CPULoop->>+forward_A: batch 0 start
    waiting_queue_p->>CPULoop: new-seqs
    CPULoop->>+forward_B: batch 1 start
    forward_A->>-CPULoop: batch 0 sync
    CPULoop->>+sender: trans batch 0
    loop Every chunk
        sender->>receiver: send chunk
    end
    sender->>receiver: last chunk send aux
    sender->>-CPULoop: trans batch 0 finish
    receiver->>-transfer_queue: start decode
    transfer_queue->>-waiting_queue_d: pop_transfer
    Note right of scheduler: TTFT for req 0
    waiting_queue_d->>scheduler: first token
    deactivate scheduler
    waiting_queue_p->>CPULoop: new-seqs
    CPULoop->>+forward_A: batch 2 start
    forward_B->>-CPULoop: batch 1 sync
    CPULoop->>+sender: trans batch 1
    loop Every chunk
        sender->>receiver: send chunk
    end
    sender->>receiver: last chunk send aux
    sender->>-CPULoop: trans batch 1 finish
    receiver->>-transfer_queue: start decode
    transfer_queue->>-waiting_queue_d: pop_transfer
    Note right of scheduler: TTFT for req 1
    waiting_queue_d->>scheduler: first token
    deactivate scheduler
    forward_A->>-CPULoop: batch 2 sync
    CPULoop->>+sender: trans batch 2
    loop Every chunk
        sender->>receiver: send chunk
    end
    sender->>receiver: last chunk send aux
    sender->>-CPULoop: trans batch 2 finish
    receiver->>-transfer_queue: start decode
    transfer_queue->>-waiting_queue_d: pop_transfer
    Note right of scheduler: TTFT for req 2
    waiting_queue_d->>scheduler: first token
    deactivate scheduler
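The Prefill side of the diagram overlaps work: the CPU loop launches the forward pass of the next batch before it syncs and transfers the previous one. A minimal sketch of that event ordering follows. The function `overlap_schedule` is a hypothetical helper written for this post, not part of SGLang, and it models only the order of events, not real GPU or network timing.

```python
def overlap_schedule(num_batches: int) -> list[str]:
    """Return the Prefill-side event order when forward passes and KV
    transfers are overlapped with one batch in flight (hypothetical model)."""
    events: list[str] = []
    in_flight = None  # batch whose forward pass was launched but not yet synced

    for batch in range(num_batches):
        # Launch the next forward pass first so the GPU stays busy...
        events.append(f"batch {batch} start")
        # ...then sync the previous batch and hand its KV cache to the sender.
        if in_flight is not None:
            events.append(f"batch {in_flight} sync")
            events.append(f"trans batch {in_flight}")
        in_flight = batch

    # Drain the last batch, which has no successor to overlap with.
    if in_flight is not None:
        events.append(f"batch {in_flight} sync")
        events.append(f"trans batch {in_flight}")
    return events


for event in overlap_schedule(3):
    print(event)
```

For three batches this reproduces the order in the diagram: batch 1 starts before batch 0 is synced and transferred, and batch 2 starts before batch 1 is synced.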
Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit JMY Space when reposting!