1. FlashAttention tiling

2. Online softmax: principle and derivation of the formulas:

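Improvement over the original softmax and safe softmax: safe softmax needs two separate passes over the input, one to find the maximum and one to accumulate the sum of exponentials. Online softmax fuses these into a single pass by keeping a running maximum and rescaling the running sum whenever the maximum changes.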
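The single-pass idea can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names `safe_softmax` and `online_softmax` and the variable names `m` (running maximum) and `d` (running sum) are chosen here for illustration, and the inputs are assumed to be finite floats. The update used is d_i = d_{i-1} * exp(m_{i-1} - m_i) + exp(x_i - m_i), where m_i = max(m_{i-1}, x_i).

```python
import math


def safe_softmax(xs):
    """Baseline: one pass for the max, one for the sum, one to normalize."""
    m = max(xs)                                # pass 1: global maximum
    d = sum(math.exp(x - m) for x in xs)       # pass 2: sum of shifted exponentials
    return [math.exp(x - m) / d for x in xs]   # pass 3: normalize


def online_softmax(xs):
    """Max and sum are computed together in one pass, then normalized."""
    m = float("-inf")  # running maximum seen so far
    d = 0.0            # running sum of exp(x - m) relative to the current m
    for x in xs:
        m_new = max(m, x)
        # Rescale the old sum to the new maximum, then add the new term.
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / d for x in xs]   # final normalization pass


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0]
    print(safe_softmax(values))
    print(online_softmax(values))
```

Both functions should agree up to floating-point rounding. The normalization pass still remains, so the saving is in merging the max pass and the sum pass. Because the running sum can be rescaled at any point, the same update can also be applied one block of the input at a time.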