idea

  • for images, why not make the attention 4D (one weight per pair of pixel positions, without flattening the image)?
  • is there a way to exploit spatial relations with this approach?
  • even better, for n-d data, can we use n×n-d attention, following the same idea as above?
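
rough sketch of the idea above, to make the shapes concrete. Assumptions (mine, not settled in these notes): plain dot-product self-attention with no learned query/key/value projections, channels on the last axis, and a hypothetical helper name `nd_self_attention`. For an image of shape (H, W, d) the weights come out 4D with shape (H, W, H, W); for data with n spatial axes they have 2n axes.

```python
import numpy as np

def nd_self_attention(x):
    """Dot-product self-attention over an array of shape (*spatial, d).

    Returns (out, weights) where out has shape (*spatial, d) and
    weights has shape (*spatial, *spatial): query axes first, key axes last.
    """
    n = x.ndim - 1            # number of spatial axes
    d = x.shape[-1]           # feature (channel) size

    # scores[q..., k...] = <x[q...], x[k...]> / sqrt(d)
    scores = np.tensordot(x, x, axes=([n], [n])) / np.sqrt(d)

    # softmax jointly over all key axes (the last n axes)
    key_axes = tuple(range(n, 2 * n))
    scores = scores - scores.max(axis=key_axes, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=key_axes, keepdims=True)

    # out[q...] = sum over k... of weights[q..., k...] * x[k...]
    out = np.tensordot(weights, x, axes=(key_axes, tuple(range(n))))
    return out, weights

# image case: 4D attention weights
img = np.random.default_rng(0).normal(size=(4, 5, 8))
out, w = nd_self_attention(img)
print(w.shape)    # (4, 5, 4, 5)
print(out.shape)  # (4, 5, 8)
```

note: without anything extra this gives the same numbers as flattening the pixels and running ordinary attention. What the 4D layout adds is that row and column indices stay separate, so a term depending on the offset between query and key positions could in principle be added to `scores` before the softmax; that part is not done here.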