I need to optimise a PyTorch convolution operation, EquiConv, described here [login to view URL] and implemented in PyTorch here [login to view URL].
The goal is to understand this code and suggest ways to optimise it, for example by precomputing as much as possible (with a lookup table, LUT?) or by finding a different, faster way to implement it.
Here is an idea using meshgrid: [login to view URL]
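The linked meshgrid idea is not reproduced here; this sketch only shows the general pattern of building the sampling positions for every output pixel and every kernel tap in one vectorised pass with `torch.meshgrid` and broadcasting, instead of Python loops over pixels. It assumes a SphereNet-style inverse gnomonic projection of a k x k kernel onto the sphere. EquiConv's exact formula may differ, so treat `equirect_offsets` (a hypothetical name) as illustrative. The output follows the `(dy, dx)`-per-tap channel layout that `torchvision.ops.deform_conv2d` expects, assuming stride 1 and padding `(k // 2) * dilation`.

```python
import math

import torch


def equirect_offsets(h, w, kernel_size=3, dilation=1):
    """Hypothetical helper: per-pixel sampling offsets for a k x k kernel on an
    equirectangular image via inverse gnomonic projection (an assumption, not
    necessarily EquiConv's formula). Returns (2*k*k, h, w), (dy, dx) per tap."""
    r = kernel_size // 2
    d_lon = 2 * math.pi / w  # angular width of one pixel
    d_lat = math.pi / h      # angular height of one pixel

    # Latitude/longitude of every output pixel centre, built with meshgrid (no loops)
    rows = torch.arange(h, dtype=torch.float64)
    cols = torch.arange(w, dtype=torch.float64)
    ii, jj = torch.meshgrid(rows, cols, indexing="ij")  # (h, w)
    lat0 = math.pi / 2 - (ii + 0.5) * d_lat
    lon0 = (jj + 0.5) * d_lon - math.pi

    # Kernel taps in (dilated) pixel units, shaped (k*k, 1, 1) to broadcast over (h, w)
    taps = torch.arange(-r, r + 1, dtype=torch.float64) * dilation
    ky, kx = torch.meshgrid(taps, taps, indexing="ij")
    ky, kx = ky.reshape(-1, 1, 1), kx.reshape(-1, 1, 1)

    # Tap positions on the tangent plane; y points north, i.e. towards smaller row index
    x = torch.tan(kx * d_lon)
    y = -torch.tan(ky * d_lat)
    rho = torch.sqrt(x ** 2 + y ** 2)
    c = torch.atan(rho)
    # sin(c) / rho -> 1 as rho -> 0 (centre tap); avoid 0/0
    s = torch.where(rho > 0, torch.sin(c) / rho.clamp_min(1e-12), torch.ones_like(rho))

    # Inverse gnomonic projection back to the sphere, for all pixels and taps at once
    lat = torch.asin((torch.cos(c) * torch.sin(lat0) + y * s * torch.cos(lat0)).clamp(-1.0, 1.0))
    dlon = torch.atan2(x * s, torch.cos(lat0) * torch.cos(c) - y * torch.sin(lat0) * s)

    # Back to fractional pixel coordinates (longitude not wrapped: samples past the
    # left/right edge fall outside the image, so the padding decides what they read)
    row = (math.pi / 2 - lat) / d_lat - 0.5
    col = (lon0 + dlon + math.pi) / d_lon - 0.5

    # Offsets relative to the regular grid deform_conv2d would use: (i + ky, j + kx)
    dy = row - (ii + ky)
    dx = col - (jj + kx)
    # Interleave as [dy_0, dx_0, dy_1, dx_1, ...] -> (2*k*k, h, w)
    return torch.stack((dy, dx), dim=1).reshape(-1, h, w).float()


if __name__ == "__main__":
    print(equirect_offsets(32, 64).shape)  # torch.Size([18, 32, 64])
```

Because the result depends only on `(h, w, kernel_size, dilation)`, it is a natural candidate for the LUT cache. Near the poles the horizontal offsets grow large, which is the equirectangular stretching the kernel has to follow.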
Another approach could be to look into Deformable Convolutions v2 instead of v1 (see [login to view URL]).
A good way to test your speed improvements is to train the EfficientNet modified to support EquiConv ([login to view URL]) on a few dummy images and compare the time taken for training and for inference in each of these configurations:
- Standard Conv2D on CPU
- Standard Conv2D on GPU
- EquiConv on CPU
- EquiConv on GPU