One day there will be a comprehensive post and paper describing sharpened cosine similarity (SCS), but today is not that day. The concept and its applications are still congealing. In the meantime, here is a summary of activity to date.
Tips and Tricks
These are some things that have been reported to work so far. If you discover any new tricks please let me know so I can add them to the list!
- ► Here's a PyTorch implementation and here are a handful of examples. Links to TensorFlow implementations and other examples below. A minimal sketch of the layer itself follows this list.
- ► The big benefits of SCS appear to be parameter efficiency and architectural simplicity. It doesn't look like it's going to beat any accuracy records, and it doesn't run very fast.
- ► Skip the nonlinear activation layers, like ReLU and sigmoid, after SCS layers.
- ► Skip the dropout layers after SCS layers.
- ► Skip the normalization layers, like batch normalization or layer normalization, after SCS layers.
- ► Use MaxAbsPool instead of MaxPool. It selects the element with the highest magnitude of activity, even if it's negative. (A sketch follows this list.)
- ► Raising activities to the power p generally doesn't parallelize well on GPUs and TPUs. It will slow your code down a LOT compared to straight convolutions.
- ► Disabling the p parameters results in a huge speedup on GPUs, but this takes the "sharpened" out of SCS. Regular old cosine similarity is cool, but it is its own thing with its own limitations.
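For concreteness, here is a minimal PyTorch sketch of an SCS layer, implementing the formula the tips above assume: scs(s, k) = sign(s·k) · (|s·k| / ((‖s‖ + q)(‖k‖ + q)))^p, where s is an input patch, k is a kernel, p is a learned sharpening exponent, and q is a small learned floor on the magnitudes. It uses the Conv2D trick from the reimplementations listed below to compute patch dot products and norms. The class name, initialization, and log-space parameterization of p and q are illustrative choices, not a canonical implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharpenedCosineSimilarity(nn.Module):
    """Minimal sketch of a sharpened cosine similarity layer.

    scs(s, k) = sign(s.k) * (|s.k| / ((||s|| + q) * (||k|| + q)))**p
    where s is an input patch and k is a kernel. The exponent p and
    magnitude floor q are learned per output channel. Details here
    (names, init, parameterization) are assumptions, not canon.
    """

    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0):
        super().__init__()
        self.stride = stride
        self.padding = padding
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels,
                        kernel_size, kernel_size) * 0.1)
        # Learn p and q in log space so they stay positive.
        self.log_p = nn.Parameter(torch.zeros(out_channels))
        self.log_q = nn.Parameter(torch.full((out_channels,), -5.0))

    def forward(self, x):
        p = torch.exp(self.log_p).reshape(1, -1, 1, 1)
        q = torch.exp(self.log_q).reshape(1, -1, 1, 1)

        # Dot product between each input patch and each kernel.
        dot = F.conv2d(x, self.weight,
                       stride=self.stride, padding=self.padding)

        # Magnitude of each input patch, via a ones-kernel convolution
        # over the squared input (the Conv2D speedup trick noted below).
        ones = torch.ones_like(self.weight[:1])
        patch_norm = torch.sqrt(
            F.conv2d(x * x, ones,
                     stride=self.stride, padding=self.padding)
            .clamp(min=1e-12))

        # Magnitude of each kernel.
        kernel_norm = (self.weight.flatten(1).norm(dim=1)
                       .reshape(1, -1, 1, 1))

        cos = dot / ((patch_norm + q) * (kernel_norm + q))
        # Sharpen: raise the magnitude to the power p, keep the sign.
        return torch.sign(cos) * torch.abs(cos).clamp(min=1e-12) ** p
```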
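And a matching sketch of MaxAbsPool, assuming the behavior described above: pool over the magnitudes, then return the original signed values. It is built on max_pool2d with return_indices; the class name is hypothetical.

```python
class MaxAbsPool2d(nn.Module):
    """Pooling that keeps the element with the largest magnitude,
    preserving its sign. Minimal sketch, not a canonical version."""

    def __init__(self, kernel_size, stride=None):
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride

    def forward(self, x):
        # Pool on |x| to find the winners, then gather the original
        # signed values at those positions.
        _, idx = F.max_pool2d(x.abs(), self.kernel_size, self.stride,
                              return_indices=True)
        flat = x.flatten(2)
        return flat.gather(2, idx.flatten(2)).view_as(idx)
```

Following the tips above, a typical block would then be an SCS layer followed directly by MaxAbsPool, with no activation, dropout, or normalization in between.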
Reverse Chronology
2022-03-11 code by Phil Sodmann. PyTorch Lightning demo on the Fashion MNIST data.
2022-02-25 experiments and analysis by Lucas Nestler. TPU implementation of SCS. Runtime performance comparison with and without the p parameter.
2022-02-24 code and analysis by Dr. John Wagner. Head-to-head comparison with a convnet on the American Sign Language alphabet dataset.
2022-02-22 code by Håkon Hukkelås. Reimplementation of SCS in PyTorch with a performance boost from using Conv2D. Achieved 91.3% CIFAR-10 accuracy with a model of 1.2M parameters.
2022-02-21 code by Zimonitrome. An SCS-based GAN, the first of its kind.
2022-02-20 code by Michał Tyszkiewicz. Reimplementation of SCS in PyTorch with a performance boost from using Conv2D.
2022-02-20 code by Lucas Nestler. Reimplementation of SCS in PyTorch with a performance boost and CUDA optimizations.
2022-02-18 blog post by Raphael Pisoni. SOTA parameter efficiency on MNIST. Intuitive feature interpretation.
2022-02-17 PyTorch code by Brandon Rohrer. SCS model with 95.3k parameters and 15.9% error on CIFAR-10.
2022-02-16 PyTorch code by Brandon. SCS model with 68k parameters and 18.4% error on CIFAR-10.
2022-02-14 PyTorch code by Brandon. PyTorch implementation of SCS running on Fashion MNIST.
2022-02-01 PyTorch code by Stephen Hogg. PyTorch implementation of SCS. MaxAbsPool implementation.
2022-02-01 PyTorch code by Oliver Batchelor. PyTorch implementation of SCS.
2022-01-31 PyTorch code by Ze Wang. PyTorch implementation of SCS.
2022-01-30 Keras code by Brandon. Keras implementation of SCS running on Fashion MNIST.
2022-01-17 code by Raphael. Implementation of SCS in paired depthwise/pointwise configuration, the key element of the ConvMixer architecture.
2022-01-06 Keras code by Raphael. Keras implementation of SCS.
2020-02-24 Twitter thread by Brandon. Justification and introduction of SCS.