digantamisra98 / Mish
Official Repository for "Mish: A Self Regularized Non-Monotonic Neural Activation Function" [BMVC 2020]
Mish: Self Regularized Non-Monotonic Activation Function
BMVC 2020 (Official Paper)

Notes:
• A considerably faster CUDA-based version can be found here - Mish CUDA (all credits to Thomas Brandon for the same)
• A memory-efficient experimental version of Mish can be found here
• Faster variants of Mish and H-Mish by Yashas Samaga can be found here - ConvolutionBuildingBlocks
• An alternative (experimental, improved) variant of H-Mish developed by Páll Haraldsson can be found here - H-Mish (available in Julia)
• A variance-based initialization method for Mish (experimental) by Federico Andres Lois can be found here - Mish_init

Changelogs/Updates:
• [07/17] Mish added to OpenVINO - Open-1187, Merged-1125
• [07/17] Mish added to BetaML.jl
• [07/17] Loss-landscape exploration in progress, in collaboration with Javier Ideami and Ajay Uppili Arasanipalai
• [07/17] Poster accepted for presentation at DLRLSS, hosted by MILA, CIFAR, Vector Institute and AMII
• [07/20] Mish added to Google's AutoML - 502
• [07/27] Mish paper accepted to the 31st British Machine Vision Conference (BMVC), 2020. ArXiv version to be updated soon.
• [08/13] New updated PyTorch benchmarks and pretrained models available at PyTorch Benchmarks.
• [08/14] New updated arXiv version of the paper is out.
• [08/18] Mish added to Sony Nnabla - Merged-700
• [09/02] Mish added to TensorFlow Swift APIs - Merged-1068
• [06/09] Official paper and presentation video for BMVC released at this link.
• [23/09] CSP-p7 + Mish (multi-scale) is currently the SOTA in object detection on MS-COCO test-dev, while CSP-p7 + Mish (single-scale) is currently the 3rd-best model in object detection on MS-COCO test-dev. Further details on the Papers with Code leaderboards.
• [11/11] Mish added to TFLearn - Merged 1159 (follow-up 1141)
• [17/11] Mish added to MONAI - Merged 1235
• [20/11] Mish added to PlaidML - Merged 1566
• [10/12] Mish added to Simd and Synet - Docs
• [14/12] Mish added to OneFlow - Merged 3972
• [24/12] Mish added to GPT-Neo
• [21/04] Mish added to TensorFlow.js
• [02/05] Mish added to Axon
• [26/05] 🔥 Mish added to PyTorch; will ship in PyTorch 1.9. 🔥
• [27/05] Mish added to PyTorch YOLOv3
• [09/06] 🔥 Mish added to MXNet.
• [03/07] Mish added to TorchSharp.
• [05/08] Mish added to KotlinDL.

News/Media Coverage:
• (02/2020): Podcast episode on Mish at Machine Learning Café is out now.
• (02/2020): Talk on Mish and non-linear dynamics at Sicara is out now.
• (07/2020): CROWN: a comparison of morphology for Mish, Swish and ReLU, produced in collaboration with Javier Ideami.
• (08/2020): Talk on Mish and non-linear dynamics at Computer Vision Talks.
• (12/2020): Talk on *From Smooth Activations to Robustness to Catastrophic Forgetting* at the Weights & Biases Salon is out now.
• (12/2020): Weights & Biases integration is now added 🔥. Get started.
• (08/2021): A comprehensive hardware-based computation-performance benchmark for Mish has been conducted by Benjamin Warner. Blogpost.

MILA/CIFAR 2020 DLRLSS

Contents:
• Mish
  a. Loss landscape
• ImageNet Scores
• MS-COCO
• Variation of Parameter Comparison
  a. MNIST
  b. CIFAR10
• Significance Level
• Results
  a. Summary of Results (Vision Tasks)
  b. Summary of Results (Language Tasks)
• Try It!
• Acknowledgements
• Cite this work

Mish:

Mish is defined as:

f(x) = x · tanh(softplus(x)) = x · tanh(ln(1 + eˣ))

The minimum of f(x) is observed to be ≈ −0.30884 at x ≈ −1.1924. Mish has an order of continuity of C∞.

Derivative of Mish in terms of Swish with Δ(x) preconditioning:

f′(x) = sech²(softplus(x)) · x · sigmoid(x) + tanh(softplus(x))

Further simplifying:

f′(x) = Δ(x) · swish(x) + f(x)/x

Alternative derivative form:

f′(x) = (eˣ · ω) / δ²

where:

ω = 4(x + 1) + 4e²ˣ + e³ˣ + eˣ(4x + 6) and δ = 2eˣ + e²ˣ + 2

We hypothesize that Δ(x) exhibits the properties of a pre-conditioner, making the gradient smoother. Further details are provided in the paper.

Loss Landscape:

To visit the interactive loss-landscape visualizer, click here. Loss-landscape visualizations for a ResNet-20 on CIFAR-10 using ReLU, Mish and Swish (from L-R), trained for 200 epochs:

Mish provides better accuracy, overall lower loss, and a smoother, well-conditioned, easier-to-optimize loss landscape compared to both Swish and ReLU. For all loss-landscape visualizations, please visit this readme. We also investigate the output landscape of randomly initialized neural networks, where Mish shows a much smoother profile than ReLU.

ImageNet Scores:

*For installing the DarkNet framework, please refer to darknet (AlexeyAB).*
*For PyTorch-based ImageNet scores, please refer to this readme.*

|Network|Activation|Top-1 Accuracy|Top-5 Accuracy|cfg|Weights|Hardware|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|ResNet-50|Mish|74.244%|92.406%|cfg|weights|AWS p3.16xlarge, 8× Tesla V100|
|DarkNet-53|Mish|77.01%|93.75%|cfg|weights|AWS p3.16xlarge, 8× Tesla V100|
|DenseNet-201|Mish|76.584%|93.47%|cfg|weights|AWS p3.16xlarge, 8× Tesla V100|
|ResNeXt-50|Mish|77.182%|93.318%|cfg|weights|AWS p3.16xlarge, 8× Tesla V100|

|Network|Activation|Top-1 Accuracy|Top-5 Accuracy|
|:---:|:---:|:---:|:---:|
|CSPResNet-50|Leaky ReLU|77.1%|94.1%|
|CSPResNet-50|Mish|**78.1%**|**94.2%**|
|||||
|PeleeNet|Leaky ReLU|70.7%|90%|
|PeleeNet|Mish|71.4%|90.4%|
|PeleeNet|Swish|**71.5%**|**90.7%**|
|||||
|CSPPeleeNet|Leaky ReLU|70.9%|90.2%|
|CSPPeleeNet|Mish|**71.2%**|**90.3%**|

Results on CSPResNeXt-50:

|MixUp|CutMix|Mosaic|Blur|Label Smoothing|Leaky ReLU|Swish|Mish|Top-1 Accuracy|Top-5 Accuracy|cfg|weights|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
||||||:heavy_check_mark:|||77.9% (=)|94% (=)|||
|:heavy_check_mark:|||||:heavy_check_mark:|||77.2% (−)|94% (=)|||
||:heavy_check_mark:||||:heavy_check_mark:|||78% (+)|94.3% (+)|||
|||:heavy_check_mark:|||:heavy_check_mark:|||78.1% (+)|94.5% (+)|||
||||:heavy_check_mark:||:heavy_check_mark:|||77.5% (−)|93.8% (−)|||
|||||:heavy_check_mark:|:heavy_check_mark:|||78.1% (+)|94.4% (+)|||
|||||||…
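The definition and derivative identities above can be sketched in plain Python. This is a minimal scalar sketch for illustration only (the function names `softplus`, `mish`, and `mish_grad` are our own, not from this repository); it checks the stated minimum and the Δ(x)-preconditioned derivative form against a finite difference:

```python
import math

def softplus(x):
    # Numerically stable softplus: ln(1 + e^x)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    # Mish: f(x) = x * tanh(softplus(x))
    return x * math.tanh(softplus(x))

def mish_grad(x):
    # Derivative in the preconditioned form:
    #   f'(x) = Delta(x) * swish(x) + f(x)/x
    # with Delta(x) = sech^2(softplus(x)), swish(x) = x * sigmoid(x),
    # and f(x)/x = tanh(softplus(x)).
    sp = softplus(x)
    delta = 1.0 / math.cosh(sp) ** 2        # sech^2(softplus(x))
    swish = x / (1.0 + math.exp(-x))        # x * sigmoid(x)
    return delta * swish + math.tanh(sp)

# The minimum of f is observed near x ≈ -1.1924:
print(mish(-1.1924))                        # ≈ -0.30884

# Central-difference check of the closed-form derivative at x = 0.5:
h = 1e-6
fd = (mish(0.5 + h) - mish(0.5 - h)) / (2 * h)
print(abs(fd - mish_grad(0.5)) < 1e-6)
```

For actual training, frameworks listed in the changelog above provide native implementations; for instance, PyTorch ships `torch.nn.Mish` as of version 1.9.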