Repository Overview: arduano/simdeez (README excerpt)
A library that abstracts over SIMD instruction sets, including ones with differing widths. SIMDeez is designed to let you write a function once and produce SSE2, SSE41, AVX2, AVX-512, Neon, and WebAssembly SIMD versions of it. You can have the version you want chosen either at compile time or automatically at runtime.

Originally developed by @jackmott; I have since volunteered to take over ownership. If there are intrinsics you need that are not currently implemented, create an issue and I'll add them. PRs to add more intrinsics are welcome. Currently things are well fleshed out for the i32, i64, f32, and f64 types.

AVX-512 support is now included for x86/x86_64 targets when the required target features are enabled, and runtime dispatch will select it ahead of AVX2 when those features are available.

Refer to the excellent Intel Intrinsics Guide for documentation on these functions.

Features

• SSE2, SSE41, AVX2, AVX-512, Neon, WebAssembly SIMD, and a scalar fallback
• Can be used with compile-time or runtime selection
• No runtime overhead
• Uses familiar Intel intrinsic naming conventions, so existing intrinsics code is easy to port
• Fills in missing intrinsics in older APIs with fast SIMD workarounds (ceil, floor, round, blend, etc.)
• Operator overloading for arithmetic
• Extract or set a single lane with the index operator
• Falls all the way back to scalar code for platforms with no SIMD or unsupported SIMD

SIMD math revival status

SIMDeez now includes a native, pure-Rust math surface for the restored historical SLEEF-backed function families (trigonometric, exponential/logarithmic, power, and remainder-style functions; the remainder family's name omits a suffix to reflect remainder semantics rather than an explicit ULP contract tier). These are exposed via extension traits and re-exported for convenience. The old implementation feature remains historical/deprecated and is **not** the primary implementation path for this revived surface.
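To make the "portable kernel + scalar fallback patching" idea concrete, here is a minimal, hypothetical sketch of what a SLEEF-style sin kernel looks like: range reduction, a polynomial evaluated with Horner's scheme, and a scalar patch for exceptional inputs. The coefficients are a plain Taylor series for illustration only; a real kernel would use tuned minimax coefficients, and none of these names come from the actual simdeez source.

```rust
// Hypothetical sketch of a portable sin-style kernel (not simdeez code).
// Coefficients are a Taylor series, accurate enough only to show the shape.
fn sin_kernel(x: f32) -> f32 {
    // "Scalar fallback patching": exceptional inputs bypass the fast path
    // so special-value semantics are preserved.
    if !x.is_finite() {
        return f32::NAN;
    }
    // Range reduction: fold x into [-pi, pi].
    let two_pi = core::f32::consts::TAU;
    let mut r = x % two_pi;
    if r > core::f32::consts::PI {
        r -= two_pi;
    } else if r < -core::f32::consts::PI {
        r += two_pi;
    }
    // Polynomial in r^2 via Horner's scheme:
    // sin(r) ≈ r * (1 - r^2/3! + r^4/5! - r^6/7! + r^8/9!)
    let r2 = r * r;
    let p = 1.0
        + r2 * (-1.0 / 6.0
            + r2 * (1.0 / 120.0
                + r2 * (-1.0 / 5040.0 + r2 * (1.0 / 362_880.0))));
    r * p
}
```

In the real layering, the same reduce-then-polynomial body would be written against backend-agnostic SIMD primitives so every backend can share it, with the scalar patch applied only to exceptional lanes.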
Kernel layering blueprint (v0.1)

The restored path now demonstrates the intended extension architecture:

• **Portable SIMD kernels** implement reduction + polynomial logic with backend-agnostic simdeez primitives.
• **Backend override dispatch** selects architecture-tuned kernels without changing the public API.
• **Hand-optimized backend implementations** provide real AVX2/FMA overrides.
• **Scalar fallback patching** remains centralized in the portable layer for exceptional lanes, preserving special-value semantics.

To add the next SLEEF-style function, follow the same pattern: start portable, wire up dispatch, then add optional backend overrides only where profiling justifies the complexity.

Benchmarking restored math

An in-repo Criterion benchmark target is available for this revived surface. It reports per-function throughput for:

• a native scalar loop baseline
• the simdeez runtime-selected path
• forced backend variants, when available on the host

Current expectation: some functions should show clear speedups on SIMD-capable backends (notably AVX2 on x86 hosts), others should now also show meaningful SIMD wins on realistic finite ranges, while the rest remain scalar-reference, quality-first baselines. Use these benches to validate both performance and dispatch behavior as new kernels and overrides are added.

Compared to packed_simd

• SIMDeez can abstract over differing SIMD widths; packed_simd cannot.
• SIMDeez builds on stable Rust today; packed_simd does not.

Compared to Faster

• SIMDeez can be used with runtime selection; Faster cannot.
• SIMDeez has faster fallbacks for some functions.
• SIMDeez does not currently work with iterators; Faster does.
• SIMDeez uses more idiomatic intrinsic syntax, while Faster uses more idiomatic Rust syntax.
• SIMDeez builds on stable Rust today; Faster does not.

All of the above could change! Faster seems to generally have the same performance as long as you don't run into some of the slower fallback functions.
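The backend-override layering described above can be sketched in plain Rust with a trait whose default methods are the portable kernels and whose implementors override only what profiling justifies. All names here (`MathBackend`, `Portable`, `TunedBackend`) are illustrative stand-ins, not the actual simdeez types, and the "tuned" body is a placeholder where real AVX2/FMA intrinsics would go.

```rust
// Hypothetical sketch of backend override dispatch (not simdeez code).
trait MathBackend {
    // Portable kernel: the backend-agnostic default every backend inherits.
    fn exp_kernel(&self, x: f64) -> f64 {
        x.exp() // stand-in for the shared polynomial kernel
    }
    fn name(&self) -> &'static str;
}

// The portable backend takes every default as-is.
struct Portable;
impl MathBackend for Portable {
    fn name(&self) -> &'static str { "portable" }
}

// A hand-optimized backend overrides only the kernels where profiling
// justified the complexity; everything else falls through to the defaults.
struct TunedBackend;
impl MathBackend for TunedBackend {
    fn exp_kernel(&self, x: f64) -> f64 {
        // A real override would use AVX2/FMA intrinsics here.
        x.exp()
    }
    fn name(&self) -> &'static str { "tuned" }
}

// Dispatch picks a backend once; callers only ever see the trait, so the
// public API is unchanged whether or not an override exists.
fn select_backend(have_fast_simd: bool) -> Box<dyn MathBackend> {
    if have_fast_simd { Box::new(TunedBackend) } else { Box::new(Portable) }
}
```

The key property is that adding a new override never touches call sites: they go through the trait, and the selection logic alone decides which implementation runs.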
Example

This will generate the following functions for you:

• the generic version of your function
• a scalar fallback
• an SSE2 version
• an SSE41 version
• an AVX2 version
• an AVX-512 version
• a Neon version
• a WebAssembly SIMD version
• a runtime_select version, which picks the fastest of the above at runtime

You can use any of these you wish, though typically you would use the runtime_select version unless you want to force an older instruction set to avoid throttling, or for other arcane reasons.

Optionally you can use the compile-time variant of the macro in the same way. This will produce 2 active functions via compile-time feature attributes:

• the generic version of your function
• the fastest instruction set available for the given compile-time feature set

You may also forgo the macros if you know what you are doing; just keep in mind there are lots of arcane subtleties with inlining and target_feature that must be managed. See how the macros expand for more detail.
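Conceptually, the runtime_select entry point the macro generates boils down to the pattern below: several compiled variants of one function plus a selector that probes CPU features and forwards to the fastest available variant. This is a hand-written illustration, not macro output; the variant bodies are stand-ins, and the feature probe is stubbed with constants (real code would use `is_x86_feature_detected!` or the platform equivalent).

```rust
// Hypothetical sketch of runtime selection (not actual macro expansion).
fn kernel_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

// In generated code these bodies differ per instruction set; here they
// just delegate to the scalar version as placeholders.
fn kernel_sse2(xs: &[f32]) -> f32 { kernel_scalar(xs) }
fn kernel_avx2(xs: &[f32]) -> f32 { kernel_scalar(xs) }

fn kernel_runtime_select(xs: &[f32]) -> f32 {
    // Assumption: feature probes stubbed for portability. Real code would
    // call is_x86_feature_detected!("avx2") etc., ordered fastest-first.
    let avx2_available = false;
    let sse2_available = true;
    if avx2_available {
        kernel_avx2(xs)
    } else if sse2_available {
        kernel_sse2(xs)
    } else {
        kernel_scalar(xs)
    }
}
```

The compile-time variant removes the probe entirely: feature flags decide at build time which single variant the generic function resolves to.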