spotify / basic-pitch
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
Repository Overview (README excerpt)
Basic Pitch is a Python library for Automatic Music Transcription (AMT), using a lightweight neural network developed by Spotify's Audio Intelligence Lab. It's small, easy-to-use, `pip install`-able and `npm install`-able via its sibling repo.
Basic Pitch may be simple, but it's far from "basic"! `basic-pitch` is efficient and easy to use, and its multipitch support, its ability to generalize across instruments, and its note accuracy compete with much larger and more resource-hungry AMT systems.
Provide a compatible audio file and `basic-pitch` will generate a MIDI file, complete with pitch bends. Basic Pitch is instrument-agnostic and supports polyphonic instruments, so you can freely enjoy transcription of all your favorite music, no matter what instrument is used. Basic Pitch works best on one instrument at a time.
Research Paper
This library was released in conjunction with Spotify's publication at ICASSP 2022. You can read more about this research in the paper, A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation. If you use this library in academic research, consider citing the paper.
**Note that we have improved Basic Pitch beyond what was presented in this paper. Therefore, if you use the output of Basic Pitch in academic research, we recommend that you cite the version of the code that was used.**
Demo
If, for whatever reason, you're not yet completely inspired, or you're just like so totally over the general vibe and stuff, check out our snappy demo website, basicpitch.io, to experiment with our model on whatever music audio you provide!
Installation
`basic-pitch` is available via PyPI. To install the current release, run `pip install basic-pitch`. To update Basic Pitch to the latest version, add `--upgrade` to the above command.
Compatible Environments:
• MacOS, Windows and Ubuntu operating systems
• Python versions 3.7, 3.8, 3.9, 3.10, 3.11
• **For Mac M1 hardware, we currently only support Python version 3.10.
Otherwise, we suggest using a virtual machine.**
Model Runtime
Basic Pitch comes with the original TensorFlow model and the TensorFlow model converted to CoreML, TensorFlowLite, and ONNX. By default, Basic Pitch will _not_ install TensorFlow as a dependency *unless you are using Python>=3.11*. Instead, by default, CoreML will be installed on MacOS, TensorFlowLite will be installed on Linux and ONNX will be installed on Windows. If you want to install TensorFlow along with the default model inference runtime, you can install it via `pip install basic-pitch[tf]`.
Usage
Model Prediction
Model Runtime
By default, Basic Pitch will attempt to load a model in the following order:
• TensorFlow
• CoreML
• TensorFlowLite
• ONNX
Additionally, the module variable ICASSP_2022_MODEL_PATH will default to the first available version in the list. We will explain how to override this priority list below. Because all other model serializations were converted from TensorFlow, we recommend using TensorFlow when possible. N.B. Basic Pitch does not install TensorFlow by default to save the user time when installing and running Basic Pitch.
Command Line Tool
This library offers a command line tool interface. A basic prediction command will generate and save a MIDI file transcription of the audio at the input audio path to the output directory: `basic-pitch <output-directory> <input-audio-path>`. To process more than one audio file at a time, list several input audio paths after the output directory.
Optionally, you may append any of the following flags to your prediction command to save additional formats of the prediction output to the output directory:
• `--sonify-midi` to additionally save a `.wav` audio rendering of the MIDI file.
• `--save-model-outputs` to additionally save raw model outputs as an NPZ file.
• `--save-note-events` to additionally save the predicted note events as a CSV file.
If you want to use a non-default model type (e.g., use CoreML instead of TF), use the `--model-serialization` argument. The CLI will change the loaded model to the type you prefer.
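The load order and the override described above can be pictured with a small sketch. This is not Basic Pitch's own code: `RUNTIME_PRIORITY`, `is_installed`, and `choose_runtime` are hypothetical names, and the Python import names paired with each runtime are assumptions made for illustration.

```python
from importlib.util import find_spec
from typing import Callable, Optional

# Load order stated in the text. The import names on the right are
# assumptions about which Python package backs each runtime.
RUNTIME_PRIORITY = (
    ("TensorFlow", "tensorflow"),
    ("CoreML", "coremltools"),
    ("TensorFlowLite", "tflite_runtime"),
    ("ONNX", "onnxruntime"),
)


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


def choose_runtime(
    preferred: Optional[str] = None,
    available: Callable[[str], bool] = is_installed,
) -> Optional[str]:
    """Pick the preferred runtime if given, else the first available one."""
    modules = dict(RUNTIME_PRIORITY)
    if preferred is not None:
        # An explicit choice overrides the priority list entirely.
        if preferred not in modules:
            raise ValueError(f"unknown runtime: {preferred}")
        if not available(modules[preferred]):
            raise RuntimeError(f"{preferred} is not installed")
        return preferred
    # Otherwise walk the list in order and stop at the first hit.
    for name, module in RUNTIME_PRIORITY:
        if available(module):
            return name
    return None
```

Calling `choose_runtime()` with no arguments mirrors the default behaviour, while `choose_runtime(preferred="CoreML")` mirrors asking for a specific model type. The `available` parameter exists only so the selection logic can be exercised without installing any of the runtimes.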
To discover more parameter control, run `basic-pitch --help`.
Programmatic
**predict()** Import `basic-pitch` into your own Python code and run the `predict` function directly, providing an `<input-audio-path>` and returning the model's prediction results:
• `<minimum-frequency>` & `<maximum-frequency>` (*float*s) set the minimum and maximum allowed note frequency, in Hz, returned by the model. Pitch events with frequencies outside of this range will be excluded from the prediction results.
• `model_output` is the raw model inference output
• `midi_data` is the transcribed MIDI data derived from the `model_output`
• `note_events` is a list of note events derived from the `model_output`
Note: As mentioned previously, ICASSP_2022_MODEL_PATH will default to the first supported runtime in the list TensorFlow, CoreML, TensorFlowLite, ONNX.
**predict() in a loop** To run prediction within a loop, you'll want to load the model yourself and provide `predict()` with the loaded model object, so that repeated prediction calls avoid redundant and sluggish model loading.
**predict_and_save()** If you would like `basic-pitch` to orchestrate the generation and saving of our various supported output file types, you may use `predict_and_save` instead of using `predict` directly, where:
• `<input-audio-path-list>` & `<output-directory>` - directory paths for `basic-pitch` to read from/write to.
• `<save-midi>` - *bool* to control generating and saving a MIDI file to the `<output-directory>`
• `<sonify-midi>` - *bool* to control saving a WAV audio rendering of the MIDI file to the `<output-directory>`
• `<save-model-outputs>` - *bool* to control saving the raw model output as an NPZ file to the `<output-directory>`
• `<save-notes>` - *bool* to control saving predicted note events as a CSV file to the `<output-directory>`
• `<model_or_model_path>` - *str* or *pathlib.Path* local path from which to load the model, e.g. the path given by ICASSP_2022_MODEL_PATH
Model Input
**Supported Audio Codecs** `basic-pitch` accepts all sound files that are compatible with its version of `librosa`, including:
• `.mp3`
• `.ogg`
• `.wav`
• `.flac`
• `.m4a`
**Mono Channel Audio Only** While you may use stereo audio as an input to our model, at prediction time the channels of the input will be down-mixed to mono, and then analyzed and transcribed.
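To make the mono down-mix concrete, here is a minimal sketch that averages the channels sample by sample. Averaging is one common convention and is an assumption here; the text above does not say which down-mix Basic Pitch applies, and `downmix_to_mono` is a hypothetical helper, not part of the library.

```python
from typing import List, Sequence


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equally long channels into a single mono signal."""
    if not channels:
        raise ValueError("at least one channel is required")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same number of samples")
    # zip(*channels) yields one tuple per sample position, holding that
    # sample from every channel; the mean of the tuple is the mono sample.
    return [sum(frame) / len(channels) for frame in zip(*channels)]


left = [1.0, 0.0, -1.0]
right = [0.0, 0.0, 1.0]
mono = downmix_to_mono([left, right])  # [0.5, 0.0, 0.0]
```

One consequence worth noting: content that is out of phase between the two channels cancels in the average, as the last sample shows, so a stereo recording and its mono down-mix may not transcribe identically.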
**File Size/Audio Length** This model can process audio of any size or length, but processing of larger/longer audio files could be limited by your machine's available disk space. To process these files, we recommend streaming the audio of the file, processing windows of audio at a time.
**Sample Rate** Input audio may be of any sample rate; however, all audio will be resampled…