
Lalals
AI-powered audio tools for music creation, voice manipulation, and audio enhancement.

Professional-grade polyphonic piano transcription with high-fidelity onset and velocity detection.

Onsets and Frames is an automatic music transcription (AMT) model for polyphonic piano, developed by the Google Magenta team. It uses separate output heads to detect where notes begin (onsets) and which notes are sounding in each frame (frames). The frame head is conditioned on the onset head, and during decoding a new note is only allowed to start where the onset head also fires, which gives noticeably higher note-level precision than classifiers that rely on frame activations alone. The model also regresses note velocity, allowing it to capture the expressive dynamics of a performance.

The architecture combines Convolutional Neural Networks (CNNs) for feature extraction with bidirectional LSTMs for temporal modeling; later Magenta transcription models replace the recurrent layers with Transformers. Gating notes on detected onsets suppresses a common error of frame-only systems, where a brief dip in activation splits one long note into several short ones. As of 2026 it remains a widely used reference point for piano transcription. It is primarily distributed via the Magenta library and TensorFlow, and is popular with developers building DAW plugins, music education platforms, and digital archival tools that need accurate conversion of acoustic audio into editable MIDI data.
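One way separate onset and frame heads can reduce fragmented notes is to let a note start only on a frame where the onset head fires. The sketch below illustrates that idea for a single pitch. It is not the Magenta implementation: the function names, the 0.5 thresholds, and the plain-list representation of the per-frame probabilities are all assumptions made for illustration.

```python
from typing import List, Tuple

Note = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def decode_frames_only(frame_probs: List[float], frame_threshold: float = 0.5) -> List[Note]:
    """Baseline: every run of active frames becomes its own note."""
    notes: List[Note] = []
    start = None
    for t, p in enumerate(frame_probs):
        active = p >= frame_threshold
        if active and start is None:
            start = t
        elif not active and start is not None:
            notes.append((start, t))
            start = None
    if start is not None:
        notes.append((start, len(frame_probs)))
    return notes


def decode_notes(
    onset_probs: List[float],
    frame_probs: List[float],
    onset_threshold: float = 0.5,
    frame_threshold: float = 0.5,
) -> List[Note]:
    """Onset-gated decoding: a note may only begin where the onset head fires."""
    n = min(len(onset_probs), len(frame_probs))
    notes: List[Note] = []
    start = None          # start frame of the note currently sounding, if any
    prev_onset = False    # used to react only to the rising edge of an onset
    for t in range(n):
        onset_now = onset_probs[t] >= onset_threshold
        new_onset = onset_now and not prev_onset
        prev_onset = onset_now
        # Frames carrying an onset are treated as active (illustrative choice).
        active = onset_now or frame_probs[t] >= frame_threshold
        if start is not None and (new_onset or not active):
            # A fresh onset re-articulates the note; an inactive frame ends it.
            notes.append((start, t))
            start = None
        if new_onset:
            start = t
    if start is not None:
        notes.append((start, n))
    return notes


# One struck note whose frame activation dips briefly at frame 2.
onset_probs = [0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
frame_probs = [0.9, 0.8, 0.3, 0.8, 0.8, 0.1]
frames_only = decode_frames_only(frame_probs)   # two notes: the dip splits it
gated = decode_notes(onset_probs, frame_probs)  # one note: no onset, no restart
```

In this toy input the frame-only decoder reports a second note after the dip, while the gated decoder drops it because no onset was detected there.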
Onsets and Frames specializes in onset and offset detection, velocity regression, and editable MIDI generation.
Uses a dedicated loss term for the start of notes to prevent temporal blurring.
Predicts a MIDI velocity value (0-127) for every detected note onset.
While optimized for piano, the architecture can be re-trained for drums and other percussive instruments.
Supports GAN-based training cycles to improve realism in low-quality audio conditions.
Combines 2D convolutions with bidirectional LSTMs to model both spectral and temporal structure.
Processes audio as log-mel spectrograms with high frequency resolution.
Weights can be quantized and exported for mobile and browser-based real-time transcription.
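The dedicated onset loss and the per-onset velocity prediction listed above can be sketched as a small joint objective. This is a simplified illustration, not the training code shipped with Magenta: the helper names, the averaging, the unweighted sum of terms, and the linear mapping from a [0, 1] velocity to a MIDI value are assumptions.

```python
import math
from typing import List


def binary_cross_entropy(labels: List[float], probs: List[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over matching label/probability lists."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(labels)


def velocity_loss(
    onset_labels: List[float],
    velocity_labels: List[float],
    velocity_preds: List[float],
) -> float:
    """Squared velocity error, counted only where a ground-truth onset exists."""
    masked = [
        o * (v - v_hat) ** 2
        for o, v, v_hat in zip(onset_labels, velocity_labels, velocity_preds)
    ]
    n_onsets = sum(onset_labels)
    return sum(masked) / n_onsets if n_onsets else 0.0


def total_loss(
    onset_labels: List[float], onset_probs: List[float],
    frame_labels: List[float], frame_probs: List[float],
    velocity_labels: List[float], velocity_preds: List[float],
) -> float:
    """Sum of the onset, frame, and onset-masked velocity terms (unweighted)."""
    return (
        binary_cross_entropy(onset_labels, onset_probs)
        + binary_cross_entropy(frame_labels, frame_probs)
        + velocity_loss(onset_labels, velocity_labels, velocity_preds)
    )


def to_midi_velocity(v: float) -> int:
    """Map a normalized velocity to the 0-127 MIDI range (hypothetical linear scaling)."""
    return max(0, min(127, round(v * 127)))
```

Because the onset term is computed separately from the frame term, errors at note starts are penalized directly instead of being averaged away across the many sustained frames of a long note.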
Set up a Python 3.10+ environment.
Install TensorFlow and the Magenta library via pip.
Download the pre-trained 'onsets_frames_transcription' checkpoints from Google Cloud Storage.
Prepare a high-quality 16 kHz mono WAV file of a piano performance.
Configure the transcription script parameters (threshold, frame-stacking).
Run the inference command-line tool on the target audio file.
Analyze the generated .mid file in a DAW or MIDI visualizer.
Optional: Fine-tune the model on custom datasets using the provided training scripts.
Export the model to TFLite for edge-device deployment if required.
Integrate the Python wrapper into your application backend.
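The input-preparation and inference steps above can be sketched with the Python standard library alone. The helper names here are hypothetical, and the command name `onsets_frames_transcription_transcribe` with its `--model_dir` flag is an assumption about the script the Magenta package installs; check it against your installed version before relying on it. The sketch only builds the command and does not run it.

```python
import wave
from typing import List

EXPECTED_SAMPLE_RATE = 16000  # 16 kHz, as recommended in the steps above


def is_16khz_mono(path: str) -> bool:
    """Return True if the WAV file at `path` is mono and sampled at 16 kHz."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels() == 1 and wav.getframerate() == EXPECTED_SAMPLE_RATE


def build_transcribe_command(model_dir: str, wav_path: str) -> List[str]:
    """Build (but do not execute) the argument list for the transcription CLI.

    Assumption: the command name and --model_dir flag match the installed
    Magenta release; adjust them if your version differs.
    """
    return [
        "onsets_frames_transcription_transcribe",
        f"--model_dir={model_dir}",
        wav_path,
    ]
```

A backend could call `is_16khz_mono` before queuing a file and pass the list from `build_transcribe_command` to `subprocess.run`, resampling or rejecting audio that does not match the expected format.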
Verified feedback from other users.
"Highly praised by the research community for its breakthrough in note-on precision and expressive velocity capture."


The industry-standard, high-fidelity MP3 encoding engine for precision audio compression.

The industry-standard monophonic vocal transformer for pitch, formant, and saturation.

AI-powered voice isolation for crystal-clear communication in any environment.

Lightweight, open-source noise gate for zero-latency audio suppression.
Real-time, AI-powered microphone noise suppression for Linux environments.

Professional-grade AI music source separation and stem extraction for producers and DJs.
Opus is a totally open and royalty-free audio codec designed for versatile audio applications over the internet, including speech and music transmission.