1. Introduction
Soundsketcher is an interactive web-based tool for visualizing sound through dynamic, data-driven sketches. It translates perceptual audio features into abstract visual forms that help users "see" the texture, structure, and evolution of sound.
Whether you're a musician, sound designer, researcher, or student, Soundsketcher allows you to explore audio from a fresh, visual perspective, combining traditional audio features (e.g., loudness, roughness, pitch) with creative interpretations based on spectromorphological thinking.
- Upload audio and generate real-time visualizations.
- Choose between multiple sketching modes and feature sets.
- Examine sound structure using region and cluster analysis.
- Export your sketches for presentation or further editing.
Soundsketcher supports creative analysis, educational exploration, and artistic experimentation, providing an intuitive and perceptually grounded way to engage with sound.
2. Uploading Audio
To begin using Soundsketcher, you'll first need to upload an audio file. The system supports a wide range of audio formats, including .wav, .mp3, and .ogg. For best results, we recommend uncompressed formats such as .wav, although compressed formats are also supported.
How to Upload
- Simply drag and drop your audio file onto the upload area on the homepage.
- Alternatively, click the upload button and select a file from your device.
Once the upload is complete, Soundsketcher will check whether the file has already been analyzed:
- If the file has been previously processed, you'll be asked whether to:
- Use cached data (faster loading with existing analysis)
- Reanalyze the file (useful if feature settings such as frame size, hop length, or enabled features have changed)
- If it's a new file, the system will automatically begin analyzing it, extracting features and segmenting the audio.
Tip
Analysis time depends on the file's length and the selected processing options (e.g., onset detection, embeddings, traditional features).
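Conceptually, the cache check can be thought of as a lookup keyed by the audio content together with the analysis settings, so that changing frame size, hop length, or the enabled features invalidates the cached result. The sketch below only illustrates that idea in Python; the key function, settings fields, and file name are assumptions, not Soundsketcher's actual implementation.

```python
# Hypothetical sketch: one way a cached-analysis check could be keyed.
# The actual Soundsketcher backend may work differently.
import hashlib
import json

def analysis_cache_key(audio_bytes: bytes, settings: dict) -> str:
    """Combine the audio content and the analysis settings into one key, so that
    changing frame size, hop length, or enabled features triggers reanalysis."""
    h = hashlib.sha256()
    h.update(audio_bytes)
    h.update(json.dumps(settings, sort_keys=True).encode("utf-8"))
    return h.hexdigest()

with open("example.wav", "rb") as f:  # illustrative file name
    key = analysis_cache_key(f.read(), {"frame_size": 2048, "hop_length": 512})
print(key)  # same audio + same settings -> same key -> cached data can be reused
```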
3. Choosing Visualizations
After uploading and analyzing your audio, Soundsketcher offers several ways to visualize the sound. Each visualization highlights different aspects of the audio based on selected features, segmentation, and perceptual mappings.
Available Visualization Modes
Line-Based Sketch
Draws time-aligned vertical lines where each line represents a short audio frame. Features are mapped to line height, angle, thickness, and color. Use this for micro-level timbral detail and to observe frame-by-frame changes.
Polygon-Based Sketch
A variation of the line sketch mode that uses filled polygons instead of vertical lines. For each frame, a polygon is drawn with characteristics (shape, size, rotation, color) mapped from feature values. This mode provides a richer visual mass and stronger shape identity for each sound segment.
Trajectory Sketch
This mode plots individual data points for selected features (e.g., spectral centroid, roughness) over time and connects them into continuous paths. It emphasizes feature evolution over time rather than discrete frames.
Region-Based Blob Sketch
Visualizes macro segments (regions) as soft, blobby shapes, where:
- Width = duration
- Height = spectral bandwidth
- Vertical position = average spectral centroid
- Opacity = loudness
Helpful for grasping the overall structure and texture of the sound.
Gesture-Based Inner Sketch
Displays internal subregions or gestures within larger regions. These shapes are drawn based on feature shifts or detected onsets and emphasize internal variation inside sound objects.
4. Configuring Features
Soundsketcher lets you customize how audio features are used in each visualization. This makes it possible to focus on specific perceptual qualities or technical aspects of the sound.
Feature Selection Panel
After uploading a file, you'll find a feature configuration panel where you can:
- Enable or disable features
Each visualization mode uses one or more features (e.g., spectral centroid, roughness, brightness, pitch). You can toggle them on or off depending on which dimensions you want to emphasize.
- Assign features to visual dimensions
For example, in line or polygon modes, you can assign features to:
- Line or shape height
- Angle or rotation
- Thickness
- Color hue, saturation, or brightness
- Adjust scaling and clamping (a brief sketch follows this list)
- You can choose to clamp selected features using robust min/max values (to avoid outliers distorting the sketch).
- Unclamped features use raw values, useful for highlighting extremes.
- Logarithmic scaling is available for features like loudness or spectral centroid.
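As an illustration of clamping and scaling, here is a minimal Python sketch using NumPy. The percentile bounds, the log option, and the 0..1 output range are illustrative assumptions; Soundsketcher's internal scaling may differ.

```python
# Minimal sketch of robust clamping and optional log scaling, assuming a
# 1-D NumPy array of per-frame feature values (e.g., loudness or centroid).
import numpy as np

def scale_feature(values, clamp=True, log=False, lo_pct=5, hi_pct=95):
    values = np.asarray(values, dtype=float)
    if log:
        values = np.log1p(np.maximum(values, 0.0))  # compress wide-ranging features
    if clamp:
        lo, hi = np.percentile(values, [lo_pct, hi_pct])  # robust min/max
        values = np.clip(values, lo, hi)                  # keep outliers from distorting the sketch
    vmin, vmax = values.min(), values.max()
    if vmax == vmin:
        return np.zeros_like(values)        # flat feature -> flat sketch
    return (values - vmin) / (vmax - vmin)  # normalize to 0..1 for drawing

centroid = np.random.rand(500) * 8000       # fake per-frame centroid values in Hz
heights = scale_feature(centroid, clamp=True, log=True)
```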
Feature Presets
Soundsketcher may include presets that load recommended feature mappings for specific use cases (e.g., "Texture Exploration", "Pitch & Brightness", or "Noise vs Tonality"). You can also define and save your own.
Important Notes
- Feature mappings apply independently to each visualization mode.
- If you modify core analysis parameters (e.g., frame size or hop length), consider reanalyzing the file.
- Some features (like roughness or brightness) are computed per frame, while others (like cluster IDs or gestures) are region-based.
5. Playing and Navigating the Sketch
Once the visualization is generated, Soundsketcher allows you to interact with both the sound and its visual representation. This is key to understanding the relationship between audio events and the visual forms they produce.
Playback Controls
- Play / Pause
- Seek Bar: click or drag to move to a specific point in time.
- Loop / Repeat (optional in some modes)
Playback is synchronized with the visual sketch, allowing you to see the active frame or region as the audio progresses.
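The synchronization itself can be understood as a simple mapping from playback time to analysis frame index. The following Python sketch assumes a hop length of 512 samples at a 44.1 kHz sample rate; both values are assumptions, and the real analysis settings are whatever you chose before processing.

```python
# Sketch of the time-to-frame synchronization idea: given the analysis hop
# length and sample rate (assumed values below), the active frame index for
# the current playback time is a simple division.
SR = 44100          # sample rate used for analysis (assumption)
HOP_LENGTH = 512    # hop length in samples (assumption)

def active_frame(playback_time_s: float) -> int:
    """Return the analysis frame corresponding to the current playback time."""
    return int(playback_time_s * SR / HOP_LENGTH)

print(active_frame(2.5))  # e.g. 2.5 s into playback -> frame 215
```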
Visual Navigation
- Hover or click on lines, polygons, or regions to highlight corresponding sound segments.
- Explore visual gestures by scrubbing through the timeline.
- Inspect feature values through tooltips or information panels (if enabled).
6. Reading the Sketch
Each Soundsketcher visualization is built from audio features that have been mapped to visual parameters such as size, position, color, and shape. Understanding how to read these visualizations is essential for interpreting the sonic qualities they reflect.
General Principles
- The horizontal axis always represents time (left = earlier, right = later).
- The vertical axis varies depending on the visualization mode and feature mapping.
In Line and Polygon Sketches
Each frame of audio is represented by a line or polygon whose characteristics are derived from feature values. For example (a minimal mapping sketch follows this list):
- Line height → Spectral centroid (brightness)
- Line angle → Periodicity or pitch variation
- Line width → Roughness
- Color saturation → Loudness
- Polygon complexity → Multi-feature interaction
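The mapping sketch below turns one frame's normalized feature values (0..1) into line-drawing parameters along the lines listed above. The specific scaling factors and parameter names are illustrative assumptions, not Soundsketcher's exact internals.

```python
# Illustrative sketch of how one frame's normalized feature values (0..1)
# might drive a line's visual parameters; mirrors the example mappings above.
def frame_to_line(centroid, periodicity, roughness, loudness, canvas_height=400):
    return {
        "height": centroid * canvas_height,       # brighter frame -> taller line
        "angle_deg": (periodicity - 0.5) * 90.0,  # tilt between -45 and +45 degrees
        "width_px": 1.0 + roughness * 6.0,        # rougher frame -> thicker line
        "saturation": loudness,                   # louder frame -> more saturated color
    }

print(frame_to_line(centroid=0.8, periodicity=0.6, roughness=0.2, loudness=0.9))
```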
In Region-Based Sketches (Blob Mode)
- Width = Region duration
- Height = Spectral bandwidth
- Y-position = Average spectral centroid
- Opacity = Loudness or energy
- Texture or pattern (if enabled) = Roughness or granularity
This mode helps you see macro-structure, density, and contrast between sound events.
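The same idea, sketched for blob mode: a hypothetical region record (the field names are assumptions, not Soundsketcher's actual data format) is converted into width, height, vertical position, and opacity.

```python
# Illustrative sketch of the blob mapping described above.
def region_to_blob(region, px_per_second=100, canvas_height=400, max_centroid_hz=8000):
    return {
        "width": (region["end_s"] - region["start_s"]) * px_per_second,   # duration
        "height": region["bandwidth_hz"] / max_centroid_hz * canvas_height,
        "y": canvas_height * (1.0 - region["centroid_hz"] / max_centroid_hz),
        "opacity": region["loudness"],  # 0..1, louder regions are more opaque
    }

blob = region_to_blob({"start_s": 1.2, "end_s": 3.0, "bandwidth_hz": 1500,
                       "centroid_hz": 2200, "loudness": 0.7})
print(blob)
```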
In Gesture-Based Sketches
Subregions or gestures within each region are drawn as expressive inner shapes (strokes, curves, or blobs) indicating onsets, articulations, or changes in timbre.
Interpretation Tips
- Treat the sketch as a graphic score: it visually reflects perceptual aspects of the sound.
- Noisy or inharmonic regions often appear rough, irregular, or chaotic.
- Tonal or pitched regions tend to look smooth, regular, or centered.
7. Exporting Results
After exploring and refining your sketch, Soundsketcher allows you to export your work for documentation, presentations, publications, or creative projects.
Export Options
- SVG Export (Recommended)
Saves the visualization as a scalable vector graphic (.svg), ideal for high-resolution printing, editing in design tools, or academic use.
- PNG Export
Saves a raster image (bitmap) of the current view, useful for quick sharing or screenshots.
- CSV or JSON Data (Advanced)
Exports the underlying feature values, segmentation data, and timestamps for use in data analysis environments (e.g., Python, MATLAB, Excel); a loading sketch follows below.
If you plan to remix, manipulate, or overlay your sketch in other media, the SVG export is the most flexible option.
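If you export CSV data, it can be loaded directly into Python, for example with pandas. The column names below are assumptions about what the export might contain; check the actual file header first.

```python
# Sketch of loading an exported CSV for further analysis (column names assumed).
import pandas as pd

df = pd.read_csv("soundsketcher_export.csv")   # illustrative file name
print(df.columns.tolist())                     # e.g. time, spectral_centroid, loudness, ...

# Hypothetical example: plot loudness over time if those columns exist
# (pandas plotting requires matplotlib to be installed).
if {"time", "loudness"}.issubset(df.columns):
    df.plot(x="time", y="loudness")
```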
How to Export
- Use the export menu or download buttons near the sketch canvas.
- Select between full sketch export, current view, or region-specific export (if supported).
- Some visual modes may allow overlay or combined layer exports.
Tip
For best results, consider exporting:
- The sketch itself
- A screenshot of the feature configuration
- The original audio file or a waveform thumbnail
8. Advanced Options
Beyond basic visualization, Soundsketcher provides several advanced tools for analyzing the internal structure of sound and customizing how it's represented.
Clustering & Region Grouping
Soundsketcher can group similar regions using clustering algorithms based on multi-feature similarity (e.g., spectral shape, roughness, pitch):
- Each cluster is visually distinguished (e.g., by color, opacity, or shape style).
- Clusters help identify recurring sound objects, motifs, or categories.
- Visualized as background blobs, outlines, or overlay markers.
You can adjust:
- Clustering resolution (number of clusters or similarity threshold)
- Which features are used for clustering
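To illustrate the clustering step, the sketch below groups regions by multi-feature similarity with k-means from scikit-learn. Soundsketcher's actual algorithm and feature set are configurable and may differ; the feature matrix here is fabricated for the example.

```python
# Minimal sketch of grouping regions by multi-feature similarity with k-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per region, e.g. [centroid, bandwidth, roughness, loudness] (fabricated values)
regions = np.array([
    [2200, 1500, 0.20, 0.70],
    [2100, 1400, 0.25, 0.60],
    [600,   300, 0.80, 0.90],
    [650,   350, 0.75, 0.85],
])

X = StandardScaler().fit_transform(regions)       # put features on a common scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # regions with similar feature profiles share a cluster ID
```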
Subregions and Gestures
If onset detection or event segmentation is enabled, regions can be split into subregions:
- These reflect fine-grained internal divisions within a sound event
- Useful for highlighting dynamics, rhythm, or micro-gesture structures
- Visualized with expressive inner shapes inside main regions
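A minimal sketch of the underlying idea, using librosa's onset detector as a stand-in for whatever detector Soundsketcher applies: detected onset times become candidate subregion boundaries.

```python
# Sketch of splitting a sound into subregions at detected onsets.
import librosa

y, sr = librosa.load("example.wav", sr=None)  # illustrative file name
onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")

# Turn consecutive onset times into candidate subregion boundaries.
subregions = list(zip(onset_times[:-1], onset_times[1:]))
print(subregions[:5])  # [(t0, t1), (t1, t2), ...]
```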
Overlay and Comparison Tools
- Overlay pitch, brightness, or roughness as line graphs on top of other visual layers
- Compare different sketches or versions side-by-side
- Enable opacity blending or animated transitions for smoother exploration
Configuration Parameters
Advanced users can:
- Adjust frame size, hop length, smoothing filters
- Use logarithmic scaling or normalization presets
- Experiment with custom feature sets
These options provide greater precision and flexibility in how the sketch represents sound perception and structure.
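To see how frame size and hop length translate into time resolution, here is a short librosa-based sketch; the parameter values and file name are examples, and the exact analysis pipeline in Soundsketcher may differ.

```python
# Sketch of how frame size and hop length affect time resolution,
# using librosa's STFT conventions (values below are just examples).
import librosa

y, sr = librosa.load("example.wav", sr=None)
frame_size, hop_length = 2048, 512

frames_per_second = sr / hop_length     # analysis rate
window_duration = frame_size / sr       # seconds covered by one frame
centroid = librosa.feature.spectral_centroid(
    y=y, sr=sr, n_fft=frame_size, hop_length=hop_length)

print(f"{frames_per_second:.1f} frames/s, {window_duration * 1000:.1f} ms per window")
print(centroid.shape)  # (1, number_of_frames): smaller hop -> more frames to draw
```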
9. Troubleshooting and Tips
While Soundsketcher is designed for ease of use, you might occasionally encounter issues depending on your file type, browser, or settings. Here are some solutions and best practices:
Common Issues
My file won't upload
- Make sure it's a supported format (.wav, .mp3, .ogg, etc.).
- Ensure the file isn't corrupted or zero-length.
- Try a different browser or disable browser extensions.
- Try a different browser or disable browser extensions.
File uploads but nothing happens
- Make sure feature extraction has completed (watch for progress indicators).
- If you changed core analysis settings, reanalyze instead of loading cached data.
Sketch looks blank or strange
- Some feature combinations may result in minimal output (e.g., flat feature values).
- Try enabling other features or adjusting clamping/scaling settings.
- Zoom in or reset the sketch view.
Sketch is very slow
- Large audio files or high-res analysis (small hop size) may slow rendering.
- Try shorter excerpts or reduce analysis resolution.
Pro Tips
- Combine line-based sketches with region blobs for depth and clarity.
- Use clustering to spot recurring motifs or textures.
- Save and reuse presets to avoid repeating configurations.
- Export both the sketch and the raw feature data for deeper analysis.
Browser Compatibility
- Best supported on: Chrome, Firefox, and Edge (latest versions).
- Safari is supported but may have limited Web Audio API performance.
10. Project Team & Acknowledgements
Soundsketcher is the result of a collaborative research and development effort that brought together experts in music technology, perception, composition, and contemporary music theory. The project was developed within an academic and creative ecosystem supported by multiple institutions and individuals.
Project Team
- Emilios Cambouropoulos – Aristotle University of Thessaloniki
- Danae Stefanou – Aristotle University of Thessaloniki
- Maximos Kaliakatsos-Papakostas – Hellenic Mediterranean University
- Dimitris Maronidis – Aristotle University of Thessaloniki
- Konstantinos Giannos – Aristotle University of Thessaloniki
- Asterios Zacharakis – Aristotle University of Thessaloniki
- Savvas Kazazis – Aristotle University of Thessaloniki
- Alexandra Karamoutsiou – Aristotle University of Thessaloniki
- Konstantinos Velenis – Aristotle University of Thessaloniki
- Eva Matsigkou – Aristotle University of Thessaloniki
- Vicky Zioga – PHENO
- Nikos Kostopoulos – PHENO
International Advisors
- George Athanasopoulos – Humboldt University, Berlin
- Richard Barrett – Institute of Sonology, The Hague / Leiden University
- Emmanouil Benetos – Queen Mary University of London
Appendix
A. Audio Features Explanation
- Spectral Centroid: Indicates where the "center of mass" of the spectrum is located. Commonly used to describe the brightness of a sound; higher values represent more high-frequency content.
- F0-SC Combo: A hybrid metric combining fundamental frequency (F0) and spectral centroid. It balances pitch perception with the brightness of the sound.
- F0 | CREPE: Fundamental frequency estimation using the CREPE model, offering high-precision pitch tracking.
- Yin F0 | Librosa: YIN pitch detection as implemented in Librosa, effective for estimating fundamental frequency in musical signals.
- Yin F0 | Aubio: Similar to Librosa's YIN method, but using the Aubio library.
- F0 | CREPE Confidence: Confidence level of CREPE's pitch estimation. Higher values indicate more trustworthy pitch detection.
- Spectral Flux: Measures how rapidly the spectrum is changing; good for detecting onsets or dynamic shifts.
- Spectral Flatness: A measure of how noise-like a sound is. Higher values = more noise-like; lower values = more tonal.
- Spectral Bandwidth: The spread or width of the spectrum; indicates how wide the frequency range is.
- Zero Crossing Rate: Counts how often the signal crosses zero. Often used to assess percussiveness or noisiness.
- Amplitude: Represents the energy or loudness of the signal. Higher values = louder sounds.
- Brightness: Related to the amount of high-frequency content; contributes to perceived clarity or sharpness.
- Sharpness: A perceptual measure of how "cutting" or harsh the sound feels, often linked to higher frequencies.
- Loudness: Perceptual intensity of the sound; accounts for the frequency sensitivity of human hearing.
- Loudness-ZCR: A hybrid feature that captures both loudness and temporal noisiness.
- None: Disables the mapping for a particular visual dimension (e.g., height or color).
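For readers who want to reproduce some of these features outside Soundsketcher, the sketch below computes several of them with Librosa (which the list itself references). Parameter choices and the file name are illustrative; the values will not match Soundsketcher's output if its analysis settings differ.

```python
# Sketch of computing several of the listed features with Librosa.
import librosa
import numpy as np

y, sr = librosa.load("example.wav", sr=None)  # illustrative file name

centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]    # brightness proxy
bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)[0]  # spectral spread
flatness = librosa.feature.spectral_flatness(y=y)[0]           # noise-like vs tonal
zcr = librosa.feature.zero_crossing_rate(y)[0]                 # zero crossing rate
rms = librosa.feature.rms(y=y)[0]                              # amplitude / energy
flux = librosa.onset.onset_strength(y=y, sr=sr)                # spectral-flux-style novelty
f0 = librosa.yin(y, fmin=65, fmax=2000, sr=sr)                 # Yin F0 | Librosa

print(centroid.mean(), flatness.mean(), np.nanmean(f0))
```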
B. Mapping Settings Explanation
Functionality
- Inversion: Reverses the feature mapping. High values are shown at the bottom of the sketch instead of the top.
- Sliders: Define the minimum and maximum values of each mapped feature, useful for focusing on specific value ranges.
- Randomizer: Randomly assigns features to each available visual mapping dimension β useful for playful discovery.
- Reset: Restores all settings to default, including sliders, mappings, and inversions.
Hotkeys
- C: Toggle settings menu (fold/unfold).
- A: Randomize all mappings.
- S: Resketch the current score.
- D: Reset all feature and visual settings.
- Spacebar: Play/pause the audio.