Speechdft-16-8-mono-5secs.wav (2025)
import numpy as np
from scipy.io import wavfile
import matplotlib.pyplot as plt

# Load the file; wavfile.read returns the sample rate and the raw integer samples
sr, audio_int = wavfile.read('speechdft-16-8-mono-5secs.wav')

# -------------------------------------------------
# 2️⃣ Convert 8‑bit unsigned PCM to float [-1, 1]
# -------------------------------------------------
# 8‑bit PCM in wav files is typically unsigned (0‑255)
audio_float = (audio_int.astype(np.float32) - 128) / 128.0   # now in [-1, 1]

# Quick sanity check: plot the waveform
plt.figure(figsize=(10, 2))
plt.plot(np.arange(len(audio_float))/sr, audio_float, lw=0.5)
plt.title('Waveform (5 s of speech)')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.show()

The plot shows a familiar “wiggly” speech trace, with a modest amount of quantisation “step‑noise” that is typical of 8‑bit audio.

3. A First‑Look Discrete Fourier Transform (DFT)

The DFT is the workhorse that turns a time‑domain signal into its frequency‑domain representation. Let’s compute a single‑sided magnitude spectrum and visualise it.

# Frequency axis (Hz)
N = len(audio_float)   # number of samples
freqs = np.fft.rfftfreq(N, d=1/sr)
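The single‑sided magnitude spectrum described above can be sketched as follows. This is a minimal illustration, not the article's exact code. The helper name `single_sided_spectrum` is hypothetical, no window is applied, and a synthetic 440 Hz tone quantised to 8 bits stands in for the speech recording so that the snippet runs without the wav file.

```python
import numpy as np

def single_sided_spectrum(x, sr):
    """Return (freqs, mag) for a real signal x sampled at sr Hz.

    Hypothetical helper: scales the rfft so that a sinusoid of
    amplitude A shows up as a peak of height roughly A.
    """
    N = len(x)
    X = np.fft.rfft(x)                  # bins 0 .. N//2 only
    mag = np.abs(X) / N                 # normalise by signal length
    # Double every bin that has a mirrored negative-frequency partner.
    # DC never has one; the Nyquist bin (even N only) has none either.
    if N % 2 == 0:
        mag[1:-1] *= 2
    else:
        mag[1:] *= 2
    freqs = np.fft.rfftfreq(N, d=1 / sr)
    return freqs, mag

# Assumption: a 5 s, 16 kHz, 8-bit test tone replaces the real recording.
sr = 16000
t = np.arange(5 * sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
audio_int = np.clip(np.round(tone * 128 + 128), 0, 255).astype(np.uint8)

# Same unsigned 8-bit to float conversion as in step 2
audio_float = (audio_int.astype(np.float32) - 128) / 128.0

freqs, mag = single_sided_spectrum(audio_float, sr)
N = len(audio_float)
peak_bin = int(np.argmax(mag))
peak_hz = freqs[peak_bin]
peak_mag = mag[peak_bin]
print(f"peak at {peak_hz:.1f} Hz")
```

To visualise the result, `freqs` and `mag` can be passed to `plt.plot`. For real speech a tapered window (for example `np.hanning(N)`) is usually applied before the transform to reduce spectral leakage, in which case the normalisation should use the sum of the window rather than `N`.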
import librosa  # load() returns float32 samples already scaled to [-1, 1]
y, sr = librosa.load('speechdft-16-8-mono-5secs.wav', sr=16000)