ESPHome voice assistant with local wake word
At the beginning of 2024, I built a voice assistant to celebrate the end of Home Assistant’s Year of the Voice and the beginning of an era of locally controlled voice assistants.
The team at Home Assistant has introduced microWakeWord, which allows ESP32-S3 microcontrollers to detect the wake word (e.g. Alexa or Hey Jarvis) on the device itself, and LLMs as conversation agents, which Home Assistant uses as “the brains of your assistant and will process the incoming text commands”. These powerful new features are paving the way for Home Assistant to replace your Google Home and Amazon Echo smart speakers.
The voice assistant that we’re going to build is based on three key devices – an ESP32-S3 (Amazon US, UK, DE) which is the brains of the operation, a MAX98357 audio amplifier (Amazon US, UK, DE) and an INMP441 microphone (Amazon US, UK, DE). Combined with a 3D printed enclosure, a Dayton Audio DMA45-4 speaker (Amazon – US, UK) and a WS2812 based RGB LED Stick (Amazon – US, UK, DE), these give you a locally controlled voice assistant for less than US$50.
To get started, head over to my Printables project to download the .stl files so you can print the enclosure. I printed it in eSun matte black PLA (linked in my toolbox essentials).
Assembling the enclosure – you’ll want to start off by inserting the various brass insert nuts (linked in my toolbox essentials). There are four M3 x 5mm inserts in the front of the enclosure for the speaker to screw into, two M2.5 x 4mm inserts for the amplifier to screw into at the back of the enclosure and two M2.5 x 5mm inserts for the back panel to screw into. I’d recommend you don’t glue the lid onto the enclosure or the port in place until you’ve connected everything and tested it.
Wiring – Now we can start connecting all of the components. Here is the wiring diagram for the build. I added a 10 Ω resistor in series with the speaker to make it a little quieter since the MAX98357 audio amplifier doesn’t let you reduce the gain. I used 20 AWG/0.5 mm² stranded silicone wire for all of the connections (once again, this is linked on my toolbox essentials page).
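For a rough sense of how much quieter the series resistor makes things, here is a back-of-envelope estimate. It assumes the DMA45-4 behaves like a purely resistive load at its nominal 4 Ω, which a real speaker only approximates, so treat the numbers as ballpark figures:

```latex
\[
\frac{V_{\text{speaker}}}{V_{\text{amp}}}
  = \frac{R_{\text{speaker}}}{R_{\text{speaker}} + R_{\text{series}}}
  = \frac{4\,\Omega}{4\,\Omega + 10\,\Omega}
  \approx 0.29,
\qquad
20 \log_{10}\!\left(\frac{4}{14}\right) \approx -10.9\ \text{dB}
\]
```

Under the same assumption, about 10/14 (roughly 71%) of the power the amplifier delivers is dissipated in the resistor rather than in the speaker.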
Time for some code – This project is based on ESPHome in Home Assistant so here is the .yaml config that I used. There are tons of great ESPHome setup guides so I’ll leave that part to you.
esphome:
  name: "smart-speaker"
  friendly_name: smart-speaker
  name_add_mac_suffix: false
  platformio_options:
    board_build.flash_mode: dio

esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "<<your key>>"
  on_client_connected:
    then:
      - delay: 50ms
      - micro_wake_word.start:
  on_client_disconnected:
    then:
      - voice_assistant.stop:

ota:
  - platform: esphome

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

captive_portal:

web_server:

psram:
  mode: octal
  speed: 80MHz

light:
  - platform: esp32_rmt_led_strip
    id: led_bar
    rgb_order: GRB
    chipset: ws2812
    pin: GPIO16
    num_leds: 8
    rmt_channel: 0
    name: "LED bar"
    effects:
      - pulse:
      - addressable_scan:
          name: scan
          move_interval: 100ms
          scan_width: 1

switch:
  - platform: template
    id: mute
    name: "Mute microphone"
    optimistic: true
    on_turn_on:
      - micro_wake_word.stop:
      - voice_assistant.stop:
      - light.turn_on:
          id: led_bar
          red: 100%
          green: 0%
          blue: 0%
          brightness: 30%
    on_turn_off:
      - micro_wake_word.start:
      - delay: 2s
      - light.turn_off:
          id: led_bar

i2s_audio:
  - id: i2s # Microphone
    i2s_lrclk_pin: GPIO6 #WS
    i2s_bclk_pin: GPIO7 #SCK

microphone:
  - platform: i2s_audio
    id: va_mic
    adc_type: external
    i2s_din_pin: GPIO4 #SD
    channel: left
    pdm: false
    i2s_audio_id: i2s
    bits_per_sample: 32bit

speaker:
  platform: i2s_audio
  id: va_speaker
  i2s_audio_id: i2s
  dac_type: external
  i2s_dout_pin: GPIO8 #MAX98357A DIN
  mode: mono

micro_wake_word:
  models:
    - model: hey_jarvis
  on_wake_word_detected:
    - voice_assistant.start:
    - light.turn_on:
        id: led_bar
        red: 100%
        green: 100%
        blue: 100%
        brightness: 40%
        effect: scan

voice_assistant:
  id: va
  microphone: va_mic
  speaker: va_speaker
  noise_suppression_level: 2.0
  volume_multiplier: 4.0
  on_stt_end:
    then:
      - light.turn_off: led_bar
  on_error:
    - micro_wake_word.start:
  on_end:
    then:
      - light.turn_off: led_bar
      - wait_until:
          not:
            voice_assistant.is_running:
      - micro_wake_word.start:
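The wifi: block in the config reads its credentials with !secret, which ESPHome looks up in a secrets.yaml file stored alongside the device config. A minimal sketch of that file follows; the values are placeholders, not anything from my setup:

```yaml
# secrets.yaml (placeholder values, replace with your own)
wifi_ssid: "YourNetworkName"
wifi_password: "YourWiFiPassword"
```

The key names have to match the ones after !secret in the device config. If you prefer, the API encryption key can be kept in the same file under a name of your choosing and referenced the same way.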
Time for some testing. Your speaker should look somewhat like this now. I’ve used some hot glue to secure the connectors for the microphone and fill the gaps around any wires that pass through the enclosure. Hopefully it works and you can control your Home Assistant entities. Note that the USB-C connector is temporarily connected here – you’ll need to pass it through the hole in the back panel before permanently connecting it to the ESP32.
If everything works as expected, you can add some batting to the inside of the enclosure to help with acoustics and then glue the port and lid into place. I used some Gorilla Glue that is designed to work with PLA. The speaker is held in with four M3 x 8mm screws and both the amp and the back panel are held in with two M2.5 x 5mm screws.
Lastly, here’s a short video of it in action.
*The product links in this post may contain affiliate links. Any commission earned is used to keep the servers running and the gin cool.
Thanks for making it to the end of the post!
Thanks for the write-up! I was looking for something like this.
If I would like to run in mic-only mode, can I leave out the audio amplifier or is the amplifier necessary for the setup to work?
Hey Patrick, looking through the ESPHome docs (https://esphome.io/components/voice_assistant.html), it looks like the speaker component is optional so you’ll be fine to build one without a speaker.
Good luck!
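As an untested sketch of what a mic-only variant might look like, assuming the same pins and IDs as the config in the post: the top-level speaker: block and the speaker: key under voice_assistant are left out, and everything else (including the light and wake word automations, not repeated here) stays as it was.

```yaml
# Mic-only sketch (untested): no speaker: block, and no speaker: key
# under voice_assistant. Pins and IDs are copied from the post's config.
i2s_audio:
  - id: i2s
    i2s_lrclk_pin: GPIO6 # WS
    i2s_bclk_pin: GPIO7  # SCK

microphone:
  - platform: i2s_audio
    id: va_mic
    adc_type: external
    i2s_din_pin: GPIO4   # SD
    channel: left
    pdm: false
    i2s_audio_id: i2s
    bits_per_sample: 32bit

voice_assistant:
  id: va
  microphone: va_mic
  noise_suppression_level: 2.0
  volume_multiplier: 4.0
```

With no speaker configured you won’t hear spoken responses from the device, so the LED bar becomes the only feedback that a command was picked up.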
This write-up is awesome! Keep up the good work! I’ve been looking for something like this.
I just got all my hardware yesterday, and I started soldering. One problem I’m having is that whenever the microphone is on (when the device is not muted), there is a loud buzzing sound coming from the speaker. I’m wondering if this could be a grounding issue since it subsides if I touch the ESP32 module. Any other ideas?
Weirdly enough, it seems it only buzzes the speaker when the device first turns on and starts its wake-word loop. I do any voice command, and the buzzing stops.
Minor comment – your LEDs are connected to DOUT – that’s not gonna work, they need to be connected to DIN 🙂
Came here to ask why my LED did not work – thanks for this!
Now I need to figure out why I can’t get it to DO anything, other than start the LEDs when I say Hey Jarvis!!
How’s the range? I’ve been following these projects for a while now with the intent to migrate from a home full of Echos to HA with homebrew smart speakers for wake word, but the holdup has mostly been that nothing we can slap together in this price range can pick up commands from 60 ft away in another room like the Echo devices can, nor respond with enough volume to be heard. I can’t wait until we can make stuff at that level.
Awesome. Any chance of software gain control, either with the MAX98357 or a similar part?
I’m not knowledgeable enough to be able to answer your question, but I did see a pin on the MAX98357 labeled gain. I assume that can be used to do what you want (I’m going to try getting it to, once I get the parts and get a testing rig set up).
Oops, I posted below, but just found the reply button. You can control the gain of the input signal using the GAIN pin on the MAX98357. The datasheet breaks it all down, but here’s the gist:
15 dB: Connect to GND through a 100kΩ ±5% resistor
12 dB: Directly connect to GND
9 dB: Unconnected (floating)
6 dB: Connect to VDD
3 dB: Connect to VDD through a 100kΩ ±5% resistor
Software volume control (output signal amplitude) is also possible via I2S, but it looks like it’s not currently implemented in ESPHome’s i2s_audio component. There is a third party “external component” for ESPHome though that looks like it has volume control:
https://github.com/gnumpi/esphome_audio/tree/main
Disclaimer: I’ve never used ESPHome, and therefore haven’t tested any of this, so ya know, here be monsters and whatnot.
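To put the GAIN pin settings listed above into perspective, each dB figure can be converted to a voltage gain factor. This is just the standard decibel conversion applied to the five values quoted from the datasheet, not a measurement of this build:

```latex
\[
G_V = 10^{\,G_{\text{dB}}/20}:
\qquad
3\ \text{dB} \approx 1.41\times,\quad
6\ \text{dB} \approx 2.00\times,\quad
9\ \text{dB} \approx 2.82\times,\quad
12\ \text{dB} \approx 3.98\times,\quad
15\ \text{dB} \approx 5.62\times
\]
```

So if the GAIN pin is currently unconnected (9 dB), moving to the 3 dB setting is a 6 dB drop, which is roughly half the output voltage and a quarter of the power into the same speaker.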
Why glue the lid on instead of making it removable? Also, acoustics might improve with a series of internal baffles 3d printed (like a tuned port or something).
This is great! Thanks for writing it up.
I just printed the enclosure and there is no hole in the cutout for the LED strip for the connectors to pass through. A drill will fix it but there should be a hole there I believe. Also, it seems the cutout for the speaker is 2-3 mm too low for the speaker to be centered in it.
Should have used an ESP-32 with an onboard battery charger and added a lithium battery so the device would still work when power is out. Something like an Adafruit ESP32 Feather
I get a lot of errors when I try to compile the code.
Have it all assembled, it responds to commands properly, but nothing is output from the speaker. The MAX98357 is receiving 3.3 V, I can detect a signal on the data line when the ESP32 is sending audio, and obviously the I2S pins are working b/c they’re bound to the mic as well and that is functional.
Really, my question is: should I just assume the MAX is bad or does anyone have any ideas for add’l testing?
Thank you for this write-up! I’m in the process of putting one together for myself.
Which USB-C connector did you use? I don’t see it listed in the article or the Toolbox essentials page.
Cool project! I don’t use ESPHome, so I’m not sure whether it’s feasible here, but the MAX98357 supports software volume control over I2S.
The GAIN pin on the MAX98357 lets you boost the input signal prior to amplification.
I appreciate the simplicity of it, but adding a resistor in series with the speaker on the output side is a bad idea for various reasons.