ESP32 based voice assistant with wake word

Last year (2023) was Home Assistant’s Year of the Voice so I thought there’d be no better way to start 2024 than by building my own Home Assistant powered smart speaker.

It’s based on an ESP32 Lyrat dev board, which doesn’t seem to be widely available, but you can grab one on Aliexpress (affiliate link) or Amazon US / DE. Have a look at my toolbox essentials for links to the other items used in this project.

The Lyrat board has two 3 W @ 4 Ω audio outputs, which is enough for this use case, but it means that we’ll want to stick to a pretty small 4 Ω speaker. I wanted the sound quality to be as good as possible without breaking the bank so, after a bit of research, I grabbed a Dayton Audio DMA45-4 1.5″ Full-Range Driver for €18 (Amazon – US, UK).

The last item you’ll want is an Adafruit Neopixel Stick (Amazon US, UK). These are great, multi-purpose addressable LED strips and, for this project, we’ll use one as an indicator for when the wake word has been detected.

Right, let’s get started. Head over to Printables to grab the .stl files for this project. I printed it in eSun matte black PLA (link on my toolbox essentials page) but you can really print it in whatever colour/filament you want.

Once you’ve printed the main body of the enclosure, use a soldering iron to insert four M3 x 5mm brass inserts at the front of the enclosure for the speaker and glue the port into place at the back of the enclosure. I used a Dremel to ensure that the hole for the port is perfectly round before gluing it in. I found that Loctite Super Glue Power Gel (linked in my toolbox essentials) worked well for this project because it didn’t harden immediately, but you’ll want to buy a few bottles of it.

Not pictured – if you can, add some batting to the speaker enclosure before you glue the lid on.

I chose to make the speaker cable out of some 0.5 mm² silicone wire. This speaker requires a small spade connector for the negative terminal and a slightly larger spade connector for the positive terminal. Once you’ve made the cable, glue it into place with the spade connectors on the bottom of the lid. I used hot melt glue which seems to be holding up fine.

Once the glue has dried, use superglue to stick the lid/midframe onto the top of the speaker enclosure. I used some painter’s tape to align it and a pair of clamps to apply pressure while the glue cures.

After about 10 minutes, the superglue should have cured. Now you can break out the soldering iron again to install six M2.5 x 4mm brass insert nuts – four of them will be installed vertically into the midframe/enclosure lid for the Lyrat board and two more will be installed horizontally to hold the lid on.

Now it’s time for some wiring! I wanted to keep the enclosure as short as possible so I soldered all of the connections (apart from the Vcc/positive connection for the LED bar) to the bottom of the PCB. The wiring is pretty straightforward but you’ll need to turn on DIP switch #5, which seems to reroute GPIO12 from the micro SD reader to the header pin that the LED bar’s data line connects to.

Please don’t do what I did – make sure to install the LED bar into the enclosure before you solder the wires to the Lyrat. I found it easiest to solder the power cables to the right hand side of the LED bar so you only have the data cable that needs to go behind the LED bar.

Once the wiring has been done, you can screw the Lyrat board onto the midframe with 4 x M2.5 x 6mm screws before we move on to flash some firmware onto the ESP32. 🥳 You’ll need two micro USB cables for this – one that will connect the serial port to your computer and one that will power the Lyrat board.

You’ll need an instance of Home Assistant, the ESPHome add-on and a Voice Pipeline to be set up for this. Have a look at Home Assistant’s guide for setting up a voice pipeline. Here is the ESPHome YAML that I used; you’ll want to update certain parts of it, like your API encryption key and OTA password, before flashing it to the ESP32.

esphome:
  name: bedroom-speaker
  on_boot:
     - priority: -100
       then:
         - wait_until: api.connected
         - delay: 10s
         - if:
             condition:
               switch.is_on: use_wake_word
             then:
               - voice_assistant.start_continuous:

esp32:
  board: esp-wrover-kit
  framework:
    type: arduino
    version: recommended

logger:

api:
  encryption:
    key: "<<your key here>>"

ota:
  password: "<<your password here>>"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Speaker Fallback Hotspot"
    password: "<<your password here>>"

captive_portal:

i2c:
  sda: GPIO18
  scl: GPIO23

external_components:
  - source: github://pr#3552
    components: [es8388]
    refresh: 0s

es8388:

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO35
    pdm: false
    channel: left

i2s_audio:
  i2s_lrclk_pin: GPIO25
  i2s_bclk_pin: GPIO5

media_player:
  - platform: i2s_audio
    name: speaker
    id: media_player_speaker
    dac_type: external
    i2s_dout_pin: GPIO26
    mode: stereo

switch:
  - platform: gpio
    pin: GPIO21
    name: "AMP Switch"
    id: amp_switch
    restore_mode: ALWAYS_ON

  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF
    entity_category: config
    on_turn_on:
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop

voice_assistant:
  microphone: mic
  use_wake_word: true
  noise_suppression_level: 2
  auto_gain: 31dBFS
  media_player: media_player_speaker
  id: assist
  on_wake_word_detected: 
    - light.turn_on:
        id: led_bar
        brightness: 30%
        effect: "Custom Scan Effect"
        red: 100%
        green: 0%
        blue: 0%
  on_listening: 
    - light.turn_on:
        id: led_bar
        brightness: 50%
        effect: "Custom Scan Effect"
        red: 0%
        green: 100%
        blue: 0%
  on_end:
    - light.turn_off:
        id: led_bar

light:
  - platform: neopixelbus
    id: led_bar
    type: GRB
    variant: WS2812
    pin: GPIO12
    num_leds: 8
    name: "NeoPixel Light" 
    effects:
      - addressable_scan:
          name: Custom Scan Effect
          move_interval: 100ms
          scan_width: 2 

esp32_touch:

binary_sensor:
  - platform: esp32_touch
    pin: GPIO33
    threshold: 1000
    name: "Play"

  - platform: esp32_touch
    pin: GPIO32
    threshold: 1000
    name: "Set"
    on_press:
      then:
        - switch.toggle: use_wake_word

  - platform: esp32_touch
    pin: GPIO27
    threshold: 1000
    name: "Vol Up"

  - platform: esp32_touch
    pin: GPIO13
    threshold: 600
    name: "Vol Down"      
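
If you would rather not paste the API key and passwords straight into the device file, the same !secret mechanism that the config already uses for the Wi-Fi details can hold them too. This is only a sketch: it assumes a secrets.yaml sitting next to the device file in your ESPHome config folder, and the entry names api_encryption_key, ota_password and fallback_ap_password are made up for illustration and are not part of the original config.

```yaml
# --- secrets.yaml (hypothetical entry names, placeholder values) ---
wifi_ssid: "<<your wifi name here>>"
wifi_password: "<<your wifi password here>>"
api_encryption_key: "<<your key here>>"
ota_password: "<<your password here>>"
fallback_ap_password: "<<your password here>>"

# --- bedroom-speaker.yaml (excerpt, same structure as the config above) ---
api:
  encryption:
    key: !secret api_encryption_key

# Kept in the same form as the article; newer ESPHome releases may expect
# "ota" as a list with a platform entry, so check the docs for your version.
ota:
  password: !secret ota_password

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: "Speaker Fallback Hotspot"
    password: !secret fallback_ap_password
```

ESPHome fills in each !secret value when it builds the firmware, so the device file itself can be shared without giving away any credentials.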

Once the firmware has been installed, you can pop the lid on with two M2.5 x 10mm screws and we’re almost done!

The back of the lid has a cutout for a panel mount USB C connector like this one from Aliexpress. Unfortunately my order didn’t arrive in time but I’ll make sure to post an update when it arrives.

The last, and most fiddly, bit is up next – the speaker grille. Print the frame .stl that’s on Printables and then glue some black (or whatever colour you want) fabric onto it. I used the same superglue and some clothes pegs to hold the fabric in place while the glue cures.

And you’re done! You should now have a local voice assistant that’ll let you control your smart home without the worry of Mr Bezos listening in or the reliance on an internet connection!

Here’s a video of a prototype in action.

*The product links in this post may contain affiliate links. I donate 20% of these earnings to the Good Work Foundation to help innovate learning in South Africa’s rural communities.

Thanks for making it to the end of the post!

20 Comments

  1. I love it. This project made me finally buy a 3d printer. What infill did you use for the speaker box? I could see reasons for stiff and also for light.

  2. Wired but not working yet. My panel mount USB C connector came today and I could not figure out how you were going to wire it into the Vcc and ground. Same places you powered the LED array from? Is that 5V or 3.3V? If you have wired yours, I would appreciate pictures.

  3. I had a lot of issues with the audio from the speaker being choppy and unintelligible. It seemed to be chopped up into 1/2 second segments or less.

    I looked around for different solutions, maybe reducing the sampling rate of the output. I could not figure out how to do that, but I stumbled on this (https://github.com/home-assistant/core/issues/92969) and replaced the speaker part of the yaml with

    speaker:
      - platform: i2s_audio
        id: media_player_speaker
        dac_type: external
        i2s_dout_pin: GPIO26
        mode: mono

    Now I can hear the responses very clearly. Nice speaker selection!

    The wakeword and voice recognition are not good yet, but I will be looking at it more closely.

    If you have made any improvements in the yaml configuration, I would love hearing about them.
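
    For anyone trying the swap described in comment 3: the voice_assistant block probably has to follow the change, because it names its output by component type. This is a minimal sketch, assuming the speaker block replaces the whole media_player block from the article and keeps the id media_player_speaker. The event handlers (on_wake_word_detected and so on) are left out here and would stay as they are in the article.

    ```yaml
    voice_assistant:
      microphone: mic
      use_wake_word: true
      noise_suppression_level: 2
      auto_gain: 31dBFS
      # Assumption: with a speaker component in place of the media player,
      # this line replaces "media_player: media_player_speaker".
      speaker: media_player_speaker
      id: assist
    ```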

  4. Was there anything special needed to get ESPHome to flash the ESP32? The webpage is reporting back that it can’t find any device.

  5. Tristam, thank you so much for this!

    I was able to source a couple of these boards from Mouser electronics in the states.

    I did have an issue getting this running, and I wanted to share what I found in case it is helpful to others: my device had very (very very very) choppy audio and would not recognize any keywords. I had to disable the esp32_touch components (essentially comment from that line down) before I could get the device to work. It worked flawlessly after that.

      • No, I wasn’t able to. I didn’t really put any effort into it though.

        I left a note about this on the ESPHome discord, but it didn’t really get any traction.

    • So the deafening silence implies that no one knows, (or cares). But thanks for taking an interest.

      • My unit has been in operation and powered on for nearly a couple of months now and so far there has been no damage.

  6. Sadly this board doesn’t seem to be available any more. I’ve tried to get it working with the ESP32-LyraT Mini but that uses different audio codecs (ES8311 for output, ES7243 for input), there’s no first party support for either and the third party code I found for the ES8311 didn’t seem to work for me.

    • Hey @Chris, I’m currently working on an ESP32-S3 based smart speaker so keep an eye out for that article. It uses easily accessible components like a MAX98357 amplifier and INMP441 microphone so you should be able to get everything you need to build one.

  7. Just an added note: I had very choppy audio as well. Commenting out the touch switch code fixed the problem. Thanks @Brantley! And thanks Tristam for taking the time to publish this project!
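
    The fix reported in comments 5 and 7 amounts to commenting out everything from esp32_touch: downwards. Below is a sketch of how the end of the article’s config might look afterwards. Keep in mind that the “Set” pad is what toggles use_wake_word in the original config, so with touch disabled the “Use wake word” switch would have to be flipped from Home Assistant instead.

    ```yaml
    # Touch pads disabled: commenters reported choppy audio while these were active.
    # esp32_touch:
    #
    # binary_sensor:
    #   - platform: esp32_touch
    #     pin: GPIO33
    #     threshold: 1000
    #     name: "Play"
    #
    #   - platform: esp32_touch
    #     pin: GPIO32
    #     threshold: 1000
    #     name: "Set"
    #     on_press:
    #       then:
    #         - switch.toggle: use_wake_word
    #
    #   - platform: esp32_touch
    #     pin: GPIO27
    #     threshold: 1000
    #     name: "Vol Up"
    #
    #   - platform: esp32_touch
    #     pin: GPIO13
    #     threshold: 600
    #     name: "Vol Down"
    ```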

  8. Question: @Tristam, over in HA you said: “The devs say microWakeWord requires an esp32-c3 with 2mb of PSRAM so the lyrat board will not work out of the box. It has 4mb of PSRAM but I need to do further research to see what differs between the ESP32-WROOM (which the Lyrat uses) and the ESP32-C3.”
    So, does any HA wake word add-on / integration work w/the ESP32 LyraT board? I am new to HA and have not been able to figure out how the wake word “works”. I don’t even know how to test it.
    I am able to TTS to the ESP32 LyraT board. So that much works.
