
2020

Use data structures for your business logic

A few months ago I was reviewing a PR that handled relationships between entities. As I worked through the code I noticed a pattern that sent me back to the original feature ticket for a quick review of the acceptance criteria. As I suspected, there was a list of around ten “if this then that” scenarios, all of which manifested as conditions in the code. Grabbing a pen and paper I drew out the criteria, and sure enough every scenario was captured by the relationships and operations of a tree.

Going back with this information I paired with the team on an update to the PR. We reduced the number of conditions tied directly to the business domain and refactored names so that future maintainers could work with the code by understanding a tree, even if they didn't understand all of the business logic around the entities.
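To make that concrete, here is a rough sketch of the idea. The real project was C# and the entities and rules below are hypothetical stand-ins; the point is that the relationships become a tree and the business checks become well known tree operations:

class Node:
    # A minimal sketch: entities related as parent/child form a tree,
    # and the business rules become standard tree operations.
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent:
            parent.children.append(self)

    def ancestors(self):
        node = self.parent
        while node:
            yield node
            node = node.parent

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()

# "Can entity A be attached to entity B?" stops being a pile of
# conditionals and becomes "B is not A and B is not a descendant of A".
def can_attach(a, b):
    return a is not b and b not in a.descendants()

Future maintainers then read operations like ancestors and descendants instead of a list of bespoke conditionals.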

In case it's helpful, the C5 project has some collections not found in the .NET standard library for interacting with trees. In general it's an interesting project I'm glad I learned about.

A similar opportunity emerged on the same project when we needed to make sure a value was unique over a series of operations. While working on a collection of objects we were able to use a HashSet and exit if Add returned false instead of setting up a LINQ query. This resulted in less nesting, less code, and a simplified condition.
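The project code was C#, where HashSet&lt;T&gt;.Add returns false when the value is already present. As an illustrative sketch of the same shape (names here are hypothetical), in Python it might look like:

def all_unique(values):
    # Track what we have already seen with a set and exit early on the
    # first duplicate instead of nesting conditions or re-querying the
    # collection each time.
    seen = set()
    for value in values:
        if value in seen:
            return False
        seen.add(value)
    return True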

The Point

The reason I am writing this is that we should be using data structures to represent the business logic of our applications. This seems obvious, but too often I have seen implementations brute force conditions, leaving data structures as an optimization or a concern for “technical” projects. While we can use a series of conditions and predicates to meet requirements in a crude way, data structures provide an abstraction that elevates terse business logic into a construct future maintainers can derive extra meaning from.

Self Hosting

Over the last few years I built up a sprawling list of dependencies for my home project and blog workflow. Earlier this year I decided it was time to cut down on that list and host my service dependencies locally where I could. It took a while, but I've reached a point where I no longer tweak the setup week to week, so I decided it was time to write up the process.

A quick list of the tools I used for orchestration:

  • shell
  • DNS
  • Traefik 2
  • docker/docker-compose
  • alpine linux

(D)DNS

The first thing I needed to do was make my services easy to reach locally and remotely. Since this is all running behind my home router, my IP can change from time to time. To handle this I made use of Gandi's DNS API and set up a shell script to run with cron on my router to keep my DNS records up to date; a rough sketch of the idea is below. With DNS ready I moved on to Traefik.
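My actual updater is a small shell script on the router, but the shape of it fits in a few lines. Here it is as a Python sketch; the ipify lookup, the Gandi LiveDNS endpoint and the record/field names are assumptions for illustration (check them against the current Gandi docs), not copied from my script:

import requests

# Assumptions: api.ipify.org for public IP discovery, Gandi LiveDNS v5
# for the record update. Values below are placeholders.
API_KEY = "your-gandi-api-key"
DOMAIN = "example.com"
RECORD = "home"

# Find the current public IP of the home connection.
current_ip = requests.get("https://api.ipify.org").text.strip()

# Overwrite the A record so the hostname follows the new IP.
requests.put(
    f"https://api.gandi.net/v5/livedns/domains/{DOMAIN}/records/{RECORD}/A",
    headers={"Authorization": f"Apikey {API_KEY}"},
    json={"rrset_ttl": 300, "rrset_values": [current_ip]},
)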

Traefik

Traefik is a really nice routing/proxy service that can inspect container labels and set up route forwarding while handling certificate management, traffic metrics and more. The main callout (other than what you will find in the docs) is to keep an eye on what version you are using versus what others used in examples, and that non-HTTP traffic (for instance SSH) requires a little more setup. Beyond that Traefik has been really nice to use and made adding/removing various services easy when coupled with docker.

docker-compose

While k8s is the current hot orchestration tool I wanted to keep things simple. I don't have a need to cluster any of my home tools, and while distributed systems are interesting they also require a lot of work. I left those at my day job and use compose + duplicity for my home setup. This makes service management easy: the labels allow Traefik to detect and handle traffic management, while duplicity ensures I won't lose much work and can quickly restore my data and restart any services in a few minutes on any box with docker.

Services

A quick list of the services I’m hosting:

  • git
  • cgit
  • minio
  • teamcity
  • youtrack
  • rust home services API

The service management can be found here.

Wrap Up

I’ve started to self host a few times in the past and backed away. This time I think it’s here to stay. With my current setup I’m not worried about what happens when something crashes, certificate management is automated away and everything just works. I’ve linked to my orchestration code above, but if you have any questions or suggestions, send them my way. If you are starting out on your own self hosted setup, good luck and have fun; it’s easier now than ever and I imagine it will continue to get better.

Roll your own git hook

As part of setting up tools to run in our CI pipeline I also set up a git pre-push hook to run the same tools automatically in the local context. Git provides a variety of hooks as documented in the scm book, and they can be used to reliably automate different parts of your workflow.

In addition to the official docs this page has a nice summary

Code

The pre-push hook I built for our project looks like:

#!/bin/sh
PROJECT_ROOT=$(git rev-parse --show-toplevel)
echo "Running pre-push hook for ${PROJECT_ROOT}"

dotnet restore $PROJECT_ROOT

echo "Running resharper formatter"
dotnet jb cleanupcode --verbosity=ERROR --config=$PROJECT_ROOT/.config/cleanup --settings=$PROJECT_ROOT/.editorconfig --no-buildin-settings $PROJECT_ROOT/AMS.sln

# CleanupCode rewrites files in place, so any dirty files mean formatting changed.
formatted=$(git status --porcelain=v1 2>/dev/null | wc -l)

if [[ $formatted -ne 0 ]]
then
 echo "Formatting changes detected"
 exit 1
fi

echo "Running dotnet resharper inspector"
dotnet jb inspectcode --verbosity=ERROR AMS.sln -p=$PROJECT_ROOT/.editorconfig -o=$PROJECT_ROOT/reports/resharperInspect.xml
pwsh $PROJECT_ROOT/tools/CheckResharperInspection.ps1

if [[ $? -eq 0 ]]
then
 echo "Running resharper dupe finder"
else
 echo "Inspector Errors Found"
 exit 1
fi

dotnet jb dupfinder --verbosity=ERROR AMS.sln -o=$PROJECT_ROOT/reports/resharperDupFinder.xml
pwsh $PROJECT_ROOT/tools/CheckDupeFinder.ps1

if [[ $? -eq 0 ]]
then
 echo "Running dotnet test"
else
 echo "Dupe Errors Found"
 exit 1
fi

dotnet cake --target=docker-bg
dotnet cake --target=dotnet-test

if [[ $? -eq 0 ]]
then
 dotnet cake --target=docker-down
 echo "Go go go!"
else
 dotnet cake --target=docker-down
 echo "Test failed"
 exit 1
fi

The first thing that should stand out is that this is just a shell script. Git hooks are just scripts, which makes it easy to use shell, Python, PowerShell or other tools for your hook. Write the script and link it into .git/hooks.

Script Breakdown

In this script the first thing I do is find the root of our project. This makes it easy to reference paths in a manner compatible with scripts and tools used in other parts of our workflow.

Install

Since the hook above is just a shell script I like to keep it (and other hooks) in a tools subdirectory in the root project directory. Because git expects hooks to live under .git/hooks we can point git at the script with a symlink (the script itself still needs to be executable).

ln -s -f ../../tools/pre-push.sh .git/hooks/pre-push

With this in place we get feedback before each push so that we don't have to correct linting issues later, and we can be confident our commit(s) will run through CI successfully.

Wrapping Up

While you may have heard of projects like pre-commit or husky, rolling your own hook is relatively straightforward. Wrappers may help with complex hook setups, but I personally like the low amount of indirection and abstraction you get from rolling your own, which helps with debugging.

Getting started with Resharper Global Tools

For a while now I’ve been interested in build tools, CI and code quality. I think I got a taste for it as a member of the PyMSSQL project and it has continued on from there. Recently I worked on the initial CI setup for a C# project. As part of the setup I took the time to look at what lint and analysis tools we wanted to integrate into our project.

For C# some of the more common tools appear to be:

  • Roslyn Analyzers
  • SonarSource
  • NDepend
  • JetBrains Resharper

I won't go into the full criteria for our choice of Resharper (I'll update this post if I end up writing that up one day). Instead I'll summarize that Resharper provided:

  • easy cross platform setup
  • ide/editor and shell agnostic
  • works the same locally and in CI
  • opinionated by default

Resharper Command Line Tools

From the docs

ReSharper Command Line Tools is a set of free cross-platform standalone tools
that help you integrate automatic code quality procedures into your CI, version
control, or any other server. You can also run coverage analysis from the
command line.

The Command Line Tools package includes the following tools:

  • InspectCode, which executes hundreds of ReSharper code inspections
  • dupFinder, which detects duplicated code in the whole solution or narrower scope
  • CleanupCode, which instantly eliminates code style violations and ensures a uniform code base

Install

To get started with Resharper tools (assuming you already have .NET Core installed) run

cd <project>
dotnet new tool-manifest
dotnet tool install JetBrains.ReSharper.GlobalTools --version 2020.2.4

This installs the [Resharper Global Tools](https://www.nuget.org/packages/JetBrains.ReSharper.GlobalTools/2020.2.4) at the project level, which allows CI and other contributors to use dotnet tool restore in the future.

Configuration

Out of the box inspect, format, and dupFinder all have default configurations that work well. That said, each team has its own needs and preferences that you may want these tools to promote. While there are a few ways to configure these tools, I found using editorconfig to be the most human readable approach.

For additional details on the editorconfig format see the docs and this property index.

Running

Running the tools from a shell is relatively easy:

jb cleanupcode --verbosity=ERROR --config=./.config/cleanup --settings=./.editorconfig --no-buildin-settings ./Project.sln
jb inspectcode --verbosity=ERROR Project.sln -o=./reports/resharperInspect.xml
jb dupfinder --verbosity=ERROR Project.sln -o=./reports/resharperDupFinder.xml

One thing to note is that by default the autoformatting will attempt to enforce line endings. If you have a team working across multiple platforms and using git to automatically handle line endings these can come into conflict. It's up to you and your team to decide whether to handle this by tweaking git behavior, editorconfig or another method.

CI

If you're using TeamCity, see this doc for details.

With everything running in our shell locally we can also set things up to run in our CI pipeline. Running the tools is easy as long as your CI platform has a shell-like task/step/operator:

- script: |  
  dotnet tool restore  
  jb cleanupcode --verbosity=ERROR --config=./.config/cleanup --settings=./.editorconfig --no-buildin-settings ./Project.sln  
  jb inspectcode --verbosity=ERROR Project.sln -o=./reports/resharperInspect.xml  
  jb dupfinder --verbosity=ERROR Project.sln -o=./reports/resharperDupFinder.xml
  displayName: 'Resharper'

Of course you will probably break these up for easier maintenance and reporting.

Running the tools is easy. The trick is detecting when these tools find an issue. I’ll share what I did in case it’s helpful, but long term it would be great if Jetbrains had the tools exit with documented status codes for different issues. As it stands the tools only exit with an error if the tool fails, not when issues are reported.

CleanupCode

Since CleanupCode will format our files, rewriting them on disk, we can use git to detect the change.

formatted=$(git status --porcelain=v1 2>/dev/null | wc -l)  
exit $formatted

dupFinder

dupFinder outputs an XML file highlighting any issues found. PowerShell's built-in XML support makes it easy enough to query this file and see if any issues exist.

$Result = Select-Xml -Path $(System.DefaultWorkingDirectory)/reports/resharperDupFinder.xml -XPath "/DuplicatesReport/Duplicates/*"
If ($Result -eq $null) { [Environment]::Exit(0) } Else { [Environment]::Exit(1) }

InspectCode

Similar to dupFinder InspectCode documents issues with an XML file, and once again we can use Powershell to detect if there are any issues to fix.

$Result = Select-Xml -Path $(System.DefaultWorkingDirectory)/reports/resharperInspect.xml -XPath "/Report/Issues/Project/*"
If ($Result -eq $null) { [Environment]::Exit(0) } Else { [Environment]::Exit(1) }

And since dupFinder and InspectCode output XML it can be useful to save these as CI artifacts for review. In Azure Pipelines this looks like:

- task: PublishPipelineArtifact@1
  inputs:
    targetPath: '$(System.DefaultWorkingDirectory)/reports/'
    artifactName: 'measurements'
  condition: always()
  displayName: 'Save ReSharper Results For Review.'

Wrapping Up

We’ve been using the ReSharper tools for a few months now and I have to say they have provided what I was looking for from the beginning. The tools have been easy to use, help us maintain our code and haven’t boxed us in or required a lot of extra time on configuration and unseen gotchas. The only criticisms I have are that the cold start time for cleanupcode is pretty slow, and the exit codes could be better. Both of these would also help with CI and our git hook setup. Otherwise I think these will continue to serve us well and let us focus on our project delivery.

GroupBy Fun with SQL and Python

A few months ago I had the opportunity to collaborate with some data scientists porting PySpark queries to raw Python. One of the primary areas of concern was aggregation statements, which were seen as functionality that would be particularly troublesome to write in Python. As an example I was provided a Spark SQL query similar to this:

text = ocrdata.filter(col('lvl') == 5).filter(f.length(f.trim(col('txt'))) > 0).select(txtcols) \
    .withColumn('ctxt', casestd(col('txt'))) \
    .drop('txt') \
    .withColumn('tkpos', f.struct('wrdnm', 'ctxt')) \
    .groupBy(lncols) \
    .agg(f.sort_array(f.collect_list('tkpos')).alias('txtarray')) \
    .withColumn('txtln', f.concat_ws(' ', col('txtarray.casestd'))) \
    .drop('txtarray')

This query was transforming token data generated by Tesseract into lines. Beyond the aggregation operation there was also some concern that the operation may be run against quite large datasets depending on how much [Tesseract](https://github.com/tesseract-ocr) output was being manipulated at once.

Outside of the raw functionality I was asked if the data could be structured to provide an interface with named columns in a style similar to SQL rather than having to reference positional data.

All of this seemed fairly straightforward. Provided with some sample data, I pulled in the UDF that was already in Python and set out to apply the transformations, first illustrating how we could interact with the data in a way similar to SQL using pipeline transformations and named references.

Porting Transformations

import itertools
from csv import DictReader
from collections import namedtuple


def casestd(x):
    if x.isupper():
        return x.title()
    elif not x.islower() and not x.isupper():
        return x.title()
    else:
        return x


with open("sampledata/export.csv") as sample:
    reader = DictReader(sample)
    data = [row for row in reader]

a = filter(lambda x: int(x["level"]) == 5, data)
filtered = filter(lambda x: len(x["text"].strip()) > 0, a)
fixed = ({**row, 'text': casestd(row["text"])} for row in filtered)

tkpos = namedtuple("tkpos", "wordnum, text")
result = (dict(row, **{"tkpos": tkpos(row["wrdnum"], row["text"])}) for row in fixed)

To start I read the data in with a [DictReader](https://docs.python.org/3.7/library/csv.html#csv.DictReader) which allowed me to reference values by name like “level” and “text”. I then applied similar data transformations making use of [filter](https://docs.python.org/3.7/library/functions.html#filter), [comprehensions](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions), and [unpacking](https://docs.python.org/3/reference/expressions.html) to try and keep a style similar to some PySpark operations.

Finally I put the rest of the transformations into a generator expression containing a dict of namedtuple values so that later operations could continue working on named values in a manner similar to SQL columns.

GROUPBY

With the transformation and named values part out of the way I moved onto the GROUPBY aggregations. Thinking about GROUPBY the goal is to apply an aggregation function to a unique value. That unique value can be represented multiple ways, but I wanted to show the idea behind what was happening to help with future port efforts. So on my first pass I wrote:

grouped = []
seen = set()

# Order is known because it represents data generated by tesseract
for row in fixed:
    key = (row["pagenum"], row["blocknum"], row["parnum"], row["linenum"])
    if key in seen:
        continue
    seen.add(key)
    line = []
    for r in fixed:
        rkey = (r["pagenum"], r["blocknum"], r["parnum"], r["linenum"])
        if key == rkey:
            line.append(r["ctxt"])
    txt = " ".join(line)
    cleantxt = txt.strip()
    if cleantxt:
        grouped.append(
            {
                "pagenum": row["pagenum"],
                "blocknum": row["blocknum"],
                "parnum": row["parnum"],
                "lnnum": row["linenum"],
                "text": cleantxt,
            }
        )

Keeping in mind that this was to conceptualize what could be happening behind the scenes for the GROUPBY and AGG operation, here we loop over our rows generating a hash from some values. Once we have this hash we check if we have seen it before by referencing a set. If this is a new value we find all rows with the same hash in our transformed data, append the tokens, handle empty tokens and finally add the data to our final dataset. At the end we have lines of text (instead of individual tokens) that can be referenced by page, block, paragraph and line number.

While this works it’s horribly inefficient. It stands out that we are reiterating our transformed data every time we find a new key. But the goal for this wasn’t to be efficient; it was to show the ideas expressed in SQL with Python. Specifically it was highlighting how to express a GROUPBY/AGG operation manually using hashes of values and tracking what we have and have not seen, producing a final dataset that was the same as the output of the SQL statement.

itertools

Continuing on from that point, one of my favorite Python modules is itertools. If you haven't spent much time with it I highly recommend taking some of your existing code and looking over it while scanning the itertools docs. I've used islice, chain and zip_longest innumerable times. Because of that I knew there was a handy groupby function stowed in there too:

Make an iterator that returns consecutive keys and groups from the iterable.
The key is a function computing a key value for each element. If not specified
or is None, key defaults to an identity function and returns the element
unchanged. Generally, the iterable needs to already be sorted on the same key
function.

Replacing the block above:

final = []

for key, group in itertools.groupby(
    req, key=lambda x: (x["pagenum"], x["blocknum"], x["parnum"], x["linenum"])
):
    line = "".join([row['text'] + " " for row in group])
    final.append({
        "pagenum": key[0],
        "blocknum": key[1],
        "parnum": key[2],
        "linenum": key[3],
        "text": line,
    })

And with that change we have a clean, faster implementation. Additionally, since this was a port of Spark SQL, if the data were to get truly large it wouldn't be much work to start iterating through the whole pipeline in batches since we can use generators all the way through.
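As a hypothetical sketch of that, batches can be pulled off the pipeline with itertools.islice so nothing has to be materialized all at once (the names in the usage comment are placeholders):

from itertools import islice

def batched(iterable, size):
    # Yield fixed-size chunks from any iterator/generator so the rest of
    # the pipeline can process one batch at a time.
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            break
        yield chunk

# Hypothetical usage: feed the grouped rows through in batches.
# for batch in batched(final_rows, 10_000):
#     write_out(batch)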

Conclusion

So what was the point of sharing that here? Nothing specific. It was a fun exercise at the time, and it made me pause to consider how I would express GROUPBY on my own. The exercise also helped introduce some of my colleagues to the filter expression, and in turn map and reduce. Using those they were able to express a lot of their pipeline concepts without many of the iteration structures they were used to having abstracted away. If you find yourself doing a lot of pipelining I recommend checking out itertools and functools. Both are built into the Python stdlib and provide a lot of helpful functionality.

Train All the Things — Speed Bumps

As part of getting started on my project a couple months back I took a look at what boards were supported by TensorFlow Lite. Seeing an ESP board I went that route since I've heard a lot about them from the maker/hacker community and thought it would be a good opportunity to learn more. Additionally it's been quite a while since I had a project that was primarily C/C++, so that was exciting. Like any good project I ran into multiple unexpected bumps, bugs and issues. Some were minor, others were frustrating. I'm capturing some of those here for anybody else that may be starting down the path of using TensorFlow Lite and an ESP32 board.

Tensorflow speed bumps

Getting started with TF Lite is easy enough, but something I noticed as I continued to work on the project is just how little is designed specifically for the platform. Instead the examples are set up with Arduino as a default, and then work is done to make that run on a given target. In the case of the ESP-EYE this looks like packing everything into an Arduino compatible loop and handling that in a single FreeRTOS task. I get the reason for this, but it's also a bit of a headache later on as it feels like an anti-pattern when adding in new tasks and event handlers.

Another bump you are likely to notice is that the TF Lite examples rely on functionality present in the TF 1.x branch for training, but require TF >= 2.2 for the micro libs. Not the end of the world, but it means you're going to manage multiple environments. If managing this using venv/virtualenv keep in mind you're going to need the esp-idf requirements in the 2.x environment, or just install them in both since you may find yourself switching back and forth. In addition to the Python lib versions, the examples note esp-idf 4.0, but you will want to use >= 4.0 with this commit or you will run into compiler failures. I ended up using 4.1 eventually, but something to note.

Finally, interaction with the model feels flaky. It's an example so this kind of makes sense, but I found that while the word detected was pretty accurate, the new command flag and some of the attributes of the keyword being provided by the model weren't matching my expectations/use. I ended up using the score value and monitoring the model to set up the conditionals for responding to commands in my application.

Overall the examples are great to have, and walking through the train, test and load cycle is really helpful. The main thing I wish I had known was that the TF Arduino path for ESP was pretty much the same as the ESP native path with regards to utility and functionality, just using the esp-idf toolchain.

ESP speed bumps

From the ESP side of things the core idf tooling is nice. I like how open it is and how much I can understand the different pieces. This helped a few times when I ran into unexpected behavior. One thing to note is that if you follow the documented path of cloning esp-idf you will want to consider how you manage the release branch you use and when you merge updates. Updates are not pushed into minor/bug-fix branches; instead they go into the targeted release branch on merge.

Being new to the esp platform, something I didn't know when I got started was that esp-idf 4.x released in February of 2020. Because of this a lot of the documentation and examples such as ESP-WHO and esp-skainet are still based on 3.x, which has a variety of differences and changes in things like the TCP/network stack. Because of this, checking the version used in various docs, examples etc is (as usual) important. Since the TF examples reference version 4 that's where I started, but a lot of what's out there is based on v3.

One other bump somebody may run into is struct initialization in a modern toolchain when calling the underlying esp C libraries from C++. I spent some time digging around after transitioning the http request example into the TF C++ command responder code, where the compiler complained about uninitialized struct fields and that their order made them required.

The example code:

esp_http_client_config_t config = {
    .url = "http://httpbin.org/get",
    .event_handler = http_event_handler,
    .user_data = local_response_buffer,
};
esp_http_client_handle_t client = esp_http_client_init(&config);
esp_err_t err = esp_http_client_perform(client);

And how I had to do it in C++:

esp_http_client_config_t* config = (esp_http_client_config_t*)calloc(sizeof(esp_http_client_config_t), 1);
config->url = URL;
config->cert_pem = burningdaylight_io_root_cert_pem_start;
config->event_handler = http_event_handler;

esp_http_client_handle_t client = esp_http_client_init(config);
esp_http_client_set_method(client, HTTP_METHOD_PUT);
esp_err_t err = esp_http_client_perform(client);

I had a similar issue with wifi and you can see the solution here.

I really enjoyed my lite trip into idf. It's an interesting set of components and followed a workflow that I use and appreciate. I wrote a couple aliases that somebody might find useful:

alias adf="export ADF_PATH=$HOME/projects/esp-adf"
alias idf-refresh="rm -rf $HOME/projects/esp-idf && git clone --recursive git@github.com:espressif/esp-idf.git $HOME/projects/esp-idf && $HOME/projects/esp-idf/install.sh"  
alias idf=". $HOME/projects/esp-idf/export.sh"  
alias idf3="pushd $HOME/projects/esp-idf && git checkout release/v3.3 && popd && . $HOME/projects/esp-idf/export.sh"  
alias idf4x="pushd $HOME/projects/esp-idf && git checkout release/v4.0 && popd && . $HOME/projects/esp-idf/export.sh"  
alias idf4="pushd $HOME/projects/esp-idf && git checkout release/v4.1 && popd && . $HOME/projects/esp-idf/export.sh"  
alias idf-test="idf.py --port /dev/cu.SLAB_USBtoUART flash monitor"

And I look forward to writing more about esp as I continue to use it in new projects.

Approaching the end of this project it’s been a larger undertaking than I expected, but I’ve learned a lot. It’s definitely generated a few new project ideas. The code, docs, images etc for the project can be found here and I’ll be posting updates as I continue along to HackadayIO and this blog. If you have any questions or ideas reach out.

Train All the Things — Wrapping Up

And now I’m at v0.1 of the on-air project. I was able to achieve what I was hoping to along the way. I learned more about model development, tensorflow and esp. While this version has some distinct differences from what I outlined for the logic flow (keywords, VAD) it achieves the functional goal. The code, docs, images etc for the project can be found in this repo, and the project details live on HackadayIO. When I get back to this project and work on v1.x I'll make updates available to each.

A couple thoughts having worked through this in the evening for a couple months:

  • I really should have outlined the states that the esp program was going to cycle through, and then mapped those into tasks on the FreeRTOS event loop. While the high level flow captures the external systems' behavior, the esp has the most moving parts at the application level, and is where most of the state is influenced.
  • I want to spend some more time with C++ 14/17 understanding the gotchas of interfacing with C99. I ran into a few different struct init issues and found a few ways to solve them. I’m sure there is a good reason for different solutions, but it’s not something I’ve spent a lot of time dealing with so I need to learn.
  • While continuing to learn about esp-idf I want to look into some of the esp HAL work too. I briefly explored esp-adf and skainet while working through on-air. Both focus on a couple of boards but seem to have functionality that would be interesting for a variety of devices. Understanding the HAL and components better seems to be the place to start.
  • Data, specifically structured data, is going to continue to be a large barrier for open models and for anybody to be able to train a model for their own want/need. While sources like Kaggle, arXiv, data.world and others have worked to help this, there's still a gulf between what I can get at home and what I can get at work. Additionally many open datasets are numeric or text datasets while video, audio and other sources are still lacking.
  • Document early, document often. Too many times I got so caught up in writing code, or just getting one more thing done, that doing a thorough write-up of issues I experienced, interesting findings, or even successful moments was difficult. I know that I put this off sometimes, and different parts of the project are not as well documented, or details have been lost to the days in between.
  • There’s a lot of fun stuff left to explore here. I can see why I’ve heard a lot about esp, and I look forward to building more.

Train All the Things — Version 0.1

My first commit to on-air shows March 3, 2020. I know that in the weeks leading up to that commit I spent some time reading through the TF Lite documentation, playing with Cloudflare Workers K/V and getting my first setup of esp-idf squared away. After that it was off to the races. I outlined my original goal in the planning post. I didn't quite get to that goal. The project currently doesn't have a VAD to handle the scenario where I forget to activate the display before starting a call or hangout. Additionally I wasn't able to train a custom keyword as highlighted in the custom model post. I was however able to get a functional implementation of the concept. I am able to hang the display up, and then in my lab with the ESP-EYE plugged in I can use the wake word visual followed by on/off to toggle the display status.

While it’s not quite what I had planned, it’s a foundation, and I’ve got a lot more tools and knowledge under my belt. Round 2 will probably involve Skainet just due to the limitations in voice data that’s readily available. Keep an eye out for a couple more posts highlighting some bumps along the way and a summary of lessons learned.

The code, docs, images etc for the project can be found here and I’ll be posting any further updates to HackadayIO. For anybody that might be interested in building this the instructions below provide a brief outline. Updated versions will be hosted in the repo. If you have any questions or ideas reach out.

Required Hardware:

  1. ESP-EYE
  2. Optional ESP-EYE case
  3. PyPortal
  4. Optional PyPortal case
  5. Two 3.3v usb to outlet adapters and two usb to usb mini cables

OR

  1. Two 3.3v micro usb wall outlet chargers

Build Steps:

  1. Clone the on-air repo.

Cloudflare Worker:

  1. Setup Cloudflare DNS records for your domain and endpoint, or setup a new domain with Cloudflare if you don’t have one to resolve the endpoint.
  2. Setup a Cloudflare workers account with worker K/V.
  3. Setup the Wrangler CLI tool.
  4. cd into the on-air/sighandler directory.
  5. Update toml
  6. Run wrangler preview
  7. wrangler publish
  8. Update Makefile with your domain and test calling.

PyPortal:

  1. Setup CircuitPython 5.x on the PyPortal.
  2. If you’re new to CircuitPython you should read this first.
  3. Go to the directory where you cloned on-air.
  4. cd into display.
  5. Update secrets.py with your wifi information and status URL endpoint.
  6. Copy code.py, secrets.py and the bitmap files in screens/ to the root of the PyPortal.
  7. The display is now good to go.

ESP-EYE:

  1. Setup esp-idf using the 4.1 release branch.
  2. Install espeak and sox.
  3. Setup a Python 3.7 virtual environment and install Tensorflow 1.15.
  4. cd into on-air/voice-assistant/train
  5. chmod +x orchestrate.sh and ./orchestrate.sh
  6. Once training completes cd ../smalltalk
  7. Activate the esp-idf tooling so that $IDF_PATH is set correctly and all requirements are met.
  8. idf.py menuconfig and set your wifi settings.
  9. Update the URL in toggle_status.cc
  10. This should match the host and endpoint you deployed the Cloudflare worker to above.
  11. idf.py build
  12. idf.py --port <device port> flash monitor
  13. You should see the device start, attach to WiFi and begin listening for the wake word “visual” followed by “on” or “off”.

Train All the Things — Model Training

Recently I spent some time learning how to generate synthetic voices using espeak. After working with the tools to align with the TensorFlow keyword model's expectations I was ready for training, and to see how well the synthetic data performed. TLDR: not well :)

I started by training using the keywords hi, smalltalk and on. This let me have a known working word while testing two synthetic words. Although training went well:

INFO:tensorflow:Saving to "/Users/n0mn0m/projects/on-air/voice-assistant/train/model/speech_commands_train/tiny_conv.ckpt-18000"
I0330 10:34:28.514455 4629171648 train.py:297] Saving to "/Users/n0mn0m/projects/on-air/voice-assistant/train/model/speech_commands_train/tiny_conv.ckpt-18000"
INFO:tensorflow:set_size=1445
I0330 10:34:28.570324 4629171648 train.py:301] set_size=1445
WARNING:tensorflow:Confusion Matrix:  
 [[231 3 3 0 4]  
 [ 2 178 6 29 26]  
 [ 3 12 146 2 2]  
 [ 4 17 2 352 21]  
 [ 2 16 7 16 361]]  
W0330 10:34:32.116044 4629171648 train.py:320] Confusion Matrix:  
 [[231 3 3 0 4]  
 [ 2 178 6 29 26]  
 [ 3 12 146 2 2]  
 [ 4 17 2 352 21]  
 [ 2 16 7 16 361]]  
WARNING:tensorflow:Final test accuracy = 87.8% (N=1445)  
W0330 10:34:32.116887 4629171648 train.py:322] Final test accuracy = 87.8% (N=1445)

The model didn't respond well once it was loaded onto the ESP-EYE. I tried a couple more rounds with other keywords and spectrogram samples with similar results.

Because of the brute force approach that I used to generate audio, the synthetic training data isn't very representative of real human voices. While the experiment didn't work out, I do think that generating data this way could be useful with the right amount of time and research. Instead of scaling parameters in a loop, researching the characteristics of various human voices and using those to tune the data generated via espeak could actually work out well. That said, it's possible the model may pick up on characteristics of the espeak program too. Regardless, voice data that is ready for training is still a hard problem in need of more open solutions.

Along with the way I scaled the espeak parameters another monkey wrench is that the microspeech model makes use of a CNN and spectrogram of the input audio instead of full signal processing. This means it’s highly likely the model will work with voices around the comparison spectrogram well, but not generalize. This makes picking the right spectrogram relative to the user another key task.

Because of these results and bigger issues I ended up tweaking my approach and used visual as my wake word followed by on/off. All of these are available in the TF command words dataset, and visual seems like an ok wake word when controlling a display. For somebody working on a generic voice assistant you will want to work on audio segmentation since many datasets are sentences, or consider using something like Skainet. All of this was less fun than running my own model from synthetic data, but I needed to continue forward. After a final round of training with all three words I followed the TF docs to represent the model as a C array and then flashed it onto the board with the rest of the program. Using idf monitor I was able to observe the model working as expected:

I (31) boot: ESP-IDF v4.1
I (31) boot: compile time 13:35:43
I (704) wifi: config NVS flash: enabled
I (734) WIFI STATION: Setting WiFi configuration SSID Hallow...
I (824) WIFI STATION: wifi_init_sta finished.
I (1014) TF_LITE_AUDIO_PROVIDER: Audio Recording started
Waking up
Recognized on
I (20434) HTTPS_HANDLING: HTTPS Status = 200, content_length = 1
I (20434) HTTPS_HANDLING: HTTP_EVENT_DISCONNECTED
I (20444) HTTPS_HANDLING: HTTP_EVENT_DISCONNECTED
Going back to sleep.
Waking up
Recognized off
I (45624) HTTPS_HANDLING: HTTPS Status = 200, content_length = 1
I (45624) HTTPS_HANDLING: HTTP_EVENT_DISCONNECTED
I (45634) HTTPS_HANDLING: HTTP_EVENT_DISCONNECTED

This was an educational experiment. It helped me put some new tools in my belt while thinking further about the problem of voice and audio processing. I developed some [scripts](https://github.com/n0mn0m/on-air/tree/main/voice-assistant/train) to run through the full data generation, train and export cycle. Training will need to be done based on the architecture somebody is using, but hopefully it's useful.

The code, docs, images etc for the project can be found here and I’ll be posting updates as I continue along to HackadayIO and this blog. If you have any questions or ideas reach out.

Train All the Things — Synthetic Generation

After getting the display and worker up and running I started down the path of training my model for keyword recognition. Right now I’ve settled on the wake words Hi Smalltalk. After the wake word is detected the model will then detect silence, on, off, or unknown.

My starting point for training the model was the micro_speech and speech_commands tutorials that are part of the Tensorflow project. One of the first things I noticed while planning out this step was the lack of good wake words in the speech commands dataset. There are many voice datasets available online, but many are unlabeled or conversational. Since digging didn't turn up much in the way of open labeled word datasets, I decided to use on and off from the speech commands dataset since that gave me a baseline for comparison with my custom words. After recording myself saying hi and smalltalk less than ten times I knew I did not want to generate my own samples at the scale of the other labeled keywords.

Instead of giving up on my wake word combination I started digging around for options and found an interesting project where somebody had started down the path of generating labeled words with text to speech. After reading through the repo I ended up using espeak and sox to generate my labeled dataset.

The first step was to generate the phonemes for the wake words:

$ espeak -v en -X smalltalk  

Then I stored the phoneme in a words file that will be used by generate.sh:

$ cat words  
hi 001 
busy 002 
free 003
smalltalk 004

After modifying generate.sh from the spoken command repo (eliminating some extra commands and extending the loop to generate more samples) I had everything I needed to synthetically generate a new labeled word dataset.

#!/bin/bash
# For the various loops the variable stored in the index variable
# is used to attenuate the voices being created from espeak.

lastword=""

cat words | while read word wordid phoneme
do
    echo $word
    mkdir -p db/$word

    if [[ $word != $lastword ]]; then
        versionid=0
    fi

    lastword=$word

    # Generate voices with various dialects
    for i in english english-north en-scottish english_rp english_wmids english-us en-westindies
    do
        # Loop changing the pitch in each iteration
        for k in $(seq 1 99); do
            # Change the speed of words per minute
            for j in 80 100 120 140 160; do
                echo $versionid "$phoneme" $i $j $k
                echo "$phoneme" | espeak -p $k -s $j -v $i -w db/$word/$versionid.wav
                # Set sox options for Tensorflow
                sox db/$word/$versionid.wav -b 16 --endian little db/$word/tf$versionid.wav rate 16k
                ((versionid++))
            done
        done
    done
done

After the run I have samples and labels with a volume comparable to the other words provided by Google. The pitch, speed and tone of voice changes with each loop which will hopefully provide enough variety to make this dataset useful in training. Even if this doesn’t work out learning about espeak and sox was interesting. I've already got some future ideas on how to use those. If it does work the ability to generate training data on demand seems incredibly useful.

Next up, training the model and loading to the ESP-EYE. The code, docs, images etc for the project can be found here and I’ll be posting updates as I continue along to HackadayIO and this blog. If you have any questions or ideas reach out.