Building an Environment Monitor for my Apartment – patsapartment.com

We moved to a new apartment this fall and I wanted to investigate ways to monitor air quality and humidity. Our new place has radiator heat that we don’t have much control over. As a result, the air is incredibly dry and goes through intense temperature swings daily. The plants and my skin weren’t doing the best in the desert-like environment. In short, the air temperature and humidity levels are unpredictable.

Because I’m crazy, I decided to assemble some hardware to detect various air quality metrics. Then I built a website to track this data: PatsApartment.com

I bought a bunch of I2C sensors and thought logging a variety of different air quality metrics would be fun. One of the goals was to figure out whether the radiators are set on some sort of timer/schedule or whether they respond to temperature changes in some sort of PID system. Additionally, I wanted to see what effect a humidifier might have on air quality and whether my plants decrease CO2 levels throughout the day.

Here it is during construction

Monitor Setup

The main monitor was built using a Raspberry Pi with a bunch of sensors connected via I2C and an analog-to-digital converter. I set up a cron job to run a Python script that reads the sensor values for 10-15 seconds and calculates an average value for each sensor. The script then sends this data to an API I wrote using Laravel. The Laravel site shows the current sensor readings and a chart of CO2 levels over time.
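Here’s roughly what that cron-driven logging script looks like. This is a minimal sketch, not the actual code: the sensor reads are stubbed out and the API URL is hypothetical, so swap in whatever I2C libraries and endpoint you actually use.

import time
import statistics
import requests

API_URL = "https://patsapartment.com/api/readings"   # hypothetical endpoint
SAMPLE_SECONDS = 15

def read_sensors():
    """Return one reading from each sensor (stubbed out here)."""
    return {
        "co2_ppm": 0.0,        # e.g. from the I2C CO2 sensor
        "temperature_c": 0.0,  # e.g. from the temp/humidity sensor
        "humidity_pct": 0.0,
        "light_level": 0.0,    # via the analog-to-digital converter
    }

# Sample for the window, then average each metric and send it to the Laravel API
samples = []
start = time.time()
while time.time() - start < SAMPLE_SECONDS:
    samples.append(read_sensors())
    time.sleep(1)

averages = {key: statistics.mean(s[key] for s in samples) for key in samples[0]}
requests.post(API_URL, json=averages, timeout=10)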

There is a tiny OLED display on the main board that gives you a readout of all the levels.

Basic diagram showing all system parts

Air Quality Metrics

I got a little carried away on Adafruit/eBay and ended up with a ton of different sensors. Right now the system can sense:

  • CO2 Levels
  • Volatile Organic Compounds (plastics, combustible stuff, certain household cleaners, other nasty stuff)
  • Dust levels
  • Temperature
  • Humidity (this is a tricky one to get accurately for various reasons)
  • Relative light levels

Website/API

I built a simple Laravel site that has an API endpoint for logging sensor values. There is a single page that loads a basic listing of current values, plus a chart showing CO2 levels and other data gathered.

Future Plans

Some future plans:

  • Move the project off the breadboard to some perf-board
  • Add more charts to the website
  • Run some sensors outside for humidity/temp
  • Set up alerts when levels reach a certain threshold

Building an 8-bit Computer From Scratch: Part 1 of ?

It was really cold this winter in Chicago. For a few days, the wind chill was around -50°F, which is pretty crazy. To pass the time, I decided to start a new electronics hardware project. The past year or two I’ve mostly been doing hardware projects in my spare time because I never have the chance to do any at work. This project is going to take a long time.

Inspiration for this project came from an amazing YouTube channel by Ben Eater. One of the big series on his channel is a step-by-step guide to building a computer using logic integrated circuits. He goes through all the steps needed to build the CPU clock, registers, arithmetic logic unit, system bus, RAM, ROM, logic for displaying numbers, and loads of other neat stuff. Ben’s personal website has a full list of parts you can get online and circuit schematics for each of the modules.

Computer Design

The basic structure of the planned computer is pretty simple, so it isn’t able to do much. In the end, I want it to add/subtract 8-bit numbers and use 2’s complement to work with negative integers. It will have about 5 or 6 assembly instructions and be able to run programs that are around 14 instructions long.

Saying it is an 8-bit computer is somewhat misleading because the memory address space will only be 4-6 bits depending on how things are built. The clock speed maxes out at a few hertz so you’re not going to be calculating too fast.

At this point, I have a pretty basic binary adder/subtractor built out of 5 modules. It can add two 8-bit numbers and that is about it:

  • System Clock
  • A Register
  • B Register
  • Instruction Register
  • Arithmetic Logic Unit

System Clock

The system clock is made up of a few 555 timer chips set up in their various operating modes (astable, monostable, and bistable). They blink an LED at a rate that is adjustable with a potentiometer. A simple latch built with some logic chips switches the clock between two modes:

  • Auto-mode (clock signal runs over and over again at a set rate)
  • Manual mode (clock signal sent every time you click a button)

It runs at a few hertz. You can overclock it by turning the pot to adjust the timing circuit (there’s a quick sketch of the math behind that below the schematic).

Auto/manual system clock built using 555 timer chips

Here’s the schematic from Ben’s website:

System clock circuit diagram
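For reference, the rate of the astable stage follows the standard 555 relationship f = 1.44 / ((R1 + 2·R2) · C), which is what turning the pot changes. Here’s that math as a quick Python sketch with made-up component values (not necessarily the ones in Ben’s schematic):

# Standard astable 555 frequency: f = 1.44 / ((R1 + 2*R2) * C)
# Component values below are placeholders, not Ben's exact parts
R1 = 1_000       # ohms, fixed resistor
R2 = 100_000     # ohms, potentiometer setting
C = 1e-6         # farads, timing capacitor

frequency_hz = 1.44 / ((R1 + 2 * R2) * C)
print(f"Clock frequency: {frequency_hz:.1f} Hz")   # roughly 7 Hz with these values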

The Registers

The registers are built using two 4-bit flip-flop chips to hold the values loaded onto them. A load signal and the clock signal are fed in to store values, and you can read them back by toggling an enable pin. These are SN74LS173 chips from Texas Instruments, one of many TI logic chips used in this build.

Each register holds an 8-bit value that you can load onto it.

I built A and B registers to hold the values to add/subtract, and an instruction register to store instructions/memory addresses. The instruction register hasn’t been used yet because I don’t have anything to control the CPU logic yet. There’s a toy software model of the register behavior below the circuit diagram.

Register circuit diagram
This is one of the registers hooked up to the system clock. It can read/write values to the yellow LEDs
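To make the load/enable behavior concrete, here’s a toy software model of one of these registers. It’s just a sketch of the behavior described above, not a simulation of the actual 74LS173 chips:

class Register8:
    """Toy model of an 8-bit register built from two 4-bit register chips."""

    def __init__(self):
        self.value = 0               # the stored 8-bit value
        self.load = False            # when True, latch the bus on the next clock pulse
        self.output_enable = False   # when True, drive the stored value onto the bus

    def clock_pulse(self, bus):
        # On a clock pulse, latch the bus value if load is asserted
        if self.load:
            self.value = bus & 0xFF
        # None stands in for the chip's high-impedance (disconnected) state
        return self.value if self.output_enable else None


# Load 0b00101010 into the A register on one clock pulse, then read it back
a_register = Register8()
a_register.load = True
a_register.clock_pulse(0b00101010)
a_register.load = False
a_register.output_enable = True
print(f"A register: {a_register.value:08b}")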

Arithmetic Logic Unit

The ALU can add/subtract two 8-bit numbers. It makes use of two’s complement to handle negative values. Normally a computer’s ALU would also handle some bitwise operations, but this one is only going to add and subtract. There’s a quick software sketch of the same logic below the diagrams.

ALU circuit diagram
ALU connected to the A and B registers
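Here’s a small Python sketch of what the ALU does in software terms: add or subtract two 8-bit values using two’s complement and throw away anything past 8 bits. It’s just an illustration of the logic, not how the hardware is wired:

def alu(a, b, subtract=False):
    """Return (a + b) or (a - b) as an 8-bit two's complement result."""
    if subtract:
        b = (~b + 1) & 0xFF      # two's complement negation of b
    return (a + b) & 0xFF        # keep only the low 8 bits (the carry is dropped)

def to_signed(value):
    """Interpret an 8-bit value as a signed two's complement number."""
    return value - 256 if value & 0x80 else value

print(to_signed(alu(5, 12)))                 # 17
print(to_signed(alu(5, 12, subtract=True)))  # -7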

Conclusion

There is still a ton of work left to do to get this working decently. I need to work on the system bus, hook up the RAM/ROM, and add something to keep track of CPU instructions. Memory management is also going to be kind of difficult. I’m planning out the next steps now. I want to try some things outside of Ben’s plans, but I’m not sure what yet.

Here is everything hooked up in all its messy glory:

Clock/ALU/A and B register all hooked up writing to the bus. Not pictured is the instruction register

Finding Plaid Shirts with the Amazon Rekognition API

Is that flannel you are wearing?

I’m pretty sure I’ve been bitten by the machine learning bug! The past few weeks, I’ve had the opportunity to work with Amazon Rekognition. It’s a newfangled deep-learning image recognition API that is part of AWS, and it’s been fun to play around with. You feed it images and it sends back attempts to detect objects, faces, text, and other things you’d want to detect. No need to train your own model and run all sorts of specialized software. Just sign up for AWS, set up a client on your machine, and start sending the API images to analyze. It’ll take you around 30 minutes to set up a simple proof of concept and get an idea of the API’s features.

What is Rekognition?

First, let’s go over a little bit about what Rekognition is for those who aren’t familiar. Rekognition is an API for deep-learning based image and video analysis. You send it photos or video and it can identify objects, people, faces, scenes, text, and other stuff. Rekognition’s deep-learning algorithm will attempt to label objects in the image.

There are four types of labelling currently supported

I was blown away by how many objects it could label and the granularity of its classifications. My expectations of the API’s accuracy were low initially, but I was quickly proven wrong. For instance, the API is able to distinguish different breeds of dogs. It knows there’s a difference between a dung beetle and a cockroach. It is also great at finding faces and labeling the parts of a face: nose, eye, eyebrow, and pupil locations are just a few. There was a bit of uncertainty when trying to label emotions. For some reason, it always set my emotion as ‘confused’? As time goes by it will only get better at identification. One thing it never fails to label is flannel/plaid. If there is plaid in an image, Rekognition will label it like there is no tomorrow.

It can also analyze streaming video for faces in real time. I haven’t tried video yet but at work we have an AWS DeepLens preordered. It has specialized hardware for deep learning and will be able to use custom detection models.

Let’s Start Tinkering

It is easy to start tinkering with Rekognition. We’ll use the AWS CLI and an S3 bucket to get started. We will upload images to the S3 bucket and pass them to the API via the CLI. When the API is done processing an image, it returns a string of JSON.

To begin, we will setup a simple environment to send images to the API.

  1. Set up the AWS CLI
  2. Create an S3 bucket for the images to be labeled
  3. Upload those images
  4. Use the CLI to run Rekognition on the bucket images

If you don’t have the CLI set up yet, there are plenty of third-party guides out there. The AWS docs aren’t known for their quality.

Next, you’ll need to create an S3 bucket with public read permissions. I used the GUI on the AWS console to make one in a region close to me, in this case us-east-1. Take note of your bucket’s region because Rekognition needs it to find the right image. Once the bucket was ready, I uploaded a few images to the bucket and made sure they were publicly accessible.
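If you’d rather script the bucket setup than click through the console, something like this Boto3 sketch should work. The bucket and file names are just the examples used in this post, and newer AWS accounts may need the bucket’s public access settings relaxed before a public-read ACL will stick:

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Create the bucket (us-east-1 doesn't need a LocationConstraint)
s3.create_bucket(Bucket="tinker-bucket")

# Upload an image and make it publicly readable
s3.upload_file(
    Filename="wing-woman.jpg",
    Bucket="tinker-bucket",
    Key="wing-woman.jpg",
    ExtraArgs={"ACL": "public-read"},
)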

Look at that tasty pic

woman eating chicken wings with face covered in hot sauce

We will use cutting edge technology to analyze this image


# The base CLI command for Rekognition is: aws rekognition
# To detect labels in the image use: aws rekognition detect-labels
# We need to specify an S3 bucket and the proper region
# The bucket is described with escaped JSON
# The region uses the abbreviations used across AWS
# This page has all the region short names if you forgot:
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
# This will return JSON describing the image

aws rekognition detect-labels --image "{\"S3Object\":{\"Bucket\":\"tinker-bucket\",\"Name\":\"wing-woman.jpg\"}}" --region us-east-1

Here’s a sample of the JSON that gets returned. Some of the results are pretty funny:
"Labels": [
 {
 "Name": "Human",
 "Confidence": 99.27762603759766
 },
 {
 "Name": "Corn",
 "Confidence": 94.26850891113281
 },
 {
 "Name": "Flora",
 "Confidence": 94.26850891113281
 },
 {
 "Name": "Grain",
 "Confidence": 94.26850891113281
 }
}

Here’s a command to use face detection instead of object detection. Face detection mode returns an array with entries for each face in the image. It can detect up to 100 faces per image. For each face it returns an array of potential emotions too. It seems like the emotion detection is hit or miss at times but it is still really good.
# Note the --attributes ALL argument at the end
# Without this the array of emotions wouldn't be returned
aws rekognition detect-faces --image "{\"S3Object\":{\"Bucket\":\"tinker-bucket\",\"Name\":\"wing-woman.jpg\"}}" --region us-east-1 --attributes ALL

// the result has been shortened
{
  "FaceDetails": [
    {
      "BoundingBox": {
        "Width": 0.3016826808452606,
        "Height": 0.46822741627693176,
        "Left": 0.359375,
        "Top": 0.15793386101722717
      },
      "AgeRange": {
        "Low": 23,
        "High": 38
      },
      "Smile": {
        "Value": false,
        "Confidence": 74.72581481933594
      },
      "Eyeglasses": {
        "Value": false,
        "Confidence": 50.551666259765625
      },
      "Emotions": [
        {
          "Type": "HAPPY",
          "Confidence": 38.40011215209961
        },
        {
          "Type": "SAD",
          "Confidence": 3.1377792358398438
        },
        {
          "Type": "DISGUSTED",
          "Confidence": 1.5140950679779053
        }
      ],
      "Landmarks": [
        {
          "Type": "eyeLeft",
          "X": 0.4536619782447815,
          "Y": 0.3465670645236969
        },
        {
          "Type": "eyeRight",
          "X": 0.5664145946502686,
          "Y": 0.3220127522945404
        }
      ]
    }
  ]
}

It returned a bounding box for the person’s face, a potential age range, whether or not they have glasses, an array of potential emotions, and coordinates for facial landmarks like the person’s eyes.

One thing to note is that the coordinates for landmarks are formatted as decimals between 0.0 and 1.0. To get values in pixels, multiply the ‘X’ coordinates by the source image’s width and the ‘Y’ coordinates by the height.
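As a quick example, here’s the left-eye landmark from the JSON above converted to pixels (the image dimensions here are made up):

image_width, image_height = 1280, 960   # hypothetical source image size

left_eye = {"X": 0.4536619782447815, "Y": 0.3465670645236969}

pixel_x = left_eye["X"] * image_width    # ~581 px from the left edge
pixel_y = left_eye["Y"] * image_height   # ~333 px from the top
print(round(pixel_x), round(pixel_y))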

Pretty Neat

Rekognition is pretty impressive considering how simple it is to set up and start using. The CLI is an easy way to experiment, although there are some parts of the API you can’t use from it. For instance, you are stuck uploading images to S3 whenever you want them processed. Using an SDK gives you more control over the API and lets you integrate it seamlessly with applications you are writing. I have been using the Python SDK, Boto3, and have been very pleased. It has methods for pretty much any AWS product. At some point I’ll post about using it to alter S3 buckets.
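To give you a taste of the Boto3 version, here’s a minimal sketch of the same detect-labels call from earlier (the bucket and key are the example ones used above):

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "tinker-bucket", "Name": "wing-woman.jpg"}},
    MaxLabels=10,
    MinConfidence=75,
)

# Print each detected label with its confidence score
for label in response["Labels"]:
    print(f"{label['Name']}: {label['Confidence']:.1f}%")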

Adventures Converting Large PDF Files Into Text

PDFs Are NOT COOL

Yesterday, I discovered that programmatically searching for text in PDF files is more complicated than you would imagine. A friend of mine who works at a non-profit came to me with a problem she thought could be automated. Her organization investigates the accounting practices of public institutions and, as a result, works with lots of old government files. This particular task involved searching a large number of PDF files made between 1995 and the present day. She needed a way to search through each state’s financial records, year by year, and record how many times certain search terms occurred across all of the PDFs. To do this, she was opening each file, “ctrl + f”-ing for the search term, and then manually recording each count in a spreadsheet. To me, this sounded like a microcosm of hell on earth. Writing a script for this seemed pretty straightforward. Then I started to learn about PDF files. They are tricky, to say the least.

Continue…

Part 1: Wrangling Redis, Gevent, SocketIO and Django

Wiring panel for electric door bell and buzzer

MAGIC DOOR DEVICE

So the other day I was digging around a storage space underneath the steps in our apartment and found something awesome: the little box and all the wiring for our door’s buzzer. It was completely exposed and ready to be fiddled with. This discovery, paired with seeing this bad boy, the Spark Core, got me thinking of all sorts of cool things you could do with small WiFi-enabled microcontrollers.

Continue…

The Apollo Guidance Computer Part 1

Recently, I have been getting interested in old computers, especially ones with low amounts of computing power. The Apollo Guidance Computer is the one I’ve been reading about the most. It is pretty amazing how it was designed and how it was able to guide a rocket all the way to the moon and back. It was one of the first computers to heavily use integrated circuits. Some of the hardware used was really unique and embodies some great methods in computer design.

Continue…

Linear Models in R (Part 2) – Data Analysis and Multiple Regressions

Multiple linear regression follows the same ideas as univariate regression, but instead of using one variable to predict an outcome, we will be using several. So there will be one dependent variable and multiple independent ones. By adding more independent variables you can make more precise predictions and model more complex systems.

The general formula looks like this:

y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4

b0 is the intercept and b1, b2, and so on are the slopes, just like in univariate regression. They are also known as the regression coefficients.

Investigating Data Before Creating Models

Looking at a summary of the data frame you want to run a multiple regression on is a good starting point. You’ll get a better idea of what the data is like and the range of values for each of the input variables. You can also look into the correlations in the data by using R’s cor() function and see if there are any strong correlations between any of the variables. This will establish some idea of the bivariate relationships in the data set, but unfortunately those don’t take into account all of the other variables we will be using. The great thing about a multiple regression is that it gives you a purer representation of the correlation between the variables by taking them all into account.

I’ll just start off with some starter code that will get all of the data into a data frame and then we can start looking into the data itself some more using the techniques I’ve described.


# load ggplot to use later on to make some simple plots
> library(ggplot2)

# read the data in and name the columns (the file name here is just a placeholder)
> baby.data <- read.csv("babies.csv")
> colnames(baby.data) <- c("birth_weight", "gestation", "parity", "age", "height", "weight", "smoke")

Now we’ve got a data frame with the data about baby births and their mothers, and we can start poking around at the data to see if we can find any interesting patterns. R’s summary() function is a good place to start.


> summary(baby.data)
  birth_weight     gestation         parity            age
 Min.   : 55.0   Min.   :148.0   Min.   :0.0000   Min.   :15.00
 1st Qu.:108.0   1st Qu.:272.0   1st Qu.:0.0000   1st Qu.:23.00
 Median :120.0   Median :280.0   Median :0.0000   Median :26.00
 Mean   :119.5   Mean   :279.1   Mean   :0.2624   Mean   :27.23
 3rd Qu.:131.0   3rd Qu.:288.0   3rd Qu.:1.0000   3rd Qu.:31.00
 Max.   :176.0   Max.   :353.0   Max.   :1.0000   Max.   :45.00
     height          weight           smoke
 Min.   :53.00   Min.   : 87.0   Min.   :0.000
 1st Qu.:62.00   1st Qu.:114.2   1st Qu.:0.000
 Median :64.00   Median :125.0   Median :0.000
 Mean   :64.05   Mean   :128.5   Mean   :0.391
 3rd Qu.:66.00   3rd Qu.:139.0   3rd Qu.:1.000
 Max.   :72.00   Max.   :250.0   Max.   :1.000

From this we see a few interesting aspects of the data. The smoking and parity values are binary variables, so it may be interesting to split the data set into two subsets at some point. There isn’t much variance in height, but the birth_weight of the baby, the weight of the mother, and the gestation period all have somewhat large ranges.

Next, we can take a look at the correlations and see if there are any interesting relationships between the variables. You can use R’s cor() function for this.


> cor(baby.data)
             birth_weight   gestation       parity          age        height
birth_weight   1.00000000  0.40754279 -0.043908173  0.026982911  0.203704177
gestation      0.40754279  1.00000000  0.080916029 -0.053424774  0.070469902
parity        -0.04390817  0.08091603  1.000000000 -0.351040648  0.043543487
age            0.02698291 -0.05342477 -0.351040648  1.000000000 -0.006452846
height         0.20370418  0.07046990  0.043543487 -0.006452846  1.000000000
weight         0.15592327  0.02365494 -0.096362092  0.147322111  0.435287428
smoke         -0.24679951 -0.06026684 -0.009598971 -0.067771942  0.017506595
                  weight        smoke
birth_weight  0.15592327 -0.246799515
gestation     0.02365494 -0.060266842
parity       -0.09636209 -0.009598971
age           0.14732211 -0.067771942
height        0.43528743  0.017506595
weight        1.00000000 -0.060281396
smoke        -0.06028140  1.000000000

There aren’t too many strong correlations in this data set. A few that stick out are the baby’s birth weight and smoking, the mother’s height and weight, and the gestation period and birth weight. Nothing really seems out of the ordinary, and most of these correlations make sense. The only negative correlation that sticks out is between smoking and birth weight.

The next step to get an even more in-depth understanding of the data is to start plotting some of the variables and looking for linear relationships. Visualizing the data definitely gives a better perspective on what you are working with and can expose trends that numbers alone can’t show. Since smoking and birth weight had a decent correlation, we’ll start by plotting them.


> ggplot(baby.data, aes(x = birth_weight, y = smoke)) + geom_point()

The only thing that sticks out from this graph is that if the mother isn’t smoking, the birth weight tends to be a little bit higher. Plotting gestation and birth weight might be more interesting because there is a stronger correlation.


> ggplot(baby.data, aes(x = gestation, y = birth_weight)) + geom_point()


This plot is much more revealing and you can clearly see a linear relationship between the two variables. You can use ggplot’s geom_smooth() function to plot a fitted line to the data and see the relationship a little better. The method argument is set to lm, so it will use a linear model to make the fitted line; there are many other options you can use as well. The se argument decides whether the standard error band of the fitted line is shown.


> ggplot(baby.data, aes(x = gestation, y = birth_weight)) +
+   geom_point() +
+   geom_smooth(method = 'lm', se = FALSE)


The fitted line doesn’t seem to be too revealing, but that is mostly because the correlation between the two variables wasn’t incredibly strong. There is still clearly a linear relationship between them, though. By incorporating more input variables into the regression, we’ll be able to make better predictions.

Another technique to get more information about the data is to make density plots of the variables you are working with. Here is a simple one for birth weight.


> ggplot(baby.data, aes(x = birth_weight)) + geom_density()


This is a nice way to visualize the data. From this graph you can see that birth weight follows a normal distribution. If that wasn’t the case, it might be helpful to take the log of the variable with R’s log() function and see how that turns out. That is another topic I will write about later, when I deal with scaling and normalizing data.

These are just a few of the techniques you can use to look into the data you are working with before trying to build models with it. Getting to know the data well and any relationships can make it much easier to create models that will help you accurately predict values later on.

Multiple Regressions

Now that you have analyzed the data and understand some of the relationships within the data set, it’s time to create a model and begin to make predictions. You will be using the same lm() function that is used for univariate regressions, except the formula will be modified to include the rest of the input variables. Let’s start off by trying to predict the birth weight of a baby given all of the other input variables.


> baby.model <- lm(BirthWeight ~ Gestation + Parity + Age + Height + Weight + Smoke, data = baby.data)
> summary(baby.model)

Call:
lm(formula = BirthWeight ~ Gestation + Parity + Age + Height +
Weight + Smoke, data = baby.data)

Residuals:
    Min      1Q  Median      3Q     Max
-57.613 -10.189  -0.135   9.683  51.713

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -80.41085   14.34657  -5.605 2.60e-08 ***
Gestation     0.44398    0.02910  15.258  < 2e-16 ***
Parity       -3.32720    1.12895  -2.947  0.00327 **
Age          -0.00895    0.08582  -0.104  0.91696
Height        1.15402    0.20502   5.629 2.27e-08 ***
Weight        0.05017    0.02524   1.987  0.04711 *
Smoke        -8.40073    0.95382  -8.807  < 2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15.83 on 1167 degrees of freedom
Multiple R-squared: 0.258, Adjusted R-squared: 0.2541
F-statistic: 67.61 on 6 and 1167 DF, p-value: < 2.2e-16

Now you should have a linear model that incorporates multiple input variables. In a later post I will go over what all of this output means, but I’ll also put some links at the bottom that explain some of the topics. The next post will go over how the model performs and the amount of error, as well as some ways to minimize it.

Helpful Reading