Raspberry Pi Timelapse 4: Sort Images Using Machine Learning

Note: if you would like to jump to the source code and learn from there, it is at: https://github.com/stevefoga/image-classifier

Now that the Raspberry Pi is configured and capturing images 24/7, there will inevitably be some undesired images. The plant light only runs 18 hours a day, so some images will be dark. A logical fix would be to remove images with small file sizes, using either a static threshold or a rolling standard deviation. However, due to the ambient lighting conditions here, the “dark” images are often the same size as the “light” images. The file sizes in question are visualized below using smoothed histograms of both “dark” and “light” images:

hist_dark_vs_light

Note the overlap of the distributions, which indicates that a single threshold would not work well in this case. The plot was generated in Python, using the os module to get the file sizes and matplotlib to plot the results; the histograms were smoothed using advice found here.
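The file-size comparison can be sketched roughly as follows. This is not the script used for the plot above: the function names are made up for illustration, the histograms are not smoothed, and matplotlib is assumed to be installed only if plot_sizes is called.

```python
import os


def collect_sizes(directory, ext=".jpg"):
    """Return the size in bytes of every file in `directory` ending in `ext`."""
    sizes = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.lower().endswith(ext):
            sizes.append(os.path.getsize(path))
    return sizes


def overlap_range(dark_sizes, light_sizes):
    """Return the (low, high) byte range shared by both groups, or None.

    Both lists must be non-empty.
    """
    low = max(min(dark_sizes), min(light_sizes))
    high = min(max(dark_sizes), max(light_sizes))
    return (low, high) if low <= high else None


def plot_sizes(dark_sizes, light_sizes, bins=30):
    """Plot overlaid (unsmoothed) file-size histograms for both groups."""
    # Imported here so the helpers above work without matplotlib installed.
    import matplotlib.pyplot as plt

    plt.hist(dark_sizes, bins=bins, alpha=0.5, label="dark")
    plt.hist(light_sizes, bins=bins, alpha=0.5, label="light")
    plt.xlabel("File size (bytes)")
    plt.ylabel("Number of images")
    plt.legend()
    plt.show()
```

If overlap_range returns a range rather than None, at least some “dark” and “light” files share the same sizes, so no single size cutoff can separate the two groups cleanly.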

Another option would be to adjust the cron schedule to capture images only when the light is on. This doesn’t work in practice: a slight flicker in power, a bump of the power plug, or other unplanned maintenance resets the light’s timer altogether, so a static schedule would either miss precious acquisitions or keep capturing unlit conditions.

Therefore, the best solution is to automatically remove dark images. The images are dark, so perhaps the quickest way would be to calculate each image’s mean pixel value and qualitatively determine an appropriate cutoff. However, this misses outliers that are bright but still inconsistent with the desired images; leaving them in a timelapse video makes flickering more apparent.
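For reference, the mean-and-cutoff idea might look something like the sketch below. It assumes the third-party Pillow package for reading images; the function names and any cutoff value are hypothetical and would have to be chosen by inspecting real images.

```python
def mean_brightness(path):
    """Return the mean grayscale value (0-255) of the image at `path`."""
    # Pillow is a third-party package (an assumption of this sketch); it is
    # imported here so the cutoff helper below can be used without it.
    from PIL import Image, ImageStat

    with Image.open(path) as img:
        # Convert to single-band grayscale, then take that band's mean.
        return ImageStat.Stat(img.convert("L")).mean[0]


def split_by_cutoff(means, cutoff):
    """Split a {filename: mean} mapping into (dark, light) filename lists."""
    dark = sorted(name for name, value in means.items() if value < cutoff)
    light = sorted(name for name, value in means.items() if value >= cutoff)
    return dark, light
```

The weakness described above shows up directly here: an image taken with the lamp off but with strong ambient light can still land above the cutoff and be kept.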

To perform the classification, a “good” (“light”) set and a “bad” (“dark”) set of training images are selected; these are given to a machine learning algorithm to build a statistical model that will be applied to the remaining images in the dataset. Twenty “good” and twenty “bad” images were picked from the same day. Twenty was picked rather arbitrarily; it took little time to pick twenty of each image type, and all the classifier has to do is discern light from dark. An example of the “good” images selected:

good_samples.jpg

An example of the “bad” images:

bad_samples.jpg

The classifier workflow was written following this blog post, which also thoroughly explains the science behind image classification. The code written for this post, including step-by-step instructions, is found here. The program used here is generate_classifier.py. The output of the classifier is serialized to a file using pickle, as the result is a Python object and not raw data. The only non-standard Python library required to run the code is scikit-learn (imported as sklearn), which can be installed in several different ways. From there, the program is called by specifying both input directories, the output filename for the classification file, and the file extension of the input images (default is ‘.jpg’). An example:

python generate_classifier.py --pos /path/to/good_images --neg /path/to/bad_images -o /path/to/classification_file
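To illustrate why pickle is used for the classification file, here is a minimal sketch of saving and reloading a Python object. BrightnessModel is a hypothetical stand-in, not the model that generate_classifier.py produces, and save_model and load_model are made-up helper names.

```python
import pickle


class BrightnessModel:
    """Hypothetical stand-in for a trained classifier (not the post's model)."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, mean_value):
        # 1 = "good" (light), 0 = "bad" (dark)
        return 1 if mean_value >= self.cutoff else 0


def save_model(model, path):
    """Write the model object to `path` as a pickle (a binary format)."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Read a pickled model back. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

A trained scikit-learn estimator can be saved and reloaded the same way. The file should be read back with the same library version that wrote it, and pickles from untrusted sources should never be loaded, since unpickling can execute arbitrary code.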

After the classifier is generated, the images can be sorted using sort_images.py. This applies the classifier to a directory containing all captured images, and moves (rather than copies) them into the output directories. An example:

python sort_images.py -i /path/to/all_images -e .jpg -m /path/to/classification_file --pos /path/to/good_classified --neg /path/to/bad_classified

Note that all of these programs have a --dryrun flag, which runs the code, but does not make any file modifications.
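The move-with-dry-run behavior can be sketched as below. This is an illustration under stated assumptions rather than the contents of sort_images.py: is_good is a hypothetical callable standing in for the pickled classifier, and the function and parameter names are invented for this example.

```python
import os
import shutil


def sort_images(input_dir, is_good, pos_dir, neg_dir, ext=".jpg", dry_run=False):
    """Move each image into pos_dir or neg_dir based on is_good(path).

    `is_good` is a hypothetical callable returning True for "good" images.
    Returns a list of (source, destination) pairs; with dry_run=True the
    pairs are computed but no directories are created and no files move.
    """
    planned = []
    for name in sorted(os.listdir(input_dir)):
        src = os.path.join(input_dir, name)
        if not (os.path.isfile(src) and name.lower().endswith(ext)):
            continue
        dest_dir = pos_dir if is_good(src) else neg_dir
        planned.append((src, os.path.join(dest_dir, name)))

    if not dry_run:
        for src, dst in planned:
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)  # move, not copy: the source file is removed
    return planned
```

Running once with dry_run=True and reviewing the returned pairs is a cheap way to check the classifier’s decisions before any files leave the capture directory.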

How well did the classifier work? All of the dark images were moved to the “bad” bin successfully, and that bin does not contain any images where the lamp is lit. However, the “good” bin contains some images taken with the light off; while the ambient lighting in them is sufficient, they would still throw off a timelapse video. An example:

good_nolight_30pct.jpg

Designing a classifier is an iterative process, so adding this image to the “bad” training set and regenerating the classifier would likely help eliminate this situation.

In future posts, I’ll discuss how to create a timelapse video. There are numerous workflows using free, trial, and/or open source software to accomplish this task. I ended up making some formatting alterations with a few Python scripts and completing the videos with Windows-based software.
