00:00:03Hello, everyone, and welcome back to the channel. If you've been following along, you know that I'm trying to build a robot that is capable of finding specific coins that I can add to my collection. If you're new here, welcome and please consider subscribing to help this channel grow. In my last video, I was struggling to develop a neural
00:00:23network model that was capable of identifying particular coin designs. I was having mixed results, and the final model was still suffering from a lack of real-world accuracy and the problem of overconfidence in its results, known as overfitting. After struggling with the model for a few more weeks and reviewing my last video again, I now realize that when I said my first attempt was the best,
00:00:50what I should have said was that my first attempt was the least bad. I've also discovered that the path to enlightenment often involves taking a few steps back to unlearn or properly relearn things that you thought you understood well. But such is life, I guess. Over the last three weeks,
00:01:08I did a lot of traveling to visit with family that I haven't seen in many years. That involved driving through most of the states along the eastern seaboard from upstate New York to central Florida. That gave me a lot of time to think of ways to improve my neural network and improve its efficiency, the result of which is now several models that are only slightly less bad. In today's video, I just want to go over a few things
00:01:34I've learned over the last few weeks and show you the slight improvements I've made to my coin recognition and orientation algorithm. First of all, running the training for 21 hours to get through a single epoch seemed like an awfully long time, so, given the number of models I needed to run to get this all to work, that was the first thing I wanted to tackle. The objective really should be to get the runtime and efficiency
00:01:58of the algorithm to a much more acceptable level. The goal for me is to be able to train and retrain all of the models in a single overnight session while I'm sleeping. I tried adjusting several variables in my coin recognition model, but the one that seemed to have the most effect was adjusting the batch size. Apparently, I had my batch size set
to 2, which, as it turns out, resulted in nearly the longest runtime possible. Oops! Batch size affects how often the weights and biases of your model are updated. Setting it to 2 meant that those values were updated after every two training images. By changing just this one parameter back to the default value of 32,
00:02:42the epochs went from 21 hours down to about 40 minutes, which meant that at least this one model could meet the goal of running multiple epochs while I slept. Even after those improvements in runtime, the model was still suffering from overfitting and overall low accuracy. So I realized that I needed to get some help from someone with more experience in these matters.
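Backing up to that batch-size change for a second, here is a rough, self-contained sketch of what it amounts to in Keras. The toy model and the random stand-in data below are placeholders for illustration, not the actual training script from the video; the only point is the batch_size argument passed to model.fit.

```python
# Minimal stand-in sketch: the whole change is the batch_size argument.
# The model, data, and class count here are made up for illustration.
import numpy as np
import tensorflow as tf

# Fake data: 500 grayscale 100x100 "coins" with 14 made-up design labels.
images = np.random.rand(500, 100, 100, 1).astype("float32")
labels = np.random.randint(0, 14, size=(500,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100, 100, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(14, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# batch_size=2 would mean one weight update per 2 images (250 updates per
# epoch here); the Keras default of 32 does one update per 32 images instead.
model.fit(images, labels, batch_size=32, epochs=1)
```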
00:03:04Needing that extra help reminded me that one of my Facebook friends, Adrian Bostrom, had shared a similar project regarding Lighting Augmentation by a researcher named Paul Krush. A few years ago, Paul was doing some experiments with coin sorting in an effort to demonstrate machine learning's ability to detect manufacturing defects.
00:03:24I'll leave links to Paul's YouTube channel and GitHub repository in the description below. But when I rewatched Paul's "Lighting Augmentation" video, I noticed that his coin sorting model used images at a resolution of only 28 x 28 pixels. And that got me curious about whether or not I might be able to speed up my runtime
00:03:43by actually reducing the resolution of the images in my own data set. If you spend any time at all thinking about such things, you'll quickly realize that doubling the width and height of an image quadruples the area of that image. When you quadruple the area, you also quadruple the number of pixels in the image. And the total amount of data to process also increases by a factor of four.
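Just to make that scaling concrete, here's the plain arithmetic (nothing project-specific, using the 28 x 28 size mentioned above as an example):

```python
# Doubling both dimensions of an image quadruples the pixel count, so the
# amount of data to process grows by the same factor of four.
w, h = 28, 28
print(w * h)              #   784 pixels at 28 x 28
print((2 * w) * (2 * h))  # 3,136 pixels at 56 x 56: four times as many
```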
00:04:09At the time, I was using 100 by 100 pixel images in my data set. So that meant if I could reduce the resolution to 28 by 28 like Paul was using, each image would shrink from 10,000 pixels down to just 784, cutting the data per image by more than a factor of ten and hopefully slashing my runtime along with it. But all of that prompted me to reach out to Paul for some additional guidance. Specifically, I was interested in knowing if he had given much more thought to adding additional designs to his
00:04:33search, and if so, what he thought the next steps would be. I was also curious how he implemented coin orientation into his design and why he chose 28 x 28 pixel resolution for his images. Paul was super generous with his time, and when I shared with him my ideas of rotating each of the 14 designs through 360 degrees, he indicated that that was probably overkill.
00:04:59He told me that 90 different angles was more than enough to get the coins oriented "in the ballpark," which results in a data set with images rotated in four degree increments. Also, I'm paraphrasing here so Paul, if you're watching this, please feel free to correct me in the comments down below. But basically he said that too much data, either on the input or output of your
00:05:22model, can exacerbate the issue of overfitting. So in reality, reducing the input size and the number of output classes tends to help the model generalize better. When I asked why he was using the 28 x 28 pixel resolution, he said that he was using a convolutional neural network model called LeNet, and that was the resolution required as input to that particular model.
00:05:47Well, that answered that question, and it makes sense. GALAXY QUEST: "Nothing. I just thought it would be more complicated than that." In my first model, I had 10,000 inputs for each image (that's 100 pixels by 100 pixels) and 5,040 output classes (that's 14 designs, each rotated through 360 one-degree steps). But armed with Paul's advice, I rebuilt my data set at 28 by 28 pixel
00:06:12resolution and rotated each image through four degree increments. This reduced the overall amount of data to just 784 inputs and 1,260 output classes. Running the new model resulted in a huge improvement in runtime, and each epoch was down to about ten minutes, even with just over 1.2 million images to work with. But ultimately, the end results weren't much better.
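For reference, a LeNet-style convolutional network in Keras looks roughly like the sketch below. This follows the classic LeNet-5 layer shape rather than the exact model from the video or from Paul's project, and the 1,260-way output layer is just sized to match the design-times-rotation classes described here.

```python
# LeNet-style CNN sketch for 28x28 grayscale inputs; the layer sizes follow
# the classic LeNet-5 shape and are not the video's exact model.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(6, kernel_size=5, padding="same", activation="tanh"),
    tf.keras.layers.AveragePooling2D(pool_size=2),
    tf.keras.layers.Conv2D(16, kernel_size=5, activation="tanh"),
    tf.keras.layers.AveragePooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="tanh"),
    tf.keras.layers.Dense(84, activation="tanh"),
    # 14 designs x 90 rotation buckets = 1,260 output classes
    tf.keras.layers.Dense(1260, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```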
00:06:40I came to the conclusion that 28 x 28 was probably not enough input for the model that I had created to differentiate between the 14 different designs. If you take a look at some of these images, you can see that it would be hard even for a human to see what was going on in some of them. That being said, I do understand that making comparisons between what a human can do and see
00:07:01and how a computer processes images ultimately isn't particularly useful. So there's really no need to tell me about that in the comments. Unless, of course, you really need to. I did a bit more googling to see if anyone on the Internet had tried to solve the coin recognition part of the overall model. But what I discovered was that most people
00:07:22who started similar projects had figured out how to differentiate between only two designs, either heads or tails of a particular coin, or in one instance, two different types of Roman coin design. It seemed that most researchers, after figuring out how to differentiate between two different designs, decided that adding additional designs would be simple and therefore didn't go much beyond the first two.
00:07:47But as it turns out, in my experience, adding additional designs to the mix, including multiple augmentations like rotation, jitter and brightness, increases the complexity exponentially. So my new theory was that adding an additional twelve designs to the mix might be asking my model to do too much in a single pass. I realized after Paul's feedback that I could actually break the problem into much
00:08:12smaller chunks and solve for individual pieces of the puzzle with separate models. Paul suggested that I first solve for the coin design with 14 output classes, then solve for rotation with 90 output classes instead of 360, then solve for the coin dates by cropping the dates from the correctly oriented full-sized images, and then resizing those crops down to the same 28 by 28 pixel resolution.
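Strung together, that staged approach might look something like the sketch below. Every name here, from the models to the crop coordinates, is a placeholder for illustration rather than code from the actual project.

```python
# Hypothetical sketch of the staged pipeline described above; the models,
# helper logic, and crop box are all stand-ins, not the real project code.
import numpy as np
import tensorflow as tf
from scipy import ndimage

def classify_coin(full_res_image, design_model, rotation_models, date_model):
    """full_res_image: float array of shape (H, W, 1) holding one centered coin."""
    # Stage 1: which design is this? The design model sees a 28x28 thumbnail.
    small = tf.image.resize(full_res_image, (28, 28)).numpy()
    design = int(np.argmax(design_model.predict(small[None], verbose=0)))

    # Stage 2: a per-design rotation model guesses a coarse angle bucket.
    bucket = int(np.argmax(
        rotation_models[design].predict(small[None], verbose=0)))
    angle = bucket * 4  # e.g. 90 buckets -> 4-degree increments

    # Stage 3: rotate the full-resolution image upright, crop roughly where
    # the date sits (made-up coordinates), shrink to 28x28, and classify it.
    upright = ndimage.rotate(full_res_image, -angle, reshape=False)
    date_crop = upright[60:90, 55:95]  # placeholder crop box
    date_crop = tf.image.resize(date_crop, (28, 28)).numpy()
    date = int(np.argmax(date_model.predict(date_crop[None], verbose=0)))

    return design, angle, date
```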
00:08:39I was already using the TensorFlow Keras image_dataset_from_directory preprocessing utility to use the folder structure of the data set to automatically produce the output classes of my model (there's a small sketch of that setup a little further down). And that led me to believe that if I could organize my data set into a better folder structure, with each level representing the output classes of the different models, I could continue to use the original 6 million plus training
00:09:05images to train the AI and continue to train all of the different models with the same exact data set. While reducing the image resolution, I also recognized that two of the 14 designs for US small cents were very similar to other designs in the collection; the details differentiating those two were so insignificant that they
00:09:25would not even be visible to the AI at the lower 28 by 28 pixel resolutions. Therefore, those designs could either be removed or combined with the other output classes of the design recognition model. The designs I'm referring to are the Lincoln Memorial Wide AM reverse design and the Lincoln Memorial Close AM reverse, as well as the Wheat reverse and the Wheat reverse with the VDB initials.
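Circling back to the folder-structure idea for a moment: image_dataset_from_directory labels each image by the subfolder it lives in, so a nested layout along the lines below lets every stage's model pull its classes straight from the directory names. The folder names and paths here are made up for illustration.

```python
# Hypothetical folder layout (names are illustrative, not the real data set):
#
#   dataset/
#     lincoln_memorial_reverse/    <- stage 1 label: the design
#       000/                       <- stage 2 label: rotation bucket in degrees
#       005/
#       ...
#     wheat_reverse/
#       ...
import tensorflow as tf

# Stage 1: point at the top level, so the design folders become the classes.
# (Keras indexes image files recursively inside each class folder.)
design_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",
    labels="inferred",
    label_mode="int",
    color_mode="grayscale",
    image_size=(28, 28),
    batch_size=32,
)

# Stage 2: point inside one design's folder, so its rotation buckets become
# the classes for that design's rotation model.
rotation_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/lincoln_memorial_reverse",
    labels="inferred",
    label_mode="int",
    color_mode="grayscale",
    image_size=(28, 28),
    batch_size=32,
)
```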
00:09:53Anyway, by combining those four different designs into just two output classes, Lincoln Memorial reverse and Wheat reverse, it narrowed the number of output classes in the initial model from 14 down to just 12. In other words, the first model would sort the 6 million images by design and separate them into twelve output classes. Each of those designs would then have a separate subset of output classes
00:10:21that would check the coin's rotation in increments of five degrees, broken down into 72 total output classes covering zero to 355 degrees. And in turn, inside each of those 72 class folders would be all of the images within the adjacent five degrees, in one-degree increments. This folder structure allows me to create different unique models with unique sets of output classes and unique tensor
00:10:49structures, depending on what features we're interested in at that level. But this still maintains each model's ability to train itself without any manual labeling. So what's the upshot of all this? Well, the results are still a bit mixed. The accuracy reported by TensorFlow on the design recognition model is nearly 100%, but in real-world testing, it's more like 95%.
00:11:13Actually, it's 94.5% to be more precise. And I still don't believe this is quite good enough for coin sorting. But rather than just admit defeat or actually take the time to learn more about creating better image recognition models, I took the lazy way out and developed a new method for improving the accuracy of the existing model through sheer brute force.
00:11:37In my real world testing, I created custom validation images by rotating some of the original images at random angles. What I noticed was that my model would get the recognition of some of these images correct, but it would guess incorrectly for certain rotations of the same exact image. So I developed a consensus method for improving the accuracy of my model
00:12:00without changing any of the underlying tensor or layer values. This means that when testing a particular image, I rotate the image six times at specific intervals. If the majority of the recognition results point to the same design being recognized, I can be more certain that the output is correct. Overall, this worked pretty well.
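A rough sketch of that rotate-and-vote idea is below. The six evenly spaced angles, the majority-vote rule, and the model name are my reading of what's described here and are placeholders rather than the project's actual code.

```python
# Hedged sketch of the consensus method described above: rotate the test
# image several times, classify each rotation, and take a majority vote.
from collections import Counter
import numpy as np
import tensorflow as tf
from scipy import ndimage

def predict_design_by_consensus(image, design_model,
                                angles=(0, 60, 120, 180, 240, 300)):
    """Classify one (H, W, 1) image by majority vote over several rotations."""
    votes = []
    for angle in angles:
        rotated = ndimage.rotate(image, angle, reshape=False)
        rotated = tf.image.resize(rotated, (28, 28)).numpy()
        probs = design_model.predict(rotated[None], verbose=0)[0]
        votes.append(int(np.argmax(probs)))
    # The design that most rotations agree on wins; also report the vote share.
    winner, count = Counter(votes).most_common(1)[0]
    return winner, count / len(angles)
```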
00:12:22It took the accuracy of my flawed model from about 95% up to nearly 99%, so this is a major improvement. The downside, though, is that it takes six times as long to run the algorithm, or about 4 to 5 seconds per image, to correctly identify and orient the coin. So there's a lot more work to be done to improve the underlying models that I
00:12:44currently have and improve their accuracy on the first pass. At some point, I hope to meet up with other professional or even armchair data scientists who might be able to help get me pointed in a better direction when trying to improve my models. But in the meantime, I'm pretty happy with the results so far. The models I currently have are pretty good at recognizing the Lincoln Obverse
00:13:06design in particular, and getting that design properly oriented. In these tests, you can see the original image on the left and the resulting output on the right. When the coin comes out properly oriented, that means the coin design was properly recognized and that the next model was able to determine how many degrees out of alignment it was. In other news, my 3D printer is no longer on backorder, and I was finally able
00:13:37to pay for the delivery of a Bambu Lab X1 Carbon Combo 3D printer that should be arriving within the next few weeks. And I am so excited to be able to start building the actual body of the robot soon. By the next video, I hope to learn a lot more about neural network image recognition model design so that I can
00:13:56develop even better models for each of the cent designs and their orientation. So I remain hopeful that in the next video you will see a fully functioning neural network capable of recognizing and orienting any of the 14 small US cent designs. Or maybe I got that down to 12 now. I'm not really sure anymore,
00:14:16but if you like this sort of thing, I hope you'll subscribe to the channel and hit the like button to let the YouTube algorithm know that other people, just like you, might be interested in this type of content. And if you're really a glutton for this sort of thing, by all means hit the notification bell so you know exactly when the next video comes out.
00:14:29If anyone would like more information, or if you'd like to collaborate or contribute to this project in some way, please leave a comment down below. I'd love to chat with you, but until next time, that's my two cents. Take care, everyone. Stay safe and have a great day.