Academics from Cornell University have concluded that machine learning could potentially be used for applications such as automated checkout systems following an experiment using the technology to identify what food and drink Oktoberfest-goers preferred.
The published datasets [PDF] revealed that 1,100 images were analysed as part of the research. The images were captured over 11 days using a video camera that was installed above a checkout in a beer tent during the world's largest beer festival in Munich, Germany.
Only a small amount of available data was analysed however, the academics said, noting that analysing all the data would be "very labour-intensive".
Instead, they suggested one possible way around this would be to use self-training -- or "pseudo labelling".
"Pseudo labelling uses a model trained on the labelled data to annotate the unlabelled data. On this automatically annotated data, a new model can be trained that would see more training data at the cost of possible errors in the annotations," the researchers said.
As part of the research, the academics also evaluated the impact of training machine learning systems using daytime only images to examine night time images with low saturation, and it uncovered that performance dropped "substantially".
"Thus, we need night time images in the training set for stable result," the paper stated.
To further examine the performance of the machine learning models, the researchers extracted 85 test images that featured difficult cases like images where many objects are close to each other, there is large obstruction by waiters, or motion blur.
The researchers also used data from the last two days of the event to evaluate the models that were trained on data from the first nine days.
As part of releasing the datasets, the academics have also made available the remaining 600GB of data that was not analysed.
The academics said while deep learning is suitable for detecting objects, it is critical that detection results are stable for automated checkout services.
"In order to get closer to this goal, more research has to be conducted and/or more data needs to be annotated. The dataset we release gives a starting point," the academics said.