Available in: Axsy Smart Vision


When capturing photos and videos for Axsy Smart Vision, follow the best practices below to produce high-quality datasets that contribute effectively to the model’s training and performance. This guide outlines key considerations for capturing images and videos, with the aim of improving the diversity, quality, and utility of the dataset.


NOTE: Axsy Smart Vision supports both photos and videos for capturing data. If the entire area to be scanned does not fit within a single photo, use video to capture the full target area in a single recording.


General Guidelines

Since field reps value their time and prioritise efficiency, Axsy Smart Vision allows them to perform their tasks smoothly and accurately. By remembering a few key guidelines, they can ensure the best results without compromising speed or convenience.


Image Quality

If possible, use a high-quality device capable of capturing high-quality images. This helps the model detect and label products accurately. If you are capturing the initial dataset for training purposes, high-quality images also help keep the future model flexible. Minimise blur caused by sudden camera movements.


Figure 1 - Example of a Bad Picture - Blurry Products.



Figure 2 - Example of a Good Picture - Products are Seen Clearly. 
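
For teams that want to screen captured images for blur before uploading them, the following is a minimal sketch of one common heuristic, the variance of the Laplacian. It is not part of Axsy Smart Vision. The function names and the threshold of 100.0 are illustrative assumptions, and a suitable threshold would need to be tuned on your own images.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Return the variance of a 4-neighbour Laplacian over a grayscale image.

    `gray` is a list of equal-length rows of pixel intensities (0-255).
    Sharp images have strong edges, so the value is high; blurry images
    have weak edges, so the value is low.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")

    responses = []
    # Skip the one-pixel border, where not all four neighbours exist.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def looks_blurry(gray, threshold=100.0):
    """Flag an image as blurry when its edge response falls below `threshold`.

    The default threshold is a hypothetical starting point, not a
    recommended value.
    """
    return laplacian_variance(gray) < threshold
```

As a usage example, a uniformly grey image has no edges and is flagged, while a high-contrast checkerboard is not:

```python
flat = [[128] * 8 for _ in range(8)]
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
print(looks_blurry(flat), looks_blurry(checker))
```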


Angle

Capture images and videos square-on to the shelves with minimal camera rotation. When capturing a video, maintain the same angle throughout the recording. Move along the shelves rather than standing in one place and rotating the camera.


Figure 3 - Example of a Bad Picture - The Angle is Oblique. 


Figure 4 - Example of a Good Picture - Camera is Square To the Shelves. 


Distance

Although the optimal distance from the shelf varies with product size, the general guidance is to fit about 4 shelf fixtures on screen. A full fixture in view is fine, as long as you don't include background shelves that may make the target products blurry and confuse the detection model. Make sure the products are fully captured and their fronts are not cut off. Pay special attention when approaching the end of a shelf.


Figure 5 - Example of a Bad Picture - Items in the Background Make the Front Products Blurred. 


Figure 6 - Example of a Good Picture - Although the Picture is Zoomed, There are No Products in the Background. 



Figure 7 - Example of a Good Picture - Multiple Levels of Products are Seen Clearly. 


Obstructions

Minimise shelf occlusion from pillars, people, baskets etc. Similarly, minimise visual artefacts such as light reflection and blurry light spots. If possible, remove glass/plastic panels to take pictures of unobstructed products.


Figure 8 - Example of a Bad Picture - a Basket Covers Parts of the Products Scanned. 


Figure 9 - Example of a Good Picture - Products are Unobstructed. 


Video Capture Direction

When capturing a video, scan the products on a shelf smoothly in one direction only (left to right, right to left, top to bottom, or bottom to top), without doubling back. Do not change the zoom level during video capture.


NOTE: When capturing a video with Axsy Smart Vision video capture, you can enable Guidance UI and Haptic Feedback. These features give real-time visual and haptic feedback on your speed, your distance from the shelves and your angle, which helps you produce smoother, more consistent, high-quality scans.


Figure 10 - Guidance UI and Haptic Feedback Enabled.


Initial Model Training

During the initial model training phase, in addition to following the General Guidelines above, it's crucial to capture high-quality, diverse and well-framed images and videos. This makes the training process more effective and efficient and improves future model accuracy.


Diversity of the Set

To ensure the best quality of the model, provide a diverse sample of training images and videos. To maximise diversity, capture images in:

  • Different locations (stores, supermarkets, corner shops etc.): this will cover different lighting conditions, shelving, layouts etc. If possible, include images from different geographic locations (different cities, regions, countries).
  • Different camera orientations: landscape and portrait.
  • Different zoom levels.


NOTE: While you should vary the zoom level between images, keep the zoom level constant within any single video.


While maintaining diversity of the set, keep the dataset balanced across the range of products, and try not to generate excessive duplicates.
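
If you keep a list of the product shown in each captured image, a quick count can reveal whether the set is balanced. The sketch below is an illustration only and is not an Axsy Smart Vision feature. The `balance_report` function, the product names, and the `max_ratio` limit of 3.0 are all hypothetical.

```python
from collections import Counter


def balance_report(labels, max_ratio=3.0):
    """Summarise how evenly images are spread across products.

    `labels` holds one product name per captured image. The set counts as
    balanced when the most-photographed product has at most `max_ratio`
    times as many images as the least-photographed one. The default limit
    is an assumed value, not a documented recommendation.
    """
    counts = Counter(labels)
    if not counts:
        return {"counts": {}, "ratio": None, "balanced": True}

    ratio = max(counts.values()) / min(counts.values())
    return {
        "counts": dict(counts),
        "ratio": ratio,
        "balanced": ratio <= max_ratio,
    }


# Hypothetical labels for a small capture session.
labels = ["cola"] * 6 + ["water"] * 2 + ["juice"] * 3
report = balance_report(labels)
print(report["counts"], report["ratio"], report["balanced"])
```

A report with a high ratio suggests capturing more images of the under-represented products rather than adding more of the common ones.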