Putting together the Dataset — Computer Vision Division

Anyone in machine learning will tell you that the most important part of training any network architecture is the dataset used to do so. It needs to be varied, so that the network learns to recognize the relevant attributes in many different contexts, and this means the volume of data required is considerable.

Our job is to train an architecture that will then be deployed on a CCTV network to create custom-made alarm systems which trigger only in specific circumstances, reducing the number of false alarms we currently observe.

The final product should be able to recognize several situations and objects in the streams, but as an initial approach to our dataset we gathered footage from the CCTV network and tagged only the frames in which people were present. The footage collector was automated: it generates tags for pictures in which it finds people, and these tags are then manually checked by a member of our team using CVAT, the best open-source computer vision annotation tool we found.
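To make the pre-tagging step concrete, here is a minimal sketch of the filter such a collector could apply. It assumes some person detector (not specified here) has already produced labelled boxes with confidence scores for every frame; the `Detection` class, the `pretag_frames` function and the 0.5 score threshold are all hypothetical choices for illustration, not our actual collector.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    """One box from a hypothetical detector, in pixel coordinates."""
    label: str
    score: float
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def pretag_frames(
    detections_per_frame: Dict[str, List[Detection]],
    target_label: str = "person",
    min_score: float = 0.5,  # assumed threshold; tune on real footage
) -> Dict[str, List[Detection]]:
    """Keep only frames with at least one confident target detection.

    The returned boxes are the pre-tags a human reviewer would then
    confirm or correct in CVAT.
    """
    pretags: Dict[str, List[Detection]] = {}
    for frame, dets in detections_per_frame.items():
        kept = [d for d in dets if d.label == target_label and d.score >= min_score]
        if kept:  # frames without people are skipped entirely
            pretags[frame] = kept
    return pretags


# Usage with made-up detections for three frames
frames = {
    "cam01_0001.jpg": [Detection("person", 0.91, 10, 20, 60, 180)],
    "cam01_0002.jpg": [Detection("car", 0.88, 0, 0, 100, 50)],
    "cam02_0001.jpg": [Detection("person", 0.30, 5, 5, 40, 120)],
}
print(sorted(pretag_frames(frames)))  # ['cam01_0001.jpg']
```

A low threshold produces more pre-tags and more false positives for reviewers to delete; a high one misses people that reviewers would then have to draw by hand, so the value is a trade-off rather than a fixed setting.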

Sample CVAT image tagged via bounding boxes. Source: https://www.youtube.com/watch?v=uSqaQENdyJE

There are several different ways of tagging pictures (bounding boxes, polygons and splines, among others), but we found bounding boxes the most suitable, since our pre-tagging method could be applied directly to the footage as it was being saved.
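Bounding-box pre-tags also have to reach CVAT in a format it can import. One option, assuming the COCO 1.0 import format is used, is to write the boxes as COCO-style JSON. The sketch below does that for a single "person" category; `boxes_to_coco` and the `(x_min, y_min, x_max, y_max)` input convention are hypothetical and only illustrate the conversion.

```python
import json
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def boxes_to_coco(
    boxes_per_image: Dict[str, List[Box]],
    image_sizes: Dict[str, Tuple[int, int]],  # file name -> (width, height)
    label: str = "person",
) -> dict:
    """Build a COCO-style annotation dict with a single category."""
    coco: dict = {
        "images": [],
        "annotations": [],
        "categories": [{"id": 1, "name": label}],
    }
    ann_id = 1
    for img_id, (file_name, boxes) in enumerate(sorted(boxes_per_image.items()), start=1):
        width, height = image_sizes[file_name]
        coco["images"].append(
            {"id": img_id, "file_name": file_name, "width": width, "height": height}
        )
        for x_min, y_min, x_max, y_max in boxes:
            w, h = x_max - x_min, y_max - y_min
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": 1,
                "bbox": [x_min, y_min, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


coco = boxes_to_coco(
    {"cam01_0001.jpg": [(10, 20, 60, 180)]},
    {"cam01_0001.jpg": (1920, 1080)},
)
print(json.dumps(coco["annotations"][0]["bbox"]))  # [10, 20, 50, 160]
```

The main pitfall is the box convention: detectors often emit corner coordinates, while COCO stores the top-left corner plus width and height, so the conversion has to subtract rather than copy.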

We brainstormed about the variability the data would face, given the job the network will have to do, and came up with three major factors:

  • Light: The cameras’ sensitivity changes with light, especially artificial versus natural light. The dataset should include pictures taken both during the day and at night.
  • Weather conditions: Water on the lenses creates large blurry areas, which make detecting people difficult. Fog also reduces visibility significantly, so the dataset should include pictures taken on sunny, rainy, stormy and foggy days to account for all cases.
  • Angles: Cameras angled at 180° were excluded, since shapes in their footage are significantly warped and would be difficult to learn.
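With light and weather defined as the factors to cover, a quick way to check the collected images is to count them per combination and list the combinations that are still empty. The sketch below assumes each image carries hypothetical `light` and `weather` metadata values; the category names mirror the list above, and `coverage_report` is an illustrative helper, not part of any tool we used. Angles are left out because they were handled by excluding cameras rather than by covering cases.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

LIGHT = ("day", "night")
WEATHER = ("sunny", "rainy", "stormy", "foggy")


def coverage_report(
    metadata: List[Dict[str, str]],
) -> Tuple[Counter, List[Tuple[str, str]]]:
    """Count images per (light, weather) pair and list the uncovered pairs."""
    counts = Counter((m["light"], m["weather"]) for m in metadata)
    # Counter returns 0 for unseen keys without inserting them
    missing = [combo for combo in product(LIGHT, WEATHER) if counts[combo] == 0]
    return counts, missing


images = [
    {"light": "day", "weather": "sunny"},
    {"light": "night", "weather": "rainy"},
    {"light": "day", "weather": "foggy"},
]
counts, missing = coverage_report(images)
print(len(missing))  # 5
```

Two light conditions times four weather conditions give eight combinations, so any pair that stays at zero points to footage worth collecting on a specific kind of day.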

With this in mind, we gathered around 5000 images from over 40 different cameras over the course of a week, deliberately including days with rainy and foggy weather. Checking the tags was a long and time-consuming process, so several members of our team worked on it.
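When several people review the same pool of images, it helps to split it into near-equal, non-overlapping batches. A minimal sketch, assuming file names sort by camera and then by time (a hypothetical naming scheme) and using an illustrative `split_for_review` helper:

```python
from typing import Dict, List


def split_for_review(images: List[str], reviewers: List[str]) -> Dict[str, List[str]]:
    """Give each reviewer a contiguous, near-equal slice of the sorted images."""
    ordered = sorted(images)
    base, extra = divmod(len(ordered), len(reviewers))
    batches: Dict[str, List[str]] = {}
    start = 0
    for idx, reviewer in enumerate(reviewers):
        # The first `extra` reviewers take one additional image each
        size = base + (1 if idx < extra else 0)
        batches[reviewer] = ordered[start:start + size]
        start += size
    return batches


images = [f"cam{c:02d}_{f:04d}.jpg" for c in range(40) for f in range(125)]
batches = split_for_review(images, ["ana", "ben", "carla", "dan"])
print({name: len(batch) for name, batch in batches.items()})
```

Contiguous slices, rather than round-robin assignment, keep consecutive frames from the same camera with one reviewer, who then checks a familiar scene in sequence instead of jumping between cameras.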

What comes next? Training one of the sample pretrained networks from TLT (NVIDIA's Transfer Learning Toolkit). Come back next week to see how that went.




We’re FlowLabs, a Software Company specialized in AI, APP and WEB solutions. Join us on our journey through some interesting solutions to our daily challenges!
