Global monitoring of disease vectors is undoubtedly becoming an urgent need as the human population rises and becomes increasingly mobile, international commercial exchanges increase, and climate change expands the habitats of many vector species.

Traditional surveillance of mosquitoes, vectors of many diseases, relies on catches, which requires regular manual inspection and reporting, and dedicated personnel, making large-scale monitoring difficult and expensive. New approaches are solving the problem of scalability by relying on smartphones and the Internet to enable novel community-based and digital observatories, where people can upload pictures of mosquitoes whenever they encounter them. An example is the Mosquito Alert citizen science system, which includes a dedicated mobile phone app through which geotagged images are collected.

This system provides a viable option for monitoring the spread of various mosquito species across the globe, although it is partly limited by the quality of the citizen scientists’ photos. To make the system useful for public health agencies, and to give feedback to the volunteering citizens, the submitted images are inspected and labeled by entomology experts. Although citizen-based data collection can greatly broaden disease-vector monitoring scales, manual inspection of each image is not an easily scalable option in the long run, and the system could be improved through automation.

Based on Mosquito Alert’s curated database of expert-validated mosquito photos, we trained a deep learning model to find tiger mosquitoes (Aedes albopictus), a species that is responsible for spreading chikungunya, dengue, and Zika among other diseases.

The highly accurate 0.96 area under the receiver operating characteristic curve score promises not only a helpful pre-selector for the expert validation process but also an automated classifier giving quick feedback to the app participants, which may help to keep them motivated.

In the paper, we also explored the possibilities of using the model to improve future data collection quality as a feedback loop.