Smart Monitoring for Conservation Areas
The pipeline has two phases. During training (top), articles are vectorised, classified as threat-relevant or not, and evaluated; a smart-annotation loop based on active learning then selects the most informative samples for human review, iteratively expanding both the training and validation sets. At prediction time (bottom), the trained classifier labels new, unlabelled news articles; a downstream module then extracts place mentions, organisations, facilities, and dates from the articles classified as positive, and the results are reported to WWF. The architecture achieved 96% recall and 82% precision on conservation-threat detection, enabling near-real-time monitoring of emerging risks to protected areas worldwide.
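
The sample-selection step of the smart-annotation loop can be illustrated with a minimal sketch. It assumes uncertainty sampling, a common active-learning strategy in which the articles whose predicted probability lies closest to 0.5 are sent for human review; the report does not state which selection criterion was used here, so the strategy, the function name `select_for_annotation`, and the scores are all hypothetical.

```python
from typing import List, Sequence


def select_for_annotation(probabilities: Sequence[float], batch_size: int) -> List[int]:
    """Return the indices of the articles the classifier is least sure about.

    Assumption: uncertainty sampling. `probabilities` holds the predicted
    probability that each unlabelled article is threat-relevant; the closer
    a value is to 0.5, the less certain the classifier is about that article.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    # Rank article indices by distance from the 0.5 decision boundary.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:batch_size]


# Hypothetical classifier scores for five unlabelled articles.
scores = [0.97, 0.52, 0.08, 0.45, 0.70]
picked = select_for_annotation(scores, batch_size=2)
print(picked)  # [1, 3]: the two articles closest to the decision boundary
```

In a loop of this kind, the selected articles would be labelled by a reviewer, added to the training or validation set, and the classifier retrained before the next round of selection.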

Abstract

This report documents the outcomes of a Data Study Group held at The Alan Turing Institute in collaboration with WWF Conservation Intelligence. The challenge focused on developing data science techniques to automatically detect news articles reporting emerging threats to protected areas. The project explored approaches ranging from keyword-based filtering to fine-tuned neural language models (BERT) for classifying whether news articles report relevant conservation threats, particularly infrastructure developments near protected sites. The best-performing model achieved 96% recall and 82% precision, significantly outperforming the baseline approaches and demonstrating the feasibility of real-time, automated conservation threat monitoring using NLP.

Keywords: Conservation, WWF, Natural Language Processing, BERT, Threat Detection, Protected Areas