♻️ TrashDet - Iterative Neural Architecture Search for Efficient Waste Detection

Bringing Trash Detection to TinyML

By Tony Tran

Project Overview: This post introduces TrashDet, a family of efficient object detectors designed specifically for resource-constrained edge devices (TinyML). By employing an Iterative Neural Architecture Search (NAS) framework, we successfully deployed waste detection models on microcontrollers, reducing energy consumption by up to 88% while outperforming existing baselines on the TACO dataset.

Introduction

Uncontrolled waste disposal is a major environmental crisis. While automatic image-based waste classification offers a scalable solution for monitoring streets, parks, and rivers, a critical bottleneck remains: power.

Effective monitoring systems often need to operate in remote locations on battery or solar power, where network connectivity is unreliable. This requires models to run locally on the device (Edge Computing). However, state-of-the-art detectors like YOLOv8 or Deformable DETR require GPU-class hardware, making them infeasible for microcontrollers with strict memory (kilobytes, not gigabytes) and energy constraints.

To bridge this gap, we developed TrashDet, a framework that uses hardware-aware Neural Architecture Search (NAS) to design models that fit specifically within the tight budgets of TinyML hardware like the Analog Devices MAX78002.

Methodology

1. The Challenge: Combinatorial Complexity

Designing a neural network involves choosing depth, width, kernel sizes, and resolutions. Adding hardware constraints (latency, energy, flash memory) on top makes the search space combinatorially large.

We formulated the search as a constrained optimization problem:

\[f^* = \arg\min_{f \in \mathcal{A}} \mathcal{L}_{val}\big(\mathcal{N}(f, W^*)\big)\] \[\text{s.t.} \quad \mathrm{Cost}(f) \le T\]

Here, \(f\) represents the architecture configuration, and \(T\) is the hardware constraint threshold. To solve this efficiently, we didn’t search the whole network at once. Instead, we used a “Once-For-All” (OFA) supernet and an Iterative Decomposition strategy.
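The constrained formulation above boils down to a sample-filter-select loop: draw candidates, discard any that exceed the budget \(T\), and keep the one with the lowest validation loss. A toy sketch, where `cost` and `val_loss` are hypothetical stand-ins for the actual energy estimator and supernet evaluation:

```python
# Minimal sketch of the constrained search: argmin of L_val over
# candidates f with Cost(f) <= T. All evaluators are illustrative proxies.
import random

def sample_arch():
    """Sample a candidate architecture f from the search space A."""
    return {
        "depths": [random.randint(2, 8) for _ in range(4)],
        "width_mult": random.choice([0.8, 1.0, 1.2, 1.5]),
    }

def cost(arch):
    """Proxy for Cost(f), e.g. estimated energy or latency."""
    return sum(arch["depths"]) * arch["width_mult"]

def val_loss(arch):
    """Proxy for L_val(N(f, W*)): evaluate the subnet with inherited weights."""
    return 1.0 / (sum(arch["depths"]) * arch["width_mult"])

def constrained_search(budget_T, n_samples=200):
    """Keep only feasible candidates, then pick the lowest-loss one."""
    feasible = [a for a in (sample_arch() for _ in range(n_samples))
                if cost(a) <= budget_T]
    return min(feasible, key=val_loss) if feasible else None

best = constrained_search(budget_T=20.0)
```

In practice the actual search replaces random sampling with the evolutionary strategy described next, but the feasibility filter stays the same.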

Figure 1: The TrashDet framework. A unified supernet is trained once, followed by an iterative evolutionary search that alternates between optimizing the backbone and the neck/head.

2. Iterative Decomposition

Because jointly optimizing the backbone, neck, and head is computationally expensive, we decomposed the problem. We alternate between two stages:

  1. Search Backbone: Optimize the feature extractor while keeping the head fixed.
  2. Search Head: Optimize the detection head while keeping the backbone fixed.

This reduces the search space dimensionality. To ensure the model doesn’t “forget” good configurations when switching stages, we implemented a Population Passthrough mechanism. This retains a fraction of elite architectures from the previous stage to seed the next generation.

3. Hardware-Constrained Supernet

Our search space is built upon a dynamic ResNet-style backbone and a YOLO-style detection head. The supernet supports elastic choices of:

  • Depth: 2 to 8 residual blocks per stage.
  • Width: Channel multipliers from \(0.8\) to \(1.5\).
  • Expansion Ratios: Ratios determining the internal bottleneck size.
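Concretely, each candidate is just a configuration drawn from these elastic dimensions. A minimal sketch of the encoding, where the stage count and the specific expansion-ratio values are assumptions for illustration:

```python
# Sampling one subnet configuration from the elastic supernet space.
# The expansion-ratio values and 4-stage layout are illustrative assumptions.
import random

ELASTIC_SPACE = {
    "depth": list(range(2, 9)),          # 2 to 8 residual blocks per stage
    "width_mult": [0.8, 1.0, 1.2, 1.5],  # channel multipliers
    "expand_ratio": [0.2, 0.25, 0.35],   # internal bottleneck size (assumed values)
}

def sample_subnet(n_stages=4):
    """Draw one architecture configuration uniformly from the space."""
    return {
        "depths": [random.choice(ELASTIC_SPACE["depth"]) for _ in range(n_stages)],
        "width_mult": random.choice(ELASTIC_SPACE["width_mult"]),
        "expand_ratios": [random.choice(ELASTIC_SPACE["expand_ratio"])
                          for _ in range(n_stages)],
    }
```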

Crucially, the search is hardware-aware. Every candidate architecture is checked against the specific constraints of the target hardware (e.g., the MAX78002 CNN accelerator limit of 128 layers and 80 KiB activation memory) before evaluation.
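This feasibility gate can be sketched as a cheap predicate applied before any candidate is trained or evaluated. The limits below are the MAX78002 figures quoted above; the layer-count and activation-memory estimators are simplified placeholders, not the accelerator's actual mapping rules.

```python
# Hardware feasibility gate: reject candidates that violate the MAX78002
# CNN accelerator limits before spending any evaluation budget on them.
# Layer/activation estimators are simplified placeholders.
MAX78002_LIMITS = {"max_layers": 128, "max_activation_bytes": 80 * 1024}

def n_layers(cfg):
    """Each residual block contributes two conv layers (simplified), plus stem + head."""
    return 2 * sum(cfg["depths"]) + 2

def peak_activation_bytes(cfg, base_channels=16, resolution=64):
    """Rough peak activation estimate for the first stage (int8 tensors)."""
    channels = int(base_channels * cfg["width_mult"])
    return channels * resolution * resolution

def is_feasible(cfg, limits=MAX78002_LIMITS):
    return (n_layers(cfg) <= limits["max_layers"]
            and peak_activation_bytes(cfg) <= limits["max_activation_bytes"])
```

For example, under these placeholder estimators a 4-stage subnet at width 1.0 passes, while widening to 1.5 blows the 80 KiB activation budget and is discarded without evaluation.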

Figure 2: Visualizing the building blocks. We utilize a dynamic OFA-Res Block (right) that allows elastic width and depth compared to standard ResNet or DarkNet blocks.

Experimental Setup

  • Dataset: We used a subset of the TACO (Trash Annotations in Context) dataset, focusing on five classes: Paper, Plastic, Bottle, Can, and Cigarette. We also validated on TrashNet for microcontroller experiments.
  • Hardware Target: Analog Devices MAX78002, an ultra-low-power microcontroller with a hardware CNN accelerator.
  • Baselines: We compared against YOLOv5m, YOLOv8m, RTMDet, and AltiDet.

Results

Size vs. Accuracy on TACO

The TrashDet family demonstrated a superior Pareto frontier compared to existing state-of-the-art models.

  • TrashDet-l achieved 19.5 mAP50 with just 30.5M parameters.
  • This outperforms AltiDet-m, which requires 85.3M parameters (nearly 3x larger) to achieve a lower score of 18.4 mAP50.
  • It also surpassed Deformable DETR (40M params) and RTMDet (52M params).

Figure 3: Comparisons on the TACO dataset. TrashDet (red) provides higher accuracy with significantly fewer parameters than competing methods (blue).

Microcontroller Deployment (MAX78002)

We specialized the search for the MAX78002 microcontroller using the TrashNet dataset, resulting in two distinct models: TrashDet-ResNet (efficiency focus) and TrashDet-MBNet (accuracy focus).

The efficiency gains compared to the baseline (ai87-fpndetector) were substantial:

| Model | Params | Energy (µJ) | Latency (ms) | Power (mW) | mAP50 |
|---|---|---|---|---|---|
| Baseline (ai87-fpndetector) | 2.18M | 62,001 | 122.6 | 445.8 | 83.1 |
| TrashDet-MBNet | 1.32M | 17,581 | 51.1 | 285.0 | 93.3 |
| TrashDet-ResNet | 1.08M | 7,525 | 26.7 | 210.5 | 84.6 |

Key Findings:

  1. Massive Energy Savings: TrashDet-ResNet reduced energy consumption by 87.9% and latency by 78.2% compared to the baseline, allowing for prolonged battery life in the field.
  2. High FPS: The ResNet variant runs at 37.45 FPS on the microcontroller, enabling true real-time waste sorting.
  3. Scalability: The framework successfully generated models ranging from 1.2M to 30.5M parameters, proving it can adapt to different hardware budgets.
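The headline figures above follow directly from the table: the energy and latency reductions are simple ratios against the baseline, and FPS is the reciprocal of latency.

```python
# Reproducing the key findings from the MAX78002 results table.
baseline = {"energy_uJ": 62_001, "latency_ms": 122.6}  # ai87-fpndetector
resnet   = {"energy_uJ": 7_525,  "latency_ms": 26.7}   # TrashDet-ResNet

energy_saving = 1 - resnet["energy_uJ"] / baseline["energy_uJ"]    # ≈ 87.9%
latency_saving = 1 - resnet["latency_ms"] / baseline["latency_ms"]  # ≈ 78.2%
fps = 1000 / resnet["latency_ms"]                                   # ≈ 37.45 FPS
```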

Conclusion

TrashDet proves that we do not need GPU-server infrastructure to solve complex computer vision tasks like waste detection. By intelligently searching for architectures that fit within the hardware constraints of microcontrollers, we can deploy accurate, real-time, and sustainable monitoring solutions directly to the edge.

This work lays the foundation for smart cities where every bin or street corner can autonomously monitor waste levels without relying on the cloud or frequent battery replacements.


Citation

If you find this work useful, please check out the full paper: ArXiv Link

T. Tran and B. Hu, “TrashDet: Iterative Neural Architecture Search for Efficient Waste Detection,” in Proceedings of the 2026 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), Tucson, AZ, USA, Mar. 2026.
