We investigate reducing human labeling effort on an email topic classification task by augmenting pool-based active learning with LLM-generated (synthetic) labels and LLM-guided selection of examples for human labeling. Using K-Means initialization and a calibrated SVM student model, we show in simulation that LLM-augmented strategies improve label efficiency (see plots below).
A high-level schematic: embeddings and an SVM student are combined with LLM synthetic labeling and LLM strategic selection to reduce the number of human labels needed.

Goal: Reduce human labels required for accurate email topic models by leveraging LLMs for labeling and selection.
Context: Standard pool-based active learning selects uncertain examples from an embedding space (we use precomputed embeddings in `embeddings_matrix.npy`). However, many examples can be labeled reliably by a modern LLM; used carefully, this can save substantial human effort.
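For context, the uncertainty-based selection step can be sketched as margin sampling over the embedding pool: query the items whose top-two class probabilities are closest. This is a minimal illustration, not the project's exact code; the function and parameter names (`select_uncertain`, `n_query`) are assumptions.

```python
# Minimal sketch of margin-based uncertainty sampling over a precomputed
# embedding pool, assuming a probability-calibrated classifier.
import numpy as np
from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV

def select_uncertain(model, pool_embeddings, candidate_idx, n_query=10):
    """Return the n_query candidates with the smallest top-two probability margin."""
    proba = model.predict_proba(pool_embeddings[candidate_idx])
    top2 = np.sort(proba, axis=1)[:, -2:]   # two largest class probabilities per row
    margin = top2[:, 1] - top2[:, 0]        # small margin = model is uncertain
    order = np.argsort(margin)[:n_query]
    return [candidate_idx[i] for i in order]
```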
Impact: Lower annotation cost and faster iterations for domain-specific classifiers while maintaining accuracy.
Pipeline: (1) precompute embeddings, (2) initialize the labeled set from K-Means centroids, (3) train an SVM student wrapped with probability calibration, (4) run active learning iterations that either query humans (uncertainty or LLM-strategic selection) or request batch LLM labels, (5) retrain with down-weighted synthetic labels.
Key components: Core active learning loops with LLM integration, an interactive notebook demonstrating experimental runs, precomputed embeddings, and a dataset with email subjects and bodies.
LLM usage: We use a Gemini Flash model with batched JSON output. A labeling prompt returns synthetic labels for many items at once, and a separate strategic-selection prompt returns which items should be sent to human annotators.
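The batched-JSON pattern can be sketched as below. The prompt wording and response schema are assumptions for illustration, not the project's exact prompts, and the actual Gemini call is abstracted away; only the request construction and defensive JSON parsing are shown.

```python
# Sketch of batched JSON labeling: build one prompt for many emails and
# parse the model's JSON reply, dropping malformed or out-of-range entries.
import json

def build_label_prompt(items, label_names):
    """items: list of (subject, body) pairs; bodies are truncated for brevity."""
    lines = [f"{i}: {subj} || {body[:200]}" for i, (subj, body) in enumerate(items)]
    return (
        "Classify each email into one of: " + ", ".join(label_names) + ".\n"
        'Reply with JSON: [{"id": <int>, "label": <str>}, ...].\n\n'
        + "\n".join(lines)
    )

def parse_labels(reply, n_items, label_names):
    """Return {item_id: label}, keeping only well-formed, in-vocabulary records."""
    out = {}
    for rec in json.loads(reply):
        if (isinstance(rec, dict)
                and rec.get("id") in range(n_items)
                and rec.get("label") in label_names):
            out[rec["id"]] = rec["label"]
    return out
```

Validating every record before use matters here: a single malformed entry in a large batch should cost one label, not the whole request.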
Evaluation: Simulated experiments compare (1) standard uncertainty sampling AL, (2) LLM-augmented labeling AL, and (3) LLM-strategic AL. We measure accuracy on held-out data vs number of human labels.
| | Standard AL | LLM-Labeling AL | LLM-Strategic AL |
|---|---|---|---|
| Metric | Accuracy vs. human labels | Accuracy vs. human labels | Accuracy vs. human labels |
| Notes | Baseline uncertainty sampling | LLM provides synthetic labels (down-weighted) | LLM selects which examples humans should label |
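A simulated run that records the plotted quantity, test accuracy against the number of human labels used, can be sketched as follows. The query strategy here is random for brevity; the real experiments swap in uncertainty sampling or LLM-strategic selection. All names and constants are illustrative.

```python
# Sketch of one simulated active learning run recording
# (human labels used, test accuracy) after each round.
import numpy as np
from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV

def simulate_run(X_pool, y_pool, X_test, y_test, n_seed=6, rounds=3, batch=4, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(y_pool)
    # Stratified seed set so calibration folds always see every class.
    labeled = []
    for c in classes:
        labeled += list(np.flatnonzero(y_pool == c)[: n_seed // len(classes)])
    curve = []
    for _ in range(rounds):
        model = CalibratedClassifierCV(SVC(kernel="linear"), cv=2)
        model.fit(X_pool[labeled], y_pool[labeled])
        acc = float(np.mean(model.predict(X_test) == y_test))
        curve.append((len(labeled), acc))
        remaining = [i for i in range(len(X_pool)) if i not in labeled]
        labeled += list(rng.choice(remaining, batch, replace=False))  # placeholder strategy
    return curve
```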
Figure: Example accuracy curves (notebook `active_learning.ipynb` reproduces these runs).
LLM-augmented active learning shows promise for reducing human labeling budgets. Reproducing the synthetic-label experiments requires the dataset, the precomputed embeddings, and a (possibly paid) LLM API key. Future work: improved prompt engineering, calibration of synthetic label weights, and an interactive human-in-the-loop labeling UI.