Virtue AI
RESEARCH

We conduct pioneering AI research to ensure safe and secure AI.

Red Teaming & Risk Assessments

Pioneering comprehensive AI risk assessment across multiple sectors and languages. Our advanced red teaming algorithms rigorously test AI models and systems, ensuring that safety measures are robust and aligned with global regulations.

Guardrail & Threat Mitigation

Developing cutting-edge, customizable content moderation solutions for text, image, audio, and video. Our guardrails offer transparent, policy-compliant protection with unparalleled speed and efficiency.

Safe Models & Agents

Crafting AI models and agents with inherent safety features, from secure code generation to safe decision-making. We’re integrating safety and compliance directly into AI development processes, setting new standards for responsible AI.

Publications

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?

Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality

Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias

SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

TextGuard: Provable Defense against Backdoor Attacks on Text Classification

Can Pruning Improve Certified Robustness of Neural Networks?

Shake to Leak: Amplifying the Generative Privacy Risk through Fine-tuning

Improving Privacy-Preserving Vertical Federated Learning by Efficient Communication with ADMM

Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models?

DP-OPT: Make Large Language Model Your Differentially-Private Prompt Engineer

Effective and Efficient Federated Tree Learning on Hybrid Data