HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal. July 29, 2024
Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing. July 29, 2024
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression. July 29, 2024
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! May 25, 2024