Publications | Tianle Gu（顾天乐）

2026

Technical Report

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Xin Wang , Yunhao Chen , Juncheng Li , and 8 more authors

arXiv preprint arXiv:2601.01592, 2026

2025

ACL (Findings)

From Evasion to Concealment: Stealthy Knowledge Unlearning for LLMs

Tianle Gu , Kexin Huang , Ruilin Luo , and 5 more authors

In Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025

LLM Unlearning plays a crucial role in removing sensitive information from language models to mitigate potential misuse. However, previous approaches often treat nonsensical responses or template-based refusals (e.g., “Sorry, I cannot answer.”) as the unlearning target, which can give the impression of deliberate information suppression, making the process even more vulnerable to attacks and jailbreaks. Moreover, most methods rely on auxiliary models or retaining datasets, which adds complexity to the unlearning process. To address these challenges, we propose MEOW, a streamlined and stealthy unlearning method that eliminates the need for auxiliary models or retaining data while avoiding leakage through its innovative use of inverted facts. These inverted facts are generated by an offline LLM and serve as fine-tuning labels. Meanwhile, we introduce MEMO, a novel metric that measures the model’s memorization, to select optimal fine-tuning targets. The use of inverted facts not only maintains the covert nature of the model but also ensures that sensitive information is effectively forgotten without revealing the target data. Evaluated on the ToFU Knowledge Unlearning dataset using Llama2-7B-Chat and Phi-1.5, MEOW outperforms baselines in forgetting quality while preserving model utility. MEOW also maintains strong performance across NLU and NLG tasks and demonstrates superior resilience to attacks, validated via the Min-K% membership inference method.
ACL

MorphMark: Flexible Adaptive Watermarking for Large Language Models

Zongqi Wang , Tianle Gu , Baoyuan Wu , and 1 more author

arXiv preprint arXiv:2505.11541, 2025

This paper studies the effectiveness-quality trade-off in red-green list watermarking for LLM text generation. It formulates watermarking as a multi-objective optimization problem and identifies a key factor behind this dilemma. Based on the analysis, MorphMark is proposed as an adaptive method that dynamically adjusts watermark strength instead of using a fixed hyperparameter. The method is model-agnostic and model-free, making deployment easier across rapidly evolving models. Experiments show improved balance between watermark detectability and text quality, while also providing strong efficiency and flexibility.
EMNLP (Oral)

Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking

Tianle Gu , Zongqi Wang , Kexin Huang , and 4 more authors

arXiv preprint arXiv:2505.14112, 2025

We present Invisible Entropy, a safe and efficient low-entropy watermarking paradigm for large language models. The method improves robustness and detectability while preserving text quality under practical decoding settings. It is designed to reduce deployment overhead and maintain compatibility with modern LLM generation pipelines. Extensive experiments show favorable trade-offs against prior watermarking methods across quality, security, and efficiency metrics.
AAAI

HoneypotNet: Backdoor Attacks Against Model Extraction

Yixu Wang , Tianle Gu , Yan Teng , and 2 more authors

In Proceedings of the AAAI Conference on Artificial Intelligence, 2025

We study model extraction under adversarial settings and propose HoneypotNet, a backdoor-based attack strategy against extraction pipelines. The method injects stealthy triggers that remain latent during normal usage but induce targeted behavior in extracted surrogate models. We analyze transferability and attack success under different query budgets and defense settings. Results show that extraction systems can inherit hidden vulnerabilities, motivating stronger auditing and robust defense mechanisms.
EMNLP

Benchmarking Large Language Models Under Data Contamination: A Survey from Static to Dynamic Evaluation

Simin Chen , Yiming Chen , Zexin Li , and 8 more authors

In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law

Shanghai AI Lab

arXiv preprint arXiv:2507.18576, 2025

EMNLP (Findings)

Fair Text-Attributed Graph Representation Learning

Ruilin Luo , Tianle Gu , Lin Wang , and 4 more authors

In Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

arXiv

LinguaSafe: A Comprehensive Multilingual Safety Benchmark for Large Language Models

Zhiyuan Ning , Tianle Gu , Jiaxin Song , and 8 more authors

arXiv preprint arXiv:2508.12733, 2025

2024

NeurIPS

MLLMGuard: A Multi-Dimensional Safety Evaluation Suite for Multimodal Large Language Models

Tianle Gu , Zeyang Zhou , Kexin Huang , and 8 more authors

In Proceedings of the 38th International Conference on Neural Information Processing Systems, 2024

We introduce MLLMGuard, a multi-dimensional safety evaluation suite for multimodal large language models. The framework covers diverse risk categories with bilingual data, standardized inference tooling, and both manual and automatic evaluation protocols. We further train a lightweight evaluator (GuardRank) on human annotations to enable scalable and reproducible scoring. Experiments on closed-source and open-source MLLMs show substantial safety differences across dimensions and reveal gaps not captured by existing single-axis benchmarks.
arXiv

Chain of History: Learning and Forecasting with LLMs for Temporal Knowledge Graph Completion

Ruilin Luo , Tianle Gu , Haoling Li , and 4 more authors

arXiv preprint arXiv:2401.06072, 2024

We propose Chain of History, an LLM-based framework for temporal knowledge graph completion that jointly models historical evolution and future forecasting. The approach leverages sequential temporal context to improve reasoning over dynamic entities and relations. It supports both interpolation and extrapolation settings and integrates naturally with large language model priors. Experiments on benchmark temporal KGs demonstrate competitive or superior performance with strong generalization across time horizons.
EMNLP

ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models

Haiquan Zhao , Lingyu Li , Shisong Chen , and 11 more authors

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as ESC models. However, how to evaluate these LLM-based ESC models remains unclear. ESC-Eval addresses this in three steps. First, we re-organize 2,801 role-playing cards from seven existing datasets to define the roles of the role-playing agent. Second, we train a dedicated role-playing model called ESC-Role, which behaves more like a confused person than GPT-4 does. Third, using ESC-Role and the organized role cards, we systematically conduct experiments with 14 LLMs as the ESC models, including general AI-assistant LLMs (e.g., ChatGPT) and ESC-oriented LLMs (e.g., ExTES-Llama). We conduct comprehensive human annotations on the interactive multi-turn dialogues of the different ESC models. The results show that ESC-oriented LLMs exhibit superior ESC abilities compared to general AI-assistant LLMs, but still fall short of human performance. Moreover, to automate the scoring process for future ESC models, we developed ESC-RANK, which is trained on the annotated data and surpasses GPT-4's scoring performance by 35 points.

Bold indicates first author.