How to Correctly Report LLM-as-a-Judge Evaluations

Chungpa Lee, Thomas Zeng, Jongwon Jeong, Jy-yong Sohn, Kangwook Lee

Language Model · ICML 2026

ReJump: A Tree-Jump Representation for Analyzing and Improving LLM Reasoning

Yuchen Zeng, Shuibai Zhang, Wonjun Kang, Shutong Wu, Lynnix Zou, Ying Fan, Heeju Kim, Ziqian Lin, Jungtaek Kim, Hyung Il Koo, Dimitris Papailiopoulos, Kangwook Lee

Language Model · ICML 2026

Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling

Seojeong Park*, Jiho Choi*, Junyong Kang, Seonho Lee, Jaeyo Shin, Hyunjung Shim

Language Model · ICML 2026

Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models

Sanghyun Lee, Seungryong Kim, Jongho Park, Dongmin Park

Language Model · ICML 2026

Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams

Jiyeon Kim, Hyunji Lee, Dylan Zhou, Sue Hyun Park, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Sungmin Cha, Minjoon Seo

Language Model · ACL 2026

Not All Bits Are Equal: Scale-Dependent Memory Optimization Strategies for Reasoning Models

Junhyuck Kim, Ethan Ewer, Taehong Moon, Jongho Park, Dimitris Papailiopoulos

Language Model · ICLR 2026

ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs

Wonjun Kang, Kevin Galim, Seunghyuk Oh, Minjae Lee, Yuchen Zeng, Shuibai Zhang, Coleman Richard Charles Hooper, Yuezhou Hu, Hyung Il Koo, Nam Ik Cho, Kangwook Lee

Language Model · ICLR 2026

Draft-based Approximate Inference for LLMs

Kevin Galim, Ethan Ewer, Wonjun Kang, Minjae Lee, Hyung Il Koo, Kangwook Lee

Language Model · ICLR 2026

T1: Tool-integrated Verification for Test-time Compute Scaling in Small Language Models

Minki Kang, Jongwon Jeong, Jaewoong Cho

Language Model · ICLR 2026

Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games

Dongmin Park, Minkyu Kim, Beongjun Choi, Junhyuck Kim, Keon Lee, Jonghyun Lee, Inkyu Park, Byeong-Uk Lee, Jaeyoung Hwang, Jaewoo Ahn, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Pritam Biswas, Yoshi Suhara, Kangwook Lee, Jaewoong Cho

Language Model · ICLR 2026
