Abstract: We study the consistency of listwise ranking methods with respect to the popular Normalized Discounted Cumulative Gain (NDCG) criterion. State-of-the-art listwise approaches replace NDCG with a surrogate loss that is easier to optimize. We characterize NDCG consistency of surrogate losses and discover a surprising fact: several commonly used surrogates are NDCG inconsistent. We then show how to modify them so that they become NDCG consistent. Further, we state a stronger but more natural notion of strong NDCG consistency and provide an explicit characterization of all strongly NDCG consistent surrogates. Going beyond qualitative consistency considerations, we also give quantitative statements that allow us to transform the excess error, as measured in the surrogate, into the excess error relative to the Bayes optimal ranking function for NDCG. Finally, we derive improved results when a certain natural "low noise" or "large margin" condition holds. Our experiments demonstrate that ensuring NDCG consistency does improve the performance of listwise ranking methods on real-world datasets. Moreover, a novel surrogate function suggested by our theoretical results leads to further improvements even over NDCG consistent versions of existing surrogates.
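Since NDCG is the criterion at the center of the analysis, the following minimal sketch illustrates how NDCG is commonly computed for a ranked list of relevance labels. It assumes the standard exponential-gain, logarithmic-discount definition; it is an illustrative assumption, not code or notation taken from the paper itself.

```python
import math

def dcg(relevances):
    # Discounted Cumulative Gain: gain (2^rel - 1) at rank i is
    # discounted by log2(i + 2), i.e., log2 of the 1-based position + 1.
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal ordering (relevances sorted
    # in descending order), so a perfect ranking scores 1.0.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A list already sorted by relevance attains the maximum NDCG of 1.0.
print(ndcg([3, 2, 1, 0]))  # → 1.0
```

A surrogate loss is NDCG consistent, informally, when driving its expected value to the minimum also drives this NDCG score of the learned ranking to the Bayes optimal value.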