Min-kNN Distance is a robust, black-box method for detecting RLVR training data exposure without requiring access to token probabilities or internal model parameters.
- Sampling: Generate multiple completions for a given prompt from the model.
- Edit Distance: Compute the pairwise normalized edit distance between all generated completions.
- Min-kNN: Identify the nearest neighbor for each completion and average the k smallest distances.
Core Insight: RLVR induces a structural collapse on seen prompts, leading to highly repetitive outputs and significantly lower Min-kNN values compared to unseen prompts.