

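The dataset contains the following feature fields:

- id: unique identifier (string)
- title: the generated headline (string)
- content: the original content/text (string)
- title_match_1: the first annotator's score for headline-content relevance (float)
- title_match_2: the second annotator's score for headline-content relevance (float)
- tendency: sentiment or tendency category (string)
- average_score: the average of the two annotators' scores (float)
- score_difference: the difference between the two annotators' scores (float)

File layout:

bo/
  average_score_4_or_higher.csv
  average_score_below_4.csv
  bo-all.csv
mn/
  average_score_4_or_higher.csv
  average_score_below_4.csv
  mn-all.csv
ug/
  average_score_4_or_higher.csv
  average_score_below_4.csv
  ug-3.csv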
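
The relationship between the score fields and the split files can be sketched as follows. This is a minimal illustration, not the authors' preprocessing script. The sample rows are hypothetical, the threshold of 4 is inferred from the file names, and treating `score_difference` as an absolute value is an assumption. The helper names `add_score_fields` and `split_by_average` are made up for this example.

```python
import csv
import io

# Hypothetical rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """id,title,content,title_match_1,title_match_2,tendency
a1,Example title A,Example content A,5,4,neutral
a2,Example title B,Example content B,3,4,neutral
a3,Example title C,Example content C,2,5,neutral
"""

# Threshold inferred from the file names average_score_4_or_higher.csv
# and average_score_below_4.csv.
THRESHOLD = 4.0


def add_score_fields(row):
    """Return a copy of row with average_score and score_difference derived."""
    s1 = float(row["title_match_1"])
    s2 = float(row["title_match_2"])
    out = dict(row)
    out["average_score"] = (s1 + s2) / 2
    # Assumption: the difference is stored as an absolute value.
    out["score_difference"] = abs(s1 - s2)
    return out


def split_by_average(rows, threshold=THRESHOLD):
    """Partition rows into (average >= threshold, average < threshold)."""
    high = [r for r in rows if r["average_score"] >= threshold]
    low = [r for r in rows if r["average_score"] < threshold]
    return high, low


rows = [add_score_fields(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
high, low = split_by_average(rows)
print(len(high), "rows at or above the threshold;", len(low), "rows below it")
```

To apply the same check to a real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle for one of the CSV files listed above and compare the derived values against the stored `average_score` and `score_difference` columns.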
BibTeX:

@inproceedings{xu2025cmhg,
  title     = {{CMHG}: A Dataset and Benchmark for Headline Generation of Minority Languages in China},
  author    = {Guixian Xu and Zeli Su and Ziyin Zhang and Jianing Liu and Xu Han and Ting Zhang and Yushuang Dong},
  booktitle = {The 2025 Conference on Empirical Methods in Natural Language Processing},
  year      = {2025},
  url       = {https://openreview.net/forum?id=bmkwrhOmC2}
}