CSCD record number: CSCD:6008544
Title:
基于WMD语义相似度的TextRank改进算法识别论文核心主题句研究
Alternative title: Recognizing Core Topic Sentences with Improved TextRank Algorithm Based on WMD Semantic Similarity
Authors: 王子璇; 乐小虬; 何远标
Journal: 数据分析与知识发现 (Data Analysis and Knowledge Discovery)
ISSN: 2096-3467
Publication year: 2017
Volume: 1, Issue: 4, Pages: 417-427
Language: Chinese
Chinese keywords (translated): semantic similarity ; topic sentence recognition ; external features
English keywords: WMD ; TextRank ; Semantic Similarity ; Topic Sentence Recognition ; External Features
WOS subject category: COMPUTER SCIENCE INTERDISCIPLINARY APPLICATIONS
WOS research area: Computer Science
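Chinese abstract (translated): [Objective] To automatically identify the key sentences that describe the research topic of a scientific paper. [Methods] Sentences are grouped by paper section. Domain-specific word embeddings are trained and used to compute the WMD distance between sentences, from which the corresponding semantic similarity is derived. The iterative process of the TextRank algorithm is optimized, the resulting weights are adjusted using external features, and key topic sentences are selected in descending order of sentence weight. [Results] Scientific papers from the climate change field served as experimental data, and manual annotations served as the baseline for comparing the proposed algorithm with the traditional TextRank algorithm. Preliminary results show that the recognition performance (F-value) of the proposed method is about 5% higher than that of traditional TextRank. [Limitations] Sentence feature extraction needs improvement, and the word embedding training and the method's parameters require further optimization. [Conclusions] The improved TextRank algorithm, based on domain word embeddings and incorporating WMD semantic similarity, identifies the central sentences within sections of a scientific paper well. With weight adjustment from external features, it can identify the core topic sentences of a whole paper well.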
English abstract: [Objective] This paper aims to automatically recognize the key sentences that describe the research topics of scientific papers. [Methods] First, we organized sentence sets using paper sections as the unit. Second, we calculated the WMD distance between sentences using trained domain word embeddings and derived their semantic similarity. Third, we optimized the iterative process of the TextRank algorithm and used external features to adjust the sentence weights. Finally, we selected the core topic sentences in descending order of sentence weight. [Results] We evaluated the proposed method on scientific papers about climate change and compared it with the traditional TextRank algorithm. Its recognition performance (F-value) was about 5% higher than that of the traditional TextRank algorithm. [Limitations] The extraction of sentence features needs to be improved, and the word embedding training and related parameters of the proposed method need further optimization. [Conclusions] The improved TextRank algorithm can effectively recognize the central sentences within sections of a scientific paper. With weights adjusted by external features, it can recognize the core topic sentences of a paper.
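The pipeline described in the abstract (embedding-based sentence distance, TextRank iteration, external-feature adjustment, descending selection) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation. It makes several assumptions that do not come from the record: the toy 2-D vectors in `EMB` are hypothetical stand-ins for trained domain embeddings; the relaxed WMD lower bound is swapped in for exact WMD to avoid an optimal-transport solver; distance is mapped to similarity as `1 / (1 + d)`; and external features are modeled as simple multiplicative `boosts`.

```python
import math

# Hypothetical toy embeddings; a real run would load trained domain word vectors.
EMB = {
    "climate": (1.0, 0.0),
    "warming": (0.9, 0.1),
    "temperature": (0.8, 0.3),
    "model": (0.0, 1.0),
    "algorithm": (0.1, 0.9),
}

def rwmd(a, b):
    """Relaxed WMD (a lower bound on WMD), assuming uniform word mass.

    Each word sends all of its mass to the nearest word of the other sentence.
    """
    def one_way(src, dst):
        return sum(min(math.dist(EMB[w], EMB[v]) for v in dst) for w in src) / len(src)
    return max(one_way(a, b), one_way(b, a))

def similarity(a, b):
    # Assumed mapping from distance to similarity; the paper may use another.
    return 1.0 / (1.0 + rwmd(a, b))

def textrank(sentences, d=0.85, iters=50):
    """Weighted TextRank over one section's sentences (lists of tokens)."""
    n = len(sentences)
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing edge weight per sentence
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - d) / n
            + d * sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores

def rank_topic_sentences(sentences, boosts=None):
    """Return sentence indices in descending order of adjusted weight."""
    scores = textrank(sentences)
    boosts = boosts or [1.0] * len(sentences)  # hypothetical external-feature factors
    adjusted = [s * b for s, b in zip(scores, boosts)]
    return sorted(range(len(sentences)), key=lambda i: adjusted[i], reverse=True)

section = [["climate", "warming"], ["warming", "temperature"], ["model", "algorithm"]]
ranking = rank_topic_sentences(section)
```

In this toy section the third sentence is semantically far from the other two, so it ranks last unless an external-feature boost raises it. For exact WMD on real embeddings, an optimal-transport solver would replace `rwmd`.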
Resource type: Journal article
Identifier: http://119.78.100.158/handle/2HF3EXSE/153140
Appears in Collections: 气候变化事实与影响 (Climate Change Facts and Impacts)

Author affiliation: National Science Library, Chinese Academy of Sciences, Beijing 100190, China

Recommended Citation:
王子璇,乐小虬,何远标. 基于WMD语义相似度的TextRank改进算法识别论文核心主题句研究[J]. 数据分析与知识发现,2017-01-01,1(4):417-427
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.