Text Summarization with Pretrained Encoders
These are my notes from reading Text Summarization with Pretrained Encoders.
The authors' previous paper was Fine-tune BERT for Extractive Summarization.
- This post therefore focuses on what has been improved since that work.
Differences from Previous Studies
- The earlier work only considered extractive summarization.
- This work proposes a general framework for both extractive and abstractive models (i.e., it uses both kinds of summarization).
Data
- CNN/DailyMail : highlighted … -> extractive summaries
- NYT : abstractive summaries
- XSum : one-sentence summary
Method
- extractive model
- built on top of the encoder by stacking several inter-sentence Transformer layers
- abstractive model
- a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not)
- a two-staged fine-tuning approach
- it can further boost the quality of the generated summaries
- the combination of extractive and abstractive objectives can help generate better summaries (Gehrmann et al., 2018)
- In other words, this work builds on the idea that combining the two models can improve performance.
- architecture of BERTSUM
- Summarization Encoder
- Interval Segment Embeddings
- sent(i) is segment-embedded as E(a) or E(b) depending on whether its position is odd or even (see the sketch below)
- sentence -> [sent1, sent2, sent3, sent4, sent5] embedding -> [E(a), E(b), E(a), E(b), E(a)]
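A minimal sketch of how these interval segment ids can be built (my own illustration, not the authors' code; the helper `interval_segment_ids` and the hidden size of 768 are assumptions):

```python
import torch
import torch.nn as nn

def interval_segment_ids(sentence_lengths):
    """Assign segment id 0 (E_a) or 1 (E_b) to every token, alternating per sentence."""
    ids = []
    for i, length in enumerate(sentence_lengths):
        ids.extend([i % 2] * length)          # sent1 -> E_a, sent2 -> E_b, sent3 -> E_a, ...
    return torch.tensor(ids)

segment_embedding = nn.Embedding(2, 768)       # E_a and E_b; hidden size 768 assumed
seg_ids = interval_segment_ids([5, 7, 4])      # tokens of sent1..sent3 -> 0...0, 1...1, 0...0
seg_emb = segment_embedding(seg_ids)           # (16, 768); added to token + position embeddings
```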
- Extractive Summarization
- the formulation is the same as in the previous work (a sketch of the extractive head follows below)
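A rough sketch of that extractive head: the per-sentence [CLS] vectors from BERTSUM pass through a small stack of inter-sentence Transformer layers, and a sigmoid classifier scores each sentence. The layer sizes below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ExtractiveHead(nn.Module):
    """Inter-sentence Transformer layers over per-sentence [CLS] vectors,
    followed by a sigmoid scorer (hyperparameters are illustrative)."""
    def __init__(self, hidden=768, heads=8, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads,
                                           dim_feedforward=2048, batch_first=True)
        self.inter_sentence = nn.TransformerEncoder(layer, num_layers=layers)
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, cls_vectors):                         # (batch, num_sentences, hidden)
        h = self.inter_sentence(cls_vectors)
        return torch.sigmoid(self.scorer(h)).squeeze(-1)    # one score per sentence

scores = ExtractiveHead()(torch.randn(1, 5, 768))           # 5 sentences -> 5 scores in [0, 1]
```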
- Abstractive Summarization
- the encoder is the pretrained BERTSUM and the decoder is a 6-layer Transformer initialized randomly
- since the encoder is pretrained and the decoder is not, there is a possible mismatch between the two
- new fine-tuning schedule: apply different optimizers to the encoder and the decoder (a sketch follows after this list)
- two-stage fine-tuning
1. fine-tune the encoder on the extractive summarization task
2. fine-tune it on the abstractive summarization task
- using extractive objectives can boost the performance of abstractive summarization
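A sketch of the different-optimizer idea mentioned above: the pretrained encoder gets a small peak learning rate with a long warmup, while the randomly initialized decoder gets a large peak rate with a short warmup, so the decoder can train quickly without destabilizing the encoder. The placeholder modules and the exact learning-rate values here are mine, for illustration only.

```python
import torch
import torch.nn as nn

encoder = nn.Linear(768, 768)   # placeholder for the pretrained BERTSUM encoder
decoder = nn.Linear(768, 768)   # placeholder for the randomly initialized 6-layer decoder

def noam_lr(step, peak_lr, warmup):
    # Warmup then inverse-square-root decay, scheduled separately for each optimizer.
    return peak_lr * min(step ** -0.5, step * warmup ** -1.5)

enc_opt = torch.optim.Adam(encoder.parameters())
dec_opt = torch.optim.Adam(decoder.parameters())

for step in range(1, 1001):                                       # short loop for illustration
    for group in enc_opt.param_groups:
        group["lr"] = noam_lr(step, peak_lr=2e-3, warmup=20000)    # gentle on the encoder
    for group in dec_opt.param_groups:
        group["lr"] = noam_lr(step, peak_lr=0.1, warmup=10000)     # aggressive on the decoder
    # ... compute the summarization loss and call loss.backward() here ...
    enc_opt.step(); dec_opt.step()
    enc_opt.zero_grad(); dec_opt.zero_grad()
```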
Results
ROUGE scores are used for evaluation (a usage example follows below).
Performance improves substantially compared to the previous work.
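For reference, ROUGE-1/2/L F1 can be computed with the `rouge-score` package; this is a generic usage example, not the evaluation script used in the paper.

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "police killed the gunman"           # toy reference summary
candidate = "the gunman was killed by police"    # toy system summary
scores = scorer.score(reference, candidate)      # score(target, prediction)
for name, s in scores.items():
    print(name, round(s.fmeasure, 3))
```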
Related Papers
- Sebastian Gehrmann, Yuntian Deng, and Alexander Rush. 2018. Bottom-up abstractive summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4098–4109, Brussels, Belgium.
- Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain.
- Shashi Narayan, Shay B. Cohen, and Mirella Lapata. 2018a. Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1797–1807, Brussels, Belgium.
- Shashi Narayan, Shay B. Cohen, and Mirella Lapata. 2018b. Ranking sentences for extractive summarization with reinforcement learning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1747–1759, New Orleans, Louisiana.
- Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389, Lisbon, Portugal.
- Xingxing Zhang, Furu Wei, and Ming Zhou. 2019. HIBERT: Document level pre-training of hierarchical bidirectional transformers for document summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5059–5069, Florence, Italy.
- Xingxing Zhang, Mirella Lapata, Furu Wei, and Ming Zhou. 2018. Neural latent extractive document summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 779–784, Brussels, Belgium.