Text Summarization with Pretrained Encoders
These are my notes from reading Text Summarization with Pretrained Encoders.
The authors' previous paper was Fine-tune BERT for Extractive Summarization.
- This post therefore focuses on what has been improved since that work.
Differences from Previous Studies
- The earlier work only considered extractive summarization.
- This work proposes a general framework for both extractive and abstractive models (i.e., it uses both kinds of summarization).
Data
- CNN/DailyMail : highlighted … -> extractive summaries
- NYT : abstractive summaries
- XSum : one-sentence summary
Method
- extractive model
- built on top of the encoder by stacking several inter-sentence Transformer layers
- abstractive model
- a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not)
- a two-staged fine-tuning approach
- it can further boost the quality of the generated summaries
- the combination of extractive and abstractive objectives can help generate better summaries (Gehrmann et al., 2018)
- In other words, this work builds on the idea that combining the two models can improve performance.
- architecture of BERTSUM
- Summarization Encoder
- Interval Segment Embeddings
- sent(i) is segment-embedded as E(a) or E(b) depending on whether its position is odd or even (see the sketch below)
- sentence -> [sent1, sent2, sent3, sent4, sent5] embedding -> [E(a), E(b), E(a), E(b), E(a)]
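A minimal sketch of how these interval segment ids can be built (my own illustration, not the authors' code; the helper `interval_segment_ids` and the hidden size of 768 are assumptions):

```python
import torch
import torch.nn as nn

def interval_segment_ids(sentence_lengths):
    """Assign segment id 0 (E_a) or 1 (E_b) to every token, alternating per sentence."""
    ids = []
    for i, length in enumerate(sentence_lengths):
        ids.extend([i % 2] * length)          # sent1 -> E_a, sent2 -> E_b, sent3 -> E_a, ...
    return torch.tensor(ids)

segment_embedding = nn.Embedding(2, 768)       # E_a and E_b; hidden size 768 assumed
seg_ids = interval_segment_ids([5, 7, 4])      # tokens of sent1..sent3 -> 0...0, 1...1, 0...0
seg_emb = segment_embedding(seg_ids)           # (16, 768); added to token + position embeddings
```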
- Extractive Summarization
- the formulation is the same as in the previous work (a sketch of the extractive head follows below)
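A rough sketch of that extractive head: the per-sentence [CLS] vectors from BERTSUM pass through a small stack of inter-sentence Transformer layers, and a sigmoid classifier scores each sentence. The layer sizes below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ExtractiveHead(nn.Module):
    """Inter-sentence Transformer layers over per-sentence [CLS] vectors,
    followed by a sigmoid scorer (hyperparameters are illustrative)."""
    def __init__(self, hidden=768, heads=8, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads,
                                           dim_feedforward=2048, batch_first=True)
        self.inter_sentence = nn.TransformerEncoder(layer, num_layers=layers)
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, cls_vectors):                         # (batch, num_sentences, hidden)
        h = self.inter_sentence(cls_vectors)
        return torch.sigmoid(self.scorer(h)).squeeze(-1)    # one score per sentence

scores = ExtractiveHead()(torch.randn(1, 5, 768))           # 5 sentences -> 5 scores in [0, 1]
```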
- Abstractive Summarization
- the encoder is the pretrained BERTSUM and the decoder is a 6-layer Transformer initialized randomly
- since the encoder is pretrained and the decoder is not, there is a possible mismatch between the two
- new fine-tuning schedule: apply different optimizers to the encoder and the decoder (a sketch follows after this list)
- two-stage fine-tuning
1. fine-tune the encoder on the extractive summarization task
2. fine-tune it on the abstractive summarization task
- using extractive objectives can boost the performance of abstractive summarization
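A sketch of the different-optimizer idea mentioned above: the pretrained encoder gets a small peak learning rate with a long warmup, while the randomly initialized decoder gets a large peak rate with a short warmup, so the decoder can train quickly without destabilizing the encoder. The placeholder modules and the exact learning-rate values here are mine, for illustration only.

```python
import torch
import torch.nn as nn

encoder = nn.Linear(768, 768)   # placeholder for the pretrained BERTSUM encoder
decoder = nn.Linear(768, 768)   # placeholder for the randomly initialized 6-layer decoder

def noam_lr(step, peak_lr, warmup):
    # Warmup then inverse-square-root decay, scheduled separately for each optimizer.
    return peak_lr * min(step ** -0.5, step * warmup ** -1.5)

enc_opt = torch.optim.Adam(encoder.parameters())
dec_opt = torch.optim.Adam(decoder.parameters())

for step in range(1, 1001):                                       # short loop for illustration
    for group in enc_opt.param_groups:
        group["lr"] = noam_lr(step, peak_lr=2e-3, warmup=20000)    # gentle on the encoder
    for group in dec_opt.param_groups:
        group["lr"] = noam_lr(step, peak_lr=0.1, warmup=10000)     # aggressive on the decoder
    # ... compute the summarization loss and call loss.backward() here ...
    enc_opt.step(); dec_opt.step()
    enc_opt.zero_grad(); dec_opt.zero_grad()
```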
Results
ROUGE scores are used for evaluation (a usage example follows below).
Performance improves substantially compared to the previous work.
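For reference, ROUGE-1/2/L F1 can be computed with the `rouge-score` package; this is a generic usage example, not the evaluation script used in the paper.

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "police killed the gunman"           # toy reference summary
candidate = "the gunman was killed by police"    # toy system summary
scores = scorer.score(reference, candidate)      # score(target, prediction)
for name, s in scores.items():
    print(name, round(s.fmeasure, 3))
```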
Related Papers
- Sebastian Gehrmann, Yuntian Deng, and Alexander Rush. 2018. Bottom-up abstractive summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4098–4109, Brussels, Belgium.
- Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain.
- Shashi Narayan, Shay B. Cohen, and Mirella Lapata. 2018a. Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1797–1807, Brussels, Belgium.
- Shashi Narayan, Shay B. Cohen, and Mirella Lapata. 2018b. Ranking sentences for extractive summarization with reinforcement learning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1747–1759, New Orleans, Louisiana.
- Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389, Lisbon, Portugal.
- Xingxing Zhang, Furu Wei, and Ming Zhou. 2019. HIBERT: Document level pre-training of hierarchical bidirectional transformers for document summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5059–5069, Florence, Italy.
- Xingxing Zhang, Mirella Lapata, Furu Wei, and Ming Zhou. 2018. Neural latent extractive document summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 779–784, Brussels, Belgium.