Fine-tune BERT for Extractive Summarization
This post is a summary of my reading of Fine-tune BERT for Extractive Summarization.
Types of Summarization
- abstractive summarization: contains words or phrases that were not in the original text…
- similar to paraphrasing
- extractive summarization: copies and concatenates the most important spans in a document
- the summary is built by copy & pasting the most important sentences
- this paper uses the latter, extractive summarization (a toy sketch follows below)
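To make the distinction concrete, here is a toy extractive summarizer, written purely for illustration and not the paper's method: it scores sentences by word frequency and copies the top-k verbatim into the summary.

```python
# Toy illustration of extractive summarization (NOT the paper's method):
# score each sentence by word frequency and copy the top-k sentences verbatim.
from collections import Counter

def toy_extractive_summary(sentences, k=2):
    words = [w.lower() for s in sentences for w in s.split()]
    freq = Counter(words)
    # A sentence's score is its average word frequency in the document.
    scores = [sum(freq[w.lower()] for w in s.split()) / len(s.split()) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    # Keep the original document order when concatenating the selected sentences.
    return " ".join(sentences[i] for i in sorted(top))

doc = [
    "BERT has advanced the state of the art in many NLP tasks.",
    "The weather was nice yesterday.",
    "This paper fine-tunes BERT for extractive summarization.",
]
print(toy_extractive_summary(doc, k=2))
```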
Data
- CNN/Dailymail
- NYT
Method
Encoding Multiple Sentences
- insert a [CLS] token before each sentence and a [SEP] token after each sentence
- In vanilla BERT, [CLS] is used as a symbol to aggregate features from one sentence or a pair of sentences.
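A minimal sketch of how such an input could be built, assuming the Hugging Face transformers tokenizer (my own illustration, not the authors' preprocessing code):

```python
# Build a BertSum-style input: [CLS] before and [SEP] after every sentence.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentences = ["The cat sat on the mat.", "It was very comfortable.", "Then it fell asleep."]

tokens = []
cls_positions = []  # indices of the per-sentence [CLS] tokens (used later for sentence scores)
for sent in sentences:
    cls_positions.append(len(tokens))
    tokens += ["[CLS]"] + tokenizer.tokenize(sent) + ["[SEP]"]

input_ids = tokenizer.convert_tokens_to_ids(tokens)
print(cls_positions)
print(tokens[:12])
```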
Interval Segment Embeddings
- sent(i) is given segment embedding E(a) or E(b) depending on whether i is odd or even.
sentence -> [sent1, sent2, sent3, sent4, sent5]
embedding -> [E(a), E(b), E(a), E(b), E(a)]
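A small sketch of how the alternating segment IDs could be generated (my own illustration; 0 stands for E(a) and 1 for E(b)):

```python
# Interval segment IDs: sentences alternate between segment A (0) and segment B (1).
def interval_segment_ids(sentence_token_lists):
    segment_ids = []
    for i, sent_tokens in enumerate(sentence_token_lists):
        segment_ids += [i % 2] * len(sent_tokens)   # 0 -> E(a), 1 -> E(b)
    return segment_ids

sents = [["[CLS]", "hello", "[SEP]"],
         ["[CLS]", "world", "!", "[SEP]"],
         ["[CLS]", "bye", "[SEP]"]]
print(interval_segment_ids(sents))
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```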
Fine-tuning with Summarization Layers
- simple classifier: a linear layer with a sigmoid over each sentence's [CLS] output (all three options are sketched in code after this list)
- inter-sentence transformer
- extracting document-level features focusing on summarization tasks from the BERT outputs
- the formulation uses layer normalization and a multi-head attention operation; I need to read up more on MHAtt.
- Recurrent Neural Network
- an RNN can reportedly make the results even better (not sure why)
- an LSTM layer is applied over the BERT outputs so that it can learn summarization-specific features.
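The three summarization layers could be sketched in PyTorch roughly as follows. This is my paraphrase under assumptions (hidden size 768, 2 Transformer layers, a bidirectional LSTM), not the authors' code; each head turns the per-sentence [CLS] vectors into sentence-selection probabilities.

```python
# Rough sketches of the three summarization layers applied to the per-sentence
# [CLS] vectors T of shape (batch, num_sentences, hidden).
import torch
import torch.nn as nn

hidden = 768

# 1) Simple classifier: a linear layer + sigmoid per sentence.
simple_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

# 2) Inter-sentence Transformer: Transformer layers over the sentence vectors
#    (layer norm + multi-head attention inside each layer), then a sigmoid head.
encoder_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
inter_sentence_tf = nn.TransformerEncoder(encoder_layer, num_layers=2)
tf_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

# 3) RNN layer: a bidirectional LSTM over the sentence vectors, then a sigmoid head.
lstm = nn.LSTM(hidden, hidden // 2, bidirectional=True, batch_first=True)
lstm_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

T = torch.randn(2, 5, hidden)                       # stand-in for BERT's [CLS] outputs

p_simple = simple_head(T).squeeze(-1)               # (2, 5) selection probabilities
p_tf = tf_head(inter_sentence_tf(T)).squeeze(-1)    # (2, 5)
lstm_out, _ = lstm(T)
p_lstm = lstm_head(lstm_out).squeeze(-1)            # (2, 5)
print(p_simple.shape, p_tf.shape, p_lstm.shape)
```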
Results
- ROUGE scores are used for evaluation (see the sketch below)
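For reference, a quick way to compute ROUGE, assuming the rouge-score package (this is just how I would check it, not necessarily the paper's evaluation script):

```python
# Quick ROUGE check with the `rouge-score` package (pip install rouge-score);
# the paper reports ROUGE-1, ROUGE-2 and ROUGE-L.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "the cat sat on the mat"
candidate = "the cat was on the mat"
print(scorer.score(reference, candidate))
```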