Fine-tune BERT for Extractive Summarization
This post is a summary of my reading of Fine-tune BERT for Extractive Summarization.
Types of Summarization
- abstractive summarization: contains words or phrases that were not in the original text…
- similar to paraphrasing
- extractive summarization: copies and concatenates the most important spans in a document
- the summary is built by copy & pasting the most important sentences
- this paper uses the latter, extractive summarization (a toy sketch follows below)
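To make the distinction concrete, here is a toy extractive summarizer, written purely for illustration and not the paper's method: it scores sentences by word frequency and copies the top-k verbatim into the summary.

```python
# Toy illustration of extractive summarization (NOT the paper's method):
# score each sentence by word frequency and copy the top-k sentences verbatim.
from collections import Counter

def toy_extractive_summary(sentences, k=2):
    words = [w.lower() for s in sentences for w in s.split()]
    freq = Counter(words)
    # A sentence's score is its average word frequency in the document.
    scores = [sum(freq[w.lower()] for w in s.split()) / len(s.split()) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    # Keep the original document order when concatenating the selected sentences.
    return " ".join(sentences[i] for i in sorted(top))

doc = [
    "BERT has advanced the state of the art in many NLP tasks.",
    "The weather was nice yesterday.",
    "This paper fine-tunes BERT for extractive summarization.",
]
print(toy_extractive_summary(doc, k=2))
```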
Data
- CNN/Dailymail
- NYT
Method
Encoding Multiple Sentences
- insert a [CLS] token before each sentence and a [SEP] token after each sentence
- In vanilla BERT, [CLS] is used as a symbol to aggregate features from one sentence or a pair of sentences.
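A minimal sketch of how such an input could be built, assuming the Hugging Face transformers tokenizer (my own illustration, not the authors' preprocessing code):

```python
# Build a BertSum-style input: [CLS] before and [SEP] after every sentence.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentences = ["The cat sat on the mat.", "It was very comfortable.", "Then it fell asleep."]

tokens = []
cls_positions = []  # indices of the per-sentence [CLS] tokens (used later for sentence scores)
for sent in sentences:
    cls_positions.append(len(tokens))
    tokens += ["[CLS]"] + tokenizer.tokenize(sent) + ["[SEP]"]

input_ids = tokenizer.convert_tokens_to_ids(tokens)
print(cls_positions)
print(tokens[:12])
```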
Interval Segment Embeddings
- sent(i) is given segment embedding E(a) or E(b) depending on whether i is odd or even.
sentence -> [sent1, sent2, sent3, sent4, sent5]
embedding -> [E(a), E(b), E(a), E(b), E(a)]
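A small sketch of how the alternating segment IDs could be generated (my own illustration; 0 stands for E(a) and 1 for E(b)):

```python
# Interval segment IDs: sentences alternate between segment A (0) and segment B (1).
def interval_segment_ids(sentence_token_lists):
    segment_ids = []
    for i, sent_tokens in enumerate(sentence_token_lists):
        segment_ids += [i % 2] * len(sent_tokens)   # 0 -> E(a), 1 -> E(b)
    return segment_ids

sents = [["[CLS]", "hello", "[SEP]"],
         ["[CLS]", "world", "!", "[SEP]"],
         ["[CLS]", "bye", "[SEP]"]]
print(interval_segment_ids(sents))
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```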
Fine-tuning with Summarization Layers
- simple classifier: a linear layer with a sigmoid over each sentence's [CLS] output (all three options are sketched in code after this list)
- inter-sentence transformer
- extracting document-level features focusing on summarization tasks from the BERT outputs
- the formulation uses layer normalization and a multi-head attention operation; I need to read up more on MHAtt.
- Recurrent Neural Network
- an RNN can reportedly make the results even better (not sure why)
- an LSTM layer is applied over the BERT outputs so that it can learn summarization-specific features.
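The three summarization layers could be sketched in PyTorch roughly as follows. This is my paraphrase under assumptions (hidden size 768, 2 Transformer layers, a bidirectional LSTM), not the authors' code; each head turns the per-sentence [CLS] vectors into sentence-selection probabilities.

```python
# Rough sketches of the three summarization layers applied to the per-sentence
# [CLS] vectors T of shape (batch, num_sentences, hidden).
import torch
import torch.nn as nn

hidden = 768

# 1) Simple classifier: a linear layer + sigmoid per sentence.
simple_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

# 2) Inter-sentence Transformer: Transformer layers over the sentence vectors
#    (layer norm + multi-head attention inside each layer), then a sigmoid head.
encoder_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
inter_sentence_tf = nn.TransformerEncoder(encoder_layer, num_layers=2)
tf_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

# 3) RNN layer: a bidirectional LSTM over the sentence vectors, then a sigmoid head.
lstm = nn.LSTM(hidden, hidden // 2, bidirectional=True, batch_first=True)
lstm_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

T = torch.randn(2, 5, hidden)                       # stand-in for BERT's [CLS] outputs

p_simple = simple_head(T).squeeze(-1)               # (2, 5) selection probabilities
p_tf = tf_head(inter_sentence_tf(T)).squeeze(-1)    # (2, 5)
lstm_out, _ = lstm(T)
p_lstm = lstm_head(lstm_out).squeeze(-1)            # (2, 5)
print(p_simple.shape, p_tf.shape, p_lstm.shape)
```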
Results
- ROUGE scores are used for evaluation (see the sketch below)
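For reference, a quick way to compute ROUGE, assuming the rouge-score package (this is just how I would check it, not necessarily the paper's evaluation script):

```python
# Quick ROUGE check with the `rouge-score` package (pip install rouge-score);
# the paper reports ROUGE-1, ROUGE-2 and ROUGE-L.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "the cat sat on the mat"
candidate = "the cat was on the mat"
print(scorer.score(reference, candidate))
```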