딥러닝 모델을 이용한 수화 교육 웹 어플리케이션-Handlang(1)

Posted Aug 26, 2020

By Yonsoo Kim

2 min read

DSC EWHA에서 2019.9~2020.8 까지 진행한 팀프로젝트로, 딥러닝 모델을 이용한 수화 학습 웹 어플리케이션입니다. 이 포스팅에서는 수화 인식 딥러닝 모델에 대해서만 다룹니다.

Handlang - ASL(American Sign Language) Education by using deep learning model

딥러닝으로 학습된 수화 인식 모델을 바탕으로 알파벳, 숫자에 해당되는 수화를 학습 및 연습 할 수 있는 웹 어플리케이션입니다.

모델 정확도 개선을 위한 여러 시도들

[About models]

YOLO darknet

YOLO is the model with excellent performance in Object detection. We used darkflow, not yolo darknet, to take advantage of tensorflow.

https://github.com/thtrieu/darkflow

The most attempts were made at darkflow.

YOLO_experiment_1
- [a~z] 600 images each. training 500 epochs
- acc : 0.42

Feedback -> Predict performance is poor.

YOLO_experiment_2
- pretrained weight - hand tracking model (https://github.com/Abdul-Mukit/dope_with_hand_tracking)
- a~y 600 images each. 140 epochs
- acc : 0.47
YOLO_experiment_3
- pretrained weight - yolov2-tiny.weight(https://pjreddie.com/darknet/yolov2/)
- a~y 600 images each. 220 epochs
- acc : 0.56

Feedback -> It is still not a satisfactory performance.

Inception-v3

We tried transfer learning by using inception-v3.

a~y 600 images each. 1000 steps
test acc. : about 88%
- but not that much at real-time…

Tensorflow-Object-Detection-API

We tried transfer learning by using fast r-cnn.

a~y 600 images each. 6000 steps
test acc. : about 80%
- not test by images, but test by webcam.
  - poor performance

Self-feedback: It is still not a satisfactory performance.

Finally Custom CNN model(our current model)!

  
handlang_model = Sequential()
handlang_model.add(Conv2D(64, kernel_size=4, strides=1, activation='relu', input_shape=target_dims))
handlang_model.add(Conv2D(64, kernel_size=4, strides=2, activation='relu'))
handlang_model.add(Dropout(0.5))
handlang_model.add(Conv2D(128, kernel_size=4, strides=1, activation='relu'))
handlang_model.add(Conv2D(128, kernel_size=4, strides=2, activation='relu'))
handlang_model.add(Dropout(0.5))
handlang_model.add(Conv2D(256, kernel_size=4, strides=1, activation='relu'))
handlang_model.add(Conv2D(256, kernel_size=4, strides=2, activation='relu'))
handlang_model.add(Flatten())
handlang_model.add(Dropout(0.5))
handlang_model.add(Dense(512, activation='relu'))
handlang_model.add(Dense(n_classes, activation='softmax'))

handlang_model.summary()

handlang_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=["accuracy"])

Feedback : 위 모델을 사용하고, 웹 코드 내에서의 trick을 이용하여 조금 더 빠른 인식과 높은 정확도를 가질 수 있었음.

Datasets

아래 데이터 셋들은 모델 트레이닝에 사용되었습니다.

참고로, 우리 모델에서는 알파벳 i,z 제외했습니다. (손동작이 포함되었기 때문에)

https://www.kaggle.com/grassknoted/asl-alphabet
- 가장 성능이 좋은 모델에 사용된 데이터 셋입니다.

다른 모델에서는 아래의 데이터셋들을 사용했습니다.

Team Handlang

Project Github Link

Side Project

This post is licensed under CC BY 4.0 by the author.