# AIX
## Introduction
To build a natural-sounding Korean text-to-speech application, we study the full pipeline: converting the training speech into numerical features a machine can process, turning those features into spectrograms, training the attention between the encoder and decoder to align text with audio, and finally converting the spectrograms back into speech.
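The "speech to numbers to spectrogram" step above can be sketched in a few lines. This is a minimal, framework-free illustration; the frame and FFT sizes are illustrative assumptions, not the project's actual settings:

```python
import numpy as np

def magnitude_spectrogram(wave, n_fft=1024, hop=256):
    """Magnitude spectrogram: frame the waveform, window each frame,
    take an FFT -- the acoustic target a Tacotron-style decoder
    learns to predict."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft//2 + 1)

# toy input: one second of a 440 Hz tone at 22.05 kHz
sr = 22050
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
```

Each row of `spec` is one analysis frame; the energy peak sits in the FFT bin nearest 440 Hz.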
#### vanilla tacotron + griffin_lim (tacotron)
ref : tacotron ([https://arxiv.org/abs/1703.10135](https://arxiv.org/abs/1703.10135)) (17.03)
![](https://lionrocket.github.io/assets/img/moon.jpg)
"안녕하세요 문재인 대통령의 음성입니다" (Hello, this is the voice of President Moon Jae-in)
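Griffin-Lim reconstructs a waveform from a magnitude spectrogram alone by alternating inverse and forward STFTs, keeping the magnitude fixed while the phase converges. A minimal sketch using SciPy; the hyperparameters are illustrative, not the project's:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude, n_fft=1024, hop=256, n_iter=50, seed=0):
    """Iteratively estimate a phase consistent with the given STFT
    magnitude, then invert to a waveform."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    wave = None
    for _ in range(n_iter):
        _, wave = istft(magnitude * phase, nperseg=n_fft,
                        noverlap=n_fft - hop)
        _, _, spec = stft(wave, nperseg=n_fft, noverlap=n_fft - hop)
        phase = np.exp(1j * np.angle(spec))  # keep phase, discard magnitude
    return wave

# demo: analyze a 440 Hz tone, throw away the phase, reconstruct
sr = 22050
t = np.arange(sr) / sr
_, _, ref = stft(np.sin(2 * np.pi * 440 * t), nperseg=1024, noverlap=768)
wave = griffin_lim(np.abs(ref), n_iter=40)
```

The reconstruction is not bit-exact, but the dominant frequency content survives, which is why Griffin-Lim works as a quick baseline vocoder.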
#### multispeaker (deepvoice2 , deepvoice3)
![](https://lionrocket.github.io/assets/img/multi.png)
ref : deepvoice2 ([https://arxiv.org/abs/1705.08947](https://arxiv.org/abs/1705.08947)) (17.05)
![](https://lionrocket.github.io/assets/img/multi2.png)
ref : deepvoice3 ([https://arxiv.org/abs/1710.07654](https://arxiv.org/abs/1710.07654)) (17.10)
![](https://lionrocket.github.io/assets/img/jung.jpg)
![](https://lionrocket.github.io/assets/img/jooha.jpg)
![](https://lionrocket.github.io/assets/img/chun.jpg)
#### vanilla dctts
ref : [https://arxiv.org/abs/1710.08969](https://arxiv.org/abs/1710.08969) (17.10)
#### Korean multi-speaker dataset test (speaker embedding test)
[https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset)
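The speaker-embedding idea being tested here (as in Deep Voice 2) amounts to a lookup table with one learned vector per speaker that conditions the synthesizer. A numpy sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_speakers, embed_dim, decoder_dim = 4, 16, 32  # illustrative sizes

# one learned vector per speaker, trained jointly with the TTS model
speaker_table = rng.normal(size=(n_speakers, embed_dim))

def condition(decoder_states, speaker_id):
    """Concatenate the speaker vector onto every decoder timestep so
    a single model can render the same text in different voices."""
    emb = speaker_table[speaker_id]
    tiled = np.broadcast_to(emb, (decoder_states.shape[0], embed_dim))
    return np.concatenate([decoder_states, tiled], axis=1)

states = rng.normal(size=(10, decoder_dim))  # 10 decoder timesteps
out = condition(states, speaker_id=2)
```

Swapping `speaker_id` changes the conditioning vector, and thus the voice, without retraining anything else.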
#### Making artificial sentences for target sentences
"Data analysis for few-sample adaptation learning"
![](https://lionrocket.github.io/assets/img/target1.png)
![](https://lionrocket.github.io/assets/img/target2.png)
![](https://lionrocket.github.io/assets/img/target3.png)
Target data link (Blue House fairy tales): [http://children.president.go.kr/our_space/fairy_tales.php](http://children.president.go.kr/our_space/fairy_tales.php)
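One way to construct artificial adaptation sentences is to pick, from a candidate pool, sentences that greedily cover the characters appearing in the target sentences. This greedy set-cover scheme is a hypothetical sketch (the post does not state the exact method used), and the toy sentences below are made up for illustration:

```python
def greedy_cover(targets, candidates):
    """Greedy set cover: repeatedly pick the candidate sentence that
    covers the most still-missing target characters."""
    needed = set("".join(targets)) - {" "}
    chosen = []
    while needed:
        best = max(candidates, key=lambda s: len(needed & set(s)))
        gain = needed & set(best)
        if not gain:
            break  # the pool cannot cover what remains
        chosen.append(best)
        needed -= gain
    return chosen, needed

chosen, uncovered = greedy_cover(
    targets=["호랑이가 나타났다"],
    candidates=["호수와 랑데부", "이가 나서", "타조가 났다", "전혀 무관한 문장"],
)
```

Recording only the chosen sentences keeps the adaptation set small while still exposing the model to every target symbol.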
#### wavenet_vocoder+tacotron2
ref : https://arxiv.org/abs/1712.05884 (17.12)
![](https://lionrocket.github.io/assets/img/taco2.png)
![](https://lionrocket.github.io/assets/img/moon.jpg)
Griffin-Lim vocoder
WaveNet vocoder
"이 음성은 보코더를 비교하기 위한 음성입니다" (This audio sample is for comparing the vocoders)
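The WaveNet vocoder predicts 8-bit mu-law-companded sample classes rather than raw amplitudes. The companding itself is simple and is sketched below (a standard formulation, not project-specific code):

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Compress amplitude logarithmically, then quantize to mu + 1
    classes -- the categorical targets a WaveNet vocoder predicts."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)

def mu_law_decode(q, mu=255):
    """Map class indices back to waveform amplitudes in [-1, 1]."""
    y = 2 * q.astype(np.float64) / mu - 1
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

x = np.linspace(-1.0, 1.0, 101)
x_hat = mu_law_decode(mu_law_encode(x))
```

The logarithmic compression spends more of the 256 classes on quiet samples, where the ear is most sensitive, so the round trip stays perceptually close to the original.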
#### Introducing a crowdsourcing system for multi-source data cleaning
![](https://lionrocket.github.io/assets/img/drop.png)
![](https://lionrocket.github.io/assets/img/fire.jpg)
![](https://lionrocket.github.io/assets/img/sourcing.png)
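At its core, the crowdsourcing pipeline aggregates several annotators' judgments for each audio clip. A generic majority-vote sketch; the label names and the 2/3 threshold are assumptions, not the system's actual rules:

```python
from collections import Counter

def aggregate(votes, min_agreement=2 / 3):
    """Keep a clip only when a clear majority of annotators marked it
    usable; returns kept clip ids with their agreement ratio."""
    kept = {}
    for clip_id, marks in votes.items():
        label, n = Counter(marks).most_common(1)[0]
        if label == "ok" and n / len(marks) >= min_agreement:
            kept[clip_id] = n / len(marks)
    return kept

kept = aggregate({
    "clip_001": ["ok", "ok", "noisy"],     # 2/3 agree: keep
    "clip_002": ["noisy", "noisy", "ok"],  # majority says noisy: drop
    "clip_003": ["ok", "ok", "ok"],        # unanimous: keep
})
```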
#### Lionvoice (difficult sounds, tongue-twisters)
"seq2seq + korean linguistic feature + Korean Rulebook"
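A concrete example of a Korean linguistic feature is decomposing each precomposed Hangul syllable into its initial, medial, and final jamo, giving the model a phoneme-like input alphabet. The decomposition is pure Unicode arithmetic (this is the standard algorithm, shown as an illustration of the kind of feature meant above):

```python
# standard Unicode jamo tables for decomposing Hangul syllables
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def to_jamo(text):
    """Decompose precomposed Hangul syllables (U+AC00..U+D7A3) into
    initial/medial/final jamo; other characters pass through."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code <= 11171:
            out.append(CHOSEONG[code // 588])         # 588 = 21 * 28
            out.append(JUNGSEONG[(code % 588) // 28])
            if code % 28:                             # non-empty final
                out.append(JONGSEONG[code % 28])
        else:
            out.append(ch)
    return "".join(out)
```

For example, `to_jamo("안녕")` yields the six-jamo sequence "ㅇㅏㄴㄴㅕㅇ".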
##### works :
President Moon + tongue-twisters
#### 1. tacotron
![](https://lionrocket.github.io/assets/img/taco_a.png)
1. "앞집 안방 장판장은 노란꽃 장판장이고"
2. "내가 그린 구름 그림은 새털 구름 그린 그림이고"
3. "육통 통장 적금통장은 황색 적금 통장이고"
#### 2. Lionvoice
![](https://lionrocket.github.io/assets/img/lion_a.png)
![](https://lionrocket.github.io/assets/img/lion_c.png)
1. "앞집 안방 장판장은 노란꽃 장판장이고"
2. "내가 그린 구름 그림은 새털 구름 그린 그림이고"
3. "육통 통장 적금통장은 황색 적금 통장이고"
#### very very short target
"Synthesize a short sentence that was unseen in the training dataset"
#### BEFORE
![](https://lionrocket.github.io/assets/img/before_a.png)
![](https://lionrocket.github.io/assets/img/before_b.png)
![](https://lionrocket.github.io/assets/img/before_c.png)
"행복" (happiness)
"사랑" (love)
"즐거움" (joy)
#### AFTER
![](https://lionrocket.github.io/assets/img/after_a.png)
![](https://lionrocket.github.io/assets/img/after_b.png)
![](https://lionrocket.github.io/assets/img/after_c.png)
"행복" (happiness)
"사랑" (love)
"즐거움" (joy)
#### Controlling the flow within a sentence
→ an essential feature for producing audiobooks
Normal synthesis: "어느날 갑자기 호랑이가 나타났다" (One day, a tiger suddenly appeared)
![](https://lionrocket.github.io/assets/img/control1.png)
Control synthesis 1: "어느날V갑자기 호랑이가 나타났다"
![](https://lionrocket.github.io/assets/img/sam1.png)
Control synthesis 2: "어느날 갑자기V호랑이가 나타났다"
![](https://lionrocket.github.io/assets/img/sam2.png)
Control synthesis 3: "어V느V날 갑자기 호랑이가 나타났다"
![](https://lionrocket.github.io/assets/img/sam3.png)
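The V markers above act as explicit pause symbols: in preprocessing they simply map to a silence token in the model's input sequence. A sketch of the idea; the `<sil>` token name is an assumption:

```python
PAUSE_MARK = "V"  # the marker used in the examples above

def to_symbols(text, pause_token="<sil>"):
    """Turn marked-up text into the model's input symbol sequence:
    each V becomes an explicit silence symbol, every other character
    passes through unchanged."""
    return [pause_token if ch == PAUSE_MARK else ch for ch in text]

symbols = to_symbols("어느날V갑자기 호랑이가 나타났다")
```

Because the pause is just another input symbol, the attention learns to dwell on it, which stretches the silence at exactly the marked position.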
#### emotional embedding (GST)
![](https://lionrocket.github.io/assets/img/gst.png)
* Normal
* Happy
* Gloomy
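Global Style Tokens compute the emotion embedding as an attention-weighted sum over a learned bank of style tokens. A minimal numpy sketch; the bank size, dimensions, and random initial values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, token_dim = 10, 32  # illustrative sizes

# bank of style tokens (random here; learned jointly in practice)
style_tokens = rng.normal(size=(n_tokens, token_dim))

def style_embedding(ref_encoding):
    """Attend over the token bank with the reference (or emotion)
    encoding as the query; the softmax-weighted sum of tokens is the
    style embedding fed to the synthesizer."""
    scores = style_tokens @ ref_encoding / np.sqrt(token_dim)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ style_tokens, weights

emb, weights = style_embedding(rng.normal(size=token_dim))
```

At inference time one can skip the reference encoder entirely and hand-pick the token weights, which is what makes "Normal / Happy / Gloomy" presets possible.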
#### waveglow
ref : [https://arxiv.org/abs/1811.00002](https://arxiv.org/abs/1811.00002) (18.11)
* WaveGlow
![](https://lionrocket.github.io/assets/img/waveglow.png)
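WaveGlow is built from exactly invertible affine coupling layers, which is what lets it generate audio by running the flow backwards from noise. A toy sketch of one coupling step; `toy_net` is a stand-in assumption for the learned WaveNet-like conditioner:

```python
import numpy as np

def coupling_forward(x, net):
    """Affine coupling (Glow/WaveGlow): half the channels pass through
    unchanged and parameterize an affine transform of the other half."""
    xa, xb = np.split(x, 2)
    log_s, t = net(xa)
    return np.concatenate([xa, xb * np.exp(log_s) + t])

def coupling_inverse(z, net):
    """Exact inverse: recompute (log_s, t) from the untouched half,
    then undo the affine transform."""
    za, zb = np.split(z, 2)
    log_s, t = net(za)
    return np.concatenate([za, (zb - t) * np.exp(-log_s)])

def toy_net(h):
    """Stand-in for the learned conditioner network."""
    return np.tanh(h), 0.5 * h

rng = np.random.default_rng(0)
x = rng.normal(size=8)
z = coupling_forward(x, toy_net)
x_back = coupling_inverse(z, toy_net)
```

Because the inverse never has to invert `net` itself, the conditioner can be arbitrarily complex while the whole layer stays exactly invertible.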
#### vocoder (clear voice)
We recover a voice close to the ground truth.
* Original (김주하)
* Vocoder (김주하)