# AIX
## Introduction
To build a natural-sounding Korean text-to-speech application, we study the full pipeline: converting the training speech into numerical features a machine can process, turning those features into spectrograms, training the attention between the encoder and decoder to align text with audio, and finally converting the spectrograms back into speech.
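The "speech to numbers to spectrogram" step above can be sketched in a few lines. This is a minimal, framework-free illustration; the frame and FFT sizes are illustrative assumptions, not the project's actual settings:

```python
import numpy as np

def magnitude_spectrogram(wave, n_fft=1024, hop=256):
    """Magnitude spectrogram: frame the waveform, window each frame,
    take an FFT -- the acoustic target a Tacotron-style decoder
    learns to predict."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft//2 + 1)

# toy input: one second of a 440 Hz tone at 22.05 kHz
sr = 22050
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
```

Each row of `spec` is one analysis frame; the energy peak sits in the FFT bin nearest 440 Hz.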
#### vanilla tacotron + griffin_lim (tacotron)
ref : tacotron ([https://arxiv.org/abs/1703.10135](https://arxiv.org/abs/1703.10135)) (17.03)
![](https://lionrocket.github.io/assets/img/moon.jpg)
"안녕하세요 문재인 대통령의 음성입니다" (Hello, this is the voice of President Moon Jae-in)
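Griffin-Lim reconstructs a waveform from a magnitude spectrogram alone by alternating inverse and forward STFTs, keeping the magnitude fixed while the phase converges. A minimal sketch using SciPy; the hyperparameters are illustrative, not the project's:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude, n_fft=1024, hop=256, n_iter=50, seed=0):
    """Iteratively estimate a phase consistent with the given STFT
    magnitude, then invert to a waveform."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    wave = None
    for _ in range(n_iter):
        _, wave = istft(magnitude * phase, nperseg=n_fft,
                        noverlap=n_fft - hop)
        _, _, spec = stft(wave, nperseg=n_fft, noverlap=n_fft - hop)
        phase = np.exp(1j * np.angle(spec))  # keep phase, discard magnitude
    return wave

# demo: analyze a 440 Hz tone, throw away the phase, reconstruct
sr = 22050
t = np.arange(sr) / sr
_, _, ref = stft(np.sin(2 * np.pi * 440 * t), nperseg=1024, noverlap=768)
wave = griffin_lim(np.abs(ref), n_iter=40)
```

The reconstruction is not bit-exact, but the dominant frequency content survives, which is why Griffin-Lim works as a quick baseline vocoder.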
#### multispeaker (deepvoice2 , deepvoice3)
![](https://lionrocket.github.io/assets/img/multi.png)
ref : deepvoice2 ([https://arxiv.org/abs/1705.08947](https://arxiv.org/abs/1705.08947)) (17.05)
![](https://lionrocket.github.io/assets/img/multi2.png)
ref : deepvoice3 ([https://arxiv.org/abs/1710.07654](https://arxiv.org/abs/1710.07654)) (17.10)
![](https://lionrocket.github.io/assets/img/jung.jpg)
![](https://lionrocket.github.io/assets/img/jooha.jpg)
![](https://lionrocket.github.io/assets/img/chun.jpg)
#### vanilla dctts
ref : [https://arxiv.org/abs/1710.08969](https://arxiv.org/abs/1710.08969) (17.10)
#### Korean multi-speaker dataset test (speaker embedding test)
[https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset)
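The speaker-embedding idea being tested here (as in Deep Voice 2) amounts to a lookup table with one learned vector per speaker that conditions the synthesizer. A numpy sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_speakers, embed_dim, decoder_dim = 4, 16, 32  # illustrative sizes

# one learned vector per speaker, trained jointly with the TTS model
speaker_table = rng.normal(size=(n_speakers, embed_dim))

def condition(decoder_states, speaker_id):
    """Concatenate the speaker vector onto every decoder timestep so
    a single model can render the same text in different voices."""
    emb = speaker_table[speaker_id]
    tiled = np.broadcast_to(emb, (decoder_states.shape[0], embed_dim))
    return np.concatenate([decoder_states, tiled], axis=1)

states = rng.normal(size=(10, decoder_dim))  # 10 decoder timesteps
out = condition(states, speaker_id=2)
```

Swapping `speaker_id` changes the conditioning vector, and thus the voice, without retraining anything else.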
#### Making artificial sentences for target sentences
"Data analysis for few-sample adaptation learning"
![](https://lionrocket.github.io/assets/img/target1.png)
![](https://lionrocket.github.io/assets/img/target2.png)
![](https://lionrocket.github.io/assets/img/target3.png)
Target data link (Blue House fairy tales): [http://children.president.go.kr/our_space/fairy_tales.php](http://children.president.go.kr/our_space/fairy_tales.php)
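One way to construct artificial adaptation sentences is to pick, from a candidate pool, sentences that greedily cover the characters appearing in the target sentences. This greedy set-cover scheme is a hypothetical sketch (the post does not state the exact method used), and the toy sentences below are made up for illustration:

```python
def greedy_cover(targets, candidates):
    """Greedy set cover: repeatedly pick the candidate sentence that
    covers the most still-missing target characters."""
    needed = set("".join(targets)) - {" "}
    chosen = []
    while needed:
        best = max(candidates, key=lambda s: len(needed & set(s)))
        gain = needed & set(best)
        if not gain:
            break  # the pool cannot cover what remains
        chosen.append(best)
        needed -= gain
    return chosen, needed

chosen, uncovered = greedy_cover(
    targets=["호랑이가 나타났다"],
    candidates=["호수와 랑데부", "이가 나서", "타조가 났다", "전혀 무관한 문장"],
)
```

Recording only the chosen sentences keeps the adaptation set small while still exposing the model to every target symbol.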
#### wavenet_vocoder+tacotron2
ref : https://arxiv.org/abs/1712.05884 (17.12)
![](https://lionrocket.github.io/assets/img/taco2.png)
![](https://lionrocket.github.io/assets/img/moon.jpg)
Griffin-Lim vocoder
WaveNet vocoder
"이 음성은 보코더를 비교하기 위한 음성입니다" (This audio sample is for comparing the vocoders)
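The WaveNet vocoder predicts 8-bit mu-law-companded sample classes rather than raw amplitudes. The companding itself is simple and is sketched below (a standard formulation, not project-specific code):

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Compress amplitude logarithmically, then quantize to mu + 1
    classes -- the categorical targets a WaveNet vocoder predicts."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)

def mu_law_decode(q, mu=255):
    """Map class indices back to waveform amplitudes in [-1, 1]."""
    y = 2 * q.astype(np.float64) / mu - 1
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

x = np.linspace(-1.0, 1.0, 101)
x_hat = mu_law_decode(mu_law_encode(x))
```

The logarithmic compression spends more of the 256 classes on quiet samples, where the ear is most sensitive, so the round trip stays perceptually close to the original.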
#### Introducing a crowdsourcing system for multi-source data cleaning
![](https://lionrocket.github.io/assets/img/drop.png)
![](https://lionrocket.github.io/assets/img/fire.jpg)
![](https://lionrocket.github.io/assets/img/sourcing.png)
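At its core, the crowdsourcing pipeline aggregates several annotators' judgments for each audio clip. A generic majority-vote sketch; the label names and the 2/3 threshold are assumptions, not the system's actual rules:

```python
from collections import Counter

def aggregate(votes, min_agreement=2 / 3):
    """Keep a clip only when a clear majority of annotators marked it
    usable; returns kept clip ids with their agreement ratio."""
    kept = {}
    for clip_id, marks in votes.items():
        label, n = Counter(marks).most_common(1)[0]
        if label == "ok" and n / len(marks) >= min_agreement:
            kept[clip_id] = n / len(marks)
    return kept

kept = aggregate({
    "clip_001": ["ok", "ok", "noisy"],     # 2/3 agree: keep
    "clip_002": ["noisy", "noisy", "ok"],  # majority says noisy: drop
    "clip_003": ["ok", "ok", "ok"],        # unanimous: keep
})
```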
#### Lionvoice (difficult sounds, tongue-twisters)
"seq2seq + korean linguistic feature + Korean Rulebook"
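A concrete example of a Korean linguistic feature is decomposing each precomposed Hangul syllable into its initial, medial, and final jamo, giving the model a phoneme-like input alphabet. The decomposition is pure Unicode arithmetic (this is the standard algorithm, shown as an illustration of the kind of feature meant above):

```python
# standard Unicode jamo tables for decomposing Hangul syllables
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def to_jamo(text):
    """Decompose precomposed Hangul syllables (U+AC00..U+D7A3) into
    initial/medial/final jamo; other characters pass through."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code <= 11171:
            out.append(CHOSEONG[code // 588])         # 588 = 21 * 28
            out.append(JUNGSEONG[(code % 588) // 28])
            if code % 28:                             # non-empty final
                out.append(JONGSEONG[code % 28])
        else:
            out.append(ch)
    return "".join(out)
```

For example, `to_jamo("안녕")` yields the six-jamo sequence "ㅇㅏㄴㄴㅕㅇ".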
##### works :
President Moon + tongue-twisters
#### 1. tacotron
![](https://lionrocket.github.io/assets/img/taco_a.png)
1. "앞집 안방 장판장은 노란꽃 장판장이고"
2. "내가 그린 구름 그림은 새털 구름 그린 그림이고"
3. "육통 통장 적금통장은 황색 적금 통장이고"
#### 2. Lionvoice
![](https://lionrocket.github.io/assets/img/lion_a.png)
![](https://lionrocket.github.io/assets/img/lion_c.png)
1. "앞집 안방 장판장은 노란꽃 장판장이고"
2. "내가 그린 구름 그림은 새털 구름 그린 그림이고"
3. "육통 통장 적금통장은 황색 적금 통장이고"
#### very very short target
"Synthesize a short sentence that was unseen in the training dataset"
#### BEFORE
![](https://lionrocket.github.io/assets/img/before_a.png)
![](https://lionrocket.github.io/assets/img/before_b.png)
![](https://lionrocket.github.io/assets/img/before_c.png)
"행복" (happiness)
"사랑" (love)
"즐거움" (joy)
#### AFTER
![](https://lionrocket.github.io/assets/img/after_a.png)
![](https://lionrocket.github.io/assets/img/after_b.png)
![](https://lionrocket.github.io/assets/img/after_c.png)
"행복" (happiness)
"사랑" (love)
"즐거움" (joy)
#### Controlling the flow within a sentence
→ an essential feature for producing audiobooks
Normal synthesis: "어느날 갑자기 호랑이가 나타났다" (One day, a tiger suddenly appeared)
![](https://lionrocket.github.io/assets/img/control1.png)
Control synthesis 1: "어느날V갑자기 호랑이가 나타났다"
![](https://lionrocket.github.io/assets/img/sam1.png)
Control synthesis 2: "어느날 갑자기V호랑이가 나타났다"
![](https://lionrocket.github.io/assets/img/sam2.png)
Control synthesis 3: "어V느V날 갑자기 호랑이가 나타났다"
![](https://lionrocket.github.io/assets/img/sam3.png)
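The V markers above act as explicit pause symbols: in preprocessing they simply map to a silence token in the model's input sequence. A sketch of the idea; the `<sil>` token name is an assumption:

```python
PAUSE_MARK = "V"  # the marker used in the examples above

def to_symbols(text, pause_token="<sil>"):
    """Turn marked-up text into the model's input symbol sequence:
    each V becomes an explicit silence symbol, every other character
    passes through unchanged."""
    return [pause_token if ch == PAUSE_MARK else ch for ch in text]

symbols = to_symbols("어느날V갑자기 호랑이가 나타났다")
```

Because the pause is just another input symbol, the attention learns to dwell on it, which stretches the silence at exactly the marked position.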
#### emotional embedding (GST)
![](https://lionrocket.github.io/assets/img/gst.png)
* Normal
* Happy
* Gloomy
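Global Style Tokens compute the emotion embedding as an attention-weighted sum over a learned bank of style tokens. A minimal numpy sketch; the bank size, dimensions, and random initial values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, token_dim = 10, 32  # illustrative sizes

# bank of style tokens (random here; learned jointly in practice)
style_tokens = rng.normal(size=(n_tokens, token_dim))

def style_embedding(ref_encoding):
    """Attend over the token bank with the reference (or emotion)
    encoding as the query; the softmax-weighted sum of tokens is the
    style embedding fed to the synthesizer."""
    scores = style_tokens @ ref_encoding / np.sqrt(token_dim)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ style_tokens, weights

emb, weights = style_embedding(rng.normal(size=token_dim))
```

At inference time one can skip the reference encoder entirely and hand-pick the token weights, which is what makes "Normal / Happy / Gloomy" presets possible.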
#### waveglow
ref : [https://arxiv.org/abs/1811.00002](https://arxiv.org/abs/1811.00002) (18.11)
* WaveGlow
![](https://lionrocket.github.io/assets/img/waveglow.png)
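WaveGlow is built from exactly invertible affine coupling layers, which is what lets it generate audio by running the flow backwards from noise. A toy sketch of one coupling step; `toy_net` is a stand-in assumption for the learned WaveNet-like conditioner:

```python
import numpy as np

def coupling_forward(x, net):
    """Affine coupling (Glow/WaveGlow): half the channels pass through
    unchanged and parameterize an affine transform of the other half."""
    xa, xb = np.split(x, 2)
    log_s, t = net(xa)
    return np.concatenate([xa, xb * np.exp(log_s) + t])

def coupling_inverse(z, net):
    """Exact inverse: recompute (log_s, t) from the untouched half,
    then undo the affine transform."""
    za, zb = np.split(z, 2)
    log_s, t = net(za)
    return np.concatenate([za, (zb - t) * np.exp(-log_s)])

def toy_net(h):
    """Stand-in for the learned conditioner network."""
    return np.tanh(h), 0.5 * h

rng = np.random.default_rng(0)
x = rng.normal(size=8)
z = coupling_forward(x, toy_net)
x_back = coupling_inverse(z, toy_net)
```

Because the inverse never has to invert `net` itself, the conditioner can be arbitrarily complex while the whole layer stays exactly invertible.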
#### vocoder (clear voice)
We recover a voice close to the ground truth.
* Original (김주하)
* Vocoder (김주하)