demo_part3.ipynb 4.61 KB
Newer Older
wl-zhao's avatar
wl-zhao committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multi-Accent and Multi-Lingual Voice Clone Demo with MeloTTS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import torch\n",
    "from openvoice import se_extractor\n",
    "from openvoice.api import ToneColorConverter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initialization\n",
    "\n",
    "In this example, we will use the checkpoints from OpenVoiceV2. OpenVoiceV2 is trained with more aggressive augmentations and thus demonstrate better robustness in some cases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ckpt_converter = 'checkpoints_v2/converter'\n",
    "device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n",
    "output_dir = 'outputs_v2'\n",
    "\n",
    "tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)\n",
    "tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')\n",
    "\n",
    "os.makedirs(output_dir, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Obtain Tone Color Embedding\n",
    "We only extract the tone color embedding for the target speaker. The source tone color embeddings can be directly loaded from `checkpoints_v2/ses` folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
qinzy's avatar
qinzy committed
62
    "reference_speaker = 'resources/example_reference.mp3' # This is the voice you want to clone\n",
Zengyi Qin's avatar
Zengyi Qin committed
63
    "target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=True)"
wl-zhao's avatar
wl-zhao committed
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Use MeloTTS as Base Speakers\n",
    "\n",
    "MeloTTS is a high-quality multi-lingual text-to-speech library by @MyShell.ai, supporting languages including English (American, British, Indian, Australian, Default), Spanish, French, Chinese, Japanese, Korean. In the following example, we will use the models in MeloTTS as the base speakers. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from melo.api import TTS\n",
    "\n",
    "texts = {\n",
qinzy's avatar
qinzy committed
84
    "    'EN_NEWEST': \"Did you ever hear a folk tale about a giant turtle?\",  # The newest English base speaker model\n",
wl-zhao's avatar
wl-zhao committed
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
    "    'EN': \"Did you ever hear a folk tale about a giant turtle?\",\n",
    "    'ES': \"El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.\",\n",
    "    'FR': \"La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.\",\n",
    "    'ZH': \"在这次vacation中,我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景。\",\n",
    "    'JP': \"彼は毎朝ジョギングをして体を健康に保っています。\",\n",
    "    'KR': \"안녕하세요! 오늘은 날씨가 정말 좋네요.\",\n",
    "}\n",
    "\n",
    "\n",
    "src_path = f'{output_dir}/tmp.wav'\n",
    "\n",
    "# Speed is adjustable\n",
    "speed = 1.0\n",
    "\n",
    "for language, text in texts.items():\n",
    "    model = TTS(language=language, device=device)\n",
    "    speaker_ids = model.hps.data.spk2id\n",
    "    \n",
    "    for speaker_key in speaker_ids.keys():\n",
    "        speaker_id = speaker_ids[speaker_key]\n",
    "        speaker_key = speaker_key.lower().replace('_', '-')\n",
    "        \n",
    "        source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)\n",
    "        model.tts_to_file(text, speaker_id, src_path, speed=speed)\n",
qinzy's avatar
qinzy committed
109
    "        save_path = f'{output_dir}/output_v2_{speaker_key}.wav'\n",
wl-zhao's avatar
wl-zhao committed
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
    "\n",
    "        # Run the tone color converter\n",
    "        encode_message = \"@MyShell\"\n",
    "        tone_color_converter.convert(\n",
    "            audio_src_path=src_path, \n",
    "            src_se=source_se, \n",
    "            tgt_se=target_se, \n",
    "            output_path=save_path,\n",
    "            message=encode_message)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "melo",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
qinzy's avatar
qinzy committed
138
   "version": "3.9.18"
wl-zhao's avatar
wl-zhao committed
139
140
141
142
143
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}