2024.12.18

VOCALOID 20th Anniversary Special Interview With “Developers”

Today, we have invited two people who have contributed to the success of VOCALOID for a 20th anniversary interview at the Yamaha headquarters in Hamamatsu City, Shizuoka Prefecture.

Hideki Kenmochi (left) and Masafumi Yoshida (right) have been key figures in the advancement of VOCALOID

The meeting of Yamaha technology and text-to-speech synthesis that gave birth to VOCALOID

Q: VOCALOID now enjoys enormous name recognition. Please tell us what led to its development and its release as a commercial product.

Kenmochi: When the VOCALOID project started, producing music on computers was just becoming common, and software samplers and software instruments were finally becoming usable. Many creators connected numerous hardware instruments to their computers and produced music by entering the notes and data of their songs.

Hideki Kenmochi started the VOCALOID project in 2000

At the time, the sounds of various instruments and performances could be reproduced through computer input, but singing could not. There was a plug-in board* called the “PLG100-SG,” equipped with an FS (formant singing) tone generator that could synthesize human singing like a synthesizer, but it obviously sounded like a computer, and there was still a big gap between it and a human voice. (*An expansion board that added sounds when installed in compatible hardware)

*PLG100-SG: https://jp.yamaha.com/products/music_production/plugins/plg100-sg/index.html

In fact, before the VOCALOID project began, I had been seconded to another company, where I was developing a system that synthesized speaking voices from text. When I returned to Yamaha, I thought we might be able to synthesize singing voices by combining Yamaha’s voice synthesis technologies with my text-to-speech expertise.

We also had a connection with Universitat Pompeu Fabra in Barcelona, and together we started DAISY, a joint research project on singing voice synthesis. That was in February 2000.

As the prototype began to take shape, we moved on to developing a singing voice database. We needed to record a large quantity of high-quality vocal material to serve as the basis for the singing voice. That is when I called on Yoshida-san.

Yoshida: At that time, I was assigned to a division developing surround effects for a digital mixer called the DM2000. I was based in an in-house studio and was doing a great deal of recording. Then one day, Kenmochi-san asked me to handle recording for the project. That was 2002, a year before VOCALOID1 was released. It was also the year of the Korea/Japan World Cup.

Masafumi Yoshida was working on digital mixer development at the time

I brought a vocalist to the Toyooka Factory and spent two or three days recording material over and over. After that, before VOCALOID was released, I left Yamaha for a while.

<Photographs of Toyooka Factory exterior and recording studio>

Kenmochi: It took four years to release VOCALOID1, with a great deal of trial and error along the way. We did not even adopt a piano-roll interface for note input until a year into the project. Entering lyrics directly on the piano roll, on or near the notes, was another idea that came out of that trial and error.

<VOCALOID1 input screen>

Around 2008, things became hectic. With VOCALOID2, as you know, VOCALOID became far better known thanks to Crypton Future Media’s release of Hatsune Miku the year before. Recording was now done in-house, and it had become more serious and more frequent, so I called Yoshida-san. I said, “We have this kind of work. Are you interested?” (Laughs.)

Yoshida: I remember being asked, “Are you free? I heard you quit your job” (at another company). I was already doing other work, but I was curious, so we met and he explained the details.

Hideki Kenmochi called on Masafumi Yoshida again in 2008

I had been away from VOCALOID for a while, but when I looked into VOCALOID and Hatsune Miku at that time, they seemed interesting. In those days, I assumed that “vocals are something people sing,” so I was shocked to discover a world where that was not the case.

Also, the sound I knew was from before VOCALOID1, so I was surprised at how much it had changed with VOCALOID2. The workplace was the Toyooka Factory, the same as before, and I became involved in the project again.

The hardest thing was getting people to understand something that had not existed before

Q: Please tell us about something that was difficult in the history of VOCALOID development.

Kenmochi: There were many technical difficulties, but the hardest thing was getting people to understand the ideas of “synthesizing voices” and “having computers sing songs”: creating something that had never existed before and getting people to see its value.

Today, people take it for granted: “Ah! This is a VOCALOID song,” or “That is Hatsune Miku!” At the time, though, it was difficult to get people to understand the idea of computers singing songs. We heard things like “What is interesting about this?”, “How is this useful?”, and “People can sing, so why go to all this effort to make computers sing?” It was difficult to win understanding not only in the world at large but also within the company.

VOCALOID was announced at the 2003 Musikmesse (a trade show held every year in Frankfurt, Germany, focused on music production equipment). When we tried to draw people to the booth with “Computer sings!”, passersby said “I can sing!” and kept walking. (Laughs.) Getting people to understand it was difficult, but I think we conveyed that we were doing something interesting. The number of people attending our demonstrations kept growing as the show went on. We also handed out demo CDs on site.

<Booth at 2003 Musikmesse>

Yoshida: I still have some of those CDs! They are probably the last two left in the company.

<Demo CD distributed at the 2003 Musikmesse>
vocaloid_yamaha · 07 Amazing Grace
vocaloid_yamaha · 01 Kimi No Uwasa (Japanese)

After I rejoined in 2008, I worked mainly as a director rather than as a recording engineer. This was hard because it required direction completely different from that of ordinary vocal recording. Since we knew the sessions would be really demanding for the performers, I kept thinking about how to create conditions that would bring out their best vocals, and even about what snacks to buy.

In normal music recording, you direct with an understanding of the intent behind the musical work. With VOCALOID, however, you have to capture the transitions between consonants and vowels. To do that, we recorded special words that have no meaning and sound like magic spells or chants when you hear them, so a completely different kind of direction was necessary.

Kenmochi: I was the one who originally wrote those magic spells. By trying various things, we arrived through trial and error at a recording method we thought would work well, and it seems that is what stuck.

vocaloid_yamaha · V1_articulation

Yoshida: Being the only one who could judge the results was also a lot of pressure. It was not just a question of whether the recording quality was good or bad; I also had to imagine and evaluate whether the takes were suitable as voice bank material.

The recording method has since changed, and magic spells are no longer used. For VOCALOID:AI, the new voice synthesis engine included in VOCALOID6 (released in 2022), voice banks are built from vocal data, so the recording process has become the same as normal vocal recording. In addition to direction from a development perspective as before, I now also need to direct like a music producer, so the job keeps getting harder. (Laughs.)

I have also become involved in product planning, so I now look at sales figures and other numbers. When I was only doing development, I could focus solely on “making good sounds.” With profits and sales promotion to consider, I had to change my perspective considerably. It is still difficult even now. (Laughs.)

VOCALOID and Niconico were born in the same era with miraculous timing

Q: Please tell us about something that was especially memorable since you became involved with VOCALOID.

Kenmochi: Around the time Crypton Future Media began selling Hatsune Miku, VOCALOID songs were being posted to the Niconico video sharing site with great enthusiasm. At first, I listened to every song. Then the number of songs grew, so I had to keep up with them at home, and soon after that I simply did not have enough time. (Laughs.) Realizing that so many people were using the technology I had been working on made me really happy.

Niconico was really buzzing at that time, but it was not originally built for people to create and post VOCALOID songs. I think a big factor was that VOCALOID and Niconico appeared at miraculously well-aligned times. Having a place where songs made with VOCALOID could be shared led to their rapid popularization.

Yoshida: That is right. Whenever a voice library I had worked on was released, it felt like a flood of songs using it would be posted that same day. That was a really novel and happy experience.

Even at Yamaha, few products draw reactions and inspire works so quickly

Yamaha makes a great many instruments, but I think very few of them get feedback so soon after their commercial release. With VOCALOID, however, review videos along the lines of “a new voice bank is singing” are posted right away. Sometimes the opinions are pretty harsh, but I am happy to get candid impressions and responses so quickly.

Q: Are there any VOCALOID songs that have been especially memorable?

Kenmochi: There are many, but if I had to pick one, it would probably be “World is Mine.” In the second chorus, the tsun (cool) attitude shown up to that point suddenly gives way to a dere (warm) character. I thought about what this meant, and my interpretation was, “Aha. This song depicts an ideal tsun-dere woman from a male perspective.” (Tsun-dere is a character type that shifts from aloof to affectionate.) I wept when I understood that. (Laughs.) I realized that VOCALOID made it possible to create images of women that are idealized from a male perspective.

Hatsune Miku singing the original song “World is Mine”

https://www.nicovideo.jp/watch/sm3504435

I think that when a human vocalist sings, we are inevitably influenced by the singer, which changes how we receive the song. Of course, you could also see it as Hatsune Miku playing the role of a tsun-dere woman, but the fact is that this woman does not exist. With that in mind, I believe VOCALOID makes possible forms of expression that differ from those of human vocalists.

Hideki Kenmochi realized that VOCALOID enables forms of expression that would be difficult for human singers

Q: What about you, Yoshida-san?

Yoshida: After VY1 was released in 2010 as Yamaha’s first voice bank, someone uploaded a cover of “Uninstall” by Chiaki Ishikawa.

VY1 - Uninstall (cover)

https://www.nicovideo.jp/watch/sm12018350

VY1 was the first product made after we stopped recording magic spells, and songs flowed more smoothly than with earlier products. Even so, I felt it was still different from a human singing voice.

But when I heard “Uninstall,” I thought, “VY1 is amazing! It sounds super real!!!” I remember being really surprised. I am always amazed at how creators use VOCALOID to express themselves in ways that constantly exceed our imagination.

Q: As VOCALOID celebrates its 20th anniversary, please tell us how you feel as developers.

Kenmochi: Honestly, it feels both long and short. I looked back on the kinds of VOCALOID works that have appeared over these 20 years. There were musical works, works that were not music at all, and works that made me chuckle. What they all have in common is that their creators made content they found interesting, and out of that chaos, some works emerged shining brilliantly… I believe that for 20 years, people with all kinds of talents have been applying those talents recklessly, in a good sense, and getting excited about it. I am happy that new things will keep being created at a brisk pace.

Yoshida: Time went by in a flash for me too. There are still many things I want to do with VOCALOID. With VOCALOID6, we incorporated VOCALOID:AI as a new singing voice synthesis method, making it possible to output even more natural singing voices, but I feel it is still far from a human voice.

At the same time, VOCALOID is made as an electronic instrument, so I want to enable freer ways of singing and a wider variety of singing voices. Just as there is room for voices that make people say, “This is a synthesized sound,” it would also be good to have voices that can pass as human. I want to create something like a department store of vocal sounds. I do not like calling things old, but while keeping the voice banks we have made so far, I think we should also keep making new ones and give creators more options. I want to keep making proposals like this in the future.

Q: Finally, please share a message with the creators who use VOCALOID to make their works.

Kenmochi: I believe creators are people who go far beyond what developers imagine and push the potential of these tools to their furthest limits. I look forward to seeing them expand the potential of vocal synthesis by using it in ways we cannot even imagine.

Yoshida: With 20 years of history, VOCALOID truly has a large number of voice banks. We created a website where people can listen to and compare every voice bank since VOCALOID1. I will be happy if you find voice banks that seem right for singing your own music and use them to do amazing things. I look forward to hearing your works.

Hideki Kenmochi and Masafumi Yoshida say they are looking forward to the continued creation of musical works

Interview and article production: SoundWorksK Marketing LLC