{"id":523,"date":"2023-06-13T10:09:41","date_gmt":"2023-06-13T01:09:41","guid":{"rendered":"https:\/\/slp.cs.tut.ac.jp\/?page_id=523"},"modified":"2025-05-07T14:19:29","modified_gmt":"2025-05-07T05:19:29","slug":"project","status":"publish","type":"page","link":"https:\/\/slp.cs.tut.ac.jp\/en\/project\/","title":{"rendered":"RESEARCH"},"content":{"rendered":"\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading ribbon\">Spoken Language Processing \/ Multimodal Interaction<\/h2>\n\n\n\n<p>We study spoken language information processing, with a focus on speech recognition, as well as interaction using other modalities.<br>Speech is the most natural modality of human communication. Our research aims to scientifically understand, analyse and process speech, and to engineer the human ability to communicate.<br>Furthermore, we focus on applying this to the construction of future spoken dialogue and multimodal interaction systems.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.youtube.com\/watch?v=VoBj6cMZCW0\">YouTube (Japanese)<\/a><\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"[Information and Intelligence Engineering] Spoken Language Processing Laboratory\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/VoBj6cMZCW0?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n<\/div>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading 
ribbon\"><strong>Large Vocabulary Continuous Speech Recognition<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"695\" height=\"840\" src=\"https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/latestEndToEnd.png\" alt=\"Latest speech recognition technology\" class=\"wp-image-386\" srcset=\"https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/latestEndToEnd.png 695w, https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/latestEndToEnd-248x300.png 248w\" sizes=\"auto, (max-width: 695px) 100vw, 695px\" \/><\/figure>\n\n\n\n<p>Large-vocabulary continuous speech recognition has many expected applications, such as the transcription of lectures and other speech.<br>In recent years, research into end-to-end speech recognition using deep learning models has progressed rapidly.<br>We work on improving the accuracy of such models from various angles, including refinements to the model itself and methods for incorporating language models.<\/p>\n<\/div>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading ribbon\"><strong>Elderly Voice Recognition<\/strong><\/h2>\n\n\n\n<p>People disadvantaged in access to information technology stand to benefit from speech recognition and spoken dialogue.<br>These should be particularly useful for the elderly, who may find it difficult to operate equipment due to unfamiliarity with information devices or reduced physical function.<br>However, research on speech recognition for the elderly has made little progress.<br>We are researching how to build a speech recognition system that the elderly can use, starting with the steady collection of elderly speech data.<\/p>\n<\/div>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained 
wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading ribbon\"><strong>Spoken Dialogue Interfaces (1)<br>&#8211; Friendly Interaction<\/strong> &#8211;<\/h2>\n\n\n\n<p>How can ordinary users become comfortable with spoken dialogue interfaces?<br>When they try one out, responses are slow to come and they cannot tell whether they are being heard, which becomes a barrier.<br>We are therefore trying to break down these barriers by creating a system that responds in real time, is attuned to the &#8216;excitement&#8217; of the dialogue, and makes talking itself enjoyable.<br>We are also researching understanding methods that respond robustly to all kinds of speech and quickly recover from confusion caused by misrecognition or misunderstanding.<\/p>\n\n\n\n<p><a href=\"https:\/\/youtu.be\/AWiMZRwcJAw\">YouTube (Japanese)<\/a><\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-4-3 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Spoken dialogue system (talking about the weather)\" width=\"500\" height=\"375\" src=\"https:\/\/www.youtube.com\/embed\/AWiMZRwcJAw?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/aei-saya.jp\/\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1006\" height=\"483\" src=\"https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/photoRealCGAgentDS.png\" alt=\"Dialogue system with a photorealistic CG agent\" class=\"wp-image-387\" 
srcset=\"https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/photoRealCGAgentDS.png 1006w, https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/photoRealCGAgentDS-300x144.png 300w, https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/photoRealCGAgentDS-768x369.png 768w\" sizes=\"auto, (max-width: 1006px) 100vw, 1006px\" \/><\/a><figcaption class=\"wp-element-caption\">Artificial Emotional Intelligence &#8220;Saya&#8221;<\/figcaption><\/figure>\n\n\n\n<p class=\"has-text-align-right\"><a href=\"https:\/\/aei-saya.jp\/\" target=\"_blank\" rel=\"noreferrer noopener\">Artificial Emotional Intelligence &#8220;Saya&#8221;<\/a><\/p>\n<\/div>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading ribbon\"><strong>Spoken Dialogue Interfaces (2)<br>&#8211; Application To The Field Of Medicine<\/strong> &#8211;<\/h2>\n\n\n\n<p class=\"wp-embed-aspect-16-9 wp-has-aspect-ratio\">In the medical field, there is a need to immediately reflect what doctors hear in medical records and to collect dialogue with patients as a source of information.<br>In joint research with hospitals, we are studying how speech recognition and dialogue technology can improve the efficiency of medical practice in these and other situations.<\/p>\n\n\n\n<p class=\"wp-embed-aspect-16-9 wp-has-aspect-ratio\"><a rel=\"noreferrer noopener\" href=\"https:\/\/youtu.be\/clsAdBymcaE\" target=\"_blank\">YouTube (Japanese)<\/a><\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Smart Hospital (Toyohashi Heart Center \u00d7 Toyohashi University of Technology)\" width=\"500\" 
height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/clsAdBymcaE?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"947\" height=\"452\" src=\"https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/electronicChart.png\" alt=\"Electronic medical record entry support by voice input\" class=\"wp-image-388\" srcset=\"https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/electronicChart.png 947w, https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/electronicChart-300x143.png 300w, https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/electronicChart-768x367.png 768w\" sizes=\"auto, (max-width: 947px) 100vw, 947px\" \/><\/figure>\n\n\n\n<p>Smart Hospital<\/p>\n<\/div>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading ribbon\"><strong>Spoken Dialogue Interfaces (3)<br>&#8211; Interfaces That Work Naturally &#8211;<\/strong><\/h2>\n\n\n\n<p class=\"wp-embed-aspect-16-9 wp-has-aspect-ratio\">We aim to build an interface whose presence you are usually unaware of, but which senses when you are talking to it and responds immediately when called.<\/p>\n\n\n\n<p class=\"wp-embed-aspect-16-9 wp-has-aspect-ratio\"><a rel=\"noreferrer noopener\" href=\"https:\/\/youtu.be\/dBeKEOTHr-g\" target=\"_blank\">YouTube (Japanese)<\/a><\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div 
class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Goemon detection [Toyohashi University of Technology \/ Kitaoka Lab, Tokushima University]\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/dBeKEOTHr-g?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n<\/div>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading ribbon\"><strong>Multimodal Interface<\/strong><\/h2>\n\n\n\n<p class=\"wp-embed-aspect-16-9 wp-has-aspect-ratio\">We are developing a multimodal interface, centred on spoken dialogue, as a means of accessing a variety of information on the network at any time.<br>The key is how to combine speech with pen input, touch panels and pointing gestures.<br>A multimodal interface makes many things possible.<br>For example, when answering a geometry problem in mathematics, people naturally use both their voice and their fingers.<br>If the answer is given by voice and pointing, the system turns it into a written proof.<\/p>\n\n\n\n<p class=\"wp-embed-aspect-16-9 wp-has-aspect-ratio\"><a rel=\"noreferrer noopener\" href=\"https:\/\/youtu.be\/NL37dvBUsmY\" target=\"_blank\">YouTube (Japanese)<\/a><\/p>\n\n\n\n<p class=\"wp-embed-aspect-16-9 wp-has-aspect-ratio\">Ultimately, we are also applying this to operating autonomous vehicles.<\/p>\n\n\n\n<p class=\"wp-embed-aspect-16-9 wp-has-aspect-ratio\"><a rel=\"noreferrer noopener\" href=\"https:\/\/youtu.be\/EkVdZUEEFwM\" target=\"_blank\">YouTube (Japanese)<\/a><\/p>\n\n\n\n<p class=\"wp-embed-aspect-16-9 wp-has-aspect-ratio\"><a href=\"https:\/\/youtu.be\/Mesx4qgONqs\" target=\"_blank\" rel=\"noreferrer noopener\">YouTube (English)<\/a><\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Multimodal communication enabled autonomous vehicle\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/Mesx4qgONqs?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p class=\"wp-embed-aspect-16-9 wp-has-aspect-ratio\"><a rel=\"noreferrer noopener\" href=\"https:\/\/www.google.com\/url?q=https%3A%2F%2Fjidounten-lab.com%2Fu-nagoya-tokushima-aisin-autonomous&amp;sa=D&amp;sntz=1&amp;usg=AOvVaw3uDb2xvmyfwhGRpQ2pgt0M\" target=\"_blank\">Web Article<\/a><\/p>\n<\/div>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading ribbon\"><strong>Natural And Expressive Speech Synthesis<\/strong><\/h2>\n\n\n\n<p>Even if a variety of inputs are possible, natural dialogue cannot be established if the system responds unnaturally.<br>Speech synthesis should ideally be of such high quality that it is indistinguishable from human speech, while also expressing individuality and emotion.<br>We therefore aim for natural and expressive speech synthesis by controlling prosody (the intensity and pitch of the voice, as in accents) and by learning from emotional speech.<\/p>\n\n\n\n<p><a rel=\"noreferrer noopener\" href=\"http:\/\/www.google.com\/url?q=http%3A%2F%2Fwww.slp.cs.tut.ac.jp%2Ftts_demo%2F&amp;sa=D&amp;sntz=1&amp;usg=AOvVaw0mt6laqGnfBYbcITHG0L8C\" target=\"_blank\">Demo Site<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a 
href=\"http:\/\/www.google.com\/url?q=http%3A%2F%2Fwww.slp.cs.tut.ac.jp%2Ftts_demo%2F&amp;sa=D&amp;sntz=1&amp;usg=AOvVaw0mt6laqGnfBYbcITHG0L8C\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"651\" src=\"https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/kenkyuDemoSite-1024x651.png\" alt=\"\" class=\"wp-image-451\" srcset=\"https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/kenkyuDemoSite-1024x651.png 1024w, https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/kenkyuDemoSite-300x191.png 300w, https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/kenkyuDemoSite-768x488.png 768w, https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/kenkyuDemoSite-1536x976.png 1536w, https:\/\/slp.cs.tut.ac.jp\/wp-content\/uploads\/2023\/05\/kenkyuDemoSite.png 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n<\/div>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading ribbon\">Teaching Materials prepared by Lab Students<\/h2>\n\n\n\n<p>This page introduces the teaching material \u201cActual Dialogue with ChatGPT\u201d prepared by our lab students Kumagai, Odom, and Oda.<br>It is a spoken dialogue system: computer software that can converse with humans by voice using speech recognition, ChatGPT and speech synthesis.<br><br><a href=\"https:\/\/www.mirai-kougaku.jp\/laboratory\/pages\/250214_02.php\">Actual Dialogue with ChatGPT<\/a><\/p>\n<\/div>\n\n\n\n<div style=\"height:100px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>For more information, <a rel=\"noreferrer noopener\" href=\"http:\/\/www.google.com\/url?q=http%3A%2F%2Fsites.google.com%2Fsite%2Fnorihidekitaokashome%2Fresearch-topics%2Fresearch-topics-in-english&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNEg7dTE48HfgoyqipViaONHZ01Fhw\" target=\"_blank\">click 
here<\/a>.<\/p>\n\n\n\n<div style=\"height:100px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Spoken Language Processing \/ Multimodal Interaction Research in spoken language information processing, with a focus on speech recognition, and interaction using other modalities.Speech is the most natural modality (means) of human communication. Our research aims to scientifically know, analyse and process speech, as well as to engineer the human ability to communicate.Furthermore, we focus on [&hellip;]<\/p>\n","protected":false},"author":10,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"wp-custom-template-en%e5%9b%ba%e5%ae%9a%e3%83%9a%e3%83%bc%e3%82%b8","meta":{"_locale":"en_US","_original_post":"https:\/\/slp.cs.tut.ac.jp\/?page_id=54","footnotes":""},"class_list":["post-523","page","type-page","status-publish","hentry","en-US"],"_links":{"self":[{"href":"https:\/\/slp.cs.tut.ac.jp\/wp-json\/wp\/v2\/pages\/523","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/slp.cs.tut.ac.jp\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/slp.cs.tut.ac.jp\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/slp.cs.tut.ac.jp\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/slp.cs.tut.ac.jp\/wp-json\/wp\/v2\/comments?post=523"}],"version-history":[{"count":13,"href":"https:\/\/slp.cs.tut.ac.jp\/wp-json\/wp\/v2\/pages\/523\/revisions"}],"predecessor-version":[{"id":1885,"href":"https:\/\/slp.cs.tut.ac.jp\/wp-json\/wp\/v2\/pages\/523\/revisions\/1885"}],"wp:attachment":[{"href":"https:\/\/slp.cs.tut.ac.jp\/wp-json\/wp\/v2\/media?parent=523"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}