沢田慶, 藤田達也, 三井健太郎, 法野行哉, 若月駿尭, 石川翔, オーカールターレック, 陳心琪, “テキスト・音声・動画生成を活用した低コストでスケーラブルなリアルタイム音声対話”, 第102回 言語・音声理解と対話処理研究会, pp. 157, 2024年11月.
Research & Development
Aiming for the democratization of AI, we pursue research and development every day.
Under its vision of "a co-creation world of humans and AI," rinna Co., Ltd. aims to realize a society in which everyone can express their own creativity, through the richer communication that arises when AI mediates between people. As AI sees wider adoption in society, the idea of "democratizing AI," a world where anyone can easily use AI, has taken hold, and research institutions around the world contribute to the advancement of AI technology by publishing their results. rinna Co., Ltd. shares this vision of AI democratization and actively publishes its research results. This page summarizes the research results we have released to date.
In addition to publishing academic papers, rinna Co., Ltd. works to advance the AI research and development community by releasing large-scale pre-trained models and by sharing information through its technical blog and social media.
・Hugging Face - https://huggingface.co/rinna
・GitHub - https://github.com/rinnakk
・Zenn - https://zenn.dev/p/rinna
・Twitter (X) - https://twitter.com/rinna_research
We will continue to pursue advanced research and development, centered on text, speech, and image technologies, and to bring the results into real-world use.
For inquiries regarding research, please contact us through this form.
Kentaro Mitsui, Koh Mitsuda, Toshiaki Wakatsuki, Yukiya Hono, Kei Sawada, “PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems”, Findings of the Association for Computational Linguistics: EMNLP 2024, pp. 2692-2700, November 2024. [Paper] [Demo] [arXiv]
法野行哉, “深層学習に基づく歌声合成・波形生成の進展と展望”, 音声研究会・音声言語情報処理研究会, 2024年10月.
Yukiya Hono, Koh Mitsuda, Tianyu Zhao, Kentaro Mitsui, Toshiaki Wakatsuki, Kei Sawada, “Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition”, Findings of the Association for Computational Linguistics: ACL 2024, pp. 13289-13305, August 2024. [Paper] [GitHub] [Hugging Face] [arXiv]
Kei Sawada, Tianyu Zhao, Makoto Shing, Kentaro Mitsui, Akio Kaga, Yukiya Hono, Toshiaki Wakatsuki, Koh Mitsuda, “Release of Pre-Trained Models for the Japanese Language”, The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 13898-13905, May 2024. [Paper] [Hugging Face] [arXiv]
Congda Ma, Tianyu Zhao, Makoto Shing, Kei Sawada, Manabu Okumura, “Focused Prefix Tuning for Controllable Text Generation”, Journal of Natural Language Processing, Volume 31, Issue 1, pp. 250-265, March 2024. [Paper]
三井健太郎, 法野行哉, 沢田慶, “AIエージェント間の自然な会話に向けたテキストからの音声対話生成”, 日本音響学会第151回 (2024年春季) 研究発表会, pp. 1327-1330, 2024年3月. [Slide]
法野行哉, 光田航, 趙天雨, 三井健太郎, 若月駿尭, 沢田慶, “自己教師あり学習に基づく音声・言語モデルを統合したEnd-to-End音声認識”, 日本音響学会第151回 (2024年春季) 研究発表会, pp. 1323-1326, 2024年3月.
沢田慶, 法野行哉, 三井健太郎, “自己教師あり学習を用いた日本語事前学習モデルと音声認識・合成への応用”, 日本音響学会第151回 (2024年春季) 研究発表会, pp. 1319-1320, 2024年3月.
Yahui Fu, Haiyue Song, Tianyu Zhao, Tatsuya Kawahara, “Enhancing Personality Recognition in Dialogue by Data Augmentation and Heterogeneous Conversational Graph Networks”, The 14th International Workshop on Spoken Dialogue Systems Technology (IWSDS 2024), March 2024. [arXiv]
Yuya Chiba, Koh Mitsuda, Akinobu Lee, Ryuichiro Higashinaka, “The Remdis Toolkit: Building Advanced Real-time Multimodal Dialogue Systems with Incremental Processing and Large Language Models”, The 14th International Workshop on Spoken Dialogue Systems Technology (IWSDS 2024), March 2024. [GitHub]
沢田慶, “世界を繋ぐAI技術:言語やモダリティを跨いだ大規模言語モデル”, 音声研究会・音声言語情報処理研究会, 2024年1月.
千葉祐弥, 光田航, 李晃伸, 東中竜一郎, “Remdis: リアルタイムマルチモーダル対話システム構築ツールキット”, 第99回言語・音声理解と対話処理研究会, pp. 25-30, 2023年12月. [GitHub]
三井健太郎, 松浦孝平, “国際会議Interspeech2023参加報告”, 第258回自然言語処理・第149回音声言語情報処理合同研究発表会, 2023年12月. [Slide]
Kentaro Mitsui, Yukiya Hono, Kei Sawada, “Towards Human-Like Spoken Dialogue Generation Between AI Agents from Written Dialogue”, arXiv preprint arXiv:2310.01088, October 2023. [arXiv] [Demo]
Kentaro Mitsui, Yukiya Hono, Kei Sawada, “UniFLG: Unified Facial Landmark Generator from Text or Speech”, The 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023), pp. 5501-5505, August 2023. [Paper] [Demo] [arXiv]
Congda Ma, Tianyu Zhao, Makoto Shing, Kei Sawada, Manabu Okumura, “Focused Prefix Tuning for Controllable Text Generation”, The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), pp. 1116-1127, July 2023. [Paper] [arXiv]
三井健太郎, 法野行哉, 沢田慶, “End-to-End音声合成を利用したテキストまたは音声からの統合的な顔ランドマーク生成”, 日本音響学会第149回 (2023年春季) 研究発表会, pp. 763-766, 2023年3月.
AprilPyone MaungMaung, Makoto Shing, Kentaro Mitsui, Kei Sawada, Fumio Okura, “Text-Guided Scene Sketch-to-Photo Synthesis”, arXiv preprint arXiv:2302.06883, February 2023. [arXiv]
沢田慶, シーン誠, 趙天雨, “日本語におけるAIの民主化を目指した事前学習モデルの公開”, 第96回 言語・音声理解と対話処理研究会, pp. 165-166, 2022年12月.
Divesh Lala, Koji Inoue, Tatsuya Kawahara, Kei Sawada, “Backchannel Generation Model for a Third Party Listener Agent”, The 10th International Conference on Human-Agent Interaction (HAI 2022), pp. 114-122, December 2022. [Paper]
Kentaro Mitsui, Tianyu Zhao, Kei Sawada, Yukiya Hono, Yoshihiko Nankaku, Keiichi Tokuda, “End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue”, The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022), pp. 2328-2332, September 2022. [Paper] [Demo] [arXiv]
Kentaro Mitsui, Kei Sawada, “MSR-NV: Neural Vocoder Using Multiple Sampling Rates”, The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022), pp. 798-802, September 2022. [Paper] [Demo] [arXiv]
三井健太郎, 趙天雨, 沢田慶, 法野行哉, 南角吉彦, 徳田恵一, “自発的対話を用いた潜在スタイル表現の抽出・予測に基づく音声合成”, 日本音響学会第148回 (2022年秋季) 研究発表会, pp. 1593-1596, 2022年9月.
三井健太郎, 沢田慶, “テキストを入力とする音声・顔ランドマーク系列の同期生成”, 日本音響学会第148回 (2022年秋季) 研究発表会, pp. 1191-1194, 2022年9月.
沢田慶, シーン誠, 三井健太郎, 趙天雨, “ディープラーニングの活用:AI × キャラクターによる新しいゲームの世界”, コンピュータエンターテインメントデベロッパーズカンファレンス2022 (CEDEC2022), 2022年8月. [Slide]
シーン誠, 趙天雨, 沢田慶, “日本語における言語画像事前学習モデルの構築と公開”, 第25回 画像の認識・理解シンポジウム (MIRU2022), 2022年7月. [GitHub]
三井健太郎, 沢田慶, “MSR-NV: 複数サンプリングレートを用いたニューラルボコーダの検討”, 日本音響学会2022年春季研究発表会, pp. 931-934, 2022年3月.
趙天雨, 沢田慶, “日本語自然言語処理における事前学習モデルの公開”, 第93回 言語・音声理解と対話処理研究会, pp. 169-170, 2021年11月. [Paper] [GitHub]
沢田慶, “身近になった対話システム:4.一般ユーザとの雑談会話のためのAIチャットボット”, 情報処理, pp. e19-e23, 2021年9月. [Paper]
Tianyu Zhao, Tatsuya Kawahara, “Multi-Referenced Training for Dialogue Response Generation”, The 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2021), pp. 190-201, July 2021. [Paper] [GitHub] [arXiv]
Ze Yang, Wei Wu, Huang Hu, Can Xu, Wei Wang, Zhoujun Li, “Open Domain Dialogue Generation with Latent Images”, The 35th AAAI Conference on Artificial Intelligence (AAAI-21), pp. 14239-14247, February 2021. [Paper] [arXiv]
Linxiao Li, Can Xu, Wei Wu, Yufan Zhao, Xueliang Zhao, Chongyang Tao, “Zero-Resource Knowledge-Grounded Dialogue Generation”, The Thirty-Fourth Annual Conference on Neural Information Processing Systems (NeurIPS 2020), December 2020. [Paper] [GitHub] [arXiv]
Yufan Zhao, Can Xu, Wei Wu, Lei Yu, “Learning a Simple and Effective Model for Multi-Turn Response Generation with Auxiliary Tasks”, The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), pp. 3472-3483, November 2020. [Paper] [arXiv]
Xueliang Zhao, Wei Wu, Can Xu, Chongyang Tao, Dongyan Zhao, Rui Yan, “Knowledge-Grounded Dialogue Generation with Pre-Trained Language Models”, The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), pp. 3377-3390, November 2020. [Paper] [arXiv]
Ze Yang, Wei Wu, Can Xu, Xinnian Liang, Jiaqi Bai, Liran Wang, Wei Wang, Zhoujun Li, “StyleDGPT: Stylized Response Generation with Pre-Trained Language Models”, Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 1548-1559, November 2020. [Paper] [arXiv]
Yukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda, “Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis”, The 21st Annual Conference of the International Speech Communication Association (INTERSPEECH 2020), pp. 3441-3445, October 2020. [Paper] [Demo] [arXiv]
法野行哉, 坪井一菜, 沢田慶, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一, “階層化多重粒度生成モデルを用いた表現豊かな音声合成”, 日本音響学会2020年秋季研究発表会, pp. 791-794, 2020年9月.
Chongyang Tao, Wei Wu, Can Xu, Yansong Feng, Dongyan Zhao, Rui Yan, “Improving Matching Models with Hierarchical Contextualized Representations for Multi-Turn Response Selection”, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020), pp. 1865-1868, July 2020. [Paper] [arXiv]
Xueliang Zhao, Wei Wu, Chongyang Tao, Can Xu, Dongyan Zhao, Rui Yan, “Low-Resource Knowledge-Grounded Dialogue Generation”, The Eighth International Conference on Learning Representations (ICLR 2020), May 2020. [Paper] [arXiv]
三井健太郎, 法野行哉, 坪井一菜, 沢田慶, “カスケード構造を用いた音声パラメータ予測に基づく統計的パラメトリック音声合成”, 日本音響学会2020年春季研究発表会, pp. 1107-1108, 2020年3月.
Ze Yang, Can Xu, Wei Wu, Zhoujun Li, “Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation”, 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), pp. 5077-5089, November 2019. [Paper] [arXiv]
Ze Yang, Wei Wu, Jian Yang, Can Xu, Zhoujun Li, “Low-Resource Response Generation with Template Prior”, 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), pp. 1886-1897, November 2019. [Paper] [GitHub] [arXiv]
Jia Li, Chongyang Tao, Wei Wu, Yansong Feng, Dongyan Zhao, Rui Yan, “Sampling Matters! An Empirical Study of Negative Sampling Strategies for Learning of Matching Models in Retrieval-Based Dialogue Systems”, 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), pp. 1291-1296, November 2019. [Paper]
坪井一菜, 沢田慶, AIりんな, “AI「りんな」のボイストレーニング”, コンピュータエンターテインメントデベロッパーズカンファレンス2019 (CEDEC2019), 2019年9月. [Slide]
Xueliang Zhao, Chongyang Tao, Wei Wu, Can Xu, Dongyan Zhao, Rui Yan, “A Document-Grounded Matching Network for Response Selection in Retrieval-Based Chatbots”, The Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019), pp. 5443-5449, August 2019. [Paper] [arXiv]
Can Xu, Wei Wu, Chongyang Tao, Huang Hu, Matt Schuerman, Ying Wang, “Neural Response Generation with Meta-Words”, The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), pp. 5416-5426, July 2019. [Paper] [arXiv]
Jiazhan Feng, Chongyang Tao, Wei Wu, Yansong Feng, Dongyan Zhao, Rui Yan, “Learning a Matching Model with Co-Teaching for Multi-Turn Response Selection in Retrieval-Based Dialogue Systems”, The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), pp. 3805-3815, July 2019. [Paper] [arXiv]
Chongyang Tao, Wei Wu, Can Xu, Wenpeng Hu, Dongyan Zhao, Rui Yan, “One Time of Interaction May Not Be Enough: Go Deep with an Interaction-over-Interaction Network for Response Selection in Dialogues”, The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), pp. 1-11, July 2019. [Paper]
沢田慶, 坪井一菜, Xianchao Wu, Zhan Chen, 法野行哉, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一, “AI歌手りんな:ユーザ歌唱や楽譜を入力とする歌声合成システム”, 日本音響学会2019年春季研究発表会, pp. 1041-1044, 2019年3月.
Yu Wu, Wei Wu, Chen Xing, Can Xu, Zhoujun Li, Ming Zhou, “A Sequential Matching Framework for Multi-Turn Response Selection in Retrieval-Based Chatbots”, Computational Linguistics, Volume 45, Issue 1, pp. 163-197, March 2019. [Paper] [arXiv]
高木信二, 安藤厚志, 越智景子, 沢田慶, 塩田さやか, 鈴木雅之, 玉森聡, 俵直弘, 福田隆, 増村亮, “国際会議Interspeech2018報告”, 第126回音声言語情報処理研究発表会 (SIG-SLP), 2019年2月. [Paper]
Chongyang Tao, Wei Wu, Can Xu, Wenpeng Hu, Dongyan Zhao, Rui Yan, “Multi-Representation Fusion Network for Multi-Turn Response Selection in Retrieval-Based Chatbots”, The Twelfth ACM International Conference on Web Search and Data Mining (WSDM ’19), pp. 267-275, January 2019. [Paper] [GitHub]
Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou, “Response Selection with Topic Clues for Retrieval-Based Chatbots”, Neurocomputing, Volume 316, pp. 251-261, November 2018. [Paper] [arXiv]
Huang Hu, Xianchao Wu, Bingfeng Luo, Chongyang Tao, Can Xu, Wei Wu, Zhan Chen, “Playing 20 Question Game with Policy-Based Reinforcement Learning”, 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), pp. 3233-3242, October 2018. [Paper] [arXiv]
Kei Sawada, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda, “The NITech Text-to-Speech System for the Blizzard Challenge 2018”, Blizzard Challenge 2018 Workshop, September 2018. [Paper] [Demo]
Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou, “Learning Matching Models with Weak Supervision for Response Selection in Retrieval-Based Chatbots”, The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), pp. 420-425, July 2018. [Paper] [arXiv]
Chongyang Tao, Shen Gao, Mingyue Shang, Wei Wu, Dongyan Zhao, Rui Yan, “Get The Point of My Utterance! Learning Towards Effective Responses with Multi-Head Attention Mechanism”, The 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI-18), pp. 4418-4424, July 2018. [Paper]
Can Xu, Wei Wu, Yu Wu, “Towards Explainable and Controllable Open Domain Dialogue Generation with Dialogue Acts”, arXiv preprint arXiv:1807.07255, July 2018. [arXiv]
Xianchao Wu, Ander Martinez, Momo Klyen, “Dialog Generation Using Multi-Turn Reasoning Neural Networks”, The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018), pp. 2049-2059, June 2018. [Paper]
Xianchao Wu, Huang Hu, Momo Klyen, Kyohei Tomita, Zhan Chen, “Q20: Rinna Riddles Your Mind by Asking 20 Questions”, 言語処理学会第24回年次大会 (NLP2018), pp. 1312-1315, 2018年3月. [Paper]
Xianchao Wu, Huang Hu, “Evaluating Rinna’s Mind-Reading Feature by Self-Playing”, 言語処理学会第24回年次大会 (NLP2018), pp. 1235-1238, 2018年3月. [Paper]
Chen Xing, Wei Wu, Yu Wu, Ming Zhou, Yalou Huang, Wei-Ying Ma, “Hierarchical Recurrent Attention Network for Response Generation”, The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp. 5610-5617, February 2018. [arXiv]
Yu Wu, Wei Wu, Dejian Yang, Can Xu, Zhoujun Li, Ming Zhou, “Neural Response Generation with Dynamic Vocabularies”, The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp. 5594-5601, February 2018. [arXiv]
Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou, “Knowledge Enhanced Hybrid Neural Network for Text Matching”, The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp. 5586-5593, February 2018. [arXiv]
Xianchao Wu, Hang Tong, Momo Klyen, “Fine-Grained Sentiment Analysis with 32 Dimensions”, The 21st International Conference on Asian Language Processing (IALP 2017), December 2017. [Paper]
呉先超, 藤原敬三, 飯田勝也, 冨田恭平, 中島りか, “りんなのキャラボックス: 雑談から商品推薦まで”, 第81回言語・音声理解と対話処理研究会, pp. 62-65, 2017年10月. [Paper]
Yu Wu, Wei Wu, Chen Xing, Ming Zhou, Zhoujun Li, “Sequential Matching Network: A New Architecture for Multi-Turn Response Selection in Retrieval-Based Chatbots”, The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), pp. 496-505, July 2017. [Paper] [arXiv]
Xianchao Wu, Momo Klyen, Kazushige Ito, Zhan Chen, “Haiku Generation Using Deep Neural Networks”, 言語処理学会第23回年次大会 (NLP2017), pp. 1133-1136, 2017年3月. [Paper]
Xianchao Wu, Yuichiro Kikura, Momo Klyen, Zhan Chen, “Sentiment Analysis with Eight Dimensions for Emotional Chatbots”, 言語処理学会第23回年次大会 (NLP2017), pp. 791-794, 2017年3月. [Paper]
呉先超, 伊藤和重, 飯田勝也, 坪井一菜, クライアン桃, “りんな:女子高生人工知能”, 言語処理学会第22回年次大会 (NLP2016), pp. 306-309, 2016年3月. [Paper]