{"id":229,"date":"2021-11-24T19:02:01","date_gmt":"2021-11-24T10:02:01","guid":{"rendered":"https:\/\/shitsukan.jp\/deep\/en\/?page_id=229"},"modified":"2021-12-10T13:52:56","modified_gmt":"2021-12-10T04:52:56","slug":"d01-4-multimodal-deep-learning-model-for-analysis-and-synthesis-of-shitsukan-images-by-disentangling-styles-and-shapes","status":"publish","type":"page","link":"https:\/\/shitsukan.jp\/deep\/en\/?page_id=229","title":{"rendered":"D01-4 Multimodal Deep Learning Model for Analysis and Synthesis of Shitsukan Images by Disentangling Styles and Shapes"},"content":{"rendered":"<div id=\"block-profile\">\n  <figure id=\"profile-fig\">\n    <img class=\"profile-img\" src=\"https:\/\/shitsukan.jp\/deep\/en\/wordpress\/wp-content\/uploads\/2021\/11\/yanai_large_jpeg-Keiji-Yanai.jpg\" \/>\n  <\/figure>\n  <div id=\"profile-text\">\n    <span id=\"profile-name\">Keiji Yanai<\/span>\n    <span id=\"profile-affiliation\">The University of Electro-Communications<\/span>\n  <\/div>\n<\/div>\n\n\n\n<p>In our research, we conduct a study on &#8220;a multimodal deep learning model for Shitsukan recognition and synthesis in images by disentangling texture style and shape features.&#8221; Specifically, (1) from a large amount of paired image and language data, we automatically learn the correnpondence between the texture part of the image and the texture representation of the language, and construct a shared texture embedding space of image texture features and language texture features to realize cross-modal retrieval (recognition) between image and language. (2) Furthermore, by fusing texture embedding vectors with image shape features, we will synthesize images with novel textures. The objective of our research is to propose a deep learning model which can achieve these tasks in a unified manner. 
The proposed model is expected to enable (a) &#8220;deep&#8221; analysis of textures in images and linguistic expressions using a large amount of data, and (b) subtle manipulation of image textures through language.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1200\" height=\"511\" src=\"https:\/\/shitsukan.jp\/deep\/en\/wordpress\/wp-content\/uploads\/2021\/11\/\u6df1\u5965\u8cea\u611f2-Keiji-Yanai-1200x511.png\" alt=\"\" class=\"wp-image-231\" srcset=\"https:\/\/shitsukan.jp\/deep\/en\/wordpress\/wp-content\/uploads\/2021\/11\/\u6df1\u5965\u8cea\u611f2-Keiji-Yanai-1200x511.png 1200w, https:\/\/shitsukan.jp\/deep\/en\/wordpress\/wp-content\/uploads\/2021\/11\/\u6df1\u5965\u8cea\u611f2-Keiji-Yanai-600x255.png 600w, https:\/\/shitsukan.jp\/deep\/en\/wordpress\/wp-content\/uploads\/2021\/11\/\u6df1\u5965\u8cea\u611f2-Keiji-Yanai-768x327.png 768w, https:\/\/shitsukan.jp\/deep\/en\/wordpress\/wp-content\/uploads\/2021\/11\/\u6df1\u5965\u8cea\u611f2-Keiji-Yanai.png 1435w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>In our research, we conduct a study on &#8220;a multimodal deep learning model for Shitsukan recognition and synthesis in images by disentangling texture style and shape features.&#8221; Specifically, (1) from a large amount of paired image and language data, we automatically learn the correspondence between the texture part of the image and the texture representation 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":278,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/shitsukan.jp\/deep\/en\/index.php?rest_route=\/wp\/v2\/pages\/229"}],"collection":[{"href":"https:\/\/shitsukan.jp\/deep\/en\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/shitsukan.jp\/deep\/en\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/shitsukan.jp\/deep\/en\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/shitsukan.jp\/deep\/en\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=229"}],"version-history":[{"count":1,"href":"https:\/\/shitsukan.jp\/deep\/en\/index.php?rest_route=\/wp\/v2\/pages\/229\/revisions"}],"predecessor-version":[{"id":232,"href":"https:\/\/shitsukan.jp\/deep\/en\/index.php?rest_route=\/wp\/v2\/pages\/229\/revisions\/232"}],"up":[{"embeddable":true,"href":"https:\/\/shitsukan.jp\/deep\/en\/index.php?rest_route=\/wp\/v2\/pages\/278"}],"wp:attachment":[{"href":"https:\/\/shitsukan.jp\/deep\/en\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=229"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}