July 15, 25
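Slide overview
Lecture 14 (July 17): Fundamentals and Practice of Deep Learning Frameworks 1
These lectures use the deep learning framework PyTorch, with simple image recognition and natural language processing problems as examples,
to explain the basic knowledge needed for large-scale distributed training.
Lecture 14 clarifies the differences between scientific computing and deep learning
and introduces useful features through worked examples based on sample code.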
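R-CCS Computational Science Research Promotion Office
Special Lectures on Computational Science and Technology A. Lecture 14: Fundamentals and Practice of Deep Learning Frameworks 1. Lecture 15: Fundamentals and Practice of Deep Learning Frameworks 2. Institute of Science Tokyo, Institute of Integrated Research, Supercomputing Research Center. Rio Yokota [email protected]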
Evolution of major deep neural network models
LeNet-5: convolution
LSTM
AlexNet: ReLU, Dropout, GPU
ResNet: skip connection, batch norm
Transformer: attention mechanism
EfficientNet: neural architecture search
Vision Transformer: image patches
Mamba: state space model
Jamba: Mamba + Transformer + MoE
[Figure: timeline with axis ticks 1995, 2012, 2015, 2017, 2019, 2021, 2023, 2025]
https://towardsdatascience.com/from-lenet-to-efficientnet-the-evolution-of-cnns-3a57eb34672f
Training deep neural networks

Cross entropy loss over batch X:
$$L = -\log p$$
[Figure: classification example with output probabilities 0.9 for "dog" and 0.1 for "fried chicken"]

Stochastic gradient descent (SGD):
$$W_{t+1} = W_t - \eta \frac{\partial L}{\partial W_t}$$

Backpropagation

Forward propagation:
$$z_0 = W_0 x_0, \quad x_1 = f(z_0), \quad z_1 = W_1 x_1, \quad p = g(z_1)$$

Backward propagation:
$$\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial p} \frac{\partial p}{\partial z_1} \frac{\partial z_1}{\partial W_1}$$
$$\frac{\partial L}{\partial W_0} = \frac{\partial L}{\partial p} \frac{\partial p}{\partial z_1} \frac{\partial z_1}{\partial x_1} \frac{\partial x_1}{\partial z_0} \frac{\partial z_0}{\partial W_0}$$
[Figure: the data X enters at $\partial z_1 / \partial W_1$ and $\partial z_0 / \partial W_0$; the factors are annotated with sizes 2 and 5 for the intermediate Jacobians, 10 for $\partial z_1 / \partial W_1$, and 15 for $\partial z_0 / \partial W_0$]

The gradient matrix of a layer is the outer product of the layer's input and the derivative with respect to its output.
Derivatives of the four arithmetic operations and of elementary functions are defined inside the framework.
Chaining them lets the gradient be computed with matrix products.
Multiplying from the back, starting at $\partial L / \partial p$, turns every product into a matrix-vector product.

Written out componentwise for the first layer (here the subscripts index components):
$$z_0 = W_{00} x_0 + W_{01} x_1 + W_{02} x_2$$
$$z_1 = W_{10} x_0 + W_{11} x_1 + W_{12} x_2$$
$$z_2 = W_{20} x_0 + W_{21} x_1 + W_{22} x_2$$
$$z_3 = W_{30} x_0 + W_{31} x_1 + W_{32} x_2$$
$$z_4 = W_{40} x_0 + W_{41} x_1 + W_{42} x_2$$
This is carried out for each image, and the results are summed at the end.
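The forward and backward passes of the two-layer network can be sketched in plain Python, without PyTorch, to make the chain rule concrete. This is only an illustration under stated assumptions: the slide does not fix the functions f and g, so f = tanh and g = softmax are assumed here, and the helper names, the initial weights, the single training sample, and the learning rate 0.1 are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Matrix-vector product z = W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def outer(u, v):
    """Outer product u v^T, the shape of a layer's weight gradient."""
    return [[ui * vj for vj in v] for ui in u]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss_and_grads(W0, W1, x0, label):
    # Forward: z0 = W0 x0, x1 = f(z0), z1 = W1 x1, p = g(z1)
    z0 = matvec(W0, x0)
    x1 = [math.tanh(v) for v in z0]          # assumed f = tanh
    p = softmax(matvec(W1, x1))              # assumed g = softmax
    loss = -math.log(p[label])               # L = -log p for the true class
    # Backward, multiplying from the back so each step is a matrix-vector product.
    d1 = [pi - (1.0 if i == label else 0.0) for i, pi in enumerate(p)]  # dL/dz1
    gW1 = outer(d1, x1)                      # dL/dW1 = (dL/dz1) x1^T
    dx1 = [sum(W1[i][j] * d1[i] for i in range(len(d1)))
           for j in range(len(x1))]          # dL/dx1 = W1^T (dL/dz1)
    d0 = [g * (1.0 - a * a) for g, a in zip(dx1, x1)]  # dL/dz0, tanh' = 1 - tanh^2
    gW0 = outer(d0, x0)                      # dL/dW0 = (dL/dz0) x0^T
    return loss, gW0, gW1

def sgd_step(W, g, eta):
    """W_{t+1} = W_t - eta * dL/dW_t."""
    return [[w - eta * gw for w, gw in zip(rw, rg)] for rw, rg in zip(W, g)]

random.seed(0)
W0 = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(5)]  # 5 x 3
W1 = [[random.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(2)]  # 2 x 5
x0, label = [1.0, -2.0, 0.5], 0

losses = []
for _ in range(20):
    loss, gW0, gW1 = loss_and_grads(W0, W1, x0, label)
    losses.append(loss)
    W0, W1 = sgd_step(W0, gW0, 0.1), sgd_step(W1, gW1, 0.1)
```

With a mini-batch, `loss_and_grads` would be evaluated for each image and the resulting gradients summed before the update. In PyTorch the backward part is generated automatically by autograd.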
Optimization methods
Let $\theta$ denote the weights $W$ and biases $b$ together.
The shape of the loss function changes from mini-batch to mini-batch.
[Figure source: https://losslandscape.com]

SGD
$$\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)$$

Momentum SGD
$$m_{t+1} = \beta m_t + \nabla L(\theta_t) \quad \text{(momentum term)}$$
$$\theta_{t+1} = \theta_t - \eta m_{t+1}$$

Momentum SGD written in the style of semi-implicit Euler
$$a_t = -\nabla L(\theta_t)$$
$$v_{t+1} = \beta v_t + a_t \eta$$
$$\theta_{t+1} = \theta_t + v_{t+1}$$

RMSProp
$$v_{t+1} = \rho v_t + (1 - \rho) \nabla L(\theta_t)^2 \quad \text{(gradient variance term)}$$
$$m_{t+1} = \beta m_t + \frac{1}{\sqrt{v_{t+1}} + \epsilon} \nabla L(\theta_t) \quad \text{(momentum term + normalization)}$$
$$\theta_{t+1} = \theta_t - \eta m_{t+1}$$

Adam
$$m_{t+1} = \beta_1 m_t + (1 - \beta_1) \nabla L(\theta_t) \quad \text{(momentum term)}$$
$$v_{t+1} = \beta_2 v_t + (1 - \beta_2) \nabla L(\theta_t)^2 \quad \text{(gradient variance term)}$$
$$b_{t+1} = \frac{\sqrt{1 - \beta_2^{t+1}}}{1 - \beta_1^{t+1}} \quad \text{(initial bias correction term)}$$
$$\theta_{t+1} = \theta_t - \eta \frac{m_{t+1}}{\sqrt{v_{t+1}} + \epsilon} b_{t+1} \quad \text{(normalization)}$$

Muon
$$m_{t+1} = \beta m_t + \nabla L(\theta_t) \quad \text{(momentum term)}$$
$$X = \frac{m_{t+1}}{\lVert m_{t+1} \rVert + \epsilon}$$
for loop (orthogonalization by Newton-Schulz iteration):
$$A = X X^T$$
$$B = c_0 A + c_1 A A$$
$$X = c_2 X + B X$$
After the loop:
$$o_{t+1} = X$$
$$\theta_{t+1} = \theta_t - \eta o_{t+1}$$
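To see the update rules act on numbers, the SGD, momentum SGD (both forms), and Adam updates can be sketched on a one-dimensional toy problem. Everything specific here is an assumption for illustration: the toy loss L(theta) = 0.5 (theta - 3)^2, the function names, the name `beta` for the momentum coefficient, and the hyperparameter values. The code is plain Python and does not use the `torch.optim` API.

```python
import math

def grad(theta):
    """Gradient of the assumed toy loss L(theta) = 0.5 * (theta - 3)^2."""
    return theta - 3.0

def run_sgd(theta, eta=0.1, steps=500):
    for _ in range(steps):
        theta = theta - eta * grad(theta)
    return theta

def run_momentum(theta, eta=0.1, beta=0.9, steps=500):
    m = 0.0
    for _ in range(steps):
        m = beta * m + grad(theta)          # momentum term
        theta = theta - eta * m
    return theta

def run_momentum_euler(theta, eta=0.1, beta=0.9, steps=500):
    # Same method in semi-implicit Euler form: acceleration, velocity, position.
    v = 0.0
    for _ in range(steps):
        a = -grad(theta)
        v = beta * v + a * eta              # velocity is updated first
        theta = theta + v                   # position uses the new velocity
    return theta

def run_adam(theta, eta=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = v = 0.0
    for t in range(steps):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g     # momentum term
        v = beta2 * v + (1 - beta2) * g * g # gradient variance term
        # initial bias correction, folded into a single factor
        b = math.sqrt(1 - beta2 ** (t + 1)) / (1 - beta1 ** (t + 1))
        theta = theta - eta * m / (math.sqrt(v) + eps) * b
    return theta

results = {
    "sgd": run_sgd(0.0),
    "momentum": run_momentum(0.0),
    "momentum_euler": run_momentum_euler(0.0),
    "adam": run_adam(0.0),
}
```

The two momentum variants produce the same iterates up to rounding, since substituting v = -eta * m turns one form into the other. All four runs should approach the minimizer theta = 3 of the toy loss.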
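Convolutional neural networks
Example with 3 input channels and 2 output channels
[Figure: input image, filter, output image]

[Dimensions of the input and output tensors]
N: batch size
C: number of channels
H: image height
W: image width

[Input] N: 1, Cin: 3, Hin: 5, Win: 5
[Output] N: 1, Cout: 2, Hout: 3, Wout: 3

[Convolution parameters]
F: filter size
P: padding width
S: stride
In this example, F: 3, P: 1, S: 2

Implementation approaches
GEMM: https://arxiv.org/abs/1410.0759
Winograd (batched GEMM): https://www.slideshare.net/nervanasys/an-analysis-of-convolution-for-inference
FFT: http://cs231n.stanford.edu/reports/2016/pdfs/117_Report.pdf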
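The relation between the input size, the convolution parameters, and the output size in this example can be checked with a short sketch. It assumes the usual output-size rule Hout = floor((Hin + 2P - F) / S) + 1, which the slide does not state explicitly, and uses a naive direct loop (cross-correlation, as deep learning frameworks compute it) rather than the GEMM, Winograd, or FFT approaches. The function names and the all-ones data are hypothetical.

```python
def conv_output_size(size_in, f, p, s):
    """Assumed rule: floor((size_in + 2P - F) / S) + 1."""
    return (size_in + 2 * p - f) // s + 1

def conv2d_naive(x, w, p, s):
    """Direct convolution for one image.

    x: [Cin][Hin][Win], w: [Cout][Cin][F][F], result: [Cout][Hout][Wout].
    Zero padding is handled by skipping out-of-range input positions.
    """
    c_in, h_in, w_in = len(x), len(x[0]), len(x[0][0])
    c_out, f = len(w), len(w[0][0])
    h_out = conv_output_size(h_in, f, p, s)
    w_out = conv_output_size(w_in, f, p, s)
    y = [[[0.0] * w_out for _ in range(h_out)] for _ in range(c_out)]
    for co in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                acc = 0.0
                for ci in range(c_in):
                    for di in range(f):
                        for dj in range(f):
                            r, c = i * s + di - p, j * s + dj - p
                            if 0 <= r < h_in and 0 <= c < w_in:
                                acc += x[ci][r][c] * w[co][ci][di][dj]
                y[co][i][j] = acc
    return y

# The example from the slide: Cin=3, Hin=Win=5, Cout=2, F=3, P=1, S=2.
x = [[[1.0] * 5 for _ in range(5)] for _ in range(3)]
w = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(2)]
y = conv2d_naive(x, w, p=1, s=2)
h_out = conv_output_size(5, f=3, p=1, s=2)  # 3, matching Hout on the slide
```

The six nested loops are what the GEMM, Winograd, and FFT formulations reorganize into operations that run efficiently on GPUs.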
Attention mechanism (Attention)
$$y = W x$$
W changes dynamically depending on the data, as if looking something up in a database.
[Figure panels: Convolution; Global Attention (Token Mixer); Local Attention; Point-wise MLP (Channel Mixer); MLP (Fully Connected)]
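A small sketch can illustrate what it means for W in y = W x to depend on the data. It assumes scaled dot-product attention, where W is a row-wise softmax of query-key similarities; the slide does not spell out this formula. For brevity the tokens themselves serve as queries, keys, and values, with no learned projections. All names and values are hypothetical.

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def attention_weights(q, k):
    """W[i][j] = softmax_j(q_i . k_j / sqrt(d)): computed from the data itself."""
    d = len(q[0])
    return [softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                     for kj in k])
            for qi in q]

def mix_tokens(w, v):
    """y = W x: each output token is a weighted average of the value tokens."""
    return [[sum(w_ij * v_j[c] for w_ij, v_j in zip(w_i, v))
             for c in range(len(v[0]))]
            for w_i in w]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens, 2 channels
W = attention_weights(tokens, tokens)
y = mix_tokens(W, tokens)

# A different input yields a different mixing matrix, unlike a fixed MLP weight.
W_other = attention_weights([[2.0, 0.0], [0.0, 0.0], [1.0, 1.0]],
                            [[2.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
```

Each row of `W` sums to 1, so every output token is a convex combination of the input tokens. This is the database-style lookup: a query is compared with all keys and the matching values are blended.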
Normalization
https://theaisummer.com/normalization/
Batch normalization (BN)
$\hat{x} = \gamma \left( \frac{x - \mu(x)}{\sigma(x)} \right) + \beta$
$\mu(x) = \frac{1}{NHW} \sum_{n=1}^{N} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{nchw}$
$\sigma(x) = \sqrt{\frac{1}{NHW} \sum_{n=1}^{N} \sum_{h=1}^{H} \sum_{w=1}^{W} (x_{nchw} - \mu(x))^2}$
Layer normalization (LN)
$\hat{x} = \gamma \left( \frac{x - \mu(x)}{\sigma(x)} \right) + \beta$
$\mu(x) = \frac{1}{CHW} \sum_{c=1}^{C} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{nchw}$
$\sigma(x) = \sqrt{\frac{1}{CHW} \sum_{c=1}^{C} \sum_{h=1}^{H} \sum_{w=1}^{W} (x_{nchw} - \mu(x))^2}$
Group normalization (GN)
$\hat{x} = \gamma \left( \frac{x - \mu(x)}{\sigma(x)} \right) + \beta$
$\mu(x) = \frac{2}{CHW} \sum_{c=1}^{C/2} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{nchw}$
$\sigma(x) = \sqrt{\frac{2}{CHW} \sum_{c=1}^{C/2} \sum_{h=1}^{H} \sum_{w=1}^{W} (x_{nchw} - \mu(x))^2}$
Weight standardization (WS)
$\hat{W} = \frac{W - \mu(W)}{\sigma(W)}$
$\mu(W) = \frac{1}{C F_H F_W} \sum_{c=1}^{C} \sum_{f_H=1}^{F_H} \sum_{f_W=1}^{F_W} W_{c f_H f_W}$
$\sigma(W) = \sqrt{\frac{1}{C F_H F_W} \sum_{c=1}^{C} \sum_{f_H=1}^{F_H} \sum_{f_W=1}^{F_W} (W_{c f_H f_W} - \mu(W))^2}$
[Figure: results for BN + LN combinations (higher is better)]
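The normalization variants on this slide differ in which axes the mean and standard deviation are taken over. The sketch below makes that explicit on a nested-list tensor `x[n][c][h][w]`. It is an illustration only: the scale and shift are fixed to 1 and 0, a small `eps` is added under the square root as a numerical safeguard (not shown on the slide), and all names are hypothetical.

```python
import math


def normalize(x, group_of, eps=1e-5):
    """Normalize x[n][c][h][w]; entries with equal group_of(n, c) share mu and sigma."""
    groups = {}
    for n, sample in enumerate(x):
        for c, channel in enumerate(sample):
            groups.setdefault(group_of(n, c), []).extend(
                v for row in channel for v in row)
    stats = {}
    for key, values in groups.items():
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / len(values)
        stats[key] = (mu, math.sqrt(var + eps))  # eps: added safeguard
    return [[[[(v - stats[group_of(n, c)][0]) / stats[group_of(n, c)][1]
               for v in row]
              for row in channel]
             for c, channel in enumerate(sample)]
            for n, sample in enumerate(x)]


def batch_norm(x):
    return normalize(x, lambda n, c: c)   # statistics over N, H, W per channel


def layer_norm(x):
    return normalize(x, lambda n, c: n)   # statistics over C, H, W per sample


def group_norm(x, num_groups=2):
    per_group = len(x[0]) // num_groups   # channels per group
    return normalize(x, lambda n, c: (n, c // per_group))


# Toy tensor with N=2, C=4, H=1, W=2.
x = [[[[float(8 * n + 2 * c + w) for w in range(2)]]
      for c in range(4)]
     for n in range(2)]
bn, ln, gn = batch_norm(x), layer_norm(x), group_norm(x)
print(bn[0][0], ln[0][0], gn[0][0])
```

Weight standardization applies the same subtraction and division to the filter weights instead of the activations.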
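Data augmentation
・Flipping, Random crop, Scale, Rotation
・Cutout, Random Erasing, Mixup, CutMix, AugMix
・AutoAugment: searches for the best augmentation policy with reinforcement learning
・Fast AutoAugment: shortens the search time with reinforcement learning plus Bayesian optimization
・Faster AutoAugment: shortens the search time further with gradient-based search
Sources: https://github.com/xkumiyu/numpy-data-augmentation https://openreview.net/pdf?id=S1gmrxHFvB
[Figure: comparison of augmentation methods (lower is better)]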
Regularization
・Loss function: $L = -\sum^{\mathrm{data}} \log p$
・L2 regularization: $L = -\sum^{\mathrm{data}} \log p + |W|^2$
・L1 regularization: $L = -\sum^{\mathrm{data}} \log p + |W|$
・Sharpness Aware Minimization (SAM): $L = \max_{\|\epsilon\| \le \rho} \left[ -\sum^{\mathrm{data}} \log p(W + \epsilon) \right]$ https://arxiv.org/abs/2010.01412
・Flooding: $L = \left| -\sum^{\mathrm{data}} \log p - b \right| + b$ https://arxiv.org/abs/2002.08709
・Dropout
・Stochastic depth https://arxiv.org/abs/1603.09382
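The loss variants on this slide can be written out as a short sketch, assuming the loss is a negative log-likelihood summed over the data and the penalties carry no coefficient, as on the slide (in practice a weighting coefficient is usually applied). Function names are illustrative.

```python
import math


def nll(log_probs):
    """Negative log-likelihood summed over the data."""
    return -sum(log_probs)


def l2_loss(log_probs, weights):
    """NLL plus the squared L2 norm of the weights."""
    return nll(log_probs) + sum(w * w for w in weights)


def l1_loss(log_probs, weights):
    """NLL plus the L1 norm of the weights."""
    return nll(log_probs) + sum(abs(w) for w in weights)


def flooding_loss(log_probs, b):
    """|NLL - b| + b: equals the NLL above the flood level b, mirrored below it."""
    return abs(nll(log_probs) - b) + b


log_probs = [math.log(0.5), math.log(0.25)]   # NLL = 3 ln 2, about 2.079
weights = [1.0, -2.0]

print(nll(log_probs))
print(l2_loss(log_probs, weights), l1_loss(log_probs, weights))
print(flooding_loss(log_probs, b=1.0), flooding_loss(log_probs, b=3.0))
```

With `b` below the current loss, flooding leaves the loss unchanged; once the loss falls below `b`, the absolute value flips the gradient sign, so training stops pushing the loss lower.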
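Distributed parallelization
Data parallel (DP) / Sequence parallel (SP)
・Data: distributed (batch)
・Model: replicated
・Communicated data: gradients
・Communication pattern: AllReduce
・Communication frequency: every step
・Pros: simple to implement
・Cons: large-batch problem, memory consumption
Sharded parallel (ZeRO/FSDP)
・Data: distributed (batch)
・Model: temporarily distributed
・Communicated data: gradients + weights
・Communication pattern: ReduceScatter + AllGather
・Communication frequency: every layer
・Pros: simple to implement, memory-saving
・Cons: large-batch problem
Pipeline parallel (PP)
・Data: replicated
・Model: distributed (layer)
・Communicated data: activations
・Communication pattern: SendRecv
・Communication frequency: every layer
・Pros: memory-saving, less compute per process
・Cons: pipeline bubbles
Tensor parallel (TP) / Context parallel (CP)
・Data: replicated
・Model: distributed (hidden/sequence)
・Communicated data: activations
・Communication pattern: AllReduce
・Communication frequency: every layer
・Pros: memory-saving, less compute per process
・Cons: communication overhead, hard to overlap, complex implementation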
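Sharded parallel (ZeRO/FSDP)
Reduces the model redundancy of data parallelism → memory consumption is inversely proportional to the number of processes
・ZeRO-1: shards the optimizer state
・ZeRO-2: shards the optimizer state and the gradients
・ZeRO-3: shards the optimizer state, the gradients, and the weights (the weights are restored with AllGather)
It resembles an MPI code that calls AllGather and then computes sequentially → the amount of computation is the same as in data parallel
The growth of the batch size is also the same problem as in data parallel
・Communication volume of data parallel: AllReduce = ReduceScatter + AllGather
・Communication volume of ZeRO-3: ReduceScatter + AllGather x 2 (1.5x that of data parallel)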
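The 1.5x figure and the stage-by-stage sharding can be checked with a small sketch. It counts elements rather than bytes, assumes that one ReduceScatter and one AllGather each move roughly one copy of the parameters, and treats the optimizer state as a single copy; all three are simplifying assumptions, and the function names are hypothetical.

```python
def comm_volume(num_params, zero3=False):
    """Elements moved per step, in units of one ReduceScatter or AllGather pass."""
    reduce_scatter = num_params
    all_gather = num_params
    if zero3:
        # Weights are gathered once for the forward and once for the backward pass.
        return reduce_scatter + 2 * all_gather
    # Plain data parallel: AllReduce = ReduceScatter + AllGather.
    return reduce_scatter + all_gather


def per_gpu_state(num_params, world_size, stage):
    """Per-GPU element counts (weights, gradients, optimizer state) for ZeRO stage 0-3."""
    weights = grads = opt = num_params  # assumption: one optimizer-state copy
    if stage >= 1:
        opt = opt / world_size
    if stage >= 2:
        grads = grads / world_size
    if stage >= 3:
        weights = weights / world_size
    return weights, grads, opt


print(comm_volume(100, zero3=True) / comm_volume(100))
for stage in range(4):
    print(stage, per_gpu_state(8, world_size=4, stage=stage))
```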
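Pipeline parallel
Each local batch is split further into micro-batches, which are processed as a pipeline
・Neighbor communication (SendRecv) with the processes that own the adjacent layers
・Each process computes only the layers it owns → both memory and computation are inversely proportional to the number of processes
・Gradient accumulation can serve as a countermeasure against pipeline bubbles
Various methods have been proposed to reduce the pipeline bubble:
・GPipe: micro-batches [https://arxiv.org/abs/1811.06965]
・PipeDream: asynchronous [https://arxiv.org/abs/1806.03377]
・Interleaved 1F1B: splits the layers into (number of processes x 2) chunks [https://arxiv.org/abs/2104.04473]
・Chimera: bidirectional pipelines [https://arxiv.org/abs/2107.06925]
・ZeroBubble: separates the W-gradient and the x-gradient of the backward pass [https://arxiv.org/abs/2401.10241]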
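Why more micro-batches (for example through gradient accumulation) shrink the bubble can be seen from a simple idle-time estimate. The sketch assumes every stage takes the same time per micro-batch and ignores communication; the optional `v` models the number of layer chunks per process in an interleaved schedule. The function name is illustrative.

```python
def bubble_fraction(p, m, v=1):
    """Fraction of a step that each pipeline stage spends idle.

    p: pipeline stages, m: micro-batches per step,
    v: layer chunks per process (v=1 is a plain GPipe/1F1B schedule).
    """
    return (p - 1) / (v * m + p - 1)


for m in (1, 4, 16, 64):
    print(m, round(bubble_fraction(p=4, m=m), 3),
          round(bubble_fraction(p=4, m=m, v=2), 3))
```

With 4 stages, a single micro-batch leaves each stage idle three quarters of the time, while 64 micro-batches bring the idle share below 5 percent.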
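Tensor and context parallel
Distributed parallel processing of the matrix multiplications inside a layer
→ Both memory and computation are inversely proportional to the number of processes
→ The implementation depends on the model architecture, so automatic parallelization is difficult
→ Computing the next layer always requires an AllReduce of the activations, and the AllReduce requires the previous layer's computation to have finished
→ In other words, there is little room to overlap computation and communication
Source: Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM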
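A toy illustration of why a sharded matrix multiplication needs an AllReduce before the next layer can start. The weight is split by rows and the activation by columns across `t` simulated ranks; `all_reduce_sum` is a stand-in written for this sketch, not a call into any communication library.

```python
def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def all_reduce_sum(partials):
    """Stand-in for AllReduce: element-wise sum of the per-rank partial results."""
    rows, cols = len(partials[0]), len(partials[0][0])
    return [[sum(part[i][j] for part in partials) for j in range(cols)]
            for i in range(rows)]


x = [[1.0, 2.0, 3.0, 4.0]]                               # 1 token, hidden size 4
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]    # weight, 4 x 2

t = 2                       # tensor-parallel size
shard = len(w) // t
partials = []
for rank in range(t):
    x_shard = [row[rank * shard:(rank + 1) * shard] for row in x]  # columns of x
    w_shard = w[rank * shard:(rank + 1) * shard]                   # rows of w
    partials.append(matmul(x_shard, w_shard))                      # local partial sum

y = all_reduce_sum(partials)
print(partials, y)
```

No rank holds the correct `y` until the sum across ranks has completed, which is why the communication sits on the critical path.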
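Distributed parallelization strategies (summary)
・Data parallel (DP) / Sequence parallel (SP): batch distributed, model replicated; AllReduce of gradients every step; simple to implement, but large-batch problem and high memory consumption
・Sharded parallel (ZeRO/FSDP): batch distributed, model temporarily distributed; ReduceScatter + AllGather of gradients and weights every layer; simple to implement and memory-saving, but large-batch problem
・Pipeline parallel (PP): data replicated, model distributed by layer; SendRecv of activations every layer; saves memory and computation, but pipeline bubbles
・Tensor parallel (TP) / Context parallel (CP): data replicated, model distributed along hidden/sequence; AllReduce of activations every layer; saves memory and computation, but communication overhead, hard to overlap, complex implementation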
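4-D Parallelism
The processes are partitioned along four axes, and each form of parallelism runs along its own axis:
・data / sequence / sharded parallel axis
・pipeline parallel axis
・tensor parallel axis
・context parallel axis
Each AllReduce (and other collective) runs on a sub-communicator
A common configuration:
・tensor and context parallel across the GPUs within a node
・pipeline parallel across the nodes within a rack
・data / sequence / sharded parallel across racks
→ Adding racks brings on the large-batch problem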
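One way to picture the four axes is to map each global rank to a coordinate per axis and to list the ranks that would share a sub-communicator. The ordering used here (tensor parallel varying fastest, then context, pipeline, and data parallel slowest) is an assumption chosen to match the "within a node / within a rack / across racks" layout on the slide; actual frameworks may order the axes differently, and the function names are hypothetical.

```python
def rank_to_coords(rank, tp, cp, pp, dp):
    """Map a global rank to its coordinate on each of the four axes."""
    assert 0 <= rank < tp * cp * pp * dp
    return {
        "tp": rank % tp,                  # fastest-varying: neighbors in a node
        "cp": (rank // tp) % cp,
        "pp": (rank // (tp * cp)) % pp,
        "dp": rank // (tp * cp * pp),     # slowest-varying: across racks
    }


def group_members(rank, axis, tp, cp, pp, dp):
    """Ranks that differ from `rank` only along `axis` (one sub-communicator)."""
    me = rank_to_coords(rank, tp, cp, pp, dp)
    members = []
    for other in range(tp * cp * pp * dp):
        coords = rank_to_coords(other, tp, cp, pp, dp)
        if all(coords[k] == me[k] for k in coords if k != axis):
            members.append(other)
    return members


sizes = dict(tp=2, cp=2, pp=2, dp=2)      # 16 processes in total
print(rank_to_coords(5, **sizes))
for axis in ("tp", "cp", "pp", "dp"):
    print(axis, group_members(0, axis, **sizes))
```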
Deep learning frameworks
Parallelization with Megatron-LM
・Data parallel
・Tensor parallel
・Pipeline parallel
・Context parallel
・Expert parallel
Acceleration with Transformer Engine
・FlashAttention1/2/3
・Kernel fusion
・Gradient accumulation fusion
・FP8 training
・FP8 weight caching
Software stack: Megatron-LM, Nemo, Megatron Core, Transformer Engine, PyTorch, cuDNN, NCCL
Setting the parallelization parameters
・world_size = tensor_model_parallel_size * pipeline_model_parallel_size * data_parallel_size
・global_batch_size = micro_batch_size * gradient_accumulation_steps * data_parallel_size
・hidden_size, num_attention_heads, and ffn_hidden_size must be multiples of tensor_model_parallel_size
・num_layers must be a multiple of pipeline_model_parallel_size * virtual_pipeline_model_parallel_size
・max_position_embeddings >= seq_length
・--sequence-parallel has no effect unless tensor_model_parallel_size is 2 or more
・micro-batch size: the number of micro-batches should be a multiple of PP
・CUDA_DEVICE_MAX_CONNECTIONS=1 is effective for --tp-comm-overlap
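The constraints above lend themselves to a quick sanity check before launching a job. The sketch below works on a plain dictionary whose keys mirror the names on the slide; it does not call Megatron-LM, and treating `gradient_accumulation_steps` as the number of micro-batches per step is an assumption made for this illustration.

```python
def check_parallel_config(cfg):
    """Return a list of violated constraints (empty if the settings are consistent)."""
    errors = []
    tp = cfg["tensor_model_parallel_size"]
    pp = cfg["pipeline_model_parallel_size"]
    dp = cfg["data_parallel_size"]
    vpp = cfg.get("virtual_pipeline_model_parallel_size", 1)
    steps = cfg["gradient_accumulation_steps"]

    if cfg["world_size"] != tp * pp * dp:
        errors.append("world_size != tp * pp * dp")
    if cfg["global_batch_size"] != cfg["micro_batch_size"] * steps * dp:
        errors.append("global_batch_size != micro_batch_size * steps * dp")
    for name in ("hidden_size", "num_attention_heads", "ffn_hidden_size"):
        if cfg[name] % tp != 0:
            errors.append(f"{name} is not a multiple of tp")
    if cfg["num_layers"] % (pp * vpp) != 0:
        errors.append("num_layers is not a multiple of pp * vpp")
    if cfg["max_position_embeddings"] < cfg["seq_length"]:
        errors.append("max_position_embeddings < seq_length")
    if steps % pp != 0:  # assumption: steps == number of micro-batches
        errors.append("number of micro-batches is not a multiple of pp")
    return errors


cfg = {
    "world_size": 16,
    "tensor_model_parallel_size": 2,
    "pipeline_model_parallel_size": 2,
    "virtual_pipeline_model_parallel_size": 2,
    "data_parallel_size": 4,
    "global_batch_size": 64,
    "micro_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "ffn_hidden_size": 14336,
    "num_layers": 32,
    "max_position_embeddings": 8192,
    "seq_length": 8192,
}
print(check_parallel_config(cfg))
print(check_parallel_config(dict(cfg, num_layers=30)))
```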
Memory consumption estimate
・Precision of each variable
・Factors that affect memory consumption
Attention
・GQA group size: $g = a / k$
・Linear: $W_K = W_V = h \times h/g$
・Linear: $W_Q = W_O = h \times h$
FFN
・$W_{up} = W_{gate} = h \times h_{ffn}$
・$W_{down} = h_{ffn} \times h$
Parameter count
・Embedding + Linear
・RMS Norm (x 2)
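The weight shapes above can be summed into a parameter count. The sketch below does so under a few assumptions that are not on the slide: a is the number of attention heads and k the number of key/value heads, each layer has two RMS Norm weights of size h, there is one final RMS Norm, and the output Linear is not tied to the embedding. The example values (h = 4096, a = 32, k = 8, h_ffn = 14336, 32 layers, vocabulary 128256) are taken from the publicly released Llama 3.1 8B configuration rather than from the slide.

```python
def transformer_params(h, a, k, h_ffn, num_layers, vocab, tied_embeddings=False):
    """Parameter count of a Llama-style decoder built from the shapes on the slide."""
    g = a // k                                # GQA group size
    attention = 2 * h * h + 2 * h * (h // g)  # W_Q, W_O and W_K, W_V
    ffn = 3 * h * h_ffn                       # W_up, W_gate, W_down
    norms = 2 * h                             # two RMS Norm weights per layer
    per_layer = attention + ffn + norms

    embedding = vocab * h
    output_linear = 0 if tied_embeddings else vocab * h
    final_norm = h                            # assumption: one final RMS Norm
    return num_layers * per_layer + embedding + output_linear + final_norm


total = transformer_params(h=4096, a=32, k=8, h_ffn=14336,
                           num_layers=32, vocab=128256)
print(f"{total:,} parameters ({total / 1e9:.2f}B)")
```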
Memory consumption estimate with parallelization
Cases compared: baseline, data parallel (ZeRO 1), context parallel, tensor parallel, pipeline parallel
Number of activations per GPU
(The slide labels the terms of the formula as Attention, FFN, and Embedding.)
・Baseline: $sbh \left( \left( 12 + 4\frac{k}{a} + 8\frac{h_{ffn}}{h} \right) L + 8 \right)$
・Tensor parallel: $\frac{sbh}{t} \left( \left( 12 + 4\frac{k}{a} + 8\frac{h_{ffn}}{h} \right) L + 8 \right)$
・Pipeline parallel: $\frac{sbh}{t} \left( \left( 12 + 4\frac{k}{a} + 8\frac{h_{ffn}}{h} \right) L + 8p + \delta_{p,1}\, 4(1 + v/h) \right)$
・Context parallel: $\frac{sbh}{tc} \left( \left( 12 + 4\frac{k}{a} + 8\frac{h_{ffn}}{h} \right) L + 8p + \delta_{p,1}\, 4(1 + v/h) \right)$
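The activation formulas can be evaluated directly. In the sketch below the symbol meanings are an assumed reading of the slide: s is the sequence length, b the micro-batch size, h the hidden size, a the number of attention heads, k the number of key/value heads, h_ffn the FFN width, L the number of layers, v the vocabulary size, and t, p, c the tensor, pipeline, and context parallel sizes. The term with subscript (p, 1) is treated as a Kronecker delta that is 1 only when p = 1. The example values correspond to the publicly released Llama 3.1 8B configuration with the 8,192-token sequence length used in the experiments.

```python
def layer_factor(a, k, h, h_ffn):
    """Per-layer coefficient: 12 + 4 k/a + 8 h_ffn/h."""
    return 12 + 4 * k / a + 8 * h_ffn / h


def activations_baseline(s, b, h, a, k, h_ffn, L):
    """Baseline: s*b*h * (layer_factor * L + 8)."""
    return s * b * h * (layer_factor(a, k, h, h_ffn) * L + 8)


def activations_tp(s, b, h, a, k, h_ffn, L, t):
    """Tensor parallel: the baseline count divided by t."""
    return activations_baseline(s, b, h, a, k, h_ffn, L) / t


def activations_pp_cp(s, b, h, a, k, h_ffn, L, v, t, p, c=1):
    """Pipeline (and optionally context) parallel form of the estimate."""
    delta = 1 if p == 1 else 0  # assumed Kronecker delta on (p, 1)
    inner = layer_factor(a, k, h, h_ffn) * L + 8 * p + delta * 4 * (1 + v / h)
    return s * b * h / (t * c) * inner


model = dict(s=8192, b=1, h=4096, a=32, k=8, h_ffn=14336, L=32)
base = activations_baseline(**model)
tp2 = activations_tp(**model, t=2)
tp2_pp4 = activations_pp_cp(**model, v=128256, t=2, p=4)
tp2_pp4_cp2 = activations_pp_cp(**model, v=128256, t=2, p=4, c=2)
for name, value in [("baseline", base), ("TP=2", tp2),
                    ("TP=2, PP=4", tp2_pp4), ("TP=2, PP=4, CP=2", tp2_pp4_cp2)]:
    print(f"{name}: {value / 1e9:.2f} G activations")
```

These are element counts; multiplying by the number of bytes per element of the chosen precision gives a memory estimate.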
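Experimental results on A100
Experimental conditions
・Model: Llama3.1-8B
・GPU: A100 (40GB)
・Sequence length: 8,192
Experimental parameters
・TP: tensor-parallel size
・CP: context-parallel size
・PP: pipeline-parallel size
・MBS: Micro Batch Size
Criteria
・Green: safe
・Yellow: concern
・Red: danger
Training speed [FLOP/s] for each configuration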
Experimental results on H100
Experimental conditions
・Model: Llama3.1-8B
・GPU: H100 (96GB)
・Sequence length: 8,192
Training speed [FLOP/s] for each configuration
OpenAI vs Meta vs DeepSeek
Redundant fully connected layers: Mixture of Experts
LLM development in Japan
Fugaku-LLM
・Members: Tokyo Tech, RIKEN, Fujitsu, CyberAgent, Tohoku University, Nagoya University, Kotoba Technologies
・System: Fugaku (50,000,000 A64FX hours)
・Model: GPT 13B
・Framework: Megatron-DeepSpeed
LLM-jp
・Members: NII and many others
・System: MDX (600,000 A100 hours), ABCI (900,000 A100 hours), GCP (?,000,000 H100 hours), TSUBAME4.0 (720,000 H100 hours)
・Model: GPT 1.3B, 13B, 175B; Llama2 172B
・Framework: Megatron-DeepSpeed, Megatron-LM
Swallow
・Members: Tokyo Tech, AIST
・System: ABCI (350,000 A100 hours)
・Model: Llama2 7B, 13B, 70B; Mistral, Mixtral 7B; Llama3 8B, 70B; Llama3.1 8B, 70B
・Framework: Megatron-LM
A case where following the paper did not work
Things we changed:
・GPT → Llama2
  ・pre-norm
  ・RMS norm
  ・scaled embedding
  ・z-loss
・LR (minLR): 6e-5 (1e-6) → 1e-4 (1e-5)
・LR warm up: 3433 → 2000
・Adam eps: 1e-8 → 1e-5
・Init. STD: 0.005 → 0.02
・Seq. length: 2048 → 4096
・Batch size: 1536 → 1728
[Figure: llm-jp-eval avg. versus training steps for 13B (eps=1e-8), 13B (eps=1e-5), 70B (eps=1e-5), and 172B (eps=1e-5)]
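Pre-training from scratch vs. continual pre-training
Continual pre-training
・Pro: makes effective use of the data on which the base model was trained
・Con: the details of that training data are unknown
Pre-training from scratch
・Pro: the training data is known in full
・Con: requires enormous amounts of data and computing resources
Open models are catching up with closed models
Sources: https://medium.com/@lars.chr.wiik/gpt-4o-vs-gpt-4-vs-gemini-1-5-performance-analysis-6bd207a2c580 https://www.reddit.com/r/singularity/comments/1d9mi13/new_alibabas_llm_qwen_2_72b_surpasses_llama_3_70b/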
Why Japanese models are needed
Characters not in the vocabulary are broken down into UTF-8 bytes, consuming as many as three tokens per character.
Tokens by language (relative to English)
・English: 1x
・Japanese: 3x
・Chinese: 3x
・Korean: 5x
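The "three tokens per character" figure follows from the UTF-8 encoding itself. The short sketch below counts the bytes per character that a byte-level fallback would have to emit when a character is missing from the vocabulary; the helper name is illustrative and no tokenizer library is involved.

```python
def utf8_bytes_per_char(text):
    """Pair every character with the number of bytes in its UTF-8 encoding."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


for sample in ("token", "日本語"):
    counts = utf8_bytes_per_char(sample)
    total = sum(n for _, n in counts)
    print(sample, counts, "total bytes:", total)
```

ASCII letters occupy one byte each, while each of the Japanese characters in the example occupies three.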
The Swallow large language model
Data preprocessing (Okazaki Lab)
Start with large units and move gradually to smaller units