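Jiqizhixin (Machine Heart) report. Editor: Zhang Qian

Published in 2017, the Transformer paper "Attention is all you need" has now been cited more than 170,000 times and has become the landmark paper of the current wave of AI.

(Image: from Jeff Dean's presentation slides.)

At the same time, some papers have been overshadowed by it, such as "End-To-End Memory Networks", published in 2015.

The paper's first author, Meta research scientist Sainbayar Sukhbaatar, said in a recent tweet: "Looking back, this paper contained many of the ingredients of today's large language models. Our model was the first language model to completely replace the RNN with attention; it introduced dot-product soft attention with key and value projections, and stacked multiple layers of attention so that the model could attend to different parts of the input; it also introduced position embeddings to address the order invariance of attention..."

Although the paper predates "Attention is all you need" by two years, it never received the attention it deserved and has only a little over 3,000 citations.

The author noted that the paper was an improvement on a 2014 paper from Facebook AI Research, "Memory Networks". "Memory Networks" introduced hard attention stacked over multiple layers, and was proposed around the same time that Bahdanau et al. introduced soft attention on a single layer.

In a post last year, the prominent AI researcher Andrej Karpathy lamented that the work by Bahdanau et al., "Neural Machine Translation by Jointly Learning to Align and Translate", is the paper that actually introduced attention (it recently received the runner-up ICLR 2025 Test of Time Award), yet "Attention is all you need" has received more than 100 times as much attention. He also acknowledged, however, that "Attention is all you need" is distinctive in its own right.

To return to "End-to-End Memory Networks": it effectively combined the ideas of "Memory Networks" and "Neural Machine Translation by Jointly Learning to Align and Translate", and showed that multi-layer soft attention can produce complex reasoning ability, which is one of the most important aspects of today's AI architectures.

Beyond the core innovations, first author Sainbayar Sukhbaatar also shared the story behind the paper and the new work his team is now pursuing.

A paper overshadowed by the Transformer

Sainbayar Sukhbaatar recalled that the "End-to-End Memory Networks" research began in the spring of 2014. At the time he was a second-year PhD student interning at FAIR. His advisor, Rob Fergus, urged him to do research on memory. Back then, though, he did not yet understand what memory meant: it was a world dominated by recurrent and convolutional networks, and memory was not the major buzzword it is today.

His research did not have to start from scratch, however. Jason Weston and colleagues at Facebook AI Research had already produced "Memory Networks". They had also released a set of tasks called bAbI, on which recurrent models failed badly. These tasks require looking up multiple facts in no particular order, which is a fatal weakness of RNNs.

This memory-related project initially drew a lot of interest, but progress was not smooth.

Eventually they set out to take memory networks further, with the goal of having the network learn where to attend without being given labels. They decided to use reinforcement learning to teach the memory network where to attend.

Fast forward to the winter of 2014-2015. By then they had implemented the reinforcement learning code and were preparing to compare it against baselines on language modeling tasks. An obvious choice was the soft attention mechanism used in "Neural Machine Translation by Jointly Learning to Align and Translate", except that Sukhbaatar and his colleagues applied it in a multi-layer structure, which no one had done before. So they implemented it as a baseline, with some changes, such as using a dot product rather than a small multilayer perceptron to compute attention. To their surprise, the memory network with this soft attention worked remarkably well, and they immediately realized that this was the right direction.

After that, things moved quickly. At the insistence of Arthur Szlam (another author), the team began using the bAbI tasks as a benchmark. They developed several new techniques, such as using different projections for keys and values. They also needed to solve the order invariance of attention, so they added temporal embeddings (now called position embeddings). Jason suggested adding random noise to these time values to reduce overfitting. Finally, they decided to try language modeling, a task that was out of fashion at the time. Surprisingly, they beat the LSTM using only attention, without any temporal recurrence. (In the paper they use the word "recurrence" to describe repeated layers, that is, layers that share weights as in the Universal Transformer.)

They wrote most of the paper on the last day before the NeurIPS submission deadline. Interestingly, it was originally titled "Weakly Supervised Memory Networks", because it required less supervision.

In any case, that period was a golden age for new architectures, with new papers such as Neural GPU, Stack RNN, and Neural Turing Machine.

Looking back from today, ten years later, at the current state of large language models, Sainbayar Sukhbaatar believes the paper correctly anticipated several things. Their model was the first attention-based language model that did not rely on recurrence. They successfully stacked multiple layers of attention, allowing the model to attend to different parts of the context before outputting the next token. They also used position embeddings, even relative position embeddings, which are now standard practice in large language models.

Although the paper did not cause a sensation the way "Attention is all you need" did, it still had an impact. One reader said they had read the paper many times, trying to understand why a certain neural architecture worked.

Sainbayar Sukhbaatar acknowledged that the Transformer did make important improvements, such as using the previous layer's hidden states as the next layer's memory, as well as feed-forward layers, multi-head attention, and more.

He believes that even after ten years, a great deal of work on improving architectures remains. That is why his team recently published a new paper titled "Multi-Token Attention" (MTA). MTA conditions attention on multiple queries, keys, and heads, and outperforms standard soft attention on many metrics. In particular, it handles long-context problems better, such as "needle in a haystack" tasks. Interestingly, the conclusion of the 2015 memory networks paper already named this as future work: "smooth lookups may not scale well to the case where a larger memory is required", which is exactly the problem the field is still working on today.

If you are interested in their paper, you are welcome to read the original (see "Multi-Token Attention breaks through the attention bottleneck: Meta has invented a very new kind of Transformer").

Reference link: https://x.com/tesatory/status/1911150652556026328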
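
The mechanism the article describes (dot-product soft attention over a memory, separate projections for keys and values, a temporal embedding to break order invariance, and several stacked attention layers) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function and parameter names are hypothetical, the weights are random and untrained, the noise on time values is omitted, and the additive update of the state between hops is an assumption that the article itself does not spell out.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_hop(state, items, hop):
    """One layer of dot-product soft attention over the memory items."""
    key_proj, value_proj, key_time, value_time = hop
    # Keys and values use different projections. The temporal embedding of
    # slot i is added so that the position of an item in memory matters.
    keys = [add(matvec(key_proj, x), key_time[i]) for i, x in enumerate(items)]
    values = [add(matvec(value_proj, x), value_time[i]) for i, x in enumerate(items)]
    # Attention scores are plain dot products, not a small MLP.
    weights = softmax([dot(state, k) for k in keys])
    read = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(state))]
    return read, weights


def multi_hop_read(query, items, hops):
    """Stack several attention layers; each hop refines the running state."""
    state, all_weights = query, []
    for hop in hops:
        read, weights = attention_hop(state, items, hop)
        state = add(state, read)  # assumed additive update between hops
        all_weights.append(weights)
    return state, all_weights


def random_hop(rng, dim, item_dim, slots, use_time=True):
    """Draw random (untrained) parameters for one hop; purely illustrative."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]
    key_proj, value_proj = mat(dim, item_dim), mat(dim, item_dim)
    if use_time:
        key_time, value_time = mat(slots, dim), mat(slots, dim)
    else:
        key_time = [[0.0] * dim for _ in range(slots)]
        value_time = [[0.0] * dim for _ in range(slots)]
    return key_proj, value_proj, key_time, value_time


rng = random.Random(0)
items = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # three memory slots
query = [0.5, -0.2, 0.1, 0.3]
hops = [random_hop(rng, dim=4, item_dim=3, slots=3) for _ in range(2)]
state, weights = multi_hop_read(query, items, hops)
print(weights)  # one attention distribution over the memory per hop
```

Calling `random_hop(..., use_time=False)` zeroes the temporal embeddings. In that case, reordering `items` leaves the final state unchanged up to rounding, which is the order invariance that the temporal (position) embeddings were added to break.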