https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#head https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 http://www.nanopub.org/nschema#hasAssertion https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 http://www.nanopub.org/nschema#hasProvenance https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#provenance https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 http://www.nanopub.org/nschema#hasPublicationInfo https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#pubinfo https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.nanopub.org/nschema#Nanopublication https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion https://arxiv.org/abs/2311.13171 https://sense-nets.xyz/hasZoteroItemType preprint https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion http://purl.org/dc/terms/creator https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion http://purl.org/spar/cito/discusses https://arxiv.org/abs/2311.13171 https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion http://purl.org/spar/cito/discusses https://www.alphaxiv.org/pdf/2408.03092 https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion http://purl.org/spar/cito/discusses https://x.com/prateeky2806/status/1727589818618523783 https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion http://purl.org/spar/cito/includesQuotationFrom https://x.com/prateeky2806/status/1727589818618523783 https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion http://www.w3.org/2000/01/rdf-schema#comment Merging models trained for long with WIDEN When models were trained on a lot of data they diverged further 
from the baseline (e.g. in continual pretraining for additional languages); current merging methods underperform in this setting. https://alphaxiv.org/pdf/2408.03092 @AlibabaGroup https://twitter.com/LChoshen/status/1823002789217493392/photo/1 How do you do that? Let's assume we update a matrix with a few models. Pick a pretrained model and consider the rest of the models as diffs from it (task vectors). Normalize each row of each model, separating the normalization factor (magnitude) from the direction (row). Now weigh every row by how much it changed (higher = better) and average everything together, plus a trick that sometimes keeps the original weight, so the weights might not sum to 1. You can see how this follows recent findings about direction and size (e.g. https://x.com/prateeky2806/status/1727589818618523783). While the results for "just" merging do not change much, merging with a continually trained model (Sailor) that added many languages looks quite good! https://twitter.com/LChoshen/status/1823002796259791276/photo/1 Criticism (@askalphaxiv didn't upload the comment): calling Sailor a different pretrained model is a vast overclaim. The method is quite complex, it is hard to know if it will generalize, and they only show a specific model. 
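The merging steps described in the comment above can be sketched roughly as follows. This is a minimal illustration of the post's wording only, not the WIDEN implementation from the paper: the function name `merge_matrix`, the choice of weighting each row by its share of the total change magnitude, and the `eps` guard are all assumptions, and the "trick to sometimes keep the original weight" is left out because the post does not specify it.

```python
import math


def merge_matrix(pretrained, finetuned, eps=1e-12):
    """Row-wise weighted merge of several fine-tuned matrices (hypothetical sketch).

    pretrained: list of rows (lists of floats) for the chosen base model.
    finetuned:  list of matrices with the same shape as `pretrained`.
    """
    merged = []
    for r, base_row in enumerate(pretrained):
        # Task vectors: each model's row expressed as a diff from the base row.
        deltas = [[w - b for w, b in zip(model[r], base_row)] for model in finetuned]

        # Separate each diff into a magnitude (row norm) and a unit direction.
        magnitudes = [math.sqrt(sum(x * x for x in d)) for d in deltas]
        directions = [[x / (m + eps) for x in d] for d, m in zip(deltas, magnitudes)]

        # Assumption: a row's weight is its share of the total change,
        # so rows that changed more contribute more to the average.
        total = sum(magnitudes) + eps
        weights = [m / total for m in magnitudes]

        # Recombine magnitude and direction, weight, and add back to the base row.
        merged_row = list(base_row)
        for w, m, d in zip(weights, magnitudes, directions):
            for c, x in enumerate(d):
                merged_row[c] += w * m * x
        merged.append(merged_row)
    return merged
```

In this sketch the weights for each row sum to (almost exactly) 1, so a row that only one model changed is taken from that model nearly unchanged, while a row that several models changed by the same amount is a plain average of their diffs. The post notes that the actual method lets the weights deviate from summing to 1.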
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion https://schema.org/keywords Sailor https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion https://schema.org/keywords WIDEN https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion https://schema.org/keywords large-language-models https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion https://schema.org/keywords model-merging https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion https://schema.org/keywords weight-disentanglement https://www.alphaxiv.org/pdf/2408.03092 https://sense-nets.xyz/hasZoteroItemType webpage https://x.com/prateeky2806/status/1727589818618523783 https://sense-nets.xyz/hasZoteroItemType forumPost https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#provenance https://sense-nets.xyz/ http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/ns/prov#SoftwareAgent https://sense-nets.xyz/ http://www.w3.org/ns/prov#actedOnBehalfOf https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#activity http://www.w3.org/1999/02/22-rdf-syntax-ns#type https://sense-nets.xyz/supervisedActivity https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#activity http://www.w3.org/ns/prov#wasAssociatedWith https://sense-nets.xyz/ https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion http://www.w3.org/ns/prov#linksTo https://x.com/LChoshen/status/1823002789217493392 https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion http://www.w3.org/ns/prov#wasAssociatedWith https://x.com/LChoshen https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion http://www.w3.org/ns/prov#wasAttributedTo https://orcid.org/0000-0002-0085-6496 https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion http://www.w3.org/ns/prov#wasAttributedTo 
https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion http://www.w3.org/ns/prov#wasGeneratedBy https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#activity https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts http://xmlns.com/foaf/0.1/account https://orcid.org/0000-0002-0085-6496 https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts http://xmlns.com/foaf/0.1/account https://x.com/LChoshen https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#pubinfo https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#sig http://purl.org/nanopub/x/hasAlgorithm RSA https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#sig http://purl.org/nanopub/x/hasPublicKey MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEArHtI92jm8pAYVsvJabxLGfOT+7G0JyJGh2gwjB5x2pFPga6wWTd+rNBWWUZViIFnaJrBEsJpgdnoupLU9ppwn+khMiGRfxqGsDDzwHcj3Jc75CRys7d3etwXdBdoXfBgjsJiZBazwm13idr6tljRrC1TaEJBnRQAqzBw9cLDeGY77cSznzXT39feUGT168dpCSE9O6u/48DvvWVqciHGsH9cQ+LroJJVsMrorwtsdZnAK+q48wtIP6pIpw5shSJ5LnA0qeN/f4TvTFDV6ItYIXjiWWpTECc/Bxmfnyat3B5xWCu9nvz8fEs7Ns0TuzQwT3/K55iSKDEIi/E0nO97xwIDAQAB https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#sig http://purl.org/nanopub/x/hasSignature BMCHmxj4685c4tB4MzssQlbmilVpyC5oQEPuiEqc4AHbLlU0uJStQhpua7d52ZKIDFMi9nmrvLJc7eFuYs6gyjJzve0WY5BNHdpurTkJeU3Tyh9G2vsmlVof2FQc6QaijFR5DFKECKems3CSMJuBxChDj+hqrjS6DloVTdEIEalSHXsOw0utP7P/ZZvdhvkTMYaPPhuJspFjyGYmfLVb/m+Gr2zlsQgXRxdS5qc8LvGdAAjRxS4LAwzk7rklJXEfyDEWZ+B9V5hPzsmmqb60iFPaA9PHyqFGUT+EP1WFyJdIVL5PS48izFWx0+KDaTH4Nm6JrQUSO8kNx348rgKYZA== https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#sig http://purl.org/nanopub/x/hasSignatureTarget https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#sig http://purl.org/nanopub/x/singedBy https://sense-nets.xyz/ 
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#sig http://www.w3.org/ns/prov#wasAssociatedWith https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16VtssigningDelegation https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 http://purl.org/dc/terms/created 2024-09-03T21:16:16.131Z https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 http://purl.org/dc/terms/creator https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 http://purl.org/dc/terms/license https://creativecommons.org/licenses/by/4.0/ https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 http://purl.org/nanopub/x/hasNanopubType https://sense-nets.xyz/SemanticPost https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 http://purl.org/nanopub/x/wasCreatedAt https://sense-nets.xyz/ https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 http://www.w3.org/2000/01/rdf-schema#label CoSMO Semantic Post https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 http://www.w3.org/ns/prov#wasAttributedTo https://orcid.org/0000-0002-0085-6496 https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0 https://sense-nets.xyz/hasRootSigner 0xf6ECcfD463afB464dcC85b051DF2E93E2646E6D2 https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts http://xmlns.com/foaf/0.1/account https://orcid.org/0000-0002-0085-6496 https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts http://xmlns.com/foaf/0.1/name Leshem Choshen 🤖🤗 @ICML wanna talk?