https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#head
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
http://www.nanopub.org/nschema#hasAssertion
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
http://www.nanopub.org/nschema#hasProvenance
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#provenance
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
http://www.nanopub.org/nschema#hasPublicationInfo
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#pubinfo
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.nanopub.org/nschema#Nanopublication
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
https://arxiv.org/abs/2311.13171
https://sense-nets.xyz/hasZoteroItemType
preprint
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
http://purl.org/dc/terms/creator
https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
http://purl.org/spar/cito/discusses
https://arxiv.org/abs/2311.13171
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
http://purl.org/spar/cito/discusses
https://www.alphaxiv.org/pdf/2408.03092
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
http://purl.org/spar/cito/discusses
https://x.com/prateeky2806/status/1727589818618523783
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
http://purl.org/spar/cito/includesQuotationFrom
https://x.com/prateeky2806/status/1727589818618523783
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
http://www.w3.org/2000/01/rdf-schema#comment
Merging long-trained models with WIDEN
When models are trained on a lot of data, they diverge further from the base model (e.g. in continual pretraining for additional languages), and current merging methods underperform in this setting.
https://alphaxiv.org/pdf/2408.03092
@AlibabaGroup https://twitter.com/LChoshen/status/1823002789217493392/photo/1
How do you do that?
Let's assume we merge a single weight matrix from a few models.
Pick a pretrained model and treat the other models as differences from it (task vectors).
Normalize each row of each model, separating the normalization factor (magnitude) from the direction (the normalized row).
Now weigh every row by how much it changed (more change = higher weight) and average them all together.
+ a trick that sometimes keeps the original weight, so the weights might not sum to 1.
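The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the per-row change score (magnitude shift plus 1 minus the cosine between directions) and the per-row softmax over models are illustrative choices, `split_rows`, `widen_like_merge` and `temperature` are hypothetical names, and the trick that sometimes keeps the original weight is not modeled, so here the weights always sum to 1.

```python
import numpy as np


def split_rows(w):
    """Separate each row into its L2 norm (magnitude) and unit vector (direction)."""
    mag = np.linalg.norm(w, axis=1, keepdims=True)
    direction = w / np.maximum(mag, 1e-12)  # guard against all-zero rows
    return mag, direction


def widen_like_merge(base, finetuned, temperature=1.0):
    """Hypothetical sketch: merge one weight matrix from several models.

    base: (rows, cols) pretrained weights.
    finetuned: list of matrices with the same shape as base.
    """
    base_mag, base_dir = split_rows(base)
    deltas, scores = [], []
    for w in finetuned:
        mag, direction = split_rows(w)
        # Assumed per-row change score: magnitude shift + angular shift (1 - cosine).
        mag_change = np.abs(mag - base_mag)[:, 0]
        dir_change = 1.0 - np.sum(direction * base_dir, axis=1)
        scores.append(mag_change + dir_change)
        deltas.append(w - base)  # task vector relative to the pretrained model
    scores = np.stack(scores)  # shape (models, rows)
    # Softmax over models, separately for each row: rows that changed more get more weight.
    z = scores / temperature
    z = z - z.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    merged_delta = sum(weights[k][:, None] * deltas[k] for k in range(len(deltas)))
    return base + merged_delta
```

With a single model every row weight is 1, so the result equals that model. When one model left a row untouched, the model that changed the row receives more than half of the weight, unlike a plain task-vector average.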
You can see how this follows recent findings about direction and size (e.g. https://x.com/prateeky2806/status/1727589818618523783)
While the results on "just" merging do not change that much, merging with a continually trained model (Sailor) that added many languages looks quite good! https://twitter.com/LChoshen/status/1823002796259791276/photo/1
Criticism (@askalphaxiv didn't upload the comment):
Calling Sailor a different pretrained model is a vast overclaim.
It is quite complex, and it is hard to know whether it will generalize,
since they only show a specific model.
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
https://schema.org/keywords
Sailor
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
https://schema.org/keywords
WIDEN
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
https://schema.org/keywords
large-language-models
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
https://schema.org/keywords
model-merging
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
https://schema.org/keywords
weight-disentanglement
https://www.alphaxiv.org/pdf/2408.03092
https://sense-nets.xyz/hasZoteroItemType
webpage
https://x.com/prateeky2806/status/1727589818618523783
https://sense-nets.xyz/hasZoteroItemType
forumPost
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#provenance
https://sense-nets.xyz/
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/ns/prov#SoftwareAgent
https://sense-nets.xyz/
http://www.w3.org/ns/prov#actedOnBehalfOf
https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#activity
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
https://sense-nets.xyz/supervisedActivity
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#activity
http://www.w3.org/ns/prov#wasAssociatedWith
https://sense-nets.xyz/
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
http://www.w3.org/ns/prov#linksTo
https://x.com/LChoshen/status/1823002789217493392
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
http://www.w3.org/ns/prov#wasAssociatedWith
https://x.com/LChoshen
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
http://www.w3.org/ns/prov#wasAttributedTo
https://orcid.org/0000-0002-0085-6496
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
http://www.w3.org/ns/prov#wasAttributedTo
https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#assertion
http://www.w3.org/ns/prov#wasGeneratedBy
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#activity
https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts
http://xmlns.com/foaf/0.1/account
https://orcid.org/0000-0002-0085-6496
https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts
http://xmlns.com/foaf/0.1/account
https://x.com/LChoshen
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#pubinfo
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#sig
http://purl.org/nanopub/x/hasAlgorithm
RSA
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#sig
http://purl.org/nanopub/x/hasPublicKey
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEArHtI92jm8pAYVsvJabxLGfOT+7G0JyJGh2gwjB5x2pFPga6wWTd+rNBWWUZViIFnaJrBEsJpgdnoupLU9ppwn+khMiGRfxqGsDDzwHcj3Jc75CRys7d3etwXdBdoXfBgjsJiZBazwm13idr6tljRrC1TaEJBnRQAqzBw9cLDeGY77cSznzXT39feUGT168dpCSE9O6u/48DvvWVqciHGsH9cQ+LroJJVsMrorwtsdZnAK+q48wtIP6pIpw5shSJ5LnA0qeN/f4TvTFDV6ItYIXjiWWpTECc/Bxmfnyat3B5xWCu9nvz8fEs7Ns0TuzQwT3/K55iSKDEIi/E0nO97xwIDAQAB
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#sig
http://purl.org/nanopub/x/hasSignature
BMCHmxj4685c4tB4MzssQlbmilVpyC5oQEPuiEqc4AHbLlU0uJStQhpua7d52ZKIDFMi9nmrvLJc7eFuYs6gyjJzve0WY5BNHdpurTkJeU3Tyh9G2vsmlVof2FQc6QaijFR5DFKECKems3CSMJuBxChDj+hqrjS6DloVTdEIEalSHXsOw0utP7P/ZZvdhvkTMYaPPhuJspFjyGYmfLVb/m+Gr2zlsQgXRxdS5qc8LvGdAAjRxS4LAwzk7rklJXEfyDEWZ+B9V5hPzsmmqb60iFPaA9PHyqFGUT+EP1WFyJdIVL5PS48izFWx0+KDaTH4Nm6JrQUSO8kNx348rgKYZA==
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#sig
http://purl.org/nanopub/x/hasSignatureTarget
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#sig
http://purl.org/nanopub/x/signedBy
https://sense-nets.xyz/
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0#sig
http://www.w3.org/ns/prov#wasAssociatedWith
https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16VtssigningDelegation
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
http://purl.org/dc/terms/created
2024-09-03T21:16:16.131Z
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
http://purl.org/dc/terms/creator
https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
http://purl.org/dc/terms/license
https://creativecommons.org/licenses/by/4.0/
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
http://purl.org/nanopub/x/hasNanopubType
https://sense-nets.xyz/SemanticPost
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
http://purl.org/nanopub/x/wasCreatedAt
https://sense-nets.xyz/
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
http://www.w3.org/2000/01/rdf-schema#label
CoSMO Semantic Post
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
http://www.w3.org/ns/prov#wasAttributedTo
https://orcid.org/0000-0002-0085-6496
https://w3id.org/np/RAPwHYQQtXh6p3DQQ066TmpKOBMIWkerAYv-chCViAqC0
https://sense-nets.xyz/hasRootSigner
0xf6ECcfD463afB464dcC85b051DF2E93E2646E6D2
https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts
http://xmlns.com/foaf/0.1/account
https://orcid.org/0000-0002-0085-6496
https://w3id.org/np/RAoSadUw99CeqDlR2400018nqTzR_38fT86OrTzk16Vts
http://xmlns.com/foaf/0.1/name
Leshem Choshen 🤖🤗 @ICML wanna talk?