. . . . "preprint" . . . " best #icmi2024 position:\n103 datasets that claim to be more diverse are not.\nDiversity claims are subjective, political, and untested. Instead of claiming, let's measure.\nBut how?\n\n@dorazhao9 @SciOrestis @alicexiang\nhttps://arxiv.org/abs/2407.08188 https://twitter.com/LChoshen/status/1816031646568583532/photo/1\n\nBasically, like we evaluate everything else.\nMeasure one thing at a time (don't also test a new model)\nHave a specific claim (is it diverse in language, background, origin?) and quantify it\nSeparate it from other constructs, like how much data was collected or whether it is biased https://twitter.com/LChoshen/status/1816031649416556577/photo/1\n\n" . "datasetdiversity" . "datasets" . "diversity" . "icmi2024" . "measurement" . "value-laden" . . . . . . . . . . . . . "RSA" . "MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEArHtI92jm8pAYVsvJabxLGfOT+7G0JyJGh2gwjB5x2pFPga6wWTd+rNBWWUZViIFnaJrBEsJpgdnoupLU9ppwn+khMiGRfxqGsDDzwHcj3Jc75CRys7d3etwXdBdoXfBgjsJiZBazwm13idr6tljRrC1TaEJBnRQAqzBw9cLDeGY77cSznzXT39feUGT168dpCSE9O6u/48DvvWVqciHGsH9cQ+LroJJVsMrorwtsdZnAK+q48wtIP6pIpw5shSJ5LnA0qeN/f4TvTFDV6ItYIXjiWWpTECc/Bxmfnyat3B5xWCu9nvz8fEs7Ns0TuzQwT3/K55iSKDEIi/E0nO97xwIDAQAB" . "MM8qudqxnOFWfSh+OvWcIZsM4hpBBIdfF6BVHMFg5pLGYwrPe1Y72TwZqv7j4qUfp7p6q/X710IwXCjLiiyHFph5khcUnpuVR3l9LIR89D4uUdrWTEOd3g7b6cvcjyx+kS/CgTHtSolHhl3UHVH874zpxQ6A0X+YS3EFkwpFZifDPguyCkaMf7eZ6Wf3jwOkpb7rWQpZ0RYq7I4U7RkRJgvOai0dK4zw1fXEA3uJpgnNiokPoa48GyyCuQU68i8n5kCCDmc9KFHZ86xumdGET1xx700Bq9f5iwUBwjRtSsGYdadICEapkEeg5NXrcnSnr3Alqje8zEYZKSXw75VzEA==" . . . . "2024-09-12T18:45:28.503Z"^^ . . . . . "CoSMO Semantic Post" . . "0xf6ECcfD463afB464dcC85b051DF2E93E2646E6D2" . . "Leshem Choshen 🤖🤗 @ICML wanna talk?" .