Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
In this paper, we present Sosed, a tool for discovering similar software projects. We use fastText to compute the embeddings of subtokens into a dense space for 120,000 GitHub projects in 200 languages. Then, we cluster embeddings to identify groups of semantically similar subtokens that reflect topics in source code. We use a dataset of 9 million GitHub projects as a reference search base. To identify similar projects, we compare the distributions of clusters among their subtokens. The tool receives an arbitrary project as input, extracts subtokens in the 16 most popular programming languages, computes cluster distribution, and finds projects with the closest distribution in the search base. We labeled subtoken clusters with short descriptions to enable Sosed to produce interpretable output. Sosed is available at https://github.com/JetBrains-Research/sosed/. The tool demo is available at https://www.youtube.com/watch?v=LYLkztCGRt8. The multi-language extractor of subtokens is available separately at https://github.com/JetBrains-Research/buckwheat/.
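The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the Sosed implementation: the subtoken-to-cluster mapping `TOY_CLUSTERS`, the helper names, and the choice of cosine similarity as the closeness measure are all assumptions made for illustration, since the abstract does not state how distributions are compared.

```python
import math
import re
from collections import Counter

# Hypothetical toy mapping from subtoken to cluster id. In Sosed the clusters
# come from clustering fastText embeddings; these values are made up.
TOY_CLUSTERS = {"http": 0, "request": 0, "url": 0,
                "matrix": 1, "vector": 1, "dot": 1}
NUM_CLUSTERS = 2


def split_subtokens(identifier):
    """Split an identifier on underscores and camelCase boundaries."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def cluster_distribution(identifiers):
    """Return the normalized share of each cluster among known subtokens."""
    counts = Counter()
    for ident in identifiers:
        for sub in split_subtokens(ident):
            if sub in TOY_CLUSTERS:
                counts[TOY_CLUSTERS[sub]] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * NUM_CLUSTERS
    return [counts[c] / total for c in range(NUM_CLUSTERS)]


def cosine_similarity(p, q):
    """Cosine similarity (assumed measure; the abstract does not name one)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def most_similar(query_identifiers, search_base):
    """Return the name of the project whose distribution is closest to the query."""
    q = cluster_distribution(query_identifiers)
    return max(
        search_base,
        key=lambda name: cosine_similarity(q, cluster_distribution(search_base[name])),
    )


# Toy search base: project name -> identifiers found in its code.
search_base = {
    "web_client": ["sendHttpRequest", "parse_url"],
    "linalg": ["matrixVectorDot"],
}
best = most_similar(["fetchUrl", "http_get"], search_base)
```

With this toy data, the query project shares only networking-related subtokens with `web_client`, so that project is selected. A real search base of millions of projects would likely need precomputed distributions and an indexed nearest-neighbor search rather than a linear scan.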
Original language | English |
---|---|
Title of host publication | Proceedings - 2020 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1316-1320 |
Number of pages | 5 |
ISBN (electronic) | 9781450367684 |
DOI | |
State | Published - Sep 2020 |
Event | 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020 - Virtual, Melbourne, Australia. Duration: 22 Sep 2020 → 25 Sep 2020 |
Name | Proceedings - 2020 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020 |
---|
Conference | 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020 |
---|---|
Country/Territory | Australia |
City | Virtual, Melbourne |
Period | 22/09/20 → 25/09/20 |
ID: 73688884