This paper describes a formalized procedure for analyzing a website using webometric methods. The procedure involves gathering details on the site’s structure, constructing and exploring the resulting webgraph, defining a correctness criterion, identifying control actions that would improve the structure under that criterion, testing the criterion on real-world examples, and developing recommendations for improving the structure. PageRank is used as the criterion for evaluating web pages. A page’s value is determined by the presence or absence of a link to it from the site’s homepage. Under the correctness criterion, the valuable pages of a site should have the highest PageRank among all pages of that site. The control action consists of removing non-valuable directories whose root page has a high PageRank (and transforming them into independent sites). Experiments are conducted on three faculty sites of major universities in the USA, Russia and Nigeria. The approach is shown t
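The criterion and control action described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy webgraph, the damping factor of 0.85, the function names (`pagerank`, `violations`, `remove_directory`), the exclusion of the homepage from the comparison, and the reading of "highest PageRank" as "no non-valuable page outranks any valuable page" are all assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Plain power-iteration PageRank; dangling pages spread rank uniformly."""
    pages = sorted(set(graph) | {q for links in graph.values() for q in links})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not graph.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            links = graph.get(p, [])
            for q in links:
                new[q] += damping * rank[p] / len(links)
        rank = new
    return rank


def violations(graph: Dict[str, List[str]], home: str) -> List[str]:
    """Non-valuable pages that outrank at least one valuable page.

    Assumption: a page is valuable if the homepage links to it, and the
    homepage itself is left out of the comparison.
    """
    rank = pagerank(graph)
    valuable = set(graph.get(home, [])) - {home}
    threshold = min(rank[v] for v in valuable)
    others = [p for p in rank if p not in valuable and p != home]
    return sorted((p for p in others if rank[p] > threshold),
                  key=lambda p: -rank[p])


def remove_directory(graph: Dict[str, List[str]], prefix: str) -> Dict[str, List[str]]:
    """Control action: drop every page under `prefix` and all links to it."""
    return {p: [q for q in links if not q.startswith(prefix)]
            for p, links in graph.items() if not p.startswith(prefix)}


# Hypothetical toy site: "/lab/" is not linked from the homepage, but its
# pages link heavily to the directory root, so the root accumulates rank.
site = {
    "/": ["/a/", "/b/"],
    "/a/": ["/", "/a/1"],
    "/b/": ["/"],
    "/a/1": ["/lab/"],
    "/lab/": ["/lab/1", "/lab/2", "/lab/3"],
    "/lab/1": ["/lab/"],
    "/lab/2": ["/lab/"],
    "/lab/3": ["/lab/"],
}

before = violations(site, "/")
after = violations(remove_directory(site, "/lab/"), "/")
print("violations before:", before)
print("violations after: ", after)
```

In this toy graph the `/lab/` root outranks the pages linked from the homepage, so the criterion fails. After the directory is removed, no non-valuable page outranks a valuable one.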
Original language: English
Pages (from-to): 337-352
Journal: Вестник Санкт-Петербургского университета. Прикладная математика. Информатика. Процессы управления
Volume: 15
Issue number: 3
State: Published - 2019
Externally published: Yes

    Research areas

  • data mining, graph, PageRank, universities, url, Web harvesting, web mining, website, website structure

ID: 78394354