{"id":170515,"date":"2010-08-12T03:41:55","date_gmt":"2010-08-12T03:41:55","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/project\/website-structure-understanding-and-its-applications\/"},"modified":"2017-06-21T08:13:45","modified_gmt":"2017-06-21T15:13:45","slug":"website-structure-understanding-and-its-applications","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/website-structure-understanding-and-its-applications\/","title":{"rendered":"Website Structure Understanding and its Applications"},"content":{"rendered":"
Website structure understanding can be treated as a reverse-engineering problem: automatically discovering the layout templates and URL patterns of a website, and understanding how these templates and patterns are integrated to organize the site. The study of this problem has had a great impact on many applications that can leverage such site-level knowledge to help web search and data mining.<\/p>\n
Almost every website on the Internet has a distinct design and organizational structure. Experienced website designers usually create distinguishable layout templates for pages that serve different functions. They then organize the website by connecting these pages with hyperlinks, each represented by a URL string that follows predefined syntactic patterns. This project reverse-engineers that process: it automatically discovers the layout templates and URL patterns of a website and determines how they are integrated to organize the site. To demonstrate the value of website structure understanding, the project also proposes applications that leverage such site-level knowledge to help web search and data mining.<\/p>\n\t