In this paper, we present a novel method for the classification of Web sites. This method exploits both structure and content of Web sites in order to discern their functionality. It allows for distinguishing between eight of the most relevant functional classes of Web sites. We show that a pre-classification of Web sites utilizing structural properties considerably improves a subsequent textual classification with standard techniques. We evaluate this approach on a dataset comprising more than 16,000 Web sites with about 20 million crawled and 100 million known Web pages. Our approach achieves an accuracy of 92% for the coarse-grained classification of these Web sites.
Lindemann, Christoph and Lars Littig. WWW 2007 (2007). Articles>Web Design>Information Design>Metadata
The web is designed to be consumed by humans, and much of the rich, useful information our websites contain is inaccessible to machines. People can cope with all sorts of variations in layout, spelling, capitalization, color, position, and so on, and still absorb the intended meaning from the page. Machines, on the other hand, need some help. A new kind of web, a semantic web, would be made up of information marked up in such a way that software can also easily understand it. Before considering how we might achieve such a web, let’s look at what we might be able to do with it.
Birbeck, Mark. A List Apart (2009). Articles>Web Design>Information Design>Metadata