A directory of resources in the field of technical communication.

WWW 2007

22 found.




Building and Managing Personalized Semantic Portals   (PDF)

This paper presents a semantic portal, SEMPort, which provides better user support with personalized views, semantic navigation, ontology-based search and three different kinds of semantic hyperlinks. Distributed content editing and provision is supplied for the maintenance of the contents in real-time. As a case study, SEMPort is tested on the Course Modules Web Page (CMWP) of the School of Electronics and Computer Science (ECS).

Şah, M. and W. Hall. WWW 2007 (2007). Articles>Information Design>Web Design>Semantic


Classifying Web Sites   (PDF)

In this paper, we present a novel method for the classification of Web sites. This method exploits both structure and content of Web sites in order to discern their functionality. It allows for distinguishing between eight of the most relevant functional classes of Web sites. We show that a pre-classification of Web sites utilizing structural properties considerably improves a subsequent textual classification with standard techniques. We evaluate this approach on a dataset comprising more than 16,000 Web sites with about 20 million crawled and 100 million known Web pages. Our approach achieves an accuracy of 92% for the coarse-grained classification of these Web sites.

Lindemann, Christoph and Lars Littig. WWW 2007 (2007). Articles>Web Design>Information Design>Metadata


Collaborative ICT for Indian Business Clusters   (PDF)

Indian business clusters have contributed immensely to the country’s industrial output, poverty alleviation and employment generation. However, with recent globalization these clusters can lose out to international competitors if they do not continuously innovate and take advantage of the new opportunities that are available through economic liberalization. In this paper, we discuss how information and communication technologies (ICT) can help in improving the productivity and growth of these clusters.

Roy, Soumya and Shantanu Biswas. WWW 2007 (2007). Articles>Business Communication>Regional>India


The Complex Dynamics of Collaborative Tagging   (PDF)

The debate within the Web community over the optimal means by which to organize information often pits formalized classifications against distributed collaborative tagging systems. A number of questions remain unanswered, however, regarding the nature of collaborative tagging systems including whether coherent categorization schemes can emerge from unsupervised tagging by users. This paper uses data from the social bookmarking site del.icio.us to examine the dynamics of collaborative tagging systems. In particular, we examine whether the distribution of the frequency of use of tags for 'popular' sites with a long history (many tags and many users) can be described by a power law distribution, often characteristic of what are considered complex systems. We produce a generative model of collaborative tagging in order to understand the basic dynamics behind tagging, including how a power law distribution of tags could arise. We empirically examine the tagging history of sites in order to determine how this distribution arises over time and to determine the patterns prior to a stable distribution. Lastly, by focusing on the high-frequency tags of a site where the distribution of tags is a stabilized power law, we show how tag co-occurrence networks for a sample domain of tags can be used to analyze the meaning of particular tags given their relationship to other tags.

Halpin, Harry, Valentin Robu and Hana Shepherd. WWW 2007 (2007). Articles>Web Design>Taxonomy>Collaboration


Consistency-Preserving Caching of Dynamic Database Content   (PDF)

With the growing use of dynamic web content generated from relational databases, traditional caching solutions for throughput and latency improvements are ineffective. We describe a middleware layer called Ganesh that reduces the volume of data transmitted without semantic interpretation of queries or results. It achieves this reduction through the use of cryptographic hashing to detect similarities with previous results. These benefits do not require any compromise of the strict consistency semantics provided by the back-end database. Further, Ganesh does not require modifications to applications, web servers, or database servers, and works with closed-source applications and databases. Using two benchmarks representative of dynamic web sites, measurements of our prototype show that it can increase end-to-end throughput by as much as twofold for non-data intensive applications and by as much as tenfold for data intensive ones.

Tolia, Niraj and M. Satyanarayanan. WWW 2007 (2007). Articles>Web Design>Information Design>Databases


The Discoverability of the Web   (PDF)

Previous studies have highlighted the high arrival rate of new content on the web. We study the extent to which this new content can be efficiently discovered by a crawler. Our study has two parts. First, we study the inherent difficulty of the discovery problem using a maximum cover formulation, under an assumption of perfect estimates of likely sources of links to new content. Second, we relax this assumption and study a more realistic setting in which algorithms must use historical statistics to estimate which pages are most likely to yield links to new content. We recommend a simple algorithm that performs comparably to all approaches we consider. We measure the overhead of discovering new content, defined as the average number of fetches required to discover one new page. We show first that with perfect foreknowledge of where to explore for links to new content, it is possible to discover 90% of all new content with under 3% overhead, and 100% of new content with 9% overhead. But actual algorithms, which do not have access to perfect foreknowledge, face a more difficult task: one quarter of new content is simply not amenable to efficient discovery. Of the remaining three quarters, 80% of new content during a given week may be discovered with 160% overhead if content is recrawled fully on a monthly basis.

Dasgupta, Anirban, Arpita Ghosh, Ravi Kumar, Christopher Olston, Sandeep Pandey and Andrew Tomkins. WWW 2007 (2007). Articles>Web Design>Search>Information Design


Do Not Crawl in the DUST: Different URLs with Similar Text   (PDF)

We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and dynamically generates the same page from various different URL requests. We present a novel algorithm, DustBuster, for uncovering dust; that is, for discovering rules that transform a given URL to others that are likely to have similar content. DustBuster mines dust effectively from previous crawl logs or web server logs, without examining page contents. Verifying these rules via sampling requires fetching only a few actual web pages. Search engines can benefit from information about dust to increase the effectiveness of crawling, reduce indexing overhead, and improve the quality of popularity statistics such as PageRank.

Bar-Yossef, Ziv, Idit Keidar and Uri Schonfeld. WWW 2007 (2007). Articles>Web Design>Search Engine Optimization


Extensible Schema Documentation with XSLT 2.0   (PDF)

XML Schema documents are defined using an XML syntax, which means that the idea of generating schema documentation through standard XML technologies is intriguing. We present X2Doc, a framework for generating schema-documentation solely through XSLT. The framework uses SCX, an XML syntax for XML Schema components, as intermediate format and produces XML-based output formats. Using a modular set of XSLT stylesheets, X2Doc is highly configurable and carefully crafted towards extensibility. This proves especially useful for composite schemas, where additional schema information like Schematron rules are embedded into XML Schemas.

Michel, Felix and Erik Wilde. WWW 2007 (2007). Articles>Documentation>XML>XSL


Homepage Live: Automatic Block Tracing for Web Personalization   (PDF)

The emergence of personalized homepage services, e.g. personalized Google Homepage and Microsoft Windows Live, has enabled Web users to select Web contents of interest and to aggregate them in a single Web page. The web contents are often predefined content blocks provided by the service providers. However, it involves intensive manual effort to define the content blocks and maintain the information in them. In this paper, we propose a novel personalized homepage system, called “Homepage Live”, to allow end users to use drag-and-drop actions to collect their favorite Web content blocks from existing Web pages and organize them in a single page. Moreover, Homepage Live automatically traces changes to the blocks as their container pages evolve by measuring the tree edit distance of the selected blocks. By exploiting the immutable elements of Web pages, the performance of the tracing algorithm is significantly improved. The experimental results demonstrate the effectiveness and efficiency of our algorithm.

Han, Jie, Dingyi Han, Chenxi Lin, Hua-Jun Zeng, Zheng Chen and Yong Yu. WWW 2007 (2007). Articles>Web Design>Personalization


Investigating Behavioral Variability in Web Search   (PDF)

Understanding the extent to which people’s search behaviors differ in terms of the interaction flow and information targeted is important in designing interfaces to help World Wide Web users search more effectively. In this paper we describe a longitudinal log-based study that investigated variability in people’s interaction behavior when engaged in search-related activities on the Web. We analyze the search interactions of more than two thousand volunteer users over a five-month period, with the aim of characterizing differences in their interaction styles. The findings of our study suggest that there are dramatic differences in variability in key aspects of the interaction within and between users, and within and between the search queries they submit. Our findings also suggest two classes of extreme users, navigators and explorers, whose search interaction is either highly consistent or highly variable. Lessons learned from these users can inform the design of tools to support effective Web-search interactions for everyone.

White, Ryen W. and Steven M. Drucker. WWW 2007 (2007). Articles>Web Design>Search>User Centered Design


A Large-Scale Study of Web Password Habits   (PDF)

We report the results of a large-scale study of password use and password re-use habits. The study involved half a million users over a three-month period. A client component on users’ machines recorded a variety of password strength, usage and frequency metrics. This allows us to measure or estimate such quantities as the average number of passwords and average number of accounts each user has, how many passwords she types per day, how often passwords are shared among sites, and how often they are forgotten. We get extremely detailed data on password strength, the types and lengths of passwords chosen, and how they vary by site. This is the first large-scale study of its kind, and it yields numerous other insights into the role passwords play in users’ online experience.

Florencio, Dinei and Cormac Herley. WWW 2007 (2007). Articles>Web Design>Security


Learning Information Intent via Observation   (PDF)

Workers in organizations frequently request help from assistants by sending request messages that express information intent: an intention to update data in an information system. Human assistants spend a significant amount of time and effort processing these requests. For example, human-resource assistants process requests to update personnel records, and executive assistants process requests to schedule conference rooms or to make travel reservations. To process the intent of a request, an assistant reads the request and then locates, completes, and submits a form that corresponds to the expressed intent. Automatically or semi-automatically processing the intent expressed in a request on behalf of an assistant would ease the mundane and repetitive nature of this kind of work.

Tomasic, Anthony, Isaac Simmons and John Zimmerman. WWW 2007 (2007). Articles>Information Design>User Centered Design


Open User Profiles for Adaptive News Systems: Help or Harm?   (PDF)

Over the last five years, a range of projects have focused on progressively more elaborate techniques for adaptive news delivery. However, the adaptation process in these systems has become more complicated and thus less transparent to users. In this paper, we concentrate on the application of open user models in adding transparency and controllability to adaptive news systems. We present a personalized news system, YourNews, which allows users to view and edit their interest profiles, and report a user study of the system. Our results confirm that users prefer transparency and control in their systems, and place more trust in such systems. However, similar to previous studies, our study demonstrates that this ability to edit user profiles may also harm the system’s performance and has to be used with caution.

Ahn, Jae-wook, Peter Brusilovsky, Jonathan Grady, Daqing He and Sue Yeon Syn. WWW 2007 (2007). Articles>Web Design>Journalism>Personalization


Summarizing Email Conversations with Clue Words   (PDF)

Accessing an ever-increasing number of emails, possibly on small mobile devices, has become a major problem for many users. Email summarization is a promising way to solve this problem. In this paper, we propose a new framework for email summarization. One novelty is to use a fragment quotation graph to try to capture an email conversation. The second novelty is to use clue words to measure the importance of sentences in conversation summarization. Based on clue words and their scores, we propose a method called CWS, which is capable of producing a summary of any length as requested by the user. We provide a comprehensive comparison of CWS with various existing methods on the Enron data set. Preliminary results suggest that CWS provides better summaries than existing methods.

Carenini, Giuseppe, Raymond T. Ng and Xiaodong Zhou. WWW 2007 (2007). Articles>Business Communication>Email


Tag Clouds for Summarizing Web Search Results   (PDF)

In this paper, we describe an application, PubCloud, that uses tag clouds for the summarization of results from queries over the PubMed database of biomedical literature. PubCloud responds to queries of this database with tag clouds generated from words extracted from the abstracts returned by the query. The results of a user study comparing the PubCloud tag-cloud summarization of query results with the standard result list provided by PubMed indicated that the tag cloud interface is advantageous in presenting descriptive information and in reducing user frustration, but that it is less effective at the task of enabling users to discover relations between concepts.

Kuo, Byron Y-L., Thomas Hentrich, Benjamin M. Good and Mark D. Wilkinson. WWW 2007 (2007). Articles>Web Design>Information Design>Taxonomy


Toward Expressive Syndication on the Web   (PDF)

Syndication systems on the Web have attracted vast amounts of attention in recent years. As technologies have emerged and matured, there has been a transition to more expressive syndication approaches; that is, subscribers and publishers are provided with more expressive means of describing their interests and published content, enabling more accurate information filtering. In this paper, we formalize a syndication architecture that utilizes expressive Web ontologies and logic-based reasoning for selective content dissemination. This provides finer grained control for filtering and automated reasoning for discovering implicit subscription matches, both of which are not achievable in less expressive approaches. We then address one of the main limitations with such a syndication approach, namely matching newly published information with subscription requests in an efficient and practical manner. To this end, we investigate continuous query answering for a large subset of the Web Ontology Language (OWL); specifically, we formally define continuous queries for OWL knowledge bases and present a novel algorithm for continuous query answering in a large subset of this language. Lastly, an evaluation of the query approach is shown, demonstrating its effectiveness for syndication purposes.

Halaschek-Wiener, C. and J. Hendler. WWW 2007 (2007). Articles>Web Design>Information Design>RSS


The Use of XML to Express a Historical Knowledge Base    (PDF)

Since conventional historical records have been written assuming human readers, they are not well-suited for computers to collect and process automatically. If computers could understand descriptions in historical records and process them automatically, it would be easy to analyze them from different perspectives. In this paper, we review a number of existing frameworks used to describe historical events, and make a comparative assessment of these frameworks in terms of usability, based on the 'deep cases' of Fillmore's case grammar. Based on this assessment, we propose a new description framework, and have created a microformat vocabulary set suitable for that framework.

Nakahira, Katsuko T., Masashi Matsui and Yoshiki Mikami. WWW 2007 (2007). Articles>Knowledge Management>XML>History


Web Object Retrieval   (PDF)

The primary function of current Web search engines is essentially relevance ranking at the document level. However, myriad structured information about real-world objects is embedded in static Web pages and online Web databases. Document-level information retrieval can unfortunately lead to highly inaccurate relevance ranking in answering object-oriented queries. In this paper, we propose a paradigm shift to enable searching at the object level. In traditional information retrieval models, documents are taken as the retrieval units and the content of a document is considered reliable. However, this reliability assumption is no longer valid in the object retrieval context, when multiple copies of information about the same object typically exist. These copies may be inconsistent because of diversity of Web site qualities and the limited performance of current information extraction techniques. If we simply combine the noisy and inaccurate attribute information extracted from different sources, we may not be able to achieve satisfactory retrieval performance. In this paper, we propose several language models for Web object retrieval, namely an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features. We test these models on a paper search engine and compare their performances. We conclude that the hybrid model is superior, as it takes into account extraction errors at varying levels.

Nie, Zaiqing, Yunxiao Ma, Shuming Shi, Ji-Rong Wen and Wei-Ying Ma. WWW 2007 (2007). Articles>Web Design>Information Design>Search


Why We Search: Visualizing and Predicting User Behavior   (PDF)

The aggregation and comparison of behavioral patterns on the WWW represent a tremendous opportunity for understanding past behaviors and predicting future behaviors. In this paper, we take a first step at achieving this goal. We present a large scale study correlating the behaviors of Internet users on multiple systems ranging in size from 27 million queries to 14 million blog posts to 20,000 news articles. We formalize a model for events in these time-varying datasets and study their correlation. We have created an interface for analyzing the datasets, which includes a novel visual artifact, the DTWRadar, for summarizing differences between time series. Using our tool we identify a number of behavioral properties that allow us to understand the predictive power of patterns of use.

Adar, Eytan, Daniel S. Weld, Brian N. Bershad and Steven D. Gribble. WWW 2007 (2007). Articles>Web Design>Search>Research


XML Design for Relational Storage   (PDF)

Design principles for XML schemas that eliminate redundancies and avoid update anomalies have been studied recently. Several normal forms, generalizing those for relational databases, have been proposed. All of them, however, are based on the assumption of a native XML storage, while in practice most of XML data is stored in relational databases. In this paper we study XML design and normalization for relational storage of XML documents. To be able to relate and compare XML and relational designs, we use an information-theoretic framework that measures information content in relations and documents, with higher values corresponding to lower levels of redundancy. We show that most common relational storage schemes preserve the notion of being well-designed (i.e., anomalies- and redundancy-free). Thus, existing XML normal forms guarantee well-designed relational storages as well. We further show that if this perfect option is not achievable, then a slight restriction on XML constraints guarantees a “second-best” relational design, according to possible values of the information-theoretic measure. We finally consider an edge-based relational representation of XML documents, and show that while it has similar information-theoretic properties with other relational representations, it can behave significantly worse in terms of enforcing integrity constraints.

Kolahi, Solmaz and Leonid Libkin. WWW 2007 (2007). Articles>Information Design>XML>Databases


XML-Based Multimodal Interaction Framework for Contact Center Applications   (PDF)

In this paper, we consider a way to represent contact center applications as a set of multiple XML documents written in different markups, including VoiceXML and CCXML. Applications can comprise a dialog with IVR, call routing and agent scripting functionalities. We also consider how such applications can be executed in a run-time contact center environment.

Anisimov, Nikolay, Brian Galvin and Herbert Ristock. WWW 2007 (2007). Articles>Business Communication>Information Design>XML


XML-Based XML Schema Access   (PDF)

XML Schema’s abstract data model consists of components, which are the structures that eventually define a schema as a whole. XML Schema’s XML syntax, on the other hand, is not a direct representation of the schema components, and it proves to be surprisingly hard to derive a schema’s components from the XML syntax. The Schema Component XML Syntax (SCX) is a representation which attempts to map schema components as faithfully as possible to XML structures. SCX serves as the starting point for applications which need access to schema components and want to do so using standardized and widely available XML technologies.

Wilde, Erik and Felix Michel. WWW 2007 (2007). Articles>Information Design>XML
