A directory of resources in the field of technical communication.

Articles>Information Design>Search

26.
#34238

Web Object Retrieval   (PDF)

The primary function of current Web search engines is essentially relevance ranking at the document level. However, myriad structured information about real-world objects is embedded in static Web pages and online Web databases. Document-level information retrieval can unfortunately lead to highly inaccurate relevance ranking in answering object-oriented queries. In this paper, we propose a paradigm shift to enable searching at the object level. In traditional information retrieval models, documents are taken as the retrieval units and the content of a document is considered reliable. However, this reliability assumption is no longer valid in the object retrieval context when multiple copies of information about the same object typically exist. These copies may be inconsistent because of diversity of Web site qualities and the limited performance of current information extraction techniques. If we simply combine the noisy and inaccurate attribute information extracted from different sources, we may not be able to achieve satisfactory retrieval performance. In this paper, we propose several language models for Web object retrieval, namely an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features. We test these models on a paper search engine and compare their performances. We conclude that the hybrid model is superior because it takes into account the extraction errors at varying levels.

Nie, Zaiqing, Yunxiao Ma, Shuming Shi, Ji-Rong Wen and Wei-Ying Ma. WWW 2007 (2007). Articles>Web Design>Information Design>Search

27.
#34239

The Discoverability of the Web   (PDF)

Previous studies have highlighted the high arrival rate of new content on the web. We study the extent to which this new content can be efficiently discovered by a crawler. Our study has two parts. First, we study the inherent difficulty of the discovery problem using a maximum cover formulation, under an assumption of perfect estimates of likely sources of links to new content. Second, we relax this assumption and study a more realistic setting in which algorithms must use historical statistics to estimate which pages are most likely to yield links to new content. We recommend a simple algorithm that performs comparably to all approaches we consider. We measure the overhead of discovering new content, defined as the average number of fetches required to discover one new page. We show first that with perfect foreknowledge of where to explore for links to new content, it is possible to discover 90% of all new content with under 3% overhead, and 100% of new content with 9% overhead. But actual algorithms, which do not have access to perfect foreknowledge, face a more difficult task: one quarter of new content is simply not amenable to efficient discovery. Of the remaining three quarters, 80% of new content during a given week may be discovered with 160% overhead if content is recrawled fully on a monthly basis.

Dasgupta, Anirban, Arpita Ghosh, Ravi Kumar, Christopher Olston, Sandeep Pandey and Andrew Tomkins. WWW 2007 (2007). Articles>Web Design>Search>Information Design
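
The maximum cover formulation and the overhead measure mentioned in entry 27 can be illustrated with a small sketch. This is not the authors' algorithm: it uses the standard greedy heuristic for maximum cover, and the page names, the `links` mapping, and the `budget` parameter are all hypothetical. Overhead is read here as fetches of known pages divided by new pages discovered.

```python
def greedy_cover(links, budget):
    """Greedy maximum cover (an assumed stand-in, not the paper's method).

    links  -- dict mapping a known source page to the set of new pages it links to
    budget -- maximum number of source pages we are willing to fetch
    """
    discovered = set()
    chosen = []
    remaining = dict(links)
    for _ in range(budget):
        # Pick the source that reveals the most not-yet-discovered pages.
        best = max(remaining,
                   key=lambda s: len(remaining[s] - discovered),
                   default=None)
        if best is None or not (remaining[best] - discovered):
            break  # nothing left that adds new pages
        discovered |= remaining.pop(best)
        chosen.append(best)
    return chosen, discovered


def overhead(fetches, new_pages):
    """Fetches spent per new page discovered (our reading of the abstract)."""
    return fetches / new_pages


# Hypothetical link data: three known hubs pointing at four new pages.
links = {
    "hub_a": {"n1", "n2", "n3"},
    "hub_b": {"n3", "n4"},
    "hub_c": {"n4"},
}
chosen, discovered = greedy_cover(links, budget=2)
ratio = overhead(len(chosen), len(discovered))
```

With perfect knowledge of `links`, two fetches uncover all four new pages in this toy data. The abstract's harder setting corresponds to having to estimate `links` from historical statistics.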

28.
#34564

Designing for Faceted Search

Faceted search, or guided navigation, has become the de facto standard for e-commerce and product-related websites, from big box stores to product review sites. But e-commerce sites aren’t the only ones joining the facets club. Other content-heavy sites such as media publishers (e.g. Financial Times: ft.com), libraries (e.g. NCSU Libraries: lib.ncsu.edu/), and even non-profits (e.g. Urban Land Institute: uli.org) are tapping into faceted search to make their often broad range of content more findable. Essentially, faceted search has become so ubiquitous that users are not only getting used to it, they are coming to expect it.

Lemieux, Stephanie. User Interface Engineering (2009). Articles>Web Design>Information Design>Search

29.
#34665

Indexing the Web—It’s Not Just Google’s Business

Web databases do much more than passively store information. Part of their power comes from indexing records efficiently. An index serves as a map, identifying the precise location of a small piece of data in a much larger pile. For example, when I search for “web development,” Google identifies two hundred million results and displays the first ten—in a quarter of a second. But Google isn’t loading every one of those pages and scanning their contents when I perform my search: they’ve analyzed the pages ahead of time and matched my search terms against an index that only references the original content.

Mullican, Lyle. List Apart, A (2009). Articles>Web Design>Information Design>Search Engine Optimization
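
The idea in entry 29, that a search matches terms against a prebuilt index and not against the pages themselves, can be sketched as a minimal inverted index. This is an illustrative sketch only, not how Google or any particular database implements indexing; the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents, analyzed once ahead of time.
documents = {
    1: "web development basics",
    2: "database indexing for the web",
    3: "gardening tips",
}
index = build_index(documents)
matches = search(index, "web development")
```

The index stores only document ids, so a query touches the small term-to-id map and never rescans the original text.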

30.
#34739

Is Your Key Content Drowning in News?

Many web editors spend a lot of their time writing news stories for the company web site. However, traffic analysis frequently reveals that this content is not very popular - and that users may in fact miss the key content they come to find (product data, addresses, etc.) because it's practically drowning in news stories.

Furu, Nina. Content Strategy (2005). Articles>Web Design>Information Design>Search Engine Optimization

31.
#34961

What is Enough? Satisficing Information Needs   (peer-reviewed)   (members only)

This paper seeks to understand how users know when to stop searching for more information when the information space is so saturated that there is no certainty that the relevant information has been identified.

Prabha, Chandra, Lynn Silipigni Connaway, Lawrence Olszewski and Lillie R. Jenkins. Journal of Documentation (2007). Articles>Information Design>Search>User Centered Design

 