One of the dirty little secrets about being an information architect is that most of us only bat .500 at best. We labor and agonize over making recommendations and designing information architectures that are supposed to change the world, but many of our designs never see the light of day. Rather than moan about why my designs were not implemented, I want to share my story.
Faceted search is extremely helpful for certain kinds of finding—particularly for ecommerce apps. Unfortunately, the designers of mobile applications do not have established user interface paradigms they can follow or abundant screen real estate for presenting facets and filters in a separate area on the left or at the top of a screen. To implement faceted search on mobile devices, we need to get creative rather than following established Web design patterns. Join me in exploring the Four Corners, Modal Overlay, Watermark, and Refinement Options design patterns for mobile devices. Following these patterns can move us one step closer to making faceted search a usable reality on mobile devices. But first, let’s take a look at the challenges of designing mobile faceted search, which include navigational elements that use up precious screen real estate, limited search-refinement options, and the general lack of an iterative refinement flow.
Faceted search, or guided navigation, has become the de facto standard for e-commerce and product-related websites, from big box stores to product review sites. But e-commerce sites aren’t the only ones joining the facets club. Other content-heavy sites such as media publishers (e.g. Financial Times: ft.com), libraries (e.g. NCSU Libraries: lib.ncsu.edu/), and even non-profits (e.g. Urban Land Institute: uli.org) are tapping into faceted search to make their often broad range of content more findable. Essentially, faceted search has become so ubiquitous that users are not only getting used to it, they are coming to expect it.
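The mechanics behind faceted search are simple to illustrate: the engine counts how many items in the current result set match each value of each facet, and selecting a value narrows the set. A minimal sketch in Python (the product records and facet names below are invented for illustration):

```python
from collections import Counter

# Hypothetical product records; "brand" and "color" are the facets.
products = [
    {"name": "Shirt A", "brand": "Acme", "color": "red"},
    {"name": "Shirt B", "brand": "Acme", "color": "blue"},
    {"name": "Shirt C", "brand": "Zenith", "color": "red"},
]

def facet_counts(items, facet):
    """Count how many items fall under each value of a facet."""
    return Counter(item[facet] for item in items)

def apply_filter(items, facet, value):
    """Narrow the result set to items matching the selected facet value."""
    return [item for item in items if item[facet] == value]

print(facet_counts(products, "brand"))  # Counter({'Acme': 2, 'Zenith': 1})
red_only = apply_filter(products, "color", "red")
print(facet_counts(red_only, "brand"))  # Counter({'Acme': 1, 'Zenith': 1})
```

Recomputing the counts after each filter is what gives faceted interfaces their iterative refinement flow: users always see how many results each next click would leave.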
Previous studies have highlighted the high arrival rate of new content on the web. We study the extent to which this new content can be efficiently discovered by a crawler. Our study has two parts. First, we study the inherent difficulty of the discovery problem using a maximum cover formulation, under an assumption of perfect estimates of likely sources of links to new content. Second, we relax this assumption and study a more realistic setting in which algorithms must use historical statistics to estimate which pages are most likely to yield links to new content. We recommend a simple algorithm that performs comparably to all approaches we consider. We measure the overhead of discovering new content, defined as the average number of fetches required to discover one new page. We show first that with perfect foreknowledge of where to explore for links to new content, it is possible to discover 90% of all new content with under 3% overhead, and 100% of new content with 9% overhead. But actual algorithms, which do not have access to perfect foreknowledge, face a more difficult task: one quarter of new content is simply not amenable to efficient discovery. Of the remaining three quarters, 80% of new content during a given week may be discovered with 160% overhead if content is recrawled fully on a monthly basis.
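The maximum cover formulation the abstract refers to can be sketched with the standard greedy heuristic: at each step, fetch the source page whose outlinks cover the most not-yet-discovered new pages. This is a toy illustration, not the paper's actual algorithm or data; the hub pages and link sets below are invented:

```python
def greedy_discovery(sources, budget):
    """Greedy max-cover: repeatedly fetch the source whose outlinks
    reveal the most new pages not yet discovered."""
    discovered = set()
    fetches = []
    for _ in range(budget):
        best = max(sources, key=lambda s: len(sources[s] - discovered), default=None)
        if best is None or not (sources[best] - discovered):
            break  # no remaining source yields anything new
        discovered |= sources[best]
        fetches.append(best)
        del sources[best]
    return fetches, discovered

# Hypothetical hub pages mapped to the new pages they link to.
hubs = {
    "hub1": {"n1", "n2", "n3"},
    "hub2": {"n3", "n4"},
    "hub3": {"n5"},
}
order, found = greedy_discovery(dict(hubs), budget=2)
# Overhead, as defined in the abstract: fetches per new page discovered.
overhead = len(order) / len(found)  # 2 fetches for 4 new pages = 0.5
```

With perfect foreknowledge of which hubs link to new content, the greedy strategy covers most new pages in very few fetches, which is what makes the low-overhead figures in the abstract plausible.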
The Google Sandbox is a filter that was put in place in about March of 2004. New websites with new domain names can take 6 to 12 months to get decent rankings on Google. Some are reporting stays of up to 18 months. The Sandbox seems to affect nearly all new websites, placing them on probation. Similarly, websites that have undergone comprehensive redesigns have been caught up in this Sandbox. Does the Sandbox really exist, or is it just part of the Google algorithm? This has been a big controversy with many differing opinions. Most now believe that it is part of the algorithm. In either case, the Sandbox functions to keep new sites from shooting to the top of Google in just a few weeks and overtaking quality sites that have been around for many years. It appears to be an initiation period for new websites.
Google's increasing use of anti-spam features has meant that optimising websites for Google has become much harder. It's no longer just a case of opening your website's source files in Notepad, adding some keywords to your various HTML tags, uploading your files, and waiting for the results. In fact, in my opinion (and I'm sure others will agree with me), this type of optimisation, commonly referred to as on-page optimisation, will only ever be 20% effective at achieving rankings for any keywords that are even mildly competitive. Those of us who aced maths in school will know this leaves 80% unaccounted for.
Web databases do much more than passively store information. Part of their power comes from indexing records efficiently. An index serves as a map, identifying the precise location of a small piece of data in a much larger pile. For example, when I search for “web development,” Google identifies two hundred million results and displays the first ten—in a quarter of a second. But Google isn’t loading every one of those pages and scanning their contents when I perform my search: they’ve analyzed the pages ahead of time and matched my search terms against an index that only references the original content.
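The idea of matching a query against a precomputed index instead of rescanning every document can be sketched with a toy inverted index (the documents below are invented for illustration):

```python
from collections import defaultdict

# Toy document collection, analyzed ahead of time.
docs = {
    1: "web development with python",
    2: "mobile development patterns",
    3: "the web is vast",
}

# Build the inverted index: each term maps to the set of doc IDs containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """Return doc IDs containing every query term, without rescanning the docs."""
    term_sets = [index.get(term, set()) for term in query.split()]
    return set.intersection(*term_sets) if term_sets else set()

print(search("web development"))  # {1}
```

At query time, the work is a few set lookups and an intersection, independent of how long the documents are — which is why an engine can answer over a corpus of millions of pages in a fraction of a second.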
Search engines are one of the most important traffic drivers to sites these days, which is why Search Engine Optimization (SEO) is becoming more and more important. SEO is often thought to be just a set of technical tricks, and as a professional SEO, I confess to spending a lot of time with clients fixing technical issues. A site's structure, though, is just as important. Your site's structure determines whether a search engine understands what your site is about, and how easily it will find and index content relevant to your site's purpose and intent. By creating a good structure, you can take the content you've written that has attracted links from others and use your site's structure to spread some of that "linkjuice" to the other pages on your site.
Many web editors spend a lot of their time writing news stories for the company website. However, traffic analysis frequently reveals that this content is not very popular, and that users may in fact miss the key content they come to find (product data, addresses, etc.) because it's practically drowning in news stories.
In my last post, I argued that making content findable in search engines requires you to understand how your search engine algorithm ranks and sorts the content it indexes. Since Google is such an important search engine for content, including help content, I want to dive deeper into strategies for maximizing the visibility of help content on Google.
In my last post, I argued that navigation systems can’t be entirely discarded in favor of search, because navigation helps users discover the unknown unknown. But now that we’ve covered navigation systems a bit, it’s time to move on to search, because search is undoubtedly a major way that users navigate help content. How can you organize your content so that the topics are findable in search?
Within the realm of computational semantics, there is still a fairly broad disconnect between triple pair semantics, the use of RDF (or turtle notation) to create atomic assertions, and the realm of semantics as reflected on the web. I do not expect this to change much in 2009, save perhaps that the gulf between the two will likely just get wider.
Frank Lloyd Wright said that the two most important tools for an architect were the drafting pencil and the sledgehammer. Of the two, the pencil is the easier to use as well as the more effective. As it is with building design, so it is with designing websites and their discoverability by search engines, the tool used by a majority of users. The Web has become so vast and the search systems have become so sophisticated that retroactive optimization can be only marginally effective.
Acquiring and installing a search engine is just the beginning of creating an effective enterprise search system. John Ferrara walks us through strategies for addressing critical aspects of the user experience often overlooked or ignored.
The authors explore ways in which categories can be leveraged to improve search. An interface named SWISH is presented, in which search results are automatically categorized, and pages in the same category are grouped together.
The primary function of current Web search engines is essentially relevance ranking at the document level. However, myriad structured information about real-world objects is embedded in static Web pages and online Web databases. Document-level information retrieval can unfortunately lead to highly inaccurate relevance ranking in answering object-oriented queries. In this paper, we propose a paradigm shift to enable searching at the object level. In traditional information retrieval models, documents are taken as the retrieval units and the content of a document is considered reliable. However, this reliability assumption is no longer valid in the object retrieval context, where multiple copies of information about the same object typically exist. These copies may be inconsistent because of the diversity of Web site qualities and the limited performance of current information extraction techniques. If we simply combine the noisy and inaccurate attribute information extracted from different sources, we may not be able to achieve satisfactory retrieval performance. In this paper, we propose several language models for Web object retrieval, namely an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features. We test these models on a paper search engine and compare their performance. We conclude that the hybrid model is superior because it takes extraction errors at varying levels into account.
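One common way to realize such a hybrid model — not necessarily the paper's exact formulation — is to interpolate a score over the unstructured full text with scores over the extracted structured fields, down-weighting fields whose extraction is less reliable. A hedged sketch in Python; the scoring function, field names, and weights are all illustrative assumptions:

```python
def hybrid_score(query_terms, full_text, fields, field_weights, alpha=0.5):
    """Interpolate an unstructured text score with per-field structured scores.
    `field_weights` stand in for per-attribute extraction confidence (assumed)."""
    def term_overlap(terms, text):
        # Crude relevance proxy: fraction of words in `text` matching a query term.
        words = text.lower().split()
        if not words:
            return 0.0
        return sum(words.count(t) for t in terms) / len(words)

    unstructured = term_overlap(query_terms, full_text)
    structured = sum(
        w * term_overlap(query_terms, fields.get(name, ""))
        for name, w in field_weights.items()
    )
    return alpha * unstructured + (1 - alpha) * structured

# Hypothetical paper record whose attributes were extracted with
# different confidence (title extraction is reliable, venue is not).
score = hybrid_score(
    ["object", "retrieval"],
    "web object retrieval with language models",
    {"title": "object level retrieval", "venue": "unknown"},
    {"title": 0.9, "venue": 0.3},
)
```

Setting `alpha=1.0` recovers a purely unstructured model and `alpha=0.0` a purely structured one; the abstract's conclusion is that a mixture of the two, sensitive to extraction error, outperforms either extreme.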
Topical metadata have been used to indicate the subject of Web pages. They have been simultaneously hailed as building blocks of the semantic Web and derogated as spam. At this time major Web search engines avoid harvesting topical metadata. This paper suggests that the significance of the topical metadata controversy depends on the technological appropriateness of adding them to Web pages. This paper surveys Web technology with an eye to assessing the appropriateness of Web pages as hosts for topical metadata. The survey reveals Web pages to be both transient and volatile: poor hosts of topical metadata. The closed Web is considered to be a more supportive environment for the use of topical metadata. The closed Web is built on communities of trust where the structure and meaning of Web pages can be anticipated. The vast majority of Web pages, however, exist in the open Web, an environment that challenges the application of legacy information retrieval concepts and methods.