Processing Content Is Easy, Right?
September 30, 2014
A mobile search app would be useful to, and appreciated by, mobile device users. According to the URX Blog post “Deduplication Of Web Content,” creating a search app is relatively easy; creating a robust one is the challenge. A robust search app needs link prioritization, feature extraction, re-crawl estimation, and content deduplication. The post is the first in a series of articles on developing a mobile search app.
Deduplicating content is important for user experience:
“Duplicate pages in a search index poison search results. The goal of a search engine is to return both relevant and diverse documents, allowing users to decide the optimal resolution for a query. Without deduplication, the top-k results returned for a user’s query would likely contain duplicate content. In the extreme, all k results will be copies of the same page. This creates a bad user experience where, as the crawler scales out, the duplicate likelihood increases. In fact, Google’s Matt Cutts believes that up to 20% of web content is duplicated.”
The rest of the post examines the different types of duplication, how to identify them, and how to remove them from search results.
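To make the idea of identifying and removing duplicates concrete, here is a minimal sketch of one common approach: comparing pages by the overlap of their word shingles (Jaccard similarity). This is not necessarily the method the URX post uses. The function names, the shingle size of 3, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0  # two empty documents are treated as identical
    return len(a & b) / len(a | b)


def deduplicate(pages, threshold=0.8, k=3):
    """Keep the first page of each near-duplicate group, preserving order.

    threshold and k are illustrative values, not tuned recommendations.
    """
    kept = []
    kept_shingles = []
    for page in pages:
        s = shingles(page, k)
        # Keep the page only if it is not too similar to any page already kept.
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(page)
            kept_shingles.append(s)
    return kept
```

Calling deduplicate on a list of page texts returns the list with near-copies dropped. The pairwise comparison here grows quadratically with the number of pages, so a real crawler would likely need a hashing scheme that avoids comparing every pair.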
While the search app will serve an important function, I do not understand why people cannot just open a Web browser on a mobile device and run a regular search. What I would like to see is an app that searches the content inside the apps on a device.
Whitney Grace, September 30, 2014
Sponsored by ArnoldIT.com, developer of Augmentext