New Look for Internet Archive

October 29, 2014

The Internet Archive has a new look. You may have seen the change, but I don’t visit the site too frequently. I struggle with its search system.

The new look features many postage stamp graphics and some text. Click on a graphic and one is sent to the appropriate Archive page.

Here’s a screenshot of the content available to you.

image

How does one search this content? The search box returns a list of hits with an icon indicating the content type. Have the cheerleaders for unified search cracked the information access challenge of using a single search box to access mixed content types? I am still a fan of one at a time searching. Inefficient, but I get a sense of the collection’s scope and the idiosyncrasies of the indexed information.

Searching today is more difficult than it was in 1980 in my opinion. The method required is to know what is in a collection before one queries it.

How does one know what’s in each of these collections? Well, unfortunately you can no longer ask a librarian in many organizations.

You are on your own, pilgrim.

Stephen E Arnold, October 29, 2014

Visualizing Data: Often Baffling

October 29, 2014

I read “72 Hours of #Gamergate.” I don’t follow the high buck world of video games. The write up contains oodles of data. Some of the information is in the form of bar charts. Other information is presented in words, spreadsheets, and graphics. I am okay with the bar charts. These have labels and numbers on the x and y axes. The visualization shown below baffles me:

image

The image adds graphic impact. I have been in briefings in which senior executives and military brass have presented similar visualizations.

I suppose clarity is less important than sizzle. Analytics vendors, are you listening? I think not if this graphic is any indication of the way data are presented.

Stephen E Arnold, October 29, 2014

EasyAsk Adds Glitter to Oya Costumes

October 29, 2014

I learned that Oya Costumes has tapped EasyAsk to provide the search function for www.oyacostumes.com. You can read the news release here. I clicked around using drop downs and facets. I did run a query to locate a suitable Harrod’s Creek Halloween costume. I searched for Darth Vader. The results were mostly on point. There was one anomaly, an inflatable purple suit. Perhaps Darth has a side few know about.

Here’s the result page for my query:

image

Here’s a close up of the purple outfit mapped to the query “Darth Vader.”

image

I quite like the inflatable purple suit. I assume it is semantically related to Mr. Vader.

Stephen E Arnold, October 29, 2014

The Cheating Search Engine

October 29, 2014

Spokeo is a people search engine: enter a name, email address, or username, and it returns personal information. The results usually include an address, phone number, other email addresses, or aliases. People searches are the digital equivalent of a phone book’s white pages, except that it is easier to manipulate the information for the desired output. The results, however, are limited by how much you are willing to pay.

Most of the time people search engine advertisements are banal, but Spokeo has tried a new approach. When you visit the Web site, you are greeted by the following caption: “Is He Cheating On You?” This warning follows it:

“CAUTION: This information is potentially shocking. Spokeo uses proprietary deep web technology to search over 70 social networks for status updates, photos, relationships, and profiles. Please prepare yourself for the unexpected.”

A picture of a couple caught in a scandalous position flanks the search box. Spokeo is trying to appeal to an entirely new clientele, perhaps the kind who click on advertisements about learning a new language in a week or melting away the pounds with a new, exclusive diet pill offer. The search results include dating profiles, social media accounts, aliases, hidden photos, etc. The type of information you

While Spokeo was not the first people search engine of choice, it does provide basic information about individuals. This new advertising campaign, however, pushes it into the lowbrow Internet and makes its content questionable. Why the sudden change in marketing? Is Spokeo seeing a revenue drop, or has it seen a spike in profits when used as a cheating search engine?

Whitney Grace, October 29, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

The Failings of Google Authorship

October 29, 2014

A couple years ago, Google began pushing its Authorship markup program—its plan to verify the authorship of items in its search results and to supply author photos alongside verified entries. The idea, of course, was to convey trust in articles’ sources. Now, though, the initiative seems to be dead, and blogger David Leonhardt gives us “Google Authorship- the 3 Reasons Why It Failed.” Apparently, the average searcher was not persuaded to trust a source because of its writer’s smiling visage. In fact, says Leonhardt, those photos seemed to deter some searchers. He writes:

“Hindsight is 20/20 vision, so let’s put on our hindsight goggles and review the three reasons.

1. Trust and authority differ for different types of searches.

2. People trust institutions more than strangers.

3. People select between news and opinion.”

The post elaborates on each point. For example, Leonhardt identifies the three types of searching: for a purchase, for entertainment, and for information; each of these suggests different criteria for “authority.” He also observes that people looking for opinion seem to be swayed by seeing a trusted journalist’s face, but those looking for hard facts tend to click on entries sporting a news organization’s logo. See the write-up for more on these reasons behind Authorship’s downfall.

Could the authorship concept be saved, or revived? The post speculates:

“If Google can harness this understanding of what ‘authority’ means for various searches and flag individual author expertise and institutional expertise accordingly, it might still be able to help people find the most trusted authorities for a given search. Or here’s a novel idea: Google could do what it is already doing: trying to float the most trustworthy authoritative pages to the top of its results, where people tend to click through the most anyway. The face, or the logo, would not give the entry authority – it’s ranking would (and does).”

So, problem solved—there is no problem.

Cynthia Murrell, October 29, 2014

Attensity Ups Its Presence in Hackathons

October 28, 2014

I found the Attensity blog post “Attensity Takes Utah Tech Week” quite interesting. I cannot recall when mainstream content processing companies embraced hackathons so fiercely.

The blog post explains:

A hackathon, for the uninitiated, is exactly what it sounds like: a hybrid of computer hacking and a marathon in a grueling, caffeine-fueled, 12-hour time period. Groups comprised of mostly engineers and IT whizzes compete against the clock and other teams to create a project to present at the end of the day to a panel of judges.

What did Attensity’s engineers build to showcase the company’s sentiment analysis and analytics technologies? Here’s the Attensity description:

With the Twitter API up and running, Team Attensity used Raspberry Pi to process tweets using #obama and #utahtechweek. Simultaneously, the team used Arduino to code sentiments from the tweets using a red light for negative sentiments, blue for positive sentiments, and yellow for neutral sentiments.
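
For the curious, the color scheme in that description can be sketched in a few lines of Python. This is not Attensity’s code. I assume the sentiment labels arrive from some upstream classifier, the names `LED_COLORS`, `led_color_for`, and `colors_for_tweets` are hypothetical, and no Raspberry Pi or Arduino hardware is involved. The sketch shows only the red, blue, and yellow lookup.

```python
# Hypothetical sketch of the color mapping described in the Attensity post.
# The labels are assumed to come from a sentiment classifier not shown here.
LED_COLORS = {
    "negative": "red",
    "positive": "blue",
    "neutral": "yellow",
}


def led_color_for(sentiment: str) -> str:
    """Return the LED color for a sentiment label, ignoring case and padding."""
    key = sentiment.strip().lower()
    if key not in LED_COLORS:
        raise ValueError(f"unknown sentiment label: {sentiment!r}")
    return LED_COLORS[key]


def colors_for_tweets(labeled_tweets):
    """Pair each tweet text with the LED color for its sentiment label."""
    return [(text, led_color_for(label)) for text, label in labeled_tweets]
```

Feeding the function a list of (tweet text, label) pairs returns the same tweets paired with a color. Driving real lights would need hardware-specific code that the post does not describe.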

Attensity was pleased with the outcome in Utah. More hackathons are in the firm’s future. I wonder if one can deploy IBM Watson using a Raspberry Pi or showcase HP Autonomy with an Arduino.

How will hackathons generate revenue? I am not sure. The effort seems like a cost hole to me.

Stephen E Arnold, October 28, 2014

Google Not Accurate in Search Results

October 28, 2014

I read “Is Google Responsible for Delivering Accurate – And Truthful – Search Results?” The main idea of the write up is that at least one person perceives Google’s search results as “riddled with lies, deception, even criminal intent.”

Google?

My goodness. A locksmith is taking Google to court over his allegations about Google’s search results. The aggrieved locksmith is quoted as saying:

“People think this search engine stuff is accurate. A lot of times it isn’t.”

Who knew.

The article reports:

It’s not just locksmiths, either, said Baldino. His belief is that listings for many occupations are contaminated with links to the unqualified and the unlicensed, because apparently Google acquires many of its listing from third parties including bogus locksmiths who create misleading web pages. Remember this because it may be key to how the law plays out. Nonetheless, it would be easy, Baldino added in the interview, for Google to double-check its listings. For locksmiths, he said, the company could cross-reference listings with state records of locksmith licenses. No license in the state database, no listing in Google; it’s that simple, he [the locksmith] said.

Double my goodness.
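
The locksmith’s “no license, no listing” rule is simple to express. Here is a minimal Python sketch under loud assumptions: each listing is a dictionary with a `license` field, the state records are a plain list of license numbers, and the names `normalize` and `licensed_listings` are mine. Nothing here reflects how Google or any state database actually works.

```python
# Hypothetical sketch of "no license in the state database, no listing."
# The data shapes (dicts with a "license" key, a list of license numbers)
# are assumptions made for illustration only.

def normalize(license_id: str) -> str:
    """Trim whitespace and uppercase so 'ky-123 ' matches 'KY-123'."""
    return license_id.strip().upper()


def licensed_listings(listings, state_licenses):
    """Keep only listings whose license number appears in the state records."""
    valid = {normalize(number) for number in state_licenses}
    return [
        listing
        for listing in listings
        if listing.get("license") and normalize(listing["license"]) in valid
    ]
```

A listing with no license number, or with a number the state does not recognize, is dropped. Whether real state records are clean enough for an exact match like this is another question.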

Stephen E Arnold, October 28, 2014

Predictive Analytics: Trouble Ahead?

October 28, 2014

I learned about a new book that will be available in early 2015. Its title is The Black Box Society: The Secret Algorithms That Control Money and Information. The author is Frank Pasquale, a professor of law at the University of Maryland.

The Harvard promotional Web site for the book asserts:

Hidden algorithms can make (or ruin) reputations, decide the destiny of entrepreneurs, or even devastate an entire economy. Shrouded in secrecy and complexity, decisions at major Silicon Valley and Wall Street firms were long assumed to be neutral and technical. But leaks, whistleblowers, and legal disputes have shed new light on automated judgment. Self-serving and reckless behavior is surprisingly common, and easy to hide in code protected by legal and real secrecy. Even after billions of dollars of fines have been levied, underfunded regulators may have only scratched the surface of this troubling behavior.

The Institute for Ethics and Emerging Technologies mentioned the forthcoming book here. One of the comments about that post was interesting to me. TooManyJoes wrote:

The control of the results by the decision makers is what makes this future menacing. Right now, Google is under attack being too good at search prediction and making money on targeted advertisements whose brilliantly written algorithms allow such a sophisticated variety of information to be indexed. As a result search bubbles have formed, and a lack of statistics comprehension prevents the awareness of control over this new medium. Snake oil salesmen turned into Mad Men and psychiatrists, it’s the medium of internet based controlled by one snake oil salesman that frightens us all. I believe it’s not possible without a formal computational human algorithm to have enough of an impact to have widespread influence. I bring up these mediums because to engage in them is to participate, participation can be tracked, then imagine the expense of the things we have access to because free participation drives those products and services by up selling those products. Without education, which most people won’t be open to, and time for the common man to analyze the data…those in control of the data will be people delegated by others. Welcome to the age of transparency.

The Google reference may presage some discussion of the company’s predictive wizardry.

Stephen E Arnold, October 28, 2014

Hewlett Packard Autonomy Pushes into Hungary

October 28, 2014

I read “Stratis Partners Up with HP Autonomy.” Stratis is a services firm with about 60 employees. According to the article:

Hungarian Stratis Vezetői és Informatikai Tanácsadó Kft. entered a partnership with Hewlett Packard Autonomy, thereby becoming part of the Big Data market, Stratis announced…

IDOL and the Dynamic Reasoning Engine can “do” Big Data, but the core system does information retrieval. There are other approaches to Big Data that use more modern technologies.

Some content processing vendors are showing more interest in what I call the Eastern European market. With HP looking to sell some of its China assets, the shift to Europe may be one way of growing revenues.

HP, like IBM, has its hands full. Forget the legal hassles; both companies are trying to get out of the buggy whip business, to reference a famous marketing myopia case.

The problem is that new-fangled businesses, run through the old-school IBM- and HP-type business model, will produce revenue, but that revenue will be less lucrative than the money made on mainframes and scientific equipment in the good old days.

HP will need to find dozens of Hungarian-type deals to allow Autonomy to pay back its new owner.

Stephen E Arnold, October 28, 2014

A New Partnership For Better Organization

October 28, 2014

Partnerships offer companies ways to improve the quality of their products and create new ones. Semantic Web reports that “Expert System And WAND Partner For A More Effective Management Of Enterprise Information.” Expert System is a leading semantic technology company and WAND is known for its enterprise taxonomies. Their new partnership will give businesses a better and more accurate way to organize data.

Each company brings unique features to the partnership:

“The combination of the strengths of each company, on one side WAND’s unique expertise in the development of enterprise taxonomies and Expert System’s Cogito on the other side with its unique capability to analyze written text based on the comprehension of the meaning of each word, not only ensures the highest quality possible, but also opens up the opportunity to tackle the complexity of enterprise information management. With this new joint offer, companies will finally have full support for a faster and flexible information management process and immediate access to strategic information.”

Enterprise management teams are going to get excited about how Expert System and WAND will improve taxonomy selection and offer more native integration with in-place data systems. One of the ways the two will combine their strengths is with the new automatic classification: when a WAND taxonomy is selected, Expert System brings in its semantic-based categorization rules and an engine for automatic categorization.

Whitney Grace, October 28, 2014
