Disappearing Content: You Cannot Search for It if It Is Not There

November 25, 2014

The issue of shaped and filtered content is becoming more and more of a mainstream topic. I read “Uber Removed Blog Post from Data Science Team That Examined Link between Prostitution and Rides.” The world’s oldest profession meets the world’s newest ride service. I noted this passage in the write up:

“The company examined its rider data, sorting it for anyone who took an Uber between 10 p.m. and 4 a.m. on a Friday or Saturday night. Then it looked at how many of those same people took another ride about four to six hours later – from at or near the previous nights’ drop-off point. Yes, Uber can and does track one-night stands. Consider it the Uber equivalent of the walk of shame.”
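The filtering described in that passage can be sketched in a few lines. This is a minimal illustration, not Uber's code: the `Ride` record, the `flag_riders` function, the sample data, and the 0.5 km "at or near" radius are all assumptions made for the example, and the four-to-six-hour gap is measured between pickup times for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt
from typing import Tuple


@dataclass(frozen=True)
class Ride:
    # Hypothetical record layout; the real schema is not described in the source.
    rider_id: str
    pickup_time: datetime
    pickup: Tuple[float, float]   # (latitude, longitude)
    dropoff: Tuple[float, float]  # (latitude, longitude)


def is_weekend_night(t: datetime) -> bool:
    """True for 10 p.m. to 4 a.m. on a Friday or Saturday night."""
    if t.hour >= 22:
        return t.weekday() in (4, 5)  # Friday or Saturday evening
    if t.hour < 4:
        return t.weekday() in (5, 6)  # early Saturday or Sunday morning
    return False


def km_between(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in kilometers (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def flag_riders(rides, max_km: float = 0.5) -> set:
    """Return rider ids with a weekend-night ride followed 4 to 6 hours
    later by a pickup near the earlier drop-off point."""
    by_rider = {}
    for ride in rides:
        by_rider.setdefault(ride.rider_id, []).append(ride)

    flagged = set()
    for rider_id, trips in by_rider.items():
        trips.sort(key=lambda r: r.pickup_time)
        for i, first in enumerate(trips):
            if not is_weekend_night(first.pickup_time):
                continue
            for later in trips[i + 1:]:
                gap = later.pickup_time - first.pickup_time
                near = km_between(first.dropoff, later.pickup) <= max_km
                if timedelta(hours=4) <= gap <= timedelta(hours=6) and near:
                    flagged.add(rider_id)
    return flagged


# Made-up sample data: November 21, 2014 was a Friday.
sample = [
    Ride("rider_a", datetime(2014, 11, 21, 23, 30), (40.7500, -73.9800), (40.7300, -73.9900)),
    Ride("rider_a", datetime(2014, 11, 22, 4, 30), (40.7302, -73.9901), (40.7500, -73.9800)),
    Ride("rider_b", datetime(2014, 11, 25, 23, 0), (40.7500, -73.9800), (40.7300, -73.9900)),
    Ride("rider_b", datetime(2014, 11, 26, 4, 0), (40.7300, -73.9900), (40.7500, -73.9800)),
    Ride("rider_c", datetime(2014, 11, 21, 22, 15), (40.7500, -73.9800), (40.7300, -73.9900)),
    Ride("rider_c", datetime(2014, 11, 22, 3, 15), (40.8000, -73.9500), (40.7500, -73.9800)),
]
print(sorted(flag_riders(sample)))
```

The point of the sketch is how little is needed: a timestamp, two coordinates, and a rider id are enough to support the inference the write up describes.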

How will this corporate approach to content play out? My hunch is that content has been getting removed for a long time. I recall looking for information about the Spyglass browser decades ago and finding a 404 error.

At least today there are copies of the Web, caches on public systems, and people who store content on their drives. Nevertheless, most people cannot search for content that is not “there.” Has anyone looked for CMS information about the original MIC, RAC, and ZPIC contractors? My hunch is that more attention should be paid to content that goes missing, not because of its prurient nature, but because the disappearance of content provides very useful information about the behavior of people, systems, and organizations.

Stephen E Arnold, November 25, 2014

Elasticsearch Ups the Pressure on LucidWorks (Really?)

November 23, 2014

I am not too keen on videos. I prefer reading hard copies. I did find the video referenced in “Elasticsearch Uses Power of Community for Open Source Analytics” useful. My team and I are putting the finishing touches on a report that points out how enterprise search vendors have been leapfrogged by vendors rarely considered by mid tier consultants and the self appointed experts in search. The video drives home a simple point: Combining open source technologies delivers information access functions that are more useful to users than laundry lists, odd ball point and click suggested content, and confusing mash ups of information presented without context.

Why the reference to Lucid? One of the firm’s presidents had been involved with Jaspersoft, an open source analytics outfit. Despite this “inside track”, Elasticsearch has powered past Lucid, leaving that open source vendor struggling to reach parity with Elasticsearch. Elasticsearch itself faces challenges, but that’s the name of the game when keyword search is the keystone of a service. For now, Elasticsearch leaves competitors rushing to close the gap. By the way, this subject was the focal point of one of Dave Schubmehl’s IDC reports that surfed on my name. The juicy part about the “gap” was edited from my original write up. Nevertheless, the facts remain valid. Kudos to Elasticsearch.

Stephen E Arnold, November 23, 2014

Want to Lose Weight? Google Results Could Make You Fat

November 18, 2014

I never know what to make of articles that report research results. Mistakes that would not fly in a Stats 101 class are the norm. I did work through “Poor-Quality Weight Loss Advice Often Appears First in an Online Search.”

Here’s the passage that I highlighted with my new bright pink marker:

The study reveals that the first page of results, using a search engine like Google, is likely to display less reliable sites instead of more comprehensive, high-quality sites, and includes sponsored content that makes unrealistic weight loss promises.

I would not be surprised if there were a Federal grant boosting this ground breaking, never before thought of, issue.

I find the results presented by advertising supported search engines incredibly useful, relevant, and on point. The notion that one might have to use a system other than Bing or Google to get accurate information is a new thought.

I liked this bit about the timeliness and rigor of the research too:

In 2012, the researchers accessed 103 websites for queries specific to weight loss and scored the content on its adherence to available evidence-based guidelines for weight loss. Medical, government and university sites ranked highest, along with blogs.

Yes, blogs and governmental entities are fonts of accurate information. With data from a mind boggling 103 Web sites to evaluate, I am amazed at the speed with which the information found its way to an online publication.

Stephen E Arnold, November 18, 2014

Sinequa Launches U.S. Subsidiary

November 13, 2014

Another French company is setting up shop in America. Business Wire reports, “Sinequa to Open New York Subsidiary Following Continued Success in U.S. Market.” The search and analytics company has found so much prosperity here they’ve decided to make it official. The write-up tells us:

“The creation of the U.S. Corporation and the New York office is a necessary next step as Sinequa has recently announced major client wins in the U.S including AstraZeneca, Biogen Idec, Mercer, and several ongoing RFPs. Sinequa is projecting that the largest part of the company’s future growth will come from the U.S., and they are also expecting to extend their presence there as well.

“The first employee of Sinequa Corp. is Nicolas Brel, who brings his deep experience with Sinequa to New York as the company’s Solutions Architect. The hiring process for a new sales manager is well advanced, and Xavier Pornain, VP Sales & Alliances, will soon relocate to the U.S. and assume the role of country manager and head of Sinequa Corp.”

Will Sinequa outperform Dassault (Exalead), Antidot, or PolySpot? The French market appears to be problematic. The U.S. market, on the other hand, seems to be a slam dunk compared to demand for proprietary search in the healthy European countries. Interesting.

Launched in 2002, Sinequa serves hundreds of organizations around the world. Naturally, the company boasts strong business analytics across multiple data sources and types. Sinequa also emphasizes an intuitive interface and strong visualization tools. Based in Paris, the firm maintains offices in Frankfurt and London as well as their shiny new location in New York City.

Cynthia Murrell, November 13, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Will Two Xooglers Burnish Yahoo?

November 12, 2014

I read an exclusive story. Know how I know the story is “exclusive”? Here’s the title:

Exclusive: Some Unhappy Yahoo Investors Asking AOL for Rescue

Obviously you have to read the foundation’s exclusive. I want to focus on a different question: Can two former Google executives repair Yahoo’s revenues? I am less than optimistic. I used an illustration in one of the briefings I did during the era of Terry Semel. The picture featured a sinking ship with Mr. Semel’s face Photoshopped into a captain’s uniform.

As I pointed out years ago, once an Internet portal service loses its momentum, flat-lining is the upside. The downside is a slow, gentle drift into irrelevance. So the answer to the question, in my opinion, is, “Long shot.”

I like to recall Yahoo’s former chief technology officer railing at me on a conference call about Yahoo’s super-advanced search technology. How is that working out?

Stephen E Arnold, November 12, 2014

Disney and Search

November 10, 2014

I won’t bore you with the Disney InfoSeek adventure. Sigh. If you want to know how Disney is approaching Web search, read “Disney Fights Piracy With New Search Patent.” The system and method is intended to filter out content not licensed by means known to Disney. The write up’s headline suggests that a system and method in the form of a patent will “fight piracy.” Interesting notion, but I think the idea is that Disney has built or will build a system that shows only “official” content.

The notion of building a specialist Web site is an interesting idea. The reality may be that traffic will be very hard to come by. The most recent evidence is Axel Springer’s capitulation to the Google. Axel Springer owns a chunk of Qwant. Again a good idea, but it does not deliver traffic.

If you build a search engine, who will use it? Answer: Not too many people if the data available to me are correct.

Stephen E Arnold, November 10, 2014

POTUS, Fear, and Google

November 8, 2014

I have zero clue if this article, “Movie Chief: Obama Is Scared to Push Google, ISPs on Piracy,” is accurate. Let’s for the moment assume that the write up by Andy is right as rain.

Here’s a statement I noted:

“It’s sad because if we had a good president that cared about the film industry he would pass a very simple law, an anti-piracy law, but they don’t want to stop it because they are scared of Google, and he’s scared of all the ISPs,” Lerner says. Google’s power and money not only scares off the President but Congress too, Lerner adds. Furthermore, plenty of that revenue is coming piracy-related sources, so the company has no incentive stop it.

Let’s look at the entities in this article.

  • The president of the United States or POTUS
  • Nu Image CEO and founder Avi Lerner
  • The GOOG.

As I understand it, Google, which worked out a friendly deal with Axel Springer the other day, is just as cuddly as a child chewed Harrods teddy bear. The POTUS is able to send troops, issue Executive Orders, and disrupt traffic when he ventures out into the amber waves of grain. (Is there “grain” in LA?) Mr. Lerner is a movie mogul. I am not sure what a movie mogul does. I think it involves creating high value intellectual property which puts Shakespeare and Milton in a state of inferiority.

The point is that movie moguls and POTUS are not as powerful as Google.

From Google’s point of view, that’s the way life is supposed to work. Problems with that, pilgrim? Well, you can always take your queries to Yahoo or, better yet, Qwanza OR Qwanta, whatever. (Try typing that name rapidly on your iPhone.)

Keep in mind that the source write up may not be spot on. It is entertaining, though.

Stephen E Arnold, November 8, 2014

Attensity: Downplaying Automated Collection and Analysis

November 7, 2014

I read “Do What I Mean, Not What I Say: The Text Analytics Paradox.” The write up made a comment which I found interesting; to wit:

Now, before you start worrying about robots replacing humans (relax—that’s at least a couple of years away), understand this: context and disambiguation within these billions of daily social posts, tweets, comments, and online surveys is they key to viable, business-relevant data. The way human use language is replete with nuance, idiomatic expressions, slang, typos, and of course, context. This underscores the magnitude of surfacing actionable intelligence in data for any industry.

Based on information my research team has collected, the notion of threat detection via automated collection and analysis of Internet-accessible information is quite advanced. In fact, some of the technology has been undergoing continuous refinement since the late 1990s. Rutgers University has been one of the academic outfits in the forefront of this approach to the paradox puzzling Attensity.

The more recent entrants in this important branch (perhaps a new redwood in the search forest) of information access are keeping a low profile. There is a promising venture funded company in Baltimore as well as a China-based firm operating from offices in Hong Kong. Neither of these companies has captured the imagination of traditional content processing vendors for three reasons:

First, the approach is not from traditional information retrieval methodologies.

Second, the companies generate most of their revenue from organizations demanding “quiet service.” (This means that when there is no marketing hoo hah, the most interesting companies are simply not visible to the casual, MBA inspired analyst.)

Third, the outputs are of stunning utility. Information about quite particular subjects is presented without recourse to traditional human intermediated fiddling.

I want to float an idea: The next generation firms are delivering state of the art solutions and have yet to hit the wall that requires the type of marketing that now characterizes some content processing efforts.

I am trying to figure out how to present these important but little known players. I will write about one in my next Info Today article. The challenge is that there are two dozen firms pushing “search” in a new and productive direction.

Stephen E Arnold, November 7, 2014

Insights from Search Pro Dave Hawking

November 7, 2014

Search-technology expert Dave Hawking is now working with Microsoft to improve Bing. Our own Stephen Arnold spoke to Mr. Hawking when he was still helping propel Funnelback to great heights. Now, IDM Magazine interviews the search wizard about his new gig, some search history, and challenges currently facing enterprise search in, “To Bing and Beyond.”

Anyone interested in the future of Bing, Microsoft, or enterprise search, or in Australian computer-science history, should check out the article. I was interested in this bit Hawking had to say about ways that tangled repository access can affect enterprise search:

“Access controls for particular repositories are often out of date, inappropriate, and inconsistent, and deployment of enterprise search exposes these problems. They can arise from organisational restructuring, staff changes or knee-jerk responses to unauthorised accesses. As there are usually a large number of repositories, rationalising access controls to ensure that search results respect policies is a lot of work.

“Organisations vary widely in their approach to security: some want security enforced with early binding (recording permissions at indexing time), others want late binding, where current permissions are applied when query result are displayed, or a hybrid of the two.

“This choice has a major impact on performance. Another option is ‘translucency’, where users may see the title of a document but not its content, or receive an indication that documents matching the query exist but that they need to request permission to access them. As well these security model variations, organisations vary in their requirements for customization, integration and presentation, and how results from multiple repositories should be prioritized, tending to make enterprise search projects quite complex.”
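The three security models Hawking mentions can be contrasted with a toy example. This is a minimal sketch under stated assumptions, not how Funnelback, Bing, or any other product implements security trimming: the `Doc` record, the `current_acl` dictionary standing in for a live permission store, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    title: str
    text: str
    acl_at_index_time: frozenset  # permissions recorded when the doc was indexed


def matches(docs, term):
    """Naive keyword match standing in for a real index lookup."""
    return [d for d in docs if term.lower() in d.text.lower()]


def early_binding(docs, term, user):
    """Filter with the permissions captured at indexing time."""
    return [d.doc_id for d in matches(docs, term) if user in d.acl_at_index_time]


def late_binding(docs, term, user, current_acl):
    """Filter with the permissions in force when results are displayed."""
    return [d.doc_id for d in matches(docs, term)
            if user in current_acl.get(d.doc_id, frozenset())]


def translucent(docs, term, user, current_acl):
    """Show every matching title, but withhold content the user may not read."""
    results = []
    for d in matches(docs, term):
        allowed = user in current_acl.get(d.doc_id, frozenset())
        results.append((d.title, d.text if allowed else None))
    return results


# Made-up data: alice could read the memo when it was indexed,
# but her access was revoked afterwards.
index = [Doc("memo-1", "Reorg memo", "Draft restructuring plan",
             frozenset({"alice", "bob"}))]
current_acl = {"memo-1": frozenset({"bob"})}

print(early_binding(index, "restructuring", "alice"))
print(late_binding(index, "restructuring", "alice", current_acl))
print(translucent(index, "restructuring", "alice", current_acl))
```

In this sketch early binding needs no extra lookup at query time but can serve stale permissions, while late binding checks the live store for every hit. That per-result check is one plausible source of the performance impact the quote refers to.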

Eventually, standards and best practices may spread that will reduce these complexities. Then again, perhaps technology now changes too fast for such guidelines to take root. For now, at least, experts who can skillfully navigate this obstacle-strewn field will continue to command a pretty penny.

Cynthia Murrell, November 07, 2014

IBM Watson Has a Tough Question to Answer

November 6, 2014

In a sense, the erosion of a well-known company is interesting to watch. Some might use the word “bittersweet.” IBM has been struggling. Its “profits” come from stock buybacks, reductions in force, cost cutting, and divestitures. Coincident with the company’s quarterly financial reports, I heard two messages.

  1. We are not going to hit the 2015 targets we said we were going to hit.
  2. IBM paid another company money to “acquire” one of IBM’s semiconductor units.

I may have these facts wrong, but what’s important is that the messaging about IBM’s strategic health sends signals which I find troubling. IBM is a big company, and it will take time for its ultimate trajectory to be discernable. But from my vantage point in rural Kentucky, IBM has its work cut out for its thousands of professionals.

I read “Does Watson Know the Answer to IBM’s Woes?” Compared to other Technology Review write ups about IBM’s projected $10 billion revenue juggernaut, the article finally suggests that IBM’s Watson may not be the unit that produces billions in new revenue.

Here’s a passage I highlighted with my trusty yellow marker:

Watson is still a work in progress. Some companies and researchers testing Watson systems have reported difficulties in adapting the technology to work with their data sets. IBM’s CEO, Virginia Rometty, said in October last year that she expects Watson to bring in $10 billion in annual revenue in 10 years, even though that figure then stood at around $100 million.

Let’s consider this $100 million number. If it is accurate, Watson is now one eighth the size of Autonomy when HP paid $11 billion for the company. It took Autonomy more than 14 years to hit this figure. In order to produce $800 million in revenue, Autonomy had to invest, license, and acquire technology and businesses. In total, Autonomy was more like an information processing holding company, not a company built on a one trick pony like Google’s search and advertising technology. Autonomy’s revenue was diversified for one good reason: It has been very difficult to build multi billion dollar businesses on basic search and retrieval. Google hit $60 billion because it hooked search to advertising. Autonomy generated seven times more revenue than Endeca because it was diversified. Endeca never broke out of three main product lines: ecommerce, search, and business intelligence. And Endeca never generated more than an estimated $160 million in revenue per year at the time of its sale to Oracle. Even Google’s search appliance fell short of Autonomy’s revenues. Now IBM wants to generate more money from search than Autonomy and in one third the time. Perhaps IBM could emulate Mike Lynch’s business approach, but even then this seems like a bridge too far. (This is a more gentle way of saying, “Not possible in 60 months.”)

It is very difficult to generate billions of dollars from search without some amazing luck and an angle.

If IBM has $100 million in revenue, how will the company generate $1 billion and then an additional $9 billion? The PR razzle dazzle that has involved TV game shows, recipes with tamarind, and an all out assault on main stream media about Watson has been impressive. In search, $100 million is a pretty good achievement. But $100 million does not beget $1 billion without some significant breakthroughs in marketing, technology, must have applications, and outstanding management.
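A back-of-the-envelope calculation shows the size of the gap. The sketch below uses only the figures quoted above ($100 million today, $10 billion as the goal, $800 million for Autonomy) and assumes smooth compound growth, which is a simplification. The function name `required_annual_growth` is invented for the example.

```python
def required_annual_growth(start: float, target: float, years: float) -> float:
    """Compound annual growth rate needed to go from start to target."""
    return (target / start) ** (1 / years) - 1


watson_now = 100e6         # revenue figure cited in the Technology Review passage
watson_goal = 10e9         # Rometty's stated 10-year goal
autonomy_revenue = 800e6   # Autonomy's revenue as given in this post

share_of_autonomy = watson_now / autonomy_revenue
growth_10_years = required_annual_growth(watson_now, watson_goal, 10)
growth_5_years = required_annual_growth(watson_now, watson_goal, 5)

print(f"Watson as a share of Autonomy's revenue: {share_of_autonomy:.3f}")
print(f"Annual growth needed over 10 years: {growth_10_years:.1%}")
print(f"Annual growth needed over 5 years: {growth_5_years:.1%}")
```

Under these assumptions Watson would have to grow revenue by roughly 58 percent every year for a decade, or roughly 151 percent a year to get there in 60 months.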

From my point of view, Technology Review and other high profile “real” news outfits have parroted the IBM story about Watson, artificial intelligence, and curing cancer. To IBM’s credit, it has refrained from trying to cure death. Google has this task in hand.

The story includes a modest but refreshing statement about the improbability of Watson’s financial goal:

“It’s not taking off as quickly as they would like,” says Robert Austin, a professor of management at Copenhagen Business School who has studied IBM’s strategy over the years. “This is one of those areas where turning demos into real business value depends on the devils in the details. I think there’s a bold new world coming, but not as fast as some people think.”

As the story points out, “Watson is still a work in progress.”

Hey, no kidding?

Stephen E Arnold, November 6, 2014
