The Google: A Real Newspaper Discovers Modern Research

December 4, 2016

I read “Google, Democracy and the Truth about Internet Search.” One more example of a person who thinks he or she is an excellent information hunter and gatherer. Let’s be candid. A hunter gatherer flailing away for 15 or so years using online research tools, libraries, and conversations with actual humans should be able to differentiate a bunny rabbit from a female wolf with baby wolves at her feet.

Natural selection works differently in the hunting and gathering world of online. The intrepid knowledge warrior can make basic mistakes, use assumptions without consequence, and accept whatever a FREE online service delivers. No natural selection operates.

image

A “real” journalist discovers the basics of online search’s power. Great insight, just 50 years from the time online search became available to this moment of insight in December 2016. Slow on the trigger or just clueless?

That’s scary. When the 21st century hunter gatherer seems to have a moment of inspiration and realizes that online services, particularly ad supported free services, crank out baloney, it’s frightening. The write up makes clear that a “real” journalist seems to have figured out that online outputs are not exactly the same as sitting at a table with several experts and discussing an issue. Online is not the same as going to a library and reading books and journal articles, thinking about what each source presents as actual factoids.

Here’s an example of the “understanding” one “real” journalist has about online information:

Google is knowledge. It’s where you go to find things out.

There you go. Reliance on one service to provide “knowledge.” From an ad supported. Free. Convenient. Ubiquitous. Online service.

Yep, that’s the way to keep track of “knowledge.”

Ontotext: The Fabric of Relationships

November 9, 2016

Relationships among metadata, words, and other “information” are important. Google’s Dr. Alon Halevy, founder of Transformic, which Google acquired in 2006, has been beavering away in this field for a number of years. His work on “dataspaces” is important for Google and germane to the “intelligence-oriented” systems which knit together disparate factoids about a person, event, or organization. I recall one of his presentations, specifically the PODs 2006 keynote, in which he reproduced a “colleague’s” diagram of a flow chart which made it easy to see who received the document, who edited the document and what changes were made, and to whom recipients forwarded the document.

Here’s the diagram from Dr. Halevy’s lecture:

image

Principles of Dataspace Systems, Slide 4 by Dr. Alon Halevy, delivered on June 26, 2006 at PODs. Note that “PODs” is an annual ACM database-centric conference.

I found the Halevy discussion interesting.

A Decision on the Palantir US Army Dust Up Looms

October 28, 2016

I read “Inside Palantir’s War With the U.S. Army.” The article follows a somewhat familiar line of thought about why a Sillycon Valley outfit wants to take the US Army to Federal court.

One of the major reasons, according to the article, is choice of clothing. I highlighted this passage:

The slacks and dress shirts with a few buttons undone that Palantir executives wore may have been a step up for sunny California where hoodies are the norm but were a sign of disrespect at the Pentagon, according to a person familiar with the meeting. Senior officials, including U.S. Assistant Secretary of the Army for Acquisition, Logistics and Technology Dean Popps, were not impressed, this person said. They told Palantir: “Don’t come to the E-ring without a tie unless your name is Gates or Buffet,” said the person, referring to the portion of the Pentagon occupied by senior officials. “They couldn’t get over the tie thing. They didn’t care about the technology.”

The culture disconnect between the Silicon Valley type and the Department of Defense type is real. The externalities of uniforms versus business casual are easy to spot. I read:

Because Palantir wasn’t able to show how its technology could work with the Army’s existing intelligence systems—the purpose of conducting both tests—it was sidelined from competing for a new contract and pigeonholed by Army officials as a niche player, Palantir claims in the documents.

One question is, “Why wasn’t Palantir able to show interoperability?” From my vantage point in Harrod’s Creek, I thought that this question was an important one. I wandered around my mental filing cabinet for some angles on this question about “Why?”

image

The decision about the Palantir-US Army legal matter may be made public on October 31, 2016. That’s Halloween in the US. Will Palantir be treated with a favorable decision with regard to its efforts to license Gotham to the US Army? Will Palantir be tricked by the legal maneuvers of US government legal eagles?

Three points struck me as I reflected:

First, Palantir Technologies was funded in part by In-Q-Tel. Some folks in the CIA love In-Q-Tel’s investments. Some folks point out that In-Q-Tel’s investments are often gee whiz but often difficult to integrate into what are called “as is” systems. Most vendors make it difficult to integrate certain types of operations, data, or functions with their “as is” systems. The idea of open interchange of information is talked about and enshrined in SOWs (statements of work) but the reality is to keep the “as is” folks contracting for lucrative integration work. As logical as “snap in” and “seamless interchange” are in the go go Palo Alto / Berkeley mindset, the “as is” crowd is a reluctant bride in many procurements, particularly multi-year deals.

Second, the Pentagon professionals have specific rules to follow when it comes to licensing software. For a company focused on being disruptive with gee whiz technology, the rules are “not logical.” Software, for example, has to be free from backdoors. Software has to conform to various features and functions set forth in an SOW. Software has to be more than just disruptive, pretty, or state of the art. The software has to arrive via a process and be accompanied by people who can work within the often illogical rules of the US government. In my experience, this notion of “conforming” is one that does not compute for Googley-type companies.

Google: Fragmentation and the False Universal Search

October 14, 2016

I read “Within Months, Google to Divide Its Index, Giving Mobile Users Better & Fresher Content.” Let’s agree to assume that this write up is spot on. I learned that Google plans “on releasing a separate mobile search index, which will become the primary one.”

The write up states:

The most substantial change will likely be that by having a mobile index, Google can run its ranking algorithm in a different fashion across “pure” mobile content rather than the current system that extracts data from desktop content to determine mobile rankings.

The news was not really news here in Harrod’s Creek. Since 2007, the utility of Google’s search system has been in decline for the type of queries the Beyond Search goslings and I typically run. On rare occasion we need to locate a pizza joint, but the bulk of our queries require old fashioned relevance ranking with results demonstrating high precision and on point recall.

image

Time may be running out for Google Web search.

Several observations:

  1. With the volume of queries from mobile surpassing desktop queries, why would Google spend money to maintain two indexes? Perhaps Google will have a way to offer advertisers messaging targeted to mobile users and then sell ads for the old school desktop users? If the ad revenue does not justify the second index, well, why would an MBA continue to invest in desktop search? Kill it, right?
  2. What happens to the lucky Web sites which did not embrace AMP and other Google suggestions? My hunch is that traffic will drop and probably be difficult to regain. Sure, an advertiser can buy ads targeted at desktop users, but Google does not put much wood behind that which becomes a hassle, an annoyance, or a drag on the zippy outfit’s aspirations.
  3. What will the search engine optimization crowd do? Most of the experts will become instant and overnight experts in mobile search. There will be a windfall of business from Web sites addressed to business customers and others who use mobile but need an old fashioned boat anchor computing device. Then what? Answer: An opportunity to reinvent themselves. Data scientist seems like a natural fit for dispossessed SEO poobahs.

If the report is not accurate, so what? Here’s an idea. Relevance will continue to be eroded as Google tries to deal with the outflow of ad dollars to social outfits pushing grandchildren lovers and the folks who take snaps of everything.

The likelihood of a separate mobile index is high. Remember universal search? I do. Did it arrive? No. If I wanted news, I had to search Google News. Same separate index for scholar, maps, and other Google content. The promise of universal search was PR fluff.

Fragmentation is the name of the game in the world of Alphabet Google. And fragmented services have to earn their keep or get terminated with extreme prejudice. Just like Panoramio (I know. You are asking, “What’s Panoramio?”), Google Web search could very well be on the digital glide way to the great beyond.

Stephen E Arnold, October 14, 2016

Five Years in Enterprise Search: 2011 to 2016

October 4, 2016

Before I shifted from worker bee to Kentucky dirt farmer, I attended a presentation in which a wizard from Findwise explained enterprise search in 2011. In my notes, I jotted down the companies the maven mentioned (love that alliteration) in his remarks:

  • Attivio
  • Autonomy
  • Coveo
  • Endeca
  • Exalead
  • Fabasoft
  • Google
  • IBM
  • ISYS Search
  • Microsoft
  • Sinequa
  • Vivisimo.

There were nodding heads as the guru listed the key functions of enterprise search systems in 2011. My notes contained these items:

  • Federation model
  • Indexing and connectivity
  • Interface flexibility
  • Management and analysis
  • Mobile support
  • Platform readiness
  • Relevance model
  • Security
  • Semantics and text analytics
  • Social and collaborative features

I recall that I was confused about the source of the information in the analysis. Then the murky family tree seemed important. Five years later, I am less interested in who sired what child than the interesting historical nuggets in this simple list and collection of pretty fuzzy and downright crazy characteristics of search. I am not too sure what “analysis” and “analytics” mean. The notion that an index is required is okay, but the blending of indexing and “connectivity” seems a wonky way of referencing file filters or a network connection. With the Harvard Business Review pointing out that collaboration is a bit of a problem, it is an interesting footnote to acknowledge that a buzzword can grow into a time sink.

There are some notable omissions; for example, open source search options do not appear in the list. That’s interesting because Attivio was at that time, I heard, poking its toe into open source search. IBM was a fan of Lucene five years ago. Today the IBM marketing machine beats the Watson drum, but inside the Big Blue system resides that free and open source Lucene. I assume that the gurus and the mavens working on this list ignored open source because what consulting revenue results from free stuff? What happened to Oracle? In 2011, Oracle still believed in Secure Enterprise Search only to recant with purchases of Endeca, InQuira, and RightNow. There are other glitches in the list, but let’s move on.

Yahoo Security Breach: The Pee-Wee Purple Solecism

September 23, 2016

Remember ShrinkyDinks? Kids decorate pieces of plastic. The plastic then gets smaller when heated. I believe the ShrinkyDink management process has been disclosed. The innovator? Marissa Mayer, the former Google search guru turned business management maven.

What’s the ShrinkyDink approach to running a business? Take a revenue stream, decorate it with slick talk, and then reduce revenues and reputation. The result is a nifty entity with less value. Bad news? No. The upside is that Vanity Fair puts a positive spin on how bad news just gets worse. A purple paradox!

ShrinkyDink Management. Pop business thinking into a slightly warmed market and watch those products and revenues become tinier as you watch in real time. Small is beautiful, right? I can envision a new study from Harvard University’s business school on the topic. Then comes an HBR podcast interview with Marissa Mayer, the Xoogler behind the ShrinkyDink method. A collaboration with Clayton Christensen is on deck. A book. Maybe a movie deal with Oliver Stone? As a follow up to “Snowden,” Stone writes, produces, and directs “Marissa: Making Big Little.” The film stars Ms. Mayer herself as the true Yahoo.

I read “Yahoo Verizon Deal May Be Complicated by Historic Hack.” Yahoo was “hacked,” according to the write up. Okay, but I read “hack” as a synonym for “We did not have adequate security in place.”

The write up points out:

The biggest question is when Yahoo found out about the breach and how long it waited to disclose it publicly, said Keatron Evans, a partner at consulting firm Blink Digital Security. (Kara Swisher at Recode reported that Verizon isn’t happy about Yahoo’s disclosures about the hack.)

CNBC points out that fixing the “problem” will be expensive. The write up includes this statement from the Xoogler run Yahoo:

“Such events could result in large expenditures to investigate or remediate, to recover data, to repair or replace networks or information systems, including changes to security measures, to deploy additional personnel, to defend litigation or to protect against similar future events, and may cause damage to our reputation or loss of revenue,” Yahoo warned.

Of interest to me is the notion that information about 500 million users was lost. The date of the problem seems to be about two years ago. My thought is that information about the breach took a long time to be discovered and disclosed.

Along the timeline was the sale of Yahoo to Verizon. Verizon issued a statement about this little surprise:

Within the last two days, we were notified of Yahoo’s security incident. We understand that Yahoo is conducting an active investigation of this matter, but we otherwise have limited information and understanding of the impact. We will evaluate as the investigation continues through the lens of overall Verizon interests, including consumers, customers, shareholders and related communities. Until then, we are not in position to further comment.

I highlighted in bold the two points which snagged my attention:

First, Verizon went through its due diligence and did not discover that Yahoo’s security had managed to lose 500 million customers’ data. What’s this say about Yahoo’s ability to figure out what’s going on in its own system? What’s this say about Yahoo management’s attention to detail? What’s this say about Verizon’s due diligence processes?

Second, Verizon seems to suggest that if its “interests” are not served, the former Baby Bell may want to rethink its deal to buy Yahoo. That’s understandable, but it raises the question, “What was Verizon’s Plan B if Yahoo presented the company with a surprise?” It seems there was no contingency, which is consistent with its approach to due diligence.

The decision making process at Yahoo has been, for me, wonky for a long time. The decision to release the breach information after the deal process and before the Verizon deal closes strikes me as an interesting management decision.

Jigsaw Reveals How Google Can Manipulate Thought and Behavior

September 12, 2016

Who knew? There have been suggestions that Alphabet Google manipulates search results. But the disclosure of a “clever plan to stop aspiring ISIS recruits” makes clear one thing: Alphabet Google can manipulate to some degree what a person thinks and how that person may then behave.

To get the details, navigate to Wired, the truth speaker for the technical aficionados. The article is “Google’s Clever Plan to Stop Aspiring ISIS Recruits.” Let’s visit some of the factoids in the article. I, of course, believe everything I read online.

Alphabet Google used to have an outfit called Google Ideas. Ideas, in my book, are a dime a dozen. The key is converting an idea to action and then shaping the idea to generate revenue. The Google Ideas group donned a new moniker, Jigsaw. According to the write up:

Jigsaw, the Google-owned tech incubator and think tank—until recently known as Google Ideas—has been working over the past year to develop a new program it hopes can use a combination of Google’s search advertising algorithms and YouTube’s video platform to target aspiring ISIS recruits and ultimately dissuade them from joining the group’s cult of apocalyptic violence. The program, which Jigsaw calls the Redirect Method and plans to launch in a new phase this month, places advertising alongside results for any keywords and phrases that Jigsaw has determined people attracted to ISIS commonly search for. Those ads link to Arabic- and English-language YouTube channels that pull together preexisting videos Jigsaw believes can effectively undo ISIS’s brainwashing—clips like testimonials from former extremists, imams denouncing ISIS’s corruption of Islam, and surreptitiously filmed clips inside the group’s dysfunctional caliphate in Northern Syria and Iraq.

This paragraph is mildly interesting and presents weaponized information in a matter of fact, what’s the big deal way. Consider these points:

  1. Search ad numerical recipes and videos. Quite a combination.
  2. Redirect. Send folks a different place from the place they really want to go.
  3. Undo brainwashing. Now that’s an interesting concept. Isn’t brainwashing a tough nut to crack? Cults, Jim Jones, etc.
  4. Shifting attention from a “dysfunctional caliphate” to something more acceptable. Okay for ISIS, but what if the GOOG substitutes other content for something else? Right, it will never happen. Mother Google is a really good person.

The article hits the high spots of censorship, including Twitter and the US Department of State’s Think Again, Turn Away, and everyone’s favorite cartoon Average Mohammed.

image

Click https://www.youtube.com/watch?v=7vJ-SlxjRrQ which may be offline after the Wired article hit the Internet.

The Yahooing of Alphabet Google

August 12, 2016

I read “Google Isn’t Safe from Yahoo’s Fate.” The write up is a business school type analysis which reminded me of the inevitable decline of many businesses. Case studies expose MBAs to be to the thrills of success and the consequences of management missteps. I recall a book, published by a now lost and forgotten outfit, which talked about blind spots and management myopia. Humans have a tendency to make errors. That’s what makes life exciting. But I see a GooHoo trajectory.

I learned in this article:

Google is on the wrong side of major trends in the digital advertising industry: Google captures direct response dollars as digital ad spend shifts up the funnel, its focus is still on browsers and websites as engagement is moving into apps and feeds, Google is deeply dependent on search during a shift to serendipitous discovery and ads designed to interrupt the user’s attention are being replaced by advertising designed to engage them. Its competitor, Facebook, is on the right side of all these trends.

The Alphabet Google thing has not been able to hit home runs in social media in my opinion. The Google Facebook dust up exists, and it seems to me that Google is withdrawing from the field of social battle.

The write up informed me:

Google’s search advertising model is built on direct response in that it charges for search ads that people click on. In theory, this is an entirely transparent model: After all, advertisers only pay when the advertising works. What it conceals is that they are taking more credit (and charging more) for value that its ads didn’t deliver. By charging you for the click that follows a search, Google effectively takes credit for the entire funnel of purchase consideration that led you to type in the search and click on the link in the first place….But the ad itself didn’t create their purchase intent — it just takes credit for it. Google’s lower funnel ads are getting credit for upper-funnel effectiveness, in no small part because the latter is just too hard to measure.
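
A small arithmetic sketch may make the quoted funnel argument concrete. It compares a last click rule, where the final touchpoint gets all the credit for a sale, with an even split across every touchpoint. The touchpoint names, the sale value, and the even split rule are illustrative assumptions, not figures from the article and not a description of how Google actually bills advertisers.

```python
from collections import defaultdict


def last_click(path, value):
    """Give all credit for a sale to the final touchpoint in the path."""
    credit = defaultdict(float)
    credit[path[-1]] += value
    return dict(credit)


def linear(path, value):
    """Split credit evenly across every touchpoint in the path."""
    credit = defaultdict(float)
    share = value / len(path)
    for touch in path:
        credit[touch] += share
    return dict(credit)


# Hypothetical journey: the names and the 100.0 sale value are made up.
journey = ["social_video", "friend_recommendation", "product_review", "search_ad"]

print(last_click(journey, 100.0))
print(linear(journey, 100.0))
```

Under the last click rule the search ad is credited with the whole sale. Under the even split it gets a quarter. The gap between the two numbers is the upper funnel work the quoted passage says is too hard to measure.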

Text Analysis Vendors: Where Are They Now?

August 4, 2016

A year ago I read “20+ Text Mining and Text Analysis Tools.” The sale of Recommind to OpenText and the lack of excitement about search gave me an idea. Where are the companies identified by a mid tier consulting firm today? Let’s take a quick look.

AlchemyAPI. The company now asserts that it powers the “AI economy.” The Web site has been updated since I last looked. There is a demo and a “free API key.” The system is now a platform. Gartner found the company to be a “cool vendor” in 2014. The company offers a webinar called “Building with Watson.”

Angoss. The company allows a customer to “predict, act, perform.” The focus is now on “customer intelligence in a single analytics tool.” The firm offers “knowledge” products and an insight optimizer.

Attensity. The company has undergone some change. The www.attensity.com Web site 404s. Years ago a text analytics cheerleader professed to be a fan. I think portions of the company operate under a different name in Germany. Appears to be in quiet mode.

Basis Technology. The company provided language related tools to outfits like Fast Search & Transfer. Someone told me that Basis dabbled in enterprise search. One high profile executive jumped to a company in Madrid.

Brainspace. The company’s Web site tells me, “We build brains.” The company offers NLP technology. Gartner “recommends” Brainspace for “advanced text analytics for financial institutions.” That’s good. The company does not list too many financial institutions as customers on its home page, however.

Buzzlogix. This company’s focus appears to be squarely on social media. The idea is that the firm helps its customers “listen, learn, and act.” When I visited the Web site, the most recent “news” appeared in November 2015.

Clarabridge. The company focuses on understanding “customer needs, wants, and feelings.” The company provides the “world’s most comprehensive customer intelligence platform.”

Clustify. The company positions its text analytics tools for eDiscovery. The company’s most recent news release is dated January 2014 and addresses the Recommind championed predictive coding approach to figuring out what was what in text documents.

Connexor. The company offers “machinese” demonstrations of its capabilities. The most recent item on the company’s Web site is the April 2015 announcement of a free NLP Web service.

DatumBox. This company is a “machine learning framework” provider. It makes machine learning “simple.” The Web site offers a free API key, which knocks the local KFC manager out as a potential licensee. The company’s most recent blog post is dated March 16, 2016. The most recent release is 0.7.0.

Eaagle. This is a company focused on the “new frontier of effective customer relationship management, research, and marketing.” Customers include HermanMiller, Chubb, and Suncor Energy. Data sheets, white papers, and documentation are available and no registration is necessary. Eaagle maintains a low profile.

ExpertSystem. The company bought Temis, a firm based on some ideas in the mind of a former IBM wizard. ExpertSystem, a publicly traded company, is pursuing the pharmaceutical industry and performing independent text analyses of Melania Trump’s and Michelle Obama’s speeches. The two ladies exhibit strong linguistic differences. The company’s stock is trading at $1.81 a share, a bit below Alphabet Google, an outfit also in the text analytics game.

FICO (Fair Isaac Corporation). The company gives “you the power to make smarter decisions.” The company has tallied a number of acquisitions since 1992. Its most recent purchase was Quadmetrics, a predictive analytics company. FICO is publicly traded and the stock is trading at $115.60 a share.

Cognitum. The company asserts that one can “improve your business with the innovation leader in semantic technology.” The company’s main product is Fluent Editor and it offers a flagship platform called Ontorion. The firm’s spelling of “scallable” on its home page caught my attention.

IBM. The focus was not on Watson in the listing. Instead, the write up identified IBM Content Analytics as the product to watch. IBM’s LanguageWare uses a range of techniques to process content. IBM is very much in the content processing game with Watson becoming the umbrella “brand.” IBM just tallied its 16th straight quarter of declining revenue.

Intellexer offers text analytics, information security, media content search, and reputation management. The company’s most recent news release, dated May 13, 2016, announces the new version of Conceptmeister “which analyzes text from a photo, cloud documents, and URL.” Essentially this software creates a summary of the source content.

KBSPortal. This company offers natural language processing as a software as a service or NLP as SAAS. A demonstration of the system processes Wikipedia content. A demo video is available. To view it, I was asked to sign in. I declined. The company provides its prices and explains what each component does. Kudos for that approach.

Keatext. The company focuses on “customer experience management.” The company offers a two week free trial of its system. The system incorporates natural language processing. The company’s explanation of what it does requires a bit of digging.

Lexalytics. Lexalytics is in the sentiment analysis business.  The company’s capabilities include categorization and entity extraction. Social media monitoring can be displayed on dashboards. The company posts its prices. When I was involved in a procurement, Lexalytics prices, based on my recollection, were significantly higher than the fees quoted on this page. At one time, Lexalytics engaged in a merger or deal with Infonics. The company acquired Semantria a couple of years ago.

Leximancer. This Australian company’s software turns up in interesting places; for example, the US Social Security Administration in Beltsville, Maryland. The firm’s “text in, insight out” technology emerged from research at the University of Queensland. The company was founded by UniQuest, a technology commercialization company operated by the University of Queensland. The system is quite useful.

Linguamatics. This company has built a following in the pharmaceutical sector. The system does a good job processing academic and research information in ways which can influence certain lines of inquiry. The company now says that it offers the “world’s leading text mining platform.” The company was founded in 2001, and it has been moving along at a steady pace. Quite useful software and capabilities.

Linguasys. Surprised to see an installation profile. The outfit is maintaining a low profile.

Luminoso. The company provides “enterprise feedback and experience analytics.” The company has teamed with another Boston-area outfit, Basis Technology, to form a marketing partnership. The angle the company seems to be promoting is that if you are using other systems, you can enhance them with text analytics.

MeaningCloud. MeaningCloud asserts that with its system one can “extract valuable information from any text source.” The company’s Text Classification API supports the Interactive Advertising Bureau’s “standard contextual taxonomy.” The focus seems to be on sentiment analysis like Lexalytics.

Why Enterprise Search Fails

July 12, 2016

I participated in a telephone call before the US holiday break. The subject was the likelihood that a potential investment in an enterprise search technology would be a winner. I listened for most of the 60 minute call. I offered a brief example of the over promise and under deliver problems which plagued Convera and Fast Search & Transfer, and several of the people on the call asked, “What’s a Convera?” I knew that today’s whiz kids are essentially reinventing the wheel.

I wanted to capture three ideas which I jotted down during that call. My thought is that at some future time, a person wanting to understand the incredible failures that enterprise search vendors have tallied will have three observations to consider.

No background is necessary. You don’t need to read about throwing rocks at the Google bus, search engine optimization, or any of the craziness about search making Big Data a little pussycat.

Enterprise Search: Does a Couple of Things Well When Users Expect Much More

Enterprise search systems ship with filters or widgets which convert source text into a format that the content processing module can index. The problem is that images, videos, audio files, content from wonky legacy systems, or proprietary file formats like IBM i2’s ANB files do not lend themselves to indexing by a standard enterprise search system. The buyers or licensees of the enterprise search system do not understand this one trick pony nature of text retrieval. Therefore, when the system is deployed, consternation follows confusion when content is not “in” the enterprise search system and, therefore, cannot be found. There are systems which can deal with a wide range of content, but these systems are marketed in a different way and often cost millions of dollars a year to set up, maintain, and operate.

Net net: Vendors do not explain the limitations of text search. Licensees do not take the time or have the desire to understand what an enterprise search system can actually do. Marketers obfuscate in order to close the deal. Failure is a natural consequence.
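
The one trick pony problem can be sketched in a few lines. This is a toy, not any vendor’s pipeline: the `FILTERS` table, the `build_index` function, and the sample file names are all hypothetical. The point is only that a file type with no filter never reaches the index, so no query can ever find it.

```python
from pathlib import PurePath


# Hypothetical filter: turns raw bytes into indexable text.
def plain_text_filter(raw: bytes) -> str:
    return raw.decode("utf-8", errors="replace")


# Assumed filter table; a real system ships many more, but never all.
FILTERS = {
    ".txt": plain_text_filter,
    ".csv": plain_text_filter,
}


def build_index(documents):
    """Index documents that have a filter; report the rest as skipped.

    `documents` maps a file name to its raw bytes. Returns (index, skipped),
    where index maps a lowercase token to the set of file names containing it.
    """
    index, skipped = {}, []
    for name, raw in documents.items():
        text_filter = FILTERS.get(PurePath(name).suffix.lower())
        if text_filter is None:
            skipped.append(name)  # no filter, so the content never enters the index
            continue
        for token in text_filter(raw).lower().split():
            index.setdefault(token, set()).add(name)
    return index, skipped


# Made-up corpus: one text memo, one link chart file, one video.
docs = {
    "memo.txt": b"Quarterly budget review",
    "network.anb": b"\x00\x01 binary link chart",
    "briefing.mp4": b"\x00\x00 video bytes",
}
index, skipped = build_index(docs)
print(sorted(index.get("budget", set())))
print(skipped)
```

A user who searches for the link chart or the briefing video gets nothing back, and the system reports no error. A skipped list like the one above is the sort of report licensees rarely ask to see before signing.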

Data Management Needed

The disconnect boils down to what digital information the licensee wants to search. Once the universe is defined, the system into which the data will be placed must be resolved. No data management, no enterprise search. The reason is that licensees and the users of an enterprise search system assume that “all” or “everything,” from maps to Web content and from email to outputs from an AS/400 Ironside, is available any time. Baloney. Few organizations have the expertise or the appetite to deal with figuring out what is where, how much, how frequently each type of data changes, and the formats used. I can hear you saying, “Hey, we know what we have and what we need. We don’t need a stupid, time consuming, expensive inventory.” There you go. Failure is a distinct possibility.

Net net: Hope springs eternal. When problems arise, few know what’s where, who’s on first, and why I don’t know is on third.
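
The “stupid, time consuming, expensive inventory” does not have to start big. Here is a minimal sketch, assuming a file share is the content universe: it tallies what is where, how much, and when each format last changed. The `scan` and `inventory` names and the sample records are hypothetical, and a real inventory would also have to cover databases, mail stores, and legacy systems.

```python
import os
from collections import defaultdict


def scan(root):
    """Yield (path, size_bytes, modified_epoch) for every file under root."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            yield path, info.st_size, info.st_mtime


def inventory(records):
    """Summarize records by file extension: count, total bytes, latest change."""
    summary = defaultdict(lambda: {"files": 0, "bytes": 0, "last_changed": 0.0})
    for path, size, modified in records:
        ext = os.path.splitext(path)[1].lower() or "(none)"
        entry = summary[ext]
        entry["files"] += 1
        entry["bytes"] += size
        entry["last_changed"] = max(entry["last_changed"], modified)
    return dict(summary)


# Made-up records standing in for a real scan such as inventory(scan("/some/share")).
sample = [
    ("reports/q1.pdf", 120_000, 1_460_000_000.0),
    ("reports/q2.pdf", 98_000, 1_468_000_000.0),
    ("mail/archive.pst", 2_000_000_000, 1_400_000_000.0),
]
for ext, stats in sorted(inventory(sample).items()):
    print(ext, stats)
```

Even this crude tally answers three of the questions above: which formats exist, how much of each, and how stale each one is. Comparing the list of formats against the filters a search system actually ships with is the obvious next step.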
