Google Puts Wood behind Enterprise Search

May 24, 2015

A couple of years ago, enterprise search at Google was not setting my world on fire. I enjoyed reporting on the cost of the Google Search Appliance, a failover component, and the services required by a Google partner to make the GSA sing and dance the way the licensee wanted. I listened to Googlers for little bits of gossip. One of the items which I was not able to verify was that there were not enough engineers working on the GSA. Other activities at Google beckoned. See, for example, my write up about the robotic teddy bear or run a search for the Loon balloon thing.

I read “Google Labs for Enterprise Search” and learned that those rumors were wrong. Wrong, wrong, wrong. Google Enterprise Search is not just the exciting, job creating engine embodied in the Google Search Appliance. Enterprise search embraces Google Intranet search and the Google Mini. I thought the Mini was history.

Wait. A newspaper for enterprise search reported this story as if it were recent. Well, the undated story about Google is not recent. The article comes from something called “In the Googleplex” and it is a Google hobby site.

So maybe I was not wrong, wrong, wrong.

I highlight this item because folks writing and curating information about search do not date their articles. Google is date challenged, which is one reason the GOOG has funded Recorded Future and its time technology.

Pumping out old information as if it were “fresh” just confuses the already credibility challenged search and content processing sector. Maybe no one cares or most readers are content to accept baloney as sirloin.

Stephen E Arnold, May 24, 2015

A Bigger, Faster, Better Technology Innovation Pipeline: Think Corporate Funding of R&D

May 24, 2015

I read the opinion piece by MIT president Rafael Reif. This item appeared in Mr. Bezos’ pet newspaper, the Washington Post. You can find a version of the editorial in the MIT News. Dr. Reif is, if Wikipedia is spot on, on the board of Alcoa. He also has invented “Method of forming a multi-layer semiconductor structure having a seamless bonding interface” and more than a dozen other systems and methods. You can get the biographical details in Wikipedia and on the MIT Office of the President’s Web site. Neither of these sources references, as far as I could tell, “Trendy Reif Strikes Again” and this selfie:

The write up points out:

Together, the public and private sectors make investments in higher education and scientific research. (LiquiGlide emerged from research funded by the National Science Foundation and the Defense Department.) These investments spawn graduates and ideas that, through venture-capital-funded start-ups, pay off in innovations that serve society: the ultimate return on investment.

Okay, corporate funded academic research. The approach is a bit different from the now outmoded Bell Labs angle of attack. But corporate funding generates some nifty architecture, even niftier piles of money to use for various purposes, and some nifty opportunities for the students and faculty.

There is a downside. I was surprised to learn:

But this system leaves a category of innovation stranded: new ideas based on new science. Self-fertilizing plants. Bacteria that can synthesize biofuels. Safe nuclear energy technology. Affordable desalination at scale. It takes time for new-science technologies to make the journey from lab to market, often including time to invent new manufacturing processes. It may take 10 years, which is longer than most venture capitalists can wait. The result? As a nation, we leave a lot of innovation ketchup in the bottle.


The problem has to be addressed. I assume that a hamburger without ketchup is not going to keep the conscientious, serious students and their mentors on the beam. MIT has to produce innovation. If ketchup is in the bottle, we need a vacuum device equipped with artificial intelligence and next generation features which perform non chaotic bottle maneuvers to remove the condiment while reporting data in real time. Yes.

What’s the fix?

There are two—count ‘em—two ways to tackle ketchup left in the bottle problems.

  1. Do the corporate funding of schools like MIT. That one percent silliness does not apply to academic institutions near the Charles River.
  2. Move faster. Hey, that nuclear bomb development which dragged on for a decade, old fashioned. We need to accelerate innovation.

The assumption is that innovation is the way to fix the challenges in today’s world. Nothing works, it seems, unless we have more, better, faster technology.

The only problem is that certain technologies like search and information access are not improving. I can identify a couple of other technological enhancements which are not having the desired impact. I wrote about the attention span of a goldfish and a 20 something.

The goldfish had the ability to concentrate for a longer period of time. MIT and other techno-havens are ecosystems. I lived in central Illinois in the winter. When I was a freshman in high school in 1958 I created a terrarium and grew some of the plants which overran my mother’s garden in Campinas, Brazil, before we returned to America.

I got the plants to thrive. I had a glass walled box that was just like Brazil until I left the lid off one day. The plants died. Whatever lived in the terrarium probably assumed the real world of Illinois in January was just like a tropical clime.

Bzzzz. Wrong.

Stephen E Arnold, May 24, 2015

Maana from Heaven: Sustaining Big Data Search

May 23, 2015

Need to search Big Data in Hadoop? Other data management systems? Maana is now ready to assist you. Fresh from stealth mode, the company received an infusion of venture capital which now totals $14.2 million. (You may have to pay to access the details of this cash injection.) Maana garnered only a fraction of the money pumped into search vendors Attivio ($71 million), Coveo ($34 million) or Palantir (hundreds of millions). But Maana has some big name backers; for example, GE Ventures and Intel Capital, among others.

Maana’s manna looks a lot like legal tender.

According to the company:

Maana is pioneering new search technology for big data. It helps corporations drive significant improvements in productivity, efficiency, safety, and security in the operations of their core assets.

This value proposition strikes me as familiar.

Maana is ready to enable customers to perform knowledge modeling, evaluation, data understanding, data shaping, and orchestration. Differentiation is likely to be a challenge. The company offers this diagram to assist prospects in understanding why Maana is different from other Big Data search solutions:


A key differentiator is that the company says:

Maana is not based on open source Solr/Lucene.

That should chop out the LuceneWorks (Really?) and other open source Big Data options in a competitive fray.

Will Maana’s positioning tactic thwart other proprietary Big Data information access solutions? Hewlett Packard, are you ready to rumble? Oracle. Wait. Oracle is always ready to rumble. Google and In-Q-Tel backed Recorded Future? Oops. Recorded Future is jammed with work and inquiries as I understand it. Whatever. Let the proprietary Big Data search Copa de Data off begin.

Stephen E Arnold, May 23, 2015

Keeping Track of Rockets

May 23, 2015

I have an Overflight for AeroText, which is located at 77 Fourth Avenue, in Waltham, Massachusetts. The company offers a search system. I noted that Rocket Software is located at 77 4th Avenue in Waltham, Massachusetts. Ergo: AeroText is now Rocket Software.

What makes this interesting to me is that Overflight snagged a number of references to a software component causing some consternation. I ran a query for “rocket search” on the GOOG and noted these results:


What jumps out is that there are no links to the Waltham-based outfit and there are quite a few links to information about removing what one outfit (ScarebearSoftware) called a virus. The software in question is “Rocket Search.”

My point is that vendors of search and content processing software have to name their products so that individuals interested in legitimate content processing systems can actually find the company.

In the past, I have commented about Brainware being usurped by an outfit keen to pump YouTube videos out with corresponding erosion of the Brainware “brand.” Brainware is now part of Lexmark, and I don’t think too many folks remember Brainware, trigrams, and the convoluted history of the company. Thunderstone in Cleveland has suffered a similar fate. Thunderstone is for all intents and purposes now associated with games, not search. And there are other examples.

The most recent instance of a vendor losing control of a brand was, until now, Smartlogic. An outfit in Baltimore has encroached on the conceptual real estate and Smartlogic’s Semaphore product name is now lovingly gazed upon by a German outfit with a variant of Smartlogic’s product moniker Semafora at

Now Rocket Software, a company eager to become a mover and shaker in search, faces the malware and virus association.

How does one remediate this problem? First, vendors have to pay attention to the name itself. Second, search vendors have to protect their “semantic real estate.” Third, search vendors have to communicate meaningful, high value information.

Ignoring these suggestions leads to brand erosion. Who can license a product if it cannot be found in Bing, Google, or Yandex? Augmentext can help remediate this type of problem, but it is easier and cheaper to head off invisibility and confusion before they gallop through the indexes churning up semantic mud.

I assume it is difficult to see a path forward when there is spatter on one’s eye glasses.

Stephen E Arnold, May 23, 2015

Bing Does App Indexing

May 22, 2015

I am one of the few people who use my smartphone to make calls and respond to the text instructions from my wife. I am not into apps. I have a nice, multi screened desktop computer which allows me to do what I need and want to do. I am in the minority, and I quite like it that way.

I read “Make Apps Stand Out in Search with App Linking.” I suppose if I needed an app, I would want to be able to locate the candidate software for my consideration. Once I locate a suitable app, I want to read reviews and maybe—not very often—but maybe load a trial version to see if the app actually “apps.” I just submitted one of my for fee columns and titled it “In App or Inept.” The reason? Apps are not exactly the type of software I want to use.

Remember. I work at a desk, three monitors, 13 computers/servers, two high speed data connections, VPNs, and software my team and I built. Apps are not what meet my needs. But there are many attention challenged, entitlement fueled younger folks who are into the “app” thing. I think that most apps are inappropriate for the type of work I do and perhaps other folks should actually do.

I don’t telework or telecommute. I actually work, answer the phone, and produce outputs. Some of the outputs are software like Overflight and Augmentext. Others are outputs like this article pointing out that apps are programs which perform a limited set of functions. For the mobile, telecommuter, concentration deprived, and ever too busy knowledge worker, apps are the cat’s pajamas.

Bing is now going to permit app discovery. I would be happier if Bing did these things:

  1. Indexed more substantive content
  2. Eliminated the need for me to search Microsoft research and Bing for information
  3. Provided an interface which allowed me to concentrate on relevant results
  4. Improved relevance
  5. Provided meaningful ways to present data; for example, time sort, date content added to the index, and other pre-pre diluvium operations.
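The time sort in item 5 is not exotic. Here is a minimal sketch in Python, assuming each result carries the date it was added to the index. The result records, the `indexed_on` field, and the `sort_by_index_date` helper are all hypothetical and made up for illustration; nothing here reflects how Bing stores or ranks its results.

```python
from datetime import date

# Hypothetical search results; "indexed_on" is an assumed field holding
# the date the item was added to the index (None when unknown).
RESULTS = [
    {"title": "Old story", "indexed_on": date(2009, 3, 1)},
    {"title": "Undated story", "indexed_on": None},
    {"title": "Recent story", "indexed_on": date(2015, 5, 20)},
]


def sort_by_index_date(hits):
    """Return hits newest first, with undated hits pushed to the end."""
    dated = [h for h in hits if h["indexed_on"] is not None]
    undated = [h for h in hits if h["indexed_on"] is None]
    newest_first = sorted(dated, key=lambda h: h["indexed_on"], reverse=True)
    return newest_first + undated


if __name__ == "__main__":
    for hit in sort_by_index_date(RESULTS):
        print(hit["indexed_on"], hit["title"])
```

Pushing undated items to the bottom is a design choice in this sketch: an item with no date cannot claim to be fresh.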

I chuckled at this diagram:


I have zero idea what the diagram is supposed to mean. I know that when I tested a Lumia Windows phone, I could not locate apps. The sparseness of information was a turn off. Hey, how tough is it to provide a link to the developer’s Web site? Obviously pretty tough.

The Bing enhancements are part of the “deep linking” craze. The idea is that an app does something and data are usually needed for that something. To allow the app to spit out a result, which may or may not be what the user wants, the app “goes to another Web site” or “to a database”. What’s going on is a dumbing down and conveniencing up of information access. Perfect for a user with an attention span less than a goldfish’s and the reading skill of a bright sixth grader.

How does this work? Well, you use code like this:


Don’t worry. Your eyes are not failing. The code snippet was illegible on the Bing blog Web page. New president, same old Microsoft. Enchanting.

Here’s the passage I highlighted in Microsoft blue:

We’re also already in the process of bringing this apps and actions intelligence to Bing and Bing-powered search results including Cortana and Windows 10 and we will have more to share later. In fact, look for an upcoming post on how we will start applying this to our results soon.

Okay, can’t wait. Watch for my in app or in ept article in Information Today. Nah, never mind. You already know that I prefer substantive information access. App finding is a tiny part of the content universe. I want more progress on the more substantive information which is increasingly difficult to find. Use Bing to locate Babak Parviz’s work at Microsoft on the bionic contact lens. Now use Bing to track Dr. Parviz from Google to Amazon. Let me know how that works out for you. Is there an app for that with deep linking no less?

Stephen E Arnold, May 22, 2015

Search 2020: Peering into the Future of Information Access

May 22, 2015

Shifts in search, user behaviors, and marketing are transforming bread-and-butter keyword search. Quite to my surprise, one of my two or three readers wrote to one of the goslings with a request. In a nutshell, the reader wanted my view of a write up which appeared in the TDWI online publication. TDWI, according to the Web site, is “your source for in depth education and research on all things data.” Okay, I can relate to a categorical affirmative, education, research, and data.

The article has a title which tickles my poobah bone: “The Future of Search.” The poobah bone is the part of the anatomy which emits signals about the future. I look at a new search system based on Lucene and other open source technology. My poobah bone tingles. Lots of folks have poobah bones, but these constructs of nerves and tissues are most highly developed in entrepreneurs who invent new ways to locate information, venture capitalists who seek the next Google, and managers who are hired to convert information access into billions and billions of dollars in organic revenue.

The write up identifies three predictions about drivers on the information retrieval utility access road:

  1. Big Data
  2. Cloud infrastructure
  3. Analytics.

Nothing unfamiliar in these three items. Each shares a common characteristic: None has a definition which can be explained in a clear concise way. These are the coat hooks in the search marketers’ cloakroom. Arguments and sales pitches are placed on these hooks because each connotes a “new” way to perform certain enterprise computer processes.

But what about these drivers: Mobile access, just-in-time temporary/contract workers, short attention spans of many “workers”, video, images, and real time information requirements? Perhaps these are subsets of the Big Data, cloud, and analytics generalities, but maybe, just maybe, could these realities be depleted uranium warheads when it comes to information access?

These are the present. What is the future? Here’s a passage I highlighted:

Enterprise search in 2020 will work much differently than it does today. Apple’s Siri, IBM’s Watson, and Microsoft’s Cortana have shown the world how enterprise search and text analytics can combine to serve as a personal assistant. Enterprise search will continue to evolve from being your personal assistant to being your personal advisor.

How are these systems actually working in noisy automobiles or in the kitchen?

I know that the vendors I profiled in CyberOSINT: Next Generation Information Access are installing systems which perform this type of content processing. The problem, as I point out in CyberOSINT, is that search is, at best, a utility. The heavy lifting comes from collection, automated content processing, and various output options. One of the most promising is to deliver specific types of outputs to both humans and to other systems.

The future does tailor information to a person or to a unit. Organizations are composed of teams of teams, a concept now getting a bit more attention. The idea is not a new one. What is important is that next generation information access systems operate in a more nuanced manner than a list of results from a Lucene based search query.

The article veers into an interesting high school teacher type application of Microsoft’s spelling and grammar checker. The article suggests that the future of search will be to alert the system user that his or her “tone” is inappropriate. Well, maybe. I turn off these inputs from software.

The future of search involves privacy issues which have to be “worked out.” No, privacy issues have been worked out via comprehensive, automated collection. The issue is how quickly organizations will make use of the features automated collection and real time processing deliver. Want to eliminate the risk of insider trading? Want to identify bad actors in an organization? One can, but this is not a search function. This is an NGIA function.

The write up touches on a few of the dozens of issues implicit in the emergence of next generation information access systems. But NGIA is not search. NGIA systems are a logical consequence of the failures of enterprise search. These failures are not addressed with generalizations. NGIA systems, while not perfect, move beyond the failures, disappointments, and constant legal hassles search vendors have created in the last 40 years.

My question, “What is taking so long?”

Stephen E Arnold, May 22, 2015

Peruse Until You Are Really Happy

May 22, 2015

Have you ever needed to quickly locate a file that you just know you made, but were unable to find it on your computer, cloud storage, tablet, smartphone, or company pool drive? What is even worse is if your search query does not pick up on any of your keywords! What are you supposed to do then? VentureBeat might have the answer to your problems as explained in the article, “Peruse Is A New Natural Language Search Tool For Your Dropbox And Box Files.” Peruse is a search tool that allows users to find their files and information by asking questions the way they would naturally speak.

Natural language querying is already a big market for business intelligence software, but it is not as common in file sharing services. Peruse is a startup with the ability to search Dropbox and Box accounts using a regular question. If you ask, “Where is the marketing data from last week?” the software will be able to pull the information for you without your even opening the file. Right now, Peruse can only find information in spreadsheets, but the company is working on expanding the supported file types.

“The way we index these files is we actually look at them visually — it understands them in a way a person would understand them,” said [co-founder and CEO Luke Gotszling], who is showing off Peruse…”

Peruse’s goal is to change the way people use document search. Document search has remained pretty consistent since 1995; twenty years later, Gotszling believes it is time for a big change. Gotszling is right: document search remains the same, while Web search changes every day.

Whitney Grace, May 22, 2015

Stephen E Arnold, Publisher of CyberOSINT at

Is Collaboration the Key to Big Data Progress?

May 22, 2015

The article titled Big Data Must Haves: Capacity, Compute, Collaboration on GCN offers insights into the best areas of focus for big data researchers. The Internet2 Global Summit is in D.C. this year with many exciting panelists who support the emphasis on collaboration in particular. The article mentions the work being presented by several people including Clemson professor Alex Feltus,

“…his research team is leveraging the Internet2 infrastructure, including its Advanced Layer 2 Service high-speed connections and perfSONAR network monitoring, to substantially accelerate genomic big data transfers and transform researcher collaboration…Arizona State University, which recently got 100 gigabit/sec connections to Internet2, has developed the Next Generation Cyber Capability, or NGCC, to respond to big data challenges.  The NGCC integrates big data platforms and traditional supercomputing technologies with software-defined networking, high-speed interconnects and visualization for medical research.”

Arizona’s NGCC provides the essence of the article’s claims, stressing capacity with Internet2, several types of computing, and of course collaboration between everyone at work on the system. Feltus commented on the importance of cooperation in Arizona State’s work, suggesting that personal relationships outweigh individual successes. He claims his own teamwork with network and storage researchers helped him find new potential avenues of innovation that might not have occurred to him without thoughtful collaboration.

Chelsea Kerwin, May 22, 2015

Make Mine Mobile Search

May 21, 2015

It was only a matter of time, but Google searches on mobile phones and tablets have finally pulled ahead of desktop searches, says The Register in “Peak PC: ‘Most’ Google Web Searches ‘Come From Mobiles’ In US.” Google AdWords product management representative Jerry Dischler said that more Google searches took place on mobile devices in ten countries, including the US and Japan. Google owns 92.22 percent of the mobile search market and 65.73 percent of desktop searches. What do you think Google wants to do next? They want to sell more mobile apps!

The article says that Google has not shared any of the data about the ten countries except for the US and Japan and the search differential between platforms. Google, however, is trying to get more people to buy more ads and the search engine giant is making the technology and tools available:

“Google has also introduced new tools for marketers to track their advertising performance to see where advertising clicks are coming from, and to try out new ways to draw people in. The end result, Google hopes, is to bring up the value of its mobile advertising business that’s now in the majority, allegedly.”

Mobile ads are apparently cheaper than desktop ads, so Google will get lower revenues. What will probably happen is that as more users transition to making purchases via phones and tablets, ad revenue will increase via mobile platforms.

Whitney Grace, May 21, 2015

Eric Schmidt On Search Ambition and Attitude at the GOOG

May 20, 2015

The article on Business Insider titled Google’s Former CEO Reveals The Complicated Search Question He Wants Google To Be Able To Answer reports on Eric Schmidt’s speech in Berlin where he mentioned the hurdles Google has yet to overcome. Obviously, Google is an incredibly ambitious company, and should never be satisfied. He spelled out one particular question he would like the search engine to be able to answer,

“Try a query like ‘show me flights under €300 for places where it’s hot in December and I can snorkel,'” Schmidt says. “That’s kind of complicated: Google needs to know about flights under €300; hot destinations in winter; and what places are near the water, with cool fish to see. That’s basically three separate searches that have to be cross-referenced to get to the right answer. Sadly, we can’t solve that for you today. But we’re working on it.”
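The cross-referencing Schmidt describes can be pictured as three independent filters whose results are intersected. The Python sketch below is an illustration only: the destination records, field names, and thresholds (such as treating 25°C as "hot") are invented assumptions, and it says nothing about how Google actually approaches the problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    name: str
    fare_eur: int          # assumed cheapest fare, in euros
    december_high_c: int   # assumed typical December high, in Celsius
    has_snorkeling: bool


# Hypothetical, made-up records used only to exercise the logic.
DESTINATIONS = [
    Destination("Alpha Bay", 250, 29, True),
    Destination("Beta City", 180, 8, False),
    Destination("Gamma Reef", 420, 30, True),
    Destination("Delta Coast", 290, 27, False),
]


def cheap_flights(items, max_fare=300):
    """Search 1: destinations with a fare under max_fare euros."""
    return {d.name for d in items if d.fare_eur < max_fare}


def hot_in_december(items, min_temp_c=25):
    """Search 2: destinations that are hot in December (threshold assumed)."""
    return {d.name for d in items if d.december_high_c >= min_temp_c}


def snorkeling_spots(items):
    """Search 3: destinations where one can snorkel."""
    return {d.name for d in items if d.has_snorkeling}


def cross_reference(items):
    """Intersect the three result sets and return the names in sorted order."""
    matches = cheap_flights(items) & hot_in_december(items) & snorkeling_spots(items)
    return sorted(matches)


if __name__ == "__main__":
    print(cross_reference(DESTINATIONS))
```

The intersection is the easy part. The hard part, which this sketch skips entirely, is turning a free-text question into those three structured searches in the first place.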

Schmidt also argued on behalf of Google with regard to the EU investigation into Google possibly favoring its own results rather than a fair spread of companies. Schmidt claimed that Google is most interested in simplifying search for users, rather than obliging users to click around. Since Google search is admittedly ad-oriented, Schmidt’s position seems to be at least semi-accurate.

Chelsea Kerwin, May 20, 2015
