Eric Schmidt On Search Ambition and Attitude at the GOOG
May 20, 2015
The Business Insider article titled Google’s Former CEO Reveals The Complicated Search Question He Wants Google To Be Able To Answer reports on Eric Schmidt’s speech in Berlin, where he mentioned the hurdles Google has yet to overcome. Obviously, Google is an incredibly ambitious company, and should never be satisfied. He spelled out one particular question he would like the search engine to be able to answer:
“Try a query like ‘show me flights under €300 for places where it’s hot in December and I can snorkel,'” Schmidt says. “That’s kind of complicated: Google needs to know about flights under €300; hot destinations in winter; and what places are near the water, with cool fish to see. That’s basically three separate searches that have to be cross-referenced to get to the right answer. Sadly, we can’t solve that for you today. But we’re working on it.”
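Schmidt's "three separate searches that have to be cross-referenced" can be pictured as three independent lookups whose results are intersected. The sketch below is only an illustration of that idea, not how Google does it: every destination, price, temperature, and function name is made up for the example.

```python
# Hypothetical toy data: all destinations, prices, and temperatures are invented.
FLIGHTS_EUR = {"Sharm El Sheikh": 240, "Tenerife": 180, "Reykjavik": 150, "Cancun": 620}
AVG_DECEMBER_TEMP_C = {"Sharm El Sheikh": 23, "Tenerife": 21, "Reykjavik": 1, "Cancun": 28}
SNORKELING_SPOTS = {"Sharm El Sheikh", "Tenerife", "Cancun"}


def cheap_flights(max_price):
    """Search 1: destinations with a flight strictly under max_price euros."""
    return {dest for dest, price in FLIGHTS_EUR.items() if price < max_price}


def hot_in_december(min_temp_c):
    """Search 2: destinations whose average December temperature is at least min_temp_c."""
    return {dest for dest, temp in AVG_DECEMBER_TEMP_C.items() if temp >= min_temp_c}


def snorkel_trip(max_price=300, min_temp_c=20):
    """Cross-reference the three searches by intersecting their result sets."""
    matches = cheap_flights(max_price) & hot_in_december(min_temp_c) & SNORKELING_SPOTS
    return sorted(matches)


if __name__ == "__main__":
    print(snorkel_trip())
```

The hard part Schmidt describes is not the intersection itself but getting a search engine to recognize that one sentence contains three different questions, each answered from a different kind of data.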
Schmidt also argued on behalf of Google with regard to the EU investigation into Google possibly favoring its own results rather than a fair spread of companies. Schmidt claimed that Google is most interested in simplifying search for users, rather than obliging users to click around. Since Google search is admittedly ad-oriented, Schmidt’s position seems to be at least semi-accurate.
Chelsea Kerwin, May 20, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
Sinequa and Systran Partner on Cyber Defense
May 20, 2015
Enterprise search firm Sinequa and translation tech outfit Systran are teaming up on security software. “Systran and Sinequa Combine in the Field of Cyber Defense,” announces ITRmanager.com. (The article is in French, but Google Translate is our friend.) The write-up explains:
“Sinequa and Systran have indeed decided to cooperate to develop a solution for detecting and processing of critical information in multiple languages and able to provide investigators with a panoramic view of a given subject. On one side Systran provides safe instant translation in over 45 languages, and the other Sinequa provides big data processing platform to analyze, categorize and retrieve relevant information in real time. The integration of the two solutions should thus facilitate the timely processing of structured and unstructured data from heterogeneous sources, internal and external (websites, audio transcripts, social media, etc.) and provide a clear and comprehensive view of a subject for investigators.”
Launched in 2002, Sinequa is a leader in the Enterprise Search field; the company boasts strong business analytics, but also emphasizes user-friendliness. Based in Paris, the firm maintains offices in Frankfurt, London, and New York City. Systran has a long history of providing innovative translation services to defense and security organizations around the world. The company’s headquarters are in Seoul, with other offices located in Daejeon, South Korea; Paris; and San Diego.
Cynthia Murrell, May 20, 2015
Navy Cloud Encounters a Storm Front
May 19, 2015
I read “Slow Progress Forces Navy to Change Strategies for Cloud, Data Centers.” I have high regard for US Navy technical professionals. ONION router technology and miniature swarm drones have been based on some Navy research.
The write up troubled me. Here’s the first passage I noted:
“Culturally, we have to make this shift from a mistaken belief that all our data has to be near us and somewhere where I can go and hug the server, instead of someplace where I don’t know in the cloud. This is a big shift for many within the department. It’s not going to be an easy transition.”
As with most nations’ military forces, the Navy’s resources are available in the form of personnel, machines, and money. Staffing also refreshes on a cadence different from some other government entities and many commercial organizations. There are not too many 70 year old nuclear submarine commanders.
The issue about the shift to cloud computing suggests that more than technical hurdles prevent enterprise and mission critical applications from moving to the cloud. I noted this paragraph as well:
While the Navy is open to using commercial or public clouds, the Marine Corps is going its own way. Several Marine Corps IT executives seemed to signal that the organization will follow closely to what the Navy is doing, but put their own twist on the initiative. One often talked about example of this is the Marines’ decision to not move to the Joint Regional Security Stacks (JRSS) that is part of the Joint Information Environment (JIE) until at least version 2 comes online in 2017. Marine Corps CIO Gen. Kevin Nally said the decision not to use the initial versions of JRSS is because Marine Corps’ current security set up is better and cheaper than version 1 or 1.5.
I interpreted the milspeak to mean, “We are doing the cloud but we are focusing on a private cloud, not the public Amazon thing.”
Will enterprise search vendors who emphasize their cloud solution advise their customers about cloud options? Search marketers often tell the prospect many things, and I assume explaining the different approaches to clouds and aggregation will be part of the sales presentation.
Stephen E Arnold, May 19, 2015
Searching Bureaucracy
May 19, 2015
The rise of automatic document conversion could render vast amounts of data collected by government agencies useful. In their article, “Solving the Search Problem for Large-Scale Repositories,” GCN explains why this technology is a game-changer, and offers tips for a smooth conversion. Writer Mike Gross tells us:
“Traditional conversion methods require significant manual effort and are economically unfeasible, especially when agencies are often precluded from using offshore labor. Additionally, government conversion efforts can be restricted by document security and the number of people that require access. However, there have been recent advances in the technology that allow for fully automated, secure and scalable document conversion processes that make economically feasible what was considered impractical just a few years ago. In one particular case the cost of the automated process was less than one-tenth of the traditional process. Making content searchable, allowing for content to be reformatted and reorganized as needed, gives agencies tremendous opportunities to automate and improve processes, while at the same time improving workflow and providing previously unavailable metrics.”
The write-up describes several factors that could foil an attempt to implement such a system, and I suggest interested parties check out the whole article. Some examples include security and scalability, of course, as well as specialized format and delivery requirements, and non-textual elements. Gross also lists criteria to look for in a vendor; for instance, assess how well their products play with related software, like scanning and optical character recognition tools, and whether they will be able to keep up with the volumes of data at hand. If government agencies approach these automation advances with care and wisdom, instead of reflexively choosing the lowest bidder, our bureaucracies’ data systems may actually become efficient. (Hey, one can dream.)
Cynthia Murrell, May 19, 2015
Hybrid Is Essential to SharePoint 2016
May 19, 2015
It looks like Microsoft is planning to bring the cloud to its SharePoint Server 2016 users at critical points, rather than forcing them to go “all cloud.” This technique allows Microsoft to continue with the cloud-based services that it has invested in, while improving the on-premises experience that users are demanding. ZDNet covers the whole story in its article, “Microsoft’s SharePoint 2016: What’s Hybrid Got to do With It?”
The article sums up the much talked about hybrid approach:
“Though it will run on top of Windows Server 2012 R2 and/or Windows Server 2016, SharePoint 2016 will include support for what Microsoft calls ‘cloud-accelerated experiences,’ meaning new hybrid scenarios . . . Instead of trying to push all SharePoint users and all SharePoint workloads to the cloud, Microsoft is acknowledging there are some reasons (compliance among them) that not all data can or should be in SharePoint Online. That said, Microsoft wants to enable its SharePoint users to get at their data wherever it’s stored.”
Stephen E. Arnold is a lifelong leader in search and a long-time expert in SharePoint. He keeps managers and users updated on the latest SharePoint news through his Web service ArnoldIT.com. All eyes should stay peeled for continuing developments, as users get closer to seeing a public release of SharePoint Server 2016.
Emily Rae Aldridge, May 19, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Archive.is Preserves Online Information
May 18, 2015
Today’s information seekers use the Internet the way some of us used reference books growing up. Unlike the paper tomes on our dusty bookshelves, however, websites can change their content without so much as a by-your-leave. Suggestions for preserving online information can be found in “Create Publicly Available Web Page Archives with Archive.is” at gHacks.net.
Writer Martin Brinkmann begins by listing several local options familiar to many of us. There’s Ctrl-s, of course, and assorted screenshot-saving methods. Website archivers like Httrack perform their own crawls and save the results to the user’s local machine. Remotely, Archive.org automatically creates snapshots of prominent sites, but users cannot control the results. Enter Archive.is. Brinkmann writes:
“Archive.is is a free service that helps you out. To use it, paste a web address into the form on the service’s main page and hit submit url afterwards. The service takes two snapshots of that page at that point in time and makes it available publicly. The first takes a static snapshot of the site. You find images, text and other static contents included while dynamic contents and scripts are not. The second snapshot takes a screenshot of the page instead. An option to download the data is provided. Note that this downloads the textual copy of the site only and not the screenshot. A Firefox add-on has been created for the service which may be useful to some of its users. It creates automatic snapshots of every web page that you bookmark in the web browser after installation of the add-on.”
Wow, don’t set and forget that Firefox option! In fact, the article cautions, be mindful of the public availability of every Archive.is snapshot; Brinkmann reasonably suggests the tool could benefit from a password feature. Still, this could be an option to preserve important (but, for the prudent, impersonal) information found online.
Cynthia Murrell, May 18, 2015
HP Idol and Hadoop: Search, Analytics, and Big Data for You
May 16, 2015
I was clicking through links related to Autonomy IDOL. One of the links which I noted was to a YouTube video labeled “HP IDOL for Hadoop: Create a Smarter Data Lake.” Hadoop has become a synonym for making sense of Big Data. I am not sure what Big Data are, but I assume I will know when my eight gigabyte USB key cannot accept another file. Big Data? Doesn’t it depend on one’s point of view?
What is fascinating about the HP Idol video is that it carries a posting date of October 2014, which is in the period when HP was ramping up its anti-Autonomy legal activities. The video, I assumed before watching, would break from the Autonomy marketing assertions and move in a bold, new direction.
The video contained some remarkable assertions. Please, watch the video yourself because I may have missed some howlers as I was chuckling and writing on my old school notepad with a decidedly old fashioned pencil. Hey, these tools work, which is more than I can say for some of the software we examined last week.
Here’s what I noted with the accompanying screenshot so you can locate the frame in the YouTube video to double check my observation with the reality of the video.
First, there is the statement that in an organization 88 percent of its information is “unanalyzed.” The source is a 2012 study from Forrsights Strategy Spotlight: Business Intelligence and Big Data. Forrester, another mid tier consulting firm, produces these reports for its customers. Okay, a couple of years old research. Maybe it is valid? Maybe not? My thought was that HP may be a company which did not examine the data to which it had access about Autonomy before it wrote a check for billions of dollars. I assume HP has rectified any glitch along this line. HP’s litigation with Autonomy and the billions in write down for the deal underscore the problem with unanalyzed data. Alas, no reference was made to this case example in the HP video.
Second, Hadoop, a variant of Google’s MapReduce technology, is presented as a way to reap the benefits of cost efficiency and scalability. These are generally desirable attributes of Hadoop and other data management systems. The hitch, in my opinion, is that it is a collection of projects. These have been developed via the open source / commercial model. Hadoop works well for certain types of problems. Extract, transform, and load works reasonably well once the Hadoop installation is set up, properly resourced, and the Java code debugged so it works. Hadoop requires some degree of technical sophistication; otherwise, the system can be slow, stuffed with duplicates, and a bit like a Rube Goldberg machine. But the Hadoop references in the video are not a demonstration. I noted this “explanation.”
Third, HP jumps from the Hadoop segment to “what if” questions. I liked the “democratize Big Data” because “Big Data Changes everything.” Okay, but the solution is Idol for Hadoop. The HP approach is to create a “smarter data lake.” Hmmm. Hadoop to Idol to data lake for the purpose of advanced analytics, machine learning functions, and enterprise level security. That sounds quite a bit like Autonomy’s value proposition before it was purchased from Dr. Lynch and company. In fact, Autonomy’s connectors permitted the system to ingest disparate types of data as I recall.
Fourth, the next logical discontinuity is the shift from Hadoop to something called “contextual search.” A Gartner report is presented which states with Douglas MacArthur-like confidence:
HP Idol. A leader in the 2014 Gartner Magic Quadrant for Contextual Search.
What the heck is contextual search in a Hadoop system accessed by Autonomy Idol? The answer is SEARCH. Yep, a concept that has been difficult to implement for 20, maybe 30 years. Search is so difficult to sell that Dr. Lynch generated revenues by acquiring companies and applying his neuro-linguistic methods to these firms’ software. I learned:
The sophistication and extensibility of HP Autonomy’s Intelligent Data Operating Layer (Idol) offering enable it to tackle the most demanding use cases, such as fraud detection and search within large video libraries and feeds.
Yo, video. I thought Autonomy acquired video centric companies and the video content resided within specialized storage systems using quite specific indexing and information access features. Has HP cracked the problem of storing video in Hadoop so that a licensee can perform fraud detection and search within video libraries? My experience with large video libraries is that certain video like surveillance footage is pretty tough to process with accuracy. Humans, even academic trainees, can be placed in front of a video monitor and told, “Watch this stream. Note anomalies.” Not exciting but necessary because processing large volumes of video remains what I would describe as “a bit of a challenge, grasshopper.” Why is Google adding wild and crazy banners, overlays, and required metadata inputs? Maybe because automated processing and magical deep linking are out of reach? HP appears to have improved or overhauled Autonomy’s video analysis functions, and the Gartner analyst is reporting a major technical leap forward. Identifying a muzzle flash is different from recognizing a face in a flow of subway patrons captured on a surveillance camera, is it not?
I have heard some pre HP Autonomy sales pitches, but I can’t recall hearing that Idol can crunch flows of video content unless one uses the quite specialized system Autonomy acquired. Well, I have been wrong before, and I am certainly not qualified to be an analyst like the ones Gartner relies upon. I learned that HP Idol has a comprehensive list of data connectors. I think I would use the word “library,” but why niggle?
Fifth, the video jumps to a presentation of a “content hub.” The idea is that HP Idol provides visual programming tools. I assume an HP Idol customer will point and click to create queries. The queries will deliver outputs from the Hadoop data management system and the content which embodies the data lake. The user can also run a query and see a list of documents. But the video jumps from what strikes me as exactly what many users no longer want to do to locate information. One can search effectively when one knows what one is looking for and that the needed information is actually in the index. The use case appears to be health care, and the video concludes with a reminder that one can perform advanced analytics. There is a different point of view available in this ParAccel white paper.
I understand the strengths and weaknesses of videos. I have been doing some home brew videos since I retired. But HP is presenting assertions about Autonomy’s technology which seem to be out of step with my understanding of what Idol, the digital reasoning engine, and Autonomy’s acquired video technology can do.
The point is that HP seems to be out marketing Autonomy’s marketing. The assertions and logical leaps in the HP Idol Hadoop video stretch the boundaries of my credulity. I find this interesting because HP is alleging that Autonomy used similar verbal polishing to convince HP to write a billion dollar check for a search vendor which had grown via acquisitions over a period of 15 years.
Stephen E Arnold, May 16, 2015
Lousy Search Results. An Attention Span Issue?
May 15, 2015
I read the enervating “Humans Have Shorter Attention Span Than Goldfish, Thanks to Smartphones.” Yep, thanks. When I am working and someone speaks to me, I often let out a squeal and twitch. I concentrate on the task at hand to the exclusion of the world. Some folks may lack this old-school concentration.
According to the write up, short attention spans are due to smartphones, not stupidity, a failure to exercise discipline over the mind, or the cranial wiring which permits one to focus. I learned:
According to scientists, the age of smartphones has left humans with such a short attention span even a goldfish can hold a thought for longer. Researchers surveyed 2,000 participants in Canada and studied the brain activity of 112 others using electroencephalograms. The results showed the average human attention span has fallen from 12 seconds in 2000, or around the time the mobile revolution began, to eight seconds.
Right, 12 seconds. That is probably enough attention for pre-Millennials. Eight seconds is too darned long to concentrate on any one thing.

Is this the next Dark Web research specialist I will hire?
When one of the people lobbying me for work whips out a smartphone, scans an iPad, and lets his or her eyes roam around the room: that’s it. No work. The goldfish has a nine second attention span. The fish I have watched in the holding tank in a Chinese restaurant in Wuhan seemed to be able to fix their attention for far longer. One red fish just hovered in place and regarded me for 30 seconds, maybe more.
Instead of hiring humans, perhaps I should go with a giant koi? Are lousy search skills an example of what happens when one cannot concentrate? Nah, blame the vendor or the IT department. Entitlement management works well.
Stephen E Arnold, May 15, 2015
Developing an NLP Semantic Search
May 15, 2015
Can you imagine a natural language processing semantic search engine? It would be a lovely tool to use in your daily routines and make research a bit easier. If you are working on such a project and are making progress, keep at that startup because this is a lucrative field at the moment. Over at Stack Overflow, an entrepreneurial spirit is trying to develop a “Semantic Search With NLP And Elasticsearch”:
“I am experimenting with Elasticsearch as a search server and my task is to build a “semantic” search functionality. From a short text phrase like “I have a burst pipe” the system should infer that the user is searching for a plumber and return all plumbers indexed in Elasticsearch.
Can that be done directly in a search server like Elasticsearch or do I have to use a natural language processing (NLP) tool like e.g. Maui Indexer. What is the exact terminology for my task at hand, text classification? Though the given text is very short as it is a search phrase.”
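To make the questioner's "text classification" idea concrete, here is a deliberately naive sketch, not a recommendation: a keyword lookup guesses a trade from the phrase, and the guess is turned into an Elasticsearch query body. The keyword lists and the `category` and `description` field names are hypothetical, and a real system would likely use a trained classifier rather than hand-picked words. The sketch only builds the query as a plain dictionary; it does not contact a search server.

```python
import re

# Hypothetical trigger words per trade; a real system would learn these from labeled queries.
CATEGORY_KEYWORDS = {
    "plumber": {"pipe", "leak", "drain", "faucet", "burst"},
    "electrician": {"outlet", "wiring", "fuse", "breaker"},
}


def classify(phrase):
    """Return the category sharing the most keywords with the phrase, or None if none match."""
    tokens = set(re.findall(r"[a-z]+", phrase.lower()))
    scores = {cat: len(tokens & words) for cat, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def build_query(phrase):
    """Build a query body: exact category filter if classified, full-text match otherwise."""
    category = classify(phrase)
    if category is None:
        # Fall back to ordinary full-text search on an assumed "description" field.
        return {"query": {"match": {"description": phrase}}}
    # "category" is an assumed keyword field holding the provider's trade.
    return {"query": {"term": {"category": category}}}


if __name__ == "__main__":
    print(build_query("I have a burst pipe"))
```

The design point is the split: the inference from "burst pipe" to "plumber" happens before the search server is involved, which is one way to read the questioner's either/or.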
Given that this question was asked about three years ago, a lot has been done not only with Elasticsearch, but also NLP. Search is moving towards a more organic experience, but accuracy is often muddled by different factors. These include the quality of the technology, classification, taxonomies, ads in results, and even keywords (still!).
NLP semantic search is closer now than it was three years ago, but technology companies would invest a lot of money in a startup that can bridge the gap between natural language and machine learning.
Whitney Grace, May 15, 2015
Automated Search News: Lost in Link Land
May 14, 2015
I scanned Paper.li’s “The Enterprise Search Daily.” I spotted this item:
Curious, I clicked on it. Here’s what Sinequa displayed:
Isn’t Sinequa one of the vendors Gartner described as a leader of the search pack? Not only was there something wrong with the Paper.li link submitted by Embedded; the source URL is a 404.
So, how are those automated information systems supposed to work? See my write up about IBM’s burrito to get a glimpse of what happens when big ideas cannot be converted into workable components.
Yep, page not found. Reality is different from the marketing hoo hah.
Stephen E Arnold, May 14, 2015