Jury Is Still Out on Microsoft Delve
June 11, 2015
Sometimes hailed as Pinterest for the enterprise, Microsoft Delve is a combination of search, social, and machine learning, which produces an information hub of sorts. Delve is also becoming a test subject, as enterprise experts decide whether such offerings intrude into users’ workflow, or enhance productivity. Read more in the Search Content Management article, “Microsoft Delve May Drive Demand for Office365.”
The article summarizes the issue:
“As Microsoft advances further in its mobile-first, cloud-first strategy, new offerings such as Microsoft Delve are piquing companies’ curiosity but also raising eyebrows. Many companies will have to gauge whether services like Delve can enhance worker productivity or run the risk of being overly intrusive.”
As Microsoft unveils more about SharePoint Server 2016, more will become known about how it functions along with all of its parts, including Delve. It will be up to users to determine how effective the new offerings are, and whether they help or hinder a regular workflow. Until the latest versions become available for public release, stay tuned to ArnoldIT.com for the latest news regarding SharePoint and how it may affect your organization. Stephen E. Arnold is a longtime leader in search, and his work on SharePoint is a great go-to resource for users and managers alike.
Emily Rae Aldridge, June 11, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Enterprise Search Excitement: HP Autonomy Litigation News
June 10, 2015
It is not the weekend and there is some minor Hewlett Packard Autonomy litigation news. I read “HP to Pay $100 Million to Settle Case Tied to Autonomy Deal.” The write up reports that HP will pay $100 million to a settlement fund. In the words of the write up, the money is
to resolve a lawsuit stemming from an impairment charge HP took after paying $10 billion for the British company. The money will go to people who bought HP shares between Aug. 19, 2011 and Nov. 20, 2012.
According to the write up, the litigation “has no merit.” If so, $100 million is a hefty chunk for something that is fairy dust.
One consequence of the HP Autonomy dust up is that enterprise search vendors are using some artful metaphors to describe their systems’ capabilities. With the Fast Search problem in Norway and the HP Autonomy issue in the US and the UK, enterprise search vendors have certainly made me aware of the consequences of having a disagreement over business models and accounting.
Which enterprise search vendor is next in line to make headlines? I have heard that the Lexmark search push has resulted in some frowns and indigestion. Vivisimo has disappeared into the maelstrom of IBM and its software. Oracle is sending what I interpret as mixed signals about the benefits of its hat trick in search: Endeca, InQuira, and RightNow.
My view is that it is tough to be a search vendor looking for traction using words like customer support and business intelligence. It is worth watching the HP Autonomy imbroglio and the wordsmithing of vendors trying to sidestep the shockwave of high profile search vendor legal activities.
Stephen E Arnold, June 9, 2015
SharePoint: Enterprise Search Which Will Never Ever Let You Lose Anything Again
May 30, 2015
Bold assertion. I read “Why Using Microsoft SharePoint Will Improve Your Business Performance with a Simple Search Feature.” Memorable for several reasons:
- SharePoint has “amazing search capabilities.” (I mistakenly understood that the “new” SharePoint search was not yet available. Oh, well, I am in Harrod’s Creek, not a “nice venue in London.” Search is better when viewed from a “nice venue,” I assume.)
- I will never lose anything again. I assume, perhaps incorrectly, that the “anything” refers to a document I created and either parked intentionally or had parked for me by Microsoft’s “amazing” SharePoint. I note that the statement is a categorical affirmative, and categoricals often present logical challenges to someone who asks, “Really? What’s the evidence you have to back up this wild and frisky claim?”
- I note that I can type a word or phrase to “surface every relevant document across all of the sites I have access to.” The author adds, “It’s brilliant.” Okay, got it, but I don’t believe it based on observation, our own hands on experiences, and the weed pile of third party vendors who insist their software actually makes SharePoint usable. I would list them, but you probably have these outfits’ names burned into your memory.
What is interesting is that the focus of the write up seems to be Microsoft Dynamics GP. It is mentioned a couple of times. There are also references to Delve, another Microsoft search system.
Frankly, I am not sure if the cheerleading for “brilliant” search is credible. We have worked on projects in organizations where SharePoint is the “plumbing.” In a conference call last week, the client, a relatively large outfit in the Fortune 100, reported these “issues” with SharePoint:
- Users cannot locate documents created within 24 hours and written to the designated SharePoint device
- Documents in a results list do not include the version of the document for which the user searches
- Images of purchase orders for a company issued with a unique code cannot be retrieved
- Queries take more time than a Google query to complete
- The information about employees with specific expertise is not complete; that is, there will be no data about education or certain projects
- Collaboration is flakey
- The system crashes.
I could work through the list, but the point is that SharePoint is big business for those who get a job to maintain it and, in theory, make it work. SharePoint is the fertile field in which third party vendors plant applications to improve on what Microsoft offers. There are integrators who have specialized skills and want SharePoint to remain the money tree plantation the consultants have come to call home.
In short, what can one believe about Microsoft search? Delve into that.
Stephen E Arnold, May 30, 2015
X1 Search: A Unified Single Pane of Glass
May 26, 2015
I read “X1’s Microsoft Enterprise Search Strategy: Better Than Microsoft’s?”
Here’s the passage I noted:
Providing one single pane of glass to a business worker’s most critical information assets is key. Requiring end-users to search Outlook for email in one interface, then log into another to search SharePoint, and then another to search for document and OneDrive is a non-starter. A single interface to search for information, no matter where it lives fits the workflow that business workers require.
The write up points out that X1 starts with an “end user’s email and files.” That’s fine, but there are other data types to which an end user requires access.
My reaction took the form of these questions:
- What about video?
- What about drafts of financial data or patent applications and other content centric documents in perpetual draft form?
- What about images?
- What about third party content downloaded by a user to a local or shared drive?
- What about Excel files used as text documents and Excel documents with data and generic column names?
- What about versions?
- What about time and date flags versus the time and date information within a content object?
- What about SMS messages?
- What information is related to other information; for example, an offer of employment to a former employee?
- What about employee health, salary, and performance information?
- What about intercepted data from watched insiders using NGIA tools?
- What about geo-plotted results based on inputs from the organization’s tracking devices on delivery vans and similar geo systems?
My point is that SharePoint represents a huge market to search and content processing vendors. The generalizations about what a third party system can do boggle my mind. Vendors as a rule do not focus on the content issues my questions probe. There are good reasons for the emphasis on email and files. Tackling substantive findability issues within an organization is just not what most SharePoint search alternatives do.
Not surprisingly, for certain types of use cases, SharePoint search remains a bit of a challenge regardless of what system is deployed into a somewhat chaotic sea of code, functions, and components.
The phrase “unified single pane of glass” is redundant. Solutions to the challenges of SharePoint may deserve this type of remediation because vendors have been tilting at the SharePoint windmill in a highly repetitive way for more than a decade. And to what end? For many, SharePoint information access remains opaque, cloudy, and dark.
Stephen E Arnold, May 26, 2015
Welcome YottaSearch
May 26, 2015
There is another player in the world of enterprise search: Yotta Data Technologies announced its newest product in “Yotta Data Technologies Announces Enterprise Search And Big Data Analytics Platform.” Yotta Data Technologies is known for its affordable and easy to use information management solutions. Yotta has expanded its lineup by creating YottaSearch, a data analytics and search platform designed to be a data hub for organizations.
“YottaSearch brings together the most powerful and agile open source technologies available to enable today’s demanding users to easily collect data, search it, analyze it and create rich visualizations in real time. From social media and email for Information Governance and eDiscovery to web and network server logs for Information Technology Operations Analytics (ITOA), YottaSearch™ provides the Big Data Analytics for users to derive information intelligence that may be critical to a project, case, business unit or market.”
YottaSearch uses the popular SaaS model and offers users not only data analytics and search, but also knowledge management, information governance, eDiscovery, and IT operations analytics. Yotta decided to create YottaSearch to earn revenue from the burgeoning big data market, especially the enterprise search end.
The market is worth $1.7 billion, so Yotta has a lot of competition, but if they offer something different and better than their rivals, they stand a chance of rising to the top.
Whitney Grace, May 26, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Search 2020: Peering into the Future of Information Access
May 22, 2015
The shifts in search, user behaviors, and marketing are transforming bread-and-butter keyword search. Quite to my surprise, one of my two or three readers wrote to one of the goslings with a request. In a nutshell, the reader wanted my view of a write up which appeared in the TDWI online publication. TDWI, according to its Web site, is “your source for in depth education and research on all things data.” Okay, I can relate to a categorical affirmative: education, research, and data.
The article has a title which tickles my poobah bone: “The Future of Search.” The poobah bone is the part of the anatomy which emits signals about the future. I look at a new search system based on Lucene and other open source technology. My poobah bone tingles. Lots of folks have poobah bones, but these constructs of nerves and tissues are most highly developed in entrepreneurs who invent new ways to locate information, venture capitalists who seek the next Google, and managers who are hired to convert information access into billions and billions of dollars in organic revenue.
The write up identifies three predictions about drivers on the information retrieval utility access road:
- Big Data
- Cloud infrastructure
- Analytics.
Nothing unfamiliar in these three items. Each shares a common characteristic: None has a definition which can be explained in a clear concise way. These are the coat hooks in the search marketers’ cloakroom. Arguments and sales pitches are placed on these hooks because each connotes a “new” way to perform certain enterprise computer processes.
But what about these drivers: Mobile access, just-in-time temporary/contract workers, short attention spans of many “workers”, video, images, and real time information requirements? Perhaps these are subsets of the Big Data, cloud, and analytics generalities, but maybe, just maybe, could these realities be depleted uranium warheads when it comes to information access?
These are the present. What is the future? Here’s a passage I highlighted:
Enterprise search in 2020 will work much differently than it does today. Apple’s Siri, IBM’s Watson, and Microsoft’s Cortana have shown the world how enterprise search and text analytics can combine to serve as a personal assistant. Enterprise search will continue to evolve from being your personal assistant to being your personal advisor.
How are these systems actually working in noisy automobiles or in the kitchen?
I know that the vendors I profiled in CyberOSINT: Next Generation Information Access are installing systems which perform this type of content processing. The problem, as I point out in CyberOSINT, is that search is, at best, a utility. The heavy lifting comes from collection, automated content processing, and various output options. One of the most promising is to deliver specific types of outputs to both humans and to other systems.
The future does tailor information to a person or to a unit. Organizations are composed of teams of teams, a concept now getting a bit more attention. The idea is not a new one. What is important is that next generation information access systems operate in a more nuanced manner than a list of results from a Lucene based search query.
The article veers into an interesting high school teacher type application of Microsoft’s spelling and grammar checker. The article suggests that the future of search will be to alert the system user that his or her “tone” is inappropriate. Well, maybe. I turn off these inputs from software.
The future of search involves privacy issues which have to be “worked out.” No, privacy issues have been worked out via comprehensive, automated collection. The issue is how quickly organizations will make use of the features automated collection and real time processing deliver. Want to eliminate the risk of insider trading? Want to identify bad actors in an organization? One can, but this is not a search function. This is an NGIA function.
The write up touches on a few of the dozens of issues implicit in the emergence of next generation information access systems. But NGIA is not search. NGIA systems are a logical consequence of the failures of enterprise search. These failures are not addressed with generalizations. NGIA systems, while not perfect, move beyond the failures, disappointments, and constant legal hassles search vendors have created in the last 40 years.
My question, “What is taking so long?”
Stephen E Arnold, May 22, 2015
Yotta Search: A Full Service Solution
May 17, 2015
I spoke to a colleague who asked me about Yotta Search. I dug through my Overflight files and located a write up about the new enterprise search system from Yotta Data Technologies and a company called Yotta Customer Analytics. One Yotta is in Cleveland. The other is in Silicon Valley. Both are in the analytics game.
A “yotta” is a whole lotta data, the biggest unit of data. I wonder if the company has a comment on a set of yottas?
I checked my files for the company offering Yotta search, based in Cleveland, home of EPI Thunderstone, another enterprise search vendor. The company behind Yotta Search is Yotta Data Technologies.
According to the firm’s Web site at www.yottadatatechnologies.com:
Yotta Data Technologies (YDT) is a technology company built on a foundation of deep industry experience and driven by a passion for innovative excellence. We provide data management and information governance solutions to corporations, firms and agencies, whether they be a small local firm or a multinational corporation with offices around the globe. Each of our platforms maintains the high levels of quality, performance and security that are critical within information governance initiatives and any data management project.
The search system appears to be based on open source technology if I understand this Web site information:
Yotta Search is a versatile enterprise search solution being developed by Yotta Data Technologies (YDT) for teams, small to medium sized businesses and large corporations. Yotta Search provides powerful, fast and flexible technology that is not only well beyond full text search, but also powers the search and analysis features of many of the world’s largest internet sites and data platforms.
The operative phrase is “being developed.” The company asserts capabilities in these functions:
- Business intelligence
- Discovery
- Information governance
- Virtual data rooms.
I noticed a news item called “Yotta Data Technologies Announces Enterprise Search and Big Data Analytics Platform.” If the information is correct, Yotta is no longer “being developed”; one can license the system. The url provided is www.yottasearch.info. The story describes the Yotta search system this way:
YottaSearch is easy – and budget friendly – to implement with a cloud-based, Software-as-a-Service (SaaS) delivery model and a disruptive, subscription-based pricing model.
Key Functionality of YottaSearch
- Data Point Connectors – Local, Network, Email, Enterprise Systems, Databases, Social Media
- File Crawlers – Detects & Parses over 1,000 file types
- File Indexer – Language Detection, Deduplication, Near Real Time, Distributed, Scalable
- Advanced Search Engines – Based on the high performance Apache Lucene library
- Data Analytics – Intelligent analysis of structured and unstructured data
- Dynamic Dashboards – Explore, analyze, navigate and define large volumes of complex data.
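The deduplication step named in the file indexer line above can be sketched in a few lines. This is a hypothetical illustration of exact-duplicate detection via content hashing, not Yotta’s actual implementation; the function and variable names are mine:

```python
import hashlib

def dedupe(documents):
    """Drop exact duplicates by hashing whitespace-normalized content.

    `documents` is a list of (doc_id, text) pairs; only the first
    occurrence of each distinct text survives. This is a toy stand-in
    for the deduplication stage an indexer might run before handing
    documents to the search engine.
    """
    seen = set()
    unique = []
    for doc_id, text in documents:
        # Normalize whitespace so trivially reformatted copies collapse.
        normalized = " ".join(text.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((doc_id, text))
    return unique

docs = [
    ("a1", "Enterprise search quarterly report"),
    ("a2", "Enterprise  search quarterly report"),  # same text, extra space
    ("a3", "Board meeting minutes"),
]
print([doc_id for doc_id, _ in dedupe(docs)])  # ['a1', 'a3']
```

Real indexers typically pair exact hashing like this with near-duplicate detection (shingling, MinHash) to catch lightly edited copies.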
The system can be used for a number of applications, according to the write up:
- Enterprise Search and Analytics
- Information Governance
- IT Operations Analytics (ITOA)
- Investigations & eDiscovery
- Knowledge Management (KM)
- Internet of Things (IoT), Event & Log Data Analysis
Also, Yotta offers global data services and global electronic discovery services. The company’s tag line is “Information intelligence for corporations, firms, and agencies.”
Like I said, a lotta yottas and a robust line up of functionality which some more established search and content processing systems do not possess. Is Yotta competing with Elastic or is Yotta competing with the ABC vendors: Attivio, BA Insight, or Coveo? Worth watching.
Stephen E Arnold, May 17, 2015
Quote to Note: How to Make Search Relevant
May 16, 2015
Short honk: I read “Intranet Search? Sssh! Don’t Speak of It.” It seems that enterprise search is struggling and sweeping generalizations about information governance and knowledge management are not helping the situation. But that’s just my opinion.
But set that “issue” aside. Here’s the quote I noted:
The only way this situation [search is a problem] will change is with intranet managers stepping up to the challenge and telling stories internally. The problem with search analytics (even if you do everything that Lou Rosenfeld [search wizard] recommends) is that there is no direct evidence of the day-to-day impact of search.
Will accountants respond to search stories? Why is there no direct evidence of the day to day impact of search? Perhaps search, along with some other hoo hah endeavors, is simply not relevant in today’s business environment? Won’t more hyperbole filled marketing solve the problem? Another conference?
The wet blanket on enterprise search remains “there is no direct evidence of the day to day impact of search.” After 30 or 40 years of implementations and hundreds of millions in search development, why not? Er, what about this thought:
Search is a low value utility which has been over hyped.
Stephen E Arnold, May 17, 2015
Elasticsearch Transparent about Failed Jepsen Tests
May 11, 2015
The article on Aphyr titled “Call Me Maybe: Elasticsearch 1.5.0” demonstrates the ongoing tendency of Elasticsearch to lose data during network partitions. The author works through several scenarios and finds that users can lose documents if nodes crash, if a primary pauses, or if the network partitions into two intersecting or two discrete components. The article explains,
“My recommendations for Elasticsearch users are unchanged: store your data in a database with better safety guarantees, and continuously upsert every document from that database into Elasticsearch. If your search engine is missing a few documents for a day, it’s not a big deal; they’ll be reinserted on the next run and appear in subsequent searches. Not using Elasticsearch as a system of record also insulates you from having to worry about ES downtime during elections.”
The article praises Elasticsearch for its internal approach to documenting the problems, and especially the resiliency page the company opened in September, which goes into detail on known issues. That page clears up user confusion about what it meant that the ticket was closed: it states pretty clearly that ES failed its Jepsen tests. The article exhorts other vendors to follow a similar regimen of supplying such information to users.
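The continuous-upsert pattern the quoted author recommends can be sketched as follows. A plain dict stands in for the Elasticsearch index, and `sync_to_search_index` is an illustrative name, not part of any client library; in practice one would batch real upserts through the ES bulk API:

```python
def sync_to_search_index(system_of_record, search_index):
    """Re-upsert every document from the system of record into the
    search index. Upserts are idempotent, so documents the index lost
    (e.g. during a network partition) reappear on the next run, and
    documents it already holds are simply overwritten in place.
    """
    for doc_id, doc in system_of_record.items():
        search_index[doc_id] = doc  # stand-in for an ES index/upsert call
    return search_index

database = {"1": {"title": "Q1 report"}, "2": {"title": "Q2 report"}}
# Simulate an index that silently dropped document "2" during a partition.
index = {"1": {"title": "Q1 report"}}
sync_to_search_index(database, index)
print(sorted(index))  # ['1', '2']
```

The design point is the one the quote makes: because the database remains the system of record, a lossy search index is self-healing on the next sync run.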
Chelsea Kerwin, May 11, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Semantic Search: The View from a Taxonomy Consultant
May 9, 2015
My team and I are working on a new project. With our Overflight system, we have an archive of memorable and not so memorable factoids about search and content processing. One of the goslings who was actually working yesterday asked me, “Do you recall this presentation?”
The presentation was “Implementing Semantic Search in the Enterprise,” created in 2009, which works out to six years ago. I did not recall the presentation. But the title evoked an image in my mind like this:
I asked, “How is this germane to our present project?”
The reply the gosling quacked was, “Semantic search means taxonomy.” The gosling enjoined me to examine this impressive looking diagram:
Okay.
I don’t want a document. I don’t want formatted content. I don’t want unformatted content. I want on point results I can use. To illustrate the gap between dumping a document on my lap and presenting something useful, look at this visualization from Geofeedia:
The idea is that a person can draw a shape on a map, see the real time content flowing via mobile devices, and look at a particular object. There are search tools and other utilities. The user of this Geofeedia technology examines information in a manner that does not produce a document to read. Sure, a user can read a tweet, but the focus is on understanding information, regardless of type, in a particular context in real time. There is a classification system operating in the plumbing of this system, but the key point is the functionality, not the fact that a consulting firm specializing in taxonomies is making a taxonomy the Alpha and the Omega of an information access system.
The deck starts with the premise that semantic search pivots on a taxonomy. The idea is that a “categorization scheme” makes it possible to index a document even though the words in the document may not be the words in the taxonomy.
For me, the slide deck’s argument was off kilter. The mixing up of a term list and semantic search is the evidence of a Rube Goldberg approach to a quite important task: Accessing needed information in a useful, actionable way. Frankly, I think that dumping buzzwords into slide decks creates more confusion when focus and accuracy are essential.
At lunch the goslings and I flipped through the PowerPoint deck, which is available via LinkedIn Slideshare. You may have to register to view it. I am never clear about what is viewable, what is downloadable, and what is on Slideshare. LinkedIn has its real estate, publishing, and personnel businesses to attend to, so search and retrieval is obviously not a priority. The entire experience was superficially amusing but, on a more profound level, quite disturbing. No wonder enterprise search implementations careen into a swamp of cost overruns and angry users.
Now creating taxonomies, or what I call controlled term lists, can be a darned exciting process. If one goes the human route, there are discussions about what term maps to what word or phrase. Think buzz group, discussion group, and online collaboration. What terms go with what other terms? In the good old days, these term lists were crafted by subject matter and indexing specialists. For example, the guts of the ABI/INFORM classification coding terms originated in the 1981-1982 period and were the product of more than 14 individuals, one advisor (the now deceased Betty Eddison), and the begrudging assistance of the Courier Journal’s information technology department, which performed analyses of the index terms and key words in the ABI/INFORM database. The classification system was reasonably sound, and it was licensed by the Royal Bank of Canada, IBM, and some other savvy outfits for their own indexing projects.
As you might know, investing two years in human and some machine inputs was an expensive proposition. It was the initial step in the reindexing of the ABI/INFORM database, which at the time was one of the go-to sources of high value business and management information culled from more than 800 publications worldwide.
The only problem I have with the slide deck’s making a taxonomy a key concept is that one cannot craft a taxonomy without knowing what one is indexing. For example, you have a flow of content through and into an organization. In a business engaged in the manufacture of laboratory equipment, there will be a wide range of information. There will be unstructured information like Word documents prepared by wild eyed marketing associates. There will be legal documents artfully copied and pasted together from boiler plate. There will be images of the products themselves. There will be databases containing the names of customers, prospects, suppliers, and consultants. There will be information that employees download from the Internet or tote into the organization on a storage device.
The key concept of a taxonomy has to be anchored in reality, not an external term list like those which used to be provided by Oracle for certain vertical markets. In short, the time and cost of processing these items of information so that confidentiality is not breached is likely to make the organization’s accountant sit up and take notice.
Today many vendors assert that their systems can intelligently, automatically, and rapidly develop a taxonomy for an organization. I suggest you read the fine print. Even the whizziest taxonomy generator is going to require some baby sitting. To get a sense of what is required, track down an experienced licensee of the Autonomy IDOL system. There is a training period which requires a cohesive corpus of representative source material. Sorry, no images or videos accepted but the existing image and video metadata can be processed. Once the system is trained, then it is run against a test set of content. The results are examined by a human who knows what he or she is doing, and then the system is tuned. After the smart system runs for a few days, the human inspects and calibrates. The idea is that as content flows through the system and periodic tweaks are made, the system becomes smarter. In reality, indexing drift creeps in. In effect, the smart software never strays too far from the human subject matter experts riding herd on algorithms.
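One crude way to watch for the indexing drift described above is the inspect-and-calibrate cycle itself: spot-check the automated categorizer against a human-labeled sample at regular intervals and track the agreement rate over time. The sketch below is hypothetical (names and data are mine, not from any vendor’s tooling):

```python
def agreement_rate(machine_labels, human_labels):
    """Fraction of a spot-check sample where the automated categorizer
    agrees with a human indexer. A falling rate across successive
    audits is one crude signal that indexing drift is creeping in.
    """
    assert machine_labels.keys() == human_labels.keys(), "samples must match"
    matches = sum(machine_labels[k] == human_labels[k] for k in machine_labels)
    return matches / len(machine_labels)

# A four-document audit sample: the machine mislabels doc3.
machine = {"doc1": "finance", "doc2": "legal", "doc3": "finance", "doc4": "hr"}
human = {"doc1": "finance", "doc2": "legal", "doc3": "marketing", "doc4": "hr"}
print(agreement_rate(machine, human))  # 0.75
```

In practice one would also break the rate out per category, since drift often concentrates in a few terms while the overall number still looks healthy.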
The problem exists even when there is a relatively stable core of technical terminology. The content problem of a lab gear manufacturer is many times greater than that of a company focusing on a specific branch of engineering, science, technology, or medicine. Indexing Halliburton nuclear energy information is trivial when compared to indexing more generalized business content like that found in ABI/INFORM or the typical services organization today.
I agree that a controlled term list is important. One cannot easily resolve entities unless there is a combination of automated processes and look up lists. An example is figuring out if a reference to I.B.M., Big Blue, or Armonk is a reference to the much loved marketers of Watson. Now handle a transliterated name like Anwar al-Awlaki and its variants. This type of indexing is quite important. Get it wrong and one cannot find information germane to a query. When one is investigating aliases used by bad actors, an error can become a bad day for some folks.
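A minimal sketch of this kind of alias resolution, assuming a hand-built controlled term list (the entries and function name below are illustrative):

```python
# A controlled term list mapping known aliases to a canonical entity.
ALIASES = {
    "i.b.m.": "IBM",
    "ibm": "IBM",
    "big blue": "IBM",
    "armonk": "IBM",  # headquarters city used as a metonym
    "anwar al-awlaki": "Anwar al-Awlaki",
    "anwar al awlaki": "Anwar al-Awlaki",  # transliteration variant
}

def resolve_entity(mention):
    """Map a raw mention to its canonical entity, or return the mention
    unchanged when the controlled list has no entry for it."""
    key = " ".join(mention.lower().split())  # case- and space-normalize
    return ALIASES.get(key, mention)

print(resolve_entity("Big Blue"))         # IBM
print(resolve_entity("Anwar al Awlaki"))  # Anwar al-Awlaki
```

A lookup table alone cannot handle unseen transliterations, which is why production systems layer fuzzy matching and automated processes on top of the controlled list, as the paragraph above suggests.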
The remainder of the slide deck rides the taxonomy pony into the sunset. When one looks at information created 72 months ago, it is easy for me to understand why enterprise search and content processing has become an “oh, my goodness” problem in many organizations. I think that a mid sized company would grind to a halt if it needed a controlled vocabulary which matched today’s content flows.
My take away from the slide deck is easy to summarize: Putting the cart before the horse won’t get the enterprise where it must go to retain credibility and deliver utility.
Stephen E Arnold, May 9, 2015