Machine Learning Resolves Enterprise Search
August 26, 2015
One of the main topics of discussion on Beyond Search is enterprise search. We always try to find the juicy details behind enterprise search’s development, groundbreaking endeavors, and problems that search experts need to be aware of. One thing we can all agree on is that enterprise search is full of problems. The question is, will all of enterprise search’s problems ever be solved?
Ron Miller proposed a possible solution on TechTarget’s Search Content Management blog, “Will Machine Learning Revamp Enterprise Search Software?” Machine learning offers a bevy of solutions for many industries and what is very intriguing about the process is that we have yet to scratch the surface of its possible applications. Miller points out that machine learning should deliver more accurate and broader search results than the traditional search index.
Miller imagines this scenario:
“I think we’re going to see tools where the machine can automatically generate results, based on what the user is working on. The information could perhaps populate onto a split screen, suggesting additional information that could potentially be helpful for the user, and then apply machine learning to the user’s response.”
He suggests machine-learning-driven enterprise search will anticipate a user’s information need and even help shape their daily work routine. These are very feasible conjectures, and machine learning has already shaped such industries as the medical field and engineering. The main question is, when will machine learning become inexpensive enough to implement in enterprise search?
Whitney Grace, August 26, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
The Data Lake Is a Hub: For Wheel I Tell You
August 25, 2015
When I read “Why Do I Need a Data Lake,” I thought about Mel Blanc. Mr. Blanc was a voice actor who enlivened the Jack Benny Show and Warner Bros. cartoons. For Mr. Benny, Mr. Blanc was the “sound” of the Maxwell automobile and the participant in the famous “Sí…Sy…sew…Sue” routine.
So what? I imagined Mr. Blanc reading aloud the write up to me as Daffy Duck.
Here’s a passage I highlighted and enjoyed:
The data lake has the potential to transform the business by providing a singular repository of all the organization’s data (structured AND unstructured data; internal AND external data) that enables your business analysts and data science team to mine all of organizational data that today is scattered across a multitude of operational systems, data warehouses, data marts and “spreadmarts”. [Emphasis in the original]
Note that the lake has “potential to transform”. I also like the categorical imperative of “all the organization’s data.” I find the “all” notion quite humorous because there are digital data which are not likely to be pooled and processed. One example is data governed by government contracts for which rules of secrecy apply. Another is digital information germane to a legal matter and in the control of the firm’s legal eagles. There are other examples as well. So the “all” is a bobbing buoy. But what the heck is a spreadmart?
But the chortle inducing passage is the conversion of a data lake into a “hub and spoke service architecture.” That is quite a metaphorical shift.
Here’s another passage I highlighted:
the head of EMC Global Services Big Data Delivery team, termed this a “Hub and Spoke” analytics environment where the data lake is the “hub” that enables the data science teams to self-provision their own analytic sandboxes and facilitates the sharing of data, analytic tools and analytic best practices across the different parts of the organization.
I worked through the requisite list of dot points and then came upon a list of confusions for which I was prepared by the lake wheel juxtaposition. One confusion warrants some of my attention: “Create multiple data lakes.”
The idea is that an organization needs just “ONE [emphasis in original] data lake;
a singular repository where all of the organizations data – whether the organization knows what to do with that data or not – can be made available. Organizations such as EMC are leveraging technologies such as virtualization to ensure that a single data lake repository can scale out and meet the growing analytic needs of the different business units – all from a single data lake.
I can hear Daffy as vivified by Mr. Blanc saying, “Do me a big data favor and scold anyone who starts talking about data lakes (plural) instead of a data lake.”
Okay, scold.
EMC, as I understand the firm’s strategy, is contemplating this action: The company has considered selling itself to one of its subsidiaries.
There you go. An example of a hub and spoke, data lake type analysis applied to storage. Why do I need a data lake?
Stephen E Arnold, August 25, 2015
Enterprise Search: MarkLogic Cheerleader Is Surprised
August 25, 2015
Navigate to this link. You will need a LinkedIn account. Lucky you. Here’s the “comment” about a mid tier consulting firm’s magic whozit. The remark amused me:
It’s crazy to me that MarkLogic is not even on the list. All I can say is Gartner is making a mistake by forgetting it. I’m no expert on targeted marketing or how big the enterprise search market is vs the operational db market. But I know MarkLogic as a company is going after the operational db market instead. Yet almost all our customers deploy search applications. And I work for MarkLogic because after hundreds of ES [enterprise search] projects, MarkLogic was my favorite engine by far to install/use.
Well, crazy is as crazy does. My reaction to this comment is a question, “Isn’t MarkLogic an SGML database?” Even Oracle’s aged alternative can be searched, but the internals are, I hate to say it, a database. Bummer.
However, MarkLogic has some aspects which appear to lure mid tier wizards:
- MarkLogic is proprietary NoSQL. I think there are some open source NoSQL alternatives. Gartner’s experts seem to prefer proprietary solutions, not the community goodies.
- MarkLogic is getting long in the tooth. The company was founded in 2001, which, based on my lousy math, is 14 years ago. Ah, technology does march on with the JSON thing, the Elastic gizmos, and an appetite for continued cash infusions. According to Crunchbase, MarkLogic has sucked in $176.6 million in funding with the most recent infusion coming in May 2015. I heard that a couple of years ago, MarkLogic was in the $6 million range. If that number was close to reality, the company has to get its dancing shoes on and win the international tango competition.
- MarkLogic “helped power the US government healthcare.gov site.” I remember reading something about that Web site. Any publicity is good publicity as the saying goes.
Is MarkLogic a unicorn or just another endangered species? Sorry. No answers in Harrod’s Creek. We just use open source software. Works okay. Can’t beat the price either.
Stephen E Arnold, August 25, 2015
Elasticsearch is the Jack of All Trades at Goldman Sachs
August 25, 2015
The article titled Goldman Sachs Puts Elasticsearch to Work on Information Week discusses how programmers at Goldman Sachs are using Elasticsearch. Programmers there are working on applications to exploit both its data retrieval capabilities and its faculty for unstructured data. The article explains,
“Elasticsearch and its co-products — Logstash, Elastic’s server log data retrieval system, and Kibana, a dashboard reporting system — are written in Java and behave as core Java systems. This gives them an edge with enterprise developers who quickly recognize how to integrate them into applications. Logstash has plug-ins that draw data from the log files of 165 different information systems. It works natively with Elasticsearch and Kibana to feed them data for downstream analytics, said Elastic’s Jeff Yoshimura, global marketing leader.”
The article provides detailed examples of how Elastic is being used in legal, finance, and engineering departments within Goldman Sachs. For example, rather than hiring a “platoon of lawyers” to comb through Goldman’s legal contracts, a single software engineer was able to build a system that digitized everything and flagged contract documents that needed revision. With over 9,000 employees, Goldman currently has several thousand using Elasticsearch. The role of search has expanded, and it is important that companies recognize the many functions it can provide.
Chelsea Kerwin, August 25, 2015
What Might be Left Out of SharePoint 2016
August 25, 2015
When a new version of any major software is released, users get nervous as to whether their favorite features will continue to be supported or will be phased out. Deprecation is the process of phasing out certain components, and users are warily eyeing SharePoint Server 2016. Read all the details in the Search Content Management article, “Where Can We Expect Deprecation in SharePoint 2016?”
The article begins:
“New versions of Microsoft products always include a variety of additional tools and capabilities, but the flip side of updating software is that familiar features are retired or deprecated. We can expect some changes with SharePoint 2016.”
While Microsoft has yet to officially release the list of what will make the cut and what will be deprecated, they have made it known that InfoPath is being let go. To stay on top of future developments as they happen, stay tuned to ArnoldIT.com. Stephen E. Arnold has made a lifetime career out of all things search, and he lends his expertise to SharePoint on a dedicated feed. It is a great resource for SharePoint tips and tricks at a glance.
Emily Rae Aldridge, August 25, 2015
Insights Into SharePoint 2013 Search
August 25, 2015
It has been a while since we have discussed SharePoint 2013 and enterprise search. Upon reading “SharePoint 2013: Some Observations On Enterprise Search” from Steven Van de Craen’s Blog, we noticed some new insights into how users can locate information on the collaborative content platform.
The first item he brings to our attention is the “content source,” an out-of-the-box managed property option that creates result sources that aggregate content from different content sources, i.e. different storehouses on SharePoint. Content source can become a crawled property. What happens is that meta elements from Web pages made on SharePoint can be added to crawled properties and made searchable content:
“After crawling this Web site with SharePoint 2013 Search it will create (if new) or use (if existing) a Crawled Property and store the content from the meta element. The Crawled Property can then be mapped to Managed Properties to return, filter or sort query results.”
Another useful option was made possible by a user’s request: adding query string parameters to crawled properties. This allows more information to be displayed in the search index. Unfortunately this option is not available out-of-the-box and has to be programmed using content enrichment.
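To make the idea concrete, here is a minimal Python sketch of the transformation such an enrichment step would perform: pull the query string parameters out of a crawled URL and turn them into name/value properties. This is not SharePoint's content enrichment API, which is a separate web service callout; the function name, the `qs_` prefix, and the example URL are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def query_params_as_properties(url, prefix="qs_"):
    """Map each query string parameter of `url` to a property name/value pair.

    The `qs_` prefix and the flat dict are illustrative assumptions, not
    SharePoint's property naming or its content enrichment contract.
    """
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # Join repeated parameters so each property holds a single string value.
    return {prefix + name: ";".join(values) for name, values in params.items()}


# Hypothetical crawled URL, used only to show the shape of the output.
props = query_params_as_properties(
    "http://intranet.example.com/page.aspx?dept=legal&year=2015&tag=a&tag=b"
)
```

In a real deployment the resulting pairs would still have to be mapped to managed properties before they could be used to return, filter, or sort query results.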
Enterprise search on SharePoint 2013 still needs to be tweaked and fine-tuned, especially as users’ search demands become more complex. It makes us wonder when Microsoft will release the next SharePoint installment and whether the next upgrade will resolve some of these issues or unleash a brand new slew of problems. We cannot wait for that can of worms.
Whitney Grace, August 25, 2015
Recipes, Recipes. The Gray Lady Cooks
August 24, 2015
Years ago I heard a Googler talk about recipes. I did not think too much about recipes. At the time, I was good to go with a Mountain Dew and a bag of M&M peanuts. Zoom, zoom, zoom.
Not long ago I learned that IBM Watson, the money spinning wonder machine from the lads and lasses in Almaden, Armonk, and Manhattan, wrote a cook book. Get your copy of “Cognitive Cooking with Chef Watson: Recipes for Innovation from IBM & the Institute of Culinary Education” right now. Yummy.
Not to be out parboiled, the New York Times, according to “The New York Times Makes 17,000 Tasty Recipes Available Online: Japanese, Italian, Thai & Much More,” has been busy in the kitchen too.
Here’s a passage I noted:
Have a look around, and you’ll see that the site also offers a number of useful functions for those who make a free account there, such as the ability to save the recipes you want to make later and a recommendation engine to give you suggestions as to what to make next. But still, even though sites like these guarantee that none of us will ever go hungry for lack of a recipe, we can only do as well by any of them as our actual, physical cooking skills allow.
Which cutting edge company will step forward with a kitchen robot able to let the annoying human go back to the couch and contemplate a potato? I suppose I could check out my supply of miso and soy. Nah, too much real work. I am going to nuke a burrito in the microwave and watch cartoons.
Stephen E Arnold, August 24, 2015
Alphabet Google: EU Spells Trouble
August 24, 2015
You will need a copy of the dead tree edition of the Wall Street Journal or one of those nifty for fee accounts. Navigate to “EU Deepens Antitrust Investigation into Google’s Practices.” Do not complain to me if the link is dead. Buy a newspaper. The practices of newspapers are above reproach—mostly.
The point of the write up is that the “bloc” (Cold Warish term, no?) wants information about Google’s advertising contract practices. Yikes. Actual contracts. I don’t recall getting a contract for the Adsense ads which grace this blog.
Anyway the “real” newspaper reported:
The European Commission, the bloc’s competition watchdog, has sent out questionnaires to companies requesting more detailed information into Google’s business practices in those areas, according to two documents seen by The Wall Street Journal. A Google spokeswoman declined to comment. The European Commission didn’t respond to a request for comment.
Well, without verification why question the accuracy of the report?
Shift gears to Alphabet. What can Alphabet Google spell with its new Scrabble letters? I could go for, “We use algorithms.” I also like, “Please, ask the new CEO.”
Stephen E Arnold, August 24, 2015
More Enterprise Search Revisionism: Omitted Companies Are the Major News
August 24, 2015
A flurry of news items hit my Overflight system in the last couple of days. Gartner, one of the expert for hire mid tier consulting firms, issued a “Gartner’s Magic Quadrant for Enterprise Search.” I am not sure if you can access the report. I had to log in to LinkedIn and work through various screens until this gem presented itself to me.
I followed the link and learned that the “Magic Quadrant for 2015” includes these firms:
The Challengers. To me a challenger means a person or thing that engages in any contest, as of skill, strength, etc.
- LucidWorks, founded in 2007
- Mindbreeze, a unit of Microsoft centric Fabasoft in Austria. The search unit fired up a decade ago
- Google, ah, dear old Google and its pricey Google Search Appliances. You can find the license fees for some devices via the GSAAdvantage service. Google has been sort of selling GSAs for a decade.
- Dassault Systems, yep, the French engineering outfit working to convert Exalead’s ageing technology into a product component solution. Exalead dates from 2000. Yikes, that makes the technology 15 years old, an aeon in technology time.
The good news is that LucidWorks has its roots in open source. The other three outfits are proprietary technology.
The second group is Niche Players. The companies in this sector are:
- Expert System. An outfit which opened its doors in 1992 and whose stock is publicly traded. The share price on August 23, 2015 was $2.13 a share
- Recommind, founded in 2000, is a legal system whose technical approach often reminds me of Autonomy’s systems and methods. The firm now, according to this story, has $70 million in revenue
- Squiz, which is, by golly, not an open source solution despite its origins in the 2001 P@noptic academic/research setting in Australia. Just try searching for that spelling “P@noptic.”
The third group is Visionaries which to me means “given to or characterized by fanciful, not presently workable, or unpractical ideas, views, or schemes.” The dictionary entry here also points out these clarifications: unreal, imaginary, idealistic, impractical, and unrealizable. Here are the search outfits in this category:
- BA Insight. This is a company founded in 2004. The founder raised some venture money and then found himself looking for his future elsewhere. In the presentations I have heard, BA Insight is [a] an enterprise search system replacement for whatever you have running, [b] a business intelligence system, [c] a metatagging machine, [d] some combination of these functions.
- IBM. Ah, dear, old IBM. IBM was founded in 1911. The company has had a long time to figure out what to do since the STAIRS III and Web Fountain days. The company does the home grown thing with scripts and algorithms from its research labs. It does the open source thing: now IBM search means use of open source, community supported, free Lucene. Plus, it does the acquisition thing with SPSS Clementine (remember that, gentle reader), Vivisimo, i2, and Cybertap, among other information access companies IBM has purchased. At the end of the day, I am not sure what search means because IBM has been promoting the heck out of Watson. You remember Watson. It was a TV game show winner. Watson wrote a cook book. Watson is curing cancer. Watson is doing all sorts of wonderful things. I suppose that’s why it is a visionary with 13 consecutive quarters of revenue decline.
- IHS (Information Handling Service). IHS leverages technology from The Invention Machine (founded in 1992), an acquisition built to locate systems and methods from patent documents. The IHS search system is called Goldfire and positioned as an enterprise search system. IHS, according to Attivio, licenses the Fast Search & Transfer influenced UIA technology platform. IHS for me is a publishing company, but I suppose that doesn’t matter in today’s fluid world.
The final group of search vendors is labeled leaders. So what’s a leader? According to my online dictionary, a leader is a person or thing that leads. And “lead” means to go before or with to show the way; conduct or escort. No, I will not refer to Ashley Madison, gentle reader. I will play this straight. The leaders are:
- Attivio, founded in 2007. It must be a leader because a “visionary” uses the Attivio technology to be a visionary. Is that self referential like articles about Google’s right to be forgotten which must be forgotten?
- Coveo, founded in 2004. This company has been, like Attivio, successful in attracting venture capital. The company once focused on Microsoft Windows as did BA Insight. Now the firm is into customer support but the mid tier consultants remember the good old days of enterprise search.
- Hewlett Packard. Ah, HP, the company wrote a check for $11 billion in 2011, promptly wrote off billions, and embarked on a much loved legal challenge to Dr. Michael Lynch and some other favored individuals. HP, like IBM, has been racking up declining revenues for five consecutive quarters and is in the process of dividing itself into two separate companies. Does this suggest that HP has some challenges? Keep in mind Autonomy was founded in the mid 1990s.
- Lexmark. This is a relative newcomer to enterprise search. The company bought Brainware of trigram fame. Lexmark bought the 1980s search darling ISYS Search Software, which was founded in 1988. The company also snagged Kofax, which got into the content processing game with its acquisition of Kapow. I did hear that Lexmark is looking at some shortfalls related to search and content processing. I reported on the chopping of 500 jobs a couple of months ago. But leaders must expect some setbacks like Hewlett Packard. Perhaps Lexmark will reveal the shortfall from its “search related” endeavors. I would peg the number somewhere in the $75 to $80 million range in the last 18 months.
- Sinequa. This marketing centric, social media maven was founded in 2002. The company has some big European clients, but I am not certain that the push into the US has met with the “name in lights” success some French stakeholders expected. Sinequa is obviously a leader in search. I classify the company as a business process outfit, but the mid tier consultants are more informed than an old guy in rural Kentucky.
My view of the enterprise search sector is different. The companies in this list are oldies, a couple dating from the late 1980s and early 1990s. Let’s see. In Internet time, that pegs some technology as prehistoric.
There is a notable omission too. The list of companies identified by the mid tier outfit has missed the company which has been driving a bulldozer through deals.
What company is that?
Elastic, gentle reader. This outfit is in the process of providing the folks at Goldman Sachs with some information access love. The company has shoved aside the Lucid Works outfit which is scrambling to reposition itself as a Big Data spark something. There are cloud versions of Elastic available for a darned reasonable price. Check out SearchBlox, for example. Keep in mind that Elasticsearch was a second act to Compass, another search system.
A question which I asked myself is, “Why has a mid tier outfit which is so darned expert in enterprise search overlooked the big dog?” Frankly I have no evidence other than the odd little grid in the Linked In post. I assume that the experts at the mid tier firms don’t know much about what’s happening in search. Another thought is that the Elastic folks don’t buy much third party expert input about search. Whatever the reason, I suggest you, gentle reader, become familiar with Elasticsearch in the free or for fee variant.
Another gap I noticed is the omission of the appliance folks. Right off the bat, I think Index Engines, Maxxcat, and Thunderstone deserve a tiny footnote. Maxxcat, for example, is pretty good in the enterprise content indexing arena. Buy a box and plug it in. Index Engines does a great job making some specialized content instantly accessible. And Thunderstone? Well, the company has some darned good technology.
A third lacuna is the omission of the wild and crazy, Fast Search & Transfer tinged SharePoint search. There are upwards of 150 million SharePoint installations. Like it or not, Microsoft also shoves search down my throat each time I use Windows 10. Yikes. The system may have a legacy of considerable interest, but the darned thing is out there. Maybe a teeny tiny footnote? I would suggest that the mid tier outfit identify the vendor which sells more search into Microsoft installations than any other vendor. Nope. I won’t identify this outfit. The president agreed to a Search Wizards Speak interview and then backed out. Too bad for him. No life preserver from me again.
What’s the value of this league table or grid thing from the mid tier consulting firm?
First, it allows the companies in the list to issue a news release. I have already seen references to some of the companies. This post was inspired by the junk mail Linked In shoots at me on a regular basis. There’s nothing like PR which gets a company’s name in front of a bunch of red hot prospects.
Second, the mid tier consulting firm can visit with each company. I can imagine that on those visits, the mid tier consulting firm might just mention the firm’s strategic and tactical for fee services. Hey, if I worked for a mid tier consulting firm, I would be sure to explain why retaining me was the best darned thing since sliced bread. Oh, wait. I worked at Booz, Allen & Hamilton before it drifted into Snowden drifts. I responded to requests; I don’t recall making sales calls. Life is different now I suppose.
Third, the mid tier reports practically force me to write blog posts. I am delighted to be spurred into action.
Fourth, how much does it cost to use these systems? Why not make a table which presents the name of the company, the search system name so that I know what IBM asserts actually performs enterprise search and what HP calls its cloud stuff with Autonomy made ever so easy? Why not state that such and such a search system begins at $X for the license fee and $Y for the on going support, upgrades, and maintenance? Why not present average hourly engineering and technical service fees? Hey, even the best of this animal shelter of disparate systems fail. Did I say crash? Did I say flame out? Did I say deliver irrelevant results? Well, often in my experience.
To wrap up, the Visionaries, the Challengers, the Leaders, and the Niche Players can output news releases. Some may try to dismiss my observations, which is just peachy keen with me. I assume that failed webmasters, thwarted academicians, and unemployed home economics majors will explain that the best of the best appear in the league table.
Present reality any way one wants. I don’t have to make this stuff work anymore. I don’t have to explain to the CFO why the costs associated with enterprise search will continue to go up until the system is removed from the company. I will no longer have to attend a conference filled with cheerleaders for a utilitarian technology which most companies have learned is pretty much the same as it has been since the days of Fulcrum and Verity.
Remember. This is 2015. Most of the technology presented in the mid tier report is getting old. The world wants mobile. The world wants predictive outputs. The world wants search which actually delivers relevant results.
Maybe that is secondary today?
Will I read the complete report if a copy becomes available to me?
Nah. Marketing stuff bores me.
Stephen E Arnold, August 24, 2015
The Integration of Elasticsearch and Sharepoint Adds Capabilities
August 24, 2015
The article on the IDM Blog titled BA Insight Brings Together Elasticsearch and Sharepoint describes yet another vendor embracing Elasticsearch and falling in love again with Sharepoint. The integration of Elasticsearch and Sharepoint enables customers to use Elasticsearch through Sharepoint portals. The integration also made BA Insight’s portfolio accessible through open source Elasticsearch as well as Logstash and Kibana, Elastic’s data retrieval and reporting systems, respectively. The article quotes the Director of Product Management at Elastic,
“BA Insight makes it possible for Elasticsearch and SharePoint to work seamlessly together…By enabling Elastic’s powerful real-time search and analytics capabilities in SharePoint, enterprises will be able to optimize how they use data within their applications and portals.” “Combining Elasticsearch and SharePoint opens up a world of exciting applications for our customers, ranging from geosearch and pattern search through search on machine data, data visualization, and low-latency search,” said Jeff Fried, CTO of BA Insight.
Specific capabilities that the integration will enable include connectors to over fifty systems, auto-classification, federation to improve the presentation of results within the Sharepoint framework, and applications like Smart Previews and Matter Comparison. Users also have the ability to decide for themselves whether they want to use the Sharepoint search engine or Elastic’s, or combine them and put the results together into a set. Empowering users to make the best choice for their data is at the heart of the integration.
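The article does not say how BA Insight combines the two result lists, so the following Python sketch is only one plausible reading of “put the results together into a set”: alternate between the two ranked lists and drop any URL already seen. The function name, the `url` key, and the sample hits are hypothetical.

```python
from itertools import zip_longest


def merge_results(sharepoint_hits, elastic_hits):
    """Interleave two ranked hit lists, keeping the first copy of each URL.

    Round-robin interleaving is an assumed strategy chosen for illustration;
    it is not taken from BA Insight's product.
    """
    merged, seen = [], set()
    for pair in zip_longest(sharepoint_hits, elastic_hits):
        for hit in pair:
            # zip_longest pads the shorter list with None.
            if hit is None or hit["url"] in seen:
                continue
            seen.add(hit["url"])
            merged.append(hit)
    return merged


# Hypothetical hits from each engine, best match first.
sharepoint_hits = [{"url": "doc/a"}, {"url": "doc/b"}]
elastic_hits = [{"url": "doc/b"}, {"url": "doc/c"}, {"url": "doc/d"}]
combined = merge_results(sharepoint_hits, elastic_hits)
```

A production federator would more likely normalize and compare relevance scores rather than alternate blindly, but the deduplication step would look much the same.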
Chelsea Kerwin, August 24, 2015