IBM and Its Federated Search Camelot
July 25, 2015
Short honk: I scanned my Twitter feed this morning. What did I see? An impossible assertion from the marketing crazed folks at IBM Watson. Let me tell you, IBM Watson and its minions output a hefty flow of tweets. A year or so ago, IBM relied on mid tier consulting firms’ experts like Dave Schubmehl (yep, the fellow who sold my research on Amazon without my permission). Now there are other voices.
But the message, not just the medium, is important. IBM’s assertion is that there will be no more “data silos in enterprise search.” You can learn about IBM’s “reality” in a webcast.
Now, I am not planning on sitting through a webcast. I would, however, like to enumerate several learnings from my decades of enterprise information access work. You can use this list as a jump start for your questions to the IBM wizards. Here goes:
- In an enterprise, what happens when an indexing system makes available in a federated search system information related to a legal matter which is not supposed to be available to anyone except the attorneys involved in the matter?
- In an enterprise, what happens if information pertinent to a classified government project is made available in a federated search system which has not been audited for access control compliance?
- What happens when email attachments are automatically indexed and personnel information containing data about a medical issue becomes available in an enterprise search system?
- How does the federated system deal with content in servers located in a research facility engaged in new product research?
- What happens when sales and pricing data shared among key account executives is indexed and made available to a contractor advising the company?
- What is the process for removing pointers to data which are not supposed to be in the enterprise search system?
- What security measures are in place to ensure that a lost or stolen mobile device does not have access to an enterprise search system?
- How much manual work is required before an organization turns on the Watson indexing system?
These will get you started on the cross silo issues.
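For readers who want something concrete to wave at the IBM wizards, here is a minimal Python sketch of the access control check ("security trimming") that most of the questions above circle around. Every name in it (`Hit`, `User`, `federated_search`, the group labels, the silo names) is hypothetical and invented for illustration. This is not how Watson or any IBM product works; it only shows the deny-by-default idea, assuming each indexed item carries a list of groups allowed to see it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One result returned by a silo's index (hypothetical shape)."""
    silo: str
    title: str
    allowed_groups: frozenset  # groups permitted to see this document


@dataclass
class User:
    name: str
    groups: frozenset
    device_revoked: bool = False  # e.g. a phone reported lost or stolen


def federated_search(user, silo_results):
    """Merge hits from several silos, keeping only what the user may see.

    Deny by default: a hit with an empty ACL is dropped, not shown.
    """
    if user.device_revoked:
        return []
    visible = []
    for hits in silo_results.values():
        for hit in hits:
            # Show the hit only if the user shares at least one group with it.
            if hit.allowed_groups & user.groups:
                visible.append(hit)
    return visible


# Assumed sample data: four silos, one hit each.
results = {
    "legal": [Hit("legal", "Privileged memo", frozenset({"matter-attorneys"}))],
    "sales": [Hit("sales", "Key account pricing", frozenset({"key-account-execs"}))],
    "wiki": [Hit("wiki", "Cafeteria menu", frozenset({"all-staff"}))],
    "mail": [Hit("mail", "Attachment with no ACL", frozenset())],
}

contractor = User("contractor", frozenset({"all-staff"}))
titles = [hit.title for hit in federated_search(contractor, results)]

lost_phone = User("exec", frozenset({"key-account-execs"}), device_revoked=True)
lost_phone_titles = [hit.title for hit in federated_search(lost_phone, results)]
```

The hard part is not the dozen lines of filtering. The hard part is the manual work of making sure every silo reports an accurate `allowed_groups` value in the first place, which is what the last question in the list is really about.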
Oh, the answer to these questions is that the person identified as responsible for making the data available may get to find a future elsewhere. Amazon warehouses are hiring in southern Indiana.
Alternatively one can saddle up a white stallion, snag a lance, and head for the nearest windmill.
Stephen E Arnold, July 25, 2015
SharePoint 2013 Enterprise Search Configuration
July 25, 2015
In just 14 easy steps, you too can configure “SharePoint 2013 for a SharePoint 2013” site. Now this is not enterprise search, but when it comes to Microsoft and information access, trivialities just don’t matter.
The screenshots show what options to select. There is no explanation in Step 4 for what to do if you click “Basic Search Center” instead of “Enterprise Search Center.” A real MSFT lover will know the difference between “basic” and “enterprise” for a SharePoint site.
Follow the clicks to Step 9. Note that under the category search one selects “Search Settings”, not “Search and offline availability.” Again the clarity is astounding.
Cut and paste your way to Step 13 where you configure search navigation. Just click “everything” and presumably the URL, the description, and the link will be locked and loaded. And if not? Well, there will be no errors, gentle reader.
The coup de grace is Step 14. Here’s the instruction which is crystal clear:
Just go and check “Use the same results page settings as my parent” is selected from the subsite search site settings.
You are good to go—directly to a consulting firm specializing in installing a third party search system into your SharePoint solution. Sorry, but that approach usually works. The Fast Search thing from the mid 1990s? Not exactly flawless in my experience. Configuration files are still nestled deep in the innards but the graphical interface may not get you where you need to be.
Stephen E Arnold, July 25, 2015
Big Data: Slow Down, Think
July 25, 2015
I read “Contradictions of Big Data.” Few articles which I see take a common sense approach to Big Data baloney. (Azure chip consultants bristle at my use of baloney. Too bad.) I liked this article.
The article appeared in my Overflight a day ago even though the write up was posted in March 2015. Big Data does not mean rapid data.
I highlighted this passage:
have been waging an uphill battle against the nonsensical and unsubstantiated idea that more data is better data, but now this view is getting some additional support, and from some surprising corners.
I do not agree. The yap about Big Data has almost overpowered the craziness of search engine optimization’s shouting about semantic search.
The write up points out:
Take it from me [Martyn Jones] , most businesses will not be basing their business strategies on the analysis of a glut of selfies, home videos of cute kittens, or the complete works of William Shakespeare or Dan Brown. Almost all business analysis will continue to be carried out on structured data obtained primarily from internal operational systems and external structured data providers.
The write up points out the silliness of velocity and several other slices of marketing baloney. (Make a sandwich, please.)
I found this paragraph insightful:
I have seen data scientists at work, and the word science doesn’t actually jump out and grab you. It’s difficult to make the connection, just as it is to accurately connect some popular science magazines with fundamental scientific research. If a professional and qualified statistician wants to label themselves a data scientist then I have no issue with that, it’s their problem, but I am not willing to lend credibility to the term ‘data scientist’ when it is merely an interesting job title, with at most a tenuous connection to the actual role, and one that is liberally applied, with the almost customary largesse of IT, to creative code hackers and business-averse dabblers in data.
Harsh words for those who combine an undergraduate degree minor in math with Twitter and come up with data scientist.
Hopefully others will pick up this practical approach to the sliced and processed meat wrapped in plastic and branded Big Data.
Stephen E Arnold, July 25, 2015
Palantir Sucks in More Dinero
July 24, 2015
I am all for keeping the companies involved with law enforcement and intelligence entities out of the public eye. The hoo hah about Hacking Team is a grim reminder of what happened to Gamma Group and FinFisher when information about their services and products hit the “real” journalists’ radar.
I want to point you to “Confirmed. Palantir Raise a Huge $450 Million Investment.” The write up points out:
This [more cash investments] confirms a report last month that the company was raising up to $500 million at a valuation of $20 billion – making it the third most valuable “startup” on the Valley scene. (If you can call a 16-year-old company that reportedly generates millions in revenue a “startup.”)
Palantir is a unicorn wearing an invisibility saddle, tack, and saddle blanket. That’s okay with me. My observation is that Palantir has technology which is intended to prevent untoward acts. Are these untoward acts being prevented? I will let you answer that question.
I have no comment on whether the Palantir technology works. Even court documents related to Palantir’s dust up with i2 Group Ltd (a former client of mine) are not public. Why would i2, the pioneer in Palantir’s software segment, get involved with legal eagles?
Perhaps someone will have an answer some day. For now, I will ignore the partially invisible unicorn. The company has plenty of stakeholders who are trying to figure out Palantir so my efforts are redundant.
Stephen E Arnold, July 24, 2015
Real Journalists and Presstitution
July 24, 2015
I read and enjoyed an article for one word: “presstitute.” You can see the word in context in “Are Media Companies One Native Ad Away from Becoming Presstitutes.” Perhaps the word “native” is not clear? Inclusions, inserts, or paid advertorials will make the meaning of native clear.
The idea is that “real” journalists, before the eye opening days of yellow journalism, were objective. Messrs. Pulitzer and Hearst were like Mark Zuckerberg and Larry Page more than a century ago.
Flash forward to the present and the “real” journalists are struggling to make their well honed business model work in a world of iPhones and Instagram.
Read the original essay. You get some dancing around the May pole, but the article is significant because of the word “presstitute” in my opinion. That’s a business model with legs. No comment about whether the legs are comely, hirsute, appropriate, or inappropriate from me, however.
Stephen E Arnold, July 24, 2015
Yahoo: A Return to Web Search?
July 24, 2015
I have only a hazy recollection of a conversation with Dave Filo, one of the founders of Yahoo. That was a long time ago. Chris Kitze and I had started The Point, which was a curated list of G-rated Web sites. The telephone call was to discuss what we were doing and what Yahoo was doing. We were doing essentially the same thing, which was okay. We aimed at doing the Good Housekeeping Seal of Approval thing with our Top 5% of the Internet. The Yahooligans were creating a general directory of Internet sites. Our approaches were complementary. We sold to Lycos (CMGI) and Yahoo did its Yahoo thing until today.
I thought about the manually assembled Web directory and the look at the listings approach of Yahoo. We had a lousy search engine along with categories for The Point. I never thought of Yahoo as being a Web search engine. That came later when Yahoo experimented, licensed, bought Inktomi, and ended up with a deal to get a Web search thing from Microsoft.
Imagine how the headline “Yahoo Wants to Return to Its Roots as a Search Engine” created some associative dissonance for me. Yahoo was a list. A manually constructed list of links. Yahoo was a directory first. Search came later and, in my opinion, never arrived. The write up states:
Yahoo wants to be a search giant once more.
Even the azure chip consultants are struggling with this Xoogler vision. I highlighted this gem from the ground level of consulting insight:
However, Gartner analyst Mike McGuire tells Quartz he thinks Yahoo’s renewed focus on search is “a bit quixotic,” questioning its ability to execute and capture market share.
Okay. Yahoo is a weird 1990s thing which is, I suppose, the last portal standing. Search is a bridge too far for many companies. Maybe that’s why there are just a couple of Web search engines that get the bulk of the traffic and an information highway with some smaller outfits which the high speed drivers zoom right by. When was the last time you stopped at Qwant.com or Unbubble.eu?
I understand the enthusiasm for writing something, anything, that seems new and fresh. But Yahoo does not have roots in search. Consequently it, like many other companies, has disappointed with its approach to information access. Nevertheless, the article goes its merry way just like Yahoo. Sympathetic harmonics at work.
Stephen E Arnold, July 24, 2015
Web Sites Going The Way Of The Dodo
July 24, 2015
Apps are supposed to replace Web sites, but there is a holdup for universal adoption. Search Engine Watch explains why Web sites are still hanging tight and how a new Google acquisition might be a game changer: “The Final Hurdle Is Cleared-Apps Will Replace Web Sites.” The article explains that people are “co-users” of both apps and classic Web sites, but online browsers are still popular. Why is that?
Browsers are universal and can access any content with a Web address. Most Web sites also do not have an app counterpart, so the only way to access content is to use the old-fashioned browser. Another issue is that apps cannot be crawled by search engines, so they are left out of search results. The biggest pitfall for apps is that they have to be downloaded in order to be accessed, which takes up screen space and disk space.
A startup has created a solution to making apps work faster:
“Agawi has developed a technology to stream apps, just like Netflix streams videos. Instead of packaging the entire app into a single, large file for the user to download, the app is broken up into many small files, letting users interact with small portions of the app while the rest of it is downloading. In the short term, it appears that Google wants to deploy Agawi for users try an app before downloading the full version.”
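The quoted description (break the app into small files, let the user interact with the parts that have arrived) can be sketched in a few lines of Python. This is an illustration of the general chunking idea only, not Agawi's or Google's implementation, which the article does not describe. The names `split_into_chunks` and `StreamedApp`, and the notion that a feature "needs" particular chunk numbers, are assumptions made up for the example.

```python
def split_into_chunks(payload: bytes, chunk_size: int):
    """Yield (index, chunk) pairs covering the whole payload in order."""
    for start in range(0, len(payload), chunk_size):
        yield start // chunk_size, payload[start:start + chunk_size]


class StreamedApp:
    """Tracks which parts of an app have arrived so far (hypothetical)."""

    def __init__(self, total_chunks: int):
        self.total_chunks = total_chunks
        self.received = {}

    def receive(self, index: int, chunk: bytes) -> None:
        self.received[index] = chunk

    def can_use(self, needed_chunks) -> bool:
        """A feature is usable once every chunk it depends on has arrived."""
        return all(index in self.received for index in needed_chunks)

    def assemble(self) -> bytes:
        """Rebuild the full app; only valid after every chunk has arrived."""
        if len(self.received) != self.total_chunks:
            raise ValueError("download is not complete")
        return b"".join(self.received[i] for i in range(self.total_chunks))


# Assumed sample: a 10-byte "app" split into 4-byte pieces gives 3 chunks.
payload = bytes(range(10))
chunks = list(split_into_chunks(payload, 4))

app = StreamedApp(total_chunks=len(chunks))
app.receive(*chunks[0])
first_screen_ready = app.can_use([0])      # only needs the first chunk
full_feature_ready = app.can_use([0, 1])   # second chunk not here yet

for index, chunk in chunks[1:]:
    app.receive(index, chunk)
rebuilt = app.assemble()
```

Note that the sketch does nothing for the crawling problem mentioned above. Chunked delivery gets the app onto the device faster; it does not make the content inside the app visible to a search engine.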
Google acquired Agawi, but do not expect the technology to be accessible soon. Google enjoys putting its own seal of approval on all acquisitions and making sure they work well. Mobile device usage is increasing, and more users are choosing mobile devices over traditional computers. Search marketers will need to be more aware than ever about how search engines work with apps and encourage clients to make an app.
Whitney Grace, July 24, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Plethora of Image Information
July 24, 2015
Humans are visual creatures and they learn and absorb information better when pictures accompany it. In recent years, the graphic novel medium has gained popularity amongst all demographics. The amount of information a picture can communicate is astounding, but unless it is looked for it can be hard to find. It also cannot be searched by a search engine…or can it? Synaptica is in the process of developing the “OASIS Deep Image Indexing Using Linked Data.”
OASIS is an acronym for Open Annotation Semantic Imaging System, an application that unlocks image content by giving users the ability to examine an image closer than before and highlighting data points. OASIS is a linked data application that enables parts of the image to be identified as linked data URIs, which can then be semantically indexed to controlled vocabulary lists. It builds an interactive map of an image with its features and conceptual ideas.
“With OASIS you will be able to pan-and-zoom effortlessly through high definition images and see points of interest highlight dynamically in response to your interaction. Points of interest will be presented along with contextual links to associated images, concepts, documents and external Linked Data resources. Faceted discovery tools allow users to search and browse annotations and concepts and click through to view related images or specific features within an image. OASIS enhances the ability to communicate information with impactful visual + audio + textual complements.”
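To make the idea of indexing parts of an image to controlled vocabulary terms more concrete, here is a small Python sketch. It is not Synaptica's data model or API; the `Region` and `Annotation` classes, the two lookup functions, and all of the `example.org` URIs are hypothetical and exist only to show how a rectangle within an image can be tied to a concept URI and then searched in both directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangular area of an image, in pixels (hypothetical shape)."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


@dataclass(frozen=True)
class Annotation:
    """Links one region of one image to one controlled vocabulary term."""
    image_uri: str
    region: Region
    concept_uri: str
    label: str


def concepts_at(annotations, image_uri, px, py):
    """Labels of every annotated region under a point in an image."""
    return [a.label for a in annotations
            if a.image_uri == image_uri and a.region.contains(px, py)]


def images_for_concept(annotations, concept_uri):
    """Every image with at least one region tagged with the concept."""
    return sorted({a.image_uri for a in annotations
                   if a.concept_uri == concept_uri})


# Assumed sample data; the URIs are placeholders.
IMG1 = "http://example.org/images/1"
IMG2 = "http://example.org/images/2"
WINDMILL = "http://example.org/vocab/windmill"
HORSE = "http://example.org/vocab/horse"

annotations = [
    Annotation(IMG1, Region(0, 0, 100, 100), WINDMILL, "windmill"),
    Annotation(IMG1, Region(50, 50, 100, 100), HORSE, "horse"),
    Annotation(IMG2, Region(10, 10, 20, 20), HORSE, "horse"),
]

under_cursor = concepts_at(annotations, IMG1, 60, 60)
horse_images = images_for_concept(annotations, HORSE)
```

The first lookup corresponds to points of interest highlighting as a user moves over an image; the second corresponds to clicking through from a concept to related images.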
OASIS is advertised as a discovery and interactive tool that gives users the chance to fully engage with an image. It can be applied to any field or industry, which might mean the difference between success and failure. People want to fully immerse themselves in their data or images these days. Being able to do so on a much richer scale is the future.
Whitney Grace, July 24, 2015
One Million Minutes of Unfindable Video
July 23, 2015
I read “AP Makes One Million Minutes of Historical Footage Available on YouTube.” This struck me as an anomaly. The AP is an outfit which, as I recall, rattled sabers and showed knives to people who quote from their articles. Also, the AP is in a revenue hunt; that is, the good old days of newspapers are history. The company is, like many outfits sired in the stable of dead tree journalism, adapting. Need a real time news feed with search? The AP offers this via a tie up with a former Bell Labs’ person. I will wager $1.00 in pennies that you cannot name the vendor. Send your answers to benkent2020 at yahoo dot com.
The AP write up reports that lots of video has been digitized and placed on YouTube. There are links to videos which AP finds interesting. The word “find” brings up an interesting question: “How does one locate a video?” and “How does one locate a series of related videos?” and “How does one find a video with a specific segment of text in it?” and “How does one find a video with a specific image in it?”
The answer, gentle reader, is that one cannot. I know that AP is excited about this collection. I assume that Google is pleased that the collection is not on Facebook.
As a user, the approach to locating a video is somewhat unsatisfying. Prepare your patient self to guess keywords, click, and watch in serial fashion one million videos. Well, maybe a couple.
Without search, this collection, like Google’s Life Magazine images, is useful to folks with time on their hands and even more time on their hands. A dump is not useful to me. To you, gentle reader, and to the executives at AP, I am picking nits. The problem is that these nits are the size of the synthetic creatures in Jurassic World. Big nits. My hunch is that the ad revenue from these videos will be the size of regular, run of the mill nits. I hope I am wrong. Don’t forget to submit the name of the AP’s real time, online news intelligence service. I will accept entries for 24 hours.
Stephen E Arnold, July 23, 2015
Lucidworks (Really?) Does Fusion Too
July 23, 2015
I read “Lucidworks Delivers Fusion 2.0 with Spark Integration.” The idea is that search is not exactly flying off the shelves. Why not download Elasticsearch and move on? The way to make search relevant is to make it a Big Data thing. This is the hard to believe path IBM took with Vivisimo’s technology. Where is Vivisimo in the IBM revenue picture? Well, that picture seems gloomy. Maybe the Big Data thing doesn’t work particularly well.
In terms of venture backed Lucidworks, the write up explains:
Fusion 2.0 provides an organization with access to a streamlined, consumer-like search experience with enterprise-grade speed and scalability. The new release integrates Lucidworks’ Fusion with Apache Spark to enable real-time data analytics. Fusion 2.0 also features a new version of the company’s SiLK user interface (UI) that simplifies dashboard visualizations and enhances the user experience. The SiLK UI runs on top of Fusion and the Apache Solr search platform, upon which Fusion is based. SiLK gives users the power to perform ad-hoc search and analysis of massive amounts of multi-structured and time series data. Users can swiftly transform their findings into visualizations and dashboards.
I think I understand. Wrappers of software provide more developer-friendly tools. There may be one slight hitch in the git along. Those familiar with the technology of open source and fluent in the mumbo jumbo jargon that Lucid and other repositioning enterprise search vendors employ may not comprise a giant pool of prospects.
In short, writing wrappers is hard work. Dealing with fusion in an effective manner is harder work. Eliminating the latency that accompanies layers and handoffs is the hardest work of all.
The challenge will be generating substantial organic revenue and having enough profit to satisfy the investors who have been very patient with the Lucidworks outfit. No, really.
Stephen E Arnold, July 23, 2015