IBM and Algorithmic Bias

January 25, 2018

I read “Unexplainable Algos? Get Off the Market, Says IBM Chief Ginni Rometty.” The idea is in line with Weapons of Math Destruction and the apparent interest in “black box” solutions. If you are old enough, you will remember the Autonomy IDOL system. It featured a “black box” which licensees used without the ability to alter how the system operated. You may also recall that the first Google Search Appliances locked users out as well. One installed the GSA and it just worked—at least, in theory.

This article includes information derived from the IBM content output for the World Economic Forum where it helps to have one’s own helicopter for transportation.

I noted this statement:

“When it comes to the new capabilities of artificial intelligence, we must be transparent about when and how it is being applied and about who trained it, with what data, and how,” the IBM chairman, president and CEO wrote.

I don’t want to be too picky but IBM owns the i2 Analyst Notebook system. If you are not familiar with this platform, it provides law enforcement and intelligence professionals with tools to organize, analyze, and marshal information for an investigation. As a former consultant to i2, I am not sure if the plumbing developed by i2 is public. In fact, IBM and Palantir jousted in court when IBM sued Palantir for improper use of its intellectual property; that is a fancy way of saying, “Palantir engineers tried to figure out how i2 worked.” The case settled out of court and many of the documents are sealed because no party to the case wanted certain information exposed to bright sunlight.

IBM operates a number of cybersecurity services. One of these has the ability to intercept a voice call and map that call to email and other types of communications. The last time I received some information about this service I had to sign a bundle of documents. The idea, of course, is that much of the technology was, from my point of view, a “black box.”

So what?

The statement by IBM’s CEO is important because it is, in my opinion, hand waving. IBM deals in systems which are not fully understood by some of the IBM experts selling these solutions, and some of the engineers who may know more about the inner workings of secret or confidential systems and methods are not talking. An expert knows stuff others do not; therefore, why talk and devalue one’s expertise?

To sum up, talk about making math centric systems and procedures transparent is just noise. People who can explain how systems which emerged from Cambridge University, like Autonomy’s Neurolinguistic System or i2’s Analyst Notebook, actually work are in short supply.

How can one who does not understand explain how a complex system works? Black boxes exist to keep those with thumbs for fingers from breaking what works.

Talk doesn’t do much to deal with the algorithmic basics:

  1. Some mathematical procedures in wide use are not easily explained or reverse engineered; hence, the IBM charge that Palantir tried a short cut through the woods to the cookie jar.
  2. Most next generation systems are built on a handful of algorithms. I have identified 10 which I explain in my lectures about the flaws embedded in “smart” systems. Each of the most widely used algorithms can be manipulated in a number of ways. Some require humans to fiddle; others fiddle when receiving inputs from other systems.
  3. Explainable systems are based on rules. By definition, one assumes the rules work as the authors intended. News flash. Rule based systems can behave in unpredictable, often inexplicable ways. A fun example is for you, gentle reader, to try and get the default numbering system in Microsoft Word to perform consistently with regard to left justification of numbered lists.
  4. Chain a series of algorithms together in a work flow. Add real time data to update thresholds. Watch the outputs. Now explain what happened. Good luck with that.
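Point 4 is easy to reproduce on a small scale. What follows is a minimal sketch, not anyone's production system: the three stages, the window size, the multiplier, and the sample streams are all hypothetical values picked for illustration. It chains a normalizer, a weighted scorer, and a detector whose threshold drifts with recent data, then feeds the same final reading to two pipelines with different histories.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class AdaptiveThreshold:
    """Hypothetical detector: its threshold drifts with recently seen scores."""
    window: int = 5          # assumed number of recent scores to remember
    multiplier: float = 1.5  # assumed sensitivity setting
    history: list = field(default_factory=list)

    def update(self, score: float) -> None:
        # Keep only the most recent `window` scores.
        self.history.append(score)
        self.history = self.history[-self.window:]

    def threshold(self) -> float:
        # With no history yet, fall back to a fixed default.
        if not self.history:
            return 1.0
        return self.multiplier * mean(self.history)


def normalize(raw: float, scale: float = 10.0) -> float:
    """Stage 1: squash a raw measurement into a smaller range."""
    return raw / scale


def score(value: float, weight: float = 2.0) -> float:
    """Stage 2: apply a weight chosen by whoever tuned the system."""
    return value * weight


def run_pipeline(stream, detector):
    """Stage 3: flag each item against a threshold shaped by earlier items."""
    flags = []
    for raw in stream:
        s = score(normalize(raw))
        flags.append(s > detector.threshold())
        detector.update(s)
    return flags


# The last reading is 8 in both streams; only the history differs.
quiet_then_spike = run_pipeline([1, 1, 1, 1, 8], AdaptiveThreshold())
noisy_then_spike = run_pipeline([9, 9, 9, 9, 8], AdaptiveThreshold())
print(quiet_then_spike[-1], noisy_then_spike[-1])
```

The final reading of 8 is flagged after a quiet stretch and ignored after a noisy one. Explaining either outcome means knowing the scale, the weight, the window, the multiplier, and everything the detector saw beforehand, and this toy has only three stages.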

I love IBM. Always marketing.

Stephen E Arnold, January 25, 2018

One of Big Data’s Giants Accused of Big Time Fraud

January 15, 2018

Palantir, one of the biggest names in big data, has been praised for its innovative solutions since it began in 2004. However, it has been getting attention for all the wrong reasons lately, as we saw in a recent Deal Street Asia story, “Palantir Holder Says Company Sabotaged Stock Sale to Chinese.”

One of Palantir Technologies Inc.’s early investors accused the data-mining startup of sabotaging his attempt to sell his $60 million stake to a Chinese company so directors and executives could enrich themselves by selling their stock instead.

Marc Abramowitz, a 63-year-old lawyer and investor, contends that when Palantir executives got wind of his offer to sell his stock to Chinese private equity firm CDH Investments Fund Management Co., they sunk the deal by offering to sell their shares to CDH instead, according to a lawsuit filed Thursday in Delaware. Palantir’s campaign to spoil Abramowitz’s sale demonstrates the Silicon Valley company’s “willingness to intentionally interfere with shareholder transactions in an effort…”

It may be tough to prove this in court, however. Palantir is famous for its secrecy, though that may become a thing of the past when they go public. Either way, this is an interesting look at the cutthroat world of big data and the potential things people do to stay on top.

Patrick Roland, January 15, 2018

Data Analysis Startup Primer Already Well-Positioned

December 22, 2017

A new startup believes it has something unique to add to the AI data-processing scene, we learn from VentureBeat’s article, “Primer Uses AI to Understand and Summarize Mountains of Text.” The company’s software automatically summarizes (what it considers to be) the most important information from huge collections of documents. Filters then allow users to drill into the analyzed data. Of course, the goal is to reduce or eliminate the need for human analysts to produce such a report; whether Primer can soar where others have fallen short on this tricky task remains to be seen. Reporter Blair Hanley Frank observes:

Primer isn’t the first company to offer a natural language understanding tool, but the company’s strength comes from its ability to collate a massive number of documents with seemingly minimal human intervention and to deliver a single, easily navigable report that includes human-readable summaries of content. It’s this combination of scale and human readability that could give the company an edge over larger tech powerhouses like Google or Palantir. In addition, the company’s product can run inside private data centers, something that’s critical for dealing with classified information or working with customers who don’t want to lock themselves into a particular cloud provider.

Primer is sitting pretty with $14.7 million in funding (from the likes of Data Collective, In-Q-Tel, Lux Capital, and Amplify Partners) and, perhaps more importantly, a contract with In-Q-Tel that connects them with the U.S. Intelligence community. We’re told the software is being used by several agencies, but that Primer knows not which ones. On the commercial side, retail giant Walmart is now a customer. Primer emphasizes they are working to enable more complex reports, like automatically generated maps that pinpoint locations of important events. The company is based in San Francisco and is hiring for several prominent positions as of this writing.

Cynthia Murrell, December 22, 2017

Bye-Bye Silicon Valley Monopoly

December 14, 2017

Silicon Valley is a technology epicenter and used to be synonymous with modern innovation, but that is no longer the case.  CNBC reports that, “Billionaire Investor Peter Thiel: Silicon Valley’s Monopoly On Big Growth Tech Companies Is Over.”   Peter Thiel is a famous Silicon Valley investor.  He helped launch PayPal, was an early investor in Facebook and Airbnb, and he also launched Palantir Technologies.  As one of the top Silicon Valley insiders, he said that:

‘I have been investing in the technology space — entrepreneur and investor over the past 20 years in Silicon Valley — and within the area of IT, it has for the last 10, 15 years in the US and the world been extremely centered on Silicon Valley,’ Thiel says, speaking at the Future Investment Initiative in Riyadh, Saudi Arabia, Thursday.  ‘I think there are a lot of reasons for that, but the question is, ‘Where is the growth going to happen the next 10 years?’ And what I would tend to think is that it will be more diversified from just Silicon Valley.’

Thiel continued that technology startups can be built anywhere; you just need the right people, money, and governance structures.  He was surprised that so many technology businesses popped up in Silicon Valley, but that happened because of the number of mentors and entrepreneurship concentrated in one area.  Innovators went where the action was happening.  It is similar to how actors go to Hollywood and writers head to New York City.

Thanks to Silicon Valley, technology has changed the world, so the next venture company can be located anywhere.  Take a guess about where the next big technology might be or if it will be spread out along the grid.

Whitney Grace, December 14, 2017

Business Intelligence: A List of 238 Firms

November 30, 2017

Need a list of “fermium” business intelligence tools? That’s no typo. That is the word on page 2 of Top Business intelligence Solutions. Looking past the misspelling, the write up from Predictive Analytics Today presents a listing in no particular order of more than 200 business intelligence tools. The text is accompanied by little boxes with scores in them like this:

image

The list was a lot of work. The names of companies are collected in these major categories:

  1. Free cloud business intelligence solutions
  2. Free open source business intelligence tools
  3. Free proprietary business intelligence tools
  4. Open source commercial business intelligence tools
  5. Top business intelligence companies
  6. Free extract, transform and load software
  7. Top extract, transform and load software
  8. Cloud SaaS on demand business intelligence solutions
  9. Freemium cloud business intelligence solutions
  10. Open source balanced scorecard software
  11. Top balanced scorecard software
  12. Open source and free dashboard software
  13. Top dashboard software
  14. Embedded business software
  15. Open source and free unified modeling language tools
  16. Open source and free business process management tools

What I found interesting about the list was:

  • For fee vendors appear in “free” categories; for example, IBM Watson and Microsoft.
  • Many of the vendors have versions of their software for the intelligence and law enforcement community. Most of these specialized versions are not free.
  • None of the specialist firms which I track appear on the list; for example, BAE Systems, a company whose tools rival those of many of the other firms on the list.
  • The vendor Attivio was left out. This surprised me because Attivio pitches itself as a business intelligence solution and it has a tie up with Tibco, a product dependent in part on software created by the founders of Recorded Future, a company which I track because it has robust intelligence capabilities embodied in its products and services.
  • There are curious omissions. One important one is Palantir, whose Gotham product powers a number of commercial business intelligence applications like those from Thomson Reuters’ financial product line.
  • Many vendors appear in multiple categories. This left me confused. For major vendors it would have been helpful to provide the company name “IBM” with a summary of what the company offers as free, freemium, open source, proprietary, etc.

Nevertheless, the listing is interesting for those wanting to track some of the vendors pursuing the business intelligence sector. To learn about companies not on the Predictive Analytics’ list, follow DarkCyber, my weekly video program. Each week, I profile intelligence companies which are often off the radar of some commercial procurement teams. That’s unfortunate because the firms I follow are indeed cutting edge when it comes to real life intelligence analysis. Most of these products, in my experience, cost money either for engineering, training, support, or add ons.

You can find the video by navigating to this link or running a query for Arnold Dark Cyber on Google.com or on Googlevideo.com.

Stephen E Arnold, November 30, 2017

Enterprise Search: Will Synthetic Hormones Produce a Revenue Winner?

October 27, 2017

One of my colleagues provided me with a copy of the 24 page report with the hefty title:

In Search for Insight 2017. Enterprise Search and Findability Survey. Insights from 2012-2017

I stumbled on the phrase “In Search for Insight 2017.”

image

The report combines survey data with observations about what’s going to make enterprise search great again. I use the word “again” because:

  • The buy up and sell out craziness which culminated with Microsoft’s buying Fast Search & Transfer in 2008 and Hewlett Packard’s purchase of Autonomy in 2011 marked the end of the old-school enterprise search vendors. As you may recall, Fast Search was the subject of a criminal investigation and the HP Autonomy deal continues to make its way through the legal system. You may perceive these two deals as barn burners. I see them as capstones for the era during which search was marketed as the solution to information problems in organizations.
  • The word “search” has become confusing and devalued. For most people, “search” means the Danny Sullivan search engine optimization systems and methods. For those with some experience in information science, “search” means locating relevant information. SEO erodes relevance; the less popular connotation of the word suggests answering a user’s question. Not surprisingly, jargon has been used for many years in an effort to explain that “enterprise search” is infused with taxonomies, ontologies, semantic technologies, clustering, discovery, natural language processing, and other verbal chrome trim to make search into a Next Big Thing again. From my point of view, search is a utility and a code word for spoofing Google so that an irrelevant page appears instead of the answer the user seeks.
  • The enterprise search landscape (the title of one of my monographs) has been bulldozed and reworked. The money in the old school precision and recall type of search comes from consulting. Search Technologies was acquired by Accenture to add services revenue to the management consulting firm’s repertoire of MBA fixes. What is left are companies offering “solutions” which require substantial engineering, consulting, and training services. The “engine,” in many cases, is an open source system which one can download without burdensome license fees. From my point of view, search boils down to picking an open source solution. If those don’t work, one can license a proprietary system wrapped around open source. If one wants a proprietary system, there are some available, but these are not likely to reach the lofty heights of the Fast Search or Autonomy IDOL systems in the salad days of enterprise search and its promises of a universal search system. The universal search outfit Google pulled out of enterprise search for a reason.

I want to highlight five of the points in the 24 page write up. Please, register to get your own copy of this document.

Here are my five highlights. My comments are in italics after each quote from the document:


Lucidworks: The Future of Search Which Has Already Arrived

August 24, 2017

I am pushing 74, but I am interested in the future of search. The reason is that with each passing day I find it more and more difficult to locate the information I need for my routine research for my books and other work. I was anticipating a juicy read when I requested a copy of “Enterprise Search in 2025.” The “book” is a nine page PDF. After two years of effort and much research, my team and I were able to squeeze the basics of Dark Web investigative techniques into about 200 pages. I assumed that a nine-page book would deliver a high-impact payload comparable to one of the chapters in one of my books like CyberOSINT or Dark Web Notebook.

I was surprised that a nine-page document was described as a “book.” I was quite surprised by Lucidworks’ description of the future. For me, Lucidworks is describing information access already available to me and most companies from established vendors.

The book’s main idea in my opinion is as understandable as this unlabeled, data-free graphic which introduces the text content assembled by Lucidworks.

image

However, the pamphlet’s text does not make this diagram understandable to me. I noted these points as I worked through the basic argument that client server search is on the downturn. Okay. I think I understand, but the assertion “Solr killed the client-server stars” was interesting. I read this statement and highlighted it:

Other solutions developed, but the Solr ecosystem became the unmatched winner of the search market. Search 1.0 was over and Solr won.

In the world of open source search, Lucene and Solr have gained adherents. Based on the information my team gathered when we were working on an IDC open source search project, the dominant open source search system was Lucene. If our data were accurate when we did the research, Elastic’s Elasticsearch had emerged as the go-to open source search system. The alternatives like Solr and Flaxsearch have their users and supporters, but Elastic, founded by Shay Banon, was a definite step up from his earlier search service called Compass.

In the span of two and a half years, Elastic had garnered more than $100 million in funding by 2014 and expanded into a number of adjacent information access market sectors. Reports I have received from those attending Elastic meetings were that Elastic was putting considerable pressure on proprietary search systems and a bit of a squeeze on Lucidworks. Google’s withdrawing its odd duck Google Search Appliance may have been, in small part, due to the rise of Elasticsearch and the changes made by organizations trying to figure out how to make sense of the digital information to which their staff had access.

But enough about the Lucene-Solr and open source versus proprietary search yin and yang tension.


Web Search Training Wheels: A Play for Precision

August 10, 2017

I read “How to Instantly Boost the Accuracy of Search Results on Google and Bing.” I love the word “instantly”, particularly when coupled to “accuracy.” The write up describes an overlay called Advangle, which helps a person create a search with more than 2.6 words. Interesting neologism, Advangle.

These services are what I call “training wheels.” The idea is that a person looking for information fills in a form, which helps the person create a query more sophisticated than “pizza.” Many systems in the last 50 years have tried these types of interfaces. In fact, one can find them in the whiz bang interfaces available to cyber OSINT software users. I won’t drag the old Dow Jones interface into this post, nor will I provide screenshots of Palantir Gotham interfaces. (Hey, you probably know about these already.)

The write up, however, does not explore the concept in too much detail. I noted this statement:

The Advangle interface makes it easier to string together targeted searches with the right syntax, and in half the time it would take to type it all out by hand.

Saving time, not precision or recall, is the unique selling proposition.

It is useful to keep in mind that formal search operators are still available to users of Bing, Google, Yandex, and a number of other systems. The problem is that as Web search has massified, only a tiny fraction of the users of ad supported Web search systems bother with formal operators like filetype: or other oddities.
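For readers who have never typed an operator by hand, here is a rough sketch of what a form-style overlay does behind the curtain. The `build_query` helper is hypothetical, not Advangle's code. Beyond `filetype:`, the quoted phrase, `site:`, and minus-sign exclusion shown here are common operators, but I am assuming a given engine honors them, and support varies from system to system.

```python
def build_query(terms, phrase=None, site=None, filetype=None, exclude=()):
    """Assemble a search string from form-style fields (hypothetical helper)."""
    parts = list(terms)
    if phrase:
        # Quoted text asks the engine for an exact phrase match.
        parts.append(f'"{phrase}"')
    if site:
        # Restrict results to one domain.
        parts.append(f"site:{site}")
    if filetype:
        # Restrict results to one document type, e.g. pdf.
        parts.append(f"filetype:{filetype}")
    # A leading minus sign asks the engine to drop pages containing the word.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


query = build_query(
    ["enterprise", "search"],
    phrase="market study",
    filetype="pdf",
    exclude=["jobs"],
)
print(query)  # enterprise search "market study" filetype:pdf -jobs
```

The string is the easy part. Deciding which terms, which phrase, and which sources belong in it is the work the form cannot do for the searcher.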

The real problems with search are far deeper than an interface overlay. Let me highlight several which I find consistently troublesome:

  1. Finding a way to impart the skills of a well executed reference interview conducted by an expert in online search and retrieval. (Marydee Ojala, Ruth Patel, Anne Mintz, Ulla de Stricker, and Barbara Quint are individuals who can help a PhD formulate a statement of what information and data are needed, convert that desire into appropriate queries of appropriate databases, and deliver a filtered list of results.) Software, no matter how nifty the interface, at this time cannot replicate this expertise.
  2. Individuals who need information are more crippled than their counterparts from 30 years ago. Online systems have worked hard to let popularity and past user behavior provide a context for a query like “cyrus.” If you think you will get the pop star before a long dead historical figure, you are more sophisticated than the eager consumers of pop up ads on a Pixel phone.
  3. Databases are governed by editorial policies. In the good old days of 1975, creators of databases figured out what and how to index. Today most users believe that Google has “all” the world’s information. Nothing could be more wrong headed. Indexes, particularly free ones, include what creates traffic. If the content gets a little too frisky, censorship, filtering, and smart / predictive software step in and deliver “better” information.

I suggest you give the Advangle service a try. You may find that it is better than a room stuffed with Quints and Ojalas and others of this ilk.

My approach is simple: Know what one wants. Formulate a suitable query. Pass the query across the sources/databases likely to have indexed the information. Review the results. Think about the information gaps. Repeat the process.

Pretty crazy today, right?

Who has time to figure out what companies are in the cyber OSINT business or what Dark Web sites continue to offer contraband in the wake of AlphaBay and Hansa?

Research via digital resources, unlike checking Facebook, is a bit of a mental workout.

On the other hand, why not let the ad supported search engines deliver exactly what they think you need? Better yet, let these outfits provide that information before you know you need it.

A system that actually delivered precise, on point, timely, and authoritative results would be great. It would be nice to be able to live forever and travel to the stars.

Reality is a tad different. UX is not yet a replacement for knowing how to research in a way that moves beyond finding Game of Thrones.

Stephen E Arnold, August 10, 2017

Smartlogic: A Buzzword Blizzard

August 2, 2017

I read “Semantic Enhancement Server.” Interesting stuff. The technology struck me as a cross between indexing, good old enterprise search, and assorted technologies. Individuals who are shopping for an automatic indexing system (either with expensive, time consuming hand coded rules or a more Autonomy-like automatic approach) will want to kick the tires of the Smartlogic system. In addition to the echoes of the SchemaLogic approach, I noted a Thomson submachine gun firing buzzwords; for example:

best bets (I’m feeling lucky?)
dynamic summaries (like Island Software’s approach in the 1990s)
faceted search (hello, Endeca?)
model
navigator (like the Siderean “navigator”?)
real time
related topics (clustering like Vivisimo’s)
semantic (of course)
taxonomy
topic maps
topic pages (a Google report as described in US29970198481)
topic path browser (aka breadcrumbs?)
visualization

What struck me after I compiled this list about a system that “drives exceptional user search experiences” was that Smartlogic is repeating the marketing approach of traditional vendors of enterprise search. The marketing lingo and “one size fits all” triggered thoughts of Convera, Delphes, Entopia, Fast Search & Transfer, and Siderean Software, among others.

I asked myself:

Is it possible for one company’s software to perform such a remarkable array of functions in a way that is easy to implement, affordable, and scalable? There are industrial strength systems which perform many of these functions. Examples range from BAE’s intelligence system to the Palantir Gotham platform.

My hypothesis is that Smartlogic might struggle to process a real time flow of WhatsApp messages, YouTube content, and mobile phone intercept voice calls. Toss in the multi language content which is becoming increasingly important to enterprises, and the notional balloon I am floating says, “Generating buzzwords and associated over inflated expectations is really easy. Delivering high accuracy, affordable, and scalable content processing is a bit more difficult.”

Perhaps Smartlogic has cracked the content processing equivalent of the Voynich manuscript.

image

Will buzzwords crack the Voynich manuscript’s inscrutable text? What if Voynich is a fake? How will modern content processing systems deal with this type of content? Running some content processing tests might provide some insight into systems which possess Watson-esque capabilities.

What happened to those vendors like Convera, Delphes, Entopia, Fast Search & Transfer, and Siderean Software, among others? (Free profiles of these companies are available at www.xenky.com/vendor-profiles.) Oh, that’s right. The reality of the marketplace did not match the companies’ assertions about technology. Investors and licensees of some of these systems were able to survive the buzzword blizzard. Some became the digital equivalent of Ötzi, the 5,300 year old iceman.

Stephen E Arnold, August 2, 2017

New Enterprise Search Market Study

August 1, 2017

Don Quixote and Solving Death: No Problem, Amigo

I read “Global Enterprise Search Market 2017-2022.” I was surprised that a consulting firm would invest time and energy in writing about a market sector which has not been thriving. Now don’t start sending me email about my lack of cheerfulness about enterprise search. The sector is thriving, but it is doing so with approaches that are disguised as applications which deliver something other than inflated expectations, business closures, and lawsuits.

Image result for don quixote

I will slay the beast that is enterprise search. “Hold still, you knave!”

First, let’s look at what the report covers, then I will tackle some of the issues about which I think as the author of the Enterprise Search Report and a number of search-related articles and analyses. (The articles are available from the estimable Information Today Web site, and the free analyses may be located at www.xenky.com/vendor-profiles.)

The write up told me that enterprise search boils down to these companies:

Coveo Corp
Dassault Systemes
IBM Corp
Microsoft
Oracle
SAP AG

Coveo is a fork of Copernic. Yep, it’s a proprietary system which originally was focused on providing search for Microsoft. Now the company has spread its wings to include a raft of functions which range from the cloud to customer support / help desk services.

Dassault Systèmes is the owner of Exalead. Since the acquisition, Exalead as a brand has faded. The desktop search system was killed, and its proprietary technology lives on mostly as a replacement for Dassault’s internal search system which was based on Autonomy. Most of the search wizards have left, but the Exalead technology was good before Dassault learned that selling search was indeed a challenge.

IBM offers a number of products which include open source Lucene, acquired technology like Vivisimo’s clustering engine, and home brew code from its IBM wizards. (Did you know that the precursor of PageRank was an IBM “invention”?) The key is that IBM uses search to sell services which have higher margins than providing a free version of brute force information access.

