Palantir Technologies: Gored by a Fact-Curious Bull?

September 2, 2019

A colleague forwarded me a link to “How CIA Backed Palantir Is Helping Police Root Out Thought Crimes.” Sexy title. It whispers to indexing spiders, “I am your friend. Index me, and we can kick back and talk about IPOs.” A couple of brief observations on this Sunday morning. Cue “Sunday Morning Coming Down.”

First, Palantir Technologies faces the same challenge other search-centric companies encounter: Generating sufficient payoff for investors. Not every investor is okay with losing a truck load of cash for tax purposes. The fix? An IPO. I am not sure that a single version of Palantir can pull this off. Therefore, it seems as if a company can be created to do Palantir’s “Let’s talk about what we do” work. Another company does the “it’s really secret” work.

Second, the CIA operates a venture fund, and that fund injects cash into a large number of companies. In that sense, Palantir Technologies is not unusual. The company was founded in 2003, and its technology is now behind the curve when compared to some of the newer investigative and intelware tools. Poke around Herzliya, and you will spot a number of “interesting” companies with much more zip zip implementations of the technology introduced decades ago by i2 Ltd. Yep, Palantir is what DarkCyber thinks of as a “me too” product, and that means more competition. Implications? See the preceding paragraph. Why does the CIA invest? Well, In-Q-Tel has not been delivering supported products which play particularly well with legacy systems or other tools in use. The investment helps make sure some basic compatibility and standard functions are implemented in a solution. Let’s not assume that meetings are the same as “telling a Silicon Valley whiz kid what to do.”

Third, the last DarkCyber heard was that the LA police department is indeed a customer. But the LAPD gets a bit of a discount, and with budgets being budgets there is a possibility that LAPD will shift to another system. There are dozens of companies eager to provide Palantir type services within 30 minutes of Tysons Corner in Northern Virginia. The problem with local police departments is that there are lots of them, but most of these outfits are strapped for cash, headcount, and cyber expertise. Selling more to outfits without much money to spend is not a recipe for success in DarkCyber’s opinion.

Fourth, the thought police angle is okay, but so far predictive analytics are useful up to a point. And that point is an unpredicted event, an outlier, or something that the 80 percent confidence analysis missed. Consider the performance of Predictive Policing and Palantir Gotham interaction in LA. There were problems because the data flowing into the system were like most data — not well groomed, house trained, and consistent. Therefore, neither PredPol nor Gotham turned cartwheels in joy when results were analyzed. Our DarkCyber video explored some of these issues in April 2019. Here’s a link to one in our LA Police Report series. Others can be located by searching this blog for DarkCyber PredPol.

Net net: Palantir Technologies is an important company. It faces challenges. Keeping investors really happy, pleasing stakeholders, and dealing with the flow of better, faster, cheaper solutions from Israel and other countries are big problems.

Forget science fiction. Focus on the reality of the policeware and intelware markets.

Stephen E Arnold, September 2, 2019

The New Lingo of Enterprise Search

August 28, 2019

Enterprise search is back. My Google Alert has been delivering market research reports which tell me that finding information is huge. Plus, there have been some announcements about funding which have surprised me. Examples include:

  • Capacity raised $13.2 million. Source: DarkCyber
  • LucidWorks snagged an additional $100 million. Source: Globe News Wire
  • Squirro pulled in additional funds, but the timing of the Salesforce investment and additional funding of this Zurich based company remains a bit of a mystery. Source: Venture Lab

These are just three examples plucked from my box of note cards about search vendors.

What’s interesting is the lingo, the jargon, and the argot these outfits are using. Frankly the plumbing is usually open source, a fact which the companies bury beneath the blizzard of buzzwords.

Here are some examples:

AI powered

actionable insights

artificial intelligence

cloud

cognitive

connect the dots

data mining

fusion

information mining

machine learning

natural language

pattern detection

platform

self learning

transform

The problems with the vendors collecting investment funds are easy to identify:

  1. The content processed is text. The unstructured information in videos, podcasts, messaging apps like WhatsApp, images like chemical structures and engineering drawings, etc. is not included.
  2. Indexing content residing on cloud platforms may work today, but as market dynamics shift, access to that content may be blocked or prohibited by regulations in certain countries.
  3. On-the-fly federation, so that real time information is available, remains a challenge which typically requires script fiddling or new content filters.
  4. Configuration of “smart” systems is not significantly different from the complex, time consuming, and expensive procedures which added friction to some Autonomy, Convera, Fast Search & Transfer, and similar systems’ deployment.
  5. Maintenance is an issue. Micro services work well in a low latency environment. Under load, the magic of sub three second response can disappear.
  6. Search remains an idiosyncratic solution. Many departments require specific features. As a result, enterprise search — regardless of the wrappers around open source information retrieval systems — is a series of customizations.
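
Item 3, on-the-fly federation, is easier to picture with a small Python sketch. Everything named here is an assumption made for illustration: the two `search_*` functions stand in for real connectors, and min-max score normalization is just one simple way to make scores from different sources comparable. It is not any vendor's method.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]  # (document id, raw relevance score from one source)


def normalize(hits: List[Hit]) -> List[Hit]:
    """Rescale one source's scores to the 0..1 range (min-max normalization)."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: treat every hit as equally good.
        return [(doc, 1.0) for doc, _ in hits]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in hits]


def federated_search(
    query: str,
    sources: Dict[str, Callable[[str], List[Hit]]],
    limit: int = 5,
) -> List[Tuple[str, str, float]]:
    """Query every source at request time and merge the normalized hits."""
    merged = []
    for name, search in sources.items():
        for doc, score in normalize(search(query)):
            merged.append((name, doc, score))
    # Highest normalized score first; ties keep their source order.
    merged.sort(key=lambda row: row[2], reverse=True)
    return merged[:limit]


# Hypothetical connectors returning canned results on different score scales.
def search_email(query: str) -> List[Hit]:
    return [("msg-17", 12.0), ("msg-02", 4.0)]


def search_wiki(query: str) -> List[Hit]:
    return [("page-9", 0.9), ("page-3", 0.3), ("page-1", 0.6)]


results = federated_search(
    "travel policy", {"email": search_email, "wiki": search_wiki}
)
for source, doc, score in results:
    print(f"{source:5} {doc:7} {score:.2f}")
```

Even in this toy, each new source needs its own connector and its own scoring assumptions, which is the “script fiddling” the list item describes.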

To sum up, enterprise search has failed to deliver for more than 50 years. Despite the optimism that investors have for “finding the next Google”, enterprise search vendors will find themselves hitting a revenue ceiling just as Autonomy, Fast Search, and similar firms did.

The fix was acquisitions and allegations of financial fancy dancing. If we assume that investors still dream of a 10x or higher return, is it possible that LucidWorks can generate sufficient revenue to pull off an IPO or a sale like Exalead, Vivisimo, and other search vendors were able to complete before the hammer fell?

This is an important question because new enterprise search vendors are popping up like mushrooms. The incumbents like Attivio, Coveo, Mindbreeze, and Sinequa are also trying to smash a ball over the fence.

Net net: Enterprise search appears to be putting on the worn slippers last used by the founders of Fast Search & Transfer. Maybe Microsoft will buy another enterprise search vendor? The problem is that enterprise search is easy to make visible with marketing LED lights. Delivering sustainable revenues is a far greater challenge when Amazon is a competitor and a platform enabler.

What happens when Amazon competes more aggressively, raises its prices, or bundles text search into another of its services?

Answer: Nothing particularly beneficial for the investors in new and improved enterprise search solutions based on Lucene/Solr and dusted with disco glitter.

Stephen E Arnold, August 28, 2019

Google Cookies: Dancing Around

August 28, 2019

In my Google Version 2: The Calculating Predator, I summarized a number of Google innovations which embed tracking. One of the more interesting approaches was for Google to become the Internet; that is, when you run a query, you are accessing the Internet as it exists within Google. (If you want more information, write benkent2020 @ yahoo dot com. I sell a set of “fair copies” of these original books I submitted to a now defunct publisher in Brexitland. There are some minor typos and a dropped graphic or two, but the info is there.)

I wrote the Google monographs in 2003 to 2008.

The tracking functions, the walled garden, the Google version of the Internet: each of these was in place more than 15 years ago. Therefore, any modification of Google’s cookie policies and the associated technology like Ramanathan Guha’s and Alon Halevy’s innovations is a very big job. Given the present state of the Google architecture, I am not sure that the existing crew of 100,000 plus could make such modifications without having many Google services break. “Services”, however, are not what users experience. The services are the internal operations that ensure ads get displayed, the click stream data are collected, the internal components have access to fresh user behavior data, and the public facing outputs like search results, “did you mean”, and even the “I’m feeling lucky” are in line with what Google’s financial demands require. Remember: Ads have to be displayed and users induced to click on them to make the Yahoo-GoTo-Overture inspired system function.

Cookies, including the special DoubleClick variety and the garden variety “expire a long time in the future” type, are important to the Google system. If you can’t find content in an index, the reason may be that the site’s content is no longer generating clicks. Indexing becomes more important with each passing day. How does one control costs? Well, those cookies and beacons are helpful. No signals of click love, then less frequent or zero indexing. Thus, indexing costs can be managed, which is almost impossible if a spider just follows links, changed content, and new information. Where is an index to the content on “beat sites” like Beatstars.com? Answer: The content is not indexed if our recent test queries are accurate. (I know, “What’s beat content?” Not in this write up, gentle reader, not in this write up.)
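
The idea that click signals can ration indexing is easy to show with a toy Python sketch. Nothing here describes Google's actual system: the `recrawl_interval_days` function, the thresholds, and the sample sites are all hypothetical, invented only to show how a click count could translate into a crawl budget.

```python
from typing import Dict, Optional


def recrawl_interval_days(clicks_last_30_days: int) -> Optional[int]:
    """Map a site's recent click count to days between crawls.

    Returns None when the site would be dropped from the crawl queue.
    The thresholds are invented for illustration.
    """
    if clicks_last_30_days == 0:
        return None  # no click signals: stop spending on this site
    if clicks_last_30_days < 100:
        return 30    # few clicks: monthly visit
    if clicks_last_30_days < 10_000:
        return 7     # moderate clicks: weekly visit
    return 1         # heavy clicks: daily visit


# Hypothetical sites and click counts.
sites: Dict[str, int] = {
    "busy-news.example": 250_000,
    "niche-forum.example": 40,
    "no-clicks.example": 0,
}

schedule = {site: recrawl_interval_days(clicks) for site, clicks in sites.items()}
for site, interval in schedule.items():
    print(site, "dropped" if interval is None else f"every {interval} day(s)")
```

A spider that simply follows links has no such knob; a policy keyed to clicks does, and a site with no clicks would fall out of the index under it.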

Against this background I want to call your attention to “Deconstructing Google’s Excuses on Tracking Protection.” The write up is a reasonable analysis of Google saying that it wants to be more respectful of users’ privacy.

DarkCyber thought the summary of cookies was good. Here’s the passage we circled:

Our high-level points are:

1) Cookie blocking does not undermine web privacy. Google’s claim to the contrary is privacy gas lighting.

2) There is little trustworthy evidence on the comparative value of tracking-based advertising.

3) Google has not devised an innovative way to balance privacy and advertising; it is latching onto prior approaches that it previously disclaimed as impractical.

4) Google is attempting a punt to the web standardization process, which will at best result in years of delay.

My concern is that this type of write up does not specifically state what Google is doing. The use of the phrase “gas lighting” and the invocation of Shoshana Zuboff’s The Age of Surveillance Capitalism are very trendy.

Unfortunately, plain talk is needed. With Google search the primary conduit of what is “important”, the game is no longer one of cookies.

Exactly what can a government or a committee do to address more than 15 years of engineering specifically designed to track people, cluster individuals into groups, predict what the majority of those in a statistically valid cluster want, and make sense of individual user behavior cues?

One step may be that writers and analysts adopt a more direct, blunt way of explaining Google/DoubleClick tracking. The reason individuals do not speak out is that there is what I call “Google fright”. It affects news release services. It affects analysts. It affects “real journalists.” It affects Google’s would be government watch dogs.

Who doesn’t want a Google mouse pad or T shirt? Darned few. Fear of Google may be a factor to consider when reading about DarkCyber’s favorite ad supported, Web search system.

Stephen E Arnold, August 28, 2019

Enterprise Search: AI and a Low Spend

August 26, 2019

DarkCyber read “Capacity Raises $13.2 Million to Index Emails, Files, and More with AI.” The company was founded in 2017. We noted this passage:

Capacity (formerly Jane.ai), [is] a startup developing a platform that indexes data from apps, teams, and more and enables users to search through the corpus using natural language.

Plus, the system learns and improves over time.

The company’s funding to deliver AI, multi-source enterprise search is “over $21 million.”

One of the founders is CEO David Karandish, formerly the CEO of Answers.com. He is quoted as saying:

[Capacity] is an intuitive, intelligent AI-powered Teammate who gives employees instant access to the information they need to do their jobs well.

The indexing system can process content from such systems as:

  • ADP human resource information
  • Box
  • NetSuite
  • Google Gmail
  • Microsoft Exchange
  • Microsoft OneDrive
  • Sage human resource information
  • Salesforce
  • ServiceNow
  • Zendesk

The system includes “a chatbot with natural language processing capabilities that integrates with popular messaging apps such as Slack and Skype.”

We noted this statement:

Capacity can deliver company-wide announcements, like daily news and event notifications, and onboard new hires by providing access to forms that need to be completed. For customers with websites that have FAQ sections, it can be made public-facing to help cut down on customer service requests.

If Capacity can deliver, outfits like LucidWorks will have some explaining to do to their investors.

Stephen E Arnold, August 26, 2019

Amazonia for August 26, 2019

August 26, 2019

Amazon has been criticized in the last seven days. If anything, the scrutiny of the firm has increased. Examples range from reactions to good news tweets from happy warehouse workers to stronger hints that government investigations are gathering steam. Other developments DarkCyber noted are:

Amazon AWS Crashes

DarkCyber spotted a report from FXStreet with the disconcerting headline: “The Amazon Web Services Crash Is Causing Havoc with Crypto Exchanges (Could Explain BitMex).” The write up presents this information:

AWS has crashed according to reports on twitter causing havoc at crypto currency exchanges.

Coindesk has chimed in, reporting that KuCoin is having problems.

If true, one might pose this question:

How reliable is Amazon AWS?

DarkCyber hypothesizes that the answer will be, “Good enough.” But is good enough good enough? (DarkCyber is feeling Gnostic today.)

More Publishers Grousing, Squawking, and Releasing Legal Eagles

Reuters reported that top US publishers are suing Amazon Audible. The reason? Copyright infringement. The real news outfit reported:

Audible was sued by some of the top U.S. publishers for copyright infringement on Friday, aiming to block a planned rollout of a feature called ‘Audible Captions’ that shows the text on screen as a book is narrated.

The idea is that Amazon needs to obtain permission to display text on a screen. (Will someone produce a motion picture channeling “Snakes on a Plane” with the title “Text on a Screen”? The FBI agent could be played by Maya Mavgee maybe?)

Amazon Gives Up Control of Its Site and Other Horrors

“Amazon Has Ceded Control of Its Site. The Result: Thousands of Banned, Unsafe or Mislabeled Products” has a serious allegation about the online bookstore. The pay walled story includes a nifty illustration. Here’s a snippet of the image:

image

Presumably the stuffed animals might harm you. The clock? Maybe it will chop off a child’s fingers. The flashlight? It could explode and remove your entire hand! The sticker? Oh, the sticker?

How many Amazon products are banned? Ars Technica says, “4,100” and references the Wall Street Journal.

The consequences are too horrible to contemplate. Amazon has to clean up its product offerings?

What would this product do to you?

image

The answer DarkCyber knows not.

PS. For a similar “Amazon is bad” write up, check out the New York Times’ disclosure that the George Orwell you buy on Amazon may be a fake, rewritten, or some other dastardly bastardization of 1984 in 2019. Source: New York Times, complete with pay wall, begging for email address, etc. from a somewhat needy Gray Lady.

Amazon: Hard Sell at the Pentagon

ProPublica may be doing a type of journalism not practiced at the Washington Post. The nonprofit news outfit published “How Amazon and Silicon Valley Seduced the Pentagon.” The subtitle is a click magnet:

Tech moguls like Jeff Bezos and Eric Schmidt have gotten unprecedented access to the Pentagon. And one whistle blower who raised flags has paid the price.

When printed out, the article required 13 pages. Please navigate to the source document or one of the recycled versions of the story.

Several observations are warranted:

  1. Blowing the whistle on big wheels does not seem to be a career enhancing action. Just sayin’.
  2. The emphasis on Amazon is okay, but the real subject of the write up is the GOOG. But once Google fired the Department of Defense, changing the title was probably easier than beefing up the Amazon content.
  3. The Google may have been in a prime position to nab significant billions from the DoD. But quitting Project Maven, opening the door for Anduril Industries, and igniting a certain Silicon Valley big wheel to toss around suggestions of treason was significant.

There is juicy Amazon fruit in the write up. But the Google is front and center in this interesting company.

Will Amazon “win” the JEDI contract? DarkCyber is not sure. We hope it works better than the first delivered F 35 aircraft when JEDI leaves the launch pad. (No, we did not consult an “oracle” for this information.)

Amazon Enhances Australia

ZDNet published “What Amazon Web Services Security Certification Is Doing for Government.” The main idea is that the government of Australia is “now getting its hands on new technology.” DarkCyber learned:

When Amazon Web Services (AWS) achieved protected-level certification earlier this year, which meant it could provide storage for highly sensitive government workloads out of its AWS Asia Pacific region in Sydney, the company’s head of solution architecture Simon Elisha said it helped “unlock innovation” for the public sector.

Will similar benefits accrue to the US if Amazon wins the JEDI competition?

Also related to Australia: ZDNet reports that Amazon now offers a job placement service for Australian veterans. Good for Australian veterans, yes. The initiative appears to be part of Amazon’s effort to teach programmers how to make Amazon the world’s operating environment and know about Amazon’s hundreds of products, services, and functions.

Amazon: Big Revenue, Tiny Profits

DarkCyber noted the write up “Amazon’s Tiny Profits Explained.” We had a habit of napping in Econ 101 and just studying for the tests in Finance class. Amazon uses a range of techniques to keep profits down. There’s even a hockey stick and earthworm chart to show how the numbers have flowed for a decade. Mr. Bezos worked on Wall Street, which may be something to keep in mind.

image

DarkCyber thinks it understands the profit method. The write up does not tackle a question DarkCyber finds more interesting; that is,

Why does Amazon pay low or no taxes?

The write up has an answer: Investment. We noted:

Amazon’s internal investments also keep its tax bill down, saving the company money. While we don’t know exactly what Amazon pays in taxes, various estimates suggest its rate is low thanks in part to its huge investments in its business. What we do know is that its taxes have provided plenty of fodder for presidential candidates like Joe Biden, who’s mentioned it on his campaign and on Twitter, and Elizabeth Warren, who included the company as an example in her new corporate tax proposal. President Donald Trump has also harangued the company for not paying enough in taxes. Amazon has responded that it’s simply paying what the government says it owes.

How skilled are Amazon’s finance and tax professionals? Skilled enough to keep Mr. Bezos happy.

Oh, Oh, Alexa: Dumber than Google?

We noted this write up by a relative of Debbie Downer called “The Results Are In: Alexa Is Legitimately Dumber than Siri and Google Assistant.” First off, DarkCyber would just say “Alexa is dumber than Siri and Google Assistant.” The legitimately and the results don’t add much. Alexa is dumb could be considered suitable as a headline as well.

The main point of the write up? Alexa is dumb.

We noted this statement:

The venture capital firm recently asked Amazon Alexa, Apple’s Siri, and Google Assistant the same 800 questions. Google Assistant was the most successful of the bunch and was able to answer 93% of the questions correctly. In comparison, Siri was only able to get 83% of the questions right, and Alexa got 80%. Samsung’s Bixby and Microsoft’s Cortana, both lesser-used voice assistants, didn’t even make the cut.

I am not sure if I have much confidence in venture capital funded or completed research. The scores appear to fall within the range of competent smart software systems. Keep in mind that accuracy rates with 10 to 20 percent “wrong” answers are likely to make decisions generated by these wondrous numerical recipes wrong. A lot. If one of those questions pertains to the antidote required to save your child, are you going to rely on smart software or a trained physician?
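
The arithmetic behind that worry can be sketched in a few lines of Python. The 800 questions and the percentages come from the quoted passage. The five-step decision chain is an assumption added for illustration, and it treats each answer as independent, which real systems may not honor.

```python
QUESTIONS = 800  # number of questions cited in the quoted passage

# Percent of questions answered correctly, as quoted.
percent_correct = {"Google Assistant": 93, "Siri": 83, "Alexa": 80}

# Integer arithmetic avoids floating point rounding in the counts.
wrong_answers = {
    name: QUESTIONS * (100 - pct) // 100
    for name, pct in percent_correct.items()
}


def chain_success(pct: int, steps: int) -> float:
    """Chance that `steps` independent answers are all correct.

    Independence is an assumption made for this sketch.
    """
    return (pct / 100) ** steps


alexa_five_steps = chain_success(80, 5)

for name, wrong in wrong_answers.items():
    print(f"{name}: {wrong} wrong out of {QUESTIONS}")
print(f"Five chained answers at 80 percent: {alexa_five_steps:.2f}")
```

Under these assumptions, a decision that depends on five answers in a row from an 80 percent system comes out right only about one time in three.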

Dumb, by the way, is relative. Identifying rotten tomatoes is different from identifying bad actors. But the name of the game today is “good enough.” That’s what these smart systems deliver. And you know what? That’s good enough, which is something Debbie Downer intuits.

A Vote of Not Much Confidence

The assumption that Amazon is the solution to a range of problems may be correct for some people. “Companies Should Disclose Amazon Web Services as Material Risk” reminds people that “Amazon’s hack prone cloud computing platform” is an issue. The negative paint daub is a reaction to the former AWS professional who breached security at Capital One and possibly more than 24 other companies. DarkCyber noted this statement in the report:

regardless of any potential SEC actions, shareholders should be demanding answers about AWS usage from companies already in their portfolio and those in which they are considering investing.

Amazon Forecast Available

Amazon has made its machine learning technology available to the public. Amazon Forecast is a managed service which outputs forecasts. With the technology one can predict demand for products and services. The system also makes it possible to predict infrastructure requirements, energy demand, and similar variables; for example, allocation of police resources. Amazon Forecast produces private, custom models that can help developers make predictions that are up to 50% more accurate than traditional methods. Amazon Forecast automatically sets up a data pipeline, ingests data, trains a model, provides accuracy metrics, and performs forecasts. Amazon asserts that developers do not have to have any expertise in machine learning to use the service. More information is available at https://aws.amazon.com/forecast/. DarkCyber anticipates that as this product matures, its functions will be a direct competitive threat to Palantir Technologies, Recorded Future, and similar policeware and intelware vendors.

Amazon to Increase Staff in Portland

BizJournals reported that Amazon will add up to 400 new jobs in Portland, Oregon. This “real news” item is protected by a pay wall. But a free version with more information is available from MarketWatch at this link. Amazon has been a good corporate citizen. We learned:

The company has created more than 3,500 full-time jobs in Oregon since 2010 and invested over $9 billion in the state, including customer fulfillment facilities, cloud infrastructure, and compensation to its employees.

Amazon India

We reported that Amazon has been chugging toward India. The Amazon facility is, according to Reuters, “its biggest global campus.” Amazon India is growing fast and needs to expand in Hyderabad. How big?

The new campus in India, spread over 9.5 acres and costing “hundreds of millions of dollars”, will house over 15,000 employees, the company said. Amazon has 62,000 employees in India, roughly a third of whom are based in Hyderabad.

Portland’s 400 staff additions send an interesting signal.

Move Over US Medical Database/Taxonomy Experts. AWS Is Now the Sheriff of This Here Domain

The individuals who build controlled vocabularies have embraced the term “metadata”. Goodbye, indexing. Jargon is better. Some of the people who build controlled term lists are into certain fields. Medical terminology is an example which keeps “Taxonomy in a Day” types at bay.

Who should create approved medical terminology? How about the National Institutes of Health?

Wrong.

The correct answer appears in “The ADHA Is Simplifying Its Clinical Terminology Database with AWS.” The ZDNet write up reports like a good “real news” outfit:

the ADHA has developed NCTS 2.0 to be more simplified by taking a serverless approach to the system to take advantage of the AWS shared responsibility model.

DarkCyber thinks that this is important, a harbinger, and an approach coming to America.

Defining terms frames reality. When reality is the AWS SageMaker system, there will be some downstream adjustments that individuals, indexers, and commercial health and database publishers will find interesting.

Change or die in the Amazon forest.

Amazon Bahrain Is Open and Training People

Get trained up or get left at the station. AWS is holding cloud training for Bahrain businesses. Why? you ask.

Trade Arabia states:

the new region adds to the already existing investment of infrastructure from Amazon in the Middle East with the already operational Amazon CloudFront edge locations in the cities Dubai, and Fujairah, in the United Arab Emirates.

Amazon AWS Inspires Third Party Hardware

We found “Renesas Electronics Enhanced RX65N WiFi Connectivity Cloud Kit Simplifies Secure IoT Endpoint Device Connections to Amazon Web Services” long winded. The main point is that Renesas built a card which includes on board support for Amazon FreeRTOS. Connection to AWS is, thus, easy. What else is on the device? Here’s a short list: Dual bank flash for over-the-air (OTA) firmware updates and Trusted Secure IP (TSIP). The cost? Just $50.

Amazon Supported Ignite: Farm to Consumer Start Up

All the Farms is a Web site that finds farms. The idea is that a person can locate fresh produce near one’s home. According to the Register Guard:

The US Ignite Startup Accelerator Program, partnered with Amazon Web Services, this year accepted 19 startups from across the country. Each was deemed a business-ready startup with a product that could help create “smart cities.”

Like Google, Amazon wants to spot high potential start ups. If some of those outfits need cloud technology, it is possible that the Bezos bulldozer could hook a needy outfit up to the megawatt outfit’s data center. Any connection to Whole Foods? The write up does not speculate.

Amazon and Blockchain

Amazon has announced that its Managed Blockchain is going to get cloud support through Amazon’s CloudFormation. The idea is that scaling will be easier. Source: FXStreet

Gaps in AWS Security? Your Problem

According to Forbes, the capitalist tool, yes. “The Truth About Privileged Access Security On AWS and Other Public Clouds” reveals that basic security services are provided but:

the free version often doesn’t go far enough to support PAM at the enterprise level. To AWS’s credit, they continue to invest in IAM features while fine-tuning how Config Rules in their IAM can create alerts using AWS Lambda. AWS’s native IAM can also integrate at the API level to HR systems and corporate directories, and suspend users who violate access privileges.

The write up points out:

  1. AWS can’t protect you
  2. Use the security model provided
  3. Use the AWS identity infrastructure
  4. You can go cross cloud with security.

How? It’s simple. Just assemble the parts shown in the figure below:

shared responsibility model

Remember how IBM, Oracle, and Microsoft would lock customers in? Amazon uses the same methods.

Partners/Resellers/Consultants

Amazon continues to gather third parties for a Bezos bulldozer ride. Examples are:

Academy Software Foundation. This outfit has snagged AWS as a premier member. Wait. Amazon has joined the movie industry outfit. Source: Newkerala

Druva. The data protection start up enables intelligent data storage on AWS. Source: Silicon Angle

Rockset. The company has released a real time SQL for Amazon’s DynamoDB. Source: MarketWatch

SoftServe. The consulting firm has expanded its relationship with Amazon. Source: Yahoo

Stackery. The serverless workflow software is now available on AWS. Apps can be managed from development to production. Source: Digital Journal

Wespac. The Little Ripper drone is now an Amazon partner. Customers can now tap into near real time video streaming via the cloud. Anduril Industries, are you nervous? Source: Aero News Net

Stephen E Arnold, August 22, 2019

Search: Useless Results Finally Recognized?

August 22, 2019

I cannot remember how many years ago it was that I wrote “Search Sucks” for Barbara Quint, the late editor of Searcher. I recall her comment to me, “Finally, someone in the industry speaks out.”

Flash forward a decade. I can now repeat her comment to me with some minor updating: “Finally someone recognized by the capitalist tool, Forbes Magazine, recognizes that search sucks.”

The death of search was precipitated by several factors. Mentioning these after a decade of ignoring Web search still makes me angry. The failure of assorted commercial search vendors, the glacial movement of key trade associations, and the ineffectuality of search “experts” still make me angry.

There are other factors contributing to the sorry state of Web search today. Note: I am narrowing my focus to the “free” Web search systems. If I have the energy, I may focus on the remarkable performance of “enterprise search.” But not today.

Here are the reasons Web search fell to laughable levels of utility:

  1. Google adopted the GoTo / Overture / Yahoo approach to determining relevance. This is the pay-to-play model.
  2. Search engine optimization “experts” figured out that Google allowed some fiddling with how it determined “relevance.” Google and other ad supported search systems then suggested that those listings might decay. The fix? Buy ads.
  3. Users who were born with mobile phones and flexible fingers styled themselves “search experts” along with any other individual who obtains information by looking for “answers” in a “free” Web search system.
  4. The willful abandonment of editorial policies, yardsticks like precision and recall, and human indexing guaranteed that smart software would put the nails in the coffin of relevance. Note: artificial intelligence and super duper automated indexing systems are right about 80 percent of the time when hammering scientific, technical, and engineering information. Toss in blog posts, tweets, and Web content created by people who skipped high school English and the accuracy plummets. Way down, folks. Just like facial recognition systems.
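
For readers who have not met the yardsticks named in item 4, here is a minimal Python sketch of precision and recall. The document ids and the relevance judgments are made up for illustration; in practice the “relevant” set comes from human assessors, which is exactly the editorial work the item says was abandoned.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Compute precision and recall for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and human relevance judgments for one query.
retrieved = {"d1", "d2", "d3", "d4", "d5"}
relevant = {"d1", "d3", "d6", "d7"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case two of the five results are relevant and two of the four relevant documents were found. A system tuned for clicks or ads has no reason to report either number.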

The information presented in “As Search Engines Increasingly Turn To AI They Are Harming Search” is astounding. Not because it is new, but because it is a reflection of what I call the Web search mentality.

Here’s an example:

Yet over the past few years, search engines of all kinds have increasingly turned to deep learning-powered categorization and recommendation algorithms to augment and slowly replace the traditional keyword search. Behavioral and interest-based personalization has further eroded the impact of keyword searches, meaning that if ten people all search for the same thing, they may all get different results. As search engines depreciate traditional raw “search” in favor of AI-assisted navigation, the concept of informational access is being harmed and our digital world is being redefined by the limitations of today’s AI.

The problem is not artificial intelligence.

The Platform of the Future Is…

August 2, 2019

What’s the platform of the future? Here are your choices:

[a] Artificial intelligence

[b] Neuro linguistic services

[c] Silicon brain implants connected to the cloud

[d] Indexing

[e] Pay to play content.

Did you pick “d”: Indexing?

If you did, you are on the same wavelength as the rock and roll, up and down advisory and analyst firm IDC.

The pronouncement comes from Stewart Bond, research director at IDC Research Inc. (Note: DarkCyber has written reports for IDC. The firm sold these reports on Amazon without DarkCyber’s permission, and IDC did not pay for the use of the DarkCyber reports. How much were our reports? $3,200 for eight pages of goodness? Want to know more? Drop us an email: darkcyber333 at yandex dot com.)

This revelation appeared in Silicon Angle which presented a summary of an interview with IDC Research’s director. Other gems from the write up were:

Pre-existing silos and multicloud can give companies a lot of disparate spaces to scavenge through. The most sensible place to start may be with the available data about all that data — or metadata.

Yes, indexing, an art practiced for millennia.

We noted this statement:

Companies are realizing that poorly cleansed or inaccurately labeled data are resulting in inaccurate insights. And vendors are rushing to the rescue. The number of vendors offering cataloging solutions has increased about 240% in the last year and a half, according to Bond’s research.

Hmm. What’s the research methodology? Remember that IDC has generated some specious numbers in the past; for example, the amount of time a person in a company spends looking for information. DarkCyber is curious about this 18 month period, the sample, the methodology, and the reliability of the analytic process. A 2.4X increase is robust, particularly for indexing and the accompanying tasks embraced in the sweeping generalization.
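
Without the methodology, the best one can do is check what the percentage would mean in plain counts. The sketch below assumes a hypothetical starting count of 10 vendors (the write up gives no base number) and reads "increased about 240%" literally, as growth added on top of the starting count.

```python
def apply_percent_increase(start_count, pct_increase):
    """Return (end_count, multiple_of_start) for a literal percent increase."""
    end_count = start_count * (1 + pct_increase / 100)
    return end_count, end_count / start_count


# Hypothetical base: 10 vendors 18 months ago. The 240 is from the quoted passage.
end_count, multiple = apply_percent_increase(10, 240)
print(f"end count: {end_count:.0f} vendors, {multiple:.1f}X the starting count")
```

Read that way, the vendor count ends up at more than three times where it started. If the base was tiny, the percentage says very little, which is why the sample matters.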

And we put an exclamation mark next to this passage:

Multicloud has flung data all over the place. Effective software must have spider legs that can reach out and quickly gather intelligence about it. Data cataloging may do this with machine learning, human annotation, Google-like search features, etc. “I think that’s going to be the data platform of the future,” Bond stated. Informatica Corp. currently leads in this market, according to Bond.

Okay, flinging data all over the place. Colorful. We also noted that Informatica Corp. is the leader in “this market.” Exactly what market are we thinking about? Google, search, cloud—what, which?

Keep in mind that Informatica has been around since 1993, and it has grown to about $1 billion a year in revenue. Impressive when compared to the local tire store, but a bit behind the curve when it comes to data. Amazon in the last quarter generated about $8 billion. Annualized, Amazon is about 32X bigger than Informatica. Who will win in the cloud cataloging game? Informatica? Sure it will.
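
The 32X figure is back-of-the-envelope arithmetic. A minimal sketch, assuming the quarterly number simply repeats for four quarters (a flat run rate, which is an assumption and not a forecast) and using the rounded revenue figures from the paragraph above:

```python
def annualized_ratio(quarterly_revenue_bn, other_annual_revenue_bn):
    """Annualize a quarterly figure (x4, flat run rate) and compare it to an annual one."""
    annualized_bn = quarterly_revenue_bn * 4
    return annualized_bn, annualized_bn / other_annual_revenue_bn


# Rounded figures from the text, in billions of US dollars.
annualized_bn, ratio = annualized_ratio(8.0, 1.0)
print(f"annualized: ${annualized_bn:.0f} billion, ratio: {ratio:.0f}X")
```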

But why the love for Informatica? One possibility is that Informatica is a client or prospect of IDC. That’s an idea worth considering.

And where did this “indexing” pronouncement appear? In Silicon Angle. Here’s the explanation which appeared with the IDC research director’s startling insight:

SiliconANGLE Media Inc.’s business model is based on the intrinsic value of the content, not advertising. Unlike many online publications, we don’t have a paywall or run banner advertising, because we want to keep our journalism open, without influence or the need to chase traffic. The journalism, reporting and commentary on SiliconANGLE — along with live, unscripted video from our Silicon Valley studio and globe-trotting video teams at theCUBE — take a lot of hard work, time and money. Keeping the quality high requires the support of sponsors who are aligned with our vision of ad-free journalism content. If you like the reporting, video interviews and other ad-free content here, please take a moment to check out a sample of the video content supported by our sponsors, tweet your support, and keep coming back to SiliconANGLE.

DarkCyber interprets this information as a way to make “sponsored” content less front and center.

“Indexing” is a sure fire way to generate buzz for a consulting company and maybe, just maybe, some revenue from sponsored video for Silicon Angle.

The video is here.

Stephen E Arnold, August 2, 2019

Elastic App Search Ready for On-Premise Deployment

July 29, 2019

One of the most successful enterprise search companies, Elastic, is bringing its cloud-based App Search platform down to Earth. The company announces this development in its blog post, “Elastic App Search: Now Available as a Self-Managed Download.” Their director of product marketing, Diane Tetrault, writes:

“Empowered with valuable feedback from the community over the last few months’ beta program, the team has worked hard to bring the simplicity and power of the Elastic App Search Service to any infrastructure. It’s now available to download and deploy at scale, alongside the default distribution of Elastic Stack 7.2 (or later), anywhere.”

We noted:

“While Elastic App Search has been around for over a year as a cloud-based solution, this release represents an important milestone. It highlights our commitment to offer the greatest flexibility in how and where developers deploy next-generation search experiences. Whether it be an online store, a geolocal directory, a vast music collection, or a SaaS application, Elastic App Search is the quickest way to build fluid and engaging search experiences. … “It is no secret that Elasticsearch is a powerhouse for search use cases of all kinds. That said, with great power comes great configurability. Our team worked relentlessly to channel the limitless potential of Elasticsearch into a streamlined package, purpose-built for application search use cases. In other words, you can now bring the relevance, scale, and speed of Elasticsearch to any application you’re building.”

App Search is free to use alongside the default distribution of the Elastic Stack. Naturally, the platform includes features Elastic users have come to rely on, like schema-free indexing, language-specific text analysis, pre-configured algorithms, relevance tuning, astute analytics, and impressive APIs and UI frameworks. In addition, they are introducing new user-management features that allow for easy-to-use role-based access controls or the built-in user management. Interested readers can check out the free trial.

Elastic began as Elasticsearch Inc. in 2012, simplified its name in 2015, and went public in 2018. The company is based in Mountain View, California, and maintains offices around the world. It also happens to be hiring for quite a few positions at the time of this writing, in case any readers are interested.

Cynthia Murrell, July 29, 2019

HP-Autonomy and the KPMG Due Diligence Document

June 15, 2019

I noted this article in The Register, a UK online publication: “HP CFO Cathie Lesjak Didn’t Even Read KPMG’s Autonomy Due Diligence Before $11bn Biz Gobble.” The write up reports that Hewlett Packard professionals did not read a report about Autonomy prepared by the accounting and consulting services firm KPMG. DarkCyber finds the information in the article interesting. We noted this statement in the Register’s write up:

Barrister Robert Miles QC asked her: “I think you didn’t, yourself, read a due diligence report prepared by KPMG, is that right?” Lesjak replied: “I did not.”

As intriguing as this exchange between Autonomy’s attorney and an HP executive involved in the astounding $11 billion purchase is, the Register also provides a link to the “confidential” and “draft” report about the finances of Autonomy.

The document is available at this link. Note that confidential documents can be removed from public access at any time. DarkCyber, an organization with more time but fewer resources than HP, read the document online.

DarkCyber’s conclusion is that HP’s failure to read the KPMG draft deprived the HP executives of information germane to the purchase price of $11 billion.

Other items of interest to DarkCyber in the KPMG document dated August 9, 2011, were:

  • KPMG itself lacked access to certain information; for example, certain details related to Autonomy’s income taxes
  • Autonomy’s financials (top line revenue and profits) were softening after the $870 million in revenue reported in FY2010
  • Autonomy used a method known as “Tower” in order to achieve certain financial objectives; namely, obtain maximum financial benefits from its activities such as loans.

The KPMG report is a “draft” and its authors presented sufficient information (even though that information is incomplete) to call into question the purchase of Autonomy for $11 billion.

The deal did not work out for either HP or Autonomy. HP lost traction with its shareholders. Autonomy found itself mired in an unpleasant and highly visible legal battle.

DarkCyber’s view is that companies engaged in search, retrieval, content processing, and allied disciplines have an unusual track record. For example, a number of little known companies simply failed to meet their revenue objectives and went out of business. Examples include Delphes (Canada), Entopia (Israel), InQuire, and others.

Other firms engaged in Autonomy-type software and services sought buyers in order to avoid financial problems. Examples include Exalead (acquired by Dassault), Vivisimo (acquired by IBM), and others.

Convera and Fast Search & Transfer are examples of enterprise search and Autonomy-type services caught in the same business quagmire as Autonomy; that is, robust promises about technology, difficulties generating sustainable revenue, problems in satisfying customers, and problems controlling infrastructure, R&D, and customer support costs. Convera (once Excalibur) was rescued by Allen & Company but was unable to deliver satisfactory solutions to information processing needs at Intel and the NBA. Fast Search & Transfer was involved in a financial investigation related to the company’s balance sheets. Microsoft stepped in and bought Fast Search in 2008.

Most of these problems with Autonomy-type companies stemmed from a combination of these miscalculations, errors in judgment, or over optimistic marketing:

  1. Search and retrieval is difficult to define; therefore, whatever system is installed at an organization will disappoint most of that system’s users. For this reason, large companies have a specialized system for legal, one for bench chemists, one for marketing, etc. Due to disenchantment, competitors can make a sale only to face clamors for engineering fixes or termination of the contract. Sustainable revenues are, therefore, not a characteristic of Autonomy-type companies. (The KPMG report makes clear that Autonomy relied on acquisitions to increase its top line revenue.)
  2. Enterprise search vendors typically over promise and under deliver. Sales professionals and marketers glibly explain the value of unlocking the hidden value of an organization’s data. The reality is that the costs of determining what data are available, who can view certain data, cleansing and validating that data, indexing the data, and then keeping the indexes up to date and in line with access privileges are a significant burden. The costs of “unlocking” exceed the available resources and appetite for investment in many licensees of Autonomy-type search systems. (The KPMG report rolls these costs into undifferentiated line items, a serious omission. These costs help explain the “you can’t get there from here” problem inherent in Autonomy-type software.)
  3. Autonomy-type systems from the period covered in the KPMG report were mostly proprietary code. Over time, these code bases became increasingly complex and at the same time more fragile. As a result, the costs of standing up a system, fine tuning it, and then tailoring it to the needs of the licensee grew over time. Like the content preparation work in item 2, the ongoing costs of the Autonomy-type system added another set of hard to control costs. (The KPMG report does not provide detail related to the costs of triage engineering to fix urgent problems, on-going fixes, and work needed to keep the foundation system current with competitors’ innovations.)

There are other issues with the KPMG report which DarkCyber noticed.

Net net: KPMG did a good job making clear that the deal was likely to be a difficult one due to the tax methods, the intra company financial processes, and the mechanisms used to allow Autonomy to demonstrate growth and reasonable margins over the period of time covered by the KPMG professionals.

HP seemed oblivious to the issues “enterprise search” posed; specifically, enterprise search is a niche business delivering expensive, proprietary solutions which rarely satisfy its users regardless of the vendor involved.

HP wanted to buy and buy big and fast. Autonomy appeared to be the solution to HP’s problems. KPMG identified the issues. Impulse buy? Maybe. Uninformed buy? Looks like it. Did Autonomy buff its show car software? Of course, getting the customer to buy is the objective.

Profiles of selected Autonomy-type software vendors are available without charge at the Xenky.com Vendors Web page. You can find that collection of vendor profiles at this link.

Stephen E Arnold, June 15, 2019

Google: What Does Relevance Mean?

May 11, 2019

Here’s the question for you: “What’s relevance?” The answer — if I understand the allegedly true information in “Google Creates ‘Dedicated Placement’ in Search results for AMP Stories, Starting with Travel Category” — is what Google decides you may see.

Forget the AMP thing because it is a content tiering play. No AMP, no display in a special section of results. Simple. Easy to understand, right?

Why is this important?

  1. Most users (searchers) accept what Google delivers, and Google delivers what generates revenue.
  2. The majority of users want convenience and will not want to spend time “looking for information.” (When one does not exert data energy, what one gets is good enough. Try to explain this information issue; the fish only know water. The world of gaseous oxygen is a tough concept.)
  3. Users do not perceive the scope of the machinations which content producers and advertisers eager for clicks and eyeballs undertake in order to appear in the special AMP listing. Few care or have the knowledge foundation to discern the machinery grinding away.

Google pulls the strings. Relevance is what generates revenues or helps Google meet its objectives.

Who controls relevance for a particular person looking for information?

Does this redefinition of relevance impact me and my DarkCyber researchers? No. The reason is that we know that search results on Google are skewed. We know content disappears from the index. We know that to track down a particular citation or document we have to resort to old fashioned methods. Phone calls, use of niche search tools, and even visits to libraries with information on microfilm are not unusual for us.

The problem is that for a majority of people looking for information online, those skills and the knowledge which lubricates their functioning are either gone or quickly eroding.

Try to find the US Army’s updated guideline for software procurement via Google. Try to locate information about Threatgrid and its connections to other security firms. Try to locate documents germane to the CMS MIC program which backs up and sometimes replaces FBI personnel’s investigations of health care fraud. Try to find English language content about Moonwalk, a video service of considerable interest to some people.

For years, I have retained some interesting content because I know that content may not be findable the next time I use the “AMP’ed” up Google or the other aggressively filtering Web indexing systems. Sometimes you can hear my team’s teeth gnashing over the whine of our local storage systems.

I call this the findability crisis. Someone has public information, but others cannot find it. Therefore, that information is effectively unfindable or “gone.” “Hasta la vista.” And there’s no “I’ll be back” for these content objects.

With shallower indexing and deletion of “old” content (which some call either history or evidence), the world of free, ad supported Web search and retrieval is going medieval. To get information, one has to be one of the top one percent of information professionals.

Interesting? Only if one knows what’s happening, gentle reader.

Relevance? Yep, new definition. New world of information. Knowledge is not power. Knowledge is danger maybe?

Stephen E Arnold, May 11, 2019
