Search Engines: Bias, Filters, and Selective Indexing

March 15, 2021

I read “It’s Not Just a Social Media Problem: How Search Engines Spread Misinformation.” The write up begins with a Venn diagram. My hunch is that quite a few people interested in search engines will struggle with the visual. Then there is the concept that typing in a search term returns results that are like loaded dice in a Manhattan craps game in Union Square.

The reasons, according to the write up, that search engines fall off the rails are:

  • Relevance feedback or the Google-borrowed CLEVER method from IBM Almaden’s patent
  • Fake stories which are picked up, indexed, and displayed as value infused.

The write up points out that people cannot differentiate between accurate, useful, or “factual” results and crazy information.

Okay, here’s my partial list of why Web search engines return flawed results:

  1. Stop words. Control the stop words and you control the info people can find
  2. Stored queries. Type what you want but get the results already bundled and ready to display.
  3. Selective spidering. The idea is that any index is a partial representation of the possible content. Instruct spiders to skip Web sites with information about peanut butter, and, bingo, no peanut butter information
  4. Spidering depth. Is the bad stuff deep in a Web site? Just limit the crawl to fewer links.
  5. Spider within a span. Is a marginal Web site linking to sites with info you want killed? Don’t follow links off a domain.
  6. Delete the past. Who looks at historical info? A better question, “What advertiser will pay to appear on old content?” Kill the backfile. Web indexes are not archives no matter what thumbtypers believe.
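Several of the levers in the list above (stop words, selective spidering, spidering depth, and spidering within a span) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of how any real search engine works: the `CrawlPolicy` class, the `crawl` function, and the four-page `TOY_WEB` are all hypothetical names invented for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class CrawlPolicy:
    """Hypothetical knobs an index operator could turn."""
    blocked_terms: set = field(default_factory=set)  # selective spidering
    stop_words: set = field(default_factory=set)     # terms never indexed
    max_depth: int = 2                               # spidering depth
    stay_on_domain: bool = True                      # spider within a span


# A made-up four-page Web: url -> (page text, outgoing links)
TOY_WEB = {
    "http://a.example/": ("welcome to the site",
                          ["http://a.example/food", "http://b.example/"]),
    "http://a.example/food": ("peanut butter recipes",
                              ["http://a.example/deep"]),
    "http://a.example/deep": ("old archive notes", []),
    "http://b.example/": ("offsite gossip", []),
}


def crawl(seed, pages, policy):
    """Breadth-first crawl; returns an inverted index of term -> set of URLs."""
    seed_host = urlparse(seed).netloc
    index = {}
    seen = set()
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in seen or url not in pages:
            continue
        seen.add(url)
        text, links = pages[url]
        words = text.lower().split()
        # Selective spidering: skip the page, so its links are never followed.
        if policy.blocked_terms & set(words):
            continue
        for word in words:
            if word not in policy.stop_words:
                index.setdefault(word, set()).add(url)
        # Spidering depth: stop following links past the limit.
        if depth >= policy.max_depth:
            continue
        for link in links:
            # Spider within a span: ignore links that leave the seed's host.
            if policy.stay_on_domain and urlparse(link).netloc != seed_host:
                continue
            queue.append((link, depth + 1))
    return index


open_index = crawl("http://a.example/", TOY_WEB,
                   CrawlPolicy(max_depth=5, stay_on_domain=False))
closed_index = crawl("http://a.example/", TOY_WEB,
                     CrawlPolicy(blocked_terms={"peanut"}, stop_words={"the"}))
shallow_index = crawl("http://a.example/", TOY_WEB,
                      CrawlPolicy(max_depth=0, stay_on_domain=False))
```

With the permissive policy, a query for "peanut" or "gossip" finds a page. With the restrictive policy, neither term exists in the index, and the archive page behind the skipped peanut butter page is never reached either. The user typing the query sees no sign of which policy was in force.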

There are other methods available as well; for example, objectionable info can be placed in near line storage so that results from questionable sources display with latency, slowly enough to cause the curious user to click away.

To sum up, some discussions of Web search are not complete or accurate.

Stephen E Arnold, March 15, 2021

Open Source Software: The Community Model in 2021

January 25, 2021

I read “Why I Wouldn’t Invest in Open-Source Companies, Even Though I Ran One.” I became interested in open source search when I was assembling the first of three editions of Enterprise Search Report in the early 2000s. I debated whether to include Compass Search, the precursor to Shay Banon’s Elasticsearch reprise. Over the years, I have kept my eye on open source search and retrieval. I prepared a report for the outfit IDC, which happily published sections of the document and offered my write ups for $3,000 on Amazon. Too bad IDC had no agreement with me, managers who made Daffy Duck look like a model for MBAs, and a keen desire to find a buyer. Ah, the book still resides on one of my backup drives, and it contains a run down of where open source was getting traction. I wrote the report in 2011 before getting the shaft-a-rama from a mid tier consulting firm. Great experience!

The report included a few nuggets which in 2011 not many experts in enterprise search recognized; for instance:

  1. Large companies were early and enthusiastic adopters of open source search; for example Lucene. Why? Reduce costs and get out of the crazy environment which put Fast Search & Transfer-type executives in prison for violating some rules and regulations. The phrase I heard in some of my interviews was, “We want to get out of the proprietary software handcuffs.” Plus big outfits had plenty of information technology resources to throw at balky open source software.
  2. Developers saw open source in general and contributing to open source information retrieval projects as a really super duper way to get hired. For example, IBM — an early enthusiast for a search system which mostly worked — used the committers as feedstock. The practice became popular among other outfits as well.
  3. Venture outfits stuffed with oh-so-technical MBAs realized that consulting services could be wrapped around free software. Sure, there were legal niceties in the open source licenses, but these were not a big deal when Silicon Valley super lawyers were just a text message away.

There were other findings as well, including the initiatives underway to embed open source search, content processing, and related functions into commercial products. Attivio (formed by former super star managers from Fast Search & Transfer), Lucid Works, IBM, and other bright lights adopted open source software to [a] reduce costs, [b] eliminate the R&D required to implement certain new features, and [c] develop expensive, proprietary components, training, and services.

Security Vendors: Despite Marketing Claims for Smart Software Knee Jerk Response Is the Name of the Game

December 16, 2020

Update 3, December 16, 2020 at 1005 am US Eastern, the White House has activated its cyber emergency response protocol. Source: “White House Quietly Activates Cyber Emergency Response” at Cyberscoop.com. The directive is located at this link and verified at 1009 am US Eastern as online.

Update 2, December 16, 2020 at 1002 am US Eastern. The Department of Treasury has been identified as an entity compromised by the SolarWinds’ misstep. Source: “US Treasury, Commerce Depts. Hacked through SolarWinds Compromise” at KrebsonSecurity.com

Update 1, December 16, 2020, at 950 am US Eastern. The SolarWinds’ security misstep may have taken place in 2018. Source: “SolarWinds Leaked FTP Credentials through a Public GitHub Repo “mib-importer” Since 2018” at SaveBreach.com

I talked about security theater in a short interview/conversation with a former CIA professional. The original video of that conversation is here. My use of the term security theater is intended to convey the showmanship that vendors of cyber security software have embraced for the last five years, maybe more. The claims of Dark Web threat intelligence, the efficacy of investigative software with automated data feeds, and Bayesian methods which inoculate a client from bad actors— maybe this is just Madison Avenue gone mad. On the other hand, maybe these products and services don’t work particularly well. Maybe these products and services are anchored in what bad actors did yesterday and are blind to the here and now of dudes and dudettes with clever names?

Evidence of this approach to a spectacular security failure is documented in the estimable Wall Street Journal (hello, Mr. Murdoch) and the former Ziff entity ZDNet. Numerous online publications have reported, commented, and opined about the issue. One outfit with a bit of first hand experience with security challenges (yes, I am thinking about Microsoft) reported “SolarWinds Says Hack Affected 18,000 Customers, Including Two Major Government Agencies.”

One point seems to be sidestepped in the coverage of this “concern.” The corrective measures kicked in after the bad actors had compromised and accessed what may be sensitive data. Just a mere 18,000 customers were affected. Who were these “customers”? The list seems to have been disappeared from the SolarWinds’ Web site and from the Google cache. But Newsweek, an online information service, posted this which may, of course, be horse feathers (sort of like security vendors’ security systems?):

Google: Simplifying Excellence

October 22, 2020

Almost everyone knows Google. I spotted an eclectic write up in Entertainment Overdose (an estimable publication). The article “Eric Schmidt, Who Got YouTube for a Premium, Assumes Social Media Networks Are Amplifiers for Idiots” contains a quote. This is an alleged statement attributed to Eric Schmidt, the overseer of Google until 2018.

Here’s the alleged pearl of wisdom:

The context of social networks serving as amplifiers for idiots and crazy people is not what we intended.

But it happened with YouTube, right? Who was running the company at this time? I think it was Mr. Schmidt.

It seems that Mr. Schmidt’s social world view is divided into those who are not crazy (possibly Google employees and those who share some Google mental characteristics but are in some way in touch with reality) and those who are crazy. Crazy means mentally deranged, which may be a bad thing. Plus, the “crazy” group uses social media as “amplifiers.” This seems to suggest that anyone using social media falls into the crazy category. Is this correct?

Note the “we”. The royal “we” appears to embrace the senior management of Google.

Now check out the Rupert Murdoch “real” news Wall Street Journal for October 22, 2020. The story to which I direct your attention is called “Google Ex-CEO Hits DOJ As Antitrust Battle Looms.” [When the story is posted to wsj.com, you will have an opportunity to purchase access. Until then, hunt for the dead tree edition and look on Page A-1.]

The write up reports that Mr. Schmidt said:

There’s a difference between dominance and excellence.

The idea may be that operating like a plain vanilla monopoly is not acceptable. This suggests that a monopoly delivering “excellence” is a positive for everyone.

Is YouTube dominant or excellent? Are those who post links to children’s playgrounds to the delight of individuals with proscribed tendencies idiots? (There are other, more suitable terms I believe.)

Exclusive: Interview with DataWalk’s Chief Analytics Officer Chris Westphal, Who Guides an Analytics Rocket Ship

October 21, 2020

I spoke with Chris Westphal, Chief Analytics Officer for DataWalk about the company’s string of recent contract “wins.” These range from commercial engagements to heavy lifting for the US Department of Justice.

Chris Westphal, founder of Visual Analytics (acquired by Raytheon) brings his one-click approach to advanced analytics.

The firm provides what I have described as an intelware solution. DataWalk ingests data and outputs actionable reports. The company has leap-frogged a number of investigative solutions, including IBM’s Analyst’s Notebook and the much-hyped Palantir Technologies’ Gotham products. This interview took place in a Covid compliant way. In my previous Chris Westphal interviews, we met at intelligence or law enforcement conferences. Now the experience is virtual, but as interesting and informative as in July 2019. In my most recent interview with Mr. Westphal, I sought to get more information on what’s causing DataWalk to make some competitors take notice of the company and its use of smart software to deliver what customers want: Results, not PowerPoint presentations and promises. We spoke on October 8, 2020.

DataWalk is an advanced analytics tool with several important innovations. On one hand, the company’s information processing system performs IBM i2 Analyst’s Notebook and Palantir Gotham type functions — just with a more sophisticated and intuitive interface. On the other hand, Westphal’s vision for advanced analytics has moved past what he accomplished with his previous venture Visual Analytics. Raytheon bought that company in 2013. Mr. Westphal has turned his attention to DataWalk. The full text of our conversation appears below.

Palantir Technologies: A Problem for Intelware Competitors?

September 24, 2020

The Palantir Technologies initial public offering is looming. Pundits are excited; for example, “Palantir Has A Long Uphill Battle Towards Customer Acquisition, But Benefits From Stickiness And Contract Expansion” makes clear that the journey to profitability may be like the Beatles observed: A long and winding road. Others are focused on churn; for example, “5 things to Know about Palantir’s Upcoming IPO.” DarkCyber’s response: “Just five?”

The issue is intelware. Many companies have tried to convert selling to law enforcement, intelligence agencies, and regulators into a billion dollar software and services business. There are some success stories; for example, Booz Allen fits the bill. The company sells time. The company has its own software, not much, but it exists. The company cheerleads, which is a nice way to say that for money “experts” will talk about promising products from the competitive marketplace.

Palantir is more like Autonomy than a blue-chip consulting firm. Autonomy played the “secret black box” chip with its neuro-linguistic programming. It worked until it did not. The firm licensed its black box to BAE Systems in the 1990s. The Autonomy marketing machine then generated revenue slowly and steadily. Then Autonomy acquired companies and cranked up its sales machine. At “peak Autonomy,” the well managed outfit Hewlett Packard grabbed a brass ring with Autonomy engraved on it. The cost was north of $10 billion and years of legal bills. Autonomy was a publicly traded company, and it had a revenue track record dating from 1996. The HP deal was completed in October 2011. That means that the FY2010 data give us an idea about how much secret black box software can generate with “advanced” software, great marketing, and demanding management. The revenue for Autonomy after 15 years was in the neighborhood of $870 million.

One of Palantir Gotham’s innovations: A right mouse click displays a wheel of choices. The interface is definitely jazzier than that of Analyst’s Notebook, now owned by IBM.

Palantir Technologies opened for business in 2003. The company has been in business for 17 years. Yep, that’s two years longer than Autonomy. And what is Palantir’s alleged revenue for the last fiscal year? $742 million. The company’s advantages were the support of Peter Thiel (a Silicon Valley Thor), secrecy, a method for importing ANB files (if you don’t know what this is, well, what can I tell you in a free blog post?), and okay sales and so-so marketing. (One of Palantir’s innovations was a wheel of choices, not Bayesian methods wrapped in mystery.)

If my math is correct, Autonomy generated $128 million more revenue than Palantir. If one uses 2011 dollars, not the Rona roiled 2020 dollars, the difference is more like $400 million, give or take $20 million or so. Yep, Autonomy appears to have outperformed Palantir: Less time, more revenue.
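The nominal comparison can be checked with a few lines of Python. This is a minimal sketch that uses only the two revenue figures and the two company ages cited above (roughly $870 million after 15 years for Autonomy, an alleged $742 million after 17 years for Palantir). The variable names are illustrative, and no inflation adjustment is attempted.

```python
# Figures as cited in the post, in millions of US dollars (nominal).
autonomy_revenue_m = 870   # Autonomy, FY2010, about 15 years in business
palantir_revenue_m = 742   # Palantir, alleged last fiscal year, 17 years in

autonomy_years = 15
palantir_years = 17

# Nominal revenue gap and the extra time Palantir has had.
gap_m = autonomy_revenue_m - palantir_revenue_m
extra_years = palantir_years - autonomy_years

print(f"Autonomy led by ${gap_m} million with {extra_years} fewer years in business")
```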

What?

Why?

Who?

How?

Let’s take each question.

First, what? The lackluster performance of Palantir Technologies illustrates the difficulty intelware companies, even ones with great advantages like the aforementioned ANB filter, have making really big money quickly. Remember. To generate less revenue than Autonomy, Palantir required $2.6 billion in funding. DarkCyber thinks that patient investors may be nervous about their investment which could melt away like a real snowflake. You can work out the math. Take 17 years of losses, subtract the revenue generated over 17 years, add in some interest just for spice, and mix into a pressurized container containing the fumes of burning a big cash pile.

Dark Patterns: Is the Future of Free Video Editing Software Duplicity, Carelessness, and Indifference?

August 31, 2020

One of the DarkCyber team suggested a run down of three free video editing software solutions. We had just finished a couple of our for-fee write ups about technology related to warfighting, and I concluded that the group wanted a break from million watt beam weapons.

I said, “Okay, just use a machine we don’t rely on for real work.” Stephanie was thrilled when Ben said he would help. The three “free” software solutions these two set about installing were:

DaVinci Resolve, allegedly “the standard for high end post production and finishing on more Hollywood feature films, television shows and commercials than any other software.” You can get a free copy at this link. (There is a $300 version too.)

HitFilm Express, allegedly “a free video editing software with professional-grade VFX tools and everything you need to make awesome content, films or gaming videos.” You can get a free copy at this link.

Shotcut, a free, open source, cross platform video editor. You can get a copy at this link.

We never got to the review. We were trapped in what sure looks like the FXHome / HitFilm Express dark pattern. It was a swamp populated by creatures dependent on auto reply email, bizarre instructions, and names like “Dibs” and “Joe.” So wholesome, yet so frustrating despite the friendly monikers.

This blog post is about dark patterns, not the video editing software. Sorry, Stephanie (the team member who cooked up the idea for the story). Read on to find out why DarkCyber cares about a single firm and its enthusiastic pursuit of dark patterns.

The illustration below is a depiction of Dante’s Inferno. About eight layers down is the Dark Pattern of FXHome. That’s better than spending every day, all day with Beelzebub and the gang.

What’s a dark pattern?

The phrase means, according to the ever reliable Wikipedia, “A user interface that has been carefully crafted to trick users into doing things, such as buying insurance with their purchase or signing up for recurring bills.”

Stephanie tried to install the software and was greeted with a Web page presenting her with options to upgrade the free software by purchasing $25 to $50 bundles of macros and pre-sets. Puzzled, she retrieved the details for the accounts we use to purchase software, pay for subscriptions, and buy crap from Amazon.

I ignored her grumbling, but I noticed when two of my engineers were standing behind her staring at the screen and getting that weird look in their eyes when something does not compute. I walked over to the group and said, “When will you finish your reviews of these three tools?”

Stephanie said, “I am running behind. I spent yesterday and today trying to get the software to work. Apparently someone installed a version of HitFilm Express last year, and now FXHome took the money, sent a series of steps, and nothing works.”

I said, “Okay, write the company. Explain what happened and get help to install the software.”

My two engineers nodded and walked away. This, in my experience, meant that the HitFilm Express software was something that presented numerous challenges. Researching and analyzing EMP technology was more appealing than not-so-free software.

I told Stephanie to give me the user name and password she used to buy the software. I happily logged in from a different machine, created a user name and password, saw the same difficult to evade plea to buy add-in packs, and I bought a $39 pack. The video editor came up but no add in software.

Now I was intrigued. Two installations. Almost $80US down a rat hole and no special add in packs. I told my engineers to log in, get the install information, and see if each could get the software to work.

Nope. FXHome has a system to take money. FXHome does not have a functional, reliable system to deliver what the customer purchased.

Now I am thinking cyber fraud. Call me silly, but I am a suspicious person, and when we write about next generation weapons, what type of customers do we have? Certainly not the Vatican or Greenpeace.

I found a customer support email which is managed by “smart” software. The email to which I was directed is support@fxhome.com, and in the course of a series of email exchanges over the span of nine days a human included his/her name. That individual identified himself/herself as Dibs McCallum.

The dark patterns we believe the user interface implements for the free software include these elements:

  1. Blandishments to purchase upgrades before allowing downloads
  2. Instructions for installing software which do not install software
  3. Customer service interfaces intended to frustrate those seeking information; for example, the FXHome system strips attachments even though people or bots like Dibs McCallum request them and yours truly attaches them. Even more dutifully I resend the attachments and receive zero acknowledgement or information about the failure.

Where am I? Well, definitely there is no review of FXHome. It is tough to write about software which does not function. The upside is that I have an anecdote for my next cyber crime lecture. As we were editing this story, PayPal reported a refund of $39. FXHome still has $39 and we have no functioning software.

When I step back, I look at a series of events involving three of my team and the ever helpful Dibs McCallum, who insisted that attachments showing the unhelpful error messages HitFilm Express displayed did not arrive.

Then there was this email:

Allow me to explain. You buy from us. If you want a refund within 14 days you get one.
That is why I have refunded both your order 0000000000000 for $39 that you made by credit card under the email seaky2000@yahoo.com and also your order 0000000000000 for $39 that you made via PayPal that you made under the email 00@arnoldit.com. Both amounts will appear in your prospective credit card and PayPal statements within the next 5-10 working days. Though most likely far sooner. This does mean your software packs will no longer work of course. Those effects will be deactivated and you are left with the free HitFilm Express without the extra content. It is always best to remember what email you use for purchases as it can be confusing if you habitually use more than one email. We are always dealing with this confusion with customers. Very common.
Best Regards, Joe Gould, Business Coordinator

Notice the phrase “We are always dealing with this confusion”.

Yeah, Joe said, “Always.” What’s that old saw about doing the same thing over and over? Was it ground hog day or one of Dante’s circles of Hell?

The dark pattern is apparently accidental. A situation exists which creates an “always” situation. Why not figure out changes to the system to eliminate an “always” problem? Why not think through making the interface work with a customer, not against the customer? Why not skip the “buy more add in packs”? Just charge people money.

What’s free mean? Upsells, confusing purchase options, and a “system” designed to make the craziness of Microsoft customer support for non-installable $0.99 HEVC codecs look like a paragon of lucidity.

One answer is that it earned this write up in Beyond Search and DarkCyber. It has converted sweet Stephanie into a termagant and HitFilm Express hater. (Good work that.)

Observations:

  • Generating sustainable revenue is difficult. If a product is “good,” people will pay for it. If a product is not so good, carelessness, indifference, or laziness generates “buy this, then that” solutions. Helpful? Not so much. Suggestion for FXHome: Less weird orange color and more begging for dollar options like Indiegogo or Patreon, among others?
  • Competing against Adobe, Apple, Magix, and other for-fee video editing programs is difficult. Yes, DarkCyber understands that FXHome needs revenue. Suggestion: Why not sell a subscription to upgrades?
  • Relying on an interface and the people who conceived it may not be a winning tactic. Staff changes and additional inputs may provide the creative spark that moves beyond what sure look like dark patterns. Suggestion: Skip the hear, speak, and see no evil approach to your current upgrade interface. Listen and fix the problem. “Always”. Wow, that’s an endorsement of clear thinking.

Is DarkCyber suspicious? Yep. FXHome could be a YouTube video titled UXMoan.

Stephen E Arnold, August 31, 2020

Google: Human Data Generators

July 29, 2020

DarkCyber spotted this interesting article, which may or may not be true. But it is fascinating. The story is “Google Working on Smart Tattoos That Turn Skin into Living Touchpad.” The write up states:

Google is working on smart tattoos that, when applied to skin, will transform the human body into a living touchpad via embedded sensors. Part of Google Research, the wearable project is called “SkinMarks” that uses rub-on tattoos. The project is an effort to create the next generation of wearable technology devices…

DarkCyber believes that the research project makes it clear that Google is indeed intent on collecting personal data. Where will the tattoo be applied? Forehead in Central America street gang fashion?

image

Russian prisoner style with appropriate Google iconography?

image

A tasteful tramp stamp approach?

image

The possibilities are plentiful if the report is accurate.

Stephen E Arnold, July 29, 2020

Digital Fire hoses: Destructive and Must Be Controlled by Gatekeepers

July 16, 2020

Let’s see how many individualistic thinkers I have offended with my headline. I apologize, but I am thinking about the blast of stories about the most recent Twitter “glitch”: “Apple, Biden, Musk and Other High-Profile Twitter Accounts Hacked in Crypto Scam.”

Are you among the individuals whom I am offending in this essay?

First, we have the individuals who did not believe my observations made in my ASIS Eagleton Lecture 40 years ago. Flows of digital information are destructive. The flows erode structures like societal norms, logical constructs, and organizational systems. Yep, these are things. Unfettered flows of information cut them down, efficiently and steadily. In some cases, the datum can set up something like this:

image

Those nuclear reactions are energetic in some cases.

Second, individuals who want to do any darn thing they want. These individuals form a cohort—either real or virtual—and have at it. I have characterized this behavior in my metaphor of the high school science club. The idea is that anyone “smart” thinks that his or her approach to a problem is an intelligent one. Sufficiently intelligent individuals will recognize the wisdom of the idea and jump aboard. High school science clubs can be a useful metaphor for understanding the cute and orthogonal behavior of some high technology firms. It also describes the behavior of a group of high school students who use social media to poke fun or “frame” a target. Some nation states direct their energies at buttons which will ignite social unrest or create confusion. Thus, successful small science clubs can grow larger and be governed — if that’s the right word — by high school science club management methods. That’s why students at MIT put weird objects on buildings or perform cool pranks. Really cool, right?

Third, individuals who do not want gatekeepers. I use the phrase “adulting” to refer to individuals able to act in an informed, responsible, and ethical manner when deciding what content becomes widely available and what does not. I used to work for an outfit which published newspapers, ran TV stations, and built commercial databases. The company at that time had the “adulting” approach well in hand. Some individuals decry informed human controls. It is time to put thumbs in digital dikes.

Search: Contentious and Increasingly Horrible

May 25, 2020

I dropped enterprise search, commercial search, and vertical search to the bottom of my “Favorite Topics” list years ago.

Why?

The individuals popping up and off at conferences were disconnected from the realities of looking for information under stressful circumstances.

image

Hey, big rocks, how did you move from that quarry kilometers away and get yourselves smoothed down? Just like modern online search systems, you won’t get an answer. Finding information relevant to a query is as difficult as getting megalithic stones to become Chatty Kathies.

The thumb typing crowd, some of whom are now in their mid forties, ASSUME that search has to think for the stupid user.

The techniques include smart software which skews results in ways that are, to an experienced researcher, stupid. For those search experts concerned with making their information or their name appear number one on a results list, good search was anything that produced a top spot in a result list even if that result was stupid, irrelevant, or shameless ego jockeying. Then there are the chipper, super confident experts who emerged from an educational system which awarded those who showed up and sort of behaved a blue ribbon. Yep, everything that group does is just wonderful. Yeah, right.

You can see the consequences of two forces colliding when you read Science Magazine’s “They Redesigned PubMed, a Beloved Website. It Hasn’t Gone Over Well.”

You can work through the examples in the source article. The pain points range from appearance to search functionality.

Why did this happen?

The change is a result of people who do not have the experience of performing search under stressful conditions. No, I don’t mean locating the Cuba Libre restaurant in Washington, DC, on a Google Map. I mean looking up technical information to complete a lab test, perform a diagnosis, locate a procedure, or some similar action. There is a pandemic going on, isn’t there?

The complaints indicate that the “new” PubMed is not perceived as a home run.

Go read the original.

I want to offer several observations:

  1. Those who do research with intent need predictability; that is, when a Boolean query is entered, the results should reflect that logic. Modern systems think Boolean is stupid. There you go, a value judgment from those with “Also Participated” ribbons in high school.
  2. Interfaces should allow the user to select an approach. There are some users who like a blinking dot or a question mark. Enter the commands and get a text output. Others like the Endeca style training wheels, although I doubt if any of the modern “helper” interfaces know what Endeca offered. Others may want some other type of interface like a PhD approach; that is, push here, dummy. The point is: Why not allow the user to select the interface?
  3. Change is introduced for dark purposes. Catalina has many points of friction so that Apple can extend its span of control. Annoying? Sure is. Why doesn’t Apple tell the truth about these friction points? What? Tell the truth? Are you crazy? Apple, like Facebook and Google, is doing what it can to protect its hegemony, and the user is the victim. Tough. The same logic applies to PubMed. Dollars to donuts there is a “reason” for the change, and it may be due to whimsy, money, or the need to demonstrate the team is actually doing something instead of just having meetings with contractors.
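The predictability called for in the first observation is easy to illustrate. The sketch below assumes a made-up three-document collection and a hypothetical `postings` helper. It is not how PubMed or any production system is built, but it shows the property a researcher relies on: Boolean operators map directly onto set operations, so the same query over the same index returns the same documents.

```python
# A made-up document collection: id -> text
DOCS = {
    1: "aspirin reduces fever in adults",
    2: "aspirin dosage for children",
    3: "ibuprofen reduces fever in children",
}


def build_index(docs):
    """Build an inverted index mapping each term to the set of doc ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def postings(index, word):
    """Return the set of doc ids containing the word (empty set if none)."""
    return index.get(word.lower(), set())


INDEX = build_index(DOCS)

# aspirin AND fever
both = postings(INDEX, "aspirin") & postings(INDEX, "fever")

# fever NOT aspirin
fever_only = postings(INDEX, "fever") - postings(INDEX, "aspirin")

# (aspirin OR ibuprofen) AND children
pediatric = (postings(INDEX, "aspirin") | postings(INDEX, "ibuprofen")) \
    & postings(INDEX, "children")
```

Nothing here re-ranks, expands, or second-guesses the query. The design choice is that the system does exactly what the operators say, which is what makes the output checkable by the person who wrote the query.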

Net net: As I wrote for Barbara Quint in the now departed magazine Searcher, search is dead. Each day the hope for a better, more appropriate way to locate online information becomes lost in the mists of time. Getting relevant information from PubMed or any modern system is like trying to get the stones of Ollantaytambo to explain how the rocks moved eons ago.

Finding information today is more difficult than at any other time in my professional career. That’s a big problem.

Stephen E Arnold, May 24, 2020
