Deepset: Following the Trail of DR LINK, Fast Search and Transfer, and Other Intrepid Enterprise Search Vendors

April 29, 2022

I noted a Yahooooo! news story called “Deepset Raises $14M to Help Companies Build NLP Apps.” To me the headline could mean:

Customization is our business and services revenue our monetization model

Precursor enterprise search vendors tried to get gullible prospects to believe a company could install software and employees could locate the information needed to answer a business question. STAIRS III, Personal Library Software / SMART, and the outfit with forward truncation (InQuire) among others were there to deliver.

Then reality happened. Autonomy and Verity upped the ante with assorted claims. The Golden Age of Enterprise Search was poking its rosy fingers through the cloud of darkness related to finding an answer.

Quite a ride: The buzzwords sawed through the doubt and outfits like Delphis, Entopia, Inference, and many others embraced variations on the smart software theme. Excursions into asking the system a question to get an answer gained steam. Remember the hand crafted AskJeeves or the mind boggling DR LINK; that is, document retrieval via linguistic knowledge?

Today there are many choices for enterprise search: Free Elastic, Algolia, Funnelback (now the delightfully named Squiz), Fabasoft Mindbreeze, and, of course, many, many more.

Now we have Deepset, “the startup behind the open source NLP framework Haystack,” not to be confused with Matt Dunie’s memorable “haystack with needles” metaphor, the intelware company Haystack, or a basic pile of dead grass.

The article states:

CEO Milos Rusic co-founded Deepset with Malte Pietsch and Timo Möller in 2018. Pietsch and Möller — who have data science backgrounds — came from Plista, an adtech startup, where they worked on products including an AI-powered ad creation tool. Haystack lets developers build pipelines for NLP use cases. Originally created for search applications, the framework can power engines that answer specific questions (e.g., “Why are startups moving to Berlin?”) or sift through documents. Haystack can also field “knowledge-based” searches that look for granular information on websites with a lot of data or internal wikis.

What strikes me? Three things:

  1. This is essentially a consulting and services approach
  2. Enterprise becomes apps for a situation, department, or specific need
  3. The buzzwords are interesting: NLP, semantic search, BERT, and humor.

Humor is a necessary quality when trying to make decades old technology work for distributed, heterogeneous data, email on a sales professional’s mobile, videos, audio recordings, images, engineering diagrams along with the nifty datasets for the gizmos in the illustration, etc.

A question: Is $14 million enough?

Crickets.

Stephen E Arnold, April 29, 2022

NCC April TikTok: Yeah, Not Good for Teenies

April 29, 2022

We wonder whether China will more aggressively exploit TikTok’s ability to influence. The New York Post describes “How TikTok Has Become a Dangerous Breeding Ground for Mental Disorders.” Apparently, tiktoks discussing mental health conditions are trending, especially among teen girls. This would be a good thing if they were all produced by medical experts, contained good information, and offered guidance for seeking professional help when warranted. Instead influencers, many of whom are teenagers themselves, purport to help others self-diagnose their mental conditions. As one might imagine, this rarely goes well. Writer Rikki Schlott tells us:

“After nearly two years of lockdowns and school closures, lonely teens are spending more time online, and many inevitably come across mental health content on TikTok. When they do, the platform’s algorithm kicks in, serving suggestible young girls even more videos on the topic. While mental health awareness is surely a good thing, well-meaning influencers are inadvertently harming young, impressionable viewers, many of whom seem to be incorrectly self-diagnosing with disorders or suddenly manifesting symptoms because they are now aware of them.”

The author continues, expanding her warning to include social media in general:

“Eating disorders have also been shown to spread within friend groups. As a member of Gen Z, I’ve watched firsthand what social media has done to a generation of young women — it even left behind self-harm scars on many of my peers’ wrists. I know a terrifying number of peers who have self harmed, many of whom were habitual social media users. Rates of depression have doubled among teen girls between 2009 and 2019, and self-harm hospital admissions have soared 100 percent for girls aged 10 to 14 during the rise of social media between 2010 and 2014, the most recently available data.”

Clearly a solution is needed, but Schlott knows where we cannot turn: politicians are too “clueless” to craft effective regulations and the platforms are too greedy to do anything about it. Instead it falls to parents to take responsibility for their teens’ media consumption, as difficult as that may be. Citing psychology professor and author on the subject Dr. Jean Twenge, the write-up advises a few precautions. First, parents must recognize that, unlike playing age-appropriate games or texting friends on their devices, social media is completely inappropriate for children, tweens, and young teens. The platforms themselves officially limit accounts to those 13 and older, but Twenge suggests holding off until a child is 16 if possible. She also proposes a household rule whereby everyone, including parents, stops using electronic devices an hour before bedtime and leaves their phones outside their bedrooms at night. Yes, parents too; after all, leading by example is often the only way to convince teens to comply.

Cynthia Murrell, April 29, 2022

NCC April Microsoft: Customer and User Focused?

April 29, 2022

Bill Gates designed Microsoft to make personal computers more user friendly. While the Microsoft operating system is among the easiest to learn, unfortunately it is also the most hackable. Black hat bad actors adore Microsoft systems, especially when the company releases a new update. Bleeping Computer shares a problem with the newest Windows update: “Microsoft: Windows Domain Controller Restarts Caused By LSASS Crashes.”

The bug occurred in the Local Security Authority Subsystem Service (LSASS). When LSASS crashed, users lost access to their Windows accounts and were shown an error message, then the system rebooted. The LSASS crash bug was one of many issues that a Microsoft patch fixed in January 2022:

“Microsoft addressed the LSASS crash issue in out-of-band updates released in mid-January 17 [1, 2] to fix numerous other critical bugs introduced during the January 2022 Patch Tuesday, including Hyper-V no longer starting, L2TP VPN connections failing, and ReFS volumes becoming inaccessible.”

Bad actors discover coding errors in Microsoft systems then exploit them. The bad actors detect many vulnerabilities during updates, then they quickly devise plans to take advantage of users. Threat Post explains a new hacker trick in, “Microsoft Accounts Targeted By Russian-Themed Credential Harvesting.” Russia has threatened cyber attacks with their current war plan, so it did not take long for bad actors to create spam campaigns. The spam email reads:

“Unusual sign-in activity

We detected something unusual about a recent sign-in to the Microsoft account

Sign-in details

Country/region: Russia/Moscow

IP address:

Date: Sat, 26 Feb 2022 02:31:23 +0100

Platform: Kali Linux

Browser: Firefox

A user from Russia/Moscow just logged into your account from a new device, If this wasn’t you, please report the user. If this was you, we’ll trust similar activity in the future.

Report the user

Thanks,

The Microsoft account team”

As with other spam, users are encouraged to click on a link and submit a response. If users respond to the link, they will most likely receive an email asking for login details and payment information.

My thought was that Windows Defender and other Microsoft security services would handle these types of issues. Guess not.

Whitney Grace, April 29, 2022

Kyndi: Advanced Search Technology with Quanton Methods. Yes, Quanton

April 29, 2022

One of my newsfeeds spit out this story: “Kyndi Unveils the Kyndi Natural Language Search Solution – Enables Enterprises to Discover and Deliver the Most Relevant and Precise Contextual Business Information at Unprecedented Speed.” The Kyndi founders appear to be business oriented, not engineering focused. The use of jargon like natural language understanding, contextual information, artificial intelligence, software robots, explainable artificial intelligence, and others is now almost automatic as if generated by smart software, not people who have struggled to make content processing and information retrieval work for users.

The firm’s Web site does not provide much detail about the technical plumbing for the company’s search and retrieval system. I took a quick look at the firm’s patents and noted these. I have added bold face to highlight some of the interesting words in these documents.

  • A method using Birkhoff polytopes and Landau numbers. See US11205135 “Quanton [sic] Representation for Emulating Quantum-like Computation on Classical Processors,” granted December 21, 2021. Inventor: Arun Majumdar, possibly in Alexandria, Virginia.
  • A method employing combinatorial hyper maps. See US10985775 “System and Method of Combinatorial Hypermap Based Data Representations and Operations,” granted April 20, 2021. Inventor: Arun Majumdar, possibly in Alexandria, Virginia. (As a point of interest the document includes the word bijectively.)
  • A method making use of Q-Medoids and Q-Hashing. See US10747740 “Cognitive Memory Graph Indexing, Storage and Retrieval,” granted August 18, 2020. Inventor: Arun Majumdar, possibly in San Mateo, California.
  • A method using Semantic Boundary Indices and a variant of the VivoMind* Analogy Engine. See US10387784 “Technical and Semantic Signal Processing in Large, Unstructured Data Fields,” granted August 20, 2019. Inventor: Arun Majumdar, possibly in Alexandria, Virginia. *VivoMind was a company started by Arun Majumdar prior to his relationship with Kyndi.
  • A method using Rvachev functions and transfinite interpolations. See US10372724 “Relativistic Concept Measuring System for Data Clustering,” granted August 6, 2019. Inventor: Arun Majumdar, possibly in Alexandria, Virginia.
  • A method using Clifford algebra. See US10120933 “Weighted Subsymbolic Data Encoding,” granted November 6, 2018. Inventor: Arun Majumdar, possibly in Alexandria, Virginia.

The inventor is not listed on the firm’s Web site. Mr. Majumdar’s contributions are significant. The chief technology officer is Dan Gartung, who is a programmer and entrepreneur. However, there does not seem to be an observable link among the founders, the current CTO, and Mr. Majumdar.

The company will have to work hard to capture mindshare from companies like Algolia (now working to reinvent enterprise search), Mindbreeze, Yext, and X1 (morphing into an eDiscovery system it seems), among others. Kyndi has absorbed more than $20 million in venture funding, but a competitor like Lucidworks has captured in the neighborhood of $200 million.

It is worth noting that one facet of the firm’s marketing is to hire the whiz kids from a couple of mid tier consulting firms to explain the firm’s approach to search. It might be a good idea for the analysts from these firms to read the Kyndi patents and determine how the VivoMind methods have been updated and applied to the Kyndi product. A bit of benchmarking might be helpful. For example, my team uses a collection of Google patents and indexes them, runs test queries, and analyzes the result sets. Almost incomprehensible specialist terminology is one thing, but solid, methodical analysis of a system’s real life performance is another. Precision and recall scores remain helpful, particularly for certain content; for example, pharma research, engineered materials, and nuclear physics.
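For readers who have not scored a result set by hand, here is a minimal sketch of how precision and recall can be computed for a single test query. It is an illustration only, not the benchmark my team runs: the function name `precision_recall` and the document identifiers are hypothetical, and the sketch assumes a human has already judged which documents are relevant to the query.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs the search system returned (hypothetical IDs).
    relevant:  document IDs a human judged relevant to the query.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents the system found

    # Precision: what share of the returned documents were relevant?
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what share of the relevant documents were returned?
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Made-up example: the system returns four documents, three are known to be relevant.
retrieved = ["doc-1", "doc-2", "doc-3", "doc-4"]
relevant = ["doc-1", "doc-3", "doc-5"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case, two of the four returned documents are relevant (precision 0.50) and two of the three relevant documents were found (recall about 0.67). Averaging these scores over a set of test queries gives a rough, repeatable picture of how a system performs on a specific collection.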

Stephen E Arnold, April 29, 2022

Disinformation: Live and Obvious in the Windows 11 Crazy Train

April 28, 2022

I noted that a number of OSINT experts sidestepped the issues of misinformation (making stuff up), disinformation (data which nuke other information), and reformation (moving the data walnut shells like a walnut shell wizard). The experts offered comments at a recent conference I attended, and I was fascinated by the avoidance of what seems to be a showstopper for analysts.

Let me give you an example unrelated to the professional OSINT lecturers.

The first is the story in Ars Technica. The headline is “Businesses Are Adopting Windows 11 More Quickly Than Past Versions, Says Microsoft.” Straightforward and actual factual.

Now consider “Windows 10 Still Growing, But Win 11 Had Another Bad Month, Says AdDuplex.” This appears to report data slightly off course with the Ars Technica write up.

Okay, are both sort of true? Is one statement more accurate than another? Maybe one or both are baloney?

The problem is that in order to figure out which is disinformation, one has to do quite a bit of work.

Now imagine that a really smart machine learning system ingests the content and shoves it into a whiz bang smart software system. The smart software will do what? Identify the rightness or wrongness of each set of factoids? Will the smart software go with a simple voting method and the most likely rightness will emerge from the murky plumbing of the smart software? Will the system punt as some digitally learned systems do?

The answer is that manipulation of information can generate outputs that may be disconnected from what is shaking in the real world.

Is this a problem? Yep. Is there a fix? Nope. Are there downstream consequences? Does a calculating predator exist in the technology theme park?

Stephen E Arnold, April 28, 2022

How Apps Use Your Data: Just a Half Effort

April 28, 2022

I read a quite enthusiastic article called “Google Forces Developers to Provide Details on How Apps Use Your Data.” The main idea is virtue signaling with one of those flashing airport beacons. These can be seen through certain types of “info fog,” just not today’s info fog. The digital climate has a number of characteristics. One is obfuscation.

The write up states:

… the Data safety feature is now on the Google Play Store and aims to bolster security by providing users details on how an app is using their information. Developers are required to complete this section for their apps by July 20, and will need to provide updates if they change their data handling practices, too. 

That sounds encouraging. Google’s been at the data harvesting combine controls for more than two decades. Now app developers have to provide information about their use of an app user’s data and presumably flip on the yellow fog lights for what the folks who have access to those data via an API or a bulk transfer are doing. Amusing thought: forced regulation after 240 months on the info highway.

However, what app developers do with data is half of the story, maybe less. The interesting question to me is, “What does Google do with those data?”

The Data Safety initiative does not focus on the Google. Data Safety shifts the attention to app developers, presumably some of whom have crafty ideas. My interest is Google’s own data surfing; for example, ad diffusion, and my fave Snorkelization and synthetic “close enough for horseshoes” data. Real data may be too “real” for some purposes.

After a couple of decades, Google is taking steps toward a data destination. I just don’t know where that journey is taking people.

Stephen E Arnold, April 28, 2022

NCC April Vendor Contracts: How to Be Slick and Lose Customer Trust

April 28, 2022

I read “Build Vs. Buy: Vendor Contract Shenanigans.” The write up is an excellent reminder of the character traits of MBAs and lawyers; that is, you lose if we provide you with a contract you sign without understanding. The article contains a number of examples of legal behavior which might strike some people as fraud. Oh, well, that is a signed contract, and your firm must comply. I love it when the lawyer tells a contracting officer, “Hey, we are sorry. These are standard terms.” Yep, standard for whom?

Let me highlight three of the methods used to extract maximum gain for the vendor and deliver discomfort to the customer. Please, consult the original write up for the fourth item on the list.

First, the vendor (in this case, the Google) specifies that when the guaranteed level of service fails, the customer must get everyone in the chain to notify one another that the Googley service did not deliver. A failure to complete this notification within 30 days means you forfeit a “service credit.” (I don’t know what a service credit means, but I don’t think it means cash money.)

Second, the vendor collects the money before service begins. If you don’t use what you bought, there is no refund.

Third, sign our deal and our company will use your logo forever.

The MBAs and lawyers involved in deals with these types of clauses have an ideal rationalization: We are just doing our jobs.

Yes, these individuals are. Just following orders. Where have I heard that before?

Stephen E Arnold, April 28, 2022

NCC April Users Might Accept Corrections to Fake News, if Facebook Could be Bothered

April 28, 2022

Facebook (aka Meta) has had a bumpy road of late, but perhaps a hypothetical tweak to the news feed could provide a path forward for the Zuckbook. We learn from Newswise that a study recently published in the Journal of Politics suggests that “Corrections on Facebook News Feed Reduces Misinformation.” The paper was co-authored by George Washington University’s Ethan Porter and Ohio State University’s Thomas J. Wood and funded in part by civil society non-profit Avaaz. It contradicts previous research that suggested such an approach could backfire. The article from George Washington University explains:

“Social media users were tested on their accuracy in recognizing misinformation through exposure to corrections on a simulated news feed that was made to look like Facebook’s news feed. However, just like in the real world, people in the experiment were free to ignore the information in the feed that corrected false stories also posted on the news feed. Even when given the freedom to choose what to read in the experiment, users’ accuracy improved when fact-checks were included with false stories. The study’s findings contradict previous research that suggests displaying corrections on social media was ineffective or could even backfire by increasing inaccuracy. Instead, even when users are not compelled to read fact-checks in a simulation of Facebook’s news feed, the new study found they nonetheless became more factually accurate despite exposure to misinformation. This finding was consistent for both liberal and conservative users with only some variation depending on the topic of the misinformation.”

Alongside a control group of subjects who viewed a simulated Facebook feed with no corrections, researchers ran two variants of the experiment. In the first, they placed corrections above the original false stories (all of which had appeared on the real Facebook at some point). In the second, the fake news was blurred out beneath the corrections. Subjects in both versions were asked to judge the stories’ veracity on a scale of 1 – 5. See the write-up for more on the study’s methodology. One caveat—researchers acknowledge potential influences from friends, family, and other connections were outside the scope of the study.

If Facebook adopted a similar procedure on its actual news feed, perhaps it could curb the spread of fake news. But does it really want to? We suppose it must weigh its priorities—reputation and legislative hassles vs profits. Hmm.

Cynthia Murrell, April 28, 2022

NCC April A Golden Oldie: YouTube Will Do Its Bestest

April 28, 2022

As tech companies receive continued pressure to contain misinformation on their platforms, MakeUseOf ponders, “Is YouTube Doing Enough to Tackle Misinformation?” The short answer—no. After all, removing content means removing ad revenue. Writer Aya Masango observes:

“Although YouTube has been working to tackle misinformation, the company realizes the importance of evolving to ensure that it stays ahead of those measures and that it continues to remain effective in that pursuit. And although that is the case, YouTube is still facing some challenges in tackling misinformation. In a YouTube blog post, the company’s Chief Product Officer, Neal Mohan, admitted that the platform is still struggling with thwarting misinformation before it goes viral, addressing cross-platform sharing of misinformation, and advancing misinformation efforts on a global scale. As noted by Mohan, ‘… As misinformation narratives emerge faster and spread more widely than ever, our approach needs to evolve to keep pace.’ This shows that YouTube is aware that it still has a long way to go in its efforts to tackle the spread of misinformation on its platform.”

Since Mohan is so interested in doing the right thing, Masango offers three suggestions for him and his company: First, she advises partnering with independent fact checkers, pointing to an informative open letter from The International Fact-Checking Network. The company should also set up native teams in foreign lands, where YouTube’s misinformation management is especially weak, and bring local expertise to bear. Finally, the write-up calls for banning channels that persist in peddling misinformation. Since that would mean fewer ads sold, however, we suspect the company considers that obvious measure a last resort.

Cynthia Murrell, April 28, 2022

Who Reads Dumped Once Confidential Documents?

April 27, 2022

I read “They’ve Leaked Terabytes of Russian Emails, But Who’s Reading?” The write up strikes me as a paean for open information flow in Russia and perhaps other nation states. There is a “way to go” for the Distributed Denial of Secrets crowd.

I noted this passage in the original article:

In the “Russia” category, the leaks now include a huge cross-section of Russian society, including banks, oil and gas companies, and the Russian Orthodox Church. Relative to some of the other leaked content sourced by DDoSecrets, the Blagoveshchensk emails represent only a mid-sized leak. The smallest data set (a list of the personal details for 120,000 Russian soldiers in Ukraine) is a mere 22MB while the largest (20 years of emails from a Russian state-owned broadcaster) is a whopping 786GB.

Then there is the implicit question, “Who sees this stuff?”

May I offer a few possibilities?

  1. Individuals at NATO
  2. Nation states involved in the Five Eyes
  3. Intelligence analysts within the European Union
  4. Big data mavens looking for content with which to train smart software
  5. Curious individuals with access to translate dot google dot com.

There may be others, but the straw man question? Either hand waving or stumbling.

Stephen E Arnold, April 27, 2022
