Enterprise Search: Blasting Away at Feet, Walls, and Partners

January 18, 2021

I read a very good write up called “Is Elasticsearch No Longer Open Source Software?” The write up contains a helpful summary of the history of Elastic and its Lucene-based search solution. The inhospitable territory of open source licensing gets a review as well. A summary would not do the write up justice, so navigate to the source document and read it first hand.

I noted a couple of passages which I found suggestive.

First, here’s a comment which strikes me as relevant to the Bezos bulldozer’s approach to low or no cost, high utility software:

if you want to provide Elasticsearch on a SaaS basis, you have to release any code that you use to do this: in Amazon’s case this could mean all the management layers that go into providing Elasticsearch on Amazon Web Services (AWS), so I doubt this is going to happen.

My view is that Elastic and its management team want to put some sand in the bulldozer’s diesel fuel. The question is, “WWAD” or “What will Amazon do?” Some of the options available to Amazon are likely to be interesting. The specific series of actions Amazon pursues will be particularly thrilling.

Second, another passage I circled was:

Smaller SaaS providers without Amazon’s resources will have to decide whether to do a deal with Elastic or Amazon to continue to offer a hosted Elasticsearch.

Based on my limited understanding of the legal hoo-hah with open source legal nuances, I think a customer will have to make a choice. Ride the bulldozer or go with the Son of Compass search. (Yep, that would be Elastic.)

For me, my meanderings through open source and enterprise search sparked these thoughts:

  1. In a competitive arena, open source will become closed. Too much money is at stake for the “leaders.”
  2. Open source provides a low cost, low friction way to add functionality or enable an open source “play.” Once up and running, the company using open source wants to make sure the costs of R&D, bug fixes, and other enhancements are “free”; that is, not an expense to the company using open source software.
  3. Forks or code released to open source are competitive moves motivated by financial and marketing considerations.

Open source, open code, open anything: Sounds too good to be true. For some situations, enterprise search’s DNA will surface and the costs can be tricky enough to make an accountant experience heartburn. And the lawyers? Those folks send invoices. The users? Search is a utility. The companies appropriating and making their solution proprietary? Mostly happy campers. And the open source “developers”? Yikes.

Stephen E Arnold, January 18, 2021

DarkCyber for January 12, 2021, Now Available

January 12, 2021

DarkCyber is a twice-a-month video news program about online, the Dark Web, and cyber crime. You can view the video on Beyond Search or at this YouTube link.

The program for January 12, 2021, includes a featured interview with Mark Massop, DataWalk’s vice president. DataWalk develops investigative software which leapfrogs such solutions as IBM’s i2 Analyst Notebook and Palantir Gotham. In the interview, Mr. Massop explains how DataWalk delivers analytic reports with two or three mouse clicks, federates or brings together information from multiple sources, and slashes training time from months to several days.

Other stories include DarkCyber’s report about the trickles of information about the SolarWinds’ “misstep.” US Federal agencies, large companies, and a wide range of other entities were compromised. DarkCyber points out that Microsoft’s revelation that bad actors were able to view the company’s source code underscores the ineffectiveness of existing cyber security solutions.

DarkCyber highlights remarkable advances in smart software’s ability to create highly accurate images from poor imagery. The focus of DarkCyber’s report is not on what AI can do to create faked images. DarkCyber provides information about how and where to determine if a fake image is indeed “real.”

The final story makes clear that flying drones can be an expensive hobby. One audacious drone pilot flew in restricted air zones in Philadelphia and posted the exploits on a social media platform. And the cost of this illegal activity? Not too much. Just $182,000. The good news is that the individual appears to have avoided one of the comfortable prisons available to authorities.

One quick point: DarkCyber accepts zero advertising and no sponsored content. Some have tried, but begging for dollars and getting involved in the questionable business of sponsored content is not for the DarkCyber team.

Finally, this program begins our third series of shows. We have removed DarkCyber from Vimeo because that company insisted that DarkCyber was a commercial enterprise. Stephen E Arnold retired in 2017, and he is now 77 years old and not too keen to rejoin the GenX and Millennials in endless Zoom meetings and what he calls “blatant MBA craziness.” (At least that’s what he told me.)

Kenny Toth, January 12, 2021

Factoids from Best Paper Awards in Computer Science

January 6, 2021

I noted “Best Paper Awards in Computer Science Since 1996.” The year caught my attention because that was the point in time at which software stagnation gained traction. See “The Great Software Stagnation” for the argument.

The Best Papers tally represents awards issued to the “best papers”. Hats off to the compiler Jeff Huang and his sources and helpers.

I zipped through the listings which contained dozens upon dozens of papers I knew absolutely zero about. I will probably be pushing up daisies before I work through these write ups.

I pulled out several observations which answered questions of interest to me.

First, the data illustrate the long tail thing. Stated another way, the data reveal that if an expert wants to win a prestigious award, it matters which institution issues one’s paycheck.

Second, what are the most prestigious “names” to which one should apply for employment in computer science? Here’s the list of the top 25. The others are interesting but not the Broadway stars of the digital world:

  1. Microsoft: 56.4
  2. University of Washington: 50.5
  3. Carnegie Mellon University: 47.1
  4. Stanford University: 43.3
  5. Massachusetts Institute of Technology: 40.2
  6. University of California, Berkeley: 29.2
  7. University of Michigan: 20.6
  8. University of Illinois at Urbana–Champaign: 18.5
  9. Cornell University: 17.4
  10. Google: 16.8
  11. University of Toronto: 15.8
  12. University of Texas at Austin: 14.5
  13. IBM: 13.7
  14. University of British Columbia: 12.4
  15. University of Massachusetts Amherst: 11.2
  16. Georgia Institute of Technology: 10.3
  17. École Polytechnique Fédérale de Lausanne: 10.1
  18. University of Oxford: 9.6
  19. University of California, Irvine: 9.4
  20. Princeton University: 9.1
  21. University of Maryland: 8.9
  22. University of California, San Diego: 8.7
  23. University of Cambridge: 8.6
  24. University of Wisconsin–Madison: 8
  25. Yahoo: 7.9

Note that Microsoft, the once proud Death Star of the North, is number one. For comparison, the Google is number 10. But the delta in average “bests” is an intriguing 39.6 papers. The ever innovative IBM is number 13, and the estimable Yahoo Oath Verizon confection is number 25.

I did not spot a Chinese university. A quick scan of the authors reveals that quite a few Chinese wizards labor in the research vineyards at these research-oriented institutions. Short of manual counting and analysis of names, I decided not to calculate authors by nationality. I think that’s a good task for you, gentle reader.

What about search as a research topic in this pool? I used a couple of online text analysis tools like Writewords, a tool on my system, and the Madeintext service. The counts varied slightly, which is standard operating procedure for counting tools like these. The 10 most frequently used words in the titles of the award winning papers are:

data 63 times
based 56 times
learning 53 times
using 49 times
design 45 times
analysis 38 times
software 36 times
time 36 times
search 35 times
Web 30 times

The surprise is that “search,” based on my analysis of the counts I used, was the ninth most popular word in the papers’ titles. Who knew? Almost as surprising was “social” ranking a miserable 46th. Search, it seems, remains an area of interest. Now if that interest can be transformed into sustainable revenue and sufficient profit to fund research, bug fixes, and enhancements, life would be better in my opinion.
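For readers who want to run a similar tally, here is a minimal Python sketch of a title word count. It is not how Writewords or Madeintext work internally; the stop-word list, the tokenizing rule, and the three sample titles are all made up for illustration. Choices like these (what counts as a word, which words get skipped, whether case matters) are a likely reason why different tools report slightly different counts.

```python
import re
from collections import Counter

# Hypothetical stop-word list; each real tool uses its own, which is one
# reason their counts can differ.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def top_title_words(titles, n=10):
    """Return the n most frequent non-stop words across paper titles."""
    counts = Counter()
    for title in titles:
        # Lowercase the title and keep alphabetic runs only, so hyphens
        # and punctuation split words in this sketch.
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up titles for illustration only; not taken from the awards list.
sample_titles = [
    "Learning to Search Web Data",
    "Search Engine Design Using Data Analysis",
    "Time Series Analysis for Software Data",
]

for word, count in top_title_words(sample_titles, n=3):
    print(f"{word} {count} times")
```

Swap in the real list of titles and adjust the stop words to taste. Expect the ranking of closely tied words to shift as those choices change.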

Stephen E Arnold, January 5, 2021

Bang: Write Like a Stable Hemingway

November 19, 2020

Do you want to write like Ernest Hemingway? You can. Navigate to this link. Click on edit and be guided to the promised land of a famous author. You remember Mr. Hemingway, right? The cats? The drinking? The poster with the word “Endurance” in big type. Big like a despairing fish.

I wrote this passage using the Hemingway app:

The App Is a Fish

Coders in distress. Improve communication. Land the fish. The sun blazed across my insight. The result? Blindness. Will it swim away? Will I remain in the dark like the creatures of the sea.

I clicked the button, a clean, sharp edged button. Here’s my score:

[Image: score from the Hemingway app]

Grade 1. Smart software is fine, like a sword thrust through a bull on a hot afternoon in Madrid. Does that hurt, Jake?

Nope.

Stephen E Arnold, November 19, 2020

Gartner Predictions: Fresh from the Patisserie

October 20, 2020

I spotted “Gartner Reveals the Top Strategic Tech Trends for 2021.” The write up is an information croquembouche. Here’s what Wikipedia offers as a typical confection whipped up by trained chefs:

[Image: a croquembouche]

This is a croquembouche. A tower of sugar-filled balls, filled with custard. Caramel enlivens the gourmet experience.

What are those delicate balls of goodness? Maybe empty calories or evidence of the wisdom of the saying, “A moment on the lips, a lifetime on the hips”? The write up makes its points without one reference to a poire à la Beaujolaise or tasty teurgoule. I had to content myself with the jargon and buzzword equivalent of pièce montée.

Here are some examples. Please, consult the original article or the menu available directly from Gartner for the complete list:

  • Artificial intelligence engineering, perfect for those who have mastered plain old AI
  • Anywhere operations, the bane of real estate professionals with empty buildings and clients who are missing their lease payments. Just WFH and do “operations” from one’s bedroom.
  • Cybersecurity mesh. I have zero idea what this means, but there will be reports, speeches at WFH conferences, and maybe a podcast or two from the merry band of brownie makers.
  • The IoB or Internet of Behaviors. Yep, that’s where the Rona makes its entrance. Remarkable.

To wrap up, what’s in a croquembouche, a cream puff tower? For starters one needs:

  • 30 eggs (raised by a mid tier farmer in New Jersey)
  • 4 sticks of butter (from cows who produce milk while consultants’ sales pitches are played in the barn)
  • 5 cups of sugar. So far no government health warnings are required.

Perfect those cream puff towers of knowledge and deep thoughts. Who wants seconds?

Stephen E Arnold, October 20, 2020

Security in the Cradle of High-Technology Yip Yap

June 30, 2020

DarkCyber spotted this story:

How Hackers Extorted $1.14m from University of California, San Francisco

One would think that UCSF, an educational institution with tech savvy professionals located in the cradle of the US high-technology industry, would have effective security systems in place. Wouldn’t one?

The write up reports:

The Netwalker criminal gang attacked University of California San Francisco (UCSF) on 1 June. IT staff unplugged computers in a race to stop the malware spreading. And an anonymous tip-off enabled BBC News to follow the ransom negotiations in a live chat on the dark web.

The article is one of those “how to be a bad actor” write ups which DarkCyber often finds discomfiting. Do these “real” news people want to provide information, or is there an inner desire to step outside the chummy walls of reporting? DarkCyber does not know.

The BBC points out:

Most ransomware attacks begin with a booby-trapped email and research suggests criminal gangs are increasingly using tools that can gain access to systems via a single download. In the first week of this month alone, Proofpoint’s cyber-security analysts say they saw more than one million emails using a variety of phishing lures, including fake Covid-19 test results, sent to organizations in the US, France, Germany, Greece, and Italy.

DarkCyber has a few questions; to wit:

  1. What vendors’ products are safe guarding UCSF?
  2. Who is in charge of anti phishing solutions at UCSF?
  3. What specific gaps exist at UCSF?
  4. What is the total amount of money UCSF spends on cyber security?
  5. How much “value” has been lost due to direct payment and down time, staff time, and running around not knowing what’s going on time?
  6. How about some quotes from the cyber security providers’ marketing material regarding the systems’ anti-phishing effectiveness?

Skip the how to, please. Focus on the facts that create the vulnerability. Just a thought.

Stephen E Arnold, June 30, 2020

Turkey Day: Forgetting a Murderer?

November 28, 2019

Who knows if one can forget a murder or a murderer? If the information is not available, then the murderer may not be a murderer. The logic seems a bit hippy dippy, almost millennial, but it is turkey day, with time to ponder. “German Ex-Con Wins Right to Have Any Murders He May Have Committed Forgotten” reports:

Although the case stretches back to the early Eighties, the issue really emerged when German magazine Der Spiegel published some archive articles about the case in 1999. In 2002, Gunther The Ripper was released from jail, and in 2009 became aware that the articles were floating about. Gunther argued that the news articles were inhibiting his “ability to develop his personality,” and went to federal court.

If a murder were committed and the victim a child, will the parents forget? What if this story is accurate and the murderer wants to work coaching a youth football team, would the alleged murderer forget he may have killed before?

Ah, forget it.

Stephen E Arnold, November 28, 2019

Free Music Samples

August 27, 2019

Short honk: Looking for free music samples? A collection of samples is available on “Free Sound Samples.” Queries via search engines for samples produce some wonky results. Worth noting.

Stephen E Arnold, August 27, 2019

Grover and Real Fake News

August 27, 2019

The Next Web reported, “This Terrifying AI Generates Fake Articles from Any News Site.” Now, the point here is to create an AI that can easily detect fake news, but researchers at the Allen Institute for Artificial Intelligence began with one that could generate such content. Basically, it takes one to know one. We learn:

“A team of researchers at the institute recently developed Grover, a neural network capable of generating fake news articles in the style of actual human journalists. In essence, the group is fighting fire with fire because the better Grover gets at generating fakes, the better it’ll be at detecting them. … Most fake news is generated by humans and then spread on social media. But the rise of robust systems such as OpenAI’s controversial GPT-2 point toward a future where AI-generated articles are close enough to the real thing to obfuscate nearly any issue. While it’s easy enough to search a website to see if an article is legitimate, not everyone is going to do that. And if an article goes viral, no matter how false it is, some people will be convinced.”

Writer Tristan Greene shares some passages Grover wrote, so see the article if you wish to read those. They are pretty convincing, especially if one just skims the text (as many readers do). One example aptly mimics President Obama’s writing/speaking style, while another seems to spook Greene with how well it captures his own writing essence. The article concludes with this link, where each of us can take Grover for a test drive. Modern life is fun.

Cynthia Murrell, August 27, 2019

Online Fraud in Asia

July 1, 2019

Data are often difficult to locate. Once located, verification is a great deal of work. Nevertheless, you may find the “numbers” in “Examining Online Fraud in Southeast Asia (Infographic)” a useful reference point. Some data are in paragraphs like this one:

In 2018, the region’s internet economy hit US$72 billion in 2018 – double what it was in 2015. Southeast Asia is well on its way to exceed Google’s prediction of hitting US$200 billion by 2025, with ecommerce players such as Lazada, Shopee, and Tokopedia expanding their efforts in the region to meet the demands of consumers.

Others appear in graphics. Here is a single item:

[Image: item from the fraud infographic]

DarkCyber will comment on the methods used by fraudsters in an upcoming DarkCyber video.

Stephen E Arnold, July 1, 2019
