Amazon: Elasticsearch Bounced and Squished

October 14, 2019

DarkCyber noted “AWS Elasticsearch: A Fundamentally-Flawed Offering.” The write up criticizes Amazon’s implementation of Elasticsearch. Amazon hired some folks from Lucidworks a few years ago. But under the covers, Lucene thrums along within Amazon and a large number of other search-and-retrieval companies, including those which present themselves as policeware. There are many reasons: [a] good enough, [b] no one company fixes the bugs, [c] good enough, [d] comparatively cheap, [e] good enough. Oh, one other point: Not under the control of one company like those good, old fashioned solutions like STAIRS III, Fulcrum (remember that?), or Delphes (the francophone folks).

This particular write up is unlikely to earn a gold star from Amazon’s internal team. The Spun.io essay states:

I’m currently working on a large logging project that was initially implemented using AWS Elasticsearch. Having worked with large-scale mainline Elasticsearch clusters for several years, I’m absolutely stunned at how poor Amazon’s implementation is and I can’t fathom why they’re unable to fix or at least improve it.

I think the tip off is the phrase “how poor Amazon’s implementation is…”

The section Amazon Elasticsearch Operation provides some color to make vivid the author’s viewpoint; for example:

On Amazon, if a single node in your Elasticsearch cluster runs out of space, the entire cluster stops ingesting data, full stop. Amazon’s solution to this is to have users go through a nightmare process of periodically changing the shard counts in their index templates and then reindexing their existing data into new indices, deleting the previous indices, and then reindexing the data again to the previous index name if necessary. This should be wholly unnecessary, is computationally expensive, and requires that a raw copy of the ingested data be stored along with the parsed record because the raw copy will need to be parsed again to be reindexed. Of course, this also doubles the storage required for “normal” operation on AWS. [Emphasis in the original essay.]
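
For readers who have not run the routine the essay describes, the steps can be sketched as an ordered series of Elasticsearch REST calls. This is a minimal sketch, not the essay author's code. It only builds the request descriptions (method, path, JSON body) and sends nothing. The index, template, and pattern names and the shard count are hypothetical. The `_template` and `_reindex` endpoints are those of mainline Elasticsearch; using `_reindex` for the copy steps is an assumption and may differ from the re-parsing of raw data that the essay mentions.

```python
import json

def reshard_plan(index, temp_index, template, pattern, shards):
    """Return the REST calls, in order, for the reshard-and-reindex routine."""
    return [
        # 1. Change the shard count in the index template (applies to new indices only).
        ("PUT", f"/_template/{template}",
         {"index_patterns": [pattern], "settings": {"number_of_shards": shards}}),
        # 2. Copy the existing documents into a new index created under that template.
        ("POST", "/_reindex",
         {"source": {"index": index}, "dest": {"index": temp_index}}),
        # 3. Delete the previous index.
        ("DELETE", f"/{index}", None),
        # 4. Reindex again to the previous index name, if that name must be kept.
        ("POST", "/_reindex",
         {"source": {"index": temp_index}, "dest": {"index": index}}),
    ]

if __name__ == "__main__":
    # Hypothetical names, chosen so both indices match the template pattern.
    plan = reshard_plan("logs-2019.10", "logs-2019.10-tmp", "logs", "logs-*", 10)
    for method, path, body in plan:
        print(method, path, json.dumps(body) if body else "")
```

Every document is copied twice when the original index name must be preserved, which is the computational expense the essay complains about.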

The wrap up for the essay is clear from this passage:

I cannot fathom how Amazon decided to ship something so broken, and how they haven’t been able to improve the situation after over two years.

DarkCyber’s team formulated several observations. Let’s look at these in the form of questions and trust that some young sprites will answer them:

  1. Will Amazon make its version of Elasticsearch proprietary?
  2. Are these changes designed to “pull” developers deeper into the AWS platform, making departure more difficult or impossible for some implementations?
  3. Are the components the author of the essay finds objectionable designed to generate more revenue for Amazon?

Stephen E Arnold, October 14, 2019

Real Life Q and A for Information Access Allegedly Arrives

October 14, 2019

DarkCyber noted “Promethium Tool Taps Natural Language Processing for Analytics.” The write up, which may be marketing oriented, asserts:

software, called Data Navigation System, was designed to enable non-technical users to make complex SQL requests using plain human language and ease the delivery of data.

The company developing the system, Promethium, was founded in 2018 and may have delivered what users have long wanted: Ask the computer a question and get a usable, actionable answer. If the write up is accurate, Promethium has achieved with $2.5 million in funding a function that many firms have pursued.

The article reports:

After users ask a question, Promethium locates the data, demonstrates how it should be assembled, automatically generates the SQL statement to get the correct data and executes the query. The queries run across all databases, data lakes and warehouses to draw actionable knowledge from multiple data sources. Simultaneously, Promethium ensures that data is complete while identifying duplications and providing lineage to confirm insights. Data Navigation System is offered as SaaS in the public cloud, in the customer’s virtual private cloud or as an on-premises option.

More information is available at the firm’s Web site.

Stephen E Arnold, October 14, 2019

A List of Enterprise Search Vendors

October 7, 2019

DarkCyber does not follow the enterprise search sector. In fact, two of the flagships from the 2000s found themselves caught in embarrassing financial missteps. Why? It certainly suggests that making big bucks from a search and retrieval service is difficult.

We came across a Web site called Trust Radius. This site has a section devoted to enterprise search. What we found interesting is that the site lists what seem to be the key players in the sector today. With most LE and intel policeware platforms relying on open source search like Lucene, DarkCyber was quite surprised with the line up of systems and the information provided by Trust Radius.

Here’s the list of vendors in alphabetical order, a method of presenting information which is not in favor with some whiz kids:

3RDi Search

Aderant Handshake (knowledge management for law firms)

Agree Ya Site Administrator

Algolia

Amazon Cloud Search (Lucene)

Apache Lucene

Apache Solr

Expert Systems Cogito Discover

Constructor.io Search

Coveo

Customer Matrix (customer support)

Dassault Systems Exalead (Exalead)

Dieselpoint

Elasticsearch (Elastic)

Fabasoft Mindbreeze

Fabasoft Mindbreeze Inspire

Google Search Appliance (discontinued)

IBM Watson (once Omnifind)

IBM Watson Discovery for Salesforce

IBM Watson Explorer

IManage Insight (Interwoven, Autonomy, HP, now a standalone)

Inbenta Enterprise Search

Lookeen Desktop Search (listed as Enterprise Search however)

Lucidworks Fusion ($100 million in funding)

Maana

Microfocus IDOL (Autonomy to HP to HPE to Microfocus)

Microsoft Azure (Fast Search & Transfer)

Microsoft Bing Search

Perceptive Search (ISYS Search Software to Lexmark to Hyland)

Rocket NXT Enterprise Search (Aerotext)

Rockset

Searchify

Search Spring (product search)

Search Tap

Search Unify

Sinequa

SLI Systems (e commerce)

Swiftype

Synacor Video Search & Discovery

TeraText Searchable Archive for Files and Email (SAIC)

Zakta

What DarkCyber finds interesting is the omission of outfits like Oracle Endeca, Antidot, and Blossom. Also, of this listing of 41 “search systems” there are multiple enterprise search products from single companies like IBM and Microsoft. There are also e-commerce search systems and systems which do not handle enterprise content because the service supports desktops. There are two “no longer around” products and a weird blend of search utilities with text processing features. In short, this list is illustrative of the chaos, confusion, and craziness that leads some information technology professionals to buy a solution that just delivers key word search and some optional features.

DarkCyber believes that Amazon’s approach is likely to gain traction. That’s bad news for most of the companies on this list, particularly search vendors who manage to confuse individuals or the smart software used to create this list at Trust Radius.

It seems that the message from this list is that search is a bit of a dog’s breakfast—just as it has been for decades.

Stephen E Arnold, October 7, 2019

Today in Subjective Search: What Are You Not Allowed to Know

October 2, 2019

When you review information, is that information comprehensive, complete, and objectively displayed?

No.

No.

No.

Let’s look at three examples.

First, Boris Johnson allegedly uses certain words to skew search results. This is the allegation of Remoaning Myrtle. You can find the assertion at this link. Does this mean that wordsmithing now fiddles search results on Bing, Google, and Yandex? Interesting question about an interesting person’s ability to use language as a weapon.

Second, Twitter has introduced new filters. “Twitter Rolls Out Filter for Potentially Offensive DMs” reports:

Twitter is quickly acting on plans to filter potentially offensive direct messages. It’s rolling out the filter to all users on Android, iOS and the web. As during the test, there isn’t much mystery to how this works. If a message contains questionable language or is likely spam, it’ll be tucked away in an “additional messages” folder.

Third, “YouTube Moderation Bots Punish Videos Tagged as ‘Gay’ or ‘Lesbian,’ Study Finds” bluntly asserts:

A new investigation from a coalition of YouTube creators and researchers is accusing YouTube of relying on a system of “bigoted bots” to determine whether certain content should be demonetized, specifically LGBTQ videos.

DarkCyber finds it interesting that shaping or alleged shaping of search results is now garnering attention. Researchers looking for historical information may discover that “old” information is either unindexed or not online. Investigators and analysts looking for facts like Cisco’s acquisition of certain firms must manually review SEC documents. Individuals looking for information about CMS contractors conducting medical fraud may find that these data are very, very difficult to locate.

Why?

Reasons vary.

It is important to note that those who assert that “my team consists of expert online researchers” may be fooling themselves.

Stephen E Arnold, October 2, 2019

Dumbing Down Search and Making More Money?

September 27, 2019

Google makes changes that benefit Google. Forbes Magazine, the capitalist tool, however, does not understand this simple fact about the world’s largest online advertising outfit.

“Google Makes It More Difficult To Find Old Images” points out that the ad giant made it more difficult for 99 percent of Google Image search users to locate “old” images. Most of Google’s advanced search features don’t get much click love.

As a result, why make the feature available? The benefit of dumbing down Google Image Search is related to several tangential factors; my thoughts are:

  • Legal hassles related to making images findable
  • Cost reduction. If content is not searched, why spend money verifying links and storing pointers?
  • Ads. Clicking a Web page for an image can display a current ad. Clicking an old picture like the one below is unlikely to provide an ad payout for the GOOG.

image

There are some options:

  • Use Google search operators like those on this list
  • Include a date in the image search string; for example, IBM mainframe 1964
  • Use the Google advanced image search form which is at this link.
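
The second option above, putting a date in the search string, can be sketched as a small URL builder. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the `tbm=isch` parameter is assumed to select image results on Google's public search URL. Google can change its URL parameters at any time.

```python
from urllib.parse import urlencode

def dated_image_query(terms, year):
    """Build an image-search URL whose query string includes a year."""
    query = f"{terms} {year}"
    # Assumption: tbm=isch asks for image results; q carries the search string.
    return "https://www.google.com/search?" + urlencode({"tbm": "isch", "q": query})

if __name__ == "__main__":
    print(dated_image_query("IBM mainframe", 1964))
```

The year is simply one more search term here, so it nudges results toward pages that mention 1964 and does not act as a true date filter.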

What’s Forbes’ take?

I reached out to Google for comment on this story. I have yet to hear back and will update this article if I do.

Yep, the capitalist tool.

Stephen E Arnold, September 27, 2019

Google and Right to Be Forgotten: Selective Indexing Gets a Green Light

September 25, 2019

DarkCyber noted this BBC article: “Google Wins Landmark Right to Be Forgotten Case.” The main point seems to be that references under the “right to be forgotten” umbrella apply only in Europe. The BBC stated:

There has been a lot of interest in the case since, had the ruling gone the other way, it could have been viewed as an attempt by Europe to police a US tech giant beyond the EU’s borders.

Several observations may be warranted:

  • Google can indeed filter search results; thus, objective results are unlikely
  • The index pointers are blocked, which means that those in another country can view proscribed links and maybe – just maybe — a Google super user can view what’s in the Google indexes
  • The “algorithms” which are allegedly working automatically may not; therefore, human adjustments to modify search results are probably available to certain search engineers.

If these observations are more than hypotheticals, will the index tuning have an impact on other legal matters in which Google is involved? Query reshaping and search results filtering are a fact of Google life.

Stephen E Arnold, September 25, 2019

Alternatives To Google Products

September 15, 2019

Google remains a dominant feature in millions of lives, whether you use the search engine, email, free office suite, or any of the other Google physical or digital products. While some individuals have totally given themselves over to the Google cult, there remain stalwart dissenters, such as the SGT Report, which has published “The Complete List Of Alternatives To Google Products.”

The list for Google product alternatives was made because:

“With growing concerns over online privacy and securing personal data, more people than ever are considering alternatives to Google products. After all, Google’s business model essentially revolves around data collection and advertisements, both of which infringe on your privacy. More data means better (targeted) ads and more revenue. The company pulled in over $116 billion in ad revenue last year alone – and that number continues to grow. But the word is getting out. A growing number of people are seeking alternatives to Google products that respect their privacy and data.”

The main reason people use Google is as a search engine. There is a wide variety of alternatives, and the list notes that most of these alternate search engines source their results from Google or Bing. Apparently there is only one search engine with its own crawler: Mojeek from the UK.

The list continues with more alternatives to Gmail, Chrome, Google Drive, Google Calendar, Google Docs/Slides/Sheets, Google Photos, Google Translate, Google Maps, Google Analytics, and the Google Play Store. There are even alternatives to YouTube, but the majority of these are hit and miss with their content. The Google Play Store is rivaled by F-Droid, an installable catalog of free and open source software. The only problem is that the applications are only for Android. Curses to Apple!

These alternatives are great, but they do have their weaknesses. Google has its evils, and privacy issues are the worst among them. However, you have to admit it does make good products. Just stay away from the speakers and use Firefox.

Whitney Grace, September 15, 2019

A Plea for Bing: Use It

September 14, 2019

Microsoft wants more people to use Bing and Microsoft wants them to use it now! Microsoft is so desperate for more Bing users that it built its trademarked search engine into the new Windows 10 update. Read the story at Win Buzzer, “Microsoft Builds Bing Search into Windows 10 20H1 Lock Screen.”

The Bing implementation is touted as a new search feature embedded in the Windows lock screen. The feature was released with the new Windows 10 20H1 Preview Build 18932, but it remains hidden and can only be accessed with a tool. One such tool is Mach2. The integration of Bing into the lock screen is good design. The idea is to give users the option to conduct an Internet search without having to unlock their entire PC. It is for those, “Oh yeah, I need to look that up” moments. It is not stated where results will appear. If they are on the lock screen, it is a genius move, but if the results are only available by unlocking the PC it is stupid.

Since Microsoft placed Bing on the Start menu, it gets as much as 50% of its traffic through that direct link rather than the official Bing Web site. This is funny:

“At the moment, we just can’t see how the Bing feature on the lock screen would be useful. Of course, Microsoft may have some wider lock screen plans that we don’t know about yet. Whether this is Microsoft making a play to compete further with Google is unclear, but it probably won’t work. Bing is the default search tool on Windows PCs, but users continue to actively choose Google Search over it. Adding Bing to the lock screen will likely not change that. However, it will be interesting to see how Microsoft handles this new feature in the coming months.”

Apparently the author Luke Jones never has to figure out the name of that actor in that one movie or the name of that place where he ate lunch three weeks ago next to the good bakery. Ah, Luke Jones may want to consult a librarian.

Whitney Grace, MLS, September 14, 2019

Elastic Stack Goes Into Cyber Security

September 11, 2019

The open source search company Elastic has augmented its offerings with new security technology. ZDNet delves into Elastic’s new endeavor in the article, “Elastic Takes the First Steps Toward Building Out Its SIEM Solution.” Elastic Stack is Elastic’s open source analytics tool and it received a new update: Elastic SIEM. Elastic SIEM is a data model and UI for security information and event management (SIEM).

Elastic has a lot of competition, so the company decided that making its log, search, and analytics stack more utilitarian would expand its client base. The SIEM update is an appealing security solution:

“The SIEM features lay the foundations for a more fleshed-out solution going forward with the new Elastic Common Schema, an open source specification for field naming conventions and data types; think of the new common schema as a Rosetta Stone for the different types of logs, metrics, and other contextual data that is used for analyzing security events. Additionally, the 7.2 release adds a dedicated user interface for security events, featuring a timeline viewer to store evidence of an attack, pin and annotate relevant events, and provide query filtering capabilities.”
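
The “Rosetta Stone” idea in the quoted passage can be illustrated with a short sketch. This is not Elastic's implementation. The two source log formats and their field names are invented for illustration. The target names `source.ip`, `destination.ip`, and `@timestamp` follow Elastic Common Schema naming, and the flat dotted keys are a simplification of how such fields are stored.

```python
# Hypothetical source formats: the left-hand field names are invented for illustration.
FIELD_MAPS = {
    "firewall": {"src": "source.ip", "dst": "destination.ip", "ts": "@timestamp"},
    "proxy": {"client_addr": "source.ip", "server_addr": "destination.ip", "time": "@timestamp"},
}

def to_common_schema(record, source_type):
    """Rename a record's fields to shared names; unmapped fields pass through unchanged."""
    mapping = FIELD_MAPS[source_type]
    return {mapping.get(key, key): value for key, value in record.items()}

if __name__ == "__main__":
    fw = to_common_schema(
        {"src": "10.0.0.5", "dst": "10.0.0.9", "ts": "2019-09-11T08:00:00Z"}, "firewall")
    px = to_common_schema(
        {"client_addr": "10.0.0.5", "server_addr": "10.0.0.9", "time": "2019-09-11T08:00:01Z"}, "proxy")
    # Both records can now be queried on the same field name.
    print(fw["source.ip"] == px["source.ip"])
```

Once every log source uses the same field names, one query or one timeline view can cover all of them, which is the point of a common schema.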

While appealing, the Elastic SIEM offerings are still skeletal, but Elastic acquired Endgame, a company that designs endpoint security solutions. Elastic will probably include it in a future SIEM update.

Search is also more powerful in this release. Search used to be limited to the Elastic cloud, but it can now be used on on-premises systems. Elastic is also extending its services to build a scalable, search-based solution that provides insights for detecting potential threats.

Will other enterprise search vendors follow Elastic?

Whitney Grace, September 11, 2019

Yale Image Search: Innovation and Practicality

September 5, 2019

Yale University, according to Open Culture, has made available 170,000 photographs which document the Depression. Well, not just the Depression. The review conducted by DarkCyber revealed photos into the 1940s.

What sets this image collection apart is its interface. Unlike the near impossible presentations of other august institutions, Yale has hit upon:

  • A map based approach
  • A “search by photographer”
  • Useful basic photo information.

There’s even a functional, clear search component with old fashioned fields. (Google, why not check it out? Not all good ideas originate near Stanford.)

image

Kudos to Yale. DarkCyber hopes that other online image archives learn from what Yale has rolled out. A little “me too” from Internet Archive, the American Memory project, and assorted museums would be welcomed here in Harrod’s Creek. (One river shore photo looked a great deal like Tibby the Dog’s favorite playground.)

Stephen E Arnold, September 5, 2019
