Dumais on Search: Bell Labs Roots Are Thriving

October 23, 2019

We just love a genuine Search guru, and Dr. Susan Dumais is one of the best. The illustrious Dr. Dumais is now a Microsoft Technical Fellow and Deputy Lab Director of MSR AI. If you wanted to know the history of information retrieval, she would be the one to hear tell about it. Now you can, courtesy of the Microsoft Research Podcast. Both the 38-minute podcast itself and a transcript are posted at “HCI, IR and the Search for Better Search with Dr. Susan Dumais.” The good doctor describes what motivates her in her work:

“I think there are two commonalities and themes in my work. One is topical. So, as you said, I’m really interested in understanding problems from a very user-centric point-of-view. I care a lot about people, their motivations, the problems they have. I also care about solving those problems with new algorithms, new techniques and so on. So, a lot of my work involves this intersection of people and technology, thinking about how work practices co-evolve with new technological developments. And so thematically, that’s an area that I really like. I like this ability to go back and forth between understanding people, how they think, how they reason, how they learn, how they find information, and finding solutions that work for them. In the end, if something doesn’t work for people, it doesn’t work. In addition to topically, I approach problems in a way that is motivated, oftentimes, by things that I find frustrating. We may talk a little bit later about my work in latent semantic indexing, but that grew out of a frustration with trying to learn the Unix operating system. Work I’ve done on email spam, grew out of a frustration in mitigating the vast amount of junk that I was getting. So, I tend to be motivated by problems that I have now, or that I anticipate that our customers, and people will have in general, given the emerging technology trends.”

She and host Gretchen Huizinga go on to discuss the evolution of search technology over the last twenty years, beginning with the first Web search engines, which handled but a couple thousand queries per day. They also cover Dumais’ work over the years to build bridges, provide context in search, and bring changing content into the equation. We hope you will check out the intriguing and informative interview for yourself, dear reader.

Cynthia Murrell, October 23, 2019

Algolia: Cash Funding Hits $184 Million

October 15, 2019

Exalead was sucked into Dassault Systèmes. Then former Exaleaders abandoned ship. Algolia benefited from some Exalead experience. But unlike Exalead, Algolia embraced venture funding with cash provided by Accel, Point Nine Capital, Storm Ventures, and Y Combinator, among others.

DarkCyber noted “Algolia Finds $110M from Accel and Salesforce for Its Search-As-a-Service, Used by Slack, Twitch and 8K Others.” The write up reports that the company has “closed a Series C of $110 million, money that it plans to invest in R&D around its search technology, including doubling down on voice, and further global expansion in Europe, North America and Asia Pacific.”

The write up adds:

Having Salesforce as a strategic backer in this round is notable: the CRM giant currently does not have a native search product in its wide range of cloud-based services for enterprises, instead opting for endorsed integrations with third parties, such as Algolia competitor Coveo. The plan will be to further integrate with Salesforce although no products to speak of as of yet.

The challenge will be to go where few search and retrieval systems have gone before.

Some people have forgotten the disappointments and questionable financial tricks that promising search vendors delivered to stakeholders and customers.

With venture firms looking for winners, returns of 20 percent will not deliver what the sources of the funds expect. The good old days of a 17X return may have cooled, but generating an 8X or 12X return may be a challenge.

Why?

In the course of researching and writing the enterprise search report from 2003 to 2006, and in our subsequent work, several “themes” or “learnings” surfaced:

  1. Good enough search is now the order of the day; that is, an organization-wide search system does not meet the needs of many operating units. Examples range from the legal department to research and development to engineering and the drawings plus data embedded in product manufacturing systems to information under security umbrellas with real time data and video content objects. Therefore, the “one solution” approach dissipates like morning fog.
  2. Utility search from outfits like Amazon is “good enough.” This means that a developer using Amazon blockchain services and workflow tools may use the search functions available from Amazon. Maybe Amazon will buy Algolia, but for the foreseeable future, search is a tag-along function, not a driver of the big money apps which Amazon is aiming toward.
  3. Search, regardless of vendor, must spend significant sums to enrich the functions of the system. Natural language processing, predictive analytics, entity extraction, and other desired functions are moving targets. Adding and tuning these capabilities becomes expensive. And if the experiences of Autonomy and Fast Search & Transfer are representative, the costs become difficult to control.

DarkCyber hopes that Algolia can adapt to these research factoids. If not, search and retrieval may be rushing toward a disconnect between revenues, sustainable profits, and investor expectations.

The wheel of fortune is spinning. Where will it stop? On a winner or a loser? This is a difficult question to answer, and one which Attivio, BA-Insight, Coveo, Elastic, IBM Watson, Lucidworks, Microsoft, Sinequa, Voyager Search, and others have been trying to answer with millions of dollars, thousands of engineering hours, and massive investments in marketing. I am not including the search vendors positioned as policeware and intelware; for example, BAE NetReveal, Diffeo, LookingGlass, Palantir Technologies, and Shadowdragon, among others.

Worth monitoring the trajectory of Algolia.

Stephen E Arnold, October 15, 2019

Amazon: Elasticsearch Bounced and Squished

October 14, 2019

DarkCyber noted “AWS Elasticsearch: A Fundamentally-Flawed Offering.” The write up criticizes Amazon’s implementation of Elasticsearch. Amazon hired some folks from Lucidworks a few years ago. But under the covers, Lucene thrums along within Amazon and a large number of other search-and-retrieval companies, including those which present themselves as policeware. There are many reasons: [a] good enough, [b] no one company fixes the bugs, [c] good enough, [d] comparatively cheap, [e] good enough. Oh, one other point: Not under the control of one company like those good, old fashioned solutions like STAIRS III, Fulcrum (remember that?), or Delphes (the francophone folks).

This particular write up is unlikely to earn a gold star from Amazon’s internal team. The Spun.io essay states:

I’m currently working on a large logging project that was initially implemented using AWS Elasticsearch. Having worked with large-scale mainline Elasticsearch clusters for several years, I’m absolutely stunned at how poor Amazon’s implementation is and I can’t fathom why they’re unable to fix or at least improve it.

I think the tip off is the phrase “how poor Amazon’s implementation is…”

The section Amazon Elasticsearch Operation provides some color to make vivid the author’s viewpoint; for example:

On Amazon, if a single node in your Elasticsearch cluster runs out of space, the entire cluster stops ingesting data, full stop. Amazon’s solution to this is to have users go through a nightmare process of periodically changing the shard counts in their index templates and then reindexing their existing data into new indices, deleting the previous indices, and then reindexing the data again to the previous index name if necessary. This should be wholly unnecessary, is computationally expensive, and requires that a raw copy of the ingested data be stored along with the parsed record because the raw copy will need to be parsed again to be reindexed. Of course, this also doubles the storage required for “normal” operation on AWS. [Emphasis in the original essay.]
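The sequence the essay complains about can be sketched roughly as follows. This is a minimal illustration, not Amazon's documented procedure. It assumes the standard open source Elasticsearch REST endpoints (the legacy `_template` API, `_reindex`, and index deletion) apply to the hosted service. The function name, the temporary index name, and the final cleanup step are hypothetical. The code only builds the list of requests; it sends nothing.

```python
def reindex_plan(index, template, new_shards):
    """Return the REST calls (method, path, body) for the workaround.

    Nothing is sent over the network; the plan is plain data.
    """
    tmp = index + "-reindexed"  # hypothetical temporary index name
    return [
        # 1. Change the shard count in the index template so that
        #    newly created indices matching the pattern pick it up.
        ("PUT", "/_template/" + template,
         {"index_patterns": [index + "*"],
          "settings": {"number_of_shards": new_shards}}),
        # 2. Reindex the existing data into a new index.
        ("POST", "/_reindex",
         {"source": {"index": index}, "dest": {"index": tmp}}),
        # 3. Delete the previous index.
        ("DELETE", "/" + index, None),
        # 4. Reindex again to the previous index name, if necessary.
        ("POST", "/_reindex",
         {"source": {"index": tmp}, "dest": {"index": index}}),
        # 5. Remove the temporary copy (an assumed cleanup step).
        ("DELETE", "/" + tmp, None),
    ]
```

Calling `reindex_plan("logs", "logs-template", 10)` returns five steps, two of which copy the full data set. That double copy is the computational expense the author objects to.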

The wrap up for the essay is clear from this passage:

I cannot fathom how Amazon decided to ship something so broken, and how they haven’t been able to improve the situation after over two years.

DarkCyber’s team formulated several observations. Let’s look at these in the form of questions and trust that some young sprites will answer them:

  1. Will Amazon make its version of Elasticsearch proprietary?
  2. Are these changes designed to “pull” developers deeper into the AWS platform, making departure more difficult or impossible for some implementations?
  3. Are the components the author of the essay finds objectionable designed to generate more revenue for Amazon?

Stephen E Arnold, October 14, 2019

Real Life Q and A for Information Access Allegedly Arrives

October 14, 2019

DarkCyber noted “Promethium Tool Taps Natural Language Processing for Analytics.” The write up, which may be marketing oriented, asserts:

software, called Data Navigation System, was designed to enable non-technical users to make complex SQL requests using plain human language and ease the delivery of data.

The company developing the system, Promethium, was founded in 2018 and may have delivered what users have long wanted: Ask the computer a question and get a usable, actionable answer. If the write up is accurate, Promethium has achieved with $2.5 million in funding a function that many firms have pursued.

The article reports:

After users ask a question, Promethium locates the data, demonstrates how it should be assembled, automatically generates the SQL statement to get the correct data and executes the query. The queries run across all databases, data lakes and warehouses to draw actionable knowledge from multiple data sources. Simultaneously, Promethium ensures that data is complete while identifying duplications and providing lineage to confirm insights. Data Navigation System is offered as SaaS in the public cloud, in the customer’s virtual private cloud or as an on-premises option.
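To make the "question in, SQL out, answer back" flow concrete, here is a toy sketch. It is not Promethium's method, which the write up does not describe in detail. It assumes two hard-coded question patterns and a hypothetical `schema` dictionary of known tables and columns, and it runs the generated SQL against an in-memory SQLite database.

```python
import re
import sqlite3


def question_to_sql(question, schema):
    """Translate a plain-language question into SQL using two toy patterns.

    `schema` maps table names to sets of column names; only names found
    there are ever placed into the generated statement.
    """
    q = question.lower()
    m = re.search(r"how many (\w+)", q)
    if m and m.group(1) in schema:
        return "SELECT COUNT(*) FROM {}".format(m.group(1))
    m = re.search(r"total (\w+) (?:in|for|of) (\w+)", q)
    if m and m.group(2) in schema and m.group(1) in schema[m.group(2)]:
        return "SELECT SUM({}) FROM {}".format(m.group(1), m.group(2))
    raise ValueError("question not understood: " + question)


def ask(conn, question, schema):
    """Generate the SQL, execute it, and return (sql, answer)."""
    sql = question_to_sql(question, schema)
    return sql, conn.execute(sql).fetchone()[0]
```

With a table `orders(id, amount)` and `schema = {"orders": {"id", "amount"}}`, the question "How many orders do we have?" becomes `SELECT COUNT(*) FROM orders`. A real system must also locate the data across sources and check lineage, which this sketch ignores.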

More information is available at the firm’s Web site.

Stephen E Arnold, October 14, 2019

A List of Enterprise Search Vendors

October 7, 2019

DarkCyber does not follow the enterprise search sector. In fact, two of the flagships from the 2000s found themselves caught in embarrassing financial missteps. Why? It certainly suggests that making big bucks from a search and retrieval service is difficult.

We came across a Web site called Trust Radius. This site has a section devoted to enterprise search. What we found interesting is that the site lists what seem to be the key players in the sector today. With most LE and intel policeware platforms relying on open source search like Lucene, DarkCyber was quite surprised with the line up of systems and the information provided by Trust Radius.

Here’s the list of vendors in alphabetical order, a method of presenting information which is not in favor with some whiz kids:

3RDi Search

Aderant Handshake (knowledge management for law firms)

Agree Ya Site Administrator

Algolia

Amazon Cloud Search (Lucene)

Apache Lucene

Apache Solr

Expert Systems Cogito Discover

Constructor.io Search

Coveo

Customer Matrix (customer support)

Dassault Systems Exalead (Exalead)

Dieselpoint

Elasticsearch (Elastic)

Fabasoft Mindbreeze

Fabasoft Mindbreeze Inspire

Google Search Appliance (discontinued)

IBM Watson (once Omnifind)

IBM Watson Discovery for Salesforce

IBM Watson Explorer

IManage Insight (Interwoven, Autonomy, HP, now a standalone)

Inbenta Enterprise Search

Lookeen Desktop Search (listed as Enterprise Search however)

Lucidworks Fusion ($100 million in funding)

Maana

Microfocus IDOL (Autonomy to HP to HPE to Microfocus)

Microsoft Azure (Fast Search & Transfer)

Microsoft Bing Search

Perceptive Search (ISYS Search Software to Lexmark to Hyland)

Rocket NXT Enterprise Search (Aerotext)

Rockset

Searchify

Search Spring (product search)

Search Tap

Search Unify

Sinequa

SLI Systems (e commerce)

Swiftype

Synacor Video Search & Discovery

TeraText Searchable Archive for Files and Email (SAIC)

Zakta

What DarkCyber finds interesting is the omission of outfits like Oracle Endeca, Antidot, and Blossom. Also, in this listing of 41 “search systems” there are multiple enterprise search products from single companies like IBM and Microsoft. There are also e-commerce search systems and systems which do not handle enterprise content because the service supports desktops. There are two “no longer around” products and a weird blend of search utilities with text processing features. In short, this list is illustrative of the chaos, confusion, and craziness that lead some information technology professionals to buy a solution that just delivers key word search and some optional features.

DarkCyber believes that Amazon’s approach is likely to gain traction. That’s bad news for most of the companies on this list, particularly search vendors who manage to confuse individuals or the smart software used to create this list at Trust Radius.

It seems that the message from this list is that search is a bit of a dog’s breakfast—just as it has been for decades.

Stephen E Arnold, October 7, 2019

Today in Subjective Search: What Are You Not Allowed to Know

October 2, 2019

When you review information, is that information comprehensive, complete, and objectively displayed?

No.

No.

No.

Let’s look at three examples.

First, Boris Johnson allegedly uses certain words to skew search results. This is the allegation of Remoaning Myrtle. You can find the assertion at this link. Does this mean that wordsmithing now fiddles search results on Bing, Google, and Yandex? Interesting question about an interesting person’s ability to use language as a weapon.

Second, Twitter has introduced new filters. “Twitter Rolls Out Filter for Potentially Offensive DMs” reports:

Twitter is quickly acting on plans to filter potentially offensive direct messages. It’s rolling out the filter to all users on Android, iOS and the web. As during the test, there isn’t much mystery to how this works. If a message contains questionable language or is likely spam, it’ll be tucked away in an “additional messages” folder.

Third, “YouTube Moderation Bots Punish Videos Tagged as ‘Gay’ or ‘Lesbian,’ Study Finds” bluntly asserts:

A new investigation from a coalition of YouTube creators and researchers is accusing YouTube of relying on a system of “bigoted bots” to determine whether certain content should be demonetized, specifically LGBTQ videos.

DarkCyber finds it interesting that shaping or alleged shaping of search results is now garnering attention. Researchers looking for historical information may discover that “old” information is either unindexed or not online. Investigators and analysts looking for facts like Cisco’s acquisition of certain firms must manually review SEC documents. Individuals looking for information about CMS contractors conducting medical fraud may find that these data are very, very difficult to locate.

Why?

Reasons vary.

It is important to recognize that those who assert that “my team consists of expert online researchers” may be fooling themselves.

Stephen E Arnold, October 2, 2019

Dumbing Down Search and Making More Money?

September 27, 2019

Google makes changes that benefit Google. Forbes Magazine, the capitalist tool, however, does not understand this simple fact about the world’s largest online advertising outfit.

“Google Makes It More Difficult To Find Old Images” points out that the ad giant made it more difficult for 99 percent of Google Image search users to locate “old” images. Most of Google’s advanced search features don’t get much click love.

As a result, why make the feature available? The benefit of dumbing down Google Image Search is related to several tangential factors; my thoughts are:

  • Legal hassles related to making images findable
  • Cost reduction. If content is not searched, why spend money verifying links and storing pointers?
  • Ads. Clicking a Web page for an image can display a current ad. Clicking an old picture like the one below is unlikely to provide an ad payout for the GOOG.

image

There are some options:

  • Use Google search operators like those on this list
  • Include a date in the image search string; for example, IBM mainframe 1964
  • Use the Google advanced image search form which is at this link.
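The second option, adding a date to the search string, can be illustrated with a small sketch. The helper name is hypothetical, and the `tbm=isch` parameter reflects how Google image search URLs have commonly looked. Treat it as an assumption; Google may change it at any time.

```python
from urllib.parse import urlencode


def image_search_url(terms, year=None):
    """Build an image search URL, appending a year to the query if given."""
    query = terms if year is None else "{} {}".format(terms, year)
    # "tbm=isch" is an assumed, commonly seen image-search parameter.
    return "https://www.google.com/search?" + urlencode({"q": query, "tbm": "isch"})
```

For example, `image_search_url("IBM mainframe", 1964)` puts the query string `IBM mainframe 1964` into the URL, matching the example in the list above.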

What’s Forbes’ take?

I reached out to Google for comment on this story. I have yet to hear back and will update this article if I do.

Yep, the capitalist tool.

Stephen E Arnold, September 27, 2019

Google and Right to Be Forgotten: Selective Indexing Gets a Green Light

September 25, 2019

DarkCyber noted this BBC article: “Google Wins Landmark Right to Be Forgotten Case.” The main point seems to be that references under the “right to be forgotten” umbrella apply only in Europe. The BBC stated:

There has been a lot of interest in the case since, had the ruling gone the other way, it could have been viewed as an attempt by Europe to police a US tech giant beyond the EU’s borders.

Several observations may be warranted:

  • Google can indeed filter search results; thus, objective results are unlikely
  • The index pointers are blocked, which means that those in another country can view proscribed links and maybe, just maybe, a Google super user can view what’s in the Google indexes
  • The “algorithms” which are allegedly working automatically may not; therefore, human adjustments to modify search results are probably available to certain search engineers.

If these observations are more than hypotheticals, will the index tuning have an impact on other legal matters in which Google is involved? Query reshaping and search results filtering are a fact of Google life.

Stephen E Arnold, September 25, 2019

Alternatives To Google Products

September 15, 2019

Google remains a dominant feature in millions of lives, whether you use the search engine, email, free office suite, or any of the other Google physical or digital products. While some individuals have totally given themselves over to the Google cult, stalwart dissenters such as the SGT Report have not: “The Complete List Of Alternatives To Google Products.”

The list for Google product alternatives was made because:

“With growing concerns over online privacy and securing personal data, more people than ever are considering alternatives to Google products. After all, Google’s business model essentially revolves around data collection and advertisements, both of which infringe on your privacy. More data means better (targeted) ads and more revenue. The company pulled in over $116 billion in ad revenue last year alone – and that number continues to grow. But the word is getting out. A growing number of people are seeking alternatives to Google products that respect their privacy and data.”

The main reason people use Google is as a search engine. There are a wide variety of alternatives, and the list notes that these alternate search engines pull their results from Google or Bing. Apparently there is only one search engine with its own crawler: Mojeek from the UK.

The list continues with more alternatives to Gmail, Chrome, Google Drive, Google Calendar, Google Docs/Slides/Sheets, Google Photos, Google Translate, Google Maps, Google Analytics, and the Google Play Store. There are even alternatives to YouTube, but the majority of these are hit and miss with their content. The Google Play Store is rivaled by F-Droid, an installable catalog of free and open source software. The only problem is the applications are only for Android. Curses to Apple!

These alternatives are great, but they do have their weaknesses. Google has its evils, and privacy issues are the worst among them. However, you have to admit it does make good products. Just stay away from the speakers and use Firefox.

Whitney Grace, September 15, 2019

A Plea for Bing: Use It

September 14, 2019

Microsoft wants more people to use Bing and Microsoft wants them to use it now! Microsoft is so desperate for more Bing users that it built its trademarked search engine into the new Windows 10 update. Read the story at Win Buzzer, “Microsoft Builds Bing Search into Windows 10 20H1 Lock Screen.”

The Bing implementation is touted as a new search feature embedded in the Windows lock screen. The feature was released with the new Windows 10 20H1 Preview Build 18932, but it remains hidden and can only be accessed with a tool. One tool is the Mach2. The integration of Bing into the lock screen is good design. The idea is giving users the option to conduct an Internet search without having to unlock their entire PC. It is for those, “Oh yeah, I need to look that up” moments. It is not stated where results will appear. If they are on the lock screen, it is a genius move, but if the results are only available by unlocking the PC it is stupid.

Since Microsoft placed Bing on the Start menu, it gets as much as 50% of its traffic through that direct link rather than through the official Bing Web site. This is funny:

“At the moment, we just can’t see how the Bing feature on the lock screen would be useful. Of course, Microsoft may have some wider lock screen plans that we don’t know about yet. Whether this is Microsoft making a play to compete further with Google is unclear, but it probably won’t work. Bing is the default search tool on Windows PCs, but users continue to actively choose Google Search over it. Adding Bing to the lock screen will likely not change that. However, it will be interesting to see how Microsoft handles this new feature in the coming months.”

Apparently the author Luke Jones never has to figure out the name of that actor in that one movie or the name of that place where he ate lunch three weeks ago next to the good bakery. Ah, Luke Jones may want to consult a librarian.

Whitney Grace, MLS, September 14, 2019
