Comments about Web Search: Prompted by a Hacker News Thread

November 13, 2020

I spotted a Web search related thread on Hacker News. You can locate the comments at this link. Several observations:

Metasearch. Confusion seems to exist between a dedicated Web search system like Bing, Google, and Yandex and metasearch systems like DuckDuckGo and Startpage. Dedicated Web search systems require considerable effort, but there is less appreciation for the depth of the crawl, the index updating cycle, and similar factors.
Competitors to Google. The comments present a list of search systems which are relatively well known. Omitted are some other services; for example, iSeek, Swisscows, and 50kft.
Bias. The comments do not highlight some of the biases of Web search systems; for example, when are pages reindexed, what pages are on a slow or never update cycle, blacklisted, or processed against a stop word list.
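
To make the metasearch point concrete: a metasearch system does not crawl or index. It forwards the query to other engines and merges what comes back. The sketch below merges ranked lists with reciprocal rank fusion, a common rank-merging technique. It is an illustration only; the engine lists and URLs are made up, and nothing here describes how DuckDuckGo or Startpage actually combine results.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked result lists into one ordering.

    Each URL earns 1 / (k + rank) from every list in which it appears;
    k=60 is a conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))

# Hypothetical result lists from two upstream engines.
engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "d.example", "a.example"]

merged = reciprocal_rank_fusion([engine_a, engine_b])
print(merged)
```

The merged list can only be as deep and as fresh as the upstream indexes, which is why crawl depth and update cycle matter.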

So what?

Many profess to be experts at finding information online. The comments suggest that perception is different from reality.
Locating content on publicly accessible Web sites is more difficult than at any other time in my professional career in the online information sector.
Locating relevant information is increasingly time consuming because predictive, personalized, and wisdom of crowd results don’t work; for example, run this query on any of the search engines:

Voyager search

Did your results point to the Voyager Labs’s system, the UK HR company’s search engine, a venture capital firm, or a Lucene repackager in Orange County? What about Voyager patents? What about Voyager customers?

How can one disambiguate when the index scope is unknown, entity extraction is almost nonexistent, and deduplication is almost laughable? Real time? Ho ho ho.
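
For readers who want to see what even basic deduplication involves, here is a minimal sketch that collapses result URLs differing only in scheme, "www." prefix, trailing slash, or tracking parameters. The parameter list and the sample URLs are assumptions for illustration; a production system would also compare page content, which this sketch does not attempt.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical set of tracking parameters to ignore; real lists are longer.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}

def normalize_url(url):
    """Return a comparison key that ignores scheme, 'www.', trailing slash,
    and tracking parameters. Ignoring the scheme is a simplifying assumption."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    kept = sorted((k, v) for k, v in parse_qsl(parts.query)
                  if k not in TRACKING_PARAMS)
    return (host, path, urlencode(kept))

def deduplicate(results):
    """Keep the first result for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique

# Made-up results for illustration.
results = [
    {"url": "https://www.Example.com/voyager/?utm_source=x"},
    {"url": "http://example.com/voyager"},
    {"url": "https://example.com/voyager?page=2"},
]
unique = deduplicate(results)
print(len(unique))
```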

One can do this work manually. Who wants to volunteer for that? The most innovative specialized search vendors try to automate the process. Some of these systems are helpful; most are not.

Is search getting better? Rerun that Voyager search. See for yourself.

Without field codes, Boolean, and a mechanism to search across publicly accessible content domains, Web search reveals its shortcomings to those who care to look.
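
As a reminder of what field codes and Boolean operators buy the searcher, here is a minimal sketch of a fielded inverted index queried with set operations. The three documents are invented for illustration, and the tokenizer is a bare whitespace split; this is not how any particular search system is built.

```python
from collections import defaultdict

def build_index(docs):
    """Map (field, token) -> set of document ids."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for token in text.lower().split():
                index[(field, token)].add(doc_id)
    return index

def term(index, field, word):
    """Documents whose given field contains the word (copy of the posting set)."""
    return set(index.get((field, word.lower()), set()))

# Invented documents echoing the "Voyager" ambiguity.
DOCS = {
    1: {"title": "Voyager Search wins contract", "body": "spatial search platform"},
    2: {"title": "Voyager Labs analytics", "body": "social media analysis"},
    3: {"title": "Venture capital news", "body": "voyager fund invests in search startup"},
}
index = build_index(DOCS)

# title:voyager AND body:search NOT title:labs
hits = (term(index, "title", "voyager") & term(index, "body", "search")) \
       - term(index, "title", "labs")

# title:voyager OR body:voyager
broad = term(index, "title", "voyager") | term(index, "body", "voyager")
print(hits, broad)
```

The fielded, Boolean form isolates one document; the unfielded equivalent returns all three and leaves the sorting to the user.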

Not many look, including professionals at some of the better known Web search outfits.

Stephen E Arnold, November 13, 2020

Written by Stephen E. Arnold · Filed Under News, Reference tool, Search | Comments Off on Comments about Web Search: Prompted by a Hacker News Thread

Voyager Search Tapped for USDA Search and Discovery Project

November 4, 2020

Low-profile enterprise search company Voyager Search just made an important deal with a high-profile government agency. AIThority announces, “New Light Technologies and Voyager Search Team Win New Contracts with the U.S. Department of Agriculture to Implement Data Search and Discovery Solutions.” Voyager’s partner in the project, New Light Technologies (NLT), is a consulting firm working in the areas of cloud tech, cybersecurity, software development, data analytics, geospatial tech, and scientific R&D. The write-up reports:

“Access to accurate information is crucial to the department’s mission to support sustainable agriculture production and protection of natural resources. Both NLT and Voyager Search bring many years of experience developing award-winning federal data integration and dissemination platforms and will build federated data search solutions to index and link disparate cloud-based and on-prem data sources, including large repositories of imagery and geospatial data files that are used for a variety of analytical reporting and data dissemination systems, such as the Global Agricultural Information Network, Global Agricultural & Disaster Assessment System, Crop Explorer, and the Geospatial Data Gateway. Leveraging NLT and Voyager Search’s Professional Services Department and Vose technology which provides robust spatial search capabilities, the team’s solution will enable users to search for data, content, and documents by who, what, when, and where. Together, the team is providing the technology and services to advance a modern data architecture for the department that will support improved information flow, security, and analysis as well as power the Artificial Intelligence (AI) and Machine Learning (ML) of the future.”

“Voyager” is a popular name for a business, so do not confuse Voyager Search with other enterprises like digital innovation firm Voyager, manufacturer Voyager Industries, or even the Voyager Company that pioneered CD-ROM production back in the day. Vose is the name of Voyager Search’s platform that will be used for the USDA project, but the company also offers Server, essentially Vose for larger implementations, and ODN (Open Data Network), a searchable global-content catalog. Both products build on Vose’s “smart spatial search” technology. Based in Redlands, California, Voyager Search was founded in 2008.

Cynthia Murrell, November 4, 2020

Written by Stephen E. Arnold · Filed Under Enterprise search, Government, News, Search | Comments Off on Voyager Search Tapped for USDA Search and Discovery Project

Google Reveals Its Aspiration: Everything

October 30, 2020

An online publication called Gadgets360 published “Google Renames the Chromebook Search Button to the Everything Button.” The lowly capitalization lock key has been identified as expendable. By repurposing a way to create CAPS, Google has performed two vital services:

Easier access to search
A way to reveal its aspiration: To be “everything” to a human user.

The article states:

Google is renaming a button on Chrome OS PC keyboards to ‘Everything Button.’ … Google said that the new name for the Launcher button was chosen to reflect user feedback; the search giant hoped that the inclusion of the new name for the button will help highlight that Chromebook laptops have a dedicated button on their keyboards. Clicking on the Everything Button will open up a search bar through which you can search for things on Google, as well as for apps and files on the Chrome OS machine.

Interesting. What about confusion with the freeware application called Everything? David Carpenter at Voidtools.com has offered his useful information retrieval software for several years. Google is indeed innovative and proving that it is “everything” a me-too outfit would want to be.

Stephen E Arnold, October 30, 2020

Written by Stephen E. Arnold · Filed Under Google, Innovation, News, Search | Comments Off on Google Reveals Its Aspiration: Everything

Newspaper Search: Another Findability Challenge

October 13, 2020

Here is an interesting project any American-history enthusiast could get lost in for hours: Newspaper Navigator. I watched the home page’s 15-minute video, which gives both an explanation of the search tool’s development and a demo. Then I played around with the tool for a bit. Here’s what I learned.

Created by Ben Lee, the Library of Congress’ 2020 Innovator in Residence, The Newspaper Navigator is built on the Library of Congress’s Chronicling America, a search portal that allows one to perform keyword searches on 16 million pages of historical US newspapers using optical character recognition. That is a great resource, but how to go about an image search for such a collection? That’s where Newspaper Navigator comes in.

Lee used thousands of annotations of the collection’s visual content, created by volunteers in the Library’s Beyond Words crowdsourcing initiative of 2017, to train a machine learning model to recognize visual content. (He released the dataset, which can be found here. He also created hundreds of prepackaged downloadable datasets organized by year and type, like maps, photos, cartoons, etcetera.) The Newspaper Navigator search interface allows users to plumb 1.5 million high-confidence, public-domain photos from newspapers published between 1900 and 1963. The app allows for standard search, but the juicy bit is the ability to search by visual similarity using machine learning.

Lee walks us through two demo searches: one that begins with the keyword “baseball” and one with “sailboat.” One can filter by location and time frame, then hover over results to get more info on the image itself and the paper in which it appeared. Select images to build a Collection, then tap into the AI prowess via the “Train my AI Navigators” button. The AI uses the selected images to generate a page of similar images, each with a clickable + or – button. Clicking these tells the tool which images are more and which are less like what is desired. Click “Train my AI Navigators” again to generate a more refined page, and repeat until only (or almost only) the desired type of image appears. When that happens, clicking the Save button creates a URL to take one right back to those results later.

Lee notes that machine learning is not perfect, and some searches lend themselves to refinement better than others. He suggests starting again and retraining if results start refining themselves in the wrong direction.
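
The plus/minus loop described above resembles classic relevance feedback. The sketch below uses the Rocchio update on toy two-dimensional vectors to show how marking images as wanted or unwanted shifts the ranking. This is an assumption-laden illustration: the vectors, image names, and weights are invented, and the write-up does not say which method Newspaper Navigator actually uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def refine(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio update: move toward liked items and away from disliked ones."""
    new = [alpha * q for q in query]
    if positives:
        new = [n + beta * p for n, p in zip(new, centroid(positives))]
    if negatives:
        new = [n - gamma * g for n, g in zip(new, centroid(negatives))]
    return new

def rank(query, items):
    """Item names ordered from most to least similar to the query vector."""
    return sorted(items, key=lambda name: cosine(query, items[name]), reverse=True)

# Invented image embeddings (hypothetical names and values).
items = {
    "sailboat_a": [0.9, 0.1],
    "sailboat_b": [0.8, 0.3],
    "rowboat": [0.6, 0.6],
    "steamship": [0.3, 0.9],
}
query = [0.5, 0.5]
before = rank(query, items)

# The user clicks + on one sailboat and - on the steamship.
query = refine(query, [items["sailboat_a"]], [items["steamship"]])
after = rank(query, items)
print(before)
print(after)
```

If the refined ranking drifts the wrong way, discarding the feedback and starting from a fresh query mirrors Lee's advice to retrain.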

The video acknowledges the potential marginalization issues in any machine learning project. Click on the Data Archaeology tab to read about Lee’s investigation of the Navigator dataset and app from the perspective of bias.

I suggest curious readers play around with the search app for themselves. Lee closes by inviting users to share their experiences through LC-Labs@loc.gov or on Twitter @LC_Labs, #NewspaperNavigator.

Cynthia Murrell, October 13, 2020

Written by Stephen E. Arnold · Filed Under News, Search | Comments Off on Newspaper Search: Another Findability Challenge

Does Search Breed Fraud?

October 11, 2020

The question “Does search breed fraud?” is an interesting one. As far as I know, none of the big time MBA case studies address the topic. If any academic discipline knows about fraud, I believe it is those very same big time MBA programs.

“South Korean Search Giant Fined US $23 Million for Manipulating Results” reveals that Naver has channeled outfits with a penchant for results fiddling. The write up states:

The Korea Fair Trade Commission, the country’s antitrust regulator, ruled Naver altered algorithms on multiple occasions between 2012 and 2015 to raise its own items’ rankings above those of competitors.

Naver responded, according to the write up, with this statement:

“The core value of search service is presenting an outcome that matches the intentions of users,” it said in a statement, adding: “Naver has been chosen by many users thanks to our focus on this essential task.”

The pressure to generate revenue is significant. Engineers, who may be managed loosely or steered by the precepts of high school science club thought processes, can make tiny changes with significant impact. As a result, the manipulation can arise from a desire to get promoted, be cool, or land a bonus.

The implications can be profound. Google may be less evil because fiddling is an emergent behavior.

Stephen E Arnold, October 11, 2020

Written by Stephen E. Arnold · Filed Under cybercrime, News, Search | Comments Off on Does Search Breed Fraud?

An Oath from the Past: Yahoo Web Scale Semantic Search

October 9, 2020

I spotted a link to “Yahoo: Web Scale Semantic Search.” You remember Yahoo, don’t you? This is the outfit with the data breaches, the clueless business model, and the sale to the Baby Bell Verizon. The executives too are memorable: Marissa, Alex, Terry, and the Peanut Butter memo man.

The link displayed a presentation by Edgar Meij, a laborer in Yahoo Labs. The topic was an X ray view from Mt. Olympus intended to reveal Web scale semantic search.

The slide deck requires 62 clicks to traverse. There are many riches in the presentation. I want to highlight three of these, and invite you to make your own determination of these insights.

First, there is a “text” accompanying the deck. It contains a riot of jargon and buzzwords. In fact, I have saved the text, despite a portion being truncated, as a glossary of Web search jive talk; for example “s a sequence of terms s 2 s drawn from the set S, s ? Multinomial(?s) e a set of entities e 2 e.” (I knew you would experience the same thrill I did when I read this line.) True to Slideshare’s attention to detail, the text for slides 32 to 62 has been removed. Great loss indeed.

Second, Yahoo cares about knowledge. Consider this diagram:

The idea is that one acquires knowledge (I assume this means scraping and indexing Web site content), knowledge integration (creating a big index), and knowledge consumption (maybe finding something when a user or system sends a query to the search subsystem). The key point is “knowledge” is important. How about that? Yahoo search was focusing on knowledge? Is that why Yahoo floundered in search for many, many years before bowing to failure?

Third, Yahoo’s approach to semantic search requires humans. Here’s proof:

When Yahoo announced Vin Diesel was dead, he was alive. So much for smart software.

Why am I mentioning this blast from the past?

Knowledge was talked about in my interview/discussion with Dr. Stavros Macrakis. We tackled the difference between Web search and enterprise search. This Yahoo deck illustrates that talk about knowledge is one thing. Delivering useful results to a user is quite another.

Jargon in search and retrieval has made more progress than search technology itself. That’s why the Yahoo deck could have been crafted yesterday by one of the search vendors still chasing a huge market in the era of Lucene/Solr and “good enough” information access.

Stephen E Arnold, October 9, 2020

Written by Stephen E. Arnold · Filed Under News, Search | Comments Off on An Oath from the Past: Yahoo Web Scale Semantic Search

Comparison of Elasticsearch, Solr, and Sphinx

October 8, 2020

Search and retrieval underpins most policeware and intelware systems. Open source search software has made life more challenging for vendors of proprietary enterprise search solutions. There are versions of an “in depth” enterprise search analysis like this available for thousands of dollars from marketers like https://www.adroitmarketresearch.com sporting this title:

Enterprise Search Market Demand Analysis and Projected huge Growth by 2025| IBM Corp, Coveo Corp., Polyspot & Sinequa Inc., Expert System Inc., HP Autonomy, Lucidworks, Esker Software Corp., Dassault Systemes Inc., Perceptive Software Inc., and Marklogic Inc.

Notice that none of the search vendors in “Elasticsearch vs. Solr vs. Sphinx: Best Open Source Search Platform Comparison” appears in the Adroit Market Research report. That’s important for one reason: Open source search has driven vendors of proprietary systems into a corner. What’s even more intriguing is that some vendors of enterprise search like Attivio and IBM Corp. use open source search technology but take pains to avoid revealing the plumbing under the house trailer.

The comparison is, for now, available without charge online, courtesy of Greenice. This firm, based in Ukraine, is what I would describe as a DevOps consulting and services company. It’s a mash up of advisory, coding, and technical deliverables.

The comparison contains some useful information; for example:

Inclusion of examples of the search systems’ visualization capabilities
Examples of organizations using each of the three systems compared
Presentation of the analyst’s perception of strengths and weaknesses of each system
References to machine learning in the context of the three systems.

What caught my attention is the disconnect between the expensive and somewhat overenthusiastic for-fee study about search and this free analysis.

Many of the problems in search are a result of what may be described as “over enthusiastic marketing.” This approach to jazzing up what can be accomplished by information retrieval technology has resulted in at least one jail sentence for an enterprise search entrepreneur and may be followed by jail time for other companies’ executives who practice razzmatazz sales techniques.

The principal value of the free comparison is that it does a good job of walking through basic information without the Madison Avenue hucksterism. Net net: A free write up with some helpful information.

Stephen E Arnold, October 8, 2020

Written by Stephen E. Arnold · Filed Under News, Open source, Search | Comments Off on Comparison of Elasticsearch, Solr, and Sphinx

DarkCyber for October 6, 2020, Now Available

October 6, 2020

The October 6, 2020, DarkCyber covers one security-related story and offers a special feature about the differences between Web search and enterprise search. The loss of 250 million user accounts in December 2019 illustrated the flaws in the Microsoft approach to online security. What was the company’s response? The firm researched the event and prepared an after-action report. The document makes clear that Microsoft’s approach to security allowed bad actors to obtain access to proprietary data. Furthermore, the report provides one more example that high-visibility cyber security systems may not work as advertised. What’s the difference between Web search and enterprise search? Dr. Stavros Macrakis and Stephen E Arnold explore this subject. Dr. Macrakis worked at Lycos, Google, and other high-profile search firms. Arnold is the author of Successful Enterprise Search Management and The New Landscape of Search. The extracts from their discussion provide fresh insights into the challenges of information retrieval in today’s mobile-centric world. You can view the program on YouTube.

Kenny Toth, October 6, 2020

Written by Stephen E. Arnold · Filed Under DarkCyber, News, Search | Comments Off on DarkCyber for October 6, 2020, Now Available

Google and Search Results: A Stay at Home Mother Explains

October 1, 2020

DarkCyber has a sneaking suspicion that Google wants to deliver the answers to users’ queries in a manner which:

Prevents a user from obtaining non-Google “approved” information
Requires zero latency between presenting an answer to a query and a click on an advertiser’s message
Appeals to a statistically significant percentage of users who accept the precept “Google makes one’s research easy”.

Other people do not agree with DarkCyber; for example, Google executives testifying before Congress or Googlers who are paid to explain how wonderful Google really, really is.

“Google Wants to Eliminate Search Engine. Introducing Semantic Search” is an interesting and possibly disconcerting write up. One of the DarkCyber researchers noted for me this passage:

The experts at Google want to eliminate the one thing that Google does best – searching.

Since Google is perceived as search, what’s up? What’s up is that Google wants to deliver the “correct” answer directly to a thumb typing user or an impressionable child using a Chromebook and Google approved information to learn.

The write up explains in cheery stay-at-home mom panache:

With semantic searching, the algorithm working behind the search engine will understand the meaning of the search term and hence provide meaningful results, saving users a lot of hassle and a lot of time. In short, the new search is going to allow users to smart search for everything on the web.

Yep, smart search. Everything. The Web.

Sounds perfect, particularly for Google and its ad-centric approach to services.

Plus, users benefit because search engine optimization will no longer force the ever-smart Google search system to display irrelevant results:

Google is just preventing website owners to dig out the most-searched for keywords and then bulk them on to their websites.

DarkCyber finds the “just” an interesting word. Google just wants to make users better informed. How thoughtful. Research becomes little more than accepting what Google determines is optimal. Why read? Why compare? Why analyze? Google knows best: Best in terms of controlling access to information, shaping perceptions, and selling ads. Yes, that “best” may mean that an advertiser paid to get the click.

The DarkCyber researcher put an exclamation mark next to this passage:

In order to calm website owners down, Google has provided that the new algorithm is going to consist of an improved form of the same algorithm which will provide an opportunity to work towards legitimate optimization instead of spamming.

Yes, be calm. Accept what is delivered.

Stephen E Arnold, October 1, 2020

Written by Stephen E. Arnold · Filed Under Google, News, Search | Comments Off on Google and Search Results: A Stay at Home Mother Explains

Microsoft Bing: Assertions Versus Actual Search Results

September 25, 2020

DarkCyber read “Introducing the Next Wave of AI at Scale innovations in Bing.” The write up explains a number of innovations. These enhancements will make finding information via Bing easier, better, faster, and generally more wonderful.

The main assertions DarkCyber noted are:

Smarter suggestions. The idea is that one does not know how to create a search query. Bing will know what the user wants.

More ideas. Bing will display questions other people (presumably just like me) ask. Bing keeps track and shows the popular questions. Yep, popular.

Translations. Send a query with mixed languages, and Bing will answer in your language. No more of that copying and pasting into Google Translate or Freetranslations.org.

Highlighting. This is Bing’s yellow marker. The system will highlight what you need to read. The method? “A zero-shot fashion.” No, DarkCyber does not know what this means. But one can ask Bing, right?

Let’s give Bing a whirl and run the same query against Googzilla.

Here’s a DarkCyber Bing query related to research we are now doing:

Black Sage open source

And here’s the result:

Black Sage is an integrator engaged in the development of counter unmanned aerial systems. The firm’s marketing collateral emphasizes that its platform is open. DarkCyber wants to know if the system uses open source methods for compromising a targeted UAS (drone). Bing focuses on a publishing company.

Now Google:

The first result from the Google is a pointer to the company. The remainder of the results are crazy and wacky like the sneakers Mr. Brin wore to Washington about a decade ago to meet elected officials. Crazy? Nope, Sillycon Valley.

DarkCyber uses both Bing and Google. Why did Google produce something sort of related to our query and Bing missed the corn hole entirely?

The answer is that Bing does not process a user’s search history as effectively as the Google. All the fancy words from Microsoft cannot alter a search result. DarkCyber is amused by Google and Microsoft. We are skeptical of each system.

Key points:

Microsoft is chasing technology instead of looking for efficient ways to tailor results to a user.
Microsoft wants to prove that its approach is more knowledge-centric. Google just wants to sell ads. Giving people something they have already seen is fine with Mother Google.
Microsoft, like Google, has lost sight of the utility of providing “stupid mode” and “sophisticated mode” for users. Let users select how a query should be matched to the content in the index.

To sum up, Google has a global share of Web search in the 85 percent range. Bing is an also participated player. Perhaps a less academic approach, deeper index, and functional user controls would be helpful?

Stephen E Arnold, September 25, 2020

Written by Stephen E. Arnold · Filed Under Google, Microsoft, News, Search | Comments Off on Microsoft Bing: Assertions Versus Actual Search Results


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
