Do Four Peas Make a Useful Digital Pod?

December 24, 2019

The Four P’s of Information

This has been the problem with data since at least the turn of this century. Forbes posts a “Reality Check: Still Spending More Time Gathering Instead of Analyzing.” Writer and Keeeb CTO Sid Probstein reminds us:

“Numerous studies of ‘knowledge worker’ productivity have shown that we spend too much time gathering information instead of analyzing it. In 2001, IDC published its venerable white paper, ‘The High Cost of Not Finding Information,’ noting that knowledge workers were spending two and a half hours a day searching for information. Since then, we have seen the rise of the cloud, ubiquitous computing, connectivity and everything else that was science fiction when we were kids becoming a reality — including the imminent emergence of AI. Yet in 2012, a decade after the IDC report, a study conducted by McKinsey found that knowledge workers still spend 19% of their time searching for and gathering information, and a 2018 IDC study found that ‘data professionals are losing 50% of their time every week’ — 30% searching for, governing and preparing data plus 20% duplicating work. Clearly, all the technology advances have not flipped the productivity paradigm; it seems like we still spend more time searching for information that exists rather than analyzing and creating new knowledge.”

Probstein believes much of the problem lies in data silos. There are four subsets of the data silo issue, we’re told, but most proposed solutions fail to address all of them. They are the “four P’s” of information: Public Data (info that is searchable across the World Wide Web), Private Data (information behind login pages or firewalls), Paid Data (like industry research, datasets, and professional information), and Personal Data (our own notes, bookmarks, and saved references). See the article for more about each of these areas. Bridging these silos remains a challenge for knowledge workers, but it seems businesses may be taking the issue more seriously. Will we soon be making better use of all that data? Do four peas make a pod? Not yet.

Cynthia Murrell, December 24, 2019

Written by Stephen E. Arnold · Filed Under Marketing, News, Search | Comments Off on Do Four Peas Make a Useful Digital Pod?

Supersearcher: Secrets Revealed

December 12, 2019

Most of the people I encounter today are quick to tell me, “I am an excellent researcher.” Confidence can be useful. When I read articles like “Become a Google Super Searcher with These 17 Tips,” I marvel at how the concept of a super searcher has degraded. I have met some super searchers; for example, Barbara Quint, Marydee Ojala, Reva Basch, and others. You may ask, “Who are these people?” Sorry. I won’t help you. Use the tips in the article to locate information about these real super searchers.

What are the secrets revealed in the Spec? Let me highlight five of these insights for you. You may find them helpful. I reserve comment. Here we go, but you will have to use your search skills to locate the other secrets if the link and the paywall block you, gentle reader:

Know the difference between a link to a source and an ad.
Be suspicious.
Use other search engines in addition to Google.
Wikipedia may not be a verified source.
Arrange your search terms.

Remember. There are 12 more tips. And you can use the Twitter CEO’s favorite search engine which recycles results from other search systems.

Stephen E Arnold, December 12, 2019

Written by Stephen E. Arnold · Filed Under News, Search | 1 Comment

Has Google Trashed Christmas for Kids?

December 6, 2019

Christmas? Ruined by the Google? I don’t believe it, but Metro UK may.

I learned about this in “Google Ruins Christmas for 1.1 Million Children Every Year Claim Teachers.” If the story is online, isn’t it true?

The write up states:

Research carried out by Exam Papers Plus suggests that each year over a million children are typing into Google whether or not Father Christmas is real.

How is this possible?

1,116,500 children ask Google “Is Santa Real” each year.

Google’s smart search system obviously knows the answer. Kids who do research are informed of the truth delivered by an objective, ad supported online service.

One tip: Don’t make a video for children that espouses untruths or put links in the comments sections of videos for children. A lump of coal may be placed in one’s stocking. Not just any coal. The lignite stuff.

Stephen E Arnold, December 6, 2019

Written by Stephen E. Arnold · Filed Under Google, News, Search | Comments Off on Has Google Trashed Christmas for Kids?

Microsoft Search: Still Playing an Old Eight Track Cassette?

November 20, 2019

How many times has DarkCyber heard about Microsoft’s improved search? Once, twice? Nope, dozens upon dozens. Whether it was the yip yap about Fast Search & Transfer, Colloquis and its natural language processing, Powerset and its semantic search system, Semantic Machines for natural voice functions, or the home brew solutions from hither and yon in the Microsoft research and development empire. There’s Outlook search and Bing search and probably a version of LinkedIn’s open source search kicking around too.

But that’s irrelevant in today’s “who cares about the past?” datasphere. DarkCyber noted “Here’s How Microsoft Is Looking to Make Search Smarter and More Natural.” What is smart search? An abrogation of user intentions? What is more natural? Boolean logic, field codes, date and time metadata, and similar artifacts of a long lost era seem okay for the DarkCyber team.

The write up explains in its own surrealistic way:

Microsoft’s ultimate goal with Microsoft Search is to provide answers not just to simple queries, but also more personalized, complex ones, such as “Can I bring my pet to work?”. The Microsoft Graph API, semantic knowledge understanding from Bing, machine-reading comprehension and the Office 365 storage and services substrate all are playing a role in bringing this kind of search to Microsoft’s apps.

Yeah, okay. But enterprise SharePoint users still complain that current content cannot be located. The current tools are blind to versions of content residing on departmental servers or parked in a cloud account owned by the legal department. And what about the prices just quoted by an enterprise sales professional? Sorry. You are out of luck, but Microsoft is… trying.

Now grab this peek into the future of Microsoft search:

Turing in Bing already has helped Microsoft to understand semantics via searching by concept instead of keyword. Natural-language processing also has helped with understanding query intent, she noted. Semantic understanding means users don’t have to expect exact word matches. (When searching for Coke, matches with “canned soda,” also could be part of the set of results generated, for example.) The Turing researchers are employing machine reading, as well, to help with contextual search/results.
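
The Coke example in the passage above can be illustrated with a toy. This is a minimal sketch, assuming a hand-made concept table; the table, the file names, and the `expand` and `search` helpers are hypothetical and have nothing to do with how Microsoft's Turing models actually work. It shows only the general idea of matching by concept instead of exact keyword.

```python
# Hypothetical concept table; a real system would learn these relations from data.
CONCEPTS = {
    "coke": {"coke", "cola", "canned soda", "soft drink"},
}

def expand(query):
    """Return the query plus any related phrases listed in the concept table."""
    q = query.lower().strip()
    return CONCEPTS.get(q, {q})

def search(query, documents):
    """Return names of documents containing the query or a related phrase."""
    phrases = expand(query)
    return sorted(
        name for name, body in documents.items()
        if any(phrase in body.lower() for phrase in phrases)
    )

# Made-up documents for illustration only.
documents = {
    "vending.txt": "The machine on floor two sells canned soda and chips.",
    "parking.txt": "Visitor parking is behind the building.",
}
print(search("Coke", documents))
```

A strict keyword match on "Coke" would return nothing here, because neither document contains the word. The expanded query finds the vending machine note.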

The chaotic and often misfiring Microsoft search technologies do one thing well: Generate revenue for the legions of certified Microsoft partners.

Users? Yeah, Microsoft may help you too. In the meantime, the lawyers will manage their own contract drafts and eDiscovery materials. The engineers will stick with the tools baked into AutoCAD type systems. The marketers will do what marketers in many companies do: Stuff data onto USB drives or into the Google cloud, or copy the files to a shared folder on a former employee’s desktop. Yes, it happens.

Microsoft and search. Getting better. Here’s a snippet about Powerset (CNET, 2008):

Much of what Powerset has enabled with its technology is a superior user experience for searching. Powerset’s Wikipedia search, which surfaces concepts, meanings, and relationships (like subject, verbs, and objects in a language), is the very small tip of the iceberg.

Time for a new eight track tape?

Stephen E Arnold, November 20, 2019

Written by Stephen E. Arnold · Filed Under Microsoft, News, Search | Comments Off on Microsoft Search: Still Playing an Old Eight Track Cassette?

Amazon Product Search: A Challenge for the GOOG

November 18, 2019

Amazon is gaining ground in the search-based advertising arena. ZDNet reports, “Amazon Search Ad Business to Whittle Away at Google Market Share Through 2021, Says eMarketer.” Citing a recent eMarketer report, writer Larry Dignan tells us that, though Google will remain top dog by a wide margin for the foreseeable future, Amazon is positioned to increase its share. He writes:

“The report finds that Google will continue to dominate search advertising, but its share will fall over time. Amazon is expected to show search ad revenue growth of 29.5% in 2019, 30.7% in 2020 and 26.2% in 2021. Amazon’s advertising business has surged past Microsoft to be No. 2 behind Google, which has 73.1% of the search ad market. Amazon will end 2019 with 12.9%, followed by Microsoft at 6.5%. Verizon Media and Yelp round out the top five with market share of about 2%. In addition, Amazon’s advertising business is closely watched among Wall Street analysts. The search ad business falls into Amazon’s ‘other’ revenue category and many analysts expect it to be a break out business like Amazon Web Services. Google’s market share in the search advertising market is expected to drop to 70.5% by 2021, according to eMarketer estimates.”

Amazon, you see, has a unique advantage—many active shoppers begin their product searches there, so they are already poised to make a purchase. Dignan adds that other retail sites like Wal-Mart, Target, and eBay are also nipping at Google’s search-ad market share.

Cynthia Murrell, November 18, 2019

Written by Stephen E. Arnold · Filed Under Amazon, Business strategy, Google, News, Search | Comments Off on Amazon Product Search: A Challenge for the GOOG

Parsing Document: A Shift to Small Data

November 14, 2019

DarkCyber spotted “Eigen Nabs $37M to Help Banks and Others Parse Huge Documents Using Natural Language and Small Data.” The folks chasing the enterprise search pot of gold may need to pay attention to figuring out specific problems. Eigen uses search technology to identify the important items in long documents. The idea is “small data.”

The write up reports:

The basic idea behind Eigen is that it focuses what co-founder and CEO Lewis Liu describes as “small data”. The company has devised a way to “teach” an AI to read a specific kind of document — say, a loan contract — by looking at a couple of examples and training on these. The whole process is relatively easy to do for a non-technical person: you figure out what you want to look for and analyze, find the examples using basic search in two or three documents, and create the template which can then be used across hundreds or thousands of the same kind of documents (in this case, a loan contract).

Interesting, but the approach seems similar to identifying several passages in a text and submitting these to a search engine. This used to be called “more like this.” But today? Small data.
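
For readers who have not met “more like this,” here is a minimal sketch of the old idea, assuming plain word counts and cosine similarity. It is not Eigen's method, which the write up does not describe in detail. The passages, file names, and function names are made up for illustration.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Lowercase the text, split it into words, and count each word."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def more_like_this(examples, documents):
    """Rank document names by similarity to the combined example passages."""
    query = term_vector(" ".join(examples))
    scored = [(cosine(query, term_vector(body)), name) for name, body in documents.items()]
    # Keep only documents that share at least one word with the examples.
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

# Hypothetical example passages picked out by a user, and a tiny collection.
examples = ["The borrower shall repay the loan principal", "interest rate on the loan"]
documents = {
    "loan_contract.txt": "The borrower agrees to repay the loan with interest at the stated rate.",
    "lunch_menu.txt": "Soup, salad, and a sandwich are served at noon.",
}
print(more_like_this(examples, documents))
```

The selected passages become the query. Documents that share vocabulary with them float to the top, and documents with no overlap drop out.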

With the cloud coming back on premises and big data becoming user identified small data, what’s next? Boolean queries?

DarkCyber hopes so.

Stephen E Arnold, November 14, 2019

Written by Stephen E. Arnold · Filed Under Natural language processing, News, Search | 1 Comment

Curious about Semantic Search the SEO Way?

November 12, 2019

DarkCyber is frequently curious about search: Semantic, enterprise, meta, multi-lingual, Boolean, and the laundry list of buzzwords marshaled to allow a person to find an answer.

If you want to get a Zithromax Z-PAK of semantic search talk, navigate to “Semantic Search Guide.” One has to look closely at the url to discern that this “objective” write up is about search engine optimization or SEO. DarkCyber affectionately describes SEO as the “relevance” killer, but that’s just our old-fashioned self refusing to adapt to the whizzy new world.

The link will point to a page with a number of links. These include:

Target audience and contributions
The knowledge graph explained
The evolution of search
Using Google’s entity search tool
Getting a Wikipedia listing

DarkCyber took a look at the “Evolution of Search” segment. We found it quirky but interesting. For example, we noted this passage:

Now we turn to the heart of full-text search. SEOs tend to dwell on the indexing part of search or the retrieval part of the search, called the Search Engine Results Pages (SERPs, for short). I believe they do this because they can see these parts of the search. They can tell if their pages have been crawled, or if they appear. What they tend to ignore is the black box in the middle. The part where a search engine takes all those gazillion words and puts them in an index in a way that allows for instant retrieval. At the same time, they are able to blend text results with videos, images and other types of data in a process known as “Universal Search”. This is the heart of the matter and whilst this book will not attempt to cover all of this complex subject, we will go into a number of the algorithms that search engines use. I hope these explanations of sometimes complex, but mostly iterative algorithms appeal to the marketer inside you and do not challenge your maths skills too much. If you would like to take these ideas in in video form, I highly recommend a video by Peter Norvig from Google in 2011: https://www.youtube.com/watch?v=yvDCzhbjYWs

Oh, well. This is one way to look at universal search. But Google has silos of indexes. The system after 20 plus years does not federate results across indexes. Semantic search? Yeah, right. Search each index, scan results, cut and paste, and then try to figure out the dates and times. Semantic search does not do time particularly well.

Important. Not to the SEO. Search babble may be more compelling.

If this approach is your cup of tea, inLinks has the hot water you need to understand why finding information is not what it seems.

Stephen E Arnold, November 12, 2019

Written by Stephen E. Arnold · Filed Under News, Search, Semantic | Comments Off on Curious about Semantic Search the SEO Way?

The Key to Millions: Enterprise Search?

November 11, 2019

I thought the world was crazier than ever when enterprise search became the focal point of a multi-billion dollar deal and a multi-year lawsuit. The open source search movement picked up steam as companies shifted their attention from proprietary search and retrieval solutions to those maintained by a “community.” Search became a utility which many information technology professionals found a Bermuda Triangle for careers.

Why?

Our research prior to the publication of the three volumes of the Enterprise Search Report I wrote and our subsequent work on next generation search solutions revealed these problems:

Enterprise search implies one size fits all. Information retrieval needs vary by business unit, department, and individuals. When one pokes around a large organization, one finds numerous search and information access systems. One size? Nope.
Users look for information in the enterprise search system and cannot locate it. The reasons vary, but the universal gripe is, “I can’t locate the document I just saved.” The notion of real time is not one that fits into most organizations’ information infrastructure. Cost is one big reason. What looks good in a demo does not work in the “real world” of a company.
Silos. The implications of “enterprise” suggest that a significant amount of information will be available to a user of the search system. Nothing could be further from the reality. Legal keeps some documents under lock and key. Personnel? The same approach. Research? No data goes out of the lab or the researcher’s workstation. On and on.
Changes that are not captured. The top sales professional changes his presentation right before giving a talk to seal a big deal. The changes are not indexed because the sales professional has to do the contract. Missing info? Yes.
Untracked digital information. Enterprise search has been neither quick nor adept at handling social media posts (authorized or unauthorized), interviews, videos produced in lieu of a written report, and similar information objects. Try to find key facts from these content collections. Give up yet?

I could extend this list, but I don’t have the energy. Few are interested in what caused Entopia to go out of business. No one I have spoken with in the last five years cares about why Fast Search & Transfer self destructed. No one cares.

I read “Want to Earn Millions? Launch an AI Based Enterprise Search Startup.” That’s a path to fame and riches. The write up states:

Enterprise search engines based on artificial intelligence systems are taking off fast. Cognitive search systems using NLP can include structured data contained in databases and even nontraditional enterprise information like pictures, video, sound, and machine information, for example, from the internet of things (IoT) gadgets, to bring contextual results in the actual business context.

Sounds good. How about this?

For startups and venture investing, the trend is clear. One prime example of this trend is the world’s leading space agency- NASA has enormous data ever since it was created in 1958. Now, the agency is working to make its data increasingly accessible for rocket designers and researchers. It is redesigning search and analytics abilities utilizing AI and natural language processing (NLP) systems created by a company known as Sinequa which is collaborating with the agency to deploy a worldwide knowledge management ability.

Amazing. NASA, which helped move technologies like RECON forward because engineers could not locate key documents, is looking at technology which has wobbled from search to intelligence and back again.

A quick reality check, gentle reader, please.

One can download open source search and retrieval software and get decent results. But there are firms which have goosed the “money” in enterprise search to astronomical levels:

Algolia, $100 million
Coveo, $200 million
LucidWorks, $150 million
ThoughtSpot, $248 million.

Now let’s think about Autonomy. At its height, the company reported revenues of about $800 million. HP paid $10.3 billion. After a short period of time, HP realized its massive sales and marketing system could not generate enough new sales and sustainable revenue to keep the Autonomy business an alleged winner.

How will these companies pitching enterprise search generate sufficient revenue to pay back their investors, fund research and development, add filters and other components needed to deal with today’s content flows, and support their existing systems as licensees try to make search work like investigative software?

The answer is, “The odds are quite unappealing.”

Enterprise search has been available for half a century with some of the old school systems still available from OpenText in the guise of BRS Search
Dissatisfaction with enterprise search systems generally runs about 50 to 70 percent in most organizations with such a system
Costs of keeping an enterprise search and retrieval system continue to creep up despite the advent of managed services like those available from Amazon and others

Where are the customers?

That’s the question the article ignores.

Customers are likely to be just as tough to convince to use an enterprise solution as they have been for decades.

Net net: Enterprise search may not be the spring chicken the write up describes. Enterprise search has a history. And history is about to repeat itself. When the Autonomy matter is resolved, there may be a new search drama to follow.

Keep in mind that Google couldn’t make enterprise search work. But these cash stuffed outfits can? Maybe? Well, probably not.

Stephen E Arnold, November 11, 2019

Written by Stephen E. Arnold · Filed Under Financial, News, Search | Comments Off on The Key to Millions: Enterprise Search?

Google: Bert Search Is Here. Where Is Ernie Advertising?

November 10, 2019

Google wants to stay at the top of search, so it is constantly developing new technology to keep its search algorithms ahead of the competition. Fast Company shares the latest on Google’s search technology in the article, “Google Just Got Better At Understanding Your Trickiest Searches.” Queries power every Google search, and the problem for search algorithms is understanding which words in the query are the most important. Another issue is that the algorithms need to understand how the words relate to one another. The relationship between keywords and their intent is subtle, particularly with all the shades of meaning in the English language.

Google’s newest search algorithm endeavor is dubbed BERT, short for Bidirectional Encoder Representations from Transformers. What does that mean?

“We non-AI scientists don’t have to worry about what encoders, representations, and transformers are. But the gist of the idea is that BERT trains machine language algorithms by feeding them chunks of text that have some of the words removed. The algorithm’s challenge is to guess the missing words—which turns out to be a game that computers are good at playing, and an effective way to efficiently train an algorithm to understand text. From a comprehension standpoint, it helps “turn keyword-ese into language,” said Google search chief Ben Gomes.”

Apparently the more text fed into a search, the better BERT can interpret its meaning. Google search scientists tested BERT by feeding the algorithm an endless stream of text from the search engine results. The “bidirectional” in BERT’s name comes from how the algorithm interprets data. Traditional search algorithms read English search queries from left to right, while BERT reads in both directions, using the words on either side of a term as context.
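
The guess-the-missing-word game is easy to picture in code. This is a minimal sketch of how one might prepare such a training example, assuming whitespace tokens and a `[MASK]` placeholder. The 15 percent masking rate is the figure used in the BERT paper, not something the Fast Company article states. The function names are hypothetical, and no model is trained here.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, fraction=0.15, seed=0):
    """Hide a fraction of the tokens; return the masked list and the hidden answers."""
    rng = random.Random(seed)
    count = max(1, round(len(tokens) * fraction))
    positions = sorted(rng.sample(range(len(tokens)), count))
    masked = list(tokens)
    answers = {}
    for pos in positions:
        answers[pos] = tokens[pos]  # what the model would be asked to guess
        masked[pos] = MASK
    return masked, answers

def context_for(masked, pos, window=2):
    """Collect the words on both sides of a hidden position (the bidirectional part)."""
    left = masked[max(0, pos - window):pos]
    right = masked[pos + 1:pos + 1 + window]
    return left, right

# Hypothetical query used only for illustration.
tokens = "can you get medicine for someone at the pharmacy".split()
masked, answers = mask_tokens(tokens)
for pos, word in answers.items():
    print(masked, "hidden:", word, "context:", context_for(masked, pos))
```

A left-to-right reader would see only the words before the gap. The sketch hands over the words on both sides, which is the point of the “bidirectional” label.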

The average user will not recognize that BERT has altered their search results, but it will be beneficial to them. BERT will not have the same far-reaching impact as universal search and the knowledge graph, but it does give Google a competitive advantage.

The Wall Street Journal did some Google related sleuthing. The focus is advertising. You can read the story and look at the very millennial diagram in “How Google Edged Out Rivals and Built the World’s Dominant Ad Machine: A Visual Guide.” You will have to pay to learn what the diagram shown below means. You will also have to do some homework to figure out how advertising and search / retrieval are connected. That’s important to some. But that diagram is remarkable. It uses Google colors too.

Whitney Grace, November 10, 2019

Written by Stephen E. Arnold · Filed Under Google, News, Search | 1 Comment

Search System Bayard

November 1, 2019

Looking for an open source search and retrieval tool written in Rust and built on top of Tantivy (a Lucene-inspired library)? Point your browser to GitHub and grab the files. The readme file highlights these features:

Full-text search/indexing
Index replication
Bringing up a cluster
Command line interface.

DarkCyber has not tested it, but a journalist contacted us on October 31, 2019, and was interested in the future of search. I pointed out that there are free and open source options.

What people want to buy, however, is something that does not alienate two thirds of the search system’s users the first day the software is deployed.

Surprised? You may not know what you don’t know, but, gentle reader, you are an exception.

Stephen E Arnold, November 1, 2019

Written by Stephen E. Arnold · Filed Under News, Open source, Search | Comments Off on Search System Bayard


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
