Has Google Trashed Christmas for Kids?
December 6, 2019
Christmas? Ruined by the Google? I don’t believe it, but Metro UK may.
I learned about this in “Google Ruins Christmas for 1.1 Million Children Every Year Claim Teachers.” If the story is online, isn’t it true?
The write up states:
Research carried out by Exam Papers Plus suggests that each year over a million children are typing into Google whether or not Father Christmas is real.
How is this possible?
1,116,500 children ask Google “Is Santa Real” each year.
Google’s smart search system obviously knows the answer. Kids who do research are informed of the truth delivered by an objective, ad supported online service.
One tip: Don’t make a video for children that espouses untruths, and don’t put links in the comments sections of videos for children. A lump of coal may be placed in one’s stocking. Not just any coal. The lignite stuff.
Stephen E Arnold, December 6, 2019
Microsoft Search: Still Playing an Old Eight Track Cassette?
November 20, 2019
How many times has DarkCyber heard about Microsoft’s improved search? Once, twice? Nope, dozens upon dozens. There was the yip yap about Fast Search & Transfer, Colloquis and its natural language processing, Powerset and its semantic search system, Semantic Machines and its natural voice functions, and the home brew solutions from hither and yon in the Microsoft research and development empire. There’s Outlook search and Bing search and probably a version of LinkedIn’s open source search kicking around too.
But that’s irrelevant in today’s “who cares about the past?” datasphere. DarkCyber noted “Here’s How Microsoft Is Looking to Make Search Smarter and More Natural.” What is smart search? An abrogation of user intentions? What is more natural? Boolean logic, field codes, date and time metadata, and similar artifacts of a long lost era seem okay for the DarkCyber team.
The write up explains in its own surrealistic way:
Microsoft’s ultimate goal with Microsoft Search is to provide answers not just to simple queries, but also more personalized, complex ones, such as “Can I bring my pet to work?”. The Microsoft Graph API, semantic knowledge understanding from Bing, machine-reading comprehension and the Office 365 storage and services substrate all are playing a role in bringing this kind of search to Microsoft’s apps.
Yeah, okay. But enterprise SharePoint users still complain that current content cannot be located. The current tools are blind to versions of content residing on departmental servers or parked in a cloud account owned by the legal department. And what about the prices just quoted by an enterprise sales professional? Sorry. You are out of luck, but Microsoft is… trying.
Now grab this peek into the future of Microsoft search:
Turing in Bing already has helped Microsoft to understand semantics via searching by concept instead of keyword. Natural-language processing also has helped with understanding query intent, she noted. Semantic understanding means users don’t have to expect exact word matches. (When searching for Coke, matches with “canned soda,” also could be part of the set of results generated, for example.) The Turing researchers are employing machine reading, as well, to help with contextual search/results.
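For readers wondering what “searching by concept instead of keyword” means in practice, here is a toy sketch. It is not how Turing or Bing work; those systems learn relationships from data. The hand-built concept map (CONCEPTS) and the helper names are hypothetical, chosen only to show why a query for Coke can pull in a document about canned soda while exact keyword matching misses it.

```python
import re

# Hypothetical, hand-built concept map: concept ID -> surface forms that express it.
# Real semantic search systems learn such relations; this table is illustration only.
CONCEPTS = {
    "soft_drink": {"coke", "pepsi", "cola", "soda", "canned soda"},
    "pet": {"pet", "dog", "cat"},
}

def normalize(text: str) -> str:
    # Lowercase and collapse the text to space-separated word tokens.
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def concepts_for(text: str) -> set[str]:
    # Pad with spaces so "cat" matches the word "cat" but not "category".
    padded = f" {normalize(text)} "
    return {cid for cid, forms in CONCEPTS.items()
            if any(f" {form} " in padded for form in forms)}

def keyword_match(query: str, doc: str) -> bool:
    # Exact-word matching: at least one query word must appear in the document.
    return bool(set(normalize(query).split()) & set(normalize(doc).split()))

def concept_match(query: str, doc: str) -> bool:
    # Concept matching: query and document share at least one concept.
    return bool(concepts_for(query) & concepts_for(doc))

docs = ["Coke is on sale this week",
        "Stock up on canned soda for the office party",
        "Can I bring my pet to work?"]
for doc in docs:
    print(f"{doc!r}: keyword={keyword_match('Coke', doc)} concept={concept_match('Coke', doc)}")
```

The second document shares no words with the query, yet the concept lookup connects the two. The hard part, which the sketch dodges, is building and maintaining the concept map.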
The chaotic and often misfiring Microsoft search technologies do one thing well: Generate revenue for the legions of certified Microsoft partners.
Users? Yeah, Microsoft may help you too. In the meantime, the lawyers will manage their own contract drafts and eDiscovery materials. The engineers will stick with the tools baked into AutoCAD-type systems. The marketers will do what marketers in many companies do: stuff data on USB drives, park it in the Google cloud, or copy the files to a shared folder on a former employee’s desktop. Yes, it happens.
Microsoft and search. Getting better. Here’s a snippet about Powerset (CNET, 2008):
Much of what Powerset has enabled with its technology is a superior user experience for searching. Powerset’s Wikipedia search, which surfaces concepts, meanings, and relationships (like subject, verbs, and objects in a language), is the very small tip of the iceberg.
Time for a new eight track tape?
Stephen E Arnold, November 20, 2019
Amazon Product Search: A Challenge for the GOOG
November 18, 2019
Amazon is gaining ground in the search-based advertising arena. ZDNet reports, “Amazon Search Ad Business to Whittle Away at Google Market Share Through 2021, Says eMarketer.” Citing a recent eMarketer report, writer Larry Dignan tells us that, though Google will remain top dog by a wide margin for the foreseeable future, Amazon is positioned to increase its share. He writes:
“The report finds that Google will continue to dominate search advertising, but its share will fall over time. Amazon is expected to show search ad revenue growth of 29.5% in 2019, 30.7% in 2020 and 26.2% in 2021. Amazon’s advertising business has surged past Microsoft to be No. 2 behind Google, which has 73.1% of the search ad market. Amazon will end 2019 with 12.9%, followed by Microsoft at 6.5%. Verizon Media and Yelp round out the top five with market share of about 2%. In addition, Amazon’s advertising business is closely watched among Wall Street analysts. The search ad business falls into Amazon’s ‘other’ revenue category and many analysts expect it to be a break out business like Amazon Web Services. Google’s market share in the search advertising market is expected to drop to 70.5% by 2021, according to eMarketer estimates.”
Amazon, you see, has a unique advantage—many active shoppers begin their product searches there, so they are already poised to make a purchase. Dignan adds that other retail sites like Wal-Mart, Target, and eBay are also nipping at Google’s search-ad market share.
Cynthia Murrell, November 18, 2019
Parsing Documents: A Shift to Small Data
November 14, 2019
DarkCyber spotted “Eigen Nabs $37M to Help Banks and Others Parse Huge Documents Using Natural Language and Small Data.” The folks chasing the enterprise search pot of gold may need to focus on solving specific problems. Eigen uses search technology to identify the important items in long documents. The idea is “small data.”
The write up reports:
The basic idea behind Eigen is that it focuses what co-founder and CEO Lewis Liu describes as “small data”. The company has devised a way to “teach” an AI to read a specific kind of document — say, a loan contract — by looking at a couple of examples and training on these. The whole process is relatively easy to do for a non-technical person: you figure out what you want to look for and analyze, find the examples using basic search in two or three documents, and create the template which can then be used across hundreds or thousands of the same kind of documents (in this case, a loan contract).
Interesting, but the approach seems similar to identifying several passages in a text and submitting them to a search engine. This used to be called “more like this.” But today? Small data.
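“More like this” is simple to sketch. The Python below is a minimal illustration, assuming plain TF-IDF weighting and cosine similarity; it is not Eigen’s method, which the write up does not describe in detail. The function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf(corpus: list[str]):
    """Return per-document {term: weight} vectors and the idf table (weight = tf * log(N/df))."""
    counts = [Counter(tokenize(doc)) for doc in corpus]
    df = Counter(term for c in counts for term in c)
    idf = {term: math.log(len(corpus) / n) for term, n in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vectors, idf

def cosine(a: dict, b: dict) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def more_like_this(examples: list[str], corpus: list[str]) -> list[int]:
    """Rank corpus documents by similarity to the combined example passages."""
    vectors, idf = tf_idf(corpus)
    query = Counter(t for ex in examples for t in tokenize(ex))
    # Terms that never occur in the corpus get zero weight.
    qvec = {t: tf * idf.get(t, 0.0) for t, tf in query.items()}
    scored = sorted(((cosine(qvec, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [i for score, i in scored if score > 0]

corpus = [
    "The borrower shall repay the loan principal with interest at the agreed rate.",
    "Quarterly marketing report on social media campaigns.",
    "This loan agreement sets the interest rate and the repayment schedule for the borrower.",
]
examples = ["interest rate payable by the borrower under the loan"]
print(more_like_this(examples, corpus))  # [0, 2]
```

The two loan documents come back and the marketing report does not. Swap in a few sample clauses and a stack of contracts, and one has the skeleton of a “small data” template.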
With the cloud coming back on premises and big data becoming user identified small data, what’s next? Boolean queries?
DarkCyber hopes so.
Stephen E Arnold, November 14, 2019
Curious about Semantic Search the SEO Way?
November 12, 2019
DarkCyber is frequently curious about search: Semantic, enterprise, meta, multi-lingual, Boolean, and the laundry list of buzzwords marshaled to allow a person to find an answer.
If you want to get a Zithromax Z-PAK of semantic search talk, navigate to “Semantic Search Guide.” One has to look closely at the url to discern that this “objective” write up is about search engine optimization or SEO. DarkCyber affectionately describes SEO as the “relevance” killer, but that’s just our old-fashioned self refusing to adapt to the whizzy new world.
The link points to a page with a number of links. These include:
- Target audience and contributions
- The knowledge graph explained
- The evolution of search
- Using Google’s entity search tool
- Getting a Wikipedia listing
DarkCyber took a look at the “Evolution of Search” segment. We found it quirky but interesting. For example, we noted this passage:
Now we turn to the heart of full-text search. SEOs tend to dwell on the indexing part of search or the retrieval part of the search, called the Search Engine Results Pages (SERPs, for short). I believe they do this because they can see these parts of the search. They can tell if their pages have been crawled, or if they appear. What they tend to ignore is the black box in the middle. The part where a search engine takes all those gazillion words and puts them in an index in a way that allows for instant retrieval. At the same time, they are able to blend text results with videos, images and other types of data in a process known as “Universal Search”. This is the heart of the matter and whilst this book will not attempt to cover all of this complex subject, we will go into a number of the algorithms that search engines use. I hope these explanations of sometimes complex, but mostly iterative algorithms appeal to the marketer inside you and do not challenge your maths skills too much. If you would like to take these ideas in in video form, I highly recommend a video by Peter Norvig from Google in 2011: https://www.youtube.com/watch?v=yvDCzhbjYWs
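The “black box in the middle” is, at its simplest, an inverted index: a map from each word to the documents containing it. The sketch below is a toy under that assumption; production engines add compression, ranking, and much more. It also shows an old-fashioned Boolean AND query of the sort DarkCyber still likes. The function names are illustrative.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: list[str]) -> dict[str, list[int]]:
    """Inverted index: term -> sorted list of IDs of documents containing the term."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def boolean_and(index: dict[str, list[int]], query: str) -> list[int]:
    """Return IDs of documents containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], []))
    for term in terms[1:]:
        hits &= set(index.get(term, []))
    return sorted(hits)

docs = [
    "Santa reads the search results",
    "Search engines build an inverted index",
    "An index maps terms to documents",
]
index = build_index(docs)
print(index["index"])                       # [1, 2]
print(boolean_and(index, "index search"))   # [1]
```

Lookup is fast because the work happens at indexing time: answering a query means intersecting a few short posting lists, not rereading every document.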
Oh, well. This is one way to look at universal search. But Google has silos of indexes. After 20-plus years, the system still does not federate results across indexes. Semantic search? Yeah, right. Search each index, scan the results, cut and paste, and then try to figure out the dates and times. Semantic search does not handle time particularly well.
Important. Not to the SEO. Search babble may be more compelling.
If this approach is your cup of tea, inLinks has the hot water you need to understand why finding information is not what it seems.
Stephen E Arnold, November 12, 2019
The Key to Millions: Enterprise Search?
November 11, 2019
I thought the world was crazier than ever when enterprise search became the focal point of a multi-billion dollar deal and a multi-year lawsuit. The open source search movement picked up steam as companies shifted their attention from proprietary search and retrieval solutions to those maintained by a “community.” Search became a utility, and many information technology professionals found it a Bermuda Triangle for careers.
Why?
Our research prior to the publication of the three volumes of the Enterprise Search Report I wrote and our subsequent work on next generation search solutions revealed these problems:
- Enterprise search implies one size fits all. Information retrieval needs vary by business unit, department, and individuals. When one pokes around a large organization, one finds numerous search and information access systems. One size? Nope.
- Users look for information in the enterprise search system and cannot locate it. The reasons vary, but the universal gripe is, “I can’t locate the document I just saved.” The notion of real time does not fit into most organizations’ information infrastructure. Cost is one big reason. What looks good in a demo does not work in the “real world” of a company.
- Silos. The implications of “enterprise” suggest that a significant amount of information will be available to a user of the search system. Nothing could be further from the reality. Legal keeps some documents under lock and key. Personnel? The same approach. Research? No data goes out of the lab or the researcher’s workstation. On and on.
- Changes that are not captured. The top sales professional changes his presentation right before giving a talk to seal a big deal. The changes are not indexed because the sales professional is busy closing the contract. Missing info? Yes.
- Untracked digital information. Enterprise search has been neither quick nor adept at handling social media posts (authorized or unauthorized), interviews, videos produced in lieu of a written report, and similar information objects. Try to find key facts from these content collections. Give up yet?
I could extend this list, but I don’t have the energy. Few are interested in what caused Entopia to go out of business. No one I have spoken with in the last five years cares about why Fast Search & Transfer self destructed. No one cares.
I read “Want to Earn Millions? Launch an AI Based Enterprise Search Startup.” That’s a path to fame and riches. The write up states:
Enterprise search engines based on artificial intelligence systems are taking off fast. Cognitive search systems using NLP can include structured data contained in databases and even nontraditional enterprise information like pictures, video, sound, and machine information, for example, from the internet of things (IoT) gadgets, to bring contextual results in the actual business context.
Sounds good. How about this?
For startups and venture investing, the trend is clear. One prime example of this trend is the world’s leading space agency- NASA has enormous data ever since it was created in 1958. Now, the agency is working to make its data increasingly accessible for rocket designers and researchers. It is redesigning search and analytics abilities utilizing AI and natural language processing (NLP) systems created by a company known as Sinequa which is collaborating with the agency to deploy a worldwide knowledge management ability.
Amazing. NASA, which helped move technologies like RECON forward because its engineers could not locate key documents, is now looking at technology that has wobbled from search to intelligence and back again.
A quick reality check, gentle reader, please.
One can download open source search and retrieval software and get decent results. But there are firms which have goosed the “money” in enterprise search to astronomical levels:
- Algolia, $100 million
- Coveo, $200 million
- LucidWorks, $150 million
- ThoughtSpot, $248 million.
Now let’s think about Autonomy. At its height, the company reported revenues of about $800 million. HP paid $10.3 billion. After a short period of time, HP realized its massive sales and marketing system could not generate enough new sales and sustainable revenue to keep the Autonomy business an alleged winner.
How will these companies pitching enterprise search generate sufficient revenue to pay back their investors, fund research and development, add filters and other components needed to deal with today’s content flows, and support their existing systems as licensees try to make search work like investigative software?
The answer is, “The odds are quite unappealing.”
- Enterprise search has been available for half a century with some of the old school systems still available from OpenText in the guise of BRS Search
- Dissatisfaction with enterprise search systems generally runs about 50 to 70 percent in most organizations with such a system
- Costs of keeping an enterprise search and retrieval system continue to creep up despite the advent of managed services like those available from Amazon and others
Where are the customers?
That’s the question the article ignores.
Customers are likely to be just as tough to convince to use an enterprise solution as they have been for decades.
Net net: Enterprise search may not be the spring chicken the write up describes. Enterprise search has a history. And history is about to repeat itself. When the Autonomy matter is resolved, there may be a new search drama to follow.
Keep in mind that Google couldn’t make enterprise search work. But these cash stuffed outfits can? Maybe? Well, probably not.
Stephen E Arnold, November 11, 2019
Google: Bert Search Is Here. Where Is Ernie Advertising?
November 10, 2019
Google wants to stay at the top of search, so it is constantly developing new technology to keep its search algorithms ahead of the competition. Fast Company shares the latest on Google’s search technology in the article, “Google Just Got Better At Understanding Your Trickiest Searches.” Queries power every Google search, and the problem for search algorithms is understanding which words in a query are the most important. Another issue is that the algorithms need to understand how the words relate to one another. The relationship between keywords and user intent is subtle, particularly given the many shades of meaning in the English language.
Google’s newest search algorithm endeavor is dubbed BERT, short for Bidirectional Encoder Representations from Transformers. What does that mean?
“We non-AI scientists don’t have to worry about what encoders, representations, and transformers are. But the gist of the idea is that BERT trains machine language algorithms by feeding them chunks of text that have some of the words removed. The algorithm’s challenge is to guess the missing words—which turns out to be a game that computers are good at playing, and an effective way to efficiently train an algorithm to understand text. From a comprehension standpoint, it helps “turn keyword-ese into language,” said Google search chief Ben Gomes.”
Apparently the more text in a query, the better BERT can interpret its meaning. Google search scientists tested BERT by feeding the algorithm an endless stream of text from the search engine results. The “bidirectional” in BERT’s name comes from how the algorithm interprets data. Traditional search algorithms read English search queries from left to right, while BERT reads each word in light of the words both before and after it.
The average user will not notice that BERT has altered their search results, but the change should benefit them. BERT will not have the same far-reaching impact as universal search and the Knowledge Graph, but it does give Google a competitive advantage.
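The fill-in-the-blank game described above can be sketched in a few lines. This is a simplification, assuming whitespace tokens and a single [MASK] symbol; the published BERT recipe uses subword tokens, selects about 15 percent of them, and sometimes swaps in a random word or leaves the original in place instead of masking. The helper names are hypothetical. The context_for function shows what “bidirectional” buys: the model sees the words on both sides of the gap.

```python
import random

MASK = "[MASK]"

def make_masked_example(tokens: list[str], mask_rate: float = 0.15, seed: int = 0):
    """Hide about mask_rate of the tokens; return (masked tokens, {position: original word})."""
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]   # the word the model must guess
        masked[pos] = MASK
    return masked, targets

def context_for(masked: list[str], pos: int) -> dict[str, list[str]]:
    """Bidirectional context: words to the left AND right of the gap.
    A strictly left-to-right model would only get the 'left' part."""
    return {"left": masked[:pos], "right": masked[pos + 1:]}

tokens = "can kids find out if santa is real".split()
masked, targets = make_masked_example(tokens)
for pos, word in targets.items():
    ctx = context_for(masked, pos)
    print(" ".join(ctx["left"]), MASK, " ".join(ctx["right"]), "->", word)
```

Training consists of showing a model millions of such examples and scoring how often it recovers the hidden word. No human labeling is needed, which is why the game scales.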
The Wall Street Journal did some Google related sleuthing. The focus is advertising. You can read the story and look at the very millennial diagram in “How Google Edged Out Rivals and Built the World’s Dominant Ad Machine: A Visual Guide.” You will have to pay to learn what the diagram means. You will also have to do some homework to figure out how advertising and search / retrieval are connected. That’s important to some. But that diagram is remarkable. It uses Google colors too.
Whitney Grace, November 10, 2019
Search System Bayard
November 1, 2019
Looking for an open source search and retrieval tool written in Rust and built on top of Tantivy (a Lucene-inspired library)? Point your browser to GitHub and grab the files. The README file highlights these features:
- Full-text search/indexing
- Index replication
- Bringing up a cluster
- Command line interface.
DarkCyber has not tested it. However, a journalist who contacted us on October 31, 2019, was interested in the future of search. I pointed out that there are free and open source options.
What people want to buy, however, is something that does not alienate two thirds of the search system’s users the first day the software is deployed.
Surprised? You may not know what you don’t know, but, gentle reader, you are an exception.
Stephen E Arnold, November 1, 2019
Metasearch Engine Changes Hands
October 28, 2019
In 1998 a Wall Street professional founded Ixquick. As I recall, the developer was David Bodnick. Like other search developers, he found that selling was better than pumping ads and trying to compete in the world of the digital library card catalog. Ixquick’s buyer was Surfboard Holding BV.
Metasearch engines like DuckDuckGo send queries to other search engines and present a list of semi-deduplicated results. Dogpile and Vivisimo were other metasearch engines. The Ixquick twist was privacy. I don’t want to go into the notion of privacy in an ad supported search system in this item.
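For the curious, the core of a metasearch engine is merging ranked lists from several engines and dropping duplicates. The sketch below assumes simple round-robin interleaving and a crude URL key (scheme ignored, “www.” and trailing slash stripped), which is one reason results end up only “semi” deduplicated. It is not Ixquick’s or anyone else’s actual merge logic; the function names and URLs are hypothetical.

```python
from urllib.parse import urlsplit

def dedup_key(url: str) -> str:
    """Crude duplicate key: ignore scheme, 'www.' prefix, and a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    key = host + parts.path.rstrip("/")
    return f"{key}?{parts.query}" if parts.query else key

def merge_results(*ranked_lists: list[str]) -> list[str]:
    """Round-robin interleave ranked lists from several engines, skipping duplicates."""
    merged, seen = [], set()
    depth = max((len(r) for r in ranked_lists), default=0)
    for rank in range(depth):
        for results in ranked_lists:
            if rank < len(results):
                key = dedup_key(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

engine_a = ["https://www.example.com/santa/", "https://news.example.org/christmas"]
engine_b = ["http://example.com/santa", "https://blog.example.net/elves"]
print(merge_results(engine_a, engine_b))
```

The two Santa URLs collapse into one result. Mirrors, tracking parameters, and syndicated copies would slip through a key this simple.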
DarkCyber noted a Reddit post that reveals System1 (Privacy One Group) now owns the service. Note the word privacy. As I said, I am not going to explain for the umpteenth time why free Web search or free services of any type may have a different notion of privacy than someone in Harrod’s Creek, Kentucky.
Should I explain the issues related to metasearch systems? Nope. Just like the privacy thing. No one understands and no one cares.
Stephen E Arnold, October 28, 2019
Google NLP Search: Fortune Loves It. Simple Queries Reveal Shortcomings
October 25, 2019
I read “Google Says Its Latest Tech Tweak Provides Better Search Results. Here’s How.” DarkCyber enjoys Fortune Magazine’s how to explanations. They are just. So. Wonderful.
We learned:
Google’s goal is to make it easier for users, who often don’t know how to enter queries for the information they want. Since its search engine debuted in 1997, Google has focused on getting its technology to better understand natural language to produce relevant results even in cases where users enter a misspelled word or a query that is off target. With the latest change, Google will also now consider the sequential order in which words are placed in a search, instead of returning results based on a “mixed bag” of keywords.
Yes, but what about tuning search to advertising? What about ignoring bound phrases? What about Boolean logic? What about words like “terminal” which have different, often difficult to disambiguate meanings?
Fortune jumps over these questions.
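Fortune’s point about word order versus a “mixed bag” of keywords is easier to see with an example. The sketch below contrasts bag-of-words matching with exact phrase (bound phrase) matching. Note the swap: phrase matching is an old technique used here only to illustrate why order changes results. Google’s BERT-based approach learns word relationships rather than requiring exact phrases. The function names are hypothetical.

```python
import re

def tokens(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is discarded.
    return re.findall(r"[a-z0-9]+", text.lower())

def bag_match(query: str, doc: str) -> bool:
    """'Mixed bag' matching: every query word appears somewhere, in any order."""
    return set(tokens(query)) <= set(tokens(doc))

def phrase_match(query: str, doc: str) -> bool:
    """Order-aware matching: query words appear consecutively and in order."""
    q, d = tokens(query), tokens(doc)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

docs = [
    "Terminal illness rates among airport staff",
    "The airport terminal reopened on Monday",
]
for doc in docs:
    print(doc, bag_match("airport terminal", doc), phrase_match("airport terminal", doc))
```

Both documents satisfy the mixed bag; only one is about an airport terminal. That is the “terminal” problem in miniature.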
Try this query on the “new” Google:
What companies compete with Subsentio?
What about this one?
Amazon law enforcement products
Not what I had in mind. I was thinking about QLDB and digital currency deanonymization.
Sorry, Google. Not yet. Personalization does not work either, by the way. (You know. Examine the search history, etc. etc.)
Fortune, check out where Google’s ad revenue comes from. Just a small clue to put Google search in its context.
Stephen E Arnold, October 25, 2019