January 17, 2017
This week’s HonkinNews takes a look at Yahoo’s post Verizon name. No, our suggestion of yabba dabba hoo or was it “hoot” was not ignored by Yahoo’s marketing wizards. We also highlight Alphabet Google’s erasure of two letters from its “alphabet.” Goners are “S” and “T”. Palantir is hiring a people centric person. The fancy title may have an interesting spin. Two enterprise search vendors kick off 2017 with a blizzard of buzzwords. The depth of the cacaphones is remarkable because search by any other name would return results with questionable precision and recall. The featured story is the Mitre Corporation’s Jason Report. If you have an interest in artificial intelligence and warfighting, the report provides some insight into what the US Department of Defense may be considering. But the highlight of the unclassified document is a helpful description of Google’s TPU. The seven minute program is at this link. For fans of XQuery, we have a bit of input for you too. Proprietary XQuery too. The program is produced in old fashioned black and white and enhanced with theme music from the days of the Stutz Bearcat. From the hotbed of search and content processing, HonkinNews is different. We’re presenting information other big time outfits ignore. Mitre is a variant of Massachusetts Institute of Technology Research. There you go. Live from Harrod’s Creek.
Kenny Toth, January 17, 2017
January 10, 2017
This week’s HonkinNews introduces the concept of cacaphones. Check out the snippets of images. Tasty. We also discuss LucidWorks’ effort to generate revenue. The firm’s most recent dog paddle is with the USS IBM Watson’s life preserver. If you did not know, predictive analytics has given up the ghost. Don’t mourn, however. A better approach to analytics is driving the digital analysis Hummer now. Our favorite government search and content processing system is not sufficient for the US Air Force. BAE Systems will build custom software to “bridge gaps” and perform other feats of digital magic. Enjoy.
Kenny Toth, January 10, 2017
January 3, 2017
Google: The Digital Gutenberg presents findings from Stephen E Arnold’s monograph about the Google system from 2007 to 2009. Topics covered in the video include how Google has become a digital version of the old Bell Telephone Yellow Pages.
Like the print Yellow Pages, changing the business model is very difficult. As a result, Google remains a one-trick pony riding advertising and saddled with an approach which depends on the fast eroding desktop search model. Google’s behavior — which some insist on calling monopolistic — is under attack by regulators in Europe. Can Google adapt?
Kenny Toth, January 3, 2017
December 27, 2016
The seven minute video — Google: The Calculating Predator Legacy — presents findings from Stephen E Arnold’s monograph about the Google system from 2004 to 2007. The company changed from a friendly Web search system into an enterprise focused on revenues and profit as a publicly traded company.
Topics covered in the video include the Google computing platform, key acquisitions like Keyhole and Transformic, the two pivot points for Google’s cost and technology advantages, and the business strategy of the “new” Google, Version 2.0.
Look for Part 3: Google: The Digital Gutenberg on January 3, 2017.
Kenny Toth, December 27, 2016
December 26, 2016
I know that some millennials are not familiar with the Duesenberg automobile. Why would that generation care about an automobile manufacturer that went out of business in 1937? My thought is that the Duesenberg left one nifty artifact: the word “doozy,” which means something outstanding.
I thought of the Duesenberg “doozy” when I read “Unstructured Data Search Engine Has Roots in HPC.” HPC means high performance computing. The acronym suggests a massively parallel system just like the one to which the average mobile phone user has access. The name of the search engine is “Duse,” which here in Harrod’s Creek is pronounced “doozy.”
According to the write up:
One company hoping to tap into the morass of unstructured data is DataFission. The San Jose, California firm was founded in 2013 with the goal of productizing a scale-out search engine, called the Digital Universe Search Engine, or DUSE, that it claims can index just about any piece of data, and make it searchable from any Web-enabled device.
The key to Duse is pattern matching. This is a pretty good method; for example, Brainware used trigrams to power its search system. Since the company disappeared into Lexmark, I am not sure what happened to the company’s system. I think the n-gram patent is owned by a bank located near an abandoned Kodak facility.
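For readers who have not bumped into trigram matching, here is a minimal sketch of the general idea. It is not Brainware’s method and not Duse’s; the function names and the Dice scoring are assumptions made for illustration only. Each string is chopped into overlapping three character chunks, and two strings are scored by how many chunks they share.

```python
from collections import Counter


def trigrams(text):
    """Return a multiset of overlapping three-character sequences."""
    # Padding (an illustrative choice) lets word boundaries contribute trigrams.
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def trigram_similarity(a, b):
    """Dice coefficient over trigram multisets, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    shared = sum((ta & tb).values())  # multiset intersection
    total = sum(ta.values()) + sum(tb.values())
    return 2 * shared / total if total else 0.0


def best_match(query, candidates):
    """Return the candidate with the highest trigram similarity to the query."""
    return max(candidates, key=lambda c: trigram_similarity(query, c))
```

A misspelled query such as “Duesenburg” still lands on “Duesenberg” because most of the three character chunks survive the typo. That tolerance is the appeal of the approach.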
The method of the system, as I understand it, is:
- Index content
- Put index into compressed tables
- Allow users to search the index.
The users can “search” by entering queries or dragging “images, videos, or audio files into Duse’s search bar or programmatically via REST APIs.”
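The three steps above can be sketched in a few lines. This is a toy under stated assumptions, not DataFission’s code: the write up does not say how the tables are compressed, so the sketch uses a plain inverted index of words, serializes it as JSON, and squeezes it with zlib. All function names are hypothetical.

```python
import json
import zlib
from collections import defaultdict


def build_index(docs):
    """Step 1: map each lowercase token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def compress_index(index):
    """Step 2: store the index as a compressed blob (zlib is an assumption)."""
    return zlib.compress(json.dumps(index).encode("utf-8"))


def search(compressed, query):
    """Step 3: return ids of documents containing every token in the query."""
    index = json.loads(zlib.decompress(compressed).decode("utf-8"))
    hits = [set(index.get(tok, [])) for tok in query.lower().split()]
    return sorted(set.intersection(*hits)) if hits else []
```

A real system would not decompress the whole index for every query, and it would handle images, video, and audio rather than whitespace separated words. The sketch only shows the order of operations.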
What differentiates Duse? The write up states:
The secret sauce lies in how the company indexes the data. A combination of machine learning techniques, such as principal component analysis (PCA), clustering, and classification algorithms, as well as graph link analysis and “nearest neighbor” approach help to find associations in the data.
Dr. Harold Trease, the architect of the Duse system, says:
“We generate a high-dimensional signature, a high-dimensional feature vector, that quantifies the information content of the data that we read through,” he says. “We’re not looking for features like dogs or cats or buildings or cars. We’re quantifying the information content related to the data that we read. That’s what we index and put in a database. Then if you pull out a cell phone and take a picture of the dog, we convert that to one of these high-dimensional signatures, and then we compare that to what’s in the database and we find the best matches.
“If we index a billion images, we’d end up with a billion points in this search space, and we can look at that search space it has structure to it, and the structure is fantastic. There’s all kinds these points and clusters and strands that connect things. It makes little less sense to humans, because we don’t see things like that. But to the code, it makes perfect sense.”
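The “signature, then best match” idea can be illustrated with a toy. The sketch below is not the Duse signature, which the write up does not disclose. As a stand-in, it assumes a 16 bucket histogram of byte values as the feature vector and cosine similarity as the comparison, and it leaves out the PCA, clustering, and graph analysis the article mentions. Every name here is hypothetical.

```python
import math


def signature(data, bins=16):
    """Toy signature: normalized histogram of byte values in `bins` buckets."""
    counts = [0] * bins
    for byte in data:
        counts[byte * bins // 256] += 1
    total = len(data) or 1  # avoid dividing by zero for empty input
    return [c / total for c in counts]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, database, k=1):
    """Return the k database keys whose stored signatures best match the query."""
    q = signature(query)
    ranked = sorted(database, key=lambda name: cosine(q, database[name]), reverse=True)
    return ranked[:k]
```

The database maps a name to a stored signature, for example `{"photo1": signature(raw_bytes)}`. A linear scan like this one would not survive a billion points. That is where indexing structures and a scale-out platform earn their keep.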
The company’s technology dates from the 1990s and the search technology was part of the company’s medical image analysis and related research.
The write up reports:
The software itself, which today exists as a Python-based Apache Spark application, can be obtained as software product or fully configured on a hardware appliance called DataHunter.
For more information about the company, navigate to this link.
Stephen E Arnold, December 26, 2016
December 20, 2016
For December 20, 2016, a seven minute video about Stephen E Arnold’s The Google Legacy is available. Published in 2004, this monograph is no longer in print. The traditional publisher stumbled into a French wine vat and then disappeared. The Google Legacy explains how decisions made between 1998 and 2004 blazed a trail that other digital pioneers would follow. You can view the free program at this link.
Kenny Toth, December 20, 2016
December 13, 2016
This week’s HonkinNews reveals that Verizon still pines for the Yahoot. We report that Yippy and MC+A have parted company. MC+A unveils a new approach to enterprise search which goes beyond the now defunct and marginalized Google Search Appliance. The specter of change in the Army’s Distributed Common Ground System may translate to high powered professionals looking for new jobs. To help these capable individuals out, we reveal that Digital Reasoning, a 16 year old start up, is nosing into the FinTech market. For those without such juicy prospects, we highlight three actions an unemployed DCGS expert can take to produce some extra holiday cash. Words are dead. As proof, we point to a British university’s innovation that allows a person to search by drawing little sketches. There’s more as well. The seven minute program is available at this link.
Beginning on December 20, 2016, the first of three seven minute videos will become available. The program is “The Google Legacy.” The program highlights some of the findings from Stephen E Arnold’s monograph “The Google Legacy,” first published in 2004. Today’s Google is rooted in the technology implemented between Backrub and Google’s decision to embrace the Yahoo – Overture – GoTo “pay to play” business model.
On December 27, the second video becomes available. This seven minute program presents the key findings from Arnold’s second Google monograph “Google Version 2.0: The Calculating Predator.” Google’s love affair with analytics and Big Data presages the “revolution” which other companies are just beginning to adopt. The word “predator” in the title of the 2007 study, which is also no longer in print, reveals that Googzilla is a hungry beastie. Revenues are the important thing.
On January 3, 2017, the third and final video in this three part series goes live. “Google: The Digital Gutenberg” focuses on the role of Google as a creator of digital content. From machine generated dossiers of entities to the simple reports pumped out for AdWords’ customers, Google is the largest digital publisher in the world. That’s what responding to more than one trillion queries a year will deliver to one’s data center: Information and revenue.
Since announcing these videos, we have received requests for the monographs. Alas, the publisher who converted Arnold’s proprietary research into post embargo print copies has gone missing. We did locate the original slide decks used for briefings about each of these three monographs. If you are interested in a webinar (for fee, of course) write me at benkent2020 at yahoo dot com. The information is still relevant, particularly if one is trying to understand why Google has become a “me too” outfit now wrestling with innovation, its business model, and competitors intent on putting Googzilla in a zoo.
Kenny Toth, December 13, 2016
December 6, 2016
This week’s HonkinNews takes a look at Palantir’s appetite for money. We talk about an easy way to search a competitor’s marketing collateral. Yandex heads to Iran with VK.com likely to follow. Microsoft embraces quantum computing but we caution Microsoft not to confuse a qubit with the video game. There’s more too. We reveal the Web search market share of Excite.com. Exciting. Here’s the link.
Kenny Toth, December 6, 2016
November 29, 2016
This week’s HonkinNews covers IBM Watson QAM (not a yam and not part of an internal combustion engine). We also report that Palantir Technologies has stereophonic input to the Trump Transition Team. You will also learn about EasyAsk’s amazing guarantee regarding eCommerce revenues. The show includes another dispatch from the front lines of the artificial intelligence wars. Google is on the offensive. Hitachi aims to become the first Japanese company to notch a perfect score in the enterprise search high diving contest. If you thought Willie Shakespeare worked alone, we rain on your parade courtesy of text analytics researchers who identify Kit Marlowe’s digital fingerprints on Henry VI. Imagine. Theater types collaborating. We thought Hollywood types invented this approach to content. This program gives the dates for the three videos about Stephen E Arnold’s The Google Trilogy. You can access the video at this link.
Kenny Toth, November 29, 2016
November 22, 2016
This week’s HonkinNews talks about a Thanksgiving surprise for Verizon AOL professionals. Shakespeare’s idea about what to do with lawyers is revisited with a 21st century twist: Hit the delete key. Is there disappearing information in the Google index? We report on one interesting unfindable. Video search may improve, but for now, it’s a meh. HonkinNews points out that “fake news” is now a thing and offers three variations on spoofing the “experts”. We identify the two French enterprise search vendors who have transformed themselves into artificial intelligence vendors in a remarkable demonstration of their flexibility. IBM Watson’s new initiative hopes to get “into” a host of new revenue opportunities. You can view the program at this link. The special Google Trilogy programs filmed live in Harrod’s Creek become available starting on December 20, 2016. There are three seven minute videos in the series, which summarizes the principal findings from The Google Legacy (2005), Google Version 2.0 (2007), and Google: The Digital Gutenberg (2009). The monographs are out of print but the information remains timely as Alphabet Google spells out its future.
Kenny Toth, November 22, 2016