December 26, 2016
I know that some millennials are not familiar with the Duesenberg automobile. Why would that generation care about an automobile manufacturer that went out of business in 1937? My thought is that the Duesenberg left one nifty artifact: the word “doozy,” which means something outstanding.
I thought of the Duesenberg “doozy” when I read “Unstructured Data Search Engine Has Roots in HPC.” HPC means high performance computing. The acronym suggests a massively parallel system just like the one to which the average mobile phone user has access. The name of the search engine is “Duse,” which here in Harrod’s Creek is pronounced “doozy.”
According to the write up:
One company hoping to tap into the morass of unstructured data is DataFission. The San Jose, California firm was founded in 2013 with the goal of productizing a scale-out search engine, called the Digital Universe Search Engine, or DUSE, that it claims can index just about any piece of data, and make it searchable from any Web-enabled device.
The key to Duse is pattern matching. This is a pretty good method; for example, Brainware used trigrams to power its search system. Since the company disappeared into Lexmark, I am not sure what happened to the company’s system. I think the n-gram patent is owned by a bank located near an abandoned Kodak facility.
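For readers who have not bumped into trigrams, here is a minimal sketch of the general idea: break text into overlapping three-character chunks and score two strings by how many chunks they share. This is an illustration only. The function names are hypothetical, the Jaccard overlap is my choice of score, and nothing here represents how Brainware or Duse actually implemented matching.

```python
def trigrams(text):
    """Return the set of overlapping three-character sequences in text."""
    # Lowercase and collapse whitespace so formatting does not affect matching.
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap of the two trigram sets, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# A misspelling still shares most of its trigrams with the correct name.
print(trigram_similarity("Duesenberg", "Duesenburg"))
print(trigram_similarity("Duesenberg", "Kodak"))
```

The appeal of the approach is that a typo only damages the few trigrams that touch it, so near misses still score well.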
The method of the system, as I understand it, is:
- Index content
- Put index into compressed tables
- Allow users to search the index.
The users can “search” by entering queries or dragging “images, videos, or audio files into Duse’s search bar or programmatically via REST APIs.”
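The three steps above can be sketched in a few lines. This is a toy under stated assumptions: the "index" is a plain inverted index of words, the "compressed table" is a zlib-compressed JSON blob, and every name and sample document is hypothetical. The write up does not say how Duse stores or compresses its tables.

```python
import json
import zlib
from collections import defaultdict


def build_index(docs):
    """Step 1: map each term to the ids of the documents that contain it."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term in sorted(set(text.lower().split())):
            index[term].append(doc_id)
    return dict(index)


def compress_table(index):
    """Step 2: serialize the index and compress it (zlib is an assumption)."""
    return zlib.compress(json.dumps(index, sort_keys=True).encode("utf-8"))


def search(compressed, term):
    """Step 3: decompress the table and look up one term."""
    index = json.loads(zlib.decompress(compressed).decode("utf-8"))
    return index.get(term.lower(), [])


# Hypothetical sample content.
docs = {"doc1": "unstructured data search", "doc2": "search engine for images"}
table = compress_table(build_index(docs))
print(search(table, "search"))
```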
What differentiates Duse? The write up states:
The secret sauce lies in how the company indexes the data. A combination of machine learning techniques, such as principal component analysis (PCA), clustering, and classification algorithms, as well as graph link analysis and “nearest neighbor” approach help to find associations in the data.
Dr. Harold Trease, the architect of the Duse system, says:
We generate a high-dimensional signature, a high-dimensional feature vector, that quantifies the information content of the data that we read through,” he says. “We’re not looking for features like dogs or cats or buildings or cars. We’re quantifying the information content related to the data that we read. That’s what we index and put in a database. Then if you pull out a cell phone and take a picture of the dog, we convert that to one of these high-dimensional signatures, and then we compare that to what’s in the database and we find the best matches.
If we index a billion images, we’d end up with a billion points in this search space, and we can look at that search space it has structure to it, and the structure is fantastic. There’s all kinds these points and clusters and strands that connect things. It makes little less sense to humans, because we don’t see things like that. But to the code, it makes perfect sense.
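The "compare the signature to what's in the database and find the best matches" step is, in generic terms, a nearest neighbor lookup. Here is a minimal sketch assuming tiny four-dimensional vectors, Euclidean distance, and a brute-force scan. The vectors, the ids, and the distance measure are all hypothetical stand-ins; the write up does not describe how Duse computes or compares its signatures.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matches(query, index, k=2):
    """Return the k item ids whose stored vectors are closest to query."""
    ranked = sorted(index, key=lambda item_id: euclidean(query, index[item_id]))
    return ranked[:k]


# Hypothetical signatures; real ones would have far more dimensions.
index = {
    "image_a": [0.9, 0.1, 0.0, 0.2],
    "image_b": [0.8, 0.2, 0.1, 0.1],
    "image_c": [0.0, 0.9, 0.8, 0.1],
}
query = [0.9, 0.1, 0.05, 0.2]  # signature of the "cell phone picture"
print(best_matches(query, index))  # prints ['image_a', 'image_b']
```

A brute-force scan like this would not scale to a billion points, which is presumably where the clustering mentioned in the write up earns its keep.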
The company’s technology dates from the 1990s and the search technology was part of the company’s medical image analysis and related research.
The write up reports:
The software itself, which today exists as a Python-based Apache Spark application, can be obtained as software product or fully configured on a hardware appliance called DataHunter.
For more information about the company, navigate to this link.
Stephen E Arnold, December 26, 2016
December 20, 2016
For December 20, 2016, a seven minute video about Stephen E Arnold’s The Google Legacy is available. Published in 2004, this monograph is no longer in print. The traditional publisher stumbled into a French wine vat and then disappeared. The Google Legacy explains how decisions made between 1998 and 2004 blazed a trail that other digital pioneers would follow. You can view the free program at this link.
Kenny Toth, December 20, 2016
December 13, 2016
This week’s HonkinNews reveals that Verizon still pines for the Yahoot. We report that Yippy and MC+A have parted company. MC+A unveils a new approach to enterprise search which goes beyond the now defunct and marginalized Google Search Appliance. The specter of change in the Army’s Distributed Common Ground System may translate to high powered professionals looking for new jobs. To help these capable individuals out, we reveal that Digital Reasoning, a 16 year old start up, is nosing into the FinTech market. For those without such juicy prospects, we highlight three actions an unemployed DCGS expert can take to produce some extra holiday cash. Words are dead. As proof, we point to a British university’s innovation that allows a person to search by drawing little sketches. There’s more as well. The seven minute program is available at this link.
Beginning on December 20, 2016, the first of three seven minute videos will become available. The program is “The Google Legacy.” The program highlights some of the findings from Stephen E Arnold’s monograph “The Google Legacy,” first published in 2004. Today’s Google is rooted in the technology implemented between Backrub and Google’s decision to embrace the Yahoo – Overture – GoTo “pay to play” business model.
On December 27, the second video becomes available. This seven minute program presents the key findings from Arnold’s second Google monograph “Google Version 2.0: The Calculating Predator.” Google’s love affair with analytics and Big Data presages the “revolution” which other companies are just beginning to adopt. The word “predator” in the title of the 2007 study, which is also no longer in print, reveals that Googzilla is a hungry beastie. Revenues are the important thing.
On January 3, 2017, the third and final video in this three part series goes live. “Google: The Digital Gutenberg” focuses on the role of Google as a creator of digital content. From machine generated dossiers of entities to the simple reports pumped out for AdWords’ customers, Google is the largest digital publisher in the world. That’s what responding to more than one trillion queries a year will deliver to one’s data center: Information and revenue.
Since announcing these videos, we have received requests for the monographs. Alas, the publisher who converted Arnold’s proprietary research into post embargo print copies has gone missing. We did locate the original slide decks used for briefings about each of these three monographs. If you are interested in a webinar (for fee, of course) write me at benkent2020 at yahoo dot com. The information is still relevant, particularly if one is trying to understand why Google has become a “me too” outfit now wrestling with innovation, its business model, and competitors intent on putting Googzilla in a zoo.
Kenny Toth, December 13, 2016
December 6, 2016
This week’s HonkinNews takes a look at Palantir’s appetite for money. We talk about an easy way to search a competitor’s marketing collateral. Yandex heads to Iran with VK.com likely to follow. Microsoft embraces quantum computing but we caution Microsoft not to confuse a qubit with the video game. There’s more too. We reveal the Web search market share of Excite.com. Exciting. Here’s the link.
Kenny Toth, December 6, 2016
November 29, 2016
This week’s HonkinNews covers IBM Watson QAM (not a yam and not part of an internal combustion engine). We also report that Palantir Technologies has stereophonic input to the Trump Transition Team. You will also learn about EasyAsk’s amazing guarantee regarding eCommerce revenues. The show includes another dispatch from the front lines of the artificial intelligence wars. Google is on the offensive. Hitachi aims to become the first Japanese company to notch a perfect score in the enterprise search high diving contest. If you thought Willie Shakespeare worked alone, we rain on your parade courtesy of text analytics researchers who identify Kit Marlowe’s digital fingerprints on Henry VI. Imagine. Theater types collaborating. We thought Hollywood types invented this approach to content. This program gives the dates for the three videos about Stephen E Arnold’s The Google Trilogy. You can access the video at this link.
Kenny Toth, November 29, 2016
November 22, 2016
This week’s HonkinNews talks about a Thanksgiving surprise for Verizon AOL professionals. Shakespeare’s idea about what to do with lawyers is revisited with a 21st century twist: Hit the delete key. Is there disappearing information in the Google index? We report on one interesting unfindable. Video search may improve, but for now, it’s a meh. HonkinNews points out that “fake news” is now a thing and offers three variations on spoofing the “experts”. We identify the two French enterprise search vendors who have transformed themselves into artificial intelligence vendors in a remarkable demonstration of their flexibility. IBM Watson’s new initiative hopes to get “into” a host of new revenue opportunities. You can view the program at this link. The special Google Trilogy programs filmed live in Harrod’s Creek become available starting on December 20, 2016. There are three seven minute videos in the series which summarizes the principal findings from The Google Legacy (2005), Google Version 2.0 (2007), and Google: The Digital Gutenberg (2009). The monographs are out of print but the information remains timely as Alphabet Google spells out its future.
Kenny Toth, November 22, 2016
November 21, 2016
Watching surveillance videos without sound? Wish you could read lips? Don’t have time to learn how to read lips? Alphabet Google’s DeepMind has a solution for anyone in this predicament. “Google’s DeepMind AI Can Lip-Read TV Shows Better Than a Pro” reveals:
A project by Google’s DeepMind and the University of Oxford applied deep learning to a huge data set of BBC programs to create a lip-reading system that leaves professionals in the dust.
How accurate is the system? The write up states:
The professional annotated just 12.4 per cent of words without any error. But the AI annotated 46.8 per cent of all words in the March to September data set without any error. And many of its mistakes were small slips, like missing an ‘s’ at the end of a word. With these results, the system also outperforms all other automatic lip-reading systems.
I think this means that the Google method is almost four times more accurate. Software is faster and does not require health care, vacation days, and coddling.
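For the record, the "almost four times" figure is simple division of the two percentages quoted above. A quick check, using only the numbers from the write up:

```python
human_accuracy = 12.4  # per cent of words the professional annotated without error
ai_accuracy = 46.8     # per cent of words the DeepMind system annotated without error

ratio = ai_accuracy / human_accuracy
print(f"{ratio:.2f}")  # prints 3.77
```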
The write up sidesteps law enforcement use of the system by emphasizing “improved hearing aids, silent dictation in public spaces, and speech recognition in noisy environments.”
There are other applications, however.
Stephen E Arnold, November 28, 2016
November 15, 2016
The weekly Beyond Search news video is available at this link. Stories include Mr. Thiel goes to Washington, the “best” entity extraction software and the not-so-best systems. You will learn the latest about the Yahoot security consequences, and more. The video also includes information about the US government’s open source code Web site. Stephen E Arnold points out that the Darpa Dark Web open source code is not included in the Code.gov offerings. Never fear. The Darpa listing does appear in the forthcoming Dark Web Notebook. If you want a copy of this new Beyond Search study, write benkent2020 at yahoo dot com and reserve your password protected PDF today.
Over the New Year’s break, HonkinNews will run three free, special seven minute programs, one per week, on December 20, December 27, and January 3, 2017. Each video presents the principal takeaways from Stephen E Arnold’s Google Trilogy: The Google Legacy (2004), Google Version 2 (2007), and Google: The Digital Gutenberg (2009). The information remains timely even though Alphabet Google is in a somewhat excited state of shifting in order to generate revenue as the volume of searches from the desktop declines, squishing Google’s online ad methods for old fashioned Internet access.
Kenny Toth, November 15, 2016
November 8, 2016
This week HonkinNews comments about Microsoft’s mobile phone adventure. You will learn about geospatial analytics companies that may have an impact in certain secret applications. Palantir makes news again. There is more. You can view the seven minute video at this link https://youtu.be/UWCk4n_AC0Y.
Kenny Toth, November 8, 2016
November 3, 2016
I am not sure if the Alphabet Google thing will be down with this new video search system over the long haul. If you want a different way to locate academic videos, you will want to explore MicroSearch’s system. MicroSearch says that it is “a boutique search engine company, providing private, secure video and document cloud storage as well as custom search services.”
According to the write up, the system aggregates university videos and:
includes a video player that shows the video playing on the left and a transcript tracking with the video on the right. Clicking into another sentence in the transcript jumps the user to that part of the video.
I highlighted this passage:
The service also includes a search tool that allows the user to search on transcript contents, title, description, duration, category, tags, YouTube channel and year uploaded. The same fields are available as metadata, when search results are displayed and downloaded as an Excel export file. An advanced search feature lets the user enter a few letters into the transcription field and then click on an Index button next to the field to obtain a window that displays all of the terms with that series of letters.
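The Index button described above amounts to a term lookup: collect every distinct word in the transcripts, then show the ones containing the letters typed. A minimal sketch, assuming a simple word tokenizer and a substring match. The function names and sample transcripts are hypothetical, and the write up does not explain how MicroSearch builds its term list.

```python
import re


def build_term_index(transcripts):
    """Collect the distinct lowercase terms found across all transcripts."""
    terms = set()
    for text in transcripts:
        terms.update(re.findall(r"[a-z0-9']+", text.lower()))
    return sorted(terms)


def terms_containing(term_index, letters):
    """Return every indexed term that contains the typed series of letters."""
    letters = letters.lower()
    return [term for term in term_index if letters in term]


# Hypothetical transcript snippets.
transcripts = [
    "Neural networks learn features.",
    "Network latency and learning rates.",
]
term_index = build_term_index(transcripts)
print(terms_containing(term_index, "netw"))  # prints ['network', 'networks']
```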
Our test queries suggested that the system is less wonky than Google’s video search. The fact that Google is splitting its text index into one part for mobile and one part for traditional desktop search makes clear that search at Google is a work in progress. With a new search system for a segment of YouTube videos, one can conclude that YouTube video search is not a home run for some users.
Perhaps more attention on search and less on Loon balloons might solve the problem. On the other hand, Alphabet Google can simply block developers of “better mousetraps” and move forward with its online advertising programs and projects like solving death. Search is for revenue and maybe not for finding relevant content?
Stephen E Arnold, November 3, 2016