A Taxonomy Vendor: Still Chugging Along
January 15, 2020
Semaphore Version 5 from Smartlogic coming soon.
An indexing software company, now morphed into a semantic AI outfit, Smartlogic promises Version 5 of its enterprise platform, Semaphore, will be available any time now.
The company modestly presents the announcement below the virtual fold in the company newsletter, “The Semaphore—Smartlogic’s Quarterly Newsletter—December 2019.” The General Access release should be out by the end of January. We’re succinctly informed because in indexing succinct is good:
“Semaphore 5 embodies innovative technologies and strategies to deliver a unified user experience, enhanced interoperability, and flexible integration:
*A single platform experience – modules are tightly integrated.
*Intuitive and simplified installation and administration – software can be download and configured with minimal clicks. An updated landing page allows you to quickly navigate modules and monitor status.
*Improved coupling of classification and language services, as well as improved performance.
*Updated the linguistic model and fact extraction capabilities.
*New – Document Semantic Analyzer – a performant content analyzer that provides detailed classification and language services results.
*New branding that aligns modules with capabilities and functionality.
“Semaphore 5 continues to focus around 3 core areas – Model & collaborate; fact extraction, auto-classification & language services; and integrate & visualize – in a modular platform that allows you to add capabilities as your business needs evolve. As you upgrade to Semaphore 5, you will be able to take advantage of the additional components and capabilities incorporated in your licensed modules.”
Semaphore is available on-premise, in the cloud, or a combination. Smartlogic (not to be confused with the custom app company Smartlogic) was founded in 2006 and is based in San Jose, California. The company owns SchemaLogic. Yep, we’re excited too. Maybe NLP, predictive analytics, and quantum computing technology will make a debut in this release. If not in software, perhaps in the marketing collateral?
Cynthia Murrell, January 15, 2020
An Interesting Hypothesis about Google Indexing
January 15, 2020
We noted “Google’s Crawl-Less Index.” The main idea is that something has changed in how Google indexes. We circled in yellow this statement from the article:
[Google] can do this now because they have a popular web browser, so they can retire their old method of discovering links and let the users do their crawling.
The statement needs context.
The speculation is that Google indexes a Web page only when a user visits a page. Google notes the behavior and indexes the page.
What’s happening, DarkCyber concludes, is that Google no longer brute force crawls the public Web. Indexing takes place when a signal (a human navigating to a page) is received. Then the page is indexed.
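To make the hypothesis concrete, here is a minimal Python sketch of signal-driven indexing: a page enters the index only when a visit signal arrives, and there is no crawl loop at all. Everything in it is an assumption made for illustration (the `SignalDrivenIndex` class, the `on_visit` hook, and the three-page fake Web); it is not a description of how Google's systems actually work.

```python
from typing import Callable, Dict, Set


class SignalDrivenIndex:
    """Toy inverted index that fetches a page only after a visit signal."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                      # hypothetical page fetcher
        self._postings: Dict[str, Set[str]] = {}
        self.indexed_urls: Set[str] = set()

    def on_visit(self, url: str) -> None:
        """Index (or refresh) a page because a user navigated to it."""
        text = self._fetch(url)
        # Drop stale postings for this URL before re-indexing it.
        for urls in self._postings.values():
            urls.discard(url)
        for term in set(text.lower().split()):
            self._postings.setdefault(term, set()).add(url)
        self.indexed_urls.add(url)

    def search(self, term: str) -> Set[str]:
        """Return the URLs whose indexed text contains the term."""
        return set(self._postings.get(term.lower(), set()))


# Hypothetical three-page "Web"; only one page ever receives a visit.
FAKE_WEB = {
    "https://example.com/news": "fresh pension news",
    "https://example.com/archive": "old pension backfile",
    "https://example.com/orphan": "pension page nobody visits",
}

index = SignalDrivenIndex(fetch=FAKE_WEB.__getitem__)
index.on_visit("https://example.com/news")   # the only user signal
print(index.search("pension"))
```

All three pages mention "pension," but only the visited page is findable. In this toy model, pages that attract no visitors never reach the index.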
Is this user-behavior centric indexing a reality?
DarkCyber has noted these characteristics of Google’s indexing in the last year:
- Certain sites are in the Google indexes but are either not updated or updated selectively; for example, the Railway Pension Retirement Board, MARAD, and similar sites.
- Large sites like the Auto Channel no longer have backfiles indexed and findable unless the user resorts to Google’s advanced search syntax. Then the results display less speedily than more current content, probably because Google does not keep infrequently accessed content in a cache close to that user.
- Current content for many specialist sites is not available when it is published. This is a characteristic of commercial sites with unusual domains like dot co and for some blogs.
What’s going on? DarkCyber believes that Google is trying to reduce the increasing and very difficult to control costs associated with indexing new content, indexing updated content (the deltas), and indexing the complicated content which Web sites generate in chasing the dream of becoming number one for a Google query.
Search efficiency, as we have documented in our write ups, books, and columns about Google, boils down to:
- Maximizing advertising value. That’s one reason why query expansion is used. Results match more ads and, thus, the advertiser’s ads get broader exposure.
- Getting away from the old school approach of indexing the billions of Web pages. 90 percent of these Web pages get zero traffic; therefore, index only what’s actually wanted by users. Today’s Google is not focused on library science, relevance, precision, and recall.
- Cutting costs. Cost control at the Google is very, very difficult. The crazy moonshots, the free form approach to management, the need for legions of lawyers and contract workers, the fines, the technical debt of a 20 year old company, the salaries, and the extras—each of these has to be controlled. The job is difficult.
Net net: Ever wonder why finding specific information is getting more difficult via Google? Money.
PS: Finding timely, accurate information and obtaining historical content are more difficult, in DarkCyber’s experience, than at any time since we sold our ThePoint service to Lycos in the mid 1990s.
Stephen E Arnold, January 15, 2020
Amazon: Maybe a Restraining Order to Halt JEDI Deal?
January 15, 2020
We noted “Amazon to Seek Order to Block Microsoft From Working on US DoD’s JEDI Contract.” The story appears to have originated with Thomson Reuters, so we assume it’s ethical and accurate and other good Thomsony stuff.
Here’s the passage we circled in true blue marker:
Amazon.com will ask a judge to temporarily block Microsoft from working on a $10 billion cloud contract from the Pentagon, a court filing showed on Monday [January 13, 2020]. Amazon, which was seen as a favorite for the contract, plans to file a motion for a temporary restraining order on January 24 and a federal court will issue its decision on February 11, according to the filing.
After years on the trail, if true, Amazon may be paying a visit to the Last Chance Saloon. The interaction may go something like this:
Barista or baristo: What will you have, partner?
Amazonian: One JEDI, please.
Barista or baristo: You are out of luck. The last one went to those nice people over there. They’ve been fussing with a Windows 10 laptop for nigh on one hour.
Amazonian: What else you got?
Barista or baristo: The next big shipment don’t arrive until October 1, 2020. Wanna wait, partner?
Amazonian: Nope. [Sound of a Bezos bulldozer starting up and grinding toward the Middle East.]
Stephen E Arnold, January 15, 2020
Education: Is the Future in the Hands of Google Type Companies?
January 15, 2020
I spotted a news item which would not be fodder for either this blog or our DarkCyber video program. Then one of the research team emailed me a link to an apparently unrelated article. Then it struck me: The future of education is probably going to be ceded to big companies and sources of revenue which may have interesting avocations.
Let me explain.
The first news item reports that “US Colleges Struggling with Low Enrollment Are Closing at Increasing Rate.” The article, from a source with which I am not familiar, asserts:
For 185 years this college campus in Vermont was teeming with students. Now it sits empty. In January, the school announced it would be closing. “I’ve had a very long professional career. It’s the hardest thing I’ve ever had to do – to stand in front – in our auditorium with 400 people and telling principally students, but faculty and staff, that we wouldn’t be opening this fall,” said Bob Allen, President at Green Mountain College.
Sure enough. The institution is a goner.
Then the article which I spotted but decided was not suitable for this blog. Its title? “UVM Gets $1 Million from Google for Open Source Research.” The write up from the delightfully named WCAX asserts:
The unrestricted gift is to support open-source research. Open source is a type of computer software, where source code is released under a license, and the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose.
We know that august institutions like the Massachusetts Institute of Technology will deal with individuals of questionable character when the cash pay off is big enough.
Let’s assume these items are accurate. Now let’s look into a future in which universities become increasingly desperate for money.
Who will provide the dough?
Answer: People who have the money and have a need.
Why? Let me suggest a few reasons:
- Access to lower cost talent
- Opportunity to recycle research into commercial products
- Force students to “like” big companies. See “‘Techlash’: Positive Perceptions of Facebook, Google Crumble on Campuses.”
So who owns what the grant money generates, particularly if the output is open source? What happens if Amazon uses Google funded open source as part of its platform? Who determines how the money is used or, in the case of MIT, how its origin is obfuscated? Is academic R&D a more efficient way to generate innovation?
Net net: The financial situation is likely to lead to the equivalent of corporate naming rights to NFL football stadia. And if you don’t like it, don’t attend.
Stephen E Arnold, January 15, 2020
Amazon and New, Quite Real Twitch Opportunity
January 14, 2020
In my lectures, I discuss Twitch. I won’t go into the examples of Twitch content in this blog. You can look for me at one of my law enforcement lectures this year.
I do want to call attention to “Twitch’s Non Gamers Are Finally Having Their Moment.” The write up includes an interesting factoid, which – like most Wired information – is super credible. Here’s the statement:
A new report from stream management site StreamElements indicates that in December, Twitch viewers watched 81 million hours of “Just Chatting,” Twitch’s category for streamers who do exactly that, plus any number of other grab-bag activities. That was a solid 7 million hours more than the first game listed, League of Legends, and 23 million more than the second, Fortnite. The popularity of “Just Chatting” is bleeding into January, too, and according to StreamElements, nongaming may be Twitch’s number two category in 2020.
Several observations:
- Microsoft and the GOOG are working hard to poach gamers from Twitch. This seems like a contentious issue for Amazon, and it will be interesting to see how the Bezos legal eagles respond to the talent drain. Maybe terminate their Prime accounts?
- The surge in Just Chatting viewing points to Twitch becoming the go to source for in real life streaming programs. Most programs are experimental, but a few of them – for example, BadBunny and the Raj thing – are starting to develop into a shotgun marriage of radio talk, live listener feedback, and visual content.
- Traditional content producers, like the people who create TV game shows, and wanna bes like Apple and Netflix look a bit old fashioned when compared to content generated by Awkwards_Travel, who may be the future of travel information.
There are downsides. If you are interested in our Amazon briefing which expands on the Twitch innovations and their downside, write darkcyber333 at yandex dot com.
Net net: Twitch started with egames, but it is now on a path to create something which complements games and creates a fresh approach to video.
Stephen E Arnold, January 14, 2020
DuckDuckGo Lands for European Search Users
January 14, 2020
I read “DuckDuckGo Beats Microsoft Bing In Google’s New Android Search Engine Ballot.” There have been numerous reports about this decision.
Digital Information World is a representative write up in today’s world of Google EU analysis. DarkCyber noted:
The introduction of this “choice screen” seems to be a clear response to the antitrust ruling from the European Union during last March and how Google was fined $5 billion by EU regulators. According to them, Google was playing illegally in tying up the search engine to its browser for mobile OS.
Okay. But how does a search engine get listed? We learned:
you can expect Google to not show search engines which are popular but the ones whose providers are willing to pay well.
The write up includes a run down of what search options will be displayed in each EU country. The ones we spotted are:
- DuckDuckGo
- GMX
- Info.com
- Privacy Wall
- Qwant
- Yandex.
Bing is a no show as are Gibiru, iSeek, Mojeek, Yippy, and others. It is worth noting that some of these outfits are metasearch engines. This means that the systems send queries to Bing, Google, and other services and aggregate the results. Dogpile and Vivisimo were metasearch engines. DuckDuckGo and Ixquick (StartPage) are metasearch engines. The reason metasearch is available boils down to cost. It is very expensive to index the public Web.
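The metasearch idea can be sketched in a few lines of Python: pass one query to several backends, then merge what comes back. The sketch below assumes two stand-in engines (`engine_a` and `engine_b`, both hypothetical and returning canned URLs) and uses a simple round-robin interleave with duplicate removal. Real metasearch services presumably apply more elaborate ranking, and nothing here reflects any vendor's actual method.

```python
from typing import Callable, Dict, List

Backend = Callable[[str], List[str]]


def metasearch(query: str, backends: Dict[str, Backend]) -> List[str]:
    """Query every backend, then interleave results and drop duplicate URLs."""
    result_lists = [backend(query) for backend in backends.values()]
    merged: List[str] = []
    seen = set()
    longest = max((len(results) for results in result_lists), default=0)
    # Round-robin: take each backend's first hit, then each second hit, etc.
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


# Hypothetical backends standing in for calls to third-party search services.
def engine_a(query: str) -> List[str]:
    return ["https://a.example/1", "https://shared.example/x"]


def engine_b(query: str) -> List[str]:
    return ["https://shared.example/x", "https://b.example/2"]


merged = metasearch("enterprise search", {"engine_a": engine_a, "engine_b": engine_b})
print(merged)
```

The metasearch operator stores no index of its own. The expensive crawling and indexing work stays with the backends, which is the cost point made above.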
The DarkCyber team formulated a few hypotheses about the auction, the limitations on default search engines, and the dominance of Google search in Europe; for example, Google accounts for more than 95 percent of the search traffic in Denmark. The same situation exists in Germany and other EU countries.
Will these choices make any difference? Sure, for small outfits like DuckDuckGo any increase in traffic is good news. But will the choices alter Google’s lock on search queries from Europe?
Not a chance.
Does anyone in the EU government know? Probably not. Do these people care? Not too much.
Remember one of my Laws of Information: Online generates natural monopolies. Here’s another Law: User behavior is almost impossible to change once mental memory locks in.
So Google gets paid and keeps on trucking.
Stephen E Arnold, January 14, 2020
Qatalyst Autonomy Presentation 2
January 14, 2020
DarkCyber spotted a link to a second presentation apparently prepared by Qatalyst Partners prior to Hewlett Packard’s purchase of Autonomy in 2011. This second slide deck covers:
- Historical trading performance and related financial data
- Shareholder ownership
- Comparative financial data; for example, Google, Oracle, HP, and other firms.
If you want to check out the first Qatalyst Autonomy presentation, you can find that document at this link. You may be able to locate other Autonomy documents via some scouting around on the Vdocuments.mx site.
These documents are almost a decade old, but they provide useful information for anyone considering an investment in or purchase of an organization engaged in enterprise search and text analysis software.
Documents like these provide some of the factual foundation we use in our reports and analyses. It is far easier to talk about the revenue potential of search and text processing. It is far more difficult to generate sustainable revenue and growing profits.
Why?
The reasons include:
- Ignoring the highly particularized nature of search and text analysis; that is, one size fits all doesn’t, so expensive, one off tailoring is required
- Making a search or text analysis sale is time consuming. The reasons range from “we have been burned before” to “this got the previous information people fired.”
- Keeping the search and text analysis system up and running is expensive.
- Staying competitive is very expensive. Innovation is easy to talk about but difficult to deliver.
- Growth requires acquisitions, and these just add to the technical debt the acquirer must generate money to pay down.
Net net: Documents like these are useful and often difficult to obtain.
Stephen E Arnold, January 14, 2020
DarkCyber for January 14, 2020, Now Available
January 14, 2020
The DarkCyber for January 14, 2020, is now available. The program includes stories about ToTok, cyber trends in 2020, and information about the new Amazon Blockchain Policeware report. You can view the video on Vimeo at this link: https://vimeo.com/384343454.
We want to thank the people who commented on our interview with Robert David Steele. We posted this video on December 31, 2019. If you missed that program, you can view it at this link: https://vimeo.com/382165736.
Kenny Toth, January 14, 2020
Enterprise Search and the AI Autumn
January 13, 2020
DarkCyber noted this BBC write up: “Researchers: Are We on the Cusp of an AI Winter?” Our interpretation of the Beeb story can be summarized this way:
“Yikes. Maybe this stuff doesn’t work very well?”
The Beeb explains in Queen’s English based on quotes of experts:
Gary Marcus, an AI researcher at New York University, said: “By the end of the decade there was a growing realization that current techniques can only carry us so far.”
He [Gary Marcus, an AI wizard at NYU] thinks the industry needs some “real innovation” to go further. “There is a general feeling of plateau,” said Verena Rieser, a professor in conversational AI at Edinburgh’s Heriot-Watt University. One AI researcher who wishes to remain anonymous said we’re entering a period where we are especially skeptical about AGI.
Well, maybe.
But the enterprise search cheerleaders have not gotten the memo. The current crop of “tap your existing unstructured information” companies assert that artificial intelligence infuses their often decades old systems with zip.
The story is being believed by venture outfits. The search for the next big thing is leading to making sense of unstructured text. After all, the world is awash in unstructured text. Companies have to solve this problem or red ink and extinction are just around the corner.
Net net: AI is a collection of tools, some useful, some not too useful. Enterprise search vendors are looking for a way to make sales to executives who don’t know or don’t care about past failures to index unstructured text on a company wide basis with a single system.
Stephen E Arnold, January 13, 2020
Search Your Computer
January 13, 2020
On January 10, 2020, one of the DarkCyber team needed to locate a file on a Windows 10 machine. Windows 10 search was okay, but it generated false drops and took too long.
DarkCyber tried to get its copy of ISYS Desktop Search 8 to work, but that was a non starter. We had given up on Copernic a couple of versions ago. The DTSearch trial had expired, as had a couple of New Age search systems that vendors had provided to us to test; for example, X1, Vound, and Perfect Search, among others. Elastic was overkill. Yikes.
We then checked our files for “desktop search” and located links to these articles:
- Microsoft Windows 10 Search Indexer Diagnostics
- WizFile Is an Ultra-Fast Windows Search Tool
- UltraSearch, Fast Windows File Finder
- Everything Desktop Search
- FileSearchy Is a Fast Windows Search Alternative
- Windows Search Replacement Fileseek
- SearchMyFiles, A Versatile Desktop Search for Windows
We found a couple of these programs useful. In fact, the Everything software, version 1.4, did the trick for us.
We want to thank Martin Brinkmann for his articles which provided useful links and helpful information to us. Good job!
Stephen E Arnold, January 13, 2020