Index and Search: The Threat Intel Positioning
December 24, 2015
The Dark Web is out there. Not surprisingly, there are a number of companies indexing Dark Web content. One of these firms is Digital Shadows. I learned in “Cyber Threat Intelligence and the Market of One” that search and retrieval has a new suit of clothes. The write up states:
Cyber situational awareness shifts from only delivering generic threat intelligence that informs, to also delivering specific information to defend against adversaries launching targeted attacks against an organization or individual(s) within an organization. Cyber situational awareness brings together all the information that an organization possesses about itself such as its people, risk posture, attack surface, entire digital footprint and digital shadow (a subset of a digital footprint that consists of exposed personal, technical or organizational information that is often highly confidential, sensitive or proprietary). Information is gathered by examining millions of social sites, cloud-based file sharing sites and other points of compromise across a multi-lingual, global environment spanning the visible, dark and deep web.
The approach seems to echo the Palantir “platform” approach. Palantir, one must not forget, is a 2015 version of the Autonomy platform. The notion is that content is acquired, federated, and made useful via outputs and user friendly controls.
What’s interesting is that Digital Shadows indexes content and provides a search system to authorized users. Commercial access is available via a tie-up in the UK.
My point is that search is alive and well. The positioning of search and retrieval is undergoing some fitting and tucking. There are new terms, new rationales for business cases (fear is workable today), and new players. Under the surface are crawlers, indexes, and search functions.
The death of search may be news to the new players like Digital Shadows, Palantir, and Recorded Future, among numerous other shape shifters.
Stephen E Arnold, December 24, 2015
GoDaddy and Search
December 23, 2015
Years ago, I understood that GoDaddy, the domain name outfit, purchased a search company doing business as Innerprise. The press release issued in September 2004 said:
[GoDaddy] will incorporate Innerprise’s search products, including Enterprise Search 2004 and Innerprise Hosted Search, into the GoDaddy product catalog, augmenting the Company’s complete line of Web development tools including domain name registration, hosting, email systems, SSL certificates, and other complementary products and services that assist customers in building and maintaining a presence on the Internet.
I learned in “Why GoDaddy Built Its Search Engine from Scratch”:
GoDaddy, seeking to improve customer service, built a custom search engine that generates domain names on the fly for its small business customers. Building it wasn’t the best option, the company’s executives say. It was the only option.
The write up points out:
Custom software development is the preferred approach among online businesses.
Here’s how GoDaddy met its need for a search system:
… The engineering feat required GoDaddy to create search crawlers that can traverse hundreds of international registries, including in South Africa and Indonesia, generating tens of thousands of potential domain names in near real-time. The company also built machine-learning algorithms, in conjunction with open source Hadoop data processing software, to help surface the best domain names it can.
The write up does not reference the Innerprise solution. There is no hint of the cost of the system. The message is that an enterprising Yahoo alum can build a search engine from scratch, and you should too. There’s a new project for your New Year’s resolutions list.
Stephen E Arnold, December 23, 2015
RankBrain, the Latest AI from Google, Improves Search Through Understanding and Learning
December 23, 2015
The Entrepreneur article titled “Meet RankBrain, the New AI Behind Google’s Search Results” introduces the AI that Google believes will help its search engine better understand the queries it receives. RankBrain can connect related words to search terms based on context and relevance. The article explains:
“The real intention of this AI wasn’t to change visitors’ search engine results pages (SERPs) — rather, it was to predict them. As a machine-learning system, RankBrain actually teaches itself how to do something instead of needing a human to program it… According to Jack Clark, writing for Bloomberg on the topic: ‘[RankBrain] uses artificial intelligence to embed vast amounts of written language into mathematical entities — called vectors — that the computer can understand.’”
Google scientist Greg Corrado said RankBrain actually exceeded his expectations. In one experiment, RankBrain beat a team of search engineers at predicting which pages would rank highest. (The engineers were right 70% of the time, RankBrain 80%.) The article also addresses concerns that brands relying on SEO may have. It ventures that newer brands and services will be the main ones to see a ranking shift. But of course, with impending updates, that may change.
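Google has not published RankBrain’s internals, so what follows is only a toy illustration of the general “vectors” idea in the Bloomberg quote. The word list and the three-number vectors are invented for the example (real embeddings are learned from large amounts of text and have many more dimensions), and related words are found with cosine similarity, a common way to compare such vectors. None of this is Google’s code.

```python
import math

# Hypothetical, hand-picked 3-dimensional vectors. These numbers are invented
# for illustration; real embeddings are learned from text and are far larger.
TOY_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def related_terms(term, vectors, top_k=2):
    """Rank the other vocabulary words by similarity to `term`."""
    query = vectors[term]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for word, score in related_terms("car", TOY_VECTORS):
        print(f"{word}: {score:.3f}")
```

With these made-up numbers, “automobile” and “vehicle” come out closest to “car,” while “banana” trails far behind.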
Chelsea Kerwin, December 23, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Google Clamps down on Surprise Costs in BigQuery
December 23, 2015
The Fortune article titled “Google Promises to Rein in Runaway Query Costs” discusses the obstacles facing Google’s BigQuery data tool. Google hopes to make BigQuery a major resource for big companies considering cloud technology, but unpredictable costs undercut the “low-cost big data analytics option” marketing that Google has deployed. Hence the introduction of “custom quota” and Query Explain:
“Google is now offering potential inquisitors a way to set a ‘custom quota’ to ensure that the number crunching on a specified project does not exceed a pre-set daily limit. In addition, a Query Explain feature promises to lay out, how BigQuery will go about processing the question on the table in advance. That way, in theory, you can see if your questions will be ‘write, read, or compute heavy’ and better anticipate where performance bottlenecks could lurk…”
One might fairly ask why these services were so long in coming, since customers are not known for their fondness for mobile phone-style billing surprises. Amazon is also standing next to Google, waving Redshift, a BigQuery competitor, in the air. But BigQuery’s simpler pricing and efficiency might appeal to many companies, especially with the more controlled processes now available.
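The Fortune excerpt describes the custom quota only at a high level. Here is a minimal Python sketch of the general idea of a pre-set daily limit, assuming each query’s cost can be estimated in bytes before it runs. The `DailyQuota` class and its `try_charge` method are hypothetical names for illustration, not BigQuery’s actual API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DailyQuota:
    """Hypothetical per-project daily budget, measured in bytes scanned.

    Illustrative only: this is not Google BigQuery's API.
    """

    limit_bytes: int
    used_bytes: int = 0
    day: date = field(default_factory=date.today)

    def _roll_over(self, today: date) -> None:
        # A new calendar day starts a fresh budget.
        if today != self.day:
            self.day = today
            self.used_bytes = 0

    def try_charge(self, estimated_bytes: int, today: Optional[date] = None) -> bool:
        """Admit a query only if its up-front estimate fits today's remaining budget."""
        self._roll_over(today if today is not None else date.today())
        if self.used_bytes + estimated_bytes > self.limit_bytes:
            return False  # would exceed the pre-set daily limit, so refuse it
        self.used_bytes += estimated_bytes
        return True


if __name__ == "__main__":
    quota = DailyQuota(limit_bytes=1_000)
    print(quota.try_charge(600))  # fits the budget
    print(quota.try_charge(500))  # would push the total past 1,000, so refused
```

The design choice is to refuse a query before it runs rather than bill for it afterward, which is the whole point of avoiding billing surprises.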
Chelsea Kerwin, December 23, 2015
Top Trends for Cyber Security and Analytics in 2016
December 23, 2015
With the end of the year approaching, people try to predict what will happen in the New Year. The New Year brings on a sort of fortunetelling, because companies that correctly predict what will happen in 2016 can look forward to healthier profit margins and a healthier customer base. The IT industry has its own share of New Year soothsayers, and the Executive Biz blog post “Booz Allen Cites Top Cyber, Analytics Trends In 2016; Bill Stewart Comments” outlines possible trends in cyber security and data analytics for the coming year.
Booz Allen Hamilton says that companies will want to merge analytical programs with security programs to obtain data sets that show network vulnerabilities; these combined operations have been dubbed “fusion centers.”
“ ‘As cyber risk and advanced analytics demand increasing attention from the C-suite, we are about to enter a fundamentally different period,’ said Bill Stewart, executive vice president and leader of commercial cyber business at Booz Allen. ‘The dynamics will change… Skilled leaders will factor these changing dynamics into their planning, investments and operations.’”
There will also be increased risks coming from the Dark Web and from connected systems, such as cloud storage. Booz Allen also hints that companies will need skilled professionals who know how to manage cyber security risks and harness analytics. That suggestion is not new; it has been discussed since 2014. While the threats from the Internet and the vulnerabilities within systems have increased, experts in these areas and better programs to handle them have always been needed. Booz Allen is restating the obvious. The biggest problem is that companies are not aware of these risks and usually lack the budget to implement preemptive measures.
Whitney Grace, December 23, 2015
Gibiru Compromised?
December 22, 2015
I assume, gentle reader, that you are aware of the anonymizing search system called Gibiru. Today (December 22, 2015) I received this notification when I attempted to run a query about Palantir on this search system:
The Kaspersky information link is a 404. I located no substantive information about this possible issue when I poked around online. I had in my files a link to https://anonymous-gibiru.com/ which did not trigger the malicious file warning.
Stephen E Arnold, December 22, 2015
Google Search, 1 – Chinese Search Engines, 0
December 22, 2015
Google is number one in search. The company may not be the top of the Chinese government’s Hit Parade, but the firm’s Web search is the premier’s pajamas.
Need proof? Navigate to “Chinese Search Engines Still No Match for Google.” The write up explains:
At the moment, there is no better search engine than Google. But the company refused to abide by China’s Internet management system, and it decided to withdraw from the mainland in 2010, leaving Chinese Internet users behind. In the past five years, a possible comeback has been investigated, but so far the company’s China focus only seems to be on its Google Play Store for the Android operating system. Google still appears to be reluctant to reboot its search business in China, even though this is what would benefit the majority of domestic Internet users the most. The major consideration as far as Google’s decision makers are concerned is the company’s interests, rather than users’ needs.
Okay, that’s sort of substantive in a subjective way.
The write up continues:
To ensure that China’s Internet users enjoy better search options, relevant laws and regulations should be implemented to restrict the over-commercialization of domestic search engines; and the fairness and neutrality of the search engines should be strengthened so that the search results they offer are not molded by commercial considerations. For instance, search results should not be allowed to be mixed with content from ad auctions. In this way a separation between regular search results and commercial ad content can be ensured.
There you go.
I once liked www.jike.com. The service now redirects to www.chinaso.com. The results are reasonably useful, but no English language option jumped out at me.
For now, let’s assume that Chinese information retrieval systems are not up to the Google standard. You know what that means: filtered results, advertising, and silos of indexed information so one can enjoy the thrill of looking for indexed blog content. Hint: Look on the Google News page. The holiday is here, so you, gentle reader, can explore the wonder of the Google News interface.
Stephen E Arnold, December 22, 2015
Microsoft Drops Bing from Pulse, Adds Azure Media Services
December 22, 2015
The VentureBeat article titled “Microsoft Rebrands Bing Pulse to Microsoft Pulse, extends Snapshot API” raises the question: Is Bing a dead-end brand? According to the article, the rebranding is meant to emphasize that the resource integrates with Microsoft technologies like Power BI, OneNote, and Azure Media Services. It has only been about a year since the original self-service tool was released for broadcast TV and media companies. The article states:
“The launch comes a year after Bing Pulse hit version 2.0 with the introduction of a cloud-based self-service option. Microsoft is today showing a few improvements to the tool, including a greatly enhanced Snapshot application programming interface (API) that allows developers to pull data from Microsoft Pulse into Microsoft’s own Power BI tool or other business intelligence software. Previously it was only possible to use the API with broadcast-specific technologies.”
The news isn’t good for Bing, with Pulse gaining popularity as a crowdsourcing resource among such organizations as CNN, CNBC, the Aspen Institute, and the Clinton Global Initiative. It is meant to be versatile and targeted for broadcast, events, market research, and classroom use. Dropping Bing from the name may indicate that Pulse is moving forward, and leaving Bing in the dust.
Chelsea Kerwin, December 22, 2015
Search Vendors Under Pressure: Welcome to 2016
December 21, 2015
I read “Silicon Valley’s Cash Party Is Coming to an End.” What took so long? I suppose reality is less fun than fantasy. Why watch a science documentary when one can get lost in a Netflix binge?
The write up reports:
Based on interviews with about two dozen venture capitalists and tech investors, 2016 is shaping up to be a year of reckoning for scores of technology start-ups that have yet to prove out their business models and equally challenging for those that raised money at unjustifiably high prices.
Forget the unicorns. There are some enterprise search outfits which have ingested millions of dollars and have convinced investors that big revenue or an HP-Autonomy-scale buyout is just around the corner, and that proprietary technology or consulting plus open source will produce gushers of organic revenue. Other vendors have tapped their moms, their nest eggs, and angels who believe in fairies.
I am not sure there is a General Leia Organa to fight Star Wars: The Revenue Battle for most vendors of search and content processing. Bummer. Despite the lack of media coverage for search and content processing vendors, the number of companies pitching information access is hefty. I track about 200 outfits, but many of these are unknown, either because they don’t want to be visible or because they lack any substantive “newsy” magnetism.
My hunch is that 2016 will be different from the free money era the article says is ending. In 2016, my view is that many vendors will find themselves in a modest tussle with their stakeholders. I worked through some of the search and content processing companies taking cash from folks with deep pockets, often filled with other people’s money. (Note that investment totals come from Crunchbase.) Here’s a list of search and content processing vendors who may face stakeholder and investor pressure. The more money ingested, the greater the interest investors may have in getting a return:
- Antidot, $3 million
- Attensity, $90 million
- Attivio, $71 million
- BA Insight, $14 million
- Connotate, $12 million
- Coveo, $69 million
- Digital Reasoning, $28 million
- Elastic (formerly Elasticsearch), $104 million
- Lucidworks, $53 million
- MarkLogic, $175 million
- Perfect Search, $4 million
- Palantir, $1.7 billion
- Recommind, $22 million
- Sinequa, $5 million
- Sophia Ambiance, $5 million
- X1, $12 million
Then there are the search systems which have been acquired. One assumes these deals will have to produce sustainable revenues in some form:
- Hewlett Packard with Autonomy
- IBM with Vivisimo
- Dassault Systèmes with Exalead
- Lexmark with Brainware and ISYS Search
- Microsoft with Fast Search
- OpenText with BASIS, BRS, Fulcrum, and Nstein
- Oracle with Endeca, InQuira, and RightNow
- Thomson Reuters with Solcara
Are there sufficient prospects to generate deals large enough to keep these outfits afloat?
Then there are search and content processing vendors competing for sales against free and open source options and against other vendors with proprietary software:
- Ami Albert
- Content Analyst
- Concept Searching
- dtSearch
- EasyAsk
- Exorbyte
- Fabasoft Mindbreeze
- Funnelback
- IHS Goldfire
- SLI Systems
- Smartlogic
- Sprylogics
- SurfRay
- Thunderstone
- WCC Elise
- Zaizi
These search vendors plus many smaller outfits like Intrafind and Srch2 have to find a way to close deals to avoid the fate of Arikus, Convera, Delphes, Dieselpoint, Entopia, Hakia, Kartoo, NuTech Search, and Siderean Software, among others.
Despite the lack of coverage from mid-tier consultants and “real” journalists, the information access sector is moving along. In fact, when one looks at the software options, search and content processing vendors are easily found.
The problem for 2016 will be making sales, generating sustainable revenues, and paying back stakeholders. In the new year, a number of these outfits will go dark. A few will thrive.
Darned exciting times in findability.
Stephen E Arnold, December 21, 2015
Topology Is Finally on Top
December 21, 2015
Topology’s time has finally come, according to “The Unreasonable Usefulness of Imagining You Live in a Rubbery World,” shared by 3 Quarks Daily. The engaging article reminds us that the field of topology emphasizes connections over geometric factors like distance and direction. Think of a subway map as compared to a street map; or, as writer Jonathan Kujawa describes:
“Topologists ask a question which at first sounds ridiculous: ‘What can you say about the shape of an object if you have no concern for lengths, angles, areas, or volumes?’ They imagine a world where everything is made of silly putty. You can bend, stretch, and distort objects as much as you like. What is forbidden is cutting and gluing. Otherwise pretty much anything goes.”
Since the beginning, this perspective has been dismissed by many as purely academic. However, today’s era of networks and big data has boosted the field’s usefulness. The article observes:
“A remarkable new application of topology has emerged in the last few years. Gunnar Carlsson is a mathematician at Stanford who uses topology to extract meaningful information from large data sets. He and others invented a new field of mathematics called Topological data analysis. They use the tools of topology to wrangle huge data sets. In addition to the networks mentioned above, Big Data has given us Brobdinagian sized data sets in which, for example, we would like to be able to identify clusters. We might be able to visually identify clusters if the data points depend on only one or two variables so that they can be drawn in two or three dimensions.”
Kujawa goes on to note that one century-old tool of topology, homology, is being used to analyze real-world data, like the ways diabetes patients have responded to a specific medication. See the well-illustrated article for further discussion.
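Homology in full generality is too much for a blog post, but its simplest piece, counting connected components (the zeroth Betti number), already captures the cluster idea in the quote. The Python sketch below links any two points within a distance `epsilon` of each other and counts the resulting groups with a union-find structure. The sample points and thresholds are made up for illustration; this is not Carlsson’s software or any particular topological data analysis library.

```python
import math
from itertools import combinations

# Hypothetical sample data: two small, well-separated groups of points.
SAMPLE_POINTS = [(0, 0), (0.5, 0), (0, 0.5), (5, 5), (5.5, 5), (5, 5.5)]


def count_components(points, epsilon):
    """Count connected components when points at most `epsilon` apart are linked.

    This equals the zeroth Betti number of the Vietoris-Rips complex at scale
    `epsilon`, i.e. the number of clusters at that scale.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= epsilon:
            parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(len(points))})


if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(eps, count_components(SAMPLE_POINTS, eps))
```

Watching how the count changes as `epsilon` grows (six isolated points, then two clusters, then one blob) is the intuition behind persistent homology: structure that survives across many scales is more likely to be real.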
Cynthia Murrell, December 21, 2015