July 11, 2016
If you want to buy some Microsoft smart APIs, now is the time. Navigate to Microsoft Azure and pick your API. On offer are some content processing APIs like text search, image search, autosuggest, etc. How much are these goodies? Well, the fee varies with the number of transactions. What’s a “transaction”? Like Amazon AWS, you will find that out as you move forward, gentle reader. Here’s the display for the search API fees:
I know that these low contrast Web pages are just so easy to read. In a nutshell, you will owe the Microsofties by tier. The S1, S2, etc. remind me of IBM’s tiered prices. The number depends on how many transactions, which tier, and, I assume, any other special goodies one requires. Think in terms of blocks of $30.
Enjoy the taxi meter approach. In my experience, these work out really well for those selling services. I love metered, tiered prices with “transactions” left wonderfully fluid. Does the phrase “lock in” resonate? Does the concept of “price lift” have relevance? Have fun budgeting costs over a three to five year span.
Stephen E Arnold, July 11, 2016
July 11, 2016
Tech-security firm BAE Systems has sketched out six cybercriminal types, we learn from “BAE Systems Unmasks Today’s Cybercriminals” at the MENA Herald. We’re told the full descriptions reveal the kinds of havoc each type can wreak, as well as targeted advice for thwarting them. The article explains:
“Threat intelligence experts at BAE Systems have revealed ‘The Unusual Suspects’, built on research that demonstrates the motivations and methods of the most common types of cybercriminal. The research, which is derived from expert analysis of thousands of cyber attacks on businesses around the world. The intention is to help enterprises understand the enemies they face so they can better defend against cyber attack.”
Apparently, such intel is especially needed in the Middle East, where cybercrime was recently found to affect about 30 percent of organizations. Despite the danger, the same study from PwC found that regional companies were not only unprepared for cyber attacks, many did not even understand the risks.
The article lists the six cybercriminal types BAE has profiled:
“The Mule – naive opportunists that may not even realise they work for criminal gangs to launder money;
The Professional – career criminals who ‘work’ 9-5 in the digital shadows;
The Nation State Actor – individuals who work directly or indirectly for their government to steal sensitive information and disrupt enemies’ capabilities;
The Activist – motivated to change the world via questionable means;
The Getaway – the youthful teenager who can escape a custodial sentence due to their age;
The Insider – disillusioned, blackmailed or even over-helpful employees operating from within the walls of their own company.”
Operating in more than 40 countries, BAE Systems is committed to its global perspective. Alongside its software division, the company also produces military equipment and vehicles. Founded in 1999, the company went public in 2013. Unsurprisingly, BAE’s headquarters are in Arlington, Virginia, just outside of Washington DC. As of this writing, they are also hiring in several locations.
Cynthia Murrell, July 11, 2016
July 8, 2016
Another day, another merger. PR Newswire released a story, VirtualWorks and Language Tools Announce Merger, which covers Virtual Works’ purchase of Language Tools. In Language Tools, they will inherit computational linguistics and natural language processing technologies. Virtual Works is an enterprise search firm. Erik Baklid, Chief Executive Officer of VirtualWorks is quoted in the article,
“We are incredibly excited about what this combined merger means to the future of our business. The potential to analyze and make sense of the vast unstructured data that exists for enterprises, both internally and externally, cannot be understated. Our underlying technology offers a sophisticated solution to extract meaning from text in a systematic way without the shortcomings of machine learning. We are well positioned to bring to market applications that provide insight, never before possible, into the vast majority of data that is out there.”
This is another case of a company positioning itself as a leader in enterprise search. Is it anything special? Well, the news release mentions that several core technologies will be bolstered by the merger: text analytics, data management, and discovery techniques. We will have to wait and see what the future holds for the company in the enterprise search and business intelligence sector it seeks to lead.
July 7, 2016
I read “Information about DuckDuckGo’s Partnership with Yahoo.” Yahoo is into search DuckDuckGo style. According to the write up:
our latest partnership with Yahoo enables DuckDuckGo to get access to features you’ve been requesting for years:
Date filters let you filter results from the last day, week and month.
Site links help you quickly get to subsections of sites.
Farewell, Inktomi, AllTheWeb, Google, Microsoft, and home brew craziness. Yahoo has a new findability future. Now about the size of the index? Will Yahoo’s new owner have a fresh idea? Worth watching.
Stephen E Arnold, July 7, 2016
July 6, 2016
I read “New Search Engine Makes Data Instantly Searchable, Increases Data Retention.” I like that instantly assertion. Keep in mind that for me “instantly” means immediately. Okay. I also noted the “everything.” Bold assertions.
According to the write up:
This search engine makes data instantly searchable and increases data retention. It’s also crafted so that the archived data doesn’t impact the current inflow. Another feature is an anomaly system, with everything available at the user’s fingertips.
What is Rocana besides a search engine? Well, it turns out that the company provides this:
Limitless online access and analysis of all operational data gives CIOs and technology leaders a distinct competitive edge in today’s digital economy.
So we have an “all” tossed in for good measure in the explanation of Rocana. A video explains the niche the company’s search technology targets: operational data and anomaly detection. The search engine scales to “volumes of data that none of our competitors can achieve.” The company delivers “results that matter.” The company is going “beyond search.” There you go. Instantly. Everything.
Stephen E Arnold, July 6, 2016
July 5, 2016
For research purposes, I surf the Dark Web on a regular basis. It is like skulking around the back alleys of a major city and witnessing all types of crime, but keeping to yourself. I have seen a few Web sites that could be deemed as legal, but most of the content I peruse is illegal: child pornography, selling prescription drugs, and even a hitman service. I have begun to think that everything on the Dark Web is illegal, except Help Net Security tells me that “Dark Web Mapping Reveals That Half Of The Content Is Legal.”
The Centre for International Governance Innovation (CIGI) conducted a global survey and discovered that seven in ten (71%) of the respondents believe the Dark Web needs to be shut down. There is speculation about whether the participants even had the right definition of the Dark Web and might have confused the terms “Dark Web” and “Dark Net”.
Darksum, however, mapped the Tor end of the Dark Web and discovered some interesting facts:
- “Of the 29,532 .onion identified during the sampling period – two weeks in February 2016 – only 46 percent could actually be accessed. The rest were likely short-lived C&C servers used to manage malware, chat clients, or file-sharing applications.
- Of those that have been accessed and analyzed with the companies’ “machine-learning” classification method, less than half (48%) can be classified as illegal under UK and US law. A separate manual classification of 1,000 sites found about 68% of the content to be illegal under those same laws.”
Darksum’s goal is to clear up misconceptions about the Dark Web and to better understand what is actually on the hidden sector of the Internet. The biggest hope is to demonstrate the Dark Web’s benefits.
July 1, 2016
Elastic’s Elasticsearch has become one of the go to open source search and retrieval solutions. Based on Lucene, the system has put the heat on some of the other open source centric search vendors. However, search is a tricky beastie.
Navigate to “AWS Elasticsearch Service Woes” to get a glimpse of some of the snags which can poke holes in one’s rip stop hiking garb. The problems are not surprising. One does not know what issues will arise until a search system is deployed and the lucky users are banging away with their queries or a happy administrator discovers that Button A no longer works.
The write up states:
We kept coming across OOM issues due the JVMMemoryPresure spiking and inturn the ES service kept crapping out. Aside from some optimization work, we’d more than likely have to add more boxes/resources to the cluster which then means more things to manage. This is when we thought, “Hey, AWS have a service for this right? Let’s give that a crack?!”. As great as having it as a service is, it certainly comes with some fairly irritating pitfalls which then causes you to approach the situation from a different angle.
One approach is to use templates to deal with the implementation of shard management in AWS Elasticsearch. Sample templates are provided in the write up. The fix does not address some issues. The article provides a link to a reindexing tool called es-tool.
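For readers who have not bumped into index templates, here is a minimal sketch of the idea: a template tells Elasticsearch which shard settings to apply to any new index whose name matches a pattern. This is not the template from the write up. The index pattern `logs-*`, the shard and replica counts, and the helper name `build_shard_template` are all hypothetical, and the body uses the legacy `template` key from the Elasticsearch 1.x/2.x era (newer releases use `index_patterns` instead). The code only builds and prints the JSON; it does not talk to a cluster.

```python
import json


def build_shard_template(index_pattern, shards, replicas):
    """Return a legacy-style index template body that pins shard settings."""
    return {
        # Legacy templates (Elasticsearch 1.x/2.x era) match index names
        # with the "template" key; this is an assumption about the version.
        "template": index_pattern,
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
    }


# Hypothetical values: any index named logs-* gets 2 primaries and 1 replica.
template_body = build_shard_template("logs-*", shards=2, replicas=1)

# This JSON is what would go in the body of a
# PUT /_template/<template-name> request to the cluster.
print(json.dumps(template_body, indent=2, sort_keys=True))
```

The point of the approach is that shard counts are decided once, in the template, rather than per index at creation time. Existing indexes are not touched, which is presumably why a separate reindexing step is still needed.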
The most interesting comment in the article in my opinion is:
In hindsight I think it may have been worth potentially sticking with and fleshing out the old implementation of Elasticsearch, instead of having to fudge various things with the AWS ES service. On the other hand it has relieved some of the operational overhead, and in terms of scaling I am literally a couple of clicks away. If you have large amounts of data you pump into Elasticsearch and you require granular control, AWS ES is not the solution for you. However if you need a quick and simple Elasticsearch and Kibana solution, then look no further.
My takeaway is to do some thinking about the strengths and weaknesses of Amazon AWS before chopping through the Bezos cloud jungle.
Stephen E Arnold, July 1, 2016
July 1, 2016
The Tor-enabled search engine DuckDuckGo has received attention recently for being a search engine that does not track users. We found their activity report that shows a one year average of their direct queries per day. DuckDuckGo launched in 2008 and offers an array of options to prevent “search leakage”. Their website defines this term as the sharing of personal information, such as the search terms queried. Explaining a few of DuckDuckGo’s more secure search options, their website states:
“Another way to prevent search leakage is by using something called a POST request, which has the effect of not showing your search in your browser, and, as a consequence, does not send it to other sites. You can turn on POST requests on our settings page, but it has its own issues. POST requests usually break browser back buttons, and they make it impossible for you to easily share your search by copying and pasting it out of your Web browser’s address bar.
Finally, if you want to prevent sites from knowing you visited them at all, you can use a proxy like Tor. DuckDuckGo actually operates a Tor exit enclave, which means you can get end to end anonymous and encrypted searching using Tor & DDG together.”
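The POST point in the quote is easier to see with a small sketch. With a GET request the search terms ride along in the URL, which is what ends up in the address bar and can be passed on to other sites; with a POST request the same terms travel in the request body and the URL stays bare. The endpoint URL and the `q` parameter name below are assumptions for illustration only, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://duckduckgo.com/"  # assumed endpoint, illustration only
params = urlencode({"q": "dark web mapping"})  # "q" is an assumed parameter name

# GET: the search terms become part of the URL itself.
get_req = Request(SEARCH_URL + "?" + params)

# POST: the same terms travel in the request body; the URL carries no query.
post_req = Request(SEARCH_URL, data=params.encode("ascii"))

# Requests are only constructed here, never sent.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url)
```

The trade-off described in the quote follows directly: because the POST URL carries no query, there is nothing in the address bar to copy and share, and the browser has to resubmit the form to go back.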
Cybersecurity and privacy have become hot topics since Edward Snowden made headlines in 2013, which is notably when DuckDuckGo’s exponential growth begins to take shape. Recognition of Tor also became more mainstream around that time, 2013, which is when the Silk Road shutdown occurred, placing the Dark Web in the news. It appears that starting a search engine focused on anonymity in 2008 was not such a bad idea.
Megan Feil, July 1, 2016
June 29, 2016
Navigating the Dark Web can be a hassle, because many of the Web sites are shut down before you have the chance to learn what nefarious content, services, or goods are available. Some of these sites go down on their own, but law enforcement has had a part in dismantling them as well. Some Dark Web sites are too big and encrypted to be taken down, and sometimes they change hands, such as Silk Road and now Hell. Motherboard explains that “Dark Web Hacking Forum ‘Hell’ Appears To Have New Owners.”
The Real Deal, a computer exploit market, claimed to take ownership of Hell, the hacking forum known for spreading large data dumps and stolen data. Real Deal said of their acquisition:
“ ‘We will be removing the invite-only system for at least a week, and leave the “vetting” forum for new users,’ one of The Real Deal admins, who also used the handle The Real Deal, told Motherboard in an encrypted chat. ‘It’s always nice to have a professional community that meets our market’s original niche, hopefully it will bring some more talent both to the market and to the forums,’ the admin continued. ‘And it’s no secret that we as admins would enjoy the benefit of ‘first dibs’ on buying fresh data, resources, tools, etc.’”
The only part of Hell that has new administrators is the forum, because the old head had personal matters that required more attention. Hell is one of the “steadier” Dark Web sites, and it played a role in the Adult FriendFinder hack, was the trading place for Mate1 passwords, and hosted breaches from a car breathalyzer maker.
Standard news for the Dark Web, until the next shutdown and relaunch.
June 28, 2016
I scanned a number of write ups about Google’s embrace of machine learning and smart software. I supplement my Google queries with the results of other systems. Some of these have their own index; for example, Yandex.ru and Exalead. Others are metasearch engines which suck in results and do some post processing to help answer the users’ questions. Others are disappointing and I check them out when I have a client who is willing to pay for stone flipping; for example, DuckDuckGo, iSeek, or the estimable Qwant. (I love quirky spelling too.)
I read “RankBrain Third Most Important Factor Determining Google Search Results.” Here’s the quote I noted:
Google is characteristically fuzzy on exactly how it improves search (something to do with the long tail? Better interpretation of ambiguous requests?) but Jeff Dean [former AltaVista wizard] says that RankBrain is “involved in every query,” and affects the actual rankings “probably not in every query but in a lot of queries.” What’s more, it’s hugely effective. Of the hundreds of “signals” Google search uses when it calculates its rankings (a signal might be the user’s geographical location, or whether the headline on a page matches the text in the query), RankBrain is now rated as the third most useful. “It was significant to the company that we were successful in making search better with machine learning,” says John Giannandrea. “That caused a lot of people to pay attention.” Pedro Domingos, the University of Washington professor who wrote The Master Algorithm, puts it a different way: “There was always this battle between the retrievers and the machine learning people,” he says. “The machine learners have finally won the battle.”
I have noticed in the last year that I am unable to locate certain documents when I use the words and phrases which had served me well before smart software became the cat’s pajamas.
One recent example was my need to locate a case example about a German policeman’s trials and tribulations with the Dark Web. When I first located this document, I was trying to verify an anecdote shared with me after one of my intelligence community lectures.
I had the document in my file and I pulled it up on my monitor. The document in question is the work of an outfit and person labeled “Lars Hilse.” The title of the write up is “Dark Web & Bitcoin: Global Terrorism Threat Assessment.” The document was published in April 2013 with an update issued in November 2013. (That document was the source or maybe confirmed the anecdote about the German policeman and his Dark Web research.)
For my amusement, I wondered if I could use the new and improved Google Web search to locate the document. I displayed section 4.8 on my screen. The heading of the section is “Extortion (of Law Enforcement Personnel).”
I entered the phrase into Google without quotes. Here’s the first page of results:
None of the hits points to the document with the five word phrase.