July 17, 2016
Short honk: Are you a fan of Elasticsearch, the Lucene-based open source system giving proprietary vendors of search systems a migraine? If you are, you will want to point your browser at “Elasticsearch-API Info.” The information is presented in a table which lists and annotates Elasticsearch’s APIs from bulk to update. Useful stuff.
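For readers who have not used the first entry in that table, here is a minimal sketch of how a request body for the bulk API is assembled. It is not taken from the linked table. The index name `posts`, the type name `post`, the helper name `build_bulk_body`, and the sample documents are all assumptions for illustration. The bulk endpoint expects newline-delimited JSON: an action line, then the document source, with a newline at the very end. The `_type` field reflects the 2.x-era releases current when this was written.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Return a newline-delimited JSON string for the _bulk endpoint.

    docs is an iterable of (doc_id, source_dict) pairs.
    Hypothetical helper, not part of any Elasticsearch client library.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: says what to do ("index") and where to put the document.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [("1", {"title": "Short honk"}), ("2", {"title": "Factoid"})]
    print(build_bulk_body("posts", "post", sample), end="")
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint; the sketch stops short of making the HTTP call so it runs without a cluster.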
Stephen E Arnold, July 17, 2016
July 16, 2016
Just a factoid. There is now a version of Elasticsearch which is integrated with Cassandra. You can get the code for version 2.1.1-14 via GitHub. Just another example of the diffusion of the Elasticsearch system.
Stephen E Arnold, July 16, 2016
July 13, 2016
I love the results I get for pop stars, TV shows, and binge watching. To feed the curious minds of online researchers, Google has upped the ante. “Google Licenses LyricFind for Search Results” reports that Google has addressed its miserable search systems for the words in tunes. Consider this lyric:
“My wrist deserve a shout out, I’m like ‘what up, wrist’?
My stove deserve a shout out, I’m like ‘what up, stove’?”
According to the write up:
A query for the lyrics to a specific song will pull up the words to much of that song, freeing users from having to click through to another website. Google rolled out the lyrics feature in the U.S. today (June 27), though it has licenses to display the lyrics internationally as well.
I am definitely thrilled. Why worry about the indexing of PowerPoints, PDFs, and other content when I have access to the source of:
I’m that red bull, now let’s fly away.
What’s really flown away? Rag mop.
Stephen E Arnold, July 13, 2016
July 11, 2016
If you want to buy some Microsoft smart APIs, now is the time. Navigate to Microsoft Azure and pick your API. On offer are some content processing APIs like text search, image search, autosuggest, etc. How much are these goodies? Well, the fee varies with the number of transactions. What’s a “transaction”? Like Amazon AWS, you will find that out as you move forward, gentle reader. Here’s the display for the search API fees:
I know that these low contrast Web pages are just so easy to read. In a nutshell, you will owe the Microsofties by tier. The S1, S2, etc. remind me of IBM’s tiered prices. The number is dependent on how many transactions, which tier, and I assume any other special goodies one requires. Think in terms of blocks of $30.
Enjoy the taxi meter approach. In my experience, these work out really well for those selling services. I love metered, tiered prices with “transactions” left wonderfully fluid. Does the phrase “lock in” resonate? Does the concept of “price lift” have relevance? Have fun budgeting costs over a three to five year span.
Stephen E Arnold, July 11, 2016
July 11, 2016
Tech-security firm BAE Systems has sketched out six cybercriminal types, we learn from “BAE Systems Unmasks Today’s Cybercriminals” at the MENA Herald. We’re told the full descriptions reveal the kinds of havoc each type can wreak, as well as targeted advice for thwarting them. The article explains:
“Threat intelligence experts at BAE Systems have revealed ‘The Unusual Suspects’, built on research that demonstrates the motivations and methods of the most common types of cybercriminal. The research, which is derived from expert analysis of thousands of cyber attacks on businesses around the world. The intention is to help enterprises understand the enemies they face so they can better defend against cyber attack.”
Apparently, such intel is especially needed in the Middle East, where cybercrime was recently found to affect about 30 percent of organizations. Despite the danger, the same study from PwC found that regional companies were not only unprepared for cyber attacks, many did not even understand the risks.
The article lists the six cybercriminal types BAE has profiled:
“The Mule – naive opportunists that may not even realise they work for criminal gangs to launder money;
The Professional – career criminals who ‘work’ 9-5 in the digital shadows;
The Nation State Actor – individuals who work directly or indirectly for their government to steal sensitive information and disrupt enemies’ capabilities;
The Activist – motivated to change the world via questionable means;
The Getaway – the youthful teenager who can escape a custodial sentence due to their age;
The Insider – disillusioned, blackmailed or even over-helpful employees operating from within the walls of their own company.”
Operating in more than 40 countries, BAE Systems is committed to its global perspective. Alongside its software division, the company also produces military equipment and vehicles. Founded in 1999, the company went public in 2013. Unsurprisingly, BAE’s headquarters are in Arlington, Virginia, just outside of Washington DC. As of this writing, they are also hiring in several locations.
Cynthia Murrell, July 11, 2016
July 8, 2016
Another day, another merger. PR Newswire released a story, “VirtualWorks and Language Tools Announce Merger,” which covers VirtualWorks’ purchase of Language Tools. With Language Tools, VirtualWorks will inherit computational linguistics and natural language processing technologies. VirtualWorks is an enterprise search firm. Erik Baklid, Chief Executive Officer of VirtualWorks, is quoted in the article:
“We are incredibly excited about what this combined merger means to the future of our business. The potential to analyze and make sense of the vast unstructured data that exists for enterprises, both internally and externally, cannot be understated. Our underlying technology offers a sophisticated solution to extract meaning from text in a systematic way without the shortcomings of machine learning. We are well positioned to bring to market applications that provide insight, never before possible, into the vast majority of data that is out there.”
This is another case of a company positioning itself as a leader in enterprise search. Is it anything special? Well, the news release mentions that several core technologies will be bolstered by the merger: text analytics, data management, and discovery techniques. We will have to wait and see what the future holds for the company in the enterprise search and business intelligence sector it seeks to lead.
July 7, 2016
I read “Information about DuckDuckGo’s Partnership with Yahoo.” Yahoo is into search DuckDuckGo style. According to the write up:
our latest partnership with Yahoo enables DuckDuckGo to get access to features you’ve been requesting for years:
Date filters let you filter results from the last day, week and month.
Site links help you quickly get to subsections of sites.
Farewell, Inktomi, AllTheWeb, Google, Microsoft, and home brew craziness. Yahoo has a new findability future. Now about the size of the index? Will Yahoo’s new owner have a fresh idea? Worth watching.
Stephen E Arnold, July 7, 2016
July 6, 2016
I read “New Search Engine Makes Data Instantly Searchable, Increases Data Retention.” I like that instantly assertion. Keep in mind that for me “instantly” means immediately. Okay. I also noted the “everything.” Bold assertions.
According to the write up:
This search engine makes data instantly searchable and increases data retention. It’s also crafted so that the archived data doesn’t impact the current inflow. Another feature is an anomaly system, with everything available at the user’s fingertips.
What is Rocana besides a search engine? Well, it turns out that the company states:
Limitless online access and analysis of all operational data gives CIOs and technology leaders a distinct competitive edge in today’s digital economy.
So we have an “all” tossed in for good measure in the explanation of Rocana. A video explains the niche the company’s search technology targets: operational data and anomaly. The search engine scales to “volumes of data that none of our competitors can achieve.” The company delivers “results that matter.” The company is going “beyond search.” There you go. Instantly. Everything.
Stephen E Arnold, July 6, 2016
July 5, 2016
For research purposes, I surf the Dark Web on a regular basis. It is like skulking around the back alleys of a major city and witnessing all types of crime, but keeping to yourself. I have seen a few Web sites that could be deemed legal, but most of the content I peruse is illegal: child pornography, prescription drug sales, and even a hitman service. I had begun to think that everything on the Dark Web is illegal, but Help Net Security tells me that “Dark Web Mapping Reveals That Half Of The Content Is Legal.”
The Centre for International Governance Innovation (CIGI) conducted a global survey and discovered that seven in ten (71%) of those surveyed believe the Dark Web needs to be shut down. There is speculation about whether the participants even had the right definition of the Dark Web; they might have confused the terms “Dark Web” and “Dark Net”.
Darksum, however, mapped the Tor end of the Dark Web and discovered some interesting facts:
- “Of the 29,532 .onion identified during the sampling period – two weeks in February 2016 – only 46 percent could actually be accessed. The rest were likely short-lived C&C servers used to manage malware, chat clients, or file-sharing applications.
- Of those that have been accessed and analyzed with the companies’ “machine-learning” classification method, less than half (48%) can be classified as illegal under UK and US law. A separate manual classification of 1,000 sites found about 68% of the content to be illegal under those same laws.”
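To put those percentages into site counts, here is a small back-of-the-envelope sketch. It uses only the three figures quoted above (29,532 sites, 46 percent accessible, 48 percent classified as illegal); the rounding and the helper name `share_of` are my assumptions, and the results are estimates, not numbers reported by the mapping project.

```python
def share_of(total, percent):
    """Return the rounded number of items that make up `percent` of `total`."""
    return round(total * percent / 100)


# Figures quoted in the write up.
onion_sites = 29_532
accessible_percent = 46
illegal_percent = 48  # machine-learning classification

accessible = share_of(onion_sites, accessible_percent)
illegal = share_of(accessible, illegal_percent)
not_illegal = accessible - illegal

if __name__ == "__main__":
    print(f"accessible sites: about {accessible}")
    print(f"classified as illegal: about {illegal}")
    print(f"remainder: about {not_illegal}")
```

On these assumptions, roughly 13,585 sites were reachable, about 6,521 of them were classified as illegal, and about 7,064 were not. The 68 percent figure from the manual review applies to a separate sample of 1,000 sites, so it is left out of the calculation.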
Darksum’s goal is to clear up misconceptions about the Dark Web and to better understand what is actually on the hidden sector of the Internet. The biggest hope is to demonstrate the Dark Web’s benefits.
July 1, 2016
Elastic’s Elasticsearch has become one of the go-to open source search and retrieval solutions. Based on Lucene, the system has put the heat on some of the other open source centric search vendors. However, search is a tricky beastie.
Navigate to “AWS Elasticsearch Service Woes” to get a glimpse of some of the snags which can poke holes in one’s rip stop hiking garb. The problems are not surprising. One does not know what issues will arise until a search system is deployed and the lucky users are banging away with their queries or a happy administrator discovers that Button A no longer works.
The write up states:
We kept coming across OOM issues due the JVMMemoryPresure spiking and inturn the ES service kept crapping out. Aside from some optimization work, we’d more than likely have to add more boxes/resources to the cluster which then means more things to manage. This is when we thought, “Hey, AWS have a service for this right? Let’s give that a crack?!”. As great as having it as a service is, it certainly comes with some fairly irritating pitfalls which then causes you to approach the situation from a different angle.
One approach is to use templates to deal with the implementation of shard management in AWS Elasticsearch. Sample templates are provided in the write up. The fix does not address some issues. The article provides a link to a reindexing tool called es-tool.
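The write up’s own templates are not reproduced here. As a rough sketch of the idea only: in the Elasticsearch releases current at the time (1.x and 2.x), an index template pairs an index name pattern with settings such as the shard count, so that newly created indices matching the pattern pick up those settings automatically. The pattern `logs-*`, the shard and replica numbers, and the helper name `make_index_template` are assumptions for illustration.

```python
import json


def make_index_template(pattern, shards, replicas):
    """Build the JSON body for an index template (1.x/2.x-era layout).

    Hypothetical helper; later Elasticsearch versions use an
    "index_patterns" list instead of the single "template" string.
    """
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    return {
        "template": pattern,  # index names this template applies to
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
    }


if __name__ == "__main__":
    body = make_index_template("logs-*", shards=2, replicas=1)
    print(json.dumps(body, indent=2))
```

The resulting JSON would be sent with a PUT to `_template/` followed by a template name of your choosing. Because the primary shard count of an existing index cannot be changed in place, a template only affects indices created afterward, which is presumably why the article also points to a reindexing tool.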
The most interesting comment in the article in my opinion is:
In hindsight I think it may have been worth potentially sticking with and fleshing out the old implementation of Elasticsearch, instead of having to fudge various things with the AWS ES service. On the other hand it has relieved some of the operational overhead, and in terms of scaling I am literally a couple of clicks away. If you have large amounts of data you pump into Elasticsearch and you require granular control, AWS ES is not the solution for you. However if you need a quick and simple Elasticsearch and Kibana solution, then look no further.
My takeaway is to do some thinking about the strengths and weaknesses of Amazon AWS before chopping through the Bezos cloud jungle.
Stephen E Arnold, July 1, 2016