July 6, 2016
I read “New Search Engine Makes Data Instantly Searchable, Increases Data Retention.” I like that instantly assertion. Keep in mind that for me “instantly” means immediately. Okay. I also noted the “everything.” Bold assertions.
According to the write up:
This search engine makes data instantly searchable and increases data retention. It’s also crafted so that the archived data doesn’t impact the current inflow. Another feature is an anomaly system, with everything available at the user’s fingertips.
What is Rocana besides a search engine? Well, it turns out that the company provides that :
Limitless online access and analysis of all operational data gives CIOs and technology leaders a distinct competitive edge in today’s digital economy.
So we have an “all” tossed in for good measure in the explanation of Rocana. A video explains the niche the company’s search technology targets: operational data and anomaly. The search engine scales to “volumes of data that none of our competitors can achieve.” The company delivers “results that matter.” The company is going “beyond search.” There you go. Instantly. Everything.
Stephen E Arnold, July 6, 2016
July 5, 2016
For research purposes, I surf the Dark Web on a regular basis. It is like skulking around the back alleys of a major city and witnessing all types of crime, but keeping to yourself. I have seen a few Web sites that could be deemed as legal, but most of the content I peruse is illegal: child pornography, selling prescription drugs, and even a hitman service. I have begun to think that everything on the Dark Web is illegal, except Help Net Security tells me that “Dark Web Mapping Reveals That Half Of The Content Is Legal.”
The Centre for International Governance Innovation (CIGI) conducted global survey and discovered that seven in ten (71%) of the surveyors believe the Dark Web needs to be shut down. There is speculation if the participants eve had the right definition about what the Dark Web is and might have confused the terms “Dark Web” and “Dark Net”.
Darksum, however, mapped the Tor end of the Dark Web and discovered some interesting facts:
- “Of the 29,532 .onion identified during the sampling period – two weeks in February 2016 – only 46% percent could actually be accessed. The rest were likely stort-lived C&C servers used to manage malware, chat clients, or file-sharing applications.
- Of those that have been accessed and analyzed with the companies’ “machine-learning” classification method, less than half (48%) can be classified as illegal under UK and US law. A separate manual classification of 1,000 sites found about 68% of the content to be illegal under those same laws.”
Darksum’s goal is to clear up misconceptions about the Dark Web and to better understand what is actually on the hidden sector of the Internet. The biggest hope is to demonstrate the Dark Web’s benefits.
July 1, 2016
Elastic’s Elasticsearch has become one of the go to open source search and retrieval solutions. Based on Lucene, the system has put the heat on some of the other open source centric search vendors. However, search is a tricky beastie.
Navigate to “AWS Elasticsearch Service Woes” to get a glimpse of some of the snags which can poke holes in one’s rip stop hiking garb. The problems are not surprising. One does not know what issues will arise until a search system is deployed and the lucky users are banging away with their queries or a happy administrator discovers that Button A no longer works.
The write up states:
We kept coming across OOM issues due the JVMMemoryPresure spiking and inturn the ES service kept crapping out. Aside from some optimization work, we’d more than likely have to add more boxes/resources to the cluster which then means more things to manage. This is when we thought, “Hey, AWS have a service for this right? Let’s give that a crack?!”. As great as having it as a service is, it certainly comes with some fairly irritating pitfalls which then causes you to approach the situation from a different angle.
One approach is to use templates to deal with the implementation of shard management in AWS Elasticsearch. Sample templates are provided in the write up. The fix does not address some issues. The article provides a link to a reindexing tool called es-tool.
The most interesting comment in the article in my opinion is:
In hindsight I think it may have been worth potentially sticking with and fleshing out the old implementation of Elasticsearch, instead of having to fudge various things with the AWS ES service. On the other hand it has relieved some of the operational overhead, and in terms of scaling I am literally a couple of clicks away. If you have large amounts of data you pump into Elasticsearch and you require granular control, AWS ES is not the solution for you. However if you need a quick and simple Elasticsearch and Kibana solution, then look no further.
My takeaway is to do some thinking about the strengths and weaknesses of the Amazon AWS before chopping through the Bezos cloud jungle.
Stephen E Arnold, July 1, 2016
July 1, 2016
The Tor-enabled search engine DuckDuckGo has received attention recently for being an search engine that does not track users. We found their activity report that shows a one year average of their direct queries per day. DuckDuckGo launched in 2008 and offers an array of options to prevent “search leakage”. Their website defines this term as the sharing of personal information, such as the search terms queried. Explaining a few of DuckDuckGo’s more secure search options, their website states:
“Another way to prevent search leakage is by using something called a POST request, which has the effect of not showing your search in your browser, and, as a consequence, does not send it to other sites. You can turn on POST requests on our settings page, but it has its own issues. POST requests usually break browser back buttons, and they make it impossible for you to easily share your search by copying and pasting it out of your Web browser’s address bar.
Finally, if you want to prevent sites from knowing you visited them at all, you can use a proxy like Tor. DuckDuckGo actually operates a Tor exit enclave, which means you can get end to end anonymous and encrypted searching using Tor & DDG together.”
Cybersecurity and privacy have become hot topics since Edward Snowden made headlines in 2013, which is notably when DuckDuckGo’s exponential growth begins to take shape. Recognition of Tor also became more mainstream around that time, 2013, which is when the Silk Road shutdown occurred, placing the Dark Web in the news. It appears that starting a search engine focused on anonymity in 2008 was not such a bad idea.
Megan Feil, July 1, 2016
June 29, 2016
Navigating the Dark Web can be a hassle, because many of the Web sites are shut down before you have the chance to learn what nefarious content, services, or goods are available. Some of these sites go down on their own, but law enforcement had a part in dismantling them as well. Some Dark Web sites are too big and encrypted to be taken down and sometimes they exchange hands, such as Silk Road and now Hell. Motherboard explains that “Dark Web Hacking Forum ‘Hell’ Appears To Have New Owners.”
The Real Deal, a computer exploit market, claimed to take ownership of Hell, the hacking forum known for spreading large data dumps and stolen data. Real Deal said of their acquisition:
“ ‘We will be removing the invite-only system for at least a week, and leave the “vetting” forum for new users,’ one of The Real Deal admins, who also used the handle The Real Deal, told Motherboard in an encrypted chat. ‘It’s always nice to have a professional community that meets our market’s original niche, hopefully it will bring some more talent both to the market and to the forums,’ the admin continued. ‘And it’s no secret that we as admins would enjoy the benefit of ‘first dibs’ on buying fresh data, resources, tools, etc.’”
The only part of Hell that has new administrators is the forum due to the old head had personal reasons that required more attention. Hell is one of the “steadier” Dark Web sites and it played a role in the Adult FriendFinder hack, was the trading place for Mate1 passwords, and hosted breaches from a car breathalyzer maker.
Standard news for the Dark Web, until the next shutdown and relaunch.
June 28, 2016
I scanned a number of write ups about Google’s embrace of machine learning and smart software. I supplement my Google queries with the results of other systems. Some of these have their own index; for example, Yandex.ru and Exalead. Others are metasearch engines will suck in results and do some post processing to help answer the users’ questions. Others are disappointing and I check them out when I have a client who is willing to pay for stone flipping; for example, DuckDuckGo, iSeek, or the estimable Qwant. (I love quirky spelling too.)
I read “RankBrain Third Most Important Factor Determining Google Search Results.” Here’s the quote I noted:
Google is characteristically fuzzy on exactly how it improves search (something to do with the long tail? Better interpretation of ambiguous requests?) but Jeff Dean [former AltaVista wizard] says that RankBrain is “involved in every query,” and affects the actual rankings “probably not in every query but in a lot of queries.” What’s more, it’s hugely effective. Of the hundreds of “signals” Google search uses when it calculates its rankings (a signal might be the user’s geographical location, or whether the headline on a page matches the text in the query), RankBrain is now rated as the third most useful. “It was significant to the company that we were successful in making search better with machine learning,” says John Giannandrea. “That caused a lot of people to pay attention.”Pedro Domingos, the University of Washington professor who wrote The Master Algorithm, puts it a different way: “There was always this battle between the retrievers and the machine learning people,” he says. “The machine learners have finally won the battle.”
I have noticed in the last year, that I am unable to locate certain documents when I use the words and phrases which had served me well before smart software became the cat’s pajamas.
One recent example was my need to locate a case example about a German policeman’s trials and tribulations with the Dark Web. When I first located this document, I was trying to verify an anecdote shared with me after one of my intelligence community lectures.
I had the document in my file and I pulled it up on my monitor. The document in question is the work of an outfit and person labeled “Lars Hilse.” The title of the write up is “Dark Web & Bitcoin: Global Terrorism “Threat Assessment. The document was published in April 2013 with an update issued in November 2013. (That document was the source or maybe confirmed the anecdote about the German policeman and his Dark Web research.)
For my amusement, I wondered if I could use the new and improved Google Web search to locate the document. I display section 4.8 on my screen. The heading of the section is “Extortion (of Law Enforcement Personnel).
I entered the phrase into Google without quotes. Here’s the first page of results:
None of the hits points to the document with the five word phrase.
June 24, 2016
I spoke with a confused and unbudgeted worker bee at a giant outfit this weekend. The stellar professional was involved in figuring out what to do about enterprise search. The story is one I have heard many times in the last 40 years. The system doesn’t meet the needs of the users. The system is over budget. The system does not index in real time. Yadda yadda yadda.
The big question was, “What are the enterprise search vendors offering a system which actually works, does not experience downtime, cost overruns, and user outrage. Note that this is not the word “outage.” The word is “outrage”.
I don’t know of such a system. As a helpful 72 year old, I rattled off a list of vendors who purport to offer Big Data capable, next generation semantic-linguistic-NLP systems. True to form, I repeated the list twice. I thought he would cry.
For those of you who want to know the vendors I plucked from my list of outfits in the search and content processing game, I reproduce the list. If you want upsides, downsides, license fees, gotchas, and other assorted details, I will provide the information. But since you are not likely to buy me dinner this evening, you will have to pay for my thoughts.
Here’s the selected list. Reader, start your browser:
- Elasticsearch (Lucene)
- Fabasoft Mindbreeze
- IBM Omnifind
- IHS Goldfire
- Lucid Works (Solr)
- Squiz Funnelback
There are quite a few outfits whose systems do search like Palantir, but I trimmed the list to companies for my worried pal.
What’s interesting is that most of these outfits explain that their systems are much, much more than search and retrieval. Believe it or not as Mr. Ripley used to say.
Factoid: Most of these outfits have been around for quite a few years. Only Elasticsearch has managed to become a “brand” in the search space. What happened to Autonomy, Convera, Endeca, Fast Search & Transfer, and Verity since I wrote the first three editions of the Enterprise Search Report between 2003 and 2007? Ugly for some.
Search is a tough problem and has yet to deliver what users expect. Remember Google killed its search appliance. Ads are a better business because they spell money for Alphabet.
Stephen E Arnold, June 24, 2016
June 21, 2016
Microsoft is gearing up for a fresh challenge to Google, with a Bing rebranding effort centered on the new “Bing Network.” This marks a different approach to leveraging the MS search platform, we learn from the piece, “Microsoft Rebrands Bing, Challenges Google” at SearchMarketingDaily. The incorporation of Yahoo has a lot to do with it. Reporter Laurie Sullivan writes:
“Microsoft’s message says the network pulls together in-the-moment data from across its mobile, global and local partners to support products that people use daily. And that network continues to grow. With the transition of all U.S. accounts, people and account management from Yahoo to Bing, the network represents an expanding set of partnerships such as AOL, and The Wall Street Journal, which adds more searches and clicks to the network daily, wrote Stephen Sirich, GM of advertising and consumer monetization group at Microsoft, in a post.”
Sullivan later reminds us:
“The shift in brand strategy also marks an end to the Yahoo-Bing Network. The renegotiated search deal between Microsoft and Yahoo in April 2015, five years into the 10-year deal, has ad sales and account management returning to their respective companies.”
The article discusses reasons Microsoft has struggled so to position Bing as an alternative to Google. For example, says one professional, Bing should not have tried to change the model Google had set up, and users had grown accustomed to, for Internet search. Also, Bing’s brand recognition has always lagged behind that of Google. Perhaps that is about to change with this renewed effort. See the article for some more background and stats on Bing’s performance.
Cynthia Murrell, June 21, 2016
June 17, 2016
I read “Google Launches Springboard Enterprise Search Tool, Revamps Sites.” Ah, thoughts flowed when I learned that a Google customer can search Google Docs and Google Drive soon.
Anyone remember Verity? What about Yahoo semantic search? No. The wizard influencing both of these outstanding services (well, outstanding might be too strong a word) is a person who has cast a shadow over information access for many years. I recall one great idea floated about 25 years ago. Users of the Verity system would pay for each cell of structured data a query “touched.” Yep, taxi meter pricing. Another great thought was offered when the individual told a BearStearns’ professional and me in no uncertain terms that Yahoo’s semantic search methods were better than Ramanathan Guha’s. Okay. Good assertion. Where are those thoughts now? Yes, searching Google services for one’s own content. Einstein, you have been aced.
The answer is Google Springboard. Yep, a service from the Alphabet Google thing which allows a Google services user to — hand on to your information access hat, gentle reader — to “will help them search more easily through and find information from Gmail, Docs, Calendar, Drive, Groups and other applications.”
I know that Alphabet Google is doing a bang up job in many technical disciplines. There is the “solving death” thing. There are the Loon balloons. There are self driving cars with sticky hoods. Oh, so much innovation.
The notion that the search function in Gmail will be extended to the goodies tucked into other Google cloud services is a bold move. For an advertising company, the shadow of Verity and Yahoo falls over precision and recall at Google.
Oh, wait. Google has not yet solved death. When the new service becomes available, perhaps finding an item in calendar or in a Google Doc will become a reality. Innovation never rests at the Alphabet Google thing.
Stephen E Arnold, June 18, 2016
June 17, 2016
A new survey about the Dark Web was released recently. Wired published an article centered around the research, called Dark Web’s Got a Bad Rep: 7 in 10 People Want It Shut Down, Study Shows. Canada’s Center for International Governance Innovation surveyed 24,000 people in 24 countries about their opinion of the Dark Web. The majority of respondents, 71 percent across all countries and 72 percent of Americans, said they believed the “dark net” should be shut down. The article states,
“CIGI’s Jardine argues that recent media coverage, focusing on law enforcement takedowns of child porn sites and bitcoin drug markets like the Silk Road, haven’t improved public perception of the dark web. But he also points out that an immediate aversion to crimes like child abuse overrides mentions of how the dark web’s anonymity also has human rights applications. ‘There’s a knee-jerk reaction. You hear things about crime and its being used for that purpose, and you say, ‘let’s get rid of it,’’ Jardine says.”
We certainly can attest to the media coverage zoning in on the criminal connections with the Dark Web. We cast a wide net tracking what has been published in regards to the darknet but many stories, especially those in mainstream sources emphasize cybercrime. Don’t journalists have something to gain from also publishing features revealing the aspects the Dark Web that benefit investigation and circumvent censorship?
Megan Feil, June 17, 2016