May 1, 2016
Apache Lucene receives the most headlines when it comes to discussions of open source search software. My RSS feed pulled up another open source search engine that shows promise as a decent piece of software. Open Semantic Search is free software that can be used for text mining, analytics, search, data exploration, and other research tasks. It is based on the open source enterprise search of Elasticsearch and Apache Solr. It was designed around open standards and offers robust semantic search.
As with any open source search, it can be programmed with numerous features based on the user’s preference. These include tagging, annotation, support for varying file formats and multiple data sources, data visualization, newsfeeds, automatic text recognition, faceted search, interactive filters, and more. It can also be programmed for mobile platforms, metadata management, and file system monitoring.
Open Semantic Search is described as
“Research tools for easier searching, analytics, data enrichment & text mining of heterogeneous and large document sets with free software on your own computer or server.”
While its base code is derived from Apache Lucene, it takes the original product and builds something better. Proprietary software is an expense dubbed a necessary evil if you work in a large company. If, however, you are a programmer and have the time to develop your own search engine and analytics software, do it. It could even turn out better than the proprietary stuff.
April 30, 2016
It looks like Paris Hilton might have a new sibling, although the conversations at family gatherings will be lackluster. No, the hotel-chain family has not adopted Watson; instead, a version of the artificial intelligence will work as a concierge. Ars Technica informs us that “IBM Watson Now Powers A Hilton Hotel Robot Concierge.”
The Hilton McLean hotel in Virginia now has a new concierge dubbed Connie, after Conrad Hilton, the chain’s founder. Connie is housed in a Nao, a French-made android that is an affordable customer relations platform. Its brain is based on Watson’s program and answers verbal queries from a WayBlazer database. The little robot assists guests by explaining how to navigate the hotel and how to find restaurants and tourist attractions. It is unable to check in guests yet, but it is a good substitute when the concierge station is busy and you do not want to pull out your smartphone or have any human interaction.
“‘This project with Hilton and WayBlazer represents an important shift in human-machine interaction, enabled by the embodiment of Watson’s cognitive computing,’ Rob High, chief technology officer of Watson said in a statement. ‘Watson helps Connie understand and respond naturally to the needs and interests of Hilton’s guests—which is an experience that’s particularly powerful in a hospitality setting, where it can lead to deeper guest engagement.’”
Asia already uses robots in service industries such as hotels and restaurants. It is worrying that Connie-like robots could replace people in these jobs. Robots are supposed to augment human life instead of taking jobs away from people. While Connie-like robots will have a major impact on the industry, there is something to be said for genuine human interaction, which is usually preferred over artificial intelligence. Maybe team the robots with humans in the service industries for the best all-around care?
April 29, 2016
There is a new tool for organizations to more quickly detect whether their sensitive data has been hacked. The Atlantic discusses “The Spider that Crawls the Dark Web Looking for Stolen Data.” Until now, it was often many moons before an organization realized it had been hacked. Matchlight, from Terbium Labs, offers a more proactive approach. The service combs the corners of the Dark Web looking for the “fingerprints” of its clients’ information. Writer Kaveh Waddell reveals how it is done:
“Once Matchlight has an index of what’s being traded on the Internet, it needs to compare it against its clients’ data. But instead of keeping a database of sensitive and private client information to compare against, Terbium uses cryptographic hashes to find stolen data.
“Hashes are functions that create an effectively unique fingerprint based on a file or a message. They’re particularly useful here because they only work in one direction: You can’t figure out what the original input was just by looking at a fingerprint. So clients can use hashing to create fingerprints of their sensitive data, and send them on to Terbium; Terbium then uses the same hash function on the data its web crawler comes across. If anything matches, the red flag goes up. Rogers says the program can find matches in a matter of minutes after a dataset is posted.”
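The fingerprint comparison described in the quote can be sketched in a few lines of Python. This is only an illustration of the general idea, not Terbium's actual method: the choice of SHA-256, the whitespace and case normalization, and the function names `fingerprint`, `client_fingerprints`, and `find_matches` are all assumptions made for the example.

```python
import hashlib


def fingerprint(record: str) -> str:
    """Return a one-way SHA-256 fingerprint of a record.

    Assumption: records are lowercased and whitespace-collapsed first,
    so trivial formatting differences do not hide a match.
    """
    normalized = " ".join(record.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def client_fingerprints(records):
    """Client side: hash the sensitive records; only hashes are shared."""
    return {fingerprint(r) for r in records}


def find_matches(crawled_items, known_fingerprints):
    """Service side: hash crawled items the same way and flag any hits."""
    return [item for item in crawled_items
            if fingerprint(item) in known_fingerprints]


# Hypothetical data for illustration only.
sensitive = ["alice@example.com", "4111 1111 1111 1111"]
shared = client_fingerprints(sensitive)  # the raw records never leave the client

crawled = ["bob@example.org", "Alice@Example.com", "4111  1111 1111 1111"]
matches = find_matches(crawled, shared)
print(matches)
```

Because only the hashes are shared, the monitoring side can raise a red flag without ever holding the client's original data. A production system would need to handle partial and fuzzy matches, which this exact-match sketch does not attempt.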
What an organization does with this information is, of course, up to them; but whatever the response, now they can implement it much sooner than if they had not used Matchlight. Terbium CEO Danny Rogers reports that, each day, his company sends out several thousand alerts to their clients. Founded in 2013, Terbium Labs is based in Baltimore, Maryland. As of this writing, they are looking to hire a software engineer and an analyst, in case anyone here is interested.
Cynthia Murrell, April 29, 2016
April 28, 2016
Is it any surprise that emerging nations want in on the ability to spy on their citizens? That’s what all the cool governments are doing, after all. Indian Strategic Studies reports, “Even Developing Nations Want Cyber Spying Capabilities.” Writer Emilio Iasiello sets the stage—he contrasts efforts by developed nations to establish restrictions versus developing countries’ increased interest in cyber espionage tools.
On one hand, we could take heart from statements like this letter and this summary from the UN, and the “cyber sanctions” authority the U.S. Department of Treasury can now wield against foreign cyber attackers. At the same time, we may uneasily observe the growing popularity of FinFisher, a site which sells spyware to governments and law enforcement agencies. A data breach against FinFisher’s parent company, Gamma International, revealed the site’s customer list. Notable client governments include Bangladesh, Kenya, Macedonia, and Paraguay. Iasiello writes:
“While these states may not use these capabilities in order to conduct cyber espionage, some of the governments exposed in the data breach are those that Reporters without Borders have identified as ‘Enemies of the Internet’ for their penchant for censorship, information control, surveillance, and enforcing draconian legislation to curb free speech. National security is the reason many of these governments provide in ratcheting up authoritarian practices, particularly against online activities. Indeed, even France, which is typically associated with liberalism, has implemented strict laws fringing on human rights. In December 2013, the Military Programming Law empowered authorities to surveil phone and Internet communications without having to obtain legal permission. After the recent terrorist attacks in Paris, French law enforcement wants to add addendums to a proposed law that blocks the use of the TOR anonymity network, as well as forbids the provision of free Wi-Fi during states of emergency. To put it in context, China, one of the more aggressive state actors monitoring Internet activity, blocks TOR as well for its own security interests.”
The article compares governments’ cyber spying and other bad online behavior to Pandora’s box. Are resolutions against such practices too little too late?
Cynthia Murrell, April 28, 2016
April 26, 2016
Ever wonder how far stolen information can go on the Dark Web? If so, check out “Project Cumulus—Tracking Fake Phished Credentials Leaked to Dark Web” at Security Affairs. Researchers at Bitglass baited the hook and tracked the mock data. Writer Pierluigi Paganini explains:
“The researchers created a fake identity for employees of a ghostly retail bank, along with a functional web portal for the financial institution, and a Google Drive account. The experts also associated the identities with real credit-card data, then leaked ‘phished’ Google Apps credentials to the Dark Web and tracked the activity on these accounts. The results were intriguing, the leaked data were accessed in 30 countries across six continents in just two weeks. Leaked data were viewed more than 1,000 times and downloaded 47 times, in just 24 hours the experts observed three Google Drive login attempts and five bank login attempts. Within 48 hours of the initial leak, files were downloaded, and the account was viewed hundreds of times over the course of a month, with many hackers successfully accessing the victim’s other online accounts.”
Yikes. A few other interesting Project Cumulus findings: More than 1400 hackers viewed the credentials; one tenth of those tried to log into the faux-bank’s web portal; and 68% of the hackers accessed Google Drive through the Tor network. See the article for more details. Paganini concludes with a reminder to avoid reusing login credentials, especially now that we see just how far stolen credentials can quickly travel.
Cynthia Murrell, April 26, 2016
April 22, 2016
On April 14, 2016, I flipped through my dead tree copy of the New York Times. You know. The newspaper which is struggling to sell more copies than McPaper. What first caught my eye was this advertisement for a dead tree book called “The New York Times Manual of Style and Usage: The Official Style Guide Used by the Writers and Editors of the World’s Most Authoritative News Organization.” I assume this manual was produced by “real” journalists and editors. I am not familiar with this book, although I was aware of its existence. The addled goose uses the style set forth in the classic Tressler Christ circa 1958. Oh, you may be able to read a version of the New York Times story at this link. Keep in mind that you may have to pay pay pay.
I noted in the very same dead tree edition of the New York Times this write up about a football (soccer) match. I know that the “real” journalists working in Midtown are probably not into the European Cup if there is a Starbucks nearby.
I noted this interesting stylistic touch:
I spotted two paragraphs which are mostly the same. I assume that the new edition of the Style and Usage volume is okay with duplicate passages. It is tough to determine which is the “correct” paragraph.
Tressler Christ, as I recall, suggested that writing the same passage twice in a row was not a good move in 1958. The reality of the cost conscious New York Times may be that it is okay to pontificate and then duplicate content.
Nifty. I will try this some time.
Nifty. I will try this some time.
Nifty. I will try this some time.
Nifty. I will try this some time.
See. Not annoying annoying annoying at all.
Stephen E Arnold, April 22, 2016
April 22, 2016
Working at whizzy Silicon Valley start ups has got to be rewarding. I know the shift to mobile is shaking up some assumptions about the Alphabet Google thing. I know that Google is trying to sell its robot outfit. I know that legal eagles are keeping the sun from some volleyball games. But I was delighted to learn that Google Nest has an “incredibly nice new cafe” which serves “Asian noodles.” Slam dunk.
I read “Nest CEO Tony Fadell Went to Google’s All-Hands Meeting to Defend Nest. Here’s What He Said.” I learned that Nest garnered some “damning articles.” I had not noticed because I don’t pay too much attention to home automation in general and thermostats in particular.
I learned that one “real” journalistic outfit wrote about a “corrosive culture” in another Alphabet Google operation. I am not sure what a corrosive culture is, but I think the idea is that some folks are not happy. What’s new? Anyone ever listened to a group of GS 12s discuss the efficacy of lateral transfers from Fish & Wildlife to the Postal Service? Grumpy, grumpy.
The Google is on top of employee satisfaction. There are tools to obtain feedback. There are senior managers who are managing. The passage in the write up I noted and circled in arugula green was this one:
I do respect the Nest employees. I do respect the Google employees. I respect the Alphabet employees. We try to work very hard together and partner in many different areas around the different companies. I also respect ex-Nesters, ex-Googlers, those kind of things. So when I read those things that say we don’t respect people, or I don’t, it’s absolutely wrong and that is not how I believe because I want to be treated with respect. And I give respect because I want to get respect.
My assumption was that respect at the old Google came from doing things that worked and mattered. I am a little fuzzy on the people side of the equation. The reason is that I heard long ago that a certain big wheel media titan launched a multi year, very expensive legal dispute with the Google as a direct consequence of [a] senior Googlers not arriving at the meeting on time (since the meeting was at Google Mountain View, the big wheel media person was not happy); [b] a certain founder of Google not looking at the media titan (the founder focused on his Mac laptop and ignored the media giant); and [c] another Google founder arriving after the first Google founder, perspiring because his rollerblading session ran long. Now I was not at this meeting, and this may be one of those apocryphal stories about why the Google and Viacom were not best friends for many years.
One thing the passage about respect did was trigger a memory of this anecdote. My source was a person familiar with the matter, and I gained some dribs and drabs to confirm the anecdote after the event. I assume the event and this remarkable presentation ran like a smart thermostat.
Yep, respect and Asian noodles, and the loss of a Glass executive. (Glass reports to Nest.)
Stephen E Arnold, April 22, 2016
April 18, 2016
The article titled Mindbreeze and MEDIALIFE Launch Strategic Partnership on BusinessWire discusses what the partnership means for the Slovak and Czech enterprise search market. MediaLife emphasizes its concentrated approach to document management systems for Slovak customers in need of large systems for the management, processing, and storage of documents. The article details,
“Based on this partnership, we provide our customers innovative solutions for fast access to corporate data, filtering of relevant information, data extraction and their use in automated sorting (classification)… Powerful enterprise search systems for businesses must recognize relationships among different types of information and be able to link them accordingly. Mindbreeze InSpire Appliance is easy to use, has a high scalability and shows the user only the information which he or she is authorized to view.”
Daniel Fallmann, founder and CEO of Mindbreeze, complimented himself on his selection of a partner in MediaLife and licked his chops at the prospect of the new Eastern European client base opened to Mindbreeze through the partnership. Other Mindbreeze partners exist in Italy, the UK, Germany, Mexico, Canada, and the USA, as the company advances its mission to supply enterprise search appliances as well as big data and knowledge management technologies.
Chelsea Kerwin, April 18, 2016
April 18, 2016
What better way to train a natural language AI than to bring venerated human authors into the equation? Wired reports, “Google Wants to Predict the Next Sentences of Dead Authors.” Not surprisingly, Google researchers are tapping into Project Gutenberg for their source material. Writer Matt Burgess relates:
“The network is given millions of lines from a ‘jumble’ of authors and then works out the style of individual writers. Pairs of lines were given to the system, which made a simple ‘yes’ or ‘no’ decision to whether they matched up. Initially the system didn’t know the identity of any authors, but still only got things wrong 17 percent of the time. By giving the network an indication of who the authors were, giving it another factor to compare work against, the computer scientists reduced the error rate to 12.3 percent. This was also improved by a adding a fixed number of previous sentences to give the network more context.”
The researchers carry their logic further. As the Wired title says, they have their AI predict an author’s next sentence; we’re eager to learn what Proust would have said next. They also have the software draw conclusions about authors’ personalities. For example, we’re told:
“Google admitted its predictions weren’t necessarily ‘particularly accurate,’ but said its AI had identified William Shakespeare as a private person and Mark Twain as an outgoing person. When asked ‘Who is your favourite author?’ and [given] the options ‘Mark Twain’, ‘William Shakespeare’, ‘myself’, and ‘nobody’, the Twain model responded with ‘Mark Twain’ and the Shakespeare model responded with ‘William Shakespeare’. Asked who would answer the phone, the AI Shakespeare hoped someone else would answer, while Twain would try and get there first.”
I can just see Twain jumping over Shakespeare to answer the phone. The article notes that Facebook is also using the work of human authors to teach its AI, though that company elected to use children’s classics like The Jungle Book, A Christmas Carol, and Alice in Wonderland. Will we eventually see a sequel to Through the Looking Glass?
Cynthia Murrell, April 18, 2016
April 15, 2016
Interested in a glimpse of the Dark Web without downloading Tor and navigating it yourself? E-Forensics Magazine published “Peeling back the onion part 1: Mapping the Dark Web” by Stuart Peck, which shares an overview of services and content in this anonymity-oriented internet. A new map covering the contents of the Dark Web, the first one to do so, was launched recently by threat intelligence service Intelliagg, a key ZeroDayLab partner. The write-up explains,
“But this brings me to my previous point why is this map so important? Until recently, it had been difficult to understand the relationships between hidden services, and more importantly the classification of these sites. As a security researcher, understanding hidden services, such as private chat forums and closed sites, and how these are used to plan and discuss potential campaigns, such as DDoS, Ransom Attacks, Kidnapping, Hacking, and Trading of Vulnerabilities and leaked data, is key to protecting our clients through proactive threat intelligence.”
Understanding the layout of an online ecosystem is an important first step for researchers or related business ventures. But what about a visualization showing how these web services are connected to functions, such as financial and other services, with brick-and-mortar establishments? It is also important to note that while this may be the first Surface Web map of the Dark Web, many navigational “maps” have existed on .onion sites for as long as users have been browsing on Tor.
Megan Feil, April 15, 2016