April 18, 2016
What better way to train a natural language AI than to bring venerated human authors into the equation? Wired reports, “Google Wants to Predict the Next Sentences of Dead Authors.” Not surprisingly, Google researchers are tapping into Project Gutenberg for their source material. Writer Matt Burgess relates:
“The network is given millions of lines from a ‘jumble’ of authors and then works out the style of individual writers. Pairs of lines were given to the system, which made a simple ‘yes’ or ‘no’ decision to whether they matched up. Initially the system didn’t know the identity of any authors, but still only got things wrong 17 percent of the time. By giving the network an indication of who the authors were, giving it another factor to compare work against, the computer scientists reduced the error rate to 12.3 percent. This was also improved by adding a fixed number of previous sentences to give the network more context.”
The researchers carry their logic further. As the Wired title says, they have their AI predict an author’s next sentence; we’re eager to learn what Proust would have said next. They also have the software draw conclusions about authors’ personalities. For example, we’re told:
“Google admitted its predictions weren’t necessarily ‘particularly accurate,’ but said its AI had identified William Shakespeare as a private person and Mark Twain as an outgoing person. When asked ‘Who is your favourite author?’ and [given] the options ‘Mark Twain’, ‘William Shakespeare’, ‘myself’, and ‘nobody’, the Twain model responded with ‘Mark Twain’ and the Shakespeare model responded with ‘William Shakespeare’. Asked who would answer the phone, the AI Shakespeare hoped someone else would answer, while Twain would try and get there first.”
I can just see Twain jumping over Shakespeare to answer the phone. The article notes that Facebook is also using the work of human authors to teach its AI, though that company elected to use children’s classics like The Jungle Book, A Christmas Carol, and Alice in Wonderland. Will we eventually see a sequel to Through the Looking Glass?
Cynthia Murrell, April 18, 2016
April 15, 2016
The article on eWeek titled Microsoft Debuts Azure Basic Search Tier relates the perks of the new plan from Microsoft, namely, that it is cheaper than the others. At $75 per month (and currently half off during the preview period, so get it while it’s hot!), the Basic Azure plan has lower indexing capacity, but that is the intention. The completely Free plan allows indexing of 10,000 documents and 50 megabytes of storage, while the new Basic plan goes up to a million documents. The more expensive Standard plan costs $250/month and provides for up to 180 million documents and 300 gigabytes of storage. The article explains,
“The new Basic tier is Microsoft’s response to customer demand for a more modest alternative to the Standard plans, said Liam Cavanagh, principal program manager of Microsoft Azure Search, in a March 2 announcement. “Basic is great for cases where you need the production-class characteristics of Standard but have lower capacity requirements,” he stated. Those production-class capabilities include dedicated partitions and service workloads (replicas), along with resource isolation and service-level agreement (SLA) guarantees, which are not offered in the Free tier.”
So just how efficient is Azure? Cavanagh stated that his team measured indexing performance at 15,000 documents per minute, although he stressed that this figure was achieved with documents submitted in batches of 1,000. With this new plan, Microsoft continues to expand its cloud’s search capabilities.
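To put Cavanagh’s numbers in perspective, here is a minimal back-of-the-envelope sketch. It is not the Azure Search SDK or any Microsoft API; the helper names batches and estimated_minutes are hypothetical, and the sketch assumes throughput stays at a constant 15,000 documents per minute, which the article does not claim. At that rate, 15,000 documents per minute works out to 15 batches of 1,000 per minute, so filling the Basic tier’s one-million-document ceiling would take roughly 67 minutes.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

BATCH_SIZE = 1_000        # batch size cited by Cavanagh
DOCS_PER_MINUTE = 15_000  # measured throughput cited by Cavanagh

def batches(docs: Sequence[T], size: int = BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` documents (hypothetical helper)."""
    for start in range(0, len(docs), size):
        yield list(docs[start:start + size])

def estimated_minutes(doc_count: int, rate: int = DOCS_PER_MINUTE) -> float:
    """Rough indexing time, assuming (not guaranteed) a constant rate."""
    return doc_count / rate

if __name__ == "__main__":
    docs = range(1_000_000)  # stand-in for the Basic tier's document ceiling
    print(sum(1 for _ in batches(docs)))            # 1000 batches
    print(round(estimated_minutes(len(docs)), 1))   # 66.7 minutes
```

Real-world timing will depend on document size, index schema, and service load, so treat the result as an order-of-magnitude estimate only.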
Chelsea Kerwin, April 15, 2016
April 13, 2016
The article titled The 10 Commandments of Business Intelligence in Big Data on Datanami offers wisdom written on USB sticks instead of stone tablets. In the Business Intelligence arena, apparently moral guidance can take a backseat to Big Data cost-savings. Suggestions include: Don’t move Big Data unless you must, try to leverage your existing security system, and engage in extensive data visualization sharing (think GitHub). The article explains the importance of avoiding certain price-gouging traps,
“When done right, [Big Data] can be extremely cost effective… That said…some BI applications charge users by the gigabyte… It’s totally common to have geometric, exponential, logarithmic growth in data and in adoption with big data. Our customers have seen deployments grow from tens of billions of entries to hundreds of billions in a matter of months. That’s another beauty of big data systems: Incremental scalability. Make sure you don’t get lowballed into a BI tool that penalizes your upside.”
The Fifth Commandment reminds us all that analyzing the data in its natural, messy form is far better than flattening it into tables due to the risk of losing key relationships. The Ninth and Tenth Commandments step back and look at the big picture of data analytics in 2016. What was only a buzzword to most people just five years ago is now a key aspect of strategy for any number of organizations. This article reminds us that thanks to data visualization, Big Data isn’t just for data scientists anymore. Employees across departments can make use of data to make decisions, but only if they are empowered to do so.
Chelsea Kerwin, April 13, 2016
April 11, 2016
Affecting organizations from Target to JP Morgan Chase, data breaches are increasingly common, and security firms are popping up to address the issue. The article Dark Web data hunter Terbium Labs secures $6.4m in fresh funding from ZDNet reports Terbium Labs received $6.4 million in Series A funding. Terbium Labs released software called Matchlight, which provides real-time surveillance of the Dark Web and alerts enterprises when their organization’s data surfaces. Consumer data, sensitive company records, and trade secrets are among the types of data for which enterprises are seeking protection. We learned,
“Earlier this month, cloud security firm Bitglass revealed the results of an experiment focused on how quickly stolen data spreads through the Dark Web. The company found that within days, financial credentials leaked to the underground spread to 30 countries across six continents with thousands of users accessing the information.”
While Terbium appears to offer value for stopping a breach once it’s started, what about preventing such breaches in the first place? Perhaps there are opportunities for partnerships between Terbium and players in the prevention arena. Or, then again, maybe companies will buy piecemeal services from individual vendors.
Megan Feil, April 11, 2016
April 10, 2016
I read “Stupeflix’s Acquisition by Go Pro: The First Exalead Mafia Exit.” The write up stated:
Acquired in 2011 by Dassault Systemes, Exalead was a power-house for big data & search talent in the mid 2000’s – specifically, out of Exalead Labs, their internal ‘playground’ – and the former employees (most of the engineering team left in the years following the acquisition) have gone on to start great startups: Algolia, Dataiku, OpenDataSoft – even Disclose, our own product, is built by our CTO Guillaume Esquevin, an Exalead alumnus – and, of course, Stupeflix. Cofounders Nicolas Steegman & Francois Lagunas met during their time at Exalead.
The PayPal mafia includes Peter Thiel and a handful of other Silicon Valley luminaries. The French version of the innovation gang has generated a winner.
I noted this statement:
For the rest of the Exalead Mafia, I’ll be keeping an eye out. Another round for Dataiku may be in the works – the startup just moved into some luxurious offices overlooking the Rex theater in the heart of Paris’ startup neighborhood. Algolia’s post-YC growth has been incredible, releasing feature after feature and wooing clients. Most recently they launched Super Bowl Search site where you can search all the ads that have ever aired during the Super Bowl. Expect more great things from the Exalead Mafia.
One point: My records show that Dassault acquired Exalead in 2010 for about $160 million.
Stephen E Arnold, April 10, 2016
April 10, 2016
Short honk: I saw a Tweet about ResearchCue. According to the firm’s Web site, the service handles data aggregation for business intelligence. I checked out the report “Top Companies in Semantic Web.” A search box allows the site visitor to enter Boolean queries. The concept is that a person looking for information wants a report with snippets of relevant information automatically located and displayed in an easy-to-scan format. The presentation highlights important articles, some metrics such as the number of articles and tweets in a time period, and the list of companies in the Semantic Web sector. For vendors of keyword search solutions, this type of service is a reminder that lists of articles are not going to core the apple. Many search vendors talk about “search” and then deliver 1970s-style results. ResearchCue is making more widely available the type of information access tools I discussed in CyberOSINT: Next Generation Information Access. For traditional vendors of proprietary search systems, the future may have already passed many companies by.
Stephen E Arnold, April 10, 2016
April 8, 2016
No matter the industry, it’s tough to recruit and keep talent. As the article Skills shortage hits hackers, published by Infosecurity Magazine, reports, cybercriminals are no exception. Research conducted by Digital Shadows shows an application process exists not entirely dissimilar from that of traditional careers. The jobs include malware writers, exploit developers, and botnet operators. The article explains how Dark Web talent is recruited,
“This includes job ads on forums or boards, and weeding out people with no legitimate technical skills. The research found that the recruitment process often requires strong due diligence to ensure that the proper candidates come through the process. Speaking to Infosecurity, Digital Shadows’ Vice President of Strategy Rick Holland said that in the untrusted environment of the attacker, reputation is as significant as in the online world and if someone does a bad job, then script kiddies and those who have inflated their abilities will be called out.”
One key difference cited is the hiring timeline; the Dark Web moves quickly. As you might imagine, there is apparently only a short window of opportunity to cash in stolen credit cards. The sense of urgency surrounding many Dark Web activities suggests that speedier cybersecurity solutions are needed. As cybercrime-as-a-service expands, criminals’ efforts and attacks will only grow swifter.
Megan Feil, April 8, 2016
April 8, 2016
The article titled GCHQ: Spy Chief Admits UK Agency Losing Cyberwar Despite £860M Funding Boost on International Business Times examines the surprisingly frank confession made by Alex Dewdney, a director at the Government Communications Headquarters (GCHQ). He stated that in spite of the £860M funneled into cybersecurity over the past five years, the UK is unequivocally losing the fight. The article details,
“To fight the growing threat from cybercriminals chancellor George Osborne recently confirmed that, in the next funding round, spending will rocket to more than £3.2bn. To highlight the scale of the problem now faced by GCHQ, Osborne claimed the agency was now actively monitoring “cyber threats from high-end adversaries” against 450 companies across the UK aerospace, defence, energy, water, finance, transport and telecoms sectors.”
The article makes it clear that search and other tools are not getting the job done. But a major part of the problem is resource allocation and petty bureaucratic behavior. The money being poured into cybersecurity is not going towards updating the “legacy” computer systems still in place within GCHQ, although those outdated systems represent major vulnerabilities. Dewdney argues that without basic steps like migrating to improved, current software, the agency has no hope of successfully mitigating the security risks.
Chelsea Kerwin, April 8, 2016
April 7, 2016
Another wizard has scrutinized the Google and figured out how to make sure your site becomes number one with a bullet.
To get the wisdom, navigate to “Hummingbird – Mastering the art of Conversational Search.” The problem for the GOOG is that it costs a lot of money to index Web sites no one visits. Advertisers want traffic. That means the GOOG has to find a way to reduce costs and sell either more ads or fewer ads at a higher price.
The write up pays scant attention to the realities of the Google. But you will learn the tips necessary to work traffic magic. Okay, I don’t get too excited about info about Google from folks who are not working at the company or who have never worked at the company. Sorry. Looking at the Google and reading tea leaves does not work for me.
But what works, according to the write up, are these sure fire tips. Here we go:
- Bone up on latent semantic indexing. Let’s see. That method has been around for 30, maybe 40 years. Get a move on, gentle reader.
- Make your Web site mobile friendly. Unfortunately, a mobile Web site doesn’t get more traffic than a regular Web site that does not get much traffic. Sorry. The majority of clicks flow to a small percentage of the accessible Web sites.
- Forget the keyword thing. Well, I usually use words to write my articles and Web sites. I worry about focusing on a small number of topics and using the words necessary to get my point across. Keywords, in my opinion, are derivatives of information. Forgetting keywords is easy. I never used them before.
- Make your write ups accurate. Okay, that’s a start. What does one do with “real” news from certain sources? The info is baloney, but everyone pretends it is accurate. What’s up with that? The accuracy angle is part of Google’s scoring methods. Each person has to deal with what’s correct in his or her own way. Footnotes and links are helpful. What happens when someone disagrees? Is this “accurate”? Oh, well.
- “Be bold and broad.” In my experience, not much content is bold and broad.
Now you understand Google Hummingbird. Will your mobile Web site generate hundreds of thousands of uniques if you adhere to this road map? Nah. Why not follow Google’s guidelines from the Google itself?
Stephen E Arnold, April 7, 2016
April 7, 2016
Once more we turn to the Fuzzy Notepad’s advice and their Pokémon mascot, Eevee. This time we visited the fuzz pad for tips on Twitter. The 140-character social media platform has a slew of hidden features that do not have a button on the user interface. Check out “Twitter’s Missing Manual” to read more about these tricks.
It is inconceivable for every feature to have a shortcut on the user interface. Twitter relies on its users to understand basic features, while the experienced user will have picked up tricks that only come with experience or reading tips on the Internet. The problem is:
“The hard part is striking a balance. On one end of the spectrum you have tools like Notepad, where the only easter egg is that pressing F5 inserts the current time. On the other end you have tools like vim, which consist exclusively of easter eggs.
One of Twitter’s problems is that it’s tilted a little too far towards the vim end of the scale. It looks like a dead-simple service, but those humble 140 characters have been crammed full of features over the years, and the ways they interact aren’t always obvious. There are rules, and the rules generally make sense once you know them, but it’s also really easy to overlook them.”
Twitter is a great social media platform, but a headache to use because it never came with an owner’s manual. Fuzzy Notepad has lined up hints for every conceivable problem, including the elusive advanced search page.