May 5, 2016
Search engine optimization, better known as SEO, is one of the prime tools Web site owners must master in order for their site to appear in search results. A common predicament most site owners find themselves in is that they may have a fantastic page, but if a search engine has not crawled it, the site might as well not exist. There are many aspects to mastering SEO and it can be daunting to attempt to make a site SEO friendly. While there are many guides that explain SEO, we recommend Mattias Geniar’s “A Technical Guide To SEO.”
Some SEO guides get too much into technical jargon, but Geniar’s approach uses plain speak so even if you have the most novice SEO skills it will be helpful. Here is how Geniar explains it:
“If you’re the owner or maintainer of a website, you know SEO matters. A lot. This guide is meant to be an accurate list of all technical aspects of search engine optimisation. There’s a lot more to being “SEO friendly” than just the technical part. Content is, as always, still king. It doesn’t matter how technically OK your site is, if the content isn’t up to snuff, it won’t do you much good.”
Understanding the code behind SEO can be challenging, but thank goodness content remains the most important part of being picked up by Web crawlers. These tricks will only augment your content so it is picked up more quickly and you receive more hits on your site.
May 2, 2016
If you believe the Dark Web was destroyed when Silk Road went offline, think again! The Dark Web has roots like a surface weed: when one root remains, there are dozens (or in this case millions) more to keep the weed growing. Tech Insider reports that OpenBazaar now occupies the space Silk Road vacated in “A Lawless And Shadowy New Corner Of The Internet Is About To Go Online.”
OpenBazaar is described as a decentralized and uncensored online marketplace where people can sell anything without the fuzz breathing down their necks. Brian Hoffman and his crew have worked on it since 2014, when Amir Taaki thought it up. It works much like eBay and Etsy as a peer-to-peer market, but instead of hard currency it uses bitcoin. Since it is decentralized, it will be nearly impossible to take offline, unlike Silk Road. Hoffman took over the project from Taaki, and after $1 million from tech venture capital firms the testnet is live.
“There’s now a functioning version of OpenBazaar running on the “testnet.” This is a kind of open beta that anyone can download and run, but it uses “testnet bitcoin” — a “fake” version of the digital currency for running tests that doesn’t have any real value. It means the developer team can test out the software with a larger audience and iron out the bugs without any real risk. “If people lose their money it’s just a horrible idea,” Hoffman told Business Insider.”
A new user signs up for the OpenBazaar testnet every two minutes, and Hoffman hopes to find all the bugs before the public launch. Hoffman once wanted to run the next generation digital black market, but now he is advertising it as a new Etsy. The lack of a central authority means lower take rates, the fees sellers incur for selling on the site. Hoffman says it will be good competition for online marketplaces because it will force services like eBay and Etsy to find new ways to add value instead of raising fees on customers.
May 1, 2016
Apache Lucene receives the most headlines when it comes to discussion about open source search software. My RSS feed pulled up another open source search engine that shows promise in being a decent piece of software. Open Semantic Search is free software that can be used for text mining, analytics, a search engine, data explorer, and other research tools. It is based on Elasticsearch/Apache Solr’s open source enterprise search. It was designed with open standards and with a robust semantic search.
As with any open source search, it can be programmed with numerous features based on the user’s preference. These include tagging, annotation, varying file format support, multiple data sources support, data visualization, newsfeeds, automatic text recognition, faceted search, interactive filters, and more. It has the benefit that it can be programmed for mobile platforms, metadata management, and file system monitoring.
Open Semantic Search is described as
“Research tools for easier searching, analytics, data enrichment & text mining of heterogeneous and large document sets with free software on your own computer or server.”
While its base code is derived from Apache Lucene, it takes the original product and builds something better. Proprietary software is an expense dubbed a necessary evil if you work in a large company. If, however, you are a programmer and have the time to develop your own search engine and analytics software, do it. It could even turn out better than the proprietary stuff.
April 29, 2016
There is a new tool for organizations to more quickly detect whether their sensitive data has been hacked. The Atlantic discusses “The Spider that Crawls the Dark Web Looking for Stolen Data.” Until now, it was often many moons before an organization realized it had been hacked. Matchlight, from Terbium Labs, offers a more proactive approach. The service combs the corners of the Dark Web looking for the “fingerprints” of its clients’ information. Writer Kaveh Waddell reveals how it is done:
“Once Matchlight has an index of what’s being traded on the Internet, it needs to compare it against its clients’ data. But instead of keeping a database of sensitive and private client information to compare against, Terbium uses cryptographic hashes to find stolen data.
“Hashes are functions that create an effectively unique fingerprint based on a file or a message. They’re particularly useful here because they only work in one direction: You can’t figure out what the original input was just by looking at a fingerprint. So clients can use hashing to create fingerprints of their sensitive data, and send them on to Terbium; Terbium then uses the same hash function on the data its web crawler comes across. If anything matches, the red flag goes up. Rogers says the program can find matches in a matter of minutes after a dataset is posted.”
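The fingerprint comparison the article describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Terbium's actual method: the choice of SHA-256, the whitespace and lowercase normalization, the exact-match comparison, and the names `fingerprint` and `find_matches` are all hypothetical, and the sample records are made up.

```python
import hashlib


def fingerprint(record: str) -> str:
    """Return a one-way SHA-256 fingerprint of a record.

    Assumption: records are trimmed and lowercased first so that
    trivially different copies produce the same fingerprint.
    """
    normalized = record.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_matches(client_fingerprints: set[str], crawled_records: list[str]) -> list[str]:
    """Return crawled records whose fingerprint matches a client fingerprint."""
    return [r for r in crawled_records if fingerprint(r) in client_fingerprints]


# Client side: only the fingerprints leave the organization, not the data.
sensitive = ["alice@example.com", "4111-1111-1111-1111"]  # made-up sample data
client_fingerprints = {fingerprint(s) for s in sensitive}

# Monitoring side: hash whatever the crawler finds and compare.
crawled = ["bob@example.org", "Alice@Example.com ", "lorem ipsum"]
matches = find_matches(client_fingerprints, crawled)
print(matches)
```

Because the hash only works in one direction, the monitoring side can raise a flag on a match without ever holding the client's original data. A real service would presumably need to fingerprint fragments of larger documents rather than whole records, which this sketch does not attempt.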
What an organization does with this information is, of course, up to them; but whatever the response, now they can implement it much sooner than if they had not used Matchlight. Terbium CEO Danny Rogers reports that, each day, his company sends out several thousand alerts to their clients. Founded in 2013, Terbium Labs is based in Baltimore, Maryland. As of this writing, they are looking to hire a software engineer and an analyst, in case anyone here is interested.
Cynthia Murrell, April 29, 2016
April 27, 2016
It looks like some hackers are no longer afraid of the proverbial light, we learn from “Sony Hackers Still Active, ‘Darkhotel’ Checks Out of Hotel Hacking” at InformationWeek. Writer Kelly Jackson Higgins cites Kaspersky security researcher Juan Andres Guerrero-Saade, who observes that those behind the 2014 Sony hack, thought to be based in North Korea, did not vanish from the scene after that infamous attack. Higgins continues:
“There has been a noticeable shift in how some advanced threat groups such as this respond after being publicly outed by security researchers. Historically, cyber espionage gangs would go dark. ‘They would immediately shut down their infrastructure when they were reported on,’ said Kurt Baumgartner, principal security researcher with Kaspersky Lab. ‘You just didn’t see the return of an actor sometimes for years at a time.’
“But Baumgartner says he’s seen a dramatic shift in the past few years in how these groups react to publicity. Take Darkhotel, the Korean-speaking attack group known for hacking into WiFi networks at luxury hotels in order to target corporate and government executives. Darkhotel is no longer waging hotel-targeted attacks — but they aren’t hiding out, either.
“In July, Darkhotel was spotted employing a zero-day Adobe Flash exploit pilfered from the HackingTeam breach. ‘Within 48 hours, they took the Flash exploit down … They left a loosely configured server’ exposed, however, he told Dark Reading. ‘That’s unusual for an APT [advanced persistent threat] group.’”
Seeming to care little about public exposure, Darkhotel has moved on to other projects, like reportedly using Webmail to attack targets in Southeast Asia.
On the other hand, one group which experts had expected to see more of has remained dark for some time. We learn:
“Kaspersky Lab still hasn’t seen any sign of the so-called Equation Group, the nation-state threat actor operation that the security firm exposed early last year and that fell off its radar screen in January of 2014. The Equation Group, which has ties to Stuxnet and Flame as well as clues that point to a US connection, was found with advanced tools and techniques including the ability to hack air gapped computers, and to reprogram victims’ hard drives so its malware can’t be detected nor erased. While Kaspersky Lab stopped short of attributing the group to the National Security Agency (NSA), security experts say all signs indicate that the Equation Group equals the NSA.”
The Kaspersky team doesn’t think for a minute that this group has stopped operating, but believe they’ve changed up their communications. Whether a group continues to lurk in the shadows or walks boldly in the open may be cultural, they say; those in the Far East seem to care less about leaving tracks. Interesting.
Cynthia Murrell, April 27, 2016
April 21, 2016
Is Google trying to emulate BAE Systems’ NetReveal, IBM i2, and systems from Palantir? Looking back at an older article from Search Engine Watch, “How the Semantic Web Changes Everything for Search,” may provide insight. At that time, Knowledge Graph had just launched, and along with it came a wave of communications generating buzz about a new era of search moving from string-based queries to a semantic approach, organizing by “things”. The write-up explains,
“The cornerstone of any march to a semantic future is the organization of data and in recent years Google has worked hard in the acquisition space to help ensure that they have both the structure and the data in place to begin creating “entities”. In buying Wavii, a natural language processing business, and Waze, a business with reams of data on local traffic and by plugging into the CIA World Factbook, Freebase and Wikipedia and other information sources, Google has begun delivering in-search info on people, places and things.”
This article mentioned Knowledge Graph’s implication for Google to deliver strengthened and more relevant advertising with this semantic approach. Even today, we see the Alphabet Google thing continuing to shift from search to other interesting information access functions in order to sell ads.
Megan Feil, April 21, 2016
April 12, 2016
I read “With Government Data Unlocked, MIT Tries to Make It Easier to Sift Through.” I came away from the write up a bit confused. I recall that Palantir Technologies offered for a short period of time a site called AnalyzeThe.US. It disappeared. I also recalled seeing a job posting for a person with a top secret clearance who knew Tableau (Excel on steroids) and Palantir Gotham (augmented intelligence). I am getting old but I thought that Michael Kim, once a Deloitte wizard, gave a lecture about how one can use Palantir for analytics.
Why is this important?
The write up points out that MIT worked with Deloitte which, I learned:
provided funding and expertise on how people use government data sets in business and for research.
The Gray Lady’s article does not see any DNA linking AnalyzeThe.US, Deloitte, and the “new” Data USA site. Palantir’s Stephanie Yu gave a talk at MIT. I wonder if those in that session perceive any connection between Palantir and MIT. Who knows. I wonder if the MIT site makes use of AngularJS.
With regard to US government information, www.data.gov is still online. The information can be a challenge for a person without Tableau and Palantir expertise to wrangle in my experience. For those who don’t think Palantir is into sales, my view is that Palantir sells via intermediaries. The deal, in this type of MIT case, is to try to get some MIT students to get bitten by the Gotham and Metropolitan fever. Thank goodness I am not a real journalist trying to figure out who provides what to whom and for what reason. Okay, back to contemplating the pond filled with Kentucky mine run off water.
Stephen E Arnold, April 12, 2016
April 8, 2016
I read “Business Analytics Is a Big Sham and Over Rated.” My hunch is that the write up is a bit of April fool baloney. But, maybe not?
Many vendors are changing their marketing collateral to proclaim one very special outfit can make sense out of oodles of data and textual information.
The write up makes some interesting statements; for example:
Analysts waste every one’s time. Perhaps the statement should be “often are too busy to deal with requests for their services.”
But the write up is an April Fool joke. The problem is that large organizations and government entities want a silver bullet. Who has witnessed the implosion of a massive enterprise software project?
In my experience, business analytics are becoming a must have function. The problem is that the hoo haa tossed around by vendors and pundits seems reasonably accurate.
Humor and reality are one.
Stephen E Arnold, April 8, 2016
April 6, 2016
I read “IBM Launches Mainframe Platform for Spark.” This is an announcement which makes sense to me. The Watson baloney annoys; the mainframe news thrills.
According to the write up:
IBM is expanding its embrace of Apache Spark with the release of a mainframe platform that would allow the emerging open-source analytics framework to run natively on the company’s mainframe operating system.
I noted this passage as well:
The IBM platform also seeks to leverage Spark’s in-memory processing approach to crunching data. Hence, the z Systems platform includes data abstraction and integration services so that z/OS analytics applications can leverage standard Spark APIs. That approach eliminates processing and security issues associated with ETL while allowing organizations to analyze data in-place.
Hopefully IBM will play to its strengths, not chase rainbows.
Stephen E Arnold, April 6, 2016
April 1, 2016
According to the capitalist tool:
A new survey of data scientists found that they spend most of their time massaging rather than mining or modeling data.
The point is that few wizards want to come to grips with the problem of figuring out what’s wrong with data in a set or a stream and then getting the data into a form that can be used with reasonable confidence.
Those exception folders, annoying, aren’t they?
The write up points out that a data scientist spends 80 percent of his or her time doing housecleaning. Skip the job and the house becomes unpleasant indeed.
The survey also reveals that data scientists have to organize the data to be analyzed. Imagine that. The baloney about automatically sucking in a wide range of data does not match the reality of the survey sample.
Another grim bit of drudgery emerges from the sample, which we assume was conducted with the appropriate textbook procedures: the skills most in demand were for SQL. Yep, old school.
Consider that most of the companies marketing next generation data mining and analytics systems never discuss grunt work and old fashioned data management.
Why the disconnect?
My hunch is that it is the sizzle, not the steak, which sells. Little wonder that some analytics outputs might be lab-made hamburger.
Stephen E Arnold, April 1, 2016