February 26, 2014
I read “How Covert Agents Infiltrate the Internet to Manipulate, Deceive, and Destroy Reputations.” Public relations may need to do some PR and damage control. The allegedly accurate information provided one more factoid to support our contention that locating and verifying “news” is a tough job.
I will be addressing some of the methods a researcher can use to unwrap the ballistic padding that online services use to keep some information away from the grubby fingers of researchers. Consumers who gobble pay-to-play content are what most online services want. And, if you had not noticed, putting video content front and center is the new trend for those who are looking for facts, data, and high-value analyses.
As Kim Kardashian allegedly said, “I’m an entrepreneur. Ambitious is my middle name.”
The blog post “The Future of the News Business: A Monumental Twitter Stream All in One Place” was more interesting to me. The write up presses some familiar controls on the baloney making machine; for example:
- Consolidation is much better than individual services. I wonder if “consolidation” is a euphemism for monopoly, a concept with which some executives are more familiar. An older-school thinker might have used the word “convergence,” and that buzzword does make an appearance in the source article.
- The time horizon is not three years (a long time in today’s uncertain world). The time horizon is 20 years in the future. I wonder how far in the future Viktor Yanukovych’s chief of staff planned yesterday. I think the plans are on hold for a while.
- The old way of news was monopolistic. The new way is to generate money from many streams; for example, advertising (good), Bitcoin (possibly problematic), and slicing and dicing (a possible copyright quagmire).
- The beacons range from Buzzfeed (listicles) to SearchEngineLand (the logic-straining search engine optimization service described as “a place for all the search news, all the time.”)
The opportunity, if I follow the argument, is to tackle the job of creating a “monumental Twitter Stream all in one place” with vision, scrappiness, experimentation, adaptability, focus, deferred gratification, and an entrepreneurial mindset.
I appreciate the elegant quote from Tommy Lasorda about how difficult creating a news-oriented “monumental Twitter stream” will be. My hunch is that a fusion of PR methods, content marketing, and “bits are bits” thinking will triumph.
February 17, 2014
I did a series of reports about open source search. Some of these were published under mysterious circumstances by that leader of the azure chip consultants, IDC. You can see the $3,500 per report offers on the IDC site. Hey, I am not getting the money, but that’s what some of today’s go-go executives do. The list of titles appears below my signature.
Elasticsearch, a system that is based on Lucene, evolved after the still-in-use Compass system. What seems to have happened in the last six months is one of those singularities that Googlers seek.
In January 2014, GigaOM, a “real news” outfit, reported that Elasticsearch had moved from free and open source to a commercial model. You can find that report in “6 Million Downloads Later, Elasticsearch Launches a Commercial Product.” The write up equates lots of downloads with commercial success. Well, I am not sure that I accept that. I do know that Elasticsearch landed an additional $24 million in Series B funding if Silicon Angle’s information is correct. Elasticsearch is now armed with more money than the aging and repositioning Lucid Works (originally Lucid Imagination) has. (An interview with one of the founders of Lucid Imagination, the precursor of Lucid Works, is at http://bit.ly/1gvddt5. Mr. Krellenstein left Lucid Imagination abruptly shortly after this interview appeared.)
I noted that in February 2014, InfoWorld, owned by the publisher of the $3,500 report about Elasticsearch, called the company “ultra hip” in “Ultra Hip Elasticsearch Hits Commercial Release.” I don’t see many search companies—proprietary or open source—called “hip.” The write up asserts (although I wonder who provided the content):
Elasticsearch was originally spun off from the Compass project, an open source Java search engine framework, back in 2004, in an effort to create a highly scalable search solution. Built on top of the well-known and popular Lucene library from the Apache Software Foundation, Elasticsearch adds such features as multitenancy, sharding, faceted search, and a JSON-based REST API. This feature set puts it in competition with the Solr project as a complete search solution built on top of Lucene.
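The quoted feature list—a JSON-based REST API with faceting on top of Lucene—can be made concrete. Here is a minimal sketch of the kind of JSON body a client would POST to an Elasticsearch `_search` endpoint circa early 2014; the field names (`title`, `author`) and the index URL are my own illustrative assumptions, not details from the article.

```python
import json

# A hypothetical faceted query for Elasticsearch's JSON REST API.
# Field names ("title", "author") are invented for illustration.
query = {
    "query": {"match": {"title": "enterprise search"}},
    "facets": {  # pre-1.0 "facets" syntax; later versions use "aggregations"
        "by_author": {"terms": {"field": "author"}}
    },
    "size": 10,
}

# The body would normally be POSTed to http://localhost:9200/<index>/_search;
# here we only serialize it to show the wire format.
body = json.dumps(query, indent=2)
print(body)
```

The point of the JSON-over-HTTP design is that any language with an HTTP client can talk to the engine, which is part of why the download numbers in the GigaOM piece grew so quickly.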
The statement does not hit what I thought were the main points of the Elasticsearch initiative. Let me fill in the blanks. Perhaps an azure chip consultant can use these to whip up another $3,500 report?
February 12, 2014
Last I knew, Google had trimmed the Google Search Appliance (GSA) product line, eliminated the impulse buy option for the Mini, and kept the price at the higher end of the appliance market.
I learned over the last two years that Google has placed more than 60,000 GSAs in organizations. I have no idea if the number is valid, but if it is, the GSA is one of the top dogs in enterprise search. I also heard that there was a small team working on the GSA and an even smaller team handling customer support. Google pushes functions to resellers who deal with the customers. Google outsources manufacturing of the GSA. Most important, Google seems to have an off-again, on-again interest in on-premises search. The future, as I understand it, is the cloud. The GSA is, in my opinion, an anachronism in the Nest, X Labs, and Android-Chrome world. But, hey, I have been wrong before. I once asserted that basic search should not be a challenge for most organizations. Wow, did I get that wrong! Jail time, lawsuits, and DARPA’s almost admission that search is not working notwithstanding.
The GSA has been around almost a decade. Version 7.2 is “a leader in the Gartner Enterprise Search MQ.” I certainly don’t doubt the word of an estimable azure chip consulting firm. No, no, no.
The new version, according to Google, delivers:
- Metadata sorting. A function available in the 1983 version of Fulcrum Technologies’ system
- Language translation. A function available from Delphes in the 1990s
- A document preview function. iPhrase in 1999 delivered this feature
- Entity recognition. Verity implemented this function in the 1980s
- Dynamic navigation. Endeca rolled out this feature in 1998
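Two of the features in the list above—metadata sorting and dynamic navigation—are simple enough to sketch in a few lines. The documents and field names below are toy data I invented for illustration; this is not how the GSA, Fulcrum, or Endeca actually implement these functions.

```python
from collections import Counter
from operator import itemgetter

# Toy documents with metadata; fields and values are invented.
docs = [
    {"title": "Q3 report", "author": "Smith", "date": "2013-10-01"},
    {"title": "Memo",      "author": "Jones", "date": "2014-01-15"},
    {"title": "Q4 report", "author": "Smith", "date": "2014-01-02"},
]

# Metadata sorting: order results by the "date" attribute, newest first.
by_date = sorted(docs, key=itemgetter("date"), reverse=True)

# Dynamic navigation: count documents per "author" to build facet links.
facets = Counter(d["author"] for d in docs)

print([d["title"] for d in by_date])  # ['Memo', 'Q4 report', 'Q3 report']
print(facets.most_common())           # [('Smith', 2), ('Jones', 1)]
```

The toy version is trivial; the hard part, which the 1980s and 1990s vendors solved, is doing this over millions of documents with inconsistent metadata.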
In my opinion, the GSA is catching up to innovations available for many years from other vendors. Comparing the EPI Thunderstone and Maxxcat appliances to the GSA emphasizes that the GSA is not quite at parity with other products in the channel.
According to “Google Updates Enterprise Search Appliance Tool,”
The GSA 7.2 update comes more than a year after the firm upgraded the GSA to version 7.0, and builds on the features included in that update. The most notable includes the ability to improve the way data can be indexed with key attributes, such as author name, or the date it was created.
How much does a GSA cost? According to the US government’s GSAadvantage.gov, a 36 month license for a GB 7007 is $69,296 for 500,000 documents. Have more documents? Pay for an upgrade. However, I can use a hosted service like Blossom Software to index my content for about $2,400 per month. I can use the low cost dtSearch solution for $160 per seat. I can download an open source solution and do it myself.
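The figures in the paragraph above can be put side by side with a bit of arithmetic. This is strictly back-of-envelope: it ignores hardware, support contracts, and the different document limits of each option.

```python
# Back-of-envelope comparison using only the figures quoted above.
gsa_license = 69_296        # GB-7007, 36-month license, 500,000 documents
months = 36

gsa_monthly = gsa_license / months
blossom_monthly = 2_400     # hosted service, per month
dtsearch_seat = 160         # per-seat license, one-time

print(f"GSA license: ${gsa_monthly:,.0f}/month")      # about $1,925/month
print(f"Hosted service: ${blossom_monthly:,}/month")
print(f"GSA per document over 36 months: ${gsa_license / 500_000:.3f}")
```

The per-document figure is what matters for the 20 million document scenario below: multiply the corpus by 40 and the license upgrades push the total toward the HP Autonomy price band.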
For an organization with 20 million documents to index, the cost of the GSA solution noses into HP Autonomy territory. Too rich for my blood, and I think that lower cost appliance vendors will see the Google Search Appliance as a lead generator.
I wonder if those azure chip consultants have licensed the GSA to handle their Intranet information retrieval tasks?
Stephen E Arnold, February 12, 2014
January 23, 2014
Do equations sell? Some color:
I know that I received negative feedback when I described the mathematical procedures used for Google’s semantic search inventions. I receive presentations and links to presentations frequently. Few of these contain mathematical expressions. In my forthcoming no-cost discussion of Autonomy from 1996 to 2007, I include one equation. I learned my lesson. Today’s search and content processing truth seekers want training wheels, not higher level math. I find this interesting because as systems become easier to use, the fancy math becomes more important.
Anyway, imagine my surprise when I received a link to a company founded 14 years ago. The outfit does business as Digital Reasoning, and it competes with Palantir (a segment superstar), IBM i2 (the industry leader for relationship analysis), and Recorded Future (backed, in part, by the Google). Dozens of other companies chase revenues in this content processing sector. Today’s New York Times includes a content marketing home run by an outfit called YarcData. You can find this op ed piece by Tim White on page A 23 of the dead tree version of the paper I received this morning (January 23, 2014). Now that’s a search engine optimization Pandas and the Times’s demographic can love.
To the presentation. My link points to Paragon Science at http://slidesha.re/1jpXAGd. I was logged in automatically, so you may have to register to flip through the slide deck.
Navigate to slides 33 and following. Slides 1 to 32 review how text has been parsed for decades. The snappy stuff kicks in on page 33. There are some incomprehensible graphics. These Hollywood style data visualizations are colorful. I, unlike the 20 somethings who devour this approach to information, have a tough time figuring out what I am supposed to glean.
At slide 42, I am introduced to “dynamic cluster analysis.” The approach echoes the methods developed by Dr. Ron Sacks-Davis in the late 1970s and embedded in some of the routines of the 1980 system that a decade later became better known as InQuirion and then TeraText.
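To give a flavor of what cluster analysis does, here is a toy greedy grouping of points whose distance to a cluster's first member falls under a threshold. This is purely illustrative; it is not the "dynamic cluster analysis" from the slide deck, nor the Sacks-Davis routines.

```python
import math

# Greedy threshold clustering: assign each point to the first cluster whose
# seed point is within `threshold`, else start a new cluster. Toy method only.
def cluster(points, threshold):
    clusters = []
    for p in points:
        for c in clusters:
            if math.dist(p, c[0]) < threshold:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

points = [(0, 0), (0.5, 0.2), (5, 5), (5.3, 4.8), (10, 0)]
groups = cluster(points, threshold=1.0)
print(len(groups))  # 3: a pair near the origin, a pair near (5, 5), one outlier
```

Real systems replace the points with high-dimensional document vectors and recompute the groupings as new documents arrive, which is presumably what the "dynamic" in the slide title refers to.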
At slide 44, the fun begins. Here’s an example which I am sure you will recall from your class in chaos mathematics. If you can’t locate your class notes, you can get a refresher at http://bit.ly/1mKR3G9 courtesy of Cal Tech, home of the easy math classes as I learned during my stint at Halliburton Nuclear Utility Services. The tough math classes were taught at MIT, the outfit that broke new ground in industry sponsored educational methods.
January 18, 2014
Years ago I gave a lecture at Yale. My subject was Google. I ran through the basic points in The Google Legacy and Google Version 2.0. The audience reacted as if I had dissected a dead frog. I received a smattering of polite applause and headed out for a talk in New York City. So much for Yale and the idea that Google was more than a Web search company.
I just read “Yale Students Made a Better Version of Their Course Catalogue. Then Yale Shut It Down.” A couple of students put up a Web page that allowed students to pinpoint classes and compare student ratings of professors. Sounds like an app to me.
Information? Who said it was supposed to be free? Image source: http://1.usa.gov/1dFIhW9
But Yale perceived the Web page differently. Here’s the quote:
‘Yale’s policy on free expression and free speech entitles no one to appropriate a Yale resource and use it as their [sic] own,’ the statement read. It further stated its main priority at this time was supporting its own resources, ‘not others created independently and without the university’s cooperation or permission,’ and that ‘all the information on the website remains available to students on the Yale site.’
I assume the Washington Post is semi-accurate, just like an Amazon recommendation.
What did the future bonesmen learn? A nuance of academic freedom in Yale Land has been broadcast in an analogue transmission.
Will these two free thinkers demonstrate digital initiative in the future? Is Yale turning out well-trained online researchers for the next-generation information highway?
Stephen E Arnold, January 18, 2014
January 9, 2014
The article on Business Insider titled “Here’s How Many Times People Switch Devices in a Single Hour” provides insight into the studies being undertaken by both Google and Facebook into following users from device to device. They need to demonstrate to advertisers that the ad one user saw on his laptop at work later caused him to make a purchase from his smartphone. The article states:
“A new study from the British unit of advertising buyer OMD shows just how massively important this cross-device tracking has become to monitoring a given consumer’s behavior.
In looking at the behavior of 200 Brits during one evening, OMD found that the average person shifted his attention between his smartphone, tablet, and laptop a staggering 21 times in one hour.”
This study’s findings may not come as a huge surprise. An article on Salon titled “How Baby Boomers Screwed Their Kids and Created Millennial Impatience” argues that Generation Y is the most distracted and impatient batch of people yet. The article contends,
“According to a study at Northwestern University, the number of children and young people diagnosed with attention deficit hyperactivity disorder (ADHD) shot up 66 percent between 2000 and 2010. Why the sudden and huge spike in a frontal lobe dysfunction over the course of a decade… What I believe is likely happening, however, is that more young people are developing an addiction to distraction. An entire generation has become addicted to the dopamine-producing effects of text messages, e-mails and other online activities.”
This “addiction to distraction” is often held up by Gen Y’ers as an ability to “multi-task”. But what does it mean to be someone unable to focus? In Buddhism there is the belief that if you are doing more than one focused task, you are not truly alive.
With telework, the workplace is now the world.
We have all succumbed at one time or another to the call of checking our e-mail, Facebook, or Twitter account, but when we are doing it so often that it takes over our concentration, what have our lives become? There is a wide gap between flitting from these exciting distractions and actually gaining some foothold of understanding. And the more we do jump back and forth between tasks, the less likely it becomes that any knowledge is created or stored. The Salon article paints a bleak picture, starting off with the dark Philip Larkin poem “This Be the Verse” (it is hardly “High Windows”) and including this dreary image of the future,
January 6, 2014
I follow two or three LinkedIn groups. Believe me. The process is painful. On the plus side, LinkedIn’s discussions of “enterprise search” reveal the broken ribs in the body of information retrieval. On the surface, enterprise search and content processing appear to be fit and trim. The LinkedIn discussion X-ray reveals some painful and potentially life-threatening injuries. Whether it is marketing professionals at search vendors or individuals with zero background in information retrieval, the discussions often give me a piercing headache.
The eruption of digital information posed a challenge to UK firms in Autonomy’s “Information Black Holes” report. © Autonomy, 1999
One of the “gaps” in the enterprise search sector is a lack of historical perspective. Moderators and participants see only the “now” of their search work. When looking down the information highway, the LinkedIn search group participants strain to see bright white lines. Anyone who has driven on the roads in Kentucky knows that lines are neither bright nor white. Most are faded, mere suggestions of where the traffic should flow.
In 1999, I picked up a printed document called “Information Black Holes.” The subtitle was this question, “Will the Evolution of EIPs Save British Business £17 Billion per Year?” The author of the report was an azure chip consulting firm doing business as “Continental Research.” The company sponsoring the research was Autonomy. Autonomy as a concept relates to “automatic,” “automation,” and “autonomous.” This connotation is a powerful one. Think “automation” and the mind accepts an initial investment followed by significant cost reductions. Autonomy had a name and brand advantage from its inception. Who remembers Cambridge Neurodynamics? Not many of the 20 somethings flogging search and content processing systems in 2014, I would wager.
As you may know, Hewlett Packard purchased Autonomy in 2011. I doubt that HP has a copy of this document, and I know that most of the LinkedIn enterprise search group members have not read the report. I understand because 15-year-old marketing collateral (unlike Kentucky bourbon) does not often improve with age. But “Information Black Holes” is an important document. Unwittingly, today’s enterprise search vendors are addressing many of the topics set forth in the 1999 Autonomy publication.
December 20, 2013
One of the ArnoldIT goslings called to my attention a 2011 PDF white paper with the title (I kid you not):
Human inFormation (sic): Cloud, pan enterprise search, automation, video search, audio search, discovery, infrastructure platform, Big Data, business process management, mobile search, OEMs, and advanced analytics.
I checked on December 19, 2013, and this PDF was available at http://bit.ly/19Vwkqg.
That covers a lot of ground even for HP with or without Autonomy. The analysis includes some “factoids”; for example:
- Unstructured data represents 85% of all information but structured information is growing at 22% CAGR
- Unstructured information is growing at 62% CAGR.
- Users upload 35 hours of video every minute
- Unstructured data will grow to over 35 zettabytes by 2020
- Videos on YouTube were viewed 2 billion times per day, 20 times more than in 2006.
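Taken at face value, the growth rates in the list above compound startlingly fast. A quick sanity check on the 62% CAGR figure (using only that number; the baseline data volumes are not in the factoids):

```python
import math

# What a 62% compound annual growth rate implies.
cagr_unstructured = 0.62

# Doubling time: solve (1 + r)**t == 2 for t.
doubling = math.log(2) / math.log(1 + cagr_unstructured)
print(f"Doubling time: {doubling:.2f} years")    # about 1.44 years

# Growth factor over a decade at that rate.
factor = (1 + cagr_unstructured) ** 10
print(f"10-year growth factor: {factor:.0f}x")   # roughly 124x
```

In other words, if the 62% figure held for a decade, the unstructured pile would be more than a hundred times larger at the end of it, which is the kind of arithmetic behind the 35 zettabyte projection.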
You get the idea. With lots of data, information is a problem. I need to pause a moment and catch my breath.
Well, “it’s not just about search.” Again, I must pause. One Mississippi, two Mississippi, and three Mississippi. Okay.
Fundamentally, the ability to understand meaning and automatically process information is all about distance, probabilities, relativeness (sic), definitions, slang, and more. It is an overwhelming and continually growing problem that requires advanced technology to solve.
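One concrete reading of the “distance” in the quote above is vector distance between term-frequency representations of two texts. The sketch below uses cosine similarity on raw word counts; it is a textbook toy, not HP’s or Autonomy’s actual method, and the sample strings are my own.

```python
import math
from collections import Counter

# Cosine similarity between bag-of-words term-frequency vectors.
def cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

print(cosine("big data search", "big data analytics"))   # shares two terms
print(cosine("big data search", "grilled steak recipe")) # shares none: 0.0
```

The “probabilities, definitions, slang” parts of the quote are exactly where this toy breaks down—two texts can share no words and still mean the same thing—which is the problem the “advanced technology” pitch is selling against.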
One technique is to use structured data methods to solve the unstructured problem. (Wasn’t this the approach taken by Fulcrum Technologies, what, 25 or 30 years ago? I just read a profile of Fulcrum that suggested Fulcrum did this first and continues chugging along within the OpenText product lineup, which competes directly with HP in information archiving.)
HP points out, “People are Lazy.” More interesting is this observation, “People are stupid.” I thought about HP’s write off of billions after owning a company for a couple of years, but I assume that HP means “other people” are stupid, not HP people.
December 16, 2013
Is there a connection between Big Data and grilling? Is there a connection between Big Data and your business?
I read “Big Data Beyond Business Intelligence: Rise Of The MBAs.” The write up is chock full of statements about large data sets and the numerical recipes required to tame them. But none of the article’s surprising comments matches one point I noticed.
Here’s the quote:
Software automation can’t improve without reorganizing a company around its data. Consider it organizational self-reflection, learning from every interaction humans have with work-related machines. Collaborative, social software is at the heart of this interaction. Software must find innovative ways to interface data with employees, visualization being the most promising form of data democratization.
I will be the first to admit that the economic revolution has left some businesses reeling, particularly in rural Kentucky. Other parts of the country are, according to some pundits, bursting with health.
Is a business reorganization better with Big Data?
Will Big Data deliver better grilled meat? Buy a copy of this book by Lilly and Gibson and see if there are ways to reorganize the business of grilling around self reflection. Can Big Data deliver a sure-fire winning steak? Will Big Data deliver for other businesses?
But for the business that is working hard to make sales, meet payroll, and serve its customers, Big Data as a concept is one facet of senior managers’ work. Information is important to a business. The idea that more information will contribute to better decisions is one of the buttons that marketers enjoy mashing. Software is useful, but it is by itself not a panacea. Software can sink a business as well as float it.
However, figuring out the nuances buried within Big Data, a term that is invoked, not defined, is difficult. The rise of the data scientist is a reminder that having volumes of data to review requires skills many do not possess. Data integrity is one issue. Another is the selection of mathematical tools to use. Then there is the challenge of configuring the procedures to deliver outputs that make sense.
December 8, 2013
I do work for hire. The idea, as I implement it, requires someone to pay me; for example, a publisher like Galatea, IDC, or Pandia Press. I then submit written information for that money. The publisher can do with the information whatever the purchaser wants. Some publishers have spotty records of payment, but after working for “real” journalism and publishing outfits for years, slow pay or in some cases no pay is more common than I thought. I like to reflect on my naive understanding of the information business in 1954 when I wrote “Burger Boat Drive In” for money for the St Louis Post Dispatch. Think of it: That time span covers 60 years.
I read “Academia.edu Slammed with Takedown Notices from Journal Publisher Elsevier.” I found the write up amusing. I thought that “real” publishers had cracked down on tricky PhDs and “experts” who posted their research on their blogs or on silly academic or public-service-type Web sites a long time ago.
I was dead wrong. It seems that Elsevier, a renowned scientific and technical publisher, was asleep at the switch. Elsevier is part of Reed Elsevier, a top flight information outfit. If anyone could locate duplicate content, it would be the experts at Elsevier. After all, at their fingertips were duplicate busting online search tools like LexisNexis text mining and search systems. A mouse click away is Google’s outstanding search system. For the more sophisticated investigator, Elsevier can use tools from Dassault or Yandex to locate improper use of content Elsevier owns.
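The “duplicate busting” those tools perform can be sketched with word shingles: hash overlapping n-word windows from two texts and look for shared passages. This is a toy version of a standard technique, not how LexisNexis or Google actually do it, and the sample sentences are invented.

```python
# Duplicate detection via word 5-gram shingles: texts that share whole
# 5-word passages are flagged as possible reuse. Toy sketch only.
def shingles(text, n=5):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

original = "the quick brown fox jumps over the lazy dog near the river bank"
suspect  = "today the quick brown fox jumps over the lazy dog once again"

overlap = shingles(original) & shingles(suspect)
print(len(overlap) > 0)  # True: shared 5-word passages suggest reuse
```

Run at scale against an uploads site like Academia.edu, this is exactly the kind of sweep that would surface thousands of candidate papers for DMCA notices in one batch.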
A happy quack to Wikipedia at http://bit.ly/1d3pH7l
The write up tells me:
“In the past, Elsevier has sent out one or two DMCAs a week,” Price [Academia.edu’s top dog] wrote. “Then, a few weeks ago, Elsevier started sending Academia.edu DMCA take-down notices in batches of a thousand for papers that academics had uploaded to the site. This is what has caused the recent outcry in the blogosphere and Twitter.”
So what’s the big deal?
The article tries to answer my question:
Still, Elsevier’s ramping up of take-down requests is reminiscent of the shake-up happening as a result of the rise of massively open online courses, which have enabled millions to learn at a high level — for free. It could be that the basic premise of Academia.edu will throw things off kilter for publishers and cause them to react. And it even has a bit of the flavor of Aaron Swartz’s efforts to liberate academic papers from the premium site JSTOR.
I am not sure but I don’t think Mr. Swartz weathered the “free content” storm particularly well.