Xenky.com for a Google X Ray

March 26, 2013

If you want one-click access to preliminary versions of 50 of my articles about Google, Xenky.com has a hyperlinked list. These are prepublication drafts, but the basic information is available. The more than 50 articles average 2,500 words and cover a range of subjects. Originally I planned to do another Google monograph. You can find the list of articles at Google X Ray. Xenky.com is a portal to the content produced by ArnoldIT. There is no charge for the content. Keep in mind that some of the final versions of the articles are owned by various publishers. I am providing this information for students and libraries. Be sure to run queries for the final version of the document on a for-fee information service which indexes commercial content. One final comment: Additional Google content appears in my monthly newsletter Honk, which is available at this link. Registration is required for the newsletter.

Stephen E Arnold, March 26, 2013

LucidWorks Partners with MongoDB

March 26, 2013

One of the strengths of LucidWorks is their willingness to partner with other companies to better meet the needs of today’s enterprises. MarketWatch covers the most recent LucidWorks partnership in their article, “LucidWorks Brings the Power of Enterprise Search to MongoDB.”

The article sums up the news:

“LucidWorks, the company transforming the way people access information, today announced the integration between LucidWorks Search and MongoDB. The combined solution brings search and analysis capabilities to MongoDB so organizations can easily search their MongoDB NoSQL database to discover actionable insights within the reams of semi-structured data. Together, LucidWorks and MongoDB extend the existing security and scalability benefits that LucidWorks Search brings to enterprises, driving innovation and enabling more ways to search and analyze big data.”

In addition to increased functionality, users can also expect increased security benefits from the partnership. The biggest direct benefit is to existing MongoDB users who can now search that data store directly with the power of LucidWorks. But, existing LucidWorks users now also have an additional option for storing their unstructured data.

Emily Rae Aldridge, March 26, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Alternative Weekly Publications Turn To Pulp

March 26, 2013

 

Alternative weekly prints are (were?) the younger sibling of big name newspapers. They provided an alternative viewpoint on the news and appealed to the vast subcultures that thrive in developed countries. According to Jack Shafer’s blog on Reuters this is the start of “The Long, Slow Decline Of Alt-Weeklies.” Much like the bigger publications, the alt-weekly titles saw plummeting sales with the digital print boom. Alt-weeklies used to be, and for a little while longer will remain, the prime source for personals, jobs, apartments, etc. While the publishers wanted readers to think it was the alternative views that drove sales, it was really these classified ads. Another big hit to the industry came when the record companies pulled their advertising and the retail stores that used to carry the alt-weeklies disappeared.

 

The alt-weeklies used to be an anti-boredom device, but:

“…even a human fossil must concede that the smartphone trumps the alt-weekly as a boredom killer. How does a wedge of newsprint compete with an affordable messaging device that ferries games, social media apps, calendars, news, feature films, scores, coupons and a library’s worth of music and reading material? Ask a young person his opinion and he’ll tell you that nothing says “geezer” like a newspaper, be it daily or alt-weekly.”

 

Alt-weeklies are a losing business. Does this parallel the decline of search and retrieval, commercial database publishing, and content management systems? The market just drifts away. Transition periods stink.

 

Whitney Grace, March 26, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Give A Hand For Old Fashioned Journalistic Bribery

March 26, 2013

A top news story used to make or break a reporter. It can still do so today, but the old channels are mostly closed and monitored by the Internet beat. Reporters used to have to bribe sources, and the best information continues to come from the source. In a throwback to the old days, The Guardian says, “Wall Street Journal Blames Beijing Troublemaking For US Bribery Probe.” The accusation is that the Wall Street Journal’s China office bribed government officials with expensive gifts for information. The US Justice Department was already conducting an investigation of the Journal’s parent company, News Corporation, under the Foreign Corrupt Practices Act.

News Corporation is upset over the allegations and believes that someone simply wants to make trouble for the Journal. The company also believes a Chinese government agent tipped off authorities. In an internal investigation, News Corporation did not find anything wrong.

How did this happen?

 

“The newspaper believes the bribery allegation came in relation to the Journal’s reporting of events in Chongqing, the province in which disgraced Chinese official Bo Xilai once had a power base.”

 

and:

“The report also comes in the wake of claims that China has hacked into the systems of US newspapers – allegations that are denied by Beijing.”

 

The proper authorities are conducting further investigation, while the US, England, and China argue back and forth, name-calling and the like. The new Chinese premier Li Keqiang even made a statement that everyone should forget this event and concentrate on preventing further cyber attacks. That will happen only in a perfect world or if something bigger comes along, like North Korea gaining an atom bomb.

 

Whitney Grace, March 26, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Taming Unstructured Information

March 25, 2013

Right now, as you read this, your company’s data are piling up. Scarier yet, most companies don’t have a way to structure all that precious information, so it goes to waste. Thankfully, clarity is on the way, as we found in a recent Paradigma Labs story, “Unstructured Information Extraction: A Sample Case with a Unitex-Manager.”

The article lays out the problem:

There is a lot of information in today’s companies flowing from one computer to another like e-mails, documents, many kinds of files and, of course, the webs the employees surf through. These electronic documents probably contain part of the core knowledge of the company or, at least, very useful information which besides of being easily readable by humans is unstructured and impossible to be processes automatically using computers. The amount of unstructured information in enterprises is around 80% [1] to 85% [2] nowadays, and such a situation is a disadvantage…

This has been an elephant in the room for many preparing to start squeezing help from their data. Unstructured data can derail good intentions because it is impossible to sort out. Thankfully, there are companies with experience in structuring the unstructured and then forming useful analytic insights from this info. One of our favorites is the international firm Sinequa, which boasts an incredible two-plus decades in the business.

Patrick Roland, March 25, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search.

Government Initiatives and Search: A Make-Work Project or Innovation Driver?

March 25, 2013

I don’t want to pick on government funding of research into search and retrieval. My goodness, pointing out that payoffs from government funded research into information retrieval have been modest would bring down the wrath of the Greek gods. Canada, the European Community, the US government, Japan, and dozens of other nation states have poured funds into search.

In the US, a look at the projects underway at the Center for Intelligent Information Retrieval reveals a wide range of investigations. Three of the projects have National Science Foundation support: Connecting the ephemeral and archival information networks, Transforming long queries, and Mining a million scanned books. These are interesting topics and the activity is paralleled in other agencies and in other countries.

Is fundamental research into search high level busy work? Researchers are busy, but the results are not having a significant impact on most users, who struggle with modern systems’ usability, relevance, and accuracy.

In 2007 I read “Meeting of the MINDS: An Information Retrieval Research Agenda.” The report was sponsored by various US government agencies. The points made in the report, like the University of Massachusetts’ current research run down, were excellent. The 2007 recent influences are timely six years later. The questions about commercial search engines, if anything, are unanswered. The challenges of heterogeneous data also remain. The discussion of information analysis and organization, which is today associated with analytics and visualization-centric systems, could be reprinted with virtually no changes. I cite one example, now 72 months young, for your consideration:

We believe the next generation of IR systems will have to provide specific tools for information transformation and user-information manipulation. Tools for information transformation in real time in response to a query will include, for example, (a) clustering of documents or document passages to identify both an information group and also the document or set of passages that is representative of the group; (b) linking retrieved items in timelines that reflect the precedence or pseudo-causal relations among related items; (c) highlighting the implicit social networks among the entities (individuals) in retrieved material; and (d) summarizing and arranging the responses in useful rhetorical presentations, such as giving the gist of the “for” vs. the “against” arguments in a set of responses on the question of whether surgery is recommended for very early-stage breast cancer. Tools for information manipulation will include, for example, interfaces that help a person visualize and explore the information that is thematically related to the query. In general, the system will have to support the user both actively, as when the user designates a specific information transformation (e.g., an arrangement of data along a timeline), and also passively, as when the system recognizes that the user is engaged in a particular task (e.g., writing a report on a competing business). The selection of information to retrieve, the organization of results, and how the results are displayed to the user all are part of the new model of relevance.

In Europe, there are similar programs. Examples range from Europa’s sprawling ambitions to Future Internet activities. There is Promise. There are data forums, health competence initiatives, and “impact”. See, for example, Impact. I documented Japan’s activities in the 1990s in my monograph Investing in an Information Infrastructure, which is now out of print. A quick look at Japan’s economic situation and its role in search and retrieval reveals that modest progress has been made.

Stepping back, the larger question is, “What has been the direct benefit of these government initiatives in search and retrieval?”

On one hand, a number of projects and companies have been kept afloat due to the funds injected into them. In-Q-Tel has supported dozens of commercial enterprises, and most of them remain somewhat narrowly focused solution providers. Their work has been suggestive, but none has achieved the breathtaking heights of Facebook or Twitter. (Search is a tiny part of these two firms, of course, but the government funding has not had a comparable winner in my opinion.) The benefit has been employment, publications like the one cited above, and opportunities for researchers to work in a community.

On the other hand, the fungible benefits have been modest. As the economic situation in the US, Europe, and Japan has worsened, search has not kept pace. The success story is Google, which has used search to sell advertising. I suppose that’s an innovation, but it is not one which is a result of government funding. The Autonomy, Endeca, Fast Search-type of payoff has been surprising. Money has been made by individuals, but the technology has created a number of waves. The Hewlett Packard Autonomy dust up is an example. Endeca is a unit of Oracle and is becoming more of a utility than a technology game changer. Fast Search has largely contracted and has, like Endeca, become a component.

Some observations are warranted.

First, search and retrieval is a subject of intense interest. However, progress in information retrieval is slow in my opinion. I think there are fundamental issues which researchers have not been able to resolve. If anything, search is more complicated today than it was when the Minds Agenda cited above was published. The question is, “Is search more difficult than finding the Higgs boson?” If so, more funding for search and retrieval investigations is needed. The problem is that the US, Europe, and Japan are operating at a deficit. Priorities must come into play.

Second, the narrow focus of research, while useful, may generate insights which affect the margins of larger information retrieval questions. For example, modern systems can be spoofed. Modern systems generate strong user antipathy more than half the time because they are too hard to use or don’t answer the user’s question. The problem is that the systems output information which is quite likely incorrect or not useful. Search may contribute to poor decisions, not improve decisions. The notion that one is better off using more traditional methods of research is something not discussed by some of the professionals engaged in inventing, studying, or selling search technology.

Third, search has fragmented into a mind boggling number of disciplines and sub-disciplines. Examples range from Coveo (a company which has ingested millions in venture funding and support from the province of Québec) which is sometimes a customer support system and sometimes a search system to Palantir (a recipient of venture funding and US government funding) which outputs charts and graphs, relegating search to a utility function.

Net net: I am not advocating the position that search is unimportant. Information retrieval is very important. In many cases, one cannot perform work today unless one can locate a specific digital item.

The point is that money is being spent, energies invested, and initiatives launched without accountability. When programs go off the rails, these programs need to be redirected or, in some cases, terminated.

What’s going on is that information about search produced in 2007 is as fresh today as it was 72 months ago. That’s not a sign of progress. That’s a sign that very little progress is evident. The government initiatives have benefits in terms of making jobs and funding some start ups. I am not sure that the benefits affect a broader base of people.

With deficit financing the new normal, I think accountability is needed. Do we need some conferences? Do we need giveaways like pens and bags? Do we need academic research projects running without oversight? Do we need to fund initiatives which generate Hollywood type outputs? Do we need more search systems which cannot detect semantically shaped or incorrect outputs?

Time for change is upon us.

Stephen E Arnold, March 25, 2013

Cengage: Time to Disengage?

March 25, 2013

Thomson Reuters in “Cengage Learning Hires Restructuring Advisers” reported that a former Thomson property is arranging a modest infusion of cash. “Modest” in this context is about $430 million, which is nothing when compared to the cost of a modern textbook. (See “Textbook Prices Are Inflating Even Faster Than Tuition Prices: New Boston University Classifieds for Students Makes Buying Textbooks More Affordable.”)

Cengage used to be Thomson Learning, a sprawling collection of publishing companies. Some of the firms had traditional textbooks; others had combinations of traditional textbooks and electronic versions. My recollection is that the technical infrastructure of the original Thomson Learning was quite diverse. “Diverse” publishing infrastructures in the same organization add significantly to the costs of doing business. “Diverse” is also a stuck brake on innovation because repurposing content is time consuming and labor intensive. Prior to spinning off Thomson Learning to Apax Partners and Omers Capital Partners, Thomson’s senior management were focusing their considerable talents on cost efficiencies. I assume that the technical infrastructure issues have been resolved.

Debt can be a burden, as this illustration from Shape Home Loans suggests. Does debt enhance agility or is it a financial play disconnected from structural changes such as those described in my “Gadzooks, It’s MOOCs: The Fuss over Open Source Learning” article?

One item in the Thomson Reuters news release caught my attention:

…the company said it had borrowed $430 million, almost all of its remaining credit facility to ensure its businesses have the cash they need. Stamford, Connecticut-based Cengage has a $1.5 billion term loan that matures next year and a total of $5.3 billion of debt as of Dec. 31.

Several observations:

First, this type of cash crunch in publishing is likely to become more common. I wrote a story for Online Searcher about the impact of online learning. There is also a chorus of “if you are smart, you can skip college” echoing around Kentucky. What if the online learning and the “you don’t have to go to college” blend? Companies depending upon the traditional purchasing patterns in education may find that new revenues are not sufficient to keep up with old revenue losses.

Second, the spillover from a Cengage-type of problem will have cascading effects. Examples which come to mind are revenues flowing to such organizations as Ebsco Electronic Publishing, ProQuest, and Wolters Kluwer. These companies are in the education food chain. If Cengage flu becomes contagious, these firms will face some additional financial challenges.

Third, the authors who provide content to the textbook giants have to be paid. With the shift to online courses, some of these authors may take their “fame” and their content and go in a new direction. It is now possible for some textbook superstar authors to try to become celebrities. If Google needs knowledge, the company just hires the superstar. Won’t the same approach become possible in the online learning space? Maybe an existing textbook company will corner this market? I am not sure traditional textbook companies have the agility necessary to pull off a slam dunk.

Fourth, the online services like Thomson Reuters’ WestLaw and Reed Elsevier’s LexisNexis may also feel the impact of a shift. On one hand, these systems could gain new content from disaffected textbook publishers and, therefore, more revenue pulling information. On the other hand, traditional online services have been caught flatfooted by the surge in online educational content and may be too late to ride the new revenue train.

Net net: Is it time for customers of Cengage to disengage? A larger question is, “Will the professional publishing and professional online services be able to adjust to yet another sapping of their life blood?” Changes are coming. Many of these shifts will not be gentle, kind, or slow, I fear.

Stephen E Arnold, March 25, 2013

Black Duck Developers Weigh in on Open Source

March 25, 2013

Black Duck has made a name for itself helping organizations build better software faster by choosing the correct open source components. Two of their leading developers, Dave Gruber and Peter Vescuso, give their opinions on the rapid growth of the open source market in the Linux Insider article, “Black Duck’s Dave Gruber and Peter Vescuso: Open Source Is Maturing.”

After a series of questions and answers, the duo responds to a question about how the rise of open source is impacting the market at large:

“By that I mean large organizations and large businesses are looking at what is happening in the open source industry around them. They are seeing all the innovation around cloud and mobile and the speed at which new projects are being created. They are looking at all of the innovation going on with these rookie projects and are saying that activity is compelling. They are wishing to bring those methods to their own companies internally.”

So, if organizations are exploring ways to integrate open source innovation into their own methods and processes, what is the most effective way to do so? For many companies it will mean adopting an open source value-added software solution, like LucidWorks. LucidWorks builds upon Apache Lucene and Solr, enabling organizations to benefit from the agility and efficiency of open source without having to build their own customized solution. Out-of-the-box, LucidWorks responds to the open source needs of the current market.

Emily Rae Aldridge, March 25, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

It Is Movie Search Time

March 25, 2013

Google, Bing, and DuckDuckGo are the primary search engines users turn to for locating information. One of the problems, even with advanced search options, is sifting through the search results. Any search expert will tell you that if the desired information is not on the first or second page of results, users move on. Does this call for specialization in search engines? It just might for a subject as all-encompassing as movies. MoreFlicks searches through the popular video streaming Web sites Hulu, Netflix, Vudu, Fox, Crackle, and BBC iPlayer for movies and TV shows.

It takes a page out of Google’s book by displaying basic facts about a movie or show (summary, genre, release date) along with where it can be viewed online. Search results can be sorted by genre, most popular, new arrivals, and what is soon expiring. It will come in handy when you are searching for an obscure title. One downside is that it only browses through legal channels. YouTube has been given the boot for these results. MoreFlicks is a niche search engine, possibly the lovechild of Google and IMDB, but how long it stays depends on content relevance or until Google snaps it up. Zeus eating Athena, anyone?

Whitney Grace, March 25, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

IBM Goes Big On Big Data

March 25, 2013

IBM is probably the biggest name in big data when it comes to commercial, proprietary vendors. The company is an established household name and continues to make great technological advances, the most notable being Watson the AI. On its Web site, IBM has gone above and beyond to set itself apart from other big data companies. Take a look at the Smarter Analytics page the company created. IBM is stressing the analytical aspect of big data and how its solutions cover software, research, hardware, and services:

 

“Big data is more than a matter of size; it is a way to uncover insights and opportunities from new and emerging internal and external sources of data and content. IBM’s big data capabilities include an enterprise-class big data platform, predictive and content analytics, and decision management to give your organization a competitive edge. IBM’s capabilities and signature solutions are designed to complement your existing information, analytics and content management infrastructure, so you can get started quickly and achieve game-changing results.”

 

Unlike other companies that spout what they have done, IBM provides video evidence documenting how big data has changed and helped companies. Among those that benefited are T-Mobile, Vestas Wind Systems, NYSE Euronext, and Fiserv. IBM knows how to market itself as a viable big data solution. Unlike other companies, it has multi-generation appeal because of its longevity and new advances.

 

Whitney Grace, March 25, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search
