Apache Lucene and Solr New Codec

January 30, 2013

Apache Lucene and Solr have announced the release of version 4.1. Improvements to Solr’s request parsing and support for Internet Explorer are just a few of the new features available. Read about all of the new features and upgrades in The H Open article, “Apache Lucene and Solr Update with New Default Codec.”

The article begins:

“The Apache Lucene project has announced Lucene and Solr 4.1, the latest updates to the Java-based text search library and search platform built around it. Lucene 4.1 has a new default codec “Lucene41Codec” which is based on a previously experimental “Block” indexing format. The new codec includes optimisations around pulsing (where a term only appears in one document) and efficient compressed stored fields to help keep data within the bounds of I/O cache.”

Lucene and Solr serve as the basis for many strong enterprise products. LucidWorks is one company that builds its solutions atop Lucene and Solr, ensuring that they are harnessing the best and most current open source advancements. Check out LucidWorks Big Data and/or LucidWorks Search – both are sure to get even better, benefiting from the improvements in Lucene and Solr’s new codec.

Emily Rae Aldridge, January 30, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Big Data Hailed as Triumphant

January 30, 2013

We’ve tripped over more big-data cheerleading, and we are ready to say, “enough already.” The timpark.io blog trumpets, “Data Trumps Everything.” Oh, really?

Mr. Park uses the example of the modern supermarket to illustrate his assertion: the use of big-data analysis has eclipsed human experience and intuition. While information technology was adopted to assist the seasoned manager with time-consuming calculations, the write-up asserts that big data has now taken over. Using grocery-receipt data, software now analyzes a myriad of factors, builds sophisticated models, and directs in-store humans in order to maximize profits. Park notes:

“That Halloween expansion of candy?   That wasn’t a guess – the supermarket knows down to a matter of hours of when to roll that out.   This is an obvious example, but a data scientist at one major retailer confided to me that they have over 550 such rotations that happen in a year to capture ebbs and flows in certain products.  Some of these are obvious, like Halloween candy or Valentine’s Day cards, that any human manager could have predicted — perhaps not with the accuracy of the data driven approach — but close enough.   But the vast majority of these are changes that frankly they don’t completely understand, like that having a sale on cereal on Tuesdays results in 17% more profit in breakfast products during 2 week periods where less than 4 sunny days are forecast.”

Park is correct that this is now our grocery-store reality. He is even correct to extrapolate that many other types of business are following suit. However, going on to say that data trumps “everything” is, shall we say, a bit simplistic. Even at large retail chains, humans take ultimate responsibility for decisions, including whether or not to follow the suggestions of that pricey software they chose to buy.

Now, if Watson ever takes over as CEO at IBM, that will be a different matter.

Cynthia Murrell, January 30, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Graph Search Makes Facebook Rival Google

January 30, 2013

Facebook’s search application has never been very strong. Yandex’s Wonder application has spurred Facebook to step up its search development and launch the new Graph Search. Steve Cheney’s blog takes an in-depth look at the new Graph Search in his post: “Graph Search’s Dirty Promise And The Con Of The Facebook ‘Like.’” Graph Search is supposed to compete with Google and allow users to search all of the content on their social networks. Cheney says that Graph Search is much weaker than Facebook wants to admit and that most of the data it searches is outdated.

Cheney explains that Facebook has convinced companies that they need to buy fans, meaning “likes” on Facebook. Facebook’s users are not its customers; these companies are, and they have spent 50% of their advertising budgets on Facebook campaigns. All of this produces a lot of data and connections, but Cheney argues that it will not meet users’ real needs.

“The truth is Graph Search deserves the exact disclaimer FB gave it… it’s a beta product. Through time, iteration, and effort it can and will be a useful tool for FB power users who are well connected, to find people and to sift through memories. But the fact is we’re living in a web where services are unbundling, and social is unbundling too. You simply can’t roll up recommendations for people, places, and interests into a service that’s one size fits all.”

Of course Graph Search is a beta. It will not decide what you do, only try to influence your decision. Facebook, have you failed in search?

Whitney Grace, January 30, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Facebook Changes Privacy Policy Again

January 30, 2013

In light of Facebook’s aim to improve its search and make more money, the social network Web site has changed its privacy policy yet again. Quartz has more info on the change in the article, “Ahead Of Graph Search Launch, Facebook Removed The Ability To Opt Out Of Search Results.” Facebook changed the privacy policy because of a new search tool called Graph Search that allows users to search their networks for restaurants, friends’ locations, and likes. It is a big step up for Facebook, as its search functions have been extremely limited. Facebook hopes to boost advertising and make more use of its user data.

Users cannot opt out fully from search results, but they can still control who sees their content. The Federal Trade Commission has been keeping tabs on Facebook and its privacy policy, and the social network faces a heavy fine if it fails to follow the rules:

“The FTC settlement mandates that Facebook submit to annual privacy audits for 20 years and pay $16,000 per day for any violations. It also requires Facebook to “obtain the user’s affirmative express consent” when adding a feature that “materially exceeds the restrictions imposed by a user’s privacy setting.” The changes to Facebook’s privacy policy in December may have given Facebook clearance to debut Graph Search, although for now, at least, the company is also asking users to sign up for the feature.”

Facebook makes it hard to keep your information private, so always remember to watch what you post. It will come up in search when you least expect it.

Whitney Grace, January 30, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Github Excitement: We Love Overhyped Open Source Vendors

January 29, 2013

I spotted an interesting article in Open Search News. The story which caught my attention is “Github Search Exposes Passwords Then Crashes.”  The article, which I assume is spot on, asserts:

Github is a powerful and popular collaborative tool for open source and private development projects.  It is a favorite repository of open source developers.  Last week a new search infrastructure was unveiled to enable search for specific code within the millions of individual repositories.  Elasticsearch powered the search infrastructure.  However, the search infrastructure quickly revealed lots of private date including passwords and private ssh keys.

We have heard about a number of similar issues. Whether it is the difficulty of reaching one vendor’s technical team in Russia or the elusiveness of the Danish library’s technical team, open source search has some history.

On the other hand, when we wrote our analysis of open source search for that outstanding, reliable, and up front consulting firm IDC, we noted that one or two vendors stood out from the crowd.

Which vendor came out on top? Well, it is not the organization that seems to have contributed to the password situation. To find out which one we found the best, you will have to pay IDC a mere $3,500 to get the inside scoop. A deal at twice the price. I mean such value.

Hint: Beware the one-man band with high hopes and $10 million.

Stephen E Arnold, January 29, 2013

Sponsored by Gourmetdeville.com

Quote to Note: Craziness about Facebook Search

January 29, 2013

Here’s a quote to note. I don’t want to lose this puppy. I spotted it in the dead tree edition of the New York Times. The location of this notable phrase is the business section, page B 7. The story containing the quote is “Facebook’s Search Had to Go Beyond Robospeak.” The story explains the wonderfulness of Facebook’s beta search system. We love Facebook search. How could the company possibly improve on a graph surfing system which blocks outfits like Yandex from indexing content? No way. Anyway, here’s the quote:

Letting users talk with a computer on their own terms.

Oh, baby. Do I love this type of insightful comment about search and retrieval. I was not aware that I was able to talk with Facebook, but what do I know? Even better, I love the idea of doing the talking on my own terms.

How interesting is this statement about letting users talk with a computer? Beyond interesting. The statement ventures into the fantasyland of every person who watched and confused Star Trek, Star Wars, and “Mary Had a Little Lamb.”

A keeper.

Stephen E Arnold, January 29, 2013

Check out our sponsor Dumante.com

Thoughts about Commercial Databases: 2013

January 29, 2013

After the dress rehearsal for my weaponized information webinar, a couple of librarians and I were talking about the commercial database business. I narrowed the focus to the commercial outfits selling primary and secondary information to libraries and other professionals; namely, to the legal and health care sectors.

In a nutshell, the digital future does not look too bright for companies such as:

  • Ebsco Electronic Publishing (everything but the kitchen sink coverage)
  • Elsevier (scientific and technical with Fast Search in its background)
  • ProQuest (everything but the kitchen sink coverage plus Dialog)
  • Thomson Reuters (multiple disciplines, including financial real time info)
  • Wolters Kluwer (mostly legal and medical and a truckload of individual brands)

I just reread “Why Acquisitions Fail: The Five Main Factors” by Pearson Education. This outfit has a long and storied past. The irony of Pearson Education explaining the problems of making an acquisition work is interesting but not germane to the main points in the write up. The fact that this item was available to me without charge via the Internet is amusing to me as well. Here’s what the Pearson analyst suggests about the causes of failure:

Survey after survey has proclaimed that most acquisitions fail. Denzil Rankine’s Executive Briefing on Why Acquisitions Fail (FT Prentice Hall) examines why. There are five key factors, which we will examine below:

  1. Flawed business logic
  2. Flawed understanding of the new business
  3. Flawed deal management
  4. Flawed integration management
  5. Flawed corporate development

No argument from me. The business model for these firms has been built on selling “must have” information to markets that need the information to do their jobs. The reason for the stress on this group of companies is that the traditional customers are strapped for cash or have lower cost alternatives.

If one of these outfits buys a company, the likelihood that the acquisition will be a home run revenue success is low. These five companies are bottom-line oriented, so the acquisitions will have to perform. The idea of massive investment to realize the promise of the purchase is not in the game plan.

So big traditional commercial database companies have to find a way to work around the Pearson Education hurdles. Let me consider some of the options available to the Ebscos, Elseviers, ProQuests, Thomsons, and Wolters Kluwers of the world. (Yes, there are oligopolies in a number of other countries, not just the US and Western Europe.)

The Hail Mary Deal

This is the option which makes investment bankers’ and deal brokers’ hearts go pitter-patter. We know how that approach works.

Buy One Another

The idea is that no other outfit wants to buy commercial database companies. Ergo: These outfits buy one another in some combination. Good for the investment bankers but long term, the customers may not be able to cope with ever increasing prices. Librarians, lawyers, and accountants are not exactly in a GEICO made of money mode.

The Microsoft Dell Variant

The idea is that a third party like Google buys one or more commercial database companies and monetizes the content with ads. (I would lobby for this if I were attached to a giant money machine like the Google.)

Fire Sale

I think that Thomson Reuters’ effort to get out of the health fraud business makes clear that the price offered kills the deals. Nevertheless, some of the commercial database publishers may be forced to chop off fingers and toes to keep the core alive. Highly probable path opine I.

Raise Prices and Innovate from Within

This option keeps the Board of Directors engaged. The reality is that such innovation goes nowhere. Ah, I am looking forward to annoyed vice presidents asserting, “I am innovative. We do innovate.” Okay, okay.

Net net?

Big changes are coming for commercial database producers, access to curated content, and the quality of the commercial information. Lawyers are looking to cut costs. No good for Lexis and West. Librarians are under severe financial pressure. Accountants? Accountants don’t want to spend their own money.

Looks like the future is moving in directions different from where these traditional, commercial database producers are heading. I suppose after a couple of decades of evolution, the arrival of the End of Times is tough to accept.

Disagree? Agree? Surprise me. Keep in mind that I don’t have a stake in these companies and find myself baffled by the management challenges each has created for itself.

Stephen E Arnold, January 29, 2013

Sponsored by Dumante.com

Big Data Solutions Put the Information to Work and Enable Insights to Spread Across the Enterprise

January 29, 2013

Both skill and will are needed for a project to come to fruition. Many organizations have determined that deploying technologies to add value to big data would be beneficial at this point in time. Now, they are looking around to find the workforce with the skill to truly glean all the opportunities and insights possible out of big data. Forbes discusses data scientists and the convoluted Hadoop framework in “Combating the Big Data Skills Shortage.”

The article explains that integrating Hadoop with other projects can prove cumbersome but that the IT community has helped to bridge the gaps:

In the most extreme case, it means that traditional Oracle or DB/2 based applications could essentially run on top Hadoop. In more realistic applications, it means that some traditional applications could be migrated to run on Hadoop, as new data sources are integrated with traditional structured databases. New queries could then be created to take advantage of the traditional and the new data sources together to provide new insight and value to the business.

While some frameworks like Hadoop are better geared toward data scientists and analysts with years of experience in this specific technology, other technologies, such as infrastructure components, have those skills built into them. A big data solution like PolySpot puts Information At Work to lessen the need for hiring data scientists immediately.

Megan Feil, January 29, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search.

Liferay is Not Dead Yet

January 29, 2013

Gartner recognizes Liferay as a leader in portal technology. And yet some would say the technology it offers is doomed to fail and on its way out. However, Liferay keeps carrying on. See what tricks Liferay has up its proverbial sleeve in The Register article, “Liferay’s Not Dead Yet – But What’s Keeping it Alive?”

The author begins:

“The enterprise portal market should have died years ago . . . the market was being written off almost as soon as it began. More bluntly, in a market that was limping to single-digit growth in 2008, with growth stalling since then, how could an open-source player like Liferay hope to survive, particularly given its penchant for using the company as a vehicle for doing as much social good as company profit? And yet Liferay is steering toward $100m in revenues, with financials that look dramatically better than competitors like Jive Software.”

The article goes on to talk about the humanitarian work that Liferay employees are engaged in, with the support of the company. Liferay appears to be here to stay, but so do many other open source-based enterprise solutions. LucidWorks is another enterprise option that is beating the odds: a thriving company built on open source. Explore their LucidWorks Big Data and LucidWorks Search for solutions that could affordably and seamlessly bolster your enterprise.

Emily Rae Aldridge, January 29, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Disney And Its Big Data Plan Is Home Built

January 29, 2013

When most companies aim to take advantage of big data, usually they turn to a commercial company to set them up with a deployment plan and software. Disney, one of the world’s biggest companies, decided to build its own big data initiative in-house and with open source software. Gigaom has all of the details on the Mouse’s plans in the article, “How Disney Built A Big Data Platform On A Startup Budget.” When Arun Jacob, Disney’s director of data solutions, was told to build a big data platform, he knew he needed to make something that would be useful to the entire corporation.

Disney’s platform uses MongoDB, Hadoop, and Cassandra. While Jacob tried to use as much open source software as possible, he did tap into Disney’s large purse and buy commercial software. The project is moving along well, but the article adds this caveat:

“Still, after all the work he put into building Disney’s big data platform, it’s not exactly a process Jacob is hoping to repeat as the platform evolves. The tools for managing big data are getting better, he said, so he still does a build-versus-buy analysis when it’s time to make a change. Building custom tools is fine when you don’t have a choice, but it’s not always wise when buying something could save untold man-hours and headaches.”

Economy is good. Now why does Disney charge thousands for a mouse guide who takes well-heeled customers through the exits to skip the serpentine lines? Oh, to create money to do big data economically. M I C K E Y, see you real soon.

Whitney Grace, January 29, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search
