ElasticSearch: Was Google Right about Simplicity?

November 13, 2012

When the Google Search Appliance became available nine or 10 years ago, I was the victim of a Google briefing. The eager Googler showed me the functions of the original Google Search Appliance. I was not impressed. As I wrote in The Google Legacy, the GSA was a “good start” and showed promise.

But one thing jumped out at me. Google’s product planners had identified the key weakness, or maybe “flaw,” in most of the enterprise search solutions available a decade ago: complexity. No single Googler could install Autonomy, Endeca, Fast Search & Transfer, or Convera without help from the company. Once the system was up and running, not even a Googler could tune the system, perform reliable hit boosting, or troubleshoot indexers which could not update. Not surprisingly, most of the flagship enterprise search systems ran up big bills for the licensees. One vendor went down in flames because there were not enough engineers to keep the paying customers happy. So ended an era of complexity with the Google Search Appliance.

I may have been wrong.

I just read “Indexing BigData with ElasticSearch.” If you are not familiar with ElasticSearch (formerly Compass), think about the Compass search engine and the dozens of companies surfing on Lucene/Solr to get in the search game. Even IBM uses Lucene/Solr to slash development costs and free up expensive engineers for more value-added work like the wrappers that allow Watson to win a TV game show. I have completed an analysis of 13 open source search vendors for IDC, and some of these profiles are available for only $3,500 each. See http://www.idc.com/getdoc.jsp?containerId=236511 for an example.

Is your search system as easy to learn to ride as a Big Wheel toy? If not, there may be some scrapes and risks ahead. In today’s business climate, who wants to incur additional risks or costs in pursuit of a short cut only a developer can appreciate? Not me or the CFOs I know. A happy quack to http://www.bigwheeltricycle.net/ for this image.

The write up explains how to perform Big Data indexing with ElasticSearch. I urge you to read the write up. Consider this key passage:

The solution finally appeared in the name of ElasticSearch, an open-source Java based full text indexing system, based on the also open-source Apache Lucene engine, that allows you to query and explore your data set as you collect it. It was the ideal solution for us, as doing BigData analysis requires a distributed architecture.

Sounds good. With a fresh $10 million in funding, ElasticSearch seems poised to revolutionize the world of enterprise search, big data, and probably business intelligence, search based applications, and unified information access. Why not? Most open source vendors exercise considerable license in an effort to differentiate themselves from next-generation solutions such as CyberTap, Digital Reasoning, and others pushing the envelope of findability technology.
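
For readers who have not poked at ElasticSearch, the “query and explore your data set as you collect it” idea boils down to a small REST interface. Here is a minimal sketch, assuming a single ElasticSearch node on the default port 9200; the index name, field names, and log record are illustrative, not taken from the write up:

    import json
    import urllib.request

    ES = "http://localhost:9200"

    def post(url, payload):
        # Send a JSON body to ElasticSearch and return the parsed response.
        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    # Index a record the moment it is collected; ElasticSearch assigns an id.
    doc = {"host": "web-01", "status": 500, "message": "timeout talking to backend"}
    print(post(ES + "/logs/event", doc))

    # Query the same data moments later, no batch rebuild required.
    query = {"query": {"query_string": {"query": "status:500 AND timeout"}}}
    print(post(ES + "/logs/event/_search", query))

No schema wrangling and no consultant on site. That, more than the funding, is the pitch.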


LucidWorks Solr 4 Training

November 9, 2012

I learned yesterday that LucidWorks will host a one-day intensive Solr training session. The full-day session covers what’s new in Solr 4.0, including a functional overview and deep dive into SolrCloud, followed by an expert panel discussion and open lab/workshop. LucidWorks is the leader in enterprise open source search solutions. The company’s technology, engineering team, and customer service set the company apart.

There will be a Boot Camp training event in Reston, Virginia, on November 14, 2012. Join Erik Hatcher and Erick Erickson to learn how Solr 4.0 dramatically improves scalability, performance, and flexibility. An overhauled Lucene underneath sports near real-time (NRT) capabilities, allowing indexed documents to be rapidly visible and searchable. Lucene’s improvements also include pluggable scoring, much faster fuzzy and wildcard querying, and vastly improved memory usage. These Lucene improvements automatically make Solr much better, and Solr magnifies these advances with “SolrCloud.”
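
To make the feature list a bit more concrete, here is a rough sketch of what near real-time indexing and the faster fuzzy querying look like from the client side. It assumes a stock Solr 4.0 instance with the example core on the default port 8983; the field names are illustrative and would need to match your schema:

    import json
    import urllib.parse
    import urllib.request

    SOLR = "http://localhost:8983/solr/collection1"

    # Add a document with a soft commit, which makes it searchable almost
    # immediately without the cost of a full hard commit.
    docs = [{"id": "doc-1", "title": "Enterprise search appliance overview"}]
    req = urllib.request.Request(
        SOLR + "/update?softCommit=true&wt=json",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

    # Fuzzy query: "applience~1" tolerates one edit, so the typo still matches.
    params = urllib.parse.urlencode({"q": "title:applience~1", "wt": "json"})
    with urllib.request.urlopen(SOLR + "/select?" + params) as resp:
        print(json.loads(resp.read()))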

Paul Doscher, president of LucidWorks, told me:

Attendees will learn how to use SolrCloud to transform your existing Solr application into a highly scalable, fault tolerant solution with distributed indexing and search capabilities. The session will include demonstrations of SolrCloud in action. Some of the details covered will include configuring and tuning your own cluster. The presenters will detail how Solr 4.0 can be used as a NoSQL store, how to do near real time search at scale, and provide some tips and technical tips for maintaining a Solr cluster over the long term. This session will put the attendee on the path to becoming knowledgeable in SolrCloud configuration, scaling, monitoring and tuning. One of the highlights of the session is a review of the differences between the previous versions of Solr.

The training event features a breakfast in the morning and a happy hour after the session. You can sign up at http://goo.gl/voA7r. This strikes me as a must attend event.

Stephen E Arnold, November 9, 2012

Open Source Search: The Me Too Method Is Thriving

November 5, 2012

In the first three editions of The Enterprise Search Report (2003 to 2007), which my team and I wrote, we made it clear that the commercial enterprise search vendors were essentially a bunch of me-too services.

The diagrams for the various systems were almost indistinguishable. Some vendors used fancy names for their systems, and others stuck with the same nomenclature used in the SMART system. I pointed out that every enterprise search system has to perform certain basic functions: content acquisition, indexing, query processing, and administration. But once those building blocks were in place, most of the two dozen vendors I profiled added wrappers which created a “marketing differentiator.” Examples ranged from Autonomy’s emphasis on neuro-linguistic processing to Endeca’s metadata for facets to Vivisimo’s building a single results list from federated content.
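
Stripped of those wrappers, the common core is small. A toy sketch of the first three building blocks, content acquisition, indexing, and query processing, follows; real systems differ in scale, connectors, and plumbing, not in this basic shape:

    from collections import defaultdict

    # Content acquisition: in a real system this is crawlers and connectors;
    # here it is simply a dictionary of documents.
    documents = {
        1: "enterprise search is complex",
        2: "open source search is free",
    }

    # Indexing: build an inverted index mapping each term to the documents
    # that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)

    # Query processing: intersect the postings for each query term.
    def search(query):
        postings = [index[term] for term in query.lower().split()]
        return set.intersection(*postings) if postings else set()

    print(search("enterprise search"))   # -> {1}

Administration, the fourth block, is everything that keeps the other three healthy: security, monitoring, and index maintenance.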


The rota fortunae of the medieval software licensee. A happy quack to http://www.artlex.com/ArtLex/Ch.html

The reality was that it was very difficult for the engineers and marketers of these commercial vendors to clearly differentiate their systems from dozens of look-alikes. With the consolidation of the commercial enterprise search sector in the last 36 months, the proprietary vendors have not changed the plumbing. What is new and interesting is that many of them are now “analytics,” “text mining,” or “business intelligence” vendors.

The High Cost of Re-Engineering

The key to this type of pivot is what I call “wrappers” or “add ins.” The idea is that an enterprise search system is similar to the old Ford and GM assembly lines of the 1970s. The cost for changing those systems was too high. The manufacturers operated them “as is”, hoping that chrome and options would give the automobiles a distinctive quality. Under the paint and slightly modified body panels, the cars were essentially the same old vehicle.

Commercial enterprise search solutions are similar today, and none has been overhauled or re-engineered in a significant way. That is okay. When a company licenses an enterprise search solution from Microsoft or Oracle, the customer is getting the brand and the security which comes from an established enterprise search vendor.

Let’s face it. The RECON or SDC Orbit system is usable without too much hassle by a high school student today. The precision and recall are in the 80 to 85 percent range. The US government has sponsored a text retrieval program for many years. The results of the tests are not widely circulated. However, I have heard that the precision and recall scores mostly stick in the 80 to 85 percent range. Once in a while a system will perform better, but search technology has, in my opinion, hit a glass ceiling. The commercial enterprise search sector is like the airline industry. The old business model is not working. The basic workhorse of the airline industry delivers the same performance as a jet from the 1970s. The big difference is that the costs keep on going up and passenger satisfaction is going down.
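
A quick aside on those two numbers, since they get tossed around loosely. Precision is the share of returned documents that are relevant; recall is the share of relevant documents that actually get returned. A toy calculation with made-up figures in that 80 to 85 percent neighborhood:

    # Hypothetical query: 100 documents in the collection are actually relevant.
    relevant_in_collection = 100

    # The system returns 100 hits, of which 83 are relevant.
    returned = 100
    relevant_returned = 83

    precision = relevant_returned / returned                 # 0.83
    recall = relevant_returned / relevant_in_collection      # 0.83

    print(f"precision={precision:.2f} recall={recall:.2f}")

Pushing either number much higher without dragging down the other is where the glass ceiling shows up.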

Open Source: Moving to Center Stage

But I am not interested in commercial enterprise search systems. The big news is the emergence of open source search options. Until recently, open source search was not mainstream. Today, open source search solutions are mainstream. IBM relies on Lucene/Solr for some of its search functions. IBM also owns Web Fountain, STAIRS, iPhrase, Vivisimo, and the SPSS Clementine technology, among others. IBM is interesting because it has used open source search technology to reduce costs and tap into a source of developer talent. Attivio, a company which just raised $42 million in additional venture funding, relies on open source search. You can bet your bippy that the investors want Attivio to turn a profit. I am not sure the financial types dive into the intricacies of open source search technology. Their focus is on the payoff from the money pumped into Attivio. Many other commercial content processing companies rely on open source search as well.

The interesting development is the emergence of pure play search vendors built entirely on the Lucene/Solr code. Anyone can download this “joined at the hip” software from the Apache Foundation. We have completed an analysis of a dozen of the most interesting open source search vendors for a big-time consulting firm. What struck the ArnoldIT research team was:

  1. The open source search vendors are following the same path as the commercial enterprise search vendors. The systems are pretty much indistinguishable.
  2. The marketing “battle” is being fought over technical nuances which are of great interest to developers and, in my opinion, almost irrelevant to the financial person who has to pay the bills.
  3. The significant differentiators among the dozen companies we analyzed boil down to the companies’ financial stability, full-time staff, value-adding proprietary enhancements, customer support, training, and engineering services.

What this means is that the actual functionality of these open source search systems is similar to the enterprise proprietary solutions. In the open source sector, some vendors specialize by providing search for a Big Data environment or for remediating the poor search system in MySQL and its variants. Other companies sell a platform and leave the Lucene/Solr component as a utility service. Others just take the Lucene/Solr and go forward.

The Business View

In a conversation with Paul Doscher, president of LucidWorks, I learned that his organization is working through the Project Management Committee (PMC) Group of the Lucene/Solr project within the Apache Software Foundation to build the next-generation search technology. The effort is to help transform people’s ability to turn data into decision making information.

This next-generation search technology is foundational in developing a big data technology stack to enable enterprises to reap the rewards of the latest wave of innovation.

The key point is that figuring out which open source search system does what is now as confusing and time consuming as figuring out the difference between the proprietary enterprise search systems was 10 years ago.

Will there be a fix for me-toos in enterprise search? I think that some technology will be similar and probably indistinguishable to non-experts. What is now raising the stakes is that search systems are viewed as utilities. Customers want answers, visualizations, and software which predicts what will happen. In my opinion, this is search with fuzzy dice, 20-inch chrome wheels, and a 200-watt sound system.

The key points of differentiation for me will remain the company’s financial stability, its staff quality, its customer service, its training programs, and its ability to provide engineering services to licensees who require additional help. In short, the differentiators may boil down to making systems pay off for licensees, not marketing assertions.

In the rush to cash in on organizations’ need to cut costs, open source search is now the “new” proprietary search solution. Buyer beware? More than ever. The Wheel of Fortune in search is spinning again. Who will be a winner? Who will be a loser? Place your bets. I am betting on open source search vendors with the service and engineering expertise to deliver.

Stephen E Arnold, November 5, 2012

The Decline of PCs and Search?

November 4, 2012

I worked through “The Slow Decline of PCs and the Fast Rise of Smartphones/Tablets Was Predicted in 1993.” The main point is that rocket scientist, cook, and patent expert Nathan P. Myhrvold anticipated the shift from desktop computers to more portable form factors. Years earlier I remember a person from Knight Ridder pitching a handheld gizmo which piggybacked on the Dynabook. When looking for accurate forecasts and precedents, those with access to a good library, commercial databases, and the Web can ferret up many examples of the Nostradamus approach to research. I am all for it. Too many people today do not do hands-on research. Any exercise of this skill is to be congratulated.

Here’s the main point of the write up in my opinion:

His memo is amazingly accurate. Note that his term “IHC” (Information Highway Computer) could be roughly equated with today’s smartphone or tablet device, connecting to the Internet via WiFi or a cellular network. In his second last paragraph, Myhrvold predicts the winners will be those who “own the software standards on IHCs” which could be roughly equated with today’s app stores, such as those on iOS (Apple), Android (Google, Amazon) and Windows 8 (Microsoft). The only thing you could say he possibly didn’t foresee would be the importance of hardware design in the new smartphone and tablet industry.

Let’s assume that Mr. Myhrvold was functioning in “I Dream of Jeannie” mode. Now let’s take that notion of a big change coming quickly and apply it to search. My view is that traditional key word search was there and then—poof—without a twitch of the soothsayer’s nose, search was gone.

Look at what exists today:

  1. Free search which can be downloaded from more than a dozen pretty reliable vendors plus the Apache Foundation. Install the code and you have state-of-the-art search, facets, etc.
  2. Business intelligence. This is search with grafted-on analytics. I think of this as Frankensearch, but I am old and live in rural Kentucky. What do you expect?
  3. Content processing. This is data management with some search functions and a bunch of parsing and tagging. Indexing is good, but the cost of humans is too high for many government intelligence organizations. So automation is the future.
  4. Predictive search. This is the Google angle. You don’t need to do anything, including think too much. The system does the tireless nanny job.

So is search in demise mode? Yep. Did anyone predict it? I would wager one thin dime that any number of azure chip consultants will have documents in their archive which show that the death of search was indeed predicted. One big outfit killed a “magic carpet tile” showing the search industry and then brought it back.

So search is not dead. Maybe it was Mark Twain who said, “The reports of my death have been greatly exaggerated.” Just like PCs, mainframes, and key word search?

Stephen E Arnold, November 4, 2012

The Fragmentation of Content Analytics

October 29, 2012

I am in the midst of finalizing a series of Search Wizards Speak interviews with founders or chief technology officers of some interesting analytics vendors. Add to this work the briefings I have attended in the last two weeks. Toss in a conference which presented a fruit bowl of advanced technologies which read, understand, parse, count, track, analyze, and predict who will do what next.

Wow.

From a distance, the analytics vendors look the same. Up close, each is distinct. Pick up the wrong shard and a cut finger or worse may result.

A happy quack to www.thegreenlivingexpert.com

Who would have thought that virtually every company engaged in indexing would morph into next-generation, Euler-crazed, Gauss-loving number crunchers? If the names Euler and Gauss do not resonate with you, you are in for tough sledding in 2013. Math speak is the name of the game.

There are three very good reasons for repackaging Vivisimo as a big data and analytics player. I choose Vivisimo because I have used it as an example of IBM’s public relations mastery. The company developed a deduplication feature which was and is, I assume, pretty darned good. Then Vivisimo became a federated search system, nosing into territory staked out by Deep Web Technologies. Finally, when IBM bought Vivisimo for about $20 million, the reason was big data and similarly bright, sparkling marketing lingo. I wanted to mention Hewlett Packard’s recent touting of Autonomy as an analytics vendor or Oracle’s push to make Endeca a business analytics giant. But IBM gets the nod. Heck, it is a $100 billion a year outfit. It can define an acquisition any way it wishes. I am okay with that.


The Google Search Appliance Adds Bells and Whistles

October 18, 2012

A version of this article appears on the www.citizentekk.com Web site.

The Google Search Appliance is getting along in years. A couple of weeks ago (October 2012), Google announced that Version 7.0 of the Google Search Appliance GB-7007 and the GB-9009 was available. The features of the new system are long overdue in my opinion. Among the new features are two highly desirable enhancements: better security controls and faceted browsing. But the killer feature, in my opinion, is support of the Google Translate application programming interface.

Microsoft will have to differentiate the now aging SharePoint Search 2013 from a Google Search Appliance. Why? GSA Version 7 can be plugged into a SharePoint environment, and the system will, without much fuss, index the SharePoint content. Plug and play is not what SharePoint Search 2013 delivers. The fast deployment of a GSA remains one of its killer features. Simplicity and ease of use are important. When one adds Google magic, the GSA Version 7 can be another thrust at Microsoft’s enterprise business.

See http://www.bluepoint.net.au/google-search/gsa-product-model

Google has examined competitive search solutions and, in my opinion, made some good decisions. For example, a user may add a comment to a record displayed in a results list. The idea of allowing enterprise users to add value to a record was a popular feature of Vivisimo Velocity. But since IBM acquired Vivisimo, that company has trotted down the big data trail.

Endeca has for more than 12 years offered licensees of its systems point-and-click navigation. An Endeca search solution can slash the time it takes for a user to pinpoint content related to a query. Google has made the GSA more Endeca-like while retaining the simplified deployment which characterizes an appliance solution.

As I mentioned in the introduction, one of the most compelling features of the Version 7 GSAs is direct support for Google Translate. Organizations increasingly deal with mixed language documents. Product and market research will benefit from Google’s deep support of languages. At last count, Google Translate supported more than 60 languages, excluding Latin and Pig Latin. Now Google is accelerating its language support due to its scale and data sets. Coupled with Google’s smart software, the language feature may be tough for other vendors to match.
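
The GSA wires translation in behind the scenes, but the underlying service is also reachable directly through the Translate API. For flavor, here is a rough sketch of that kind of call using the public v2 REST endpoint; the API key and the text are placeholders, and this is not how the appliance itself is configured:

    import json
    import urllib.parse
    import urllib.request

    API_KEY = "YOUR_API_KEY"  # placeholder; issued via the Google APIs console

    params = urllib.parse.urlencode({
        "key": API_KEY,
        "q": "Der Bericht ist fertig.",
        "source": "de",
        "target": "en",
    })

    url = "https://www.googleapis.com/language/translate/v2?" + params
    with urllib.request.urlopen(url) as resp:
        data = json.loads(resp.read())

    # The translated text sits under data -> translations in the response.
    print(data["data"]["translations"][0]["translatedText"])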

Enterprise searchers want to be able to examine a document quickly. To meet this need, Google has implemented in-line document preview. A user can click on a hit and see a rendering of the document without having to launch the native applications. A PDF in a results list appears without waiting the seconds it takes for Adobe Reader or FoxIt to fetch and display the document.

What’s not to like? The GSA GB-7007 and GB-9009 deliver most of the most-wanted features to make content searchable regardless of source. If a proprietary file type must be indexed, Google provides developers with enough information to get the content into a form which the GSA can process. Failing that, Google partners and third-party vendors can deliver specialized connectors quickly.


How a Sitemap Can Enhance a Web Presence

October 17, 2012

Business2Community covers the importance of Web site indexing in its piece, “How To Build Your Own Sitemap in Five Minutes, and Why You Need To.” In it, the author discusses different processes for creating an effective sitemap. He begins with an introduction:

When you ask most business owners and beginning online marketers what a ‘Sitemap’ is, you usually get two responses. ‘What’s that?’ or ‘That’s just too complicated for us.’ Sitemaps for your website aren’t impossible to make, and they certainly aren’t a waste of time. To understand why you need to make your own Sitemap today, you need to understand what they are and how they work.
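
The format itself is refreshingly small: an XML file listing the pages a crawler should know about, per the sitemaps.org protocol. A minimal sketch that writes one by hand; the URLs are placeholders:

    from datetime import date

    # A handful of pages to expose to crawlers; placeholders, not a real site.
    pages = ["http://www.example.com/", "http://www.example.com/about"]

    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url in pages:
        lines.append("  <url>")
        lines.append("    <loc>" + url + "</loc>")
        lines.append("    <lastmod>" + date.today().isoformat() + "</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")

    with open("sitemap.xml", "w") as handle:
        handle.write("\n".join(lines))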

The author then goes on to recommend various tools and techniques for effectively creating a sitemap. However, there are other solutions that not only automatically generate sitemaps, but also automatically crawl and index any organization’s site in order to enable effective Web site search. One highly awarded option is Fabasoft Mindbreeze InSite. Fabasoft Mindbreeze takes the guesswork out of indexing and mapping, delivering solid results with little effort. Explore how Fabasoft Mindbreeze might enhance your organization’s online presence today.

Emily Rae Aldridge, October 17, 2012

Sponsored by ArnoldIT.com, developer of Augmentext.

Automation to Cure Duplicate Content Issues

October 15, 2012

Search Engine Land is shining a light on a common Web site search problem: duplicate content issues. Read the full report in “An Automated Tool To Eliminate Duplicate Content Issues.”

The author begins:

BloomReach announced a new software product named Dynamic Duplication Reduction (DDR) that aims to eliminate duplicate content issues on web sites.  Typically, software tools are known to cause duplicate content issues but this tool promises to reverse it.  The tool deeply crawls your web pages and continuously interprets all content on a site. It will automatically discover and act on duplicate pages.
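
The announcement does not spell out how DDR works internally. For flavor only, one generic way to catch exact duplicates is to normalize each page’s text, hash it, and group pages whose hashes collide. The sketch below is a toy illustration of that idea, not BloomReach’s method:

    import hashlib
    from collections import defaultdict

    # Placeholder pages; a real crawler would fetch and strip the HTML first.
    pages = {
        "/widgets": "Blue widget, best price, free shipping.",
        "/widgets?ref=email": "Blue widget, best price,  free shipping.",
        "/gadgets": "Red gadget, new for 2012.",
    }

    groups = defaultdict(list)
    for url, text in pages.items():
        # Normalize whitespace and case so trivial differences do not hide duplicates.
        normalized = " ".join(text.lower().split())
        digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)

    duplicates = [urls for urls in groups.values() if len(urls) > 1]
    print(duplicates)   # -> [['/widgets', '/widgets?ref=email']]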

They say an ounce of prevention is worth a pound of cure, and in this case the prevention needed is effective Web site indexing. Fabasoft Mindbreeze InSite quickly crawls and indexes all Web site content, delivering search results based on relevancy. Misspellings are even corrected with InSite, and duplication is prevented. Fabasoft Mindbreeze is a longstanding leader in third party solutions for the enterprise. InSite is quickly becoming the icing on the cake of this industry leader.

Emily Rae Aldridge, October 15, 2012

Sponsored by ArnoldIT.com, developer of Augmentext.

Get A Comprehensive Search Strategy Plan from Aspire

October 12, 2012

People tend to doubt the power of a good search application. They take it for granted that all out-of-the-box and Internet search engines are as accurate as Google (arguably the most powerful in the public eye). The truth of the matter is most businesses are losing productivity because they have not harnessed the true potential of search. Search Technologies, a leading IT company that specializes in search engine implementation, managed services, and consulting, is the innovator behind Aspire:

“Aspire is a powerful framework and application platform for acquiring both structured and unstructured data from just about any content source, processing / enriching that content, and then publishing it to the search engine or business analytics tool of your choice.”

Aspire uses a built-in indexing pipeline and proprietary code maintained to Search Technologies’ high standards. It is based on Apache Felix, the leading open source implementation of the OSGi standard. OSGi is built for Java and supported by IT companies worldwide. Aspire can gather documents from a variety of resources, including relational databases, SharePoint, file systems, and many more. The metadata is captured and can then be enriched, combined, reformatted, or normalized to whatever the business needs before it is submitted to search engines, document repositories, or business analytics applications. Aspire performs content processing that cleans and repackages data for findability.

“Almost all structured data is originally created in a tightly controlled or automated way.

By contrast, unstructured content is created interactively by individual people, and is infinitely variable in its format, style, quality and structure.  Because of this, content processing techniques that were originally developed to work with structured data simply cannot cope with the unpredictability and variability of unstructured content.”

By implementing a content processing application like Aspire, unstructured content is “scrubbed,” then enriched, for better search results. Most commercial search engines do not have the same filters that separate relevant content from the bad. The results displayed to the user are thus of poor quality and of little to no use. They try to resolve the problem with custom coding and updates for every new data source that pops up, which is tedious. Aspire fixes these tired coding problems by using automated metadata extraction and manipulation outside the search engine.
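
To make the “scrub, then enrich” idea concrete, here is a toy sketch of the sort of transformation a content processing stage performs before a document reaches the search engine. The field names and the enrichment rule are illustrative and have nothing to do with Aspire’s actual components:

    import re

    def enrich(record):
        # Normalize fields pulled from different repositories into one shape.
        doc = {
            "title": record.get("Title") or record.get("subject") or "Untitled",
            "author": (record.get("Author") or "unknown").strip().lower(),
            "body": re.sub(r"\s+", " ", record.get("body", "")).strip(),
        }
        # Simple enrichment: pull out years mentioned in the body as a facetable field.
        doc["years"] = sorted(set(re.findall(r"\b(?:19|20)\d{2}\b", doc["body"])))
        return doc

    raw = {"subject": "Quarterly review", "Author": " J. Smith ",
           "body": "Results for 2011  and   2012 are attached."}
    print(enrich(raw))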

As powerful as commercial search engines are, they can often lack the refined quality one gets from a robust ISV. Aspire does not follow the same search technology path as its competitors; rather, Search Technologies has designed a new, original solution to provide its clients with a comprehensive search strategy plan to help improve productivity, organization, and data management.

Remember: Search Technologies is sponsoring a meetup at the October 2012 Enterprise Search Summit. More information is available at http://www.meetup.com/DC-Metro-Enterprise-Search-Network/

Iain Fletcher, October 12, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Salesforce Incorporates Coveo Enterprise Search

September 22, 2012

ITWorldCanada announces, “Coveo Brings Enterprise Search to Salesforce.com.” The Canadian company will contribute its indexing engine and business intelligence tools to the Salesforce.com cloud. Coveo for Salesforce, which can pull together, index, and analyze unstructured data from multiple sources, will be fully integrated into the popular online customer relationship management (CRM) platform.

The write up tells us:

“Louis Tetu, CEO of Coveo, said the product  is the first tool of its kind that is integrated directly into Salesforce. ‘We are enabling an entirely new paradigm to federate information on demand,’ he said. ‘And that paradigm means that we don’t have to move data, we’re just pointing…secure indexes to that information.’

“Users of the technology that need information delivered in real-time, such as customer-facing companies, will be able to get it rapidly — within 100 milliseconds —  he added. This will help solve the common problem of consumers dealing with contact centres that cannot pull up their information in a reasonable period of time.”

Yes, that is a real plus. Tetu went on to emphasize that this is no small development: his company has conquered the considerable challenges of operating securely in the cloud. He mentions they also make a special effort to ensure new users can dive in as easily as possible.

Coveo was founded in 2005 by some members of the team which developed Copernic Desktop Search. Coveo takes pride in solutions that are agile and easy to use yet scalable, fast, and efficient.

Cynthia Murrell, September 22, 2012

Sponsored by ArnoldIT.com, developer of Augmentext
