Social Media Strategies
April 9, 2013
Social media is not just for personal use anymore; it has expanded into the business world. The Expert System Cogito Blog piece “Understanding the Strategic Value of Social Media Analysis” discusses how many companies are selling themselves short when it comes to using social media.
“I have often said that companies are missing out on the real value of social media analysis. More often than not, even the big players don’t have the processes or models in place to really make use of the data gained from the analysis. As a result, social media analysis has a limited impact on the business, not to mention the budgets assigned to such projects.”
However, despite the usual oversights, the author describes a recent encounter with the head of customer experience at a well-known bank. The meeting was supposed to cover the tools needed to support social media analysis, but instead of the usual song and dance, the manager arrived prepared to discuss exactly what her team needed. Even more surprising, she was able to provide specific examples of the quantitative as well as qualitative data she wanted to extract from the streams of data. This made it easier to talk about semantics and how it can bring value to the company. Strategies such as extracting relationships between monitored entities and cutting through social media noise with deep analysis and contextualization can improve product visibility and illuminate market trends. The author ends by noting that the “usual pitch” will surely be heard again, because many organizations still lack a clear and concise strategy for social media projects. However, as more and more companies realize the importance of social media, semantic technology vendors had better strike fast and learn how to “grab the bull by its horns.”
April Holmes, April 09, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Pay to Play Content: Now Even the New York Times Knows
April 8, 2013
Mondays usually start in a predictable way. I walk the dogs. I eat a cardiologist-approved breakfast. I find out what my wife has on her list for me to do. But this morning I flipped through the New York Times, environmentally unfriendly version, and burst out laughing.
My wife asked, “What’s so funny?”
I replied, “The New York Times describes pay to play with more crazy synonyms than I thought possible.”
She asked, “And that’s humorous?”
To me it was. Navigate to these two articles. The first is on the front page of the Harrod’s Creek edition and Google-crafted this way: “Scientific Articles Accepted (Personal Checks, Too).” The story appears in the April 8, 2013, edition, which you will find in the dead tree version. My link points to a short-lived version of the file on another newspaper’s site. After a rousing quote about “the dark side of open access,” the story jumps to section B, page 8.
The second story appears in the business section of the same issue. Its title is “Sponsoring Articles, Not Just Ads. Branded Content on the Web Mingles with Regular Coverage.” The story features a creative graphic showing pencils held in a roll of money. (You remember. Printed money, just like the early newspaper moguls collected by the horse-drawn cart in the good old days of publishing.)
The point of both articles is that there are people who will pay to get their content published in a form which has some respectability. Academics pay to play in the academic journals. Companies pay to get their ideas published in a wide range of channels. The New York Times mentions Mashable, but there are many other outfits that charge money to run content. My Augmentext operation is in this business too. I suppose I could trot out the names of big publishers who offer college guides with inflated “inclusions” describing the wonders of certain college campuses. The write-ups are compelling and once produced money for those who operated these quasi-reference services.
What words does the New York Times use to describe these pay to play operations? Here’s a list of some of the terms from the write up:
- Advertising
- Advertorials
- Branded content
- Campaign
- Content
- Corporate propaganda
- Native advertising
- Pure editorial
- Sponsored content
Here in Harrod’s Creek, we call content someone wants published for money:
- An inclusion
- A “pay to play” story
- POP, or plain old propaganda, as defined by Jacques Ellul. If the name does not ring a bell, you can find the information in his decades-old study Propaganda: The Formation of Men’s Attitudes.
The professional publishing sector has been charging academics for page proofs and other services for many years. Now the practice has diffused to conferences. In my view, the use of “pay to play” methods is now part of the atmosphere and has been for decades.
I find it fascinating that the topics are now front page news from the New York Times. Perhaps “real” journalists are learning more about how the information world works.
What troubles me is that none of these questions is addressed:
- Do modern systems identify pay to play content?
- Are automated content processing systems giving equal weight to shaped content and objective content?
- Are the outputs from analytics systems manipulable?
In my proprietary report on this subject, the surprising answer is, “We just process data.”
In short, despite the huff and puff of next generation content processing system cheerleaders, the systems have what William James called “a certain blindness.” In the quest for revenues, many organizations are unwittingly conspiring to deliver information which at best is semantically swizzled and at worst weaponized. Oh, the phrase “weaponized information” does not appear in the New York Times’ stories nor in the gigabytes of words explaining the wonders of next generation analytics. Like the New York Times, the present is too much with us.
Stephen E Arnold, April 8, 2013
Elasticsearch Joins Fog Creek
April 8, 2013
Elasticsearch is trying to expand its reach by partnering with other trendy tech services, and it is definitely getting some headlines. The most recent is detailed by MarketWatch in the article “Fog Creek Selects Elasticsearch to Search and Analyze Terabytes of Data.”
“Elasticsearch, the company behind the popular real-time search and analytics open source project, today announced that Fog Creek has selected Elasticsearch to provide instant search capabilities within Kiln, its software development product. Kiln is designed to support and simplify development workflow for users searching more than 100,000 source code repositories. Elasticsearch is now a critical ingredient of Kiln, providing instant search for 300,000,000 requests across 40 billion lines of code to improve overall performance, reliability and user experience.”
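For readers who want to poke at the technology itself, here is a minimal sketch of the kind of indexing and full-text search Elasticsearch performs, using its REST interface against an assumed local node. The index name and document shape are my own invention, not Fog Creek’s actual Kiln schema.

```python
# A minimal sketch, not Fog Creek's Kiln integration: index one line of
# source code and run a full-text query against an assumed local
# Elasticsearch node. Index name and fields are hypothetical.
import requests

ES = "http://localhost:9200"

# Index one line of source code with its repository and path.
requests.put(f"{ES}/kiln-demo/_doc/1", params={"refresh": "true"}, json={
    "repo": "billing",
    "path": "src/invoice.py",
    "line": "def total(items): return sum(i.price for i in items)",
})

# Full-text query across the indexed lines.
hits = requests.get(f"{ES}/kiln-demo/_search", json={
    "query": {"match": {"line": "sum price"}},
}).json()["hits"]["hits"]

for hit in hits:
    print(hit["_score"], hit["_source"]["path"], hit["_source"]["line"])
```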
Elasticsearch is known for collaboration with leading-edge products, but it is not without controversy. GitHub recently turned to Elasticsearch for its new search infrastructure, but the service quickly exposed security concerns and then crashed. So when it comes to a search infrastructure that goes beyond trends, trust an industry standard. Do not assume that every search application will be safe enough for the enterprise. For instance, consider LucidWorks. It is built on open source Lucene/Solr, employs one quarter of the core committers on that project, and is optimized for the enterprise. Choose industry confidence, not trends.
Emily Rae Aldridge, April 8, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Understanding JSON
April 8, 2013
The Altova Blog piece “Editing, Converting and Generating JSON” provides a helpful guide to using JSON. The use of JSON as a data transport protocol has been on the rise, and so has the debate about the advantages of JSON vs. XML. The debate has raged on, but the author sums it up fairly well.
“But when you boil it down, there are simply some cases for which JSON is the best choice, and others where XML makes more sense. While you might need to choose between JSON and XML depending on the development task at hand, you don’t have to choose between code editors – XMLSpy supports both technologies and will even convert between the two.”
Altova has extended its intelligent XML editing features to its JSON editor in order to make JSON editing as simple as possible. Users who begin editing JSON in text view will get lots of help along the way from XMLSpy in the form of syntax coloring, bracket matching, source folding, entry helper windows, menus, and other aids. A one-click option on the XMLSpy convert menu makes converting XML to or from JSON quick and easy. The ability to not only edit but also convert documents directly within the XML editor is extremely useful. JSON lovers will definitely have something to look forward to.
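To make the XML-to-JSON idea concrete, here is a minimal sketch using only the Python standard library. The mapping rules (attributes become plain keys, repeated elements become arrays, element text goes under “#text”) are common conventions and my own assumption, not a description of XMLSpy’s converter.

```python
# A minimal XML-to-JSON sketch under assumed mapping conventions;
# not XMLSpy's actual algorithm.
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Recursively map an XML element to a JSON-friendly dict."""
    node = dict(elem.attrib)              # attributes become plain keys
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text              # a common convention for element text
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:             # repeated elements collapse to an array
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node

xml_doc = """<order id="42">
  <item sku="A1">widget</item>
  <item sku="B2">gadget</item>
</order>"""

root = ET.fromstring(xml_doc)
print(json.dumps({root.tag: element_to_dict(root)}, indent=2))
```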
April Holmes, April 08, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Axiell Group Expands
April 8, 2013
The Swedish Axiell Group AB is making moves with the acquisition of the Dutch company Adlib Information Systems. Axiell Group develops and supplies advanced IT systems and services for clients such as archives, museums, libraries, and schools. The Journalism.co.uk article “The Swedish Axiell Group in global expansion: Acquisition of Adlib Information Systems makes Axiell the largest in Europe” covers the big merger. This record-breaking deal will make Axiell one of the five largest players in the global market. Not only will the company gain clients in 30 countries, the merger will provide the platform for it to continue to grow internationally, one of Axiell’s ultimate goals. Joel Sommerfeldt, CEO of Axiell Group AB, made the following statement:
“We are convinced that the combined expertise of the two companies will help us to boost our offer towards the museums, archives and specialist libraries sector all over the world, and we regard this as an important part of our strategy.”
Bert Degenhart Drenth, CEO of Adlib, and Marijke van der Kwartel, CFO, will continue their work with the company and will become part of the Axiell management team. Degenhart Drenth had the following to say:
“We at Adlib are very proud to have become a part of the Axiell group. I feel that our combined products, markets and geographic spread enables us to take the next step into the future. However, this is not only important to us, but equally important for our customers, who will benefit from a truly European and sustainable supplier for their mission critical systems. Together we can do more: offer fully integrated solutions for Libraries, Museums and Archives on a large scale.”
The Axiell Group is definitely doing big things, and from here its future looks bright.
April Holmes, April 08, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
All About Solr
April 8, 2013
Apache Solr has claimed its place as one of the most popular and sought-after search applications currently on the market. The Solr platform uses Lucene to power its indexing and querying abilities. The Eventbrite listing “Solr Unleashed SC,” translated here from Portuguese via Google Translate, gives details about the upcoming Solr Unleashed training class on June 13, 2013 in Brazil.
“Solr Unleashed is a complete, hands-on training focused on Solr 4 and SolrCloud. SolrCloud is a complete restructuring of Solr to facilitate Big Data installations. It allows distributed indexing as well as distributed search, eliminating the need for a master-slave configuration.”
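For those who have not touched the platform, here is a minimal sketch of what querying a Solr 4 collection over HTTP looks like from Python. The host, collection name, and field are assumptions for illustration, not anything from the course.

```python
# A minimal sketch, assuming a local Solr 4 node and a hypothetical
# "products" collection; the select handler and wt=json are standard Solr.
import requests

resp = requests.get(
    "http://localhost:8983/solr/products/select",
    params={"q": "name:widget", "wt": "json", "rows": 5},
)
for doc in resp.json()["response"]["docs"]:
    print(doc)
```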
The course will be spread over two eight-hour days. Students will need to bring their own computers and will get the chance to develop a complete application. The application will be a real search prototype, and students will learn it well enough that it can potentially be used for future projects. In addition, students will receive an official LucidWorks certification and a digital copy of all the course material. The material itself is in English, but the course will be taught in Portuguese by Semantix, a LucidWorks partner company. During the class students will get not only an in-depth introduction to Solr but also an up close and personal look at the new open source release, Solr 4. It is great to see Solr growing and crossing into other languages. It looks like, regardless of the language, search is where it’s at.
April Holmes, April 08, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
InTrade: A Harbinger of Prediction Woes to Come?
April 7, 2013
Keyword search is not too useful when there are trillions of content objects. Clustering trillions of objects is not economically feasible, so the sets are trimmed. Who’s to know? Predictive analytics sounds so darned promising because “real time processing” is cheap, plentiful, and trivial to boot.
What can go wrong with text processing, text analytics, social crowdsourcing data, and the other Lone Ranger silver bullets? How can predictive systems come back and bite a user, an investor, or an employee who loses her job?
I suppose that the article “InTrade Announces $700,000 Cash Shortfall And Risk Of Imminent Liquidation” describes an anomaly. Here’s the key point in my opinion:
… the company has posted the following message on its site, which says that it has discovered a $700,000 cash shortfall that must be rectified immediately in order to avoid liquidation.
InTrade is, or maybe was, a prediction market. The company says:
It’s a market that allows you to make predictions on the outcome of hundreds of real-world events. Stock exchanges find the price of stocks, and futures markets find the price of commodities. Prediction markets find the probability of something happening – a predefined, uncertain future event.
InTrade is more than voting. The company uses a range of methods to answer yes or no. Life should be so simple. The company even posted some Golden Rules to make the system almost foolproof; for example, “If you sell shares you profit if the market value of the shares goes down. Your profit is maximized if the market settled at $0.00.”
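A stylized worked example makes the quoted Golden Rule concrete. Assume a binary contract that trades between $0.00 and $10.00 and settles at one extreme or the other; the prices below are invented for illustration.

```python
# Profit for the seller of a binary prediction-market contract; the
# $0-to-$10 range mirrors the Golden Rule quoted above.
def short_side_profit(sale_price, settlement_price, shares=1):
    """The seller gains when the contract settles below the sale price."""
    return (sale_price - settlement_price) * shares

# Sell at $6.50, i.e., the market implies a 65% chance of "yes."
print(short_side_profit(6.50, 0.00))   # event fails: maximum profit, $6.50 a share
print(short_side_profit(6.50, 10.00))  # event happens: a $3.50-a-share loss
```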
Eel bites can be painful due to “alien style jaws.” Investors in some predictive outfits may experience similar bites.
There are many meanings for the word “prediction.” I don’t want to get into a squabble that InTrade is one type of prediction and an outfit like Digital Reasoning or Agilex is another type of prediction. I want to capture several thoughts so I can include them in my text analytics lecture later this month, chance willing, of course:
First, predictions are slippery eels. I once offered predictions to my clients. Now I offer clients. I learned that, regardless of methods, predictions jump into a murky pool and get lost. Stick your hand in the pool and you can come up with nothing or with an eel clamping on your extremity. Ouch.
Second, predictions and various methods and the companies built upon them can simply fail. Why not predict that? I think that getting hoisted by one’s petard is part of life.
Third, InTrade may be one example of what can happen when hyperbole outraces the capabilities of the numerical recipe crowd. Will other companies in the fancy math business suffer similar fates? I don’t know. I won’t predict.
If you are into fancy math, why not plug your retirement nest egg into one of the analytics outfits and let me know how that works out for you. Azure chip consultants, feel free to weigh in and explain to me and my two to three readers how such a clever idea could end up in a pickle of reality.
Stephen E Arnold, April 7, 2013
Business Structures Revealed through New Analysis Technique
April 7, 2013
Now here is an interesting implication of social-graph analysis in business. The MIT Technology Review reports, “Social Networks Reveal Structure (And Weaknesses) of Business.” We’ve known for some time that, through the analysis of connections, social networks can reveal even more about us than is obvious to most users. Now, researchers at Israel’s Ben Gurion University have used this concept to derive an impressive amount of information about businesses. The article reveals that the team begins:
“. . . by using a search engine to find the Facebook pages of a number of individuals who work for a specific company.
“Using these individuals as seeds, they then begin crawling the social networks, sometimes jumping from one network to another, looking for other individuals at the same company. These in turn become seeds to find more employees and so on.
“They end up with a basic network of links between employees within the company. It’s then that the fun begins.
“Using standard measures of connectedness, Fire and co then identified people in positions of leadership and by adding in details such as location, mined from the Facebook pages, they reconstructed the international structure of these organisations. They also used community detection algorithms to reconstruct the organisational structure of the company.”
Wow. The researchers used their method on several “well known hi-tech companies” and found startling details. For example, they found a cluster of comparatively disconnected folks at a large organization and discerned that they belonged to an acquired startup that had yet to be well integrated into the company. This sort of information can be used by companies to monitor themselves, but it could also be used by potential investors (for good or ill for the business, I suppose, depending on what turned up).
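To see how little machinery the approach requires, here is a minimal sketch using the networkx library on an invented employee graph. The names, edges, and measures are hypothetical, not the researchers’ data or code.

```python
# A minimal sketch of the described analysis: connectedness measures to
# spot likely leaders, community detection to recover organizational
# clusters. Everything here is invented for illustration.
import networkx as nx
from networkx.algorithms import community

# Edges between employees, as discovered by crawling social profiles.
G = nx.Graph()
G.add_edges_from([
    ("ana", "ben"), ("ana", "carl"), ("ben", "carl"),  # one team
    ("dee", "eli"), ("dee", "fay"), ("eli", "fay"),    # another team
    ("carl", "dee"),                                   # a lone bridge
])

# "Standard measures of connectedness" point to likely leaders.
ranked = sorted(nx.degree_centrality(G).items(),
                key=lambda kv: kv[1], reverse=True)
print("most connected:", ranked[:2])

# Community detection recovers the clusters; a weakly connected cluster
# might be, say, a poorly integrated acquisition.
for group in community.greedy_modularity_communities(G):
    print("cluster:", sorted(group))
```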
More ominously, competitors could use the information to their advantage. Now that this technology is in the news, many companies will want to prevent such details from emerging, but how? Researcher Michael Fire advises them to “enforce strict policies which control the use of social media by their employees.” Immediately, I might add. And, I suspect that whatever was previously considered a “strict policy” must become even more strict in order to avoid exposure from this technique.
Won’t employees be thrilled?
Cynthia Murrell, April 07, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
NLP: Do Not Look at Results. Look at Pictures
April 6, 2013
One of my two or three readers sent me a link to a LinkedIn post in the Information Access and Search Professionals section of the job hunting and consultant networking service. LinkedIn owns Slideshare (a hosting service for those who are comfortable communicating with presentations) and Pulse (an information aggregation service which plays the role of a selective dissemination of information service via a jazzy interface).
The posting which the reader wanted me to read was “How Natural Language Processing Will Change E Commerce Search Forever.” Now that is a bold statement. Most of the search systems we have tested feature facets, prediction, personalization, hit boosting for specials and deals, and near real time inventory updating.
The company posting the information put a version of the LinkedIn information on the Web at Inbenta.
The point of the information is to suggest that Inbenta can deliver more functionality which is backed by what is called “search to buy conversions.” In today’s economy, that’s catnip to many ecommerce site owners who—I presume—use Endeca, Exalead, SLI, and EasyAsk, among others.
I am okay with a vendor like Inbenta or any of the analytics hustlers asserting that one type of cheese is better than another. In France alone, there are more than 200 varieties and each has a “best”. When it comes to search, there is no easy way to do a tasting unless I can get my hands on the fungible Chevrotin.
Search, like cheese, has to be experienced, not talked about. A happy nibble to Alpes gourmet at http://www.alpesgourmet.com/fromage-savoie-vercors/1008.php
In the case of this Inbenta demonstration, I am enjoined to look at two sets of results from the Grainger.com site. The problem is that I cannot read the screenshots. I am not able to determine if the present Grainger.com site is the one used for the “before” and “after” examples.
Next, I am asked to look at queries from PCMall.com. Again, I cannot read the screenshots. The write up says:
Again, the actual details of the search results are not important; just pay attention that both are very different. But in both cases, wasn’t what we searched basically the same thing? Why are the results so different?
The same approach was used to demonstrate that Amazon’s ecommerce search is doing some interesting things. Amazon is working on search at this time, and I think the company realizes that its system for ecommerce and for the hosted service leaves something out of the cookie recipe.
My view is that if a vendor wants to call attention to differences, perhaps these simple guidelines would eliminate the confusion and frustration I experience when I try to figure out what is going on, what is good and bad, and how the outputs differ:
First, provide a link to each of the systems so I can run the queries and look at the results myself. I did not buy into the Watson Jeopardy promotion because, in television, magic takes place in some editing studio. Screenshots which I can neither read nor replicate open the door to similar suspicions.
Second, to communicate the “fix” I need more than an empty data table. A list of options does not help me. We continue to struggle with systems which describe a “to be” future yet cannot deliver a “here and now” result. I had a long and winding call with an analytics vendor in Nashville, Tennessee, which followed a similar, abstract path in explaining what the company’s technology does. If one cannot show functionality, I don’t have time to listen to science fiction.
Third, the listing of high profile sites is useful for search engine optimization, but not for making crystal clear the whys and wherefores of a content processing system. Specific information is needed, please.
To wrap up, let me quote from the Inbenta essay:
By applying these techniques on e-commerce website search, we have accomplished the following results in the first few weeks.
- Increase in conversion ratio: +1.73%
- Increase average purchase value: +11%
Okay, interesting numbers. What is the factual foundation of them? What method was used to calculate the deltas? What was the historical base of the specific sites in the sample?
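For what it is worth, here is the sort of back-of-the-envelope arithmetic those claims require: a two-proportion z-test on before-and-after conversion counts. The sample counts below are invented to produce the quoted +1.73% lift; Inbenta provides no underlying data.

```python
# A sketch of one standard way to test whether a conversion delta is
# real or noise; all sample figures are invented for illustration.
from math import sqrt

def conversion_delta(conv_a, n_a, conv_b, n_b):
    """Return the lift and a two-proportion z-score for two samples."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return p_b - p_a, (p_b - p_a) / se

# 10,000 visits before and after; conversions rise from 300 to 473.
lift, z = conversion_delta(conv_a=300, n_a=10_000, conv_b=473, n_b=10_000)
print(f"lift: {lift:+.2%}, z: {z:.1f}")  # lift: +1.73%, z well above 1.96
```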
In a world in which vendors and their pet consultants jump forward with predictions, assertions, and announcements of breakthroughs, some simple facts can be quite helpful. I am okay with self promotion, but when asked to examine comparisons, I have to be able to run the queries myself. Without that important step, I am as skeptical as I was with the sci-fi fancies of the folks who put marketing before substance.
Stephen E Arnold, April 6, 2013
Sponsored by Augmentext
Newest Version of MongoDB Includes Text Search
April 6, 2013
Some welcome enhancements to MongoDB are included in the open source database’s latest release, we learn from “MongoDB 2.4 Can Now Search Text,” posted at The H Open. The ability to search text indexes has been one of the most requested features, and the indexing supports 14 languages (or no language at all). The write-up supplies a handy link to a discussion of techniques for creating and searching text indexes.
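Here is a minimal sketch of the feature from Python, assuming a 2.4 server started with text search enabled (it shipped as an experimental option) and hypothetical collection and field names.

```python
# A minimal sketch, assuming a MongoDB 2.4 mongod started with
# --setParameter textSearchEnabled=true; names are hypothetical.
import pymongo

client = pymongo.MongoClient("localhost", 27017)
db = client.demo

# Build a text index over one field; 2.4 supports 14 index languages.
db.articles.create_index([("body", pymongo.TEXT)],
                         default_language="english")
db.articles.insert_one({"body": "MongoDB 2.4 adds text search."})

# In 2.4 searches run through the "text" database command; later
# releases replaced this with the $text query operator.
result = db.command("text", "articles", search="text search")
for hit in result["results"]:
    print(hit["score"], hit["obj"]["body"])
```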
The post describes a second feature of MongoDB 2.4, the hashed index and sharding:
“Hash-based sharding allows data and CPU load to be spread well between distributed database nodes in a simple to implement way. The developers recommend it for cases of randomly accessed documents or unpredictable access patterns. New Geospatial indexes with support for GeoJSON and spherical geometry allow for 2dsphere indexing; this, in turn, offers better spherical queries and can store points, lines and polygons.”
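And here is what hash-based sharding looks like in practice: again a sketch with invented names, assuming a running sharded cluster reachable through a local mongos.

```python
# A minimal sketch of hashed indexing and hash-based sharding; assumes
# a sharded cluster behind a mongos on localhost. Names are invented.
import pymongo

client = pymongo.MongoClient("localhost", 27017)

# A hashed index spreads key values evenly across chunks.
client.demo.events.create_index([("_id", pymongo.HASHED)])

# Shard the collection on the hashed key via the admin database.
client.admin.command("enableSharding", "demo")
client.admin.command("shardCollection", "demo.events",
                     key={"_id": "hashed"})
```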
There is also a new modular authentication system, though its availability is limited so far. The project has also: added support for fixed sized arrays in documents; optimized counting performance in the execution engine; and added a working set size analyzer. See the article for more details, or see the release notes, which include upgrade instructions. The newest version can be downloaded here.
Cynthia Murrell, April 06, 2013
Sponsored by ArnoldIT.com, developer of Augmentext