July 2, 2015
The article titled Metadata Matters; What’s The One Piece of Technology Microsoft Doesn’t Provide On-Premises Or in the Cloud? on ConceptSearching re-introduces Compound Search Processing, ConceptSearching’s main offering. Compound Search Processing, a technology the company developed in 2003, can identify multi-word concepts and the relationships between words. Compound Search Processing is being repositioned, with Concept Searching apparently chasing SharePoint sales. The article states,
“The missing piece of technology that Microsoft and every other vendor doesn’t provide is compound term processing, auto-classification, and taxonomy that can be natively integrated with the Term Store. Take advantage of our technologies and gain business advantages and a quantifiable ROI…
Microsoft is offering free content migration for customers moving to Office 365…If your content is mismanaged, unorganized, has no value now, contains security information, or is an undeclared record, it all gets moved to your brand new shiny Office 365.”
The angle for Concept Searching is metadata and indexing, and they are quick to remind potential customers that “search is driven by metadata.” The offerings of ConceptSearching come with the promise that it is the only platform that works with all versions of SharePoint while delivering an enterprise metadata repository. For more information on the technology, see the new white paper on Compound Term Processing.
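What “identifying multi-word concepts” might look like in miniature: the sketch below counts adjacent word pairs and keeps the ones that recur, a crude stand-in for compound term processing. The sample text, stopword list, and threshold are illustrative assumptions; ConceptSearching’s actual algorithm is proprietary and far more sophisticated.

```python
from collections import Counter

STOPWORDS = {"the", "and", "a", "of"}

def extract_compound_terms(text, min_count=2):
    """Find word pairs that co-occur often enough to suggest a
    multi-word concept -- a crude stand-in for compound term processing."""
    words = [w.lower().strip(".,") for w in text.split()]
    pairs = Counter(
        p for p in zip(words, words[1:])
        if not (set(p) & STOPWORDS)   # skip pairs containing stopwords
    )
    return [" ".join(p) for p, n in pairs.items() if n >= min_count]

sample = ("The term store holds managed metadata. The term store "
          "drives search, and managed metadata improves search.")
print(extract_compound_terms(sample))  # ['term store', 'managed metadata']
```

Even this toy version shows why compound terms matter for search: “term store” carries a meaning that neither “term” nor “store” does alone.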
Chelsea Kerwin, July 2, 2015
July 1, 2015
CACI International, Leidos Holdings, and Booz Allen Hamilton have expressed interest in Computer Sciences Corp’s public sector division. There are not many details about the possible transaction, as it is still in the early stages, so everything remains hush-hush.
The possible acquisition came after the news that CSC will split into two divisions: one that serves US public sector clients and another dedicated to global commercial and non-government clients. CSC has an estimated $4.1 billion in revenues and is worth an estimated $9.6 billion, but CACI International, Leidos Holdings, and Booz Allen Hamilton might reconsider the purchase, or push for a lower price, after hearing this news: “Computer Sciences (CSC) To Pay $190M Penalty; SEC Charges Company And Former Executives With Accounting Fraud” from Street Insider. The Securities and Exchange Commission is charging CSC and former executives with a $190 million penalty for hiding financial information and problems resulting from the contract with the company’s biggest client. CSC and the executives, of course, are contesting the charges.
“The SEC alleges that CSC’s accounting and disclosure fraud began after the company learned it would lose money on the NHS contract because it was unable to meet certain deadlines. To avoid the large hit to its earnings that CSC was required to record, Sutcliffe allegedly added items to CSC’s accounting models that artificially increased its profits but had no basis in reality. CSC, with Laphen’s approval, then continued to avoid the financial impact of its delays by basing its models on contract amendments it was proposing to the NHS rather than the actual contract. In reality, NHS officials repeatedly rejected CSC’s requests that the NHS pay the company higher prices for less work. By basing its models on the flailing proposals, CSC artificially avoided recording significant reductions in its earnings in 2010 and 2011.”
Oh boy! Is it a wise decision to buy a company with a history of accounting fraud and hidden information? If the company’s core products and services are decent, the buyers might get it for a cheap price and recondition the company. Or it could lead to another disaster like HP and Autonomy.
Whitney Grace, July 1, 2015
June 30, 2015
Facebook recently enabled users to post GIF images on the social media platform. Reddit was in an uproar over the new GIF support and celebrated by posting random moving images, from celebrities making weird faces to the quintessential cute kitten. GIFs are an Internet phenomenon used to express moods, share opinions, or show off fandom. Another popular social media platform, Tumblr, the microblogging site used to share photos, videos, quotes, and more, has added a GIF search, says PCMag in “Tumblr Adds New GIF Search Capabilities.”
The main point of Tumblr is the ability to share content a user creates or someone else created. A user’s Tumblr page is a personal reflection of its owner, and GIFs are one of the ultimate content pieces to share. Tumblr’s new search option for GIFs is very simple: a user picks the + button, clicks the GIF button, and then searches for the GIF that suits their mood. A big thing on Tumblr is crediting whoever created a piece, and the new search option has that covered:
“Pick the GIF you want and it slinks right in, properly credited and everything,” the company said. “Whoever originally posted the GIF will be notified accordingly. On their dashboard, on their phone, all the regular places notifications go.”
GIFs are random bits of fun that litter the Internet and quickly achieve meme status. They are also easy to make, which appeals to people with very little graphics background. They can make something creative and fun without much effort, and now GIFs can be easily found and shared on Tumblr.
Whitney Grace, June 30, 2015
June 26, 2015
The article titled Spy Tools Come to the Cloud on Enterprise Tech shows how Amazon’s work with analytics companies on behalf of the government has produced platforms like “GovCloud,” with increased security. The presumed purpose of such platforms is gathering intelligence and performing threat analysis at big data scale. The article explains,
“The Digital Reasoning cognitive computing tool is designed to generate “knowledge graphs of connected objects” gleaned from structured and unstructured data. These “nodes” (profiles of persons or things of interest) and “edges” (the relationships between them) are graphed, “and then being able to take this and put it into time and space,” explained Bill DiPietro, vice president of product management at Digital Reasoning. The partners noted that the elastic computing capability… is allowing customers to bring together much larger datasets.”
For former CIA staff officer DiPietro, it logically follows that bigger questions can be answered by the data with tools like AWS GovCloud and the Hadoop ecosystems built on it. He cites the ability to quickly spotlight and identify someone on a watch list out of a haystack of people as the challenge to overcome. They call the process that allows them to manage and bring together data “cluster on demand.”
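The nodes-and-edges model DiPietro describes can be pictured with a few lines of plain Python: nodes are profiles of persons or things of interest, edges are the relationships between them. The class, identifiers, and relationships below are invented for illustration; Digital Reasoning’s actual system operates at a vastly larger scale.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Toy knowledge graph: nodes carry attributes, edges carry relations."""
    def __init__(self):
        self.nodes = {}                 # node id -> attribute dict
        self.edges = defaultdict(list)  # node id -> [(relation, other id)]

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, src, relation, dst):
        self.edges[src].append((relation, dst))

    def neighbors(self, node_id):
        return self.edges[node_id]

g = KnowledgeGraph()
g.add_node("person:alice", kind="person", watch_list=True)
g.add_node("org:acme", kind="organization")
g.add_edge("person:alice", "employed_by", "org:acme")

print(g.neighbors("person:alice"))  # [('employed_by', 'org:acme')]
```

Spotting one watch-list profile in a haystack then becomes a traversal problem over this structure rather than a scan of raw documents.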
Chelsea Kerwin, June 26, 2015
June 25, 2015
Twitter has been experimenting with improving its search results, and according to TechCrunch the upgrade comes via a new search results interface: “Twitter’s New Search Results Interface Expands To All Users.” The new search results interface is one of the largest updates Twitter has made in 2015. It is supposed to increase ease of use with a cleaner look and better filtering options. Users will now be able to filter search results by live tweets, photos, videos, news, accounts, and more.
Twitter made the update to help people better understand how to use the message service and to take a more active approach to using it, rather than passively reading other people’s tweets. The update is specifically targeted at new Twitter users.
The tweaked search interface will return tweets related to the search phrase or keyword, but that does not mean that the most popular tweets are returned:
“In some cases, the top search result isn’t necessarily the one with the higher metrics associated with it – but one that better matches what Twitter believes to be the searcher’s “intent.” For example, a search for “Steve Jobs” first displays a heavily-retweeted article about the movie’s trailer, but a search for “Mad Men” instead first displays a more relevant tweet ahead of the heavily-favorited “Mad Men” mention by singer Lorde.”
The new interface proves to be simpler and better lists trends, related users, and news. It does take a little while to finesse Twitter, which is a daunting task for new users. Twitter is not the most popular social network these days, and it is using these updates to increase its appeal.
June 23, 2015
MIT did not discover object recognition, but researchers did find that a deep-learning system designed to recognize and classify scenes can also be used to recognize individual objects. Kurzweil describes the exciting development in the article, “MIT Deep-Learning System Autonomously Learns To Identify Objects.” The MIT researchers realized that deep learning could be used for object identification while they were training a machine to identify scenes. They compiled a library of seven million entries categorized by scene, and then learned that object recognition and scene recognition had the possibility of working in tandem.
“ ‘Deep learning works very well, but it’s very hard to understand why it works — what is the internal representation that the network is building,’ says Antonio Torralba, an associate professor of computer science and engineering at MIT and a senior author on the new paper.”
When the deep-learning network was processing scenes, it was fifty percent accurate, compared to a human’s eighty percent accuracy. While the network was busy identifying scenes, it was learning how to recognize objects at the same time. The researchers are still trying to work out the kinks in the deep-learning process and have decided to start over. They are retraining their networks on the same data sets, but taking a new approach to see how scene and object recognition tie together or whether they go in different directions.
Deep-learning networks have major ramifications, including improvements across many industries. However, will deep learning be applied to basic search? Image search still does not work well when you search by an actual image.
June 22, 2015
Despite efforts to maintain an open Internet, malware seems to be pushing online explorers into walled gardens, akin to the old AOL setup. The trend is illustrated by a story at PandoDaily, “Security Trumps Ideology as Google Closes Off its Chrome Platform.” Beginning this July, Chrome users will only be able to download extensions for that browser from the official Chrome Web Store. This change comes on the heels of one made in March: apps submitted to Google’s Play Store must now pass a review. Extreme measures to combat an extreme problem with malicious software.
The company tried a middle-ground approach last year, when it imposed the our-store-only policy on all users except those on Chrome’s development build. The makers of malware, though, are adaptable creatures; they found a way to force users into the development channel, then slip in their pernicious extensions. Writer Nathaniel Mott welcomes the changes, given the realities:
“It’s hard to convince people that they should use open platforms that leave them vulnerable to attack. There are good reasons to support those platforms—like limiting the influence tech companies have on the world’s information and avoiding government backdoors—but those pale in comparison to everyday security concerns. Google seems to have realized this. The chaos of openness has been replaced by the order of closed-off systems, not because the company has abandoned its ideals, but because protecting consumers is more important than ideology.”
Better safe than sorry? Perhaps.
Cynthia Murrell, June 22, 2015
June 20, 2015
Think back. Vivisimo asserted that it deduplicated and presented federated search results. There are folks at Oracle who have pointed to Outside In and other file conversion products available from the database company as a way to deal with different types of data. There are specialist vendors, which I will not name, who are today touting their software’s ability to turn a basket of data types into well-behaved rows and columns complete with metatags.
Well, not so fast.
Unifying structured and unstructured information is a time-consuming, expensive process. That is the reason for the obese exception files where objects that cannot be processed go to live out their short, brutish lives.
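A rough sketch of where those exception files come from: a unification pass normalizes what it can into rows and columns, and anything that will not coerce lands in the exceptions pile. The record layout and coercion rules below are invented for illustration; real pipelines have many more failure modes.

```python
def unify_records(records):
    """Normalize heterogeneous records into (name, amount) rows;
    anything that cannot be coerced lands in the exception file."""
    rows, exceptions = [], []
    for rec in records:
        try:
            rows.append((str(rec["name"]), float(rec["amount"])))
        except (KeyError, TypeError, ValueError):
            exceptions.append(rec)  # lives out its short, brutish life here
    return rows, exceptions

records = [
    {"name": "invoice-1", "amount": "19.99"},
    {"name": "invoice-2", "amount": "n/a"},   # unparseable amount
    {"customer": "no name field"},            # missing expected keys
]
rows, exceptions = unify_records(records)
print(rows)             # [('invoice-1', 19.99)]
print(len(exceptions))  # 2
```

Two of three toy records fail even this trivial schema, which hints at why real exception files get obese so quickly.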
I read “Tamr Snaps Up $25.2 Million to Unify Enterprise Data.” The stakeholders know, as do I, that unifying disparate types of data is an elephant in any indexing or content analytics conference room. Only the naive believe that software whips heterogeneous data into Napoleonic War parade formations. Today’s software processing tools cannot get undercover police officers to look shipshape for the mayor.
Ergo, an outfit with an aversion to the vowel “e” plans to capture the flag on top of the money pile available for data normalization and information polishing. The write up states:
Tamr can create a central catalogue of all these data sources (and spreadsheets and logs) spread out across the company and give greater visibility into what exactly a company has. This has value on so many levels, but especially on a security level in light of all the recent high-profile breaches. If you do lose something, at least you have a sense of what you lost (unlike with so many breaches).
Tamr is correct. Organizations don’t know what data they have. I could mention a US government agency which does not know what data reside on the server next to another server managed by the same system administrator. But I shall not. The problem is common and it is not confined to bureaucratic blenders in government entities.
Tamr, despite the oddball spelling, has Michael Stonebraker, a true wizard, on the task. The write up mentions an outfit that might be politely described as having a “database challenge” as a customer. If Thomson Reuters cannot figure out data after decades of effort and millions upon millions in investment, believe me when I point out that Tamr may be on to something.
Stephen E Arnold, June 20, 2015
June 13, 2015
LinkedIn is the social network for professionals. The company meets the needs of individuals who want to be hired and companies looking to fill jobs. We use the system to list articles I have written. If you examine some of the functions of LinkedIn, you may discover that sorting is a bit of a disappointment.
LinkedIn has been working hard to find technical solutions to its data management challenges. One of the company’s approaches has been to create software, make it available as open source, and then publicize the contributions.
A recent example is the article “LinkedIn Fills Another SQL-on-Hadoop Niche.” What is interesting in the write up is that the article does not make clear what LinkedIn does with this software home brew. I learned:
Pinot was designed to provide the company with a way to ingest “billions of events per day” and serve “thousands of queries per second” with low latency and near-real-time results — and provide analytics in a distributed, fault-tolerant fashion.
On the surface, it seems that Hadoop is used as a basket. Then the basket’s contents are filtered using SQL queries. But for me the most interesting information in the write up is what the system does not do; for example:
- The SQL-like query language used with Pinot does not have the ability to perform table joins
- The data is (sic) strictly read-only
- Pinot is narrow in focus.
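Those constraints describe a recognizable query pattern: filtered aggregations over an append-only event table, with no joins. A minimal stand-in using Python’s built-in sqlite3 module looks like this; the table, columns, and sample events are invented for illustration and say nothing about Pinot’s actual schema or engine.

```python
import sqlite3

# In-memory stand-in for a Pinot-style event store: events are ingested
# once, then served via filtered aggregations -- no joins, read-only use.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (member_id INTEGER, action TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "view", 100), (1, "click", 101), (2, "view", 102), (2, "view", 103)],
)

# Typical analytics query: count actions by type within a time window.
rows = conn.execute(
    "SELECT action, COUNT(*) FROM events "
    "WHERE ts >= ? GROUP BY action ORDER BY action",
    (100,),
).fetchall()
print(rows)  # [('click', 1), ('view', 3)]
```

Giving up joins in exchange for fast, predictable aggregations of this shape is the trade-off the article’s bullet points imply.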
Has LinkedIn learned that its internal team needs more time and money to make Pinot a mash up with wider appeal? Commercial companies going open source is often a signal that the assumptions of the in house team have collided with management’s willingness to pay for a sustained coding commitment.
Stephen E Arnold, June 13, 2015
June 12, 2015
Hadoop fans, navigate to “A Better Mousetrap: A JSON Data Warehouse Takes on Hadoop.” There are a couple of very interesting statements in this write up. Those who do the Hadoop the loop know that certain operations are sloooow. Other operations are not efficient for certain types of queries. One learns about these Hadoop the Loops over time, but the issues are often a surprise to the Hadoop/Big Data cheerleaders.
The article reports that SonarW may have a good thing with its Mongo and JSON approach. For example, I highlighted:
In other words, Hadoop always tries to maximize resource utilization. But sometimes you need to go grab something real quick and you don’t need 100 nodes to do it.
That means the SonarW approach might address some sharply focused data analysis tasks. I also noted:
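The “grab something real quick” case can be pictured in miniature with plain Python: one filtered aggregate over a handful of JSON documents, no hundred-node job required. The documents and fields below are invented for illustration; SonarW’s actual engine is a full warehouse, not a list comprehension.

```python
import json

# A few JSON documents standing in for a JSON data warehouse.
raw = """
[{"sku": "A1", "region": "east", "units": 40},
 {"sku": "B2", "region": "west", "units": 15},
 {"sku": "C3", "region": "east", "units": 22}]
"""
docs = json.loads(raw)

# "Grab something real quick": one filtered sum, answered immediately.
east_units = sum(d["units"] for d in docs if d["region"] == "east")
print(east_units)  # 62
```

The point is latency and overhead, not capability: Hadoop can compute this too, but only after spinning up resources the question never needed.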
What could work to SonarW’s advantage is its simplicity and lower cost (starting at $15,000 per terabyte) compared to traditional data warehouses and MPP systems. That might motivate even non-MongoDB-oriented companies to at least kick the tires.
Okay, good. One question crossed my mind: will SonarW’s approach provide some cost and performance capabilities that offer options to XML folks thinking JSON thoughts?
I think SonarW warrants watching.
Stephen E Arnold, June 12, 2015