Inxight Moving to EC Wise

November 13, 2012

If true, this is an interesting development. The site Dvd to Ipad Converter Reviews announces the acquisition of data-management-developer resource Inxight Software by enterprise software company EC Wise in “BO M & Inxight Software.” The headline’s a little confusing; whither “business objects” in text mining we wonder?

The brief write up states:

“EC Wise said the company plans to Inxight’s unstructured information that companies added to the EC Wise business intelligence products to help customers take full advantage of all the data to make the right decisions. . . . EC Wise said, Inxight’s text analytics, federated search and visualization applications, will become part of EC Wise XI platform.”

Unfortunately, we have zero information about the value of this deal, which is expected to be completed in July.

The piece also gives us this observation:

“The acquisition reflects the consolidation trend in the software industry. Last Tuesday, the German software company AG said the United States have been given approval to $ 546,000,000 of its acquisition of webMethods.(AG is a building SOA-based software vendor), last week, Microsoft also announced that six billion U.S. dollars acquisition of online advertising company aQuantive.Earlier this month, Oracle acquired the company to 495 million Agile Software Corporation.”

EC Wise U.S. is based in San Rafael, California, while EC Wise Sichuan makes its home in Chengdu, China. The company focuses on business intelligence, big data, business process optimization/automation, and, interestingly, gaming and entertainment.

Inxight Software’s impressive customer roster includes Morgan Stanley and Yahoo. It emerged in 1997 from the Xerox Palo Alto Research Center. The company has changed hands a couple of times already, having been bought by Business Objects in 2007, which was in turn absorbed by SAP in 2008. Let’s hope the company finds a happy home at EC Wise.

Cynthia Murrell, November 13, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Europe and Disinformation: Denmark? Denmark.

December 7, 2016

If you want to catch up on what “Europe” is doing about disinformation, you will want to read “European Union Efforts to Counter Disinformation.” After you have worked through the short document, do a couple of queries on Bing, Google, Inxight, and Yandex for Copenhagen protests. With a bit of work, you will locate a December 4, 2016, write up from the estimable Express newspaper Web site. The story is “WAR ON DENMARK’S STREETS: Migrant Chaos Sparks Clashes between Police and Protestors.” Disinformation, misinformation, and reformation of information are different facets of this issue. However, a growing problem is the absence of information. Locating semi-accurate “factoids” is a tough job. “Real” journalists prefer to recycle old information or just take what pops into their mobile phone’s browser. Hey, finding out things is really hard. People are really busy with the Facebook thing. Are you planning a holiday in Denmark where a policeman was shot in the head on December 6, 2016? No quotes because the source is the outstanding Associated Press. That outfit does not want people like me to recycle their factoids. Hey, where’s the story about the car burnings which have been increasing this year? Oh, never mind. If the information is not in Google, it does not exist. Convenient? You bet.

Stephen E Arnold, December 7, 2016

Hear That Bing Ding: A Warning for Google Web Search

November 23, 2016

Bing. Bing. Bing. The sound reminds me of a broken elevator door in the Block & Kuhl when I was but a wee lad. Bing. Bing. Bing. Annoying? You bet.

I read “Microsoft Corporation Can Defeat Alphabet Inc in Search.” I enjoy these odd, disconnected-from-the-real-world write ups predicting that Microsoft will trounce Google in a particular niche. This particular write up seizes upon the fluff about Microsoft having an “intelligence fabric.” Then, with a spectacular leap that ignores the fact that more than 90 percent of humans use Google Web search, it suggests that Bing will be the next big thing in Web search.

Get real.

Bing, after two decades of floundering, allegedly is profitable. No word on how long it will take to pay back the money Microsoft has invested in Web search over these 4,000 days of stumbling.

I highlighted this passage in the write up:

Rik van der Kooi, corporate vice president of Microsoft Search Advertising, referred to Bing as an “intelligence fabric” that has been embedded into Windows 10, Cortana, Xbox and other products, including Hololens. He went on to say the future Bing will be personal, pervasive and offer a personal experience so much that it “might not be obvious users are even interacting with the search engine.”

I think I understand. Microsoft is everywhere. Microsoft Bing is embedded. Therefore, Microsoft beats Google Web search.

Great thinking.

I do like this passage:

This is a bold call considering that Google owned 89.38% of the global desktop search engine market, while Microsoft owned 4.2% as of July 2016, according to data provided by Statista. With MSFT’s endeavors to create an integrated ecosystem, however, the long-term scale is tipping in the favor of Microsoft stock. That’s because Microsoft’s traditional business is entrenched into many people’s lives as well as business operations. For instance, the majority of desktop devices run on Windows.

Yep, there are lots of desktops still. However, there are more mobile devices. If I am not mistaken, Google’s Android runs more than 80 percent of these devices. Add desktop and mobile and what do you get? No dominance of Web search by Bing the way I understand the situation.
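For readers who want the arithmetic spelled out, here is a minimal back-of-envelope sketch. Only the desktop share figures echo the Statista numbers quoted above; the desktop/mobile split and the mobile shares are my own illustrative assumptions, not reported data.

```python
# Back-of-envelope check, not real market data: the desktop/mobile split and the
# mobile shares below are illustrative placeholders, not figures from the article.
desktop_share_of_searches = 0.45   # assumed fraction of all Web searches done on desktop
mobile_share_of_searches = 0.55    # assumed fraction done on mobile

bing_desktop = 0.042     # Bing desktop share cited in the write up (July 2016)
google_desktop = 0.8938  # Google desktop share cited in the write up
google_mobile = 0.90     # assumption: Google handles the overwhelming majority of mobile searches
bing_mobile = 0.01       # assumption: Bing's mobile share is negligible

bing_total = desktop_share_of_searches * bing_desktop + mobile_share_of_searches * bing_mobile
google_total = desktop_share_of_searches * google_desktop + mobile_share_of_searches * google_mobile

print(f"Bing blended share:   {bing_total:.1%}")    # roughly 2 to 3 percent
print(f"Google blended share: {google_total:.1%}")  # roughly 90 percent
```

However one shuffles the assumed split, the blended number for Bing stays in the low single digits.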

Sure, I love the Bing thing. I have some affection for Qwant.com, Yandex.com, and Inxight.com too. But Microsoft has yet to demonstrate that it can deliver a Web search system which is able to change the behaviors of today’s users. Look at Google in the word processing space. Microsoft continues to have an edge, and Google has been trying for more than a decade to make Word an afterthought. That hasn’t happened. Inertia is a big factor.

Search for growing market share on Bing. What’s that answer look like? Less than five percent of the Web search market? Oh, do that query on Google by the way.

Stephen E Arnold, November 23, 2016

Massive Analytic: The First Precognitive Analytics Platform

October 1, 2015

Let me reflect a moment. IBM is doing cognitive computing. I am assuming that the ongoing PR and marketing activities are an accurate representation of money-making technologies.

Massive goes IBM one better. The Massive Analytic outfit claims on its Web site that it delivers “effortless data driven decisions.” The product or service is Oscar AP, which allows you to “analyze all your data with artificial precognition.”

Interesting. About five weeks ago, I read “SAP, Oracle and HP Don’t Get Big Data, Claims Massive Analytic Chairman.” In the article, I learned:

Large IT vendors such as SAP, Oracle and HP don’t understand how to properly help their customers to make the most of big data, being more concerned about locking them into their ecosystems than providing them with true analytical insight. That’s according to George Frangou, executive chairman and founder of “precognitive data analytics” platform Oscar AP, which Frangou described as “an AI that allows people to foresee the future and the outcome of their decisions” which “makes Minority Report real”.

That reminded me of Recorded Future, an outfit partially funded by the Alphabet Google thing and In-Q-Tel, the US government intelligence community’s investment fund. Recorded Future rolled out in 2008 after a year or so of gestation. Massive Analytic took its first breath in 2010. I assume the wiggle room created by the term “precognitive” allows Massive Analytic to claim the adjective “first.”

The write up about Massive Analytic contained a statement which I found interesting. I circled this in red, gentle reader:

according to Frangou, larger competitors such as SAP, Oracle and HP “don’t get it” when it comes to making the most of big data and analytics. “They don’t get it because the driver for them is to sell kit. … You’re into millions of dollars before you start,” he said, attacking the aggressive sales tactics of the big vendors, which he said are designed solely to sell the product and not to provide support. “And by the way, the actual algorithms don’t scale either, so you’re into lots of people and manual intervention,” he added. Because of this, Frangou said Massive Analytic is “quite unashamedly following a displacement strategy to displace the incumbents because they’re not getting it”.  He added that SAP HANA, Oracle Exalytics and HP Haven are essentially the same product because they’re built on the same base code.

It is true that most analytics vendors recycle what the engineers and mathematicians with MBAs learned in their university courses. I am not sure about the “algorithms don’t scale.” There are issues with algorithms, but as the work by SRCH2 shows, there is a great deal of innovation opportunity in optimization.

But the point which I find slightly jarring is the reference to SAP, Oracle, and HP “built on the same base code.”

Well, maybe. SAP uses home brew code (anyone remember TREX?), acquired stuff from Business Objects (Inxight), and open source snippets. Oracle uses the wild and crazy home brew code, acquired code from “analytics” outfits like Endeca, and confections from some of the Oracle partner ecosystem. HP—an example for MBA case studies for the next couple of decades—uses home brew, acquired technology from outfits like Autonomy, and probably scripts written by the Board of Directors and Meg Whitman in their spare time.

What the three companies share is, therefore:

  1. Code written by employees and contractors
  2. Code from open source and licensed libraries
  3. Code from companies acquired in moments of great wisdom.

The wrappers each of these companies exposes to its customers and partners make it easy to use the popular programming conventions, recycle structured query language, and exploit reasonably stable Web conventions.

I would suggest that once one looks under the hood of one of these companies’ projects, there will be a world of differences. There are a couple of simple reasons.

First, some familiar bits and lots of unfamiliar or downright extraterrestrial methods translate to job security and ongoing consulting work. Who wants to lose a night-shift Oracle DBA job? Not anyone I know.

Second, enterprise software is about customization. I know the yap about enterprise apps, but these apps are little more than customized scripts to allow a hapless marketer with a degree in home economics to pull down a standard report.

I will leave it to you to unravel the mysteries of precognitive analytics and the assertion that HP, Oracle, and SAP are peas in a pod.

Stephen E Arnold, October 1, 2015

SAP and Business Intelligence: Simple Stuff, Really Simple

May 14, 2015

I came across an interesting summary of SAP’s business intelligence approach. Navigate to “SAP BI Suite Roadmap Strategy Update from ASUG SapphireNow.” ASUG, in case you are not into the SAP world, means Americas’ SAP Users Group. Doesn’t everyone know that acronym? I did not.

The article begins with a legal disclaimer, always a strange attractor to me. I find content on the Web which includes unreadable legal lingo sort of exciting.


It is almost as thrilling as some of the security methods which SAP employs across its systems and software. I learned from a former SAP advisor that, as I recall the comment, “Security has never been a priority at SAP.”

The other interesting thing about the article is that it appears to be composed of images captured either from a low resolution screen capture program or a digital camera without a massive megapixel capability.

I worked through the slides and comments as best as I could. I noted several points in addition to the aforementioned lacunae regarding security; to wit:

  1. SAP wants to simplify the analytics landscape. This is a noble goal, but my experience has been that SAP is a pretty complex beastie. That may be my own ignorance coloring what is just an intuitive, tightly integrated example of enterprise software.
  2. SAP likes dedicating servers or clusters of servers to tasks. There is a server for the in memory database. There is a server for what I think used to be Business Objects. There is the SAP desktop. There are edge servers in case your SAP installation is not for a single user. There is the SAP cloud which, I assume, is an all purpose solution to computational and storage bottlenecks. Lots of servers.
  3. Business Objects is the business intelligence engine. I am not confident in my assessment of complexity, but, as I recall, Business Objects can be a challenge.


My reaction to the presentation is that for the faithful who owe their job and their consulting revenue to SAP’s simplified business intelligence solutions and servers, joy suffuses their happy selves.

For me, I keep wondering about security. And whatever happened to TREX? What happened to Inxight’s Thingfinder and related server technologies?

How simple can an enterprise solution be? Obviously really simple. Did I mention security?

Stephen E Arnold, May 14, 2015

The Law of Moore: Is Information Retrieval an Exception?

April 17, 2015

I read “Moore’s Law Is Dead, Long Live Moore’s Law.” The “law” cooked up by a chip company suggests that in technology stuff gets better, faster, and cheaper. With electronic brains getting better, faster, and cheaper, it follows that phones are more wonderful every few weeks. The logic applies to laptops, intelligence in automobiles, and airline related functions.

The article focuses on the Intel-like world of computer parts. The write up makes this point which I highlighted:

From 2005 through 2014, Moore’s Law continued — but the emphasis was on improving cost by driving down the expense of each additional transistor. Those transistors might not run more quickly than their predecessors, but they were often more power-efficient and less expensive to build.

Yep, the cheaper point is significant. The article then tracks to a point that warranted a yellow highlight:

After 50 years, Moore’s Law has become cultural shorthand for innovation itself. When Intel, or Nvidia, or Samsung refer to Moore’s Law in this context, they’re referring to the continuous application of decades of knowledge and ingenuity across hundreds of products. It’s a way of acknowledging the tremendous collaboration that continues to occur from the fab line to the living room, the result of painstaking research aimed to bring a platform’s capabilities a little more in line with what users want. Is that marketing? You bet. But it’s not just marketing.

These two points sparked my thinking about the discipline of enterprise information access. Enterprise search relies on a wide range of computing operations. If these operations are indeed getting better, faster, and cheaper, does it make sense to assume that information retrieval is also getting better, faster, and cheaper?

What is happening from my point of view is that the basic design of enterprise information access systems has not changed significantly in the last decade, maybe longer. There is the content acquisition module, the normalization or transformation module, the indexing module, the query processing module, the administrative module, and other bits and pieces.
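For the curious, a minimal sketch of that decade-old module arrangement appears below. The function names and the toy inverted index are my own shorthand for illustration, not any vendor’s actual code.

```python
# A minimal sketch of the classic enterprise search pipeline: acquire,
# normalize, index, query. Names and logic are illustrative only.
from collections import defaultdict

def acquire(sources):
    # Content acquisition module: pull raw documents from repositories.
    for doc_id, text in sources.items():
        yield doc_id, text

def normalize(text):
    # Normalization/transformation module: lowercase and strip punctuation.
    return [t.strip(".,;:!?") for t in text.lower().split()]

def build_index(sources):
    # Indexing module: a toy inverted index mapping term -> set of doc ids.
    index = defaultdict(set)
    for doc_id, text in acquire(sources):
        for term in normalize(text):
            index[term].add(doc_id)
    return index

def query(index, terms):
    # Query processing module: AND together the postings for each query term.
    postings = [index.get(t, set()) for t in normalize(terms)]
    return set.intersection(*postings) if postings else set()

docs = {"d1": "Enterprise search relies on content acquisition and indexing.",
        "d2": "Query processing sits on top of the inverted index."}
idx = build_index(docs)
print(query(idx, "inverted index"))   # {'d2'}
```

Swap in crawlers, pipelines, and clusters, and the shape of the thing is much the same as it was a decade ago.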

The outputs from today’s information access systems do not vary much from the outputs available from systems on offer a decade ago. Endeca generated visual reports by 2003. Relationship maps were available from Inxight and Semio (remember that outfit?) even earlier. Smart software like the long forgotten Inference system winnowed results based on what the user sought in his or her query. Linguistic functions were the heart and soul of Delphes. Statistical procedures were the backbone of PLS, based on Cornell wizardry.

Search and retrieval has benefited from faster hardware. The additional computational headroom has made it possible to layer on function after function. But the ability to make layers of content processing and filtering work has done little to ameliorate the grousing about many enterprise search systems.

The fix has not been to deliver a solution significantly different from what Autonomy and Fast Search offered in 2001. The fix has been to shift from what users need to deal with business questions to:

  • Business intelligence
  • Semantics
  • Natural language processing
  • Cognitive computing
  • Metadata
  • Visualization
  • Text analytics.

I know I am missing some of the chestnuts. The point is that information access may be lagging behind certain other sectors; for example, voice search via a mobile device. When I review a “new” search solution, I often find myself with the same sense of wonder I had when I first walked through the Smithsonian Museum: Interesting but mostly old stuff.

Just a thought that enterprise search is delivering less, not “Moore.”

Stephen E Arnold, April 17, 2015

Attensity Adds Semantic Markup

April 3, 2015

You have been waiting for more markup. I know I have, and that is why I read “Attensity Semantic Annotation: NLP-Analyse für Unternehmensapplikationen” (NLP analysis for enterprise applications).

So your wait and mine—over.

Attensity, a leader in figuring out what human discourse means, has rolled out a software development kit so you can do a better job with customer engagement and business intelligence. Attensity offers Dynamic Data Discovery. Unlike traditional analysis tools, Attensity does not focus on keywords. You know, what humans actually use to communicate.

Attensity uses natural language processing in order to identify concepts and issues in plain language. I must admit that I have heard this claim from a number of vendors, including long forgotten systems like DR LINK, among others.

The idea is that the SDK makes it easier to filter data to evaluate textual information and identify issues. Furthermore, the SDK performs fast content fusion. The result is, as other vendors have asserted, insight. There was a vendor called Inxight which asserted quite similar functions in 1997. At one time, Attensity had a senior manager from Inxight, but I assume the attribution of functions is one of Attensity’s innovations. (Forgive me for mentioning vendors that some 20 somethings know quite well.)

If you are dependent upon Java, Attensity is an easy fit. I assume that if you are one of the 150 million plus Microsoft SharePoint outfits, Attensity integration may require a small amount of additional work.

According to Attensity, the benefit of Attensity’s markup approach is that the installation is on site and therefore secure. I am not sure about this because security is person dependent, so cloud or on site, security remains an “issue” different from the ones Attensity’s system identifies.

Attensity, like Oracle, provides a knowledge base for specific industries. Oracle made term lists available for a number of years. Maybe since its acquisition of Artificial Linguistics in the mid 1990s?

Attensity supports five languages. For these five languages, Attensity can determine the “tone” of the words used in a document. Presumably a company like Bitext can provide additional language support if Attensity does not have these ready to install.
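Tone scoring of this sort is commonly implemented as a lexicon lookup. The sketch below illustrates the general idea with invented word lists; it is not Attensity’s SDK, and a production system would add negation handling, weighting, and per-language lexicons.

```python
# A minimal lexicon-based "tone" scorer: count positive and negative terms
# and report an overall polarity. Purely illustrative.
POSITIVE = {"good", "great", "reliable", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "broken", "useless", "frustrating"}

def tone(text):
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

print(tone("The support team was helpful, but the portal is slow and frustrating."))
# ('negative', -1)
```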

Vendors continue to recycle jargon and buzzwords to describe certain core functions available from many different vendors. If your metatagging outfit is not delivering, you may want to check out Attensity’s solution.

Stephen E Arnold, April 3, 2015

Taxonomy Turmoil: Good Enough May Be Too Much

February 28, 2015

For years, I have posted a public indexing Overflight. You can examine the selected outputs at this Overflight link. (My non public system is more robust, but the public service is a useful temperature gauge for a slice of the content processing sector.)

When it comes to indexing, most vendors provide keyword, concept tagging, and entity extraction. But are these tags spot on? No, most are good enough.


A happy quack to Jackson Taylor for this “good enough” cartoon. The salesman makes it clear that good enough is indeed good enough in today’s marketing enabled world.

I chose about 50 companies that asserted their systems performed some type of indexing or taxonomy function. I learned that the taxonomy business is “about to explode.” I find that to be either an interesting investment tip or a statement that is characteristic of content processing optimists.

Like search and retrieval, plugging in “concepts” or other index terms is a utility function. If one indexes each word in an article appearing in this blog, the keywords alone may not reveal what the article is actually about. In this post, for example, I am talking about Overflight, but the real topic is the broader use of metadata in information retrieval systems. I could assign the term “faceted navigation” to this article as a way to mark it as germane to point and click navigation systems.
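Here is a crude sketch of that assignment step, using an invented concept map. It is illustrative only and not the method of any vendor Overflight monitors; real taggers add statistical scoring and disambiguation.

```python
# Toy concept tagger: map trigger phrases to broader index terms, the way a
# human indexer might assign "faceted navigation" to a post about Overflight.
# The concept map is invented for illustration.
CONCEPT_MAP = {
    "faceted navigation": ["point and click navigation", "facets", "filters"],
    "metadata": ["index terms", "tagging", "taxonomy"],
    "entity extraction": ["named entities", "people and places"],
}

def assign_concepts(text):
    text = text.lower()
    tags = set()
    for concept, triggers in CONCEPT_MAP.items():
        if concept in text or any(t in text for t in triggers):
            tags.add(concept)
    return sorted(tags)

post = "This article discusses Overflight, index terms, and filters for point and click navigation systems."
print(assign_concepts(post))
# ['faceted navigation', 'metadata']
```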

If you examine the “reports” Overflight outputs for each of the companies, you will discover several interesting things as I did on February 28, 2015 when I assembled this short article.

  1. Mergers and purchases of failed vendors at fire sale prices are taking place. Examples include Lucidea’s purchase of Cuadra and InMagic. Both of these firms are anchored in traditional indexing methods and seemed to be within a revenue envelope until their sell out. Business Objects acquired Inxight, and then SAP acquired Business Objects. Bouvet acquired Ontopia. Teradata acquired Revelytix.
  2. Moving indexing into open source. Thomson Reuters acquired ClearForest and made most of the technology available as OpenCalais. OpenText, a rollup outfit, acquired Nstein. SAS acquired Teragram. Smartlogic acquired Schemalogic. (A free report about Schemalogic is available at www.xenky.com/vendor-profiles.)
  3. A number of companies just failed, shut down, or went quiet. These include Active Classification, Arikus, Arity, Forth ICA, MaxThink, Millennium Engineering, Navigo, Progris, Protege, punkt.net, Questans, Quiver, Reuse Company, Sandpiper,
  4. The indexing sector includes a number of companies my non public system monitors; for example, the little known Data Harmony with six figure revenues after decades of selling really hard to traditional publishers. Conclusion: Indexing is a tough business to keep afloat.

There are numerous vendors who assert their systems perform indexing, entity, and metadata extraction. More than 18 of these companies are profiled in CyberOSINT, my new monograph. Oracle owns Triple Hop, RightNow, and Endeca. Each of these acquired companies performs indexing and metadata operations. Even the mashed potatoes search solution from Microsoft includes indexing tools. The proprietary XML data management vendor MarkLogic asserts that it performs indexing operations on content stored in its repository. Conclusion: More cyber oriented firms are likely to capture the juicy deals.

So what’s going on in the world of taxonomies? Several observations strike me as warranted:

First, none of the taxonomy vendors are huge outfits. I suppose one could argue that IBM’s Lucene based system is a billion dollar baby, but that’s marketing peyote, not reality. Perhaps MarkLogic which is struggling toward $100 million in revenue is the largest of this group. But the majority of the companies in the indexing business are small. Think in terms of a few hundred thousand in annual revenue to $10 million with generous accounting assumptions.

What’s clear to me is that indexing, like search, is a utility function. If a good enough search system delivers good enough indexing, then why pay for humans to slog through the content and make human judgments? Why not let Google funded Recorded Future identify entities, assign geo codes, and extract meaningful signals? Why not rely on Haystax or RedOwl or any one of the more agile firms to deliver higher value operations?

I would assert that taxonomies and indexing are important to those who desire the accuracy of a human indexed system. This assumes that the humans are subject matter specialists, the humans are not fatigued, and the humans can keep pace with the flow of changed and new content.

The reality is that companies focused on delivering old school solutions to today’s problems are likely to lose contracts to companies that deliver what the customer perceives as a higher value content processing solution.

What can a taxonomy company do to ignite its engines of growth? Based on the research we performed for CyberOSINT, the future belongs to those who embrace automated collection, analysis, and output methods. Users may, if the user so chooses, provide guidance to the system. But the days of yore, when monks with varying degrees of accuracy created catalog sheets for the scriptoria have been washed to the margin of the data stream by today’s content flows.

What’s this mean for the folks who continue to pump money into taxonomy centric companies? Unless the cyber OSINT drum beat is heeded, the failure rate of the Overflight sample is a wake up call.

Buying Apple bonds might be a more prudent financial choice. On the other hand, there is an opportunity for taxonomy executives to become “experts” in content processing.

Stephen E Arnold, February 28, 2015

SAS Releases a New Component of Enterprise Miner: SAS Text Miner

November 20, 2014

The product article for SAS Text Miner on SAS Products offers some insight into the new element of SAS Enterprise Miner. SAS acquired Teragram and that “brand” has disappeared. Some of the graphics on the Text Miner page are reminiscent of SAP Business Objects’ Inxight look. The overview explains,

“SAS Text Miner provides tools that enable you to extract information from a collection of text documents and uncover the themes and concepts that are concealed in them. In addition, you can combine quantitative variables with unstructured text and thereby incorporate text mining with other traditional data mining techniques. SAS Text Miner is a component of SAS Enterprise Miner. SAS Enterprise Miner must be installed on the same machine.”
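The “combine quantitative variables with unstructured text” idea can be illustrated outside the SAS environment. The sketch below is a generic Python illustration with made-up records, not SAS Text Miner code: free-text comments become simple term counts, and a numeric column rides along in the same feature row.

```python
# Illustrative only: merge text-derived features (term counts) with a
# quantitative variable (purchase amount) into a single feature row per record.
records = [
    {"comment": "shipping was slow and support unhelpful", "amount": 120.0},
    {"comment": "fast shipping, great support", "amount": 85.5},
]

vocabulary = ["shipping", "slow", "fast", "support"]

def featurize(record):
    words = record["comment"].lower().replace(",", " ").split()
    text_features = {term: words.count(term) for term in vocabulary}
    text_features["amount"] = record["amount"]   # the quantitative variable
    return text_features

for row in (featurize(r) for r in records):
    print(row)
# {'shipping': 1, 'slow': 1, 'fast': 0, 'support': 1, 'amount': 120.0}
# {'shipping': 1, 'slow': 0, 'fast': 1, 'support': 1, 'amount': 85.5}
```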

New features and enhancements for the Text Miner include support for English and German parsing and new functionality. For more information about the Text Miner, visit the Support Community available for users to ask questions and discover the best approaches for the analysis of unstructured data. SAS was founded in 1976 after the software was created at North Carolina State University for agricultural research. As the software developed, various applications became possible, and the company gained customers in pharmaceuticals, banks and government agencies.
Chelsea Kerwin, November 20, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

AeroText: A New Breakthrough in Entity Extraction

June 30, 2014

I returned from a brief visit to Europe to an email asking about Rocket Software’s breakthrough technology AeroText. I poked around in my archive and found a handful of nuggets about the General Electric Laboratories’ technology that migrated to Martin Marietta, then to Lockheed Martin, and finally in 2008 to the low profile Rocket Software, an IBM partner.

When did the text extraction software emerge? Is Rocket Software AeroText a “new kid on the block”? The short answer is that AeroText is pushing 30, maybe 35 years young.

Digging into My Archive of Search Info

As far as my archive goes, it looks as though the roots of AeroText are anchored in the 1980s. Yep, that works out to an innovation about the same age as the long-in-the-tooth ISYS Search system, now owned by Lexmark. Over the years, the AeroText “product” has evolved, often in response to US government funding opportunities. The precursor to AeroText was an academic exercise at General Electric. Keep in mind that GE makes jet engines, so GE at one time had a keen interest in anything its aerospace customers in the US government thought was a hot tamale.


The AeroText interface circa mid 2000. On the left is the extraction window. On the right is the document window. From “Information Extraction Tools: Deciphering Human Language,” IT Pro, November-December 2004, page 28.

The GE project, according to my notes, appeared as NLToolset, although my files contained references to different descriptions such as Shogun. GE’s team of academics and “real” employees developed a bundle of tools for its aerospace activities and in response to Tipster. (As a side note, in 2001, there were a number of Tipster related documents in the www.firstgov.gov system. But the new www.usa.gov index does not include that information. You will have to do your own searching to unearth these text processing jump start documents.)

The aerospace connection is important because the Department of Defense in the 1980s was trying to standardize on markup for documents. Part of this effort was processing content like technical manuals and various types of unstructured content to figure out who was named, what part was what, and what people, places, events, and things were mentioned in digital content. The utility of NLToolset type software was for cost reduction associated with documents and the intelligence value of processed information.

The need for a markup system that worked without 100 percent human indexing was important. GE got with the program and appears to have assigned some then-young folks to the project. The government speak for this type of content processing involves terms like “message understanding” or MU, “entity extraction,” and “relationship mapping.” The outputs of an NLToolset system were intended for use in other software subsystems that could count, process, and perform other operations on the tagged content. Today, this class of software would be packaged under a broad term like “text mining.” GE exited the business, which ended up in the hands of Martin Marietta. When the technology landed at Martin Marietta, the suite of tools was used in what was called, in the late 1980s and early 1990s, the Louella Parsing System. When Lockheed and Martin merged to form the giant Lockheed Martin, Louella was renamed AeroText.

Over the years, the AeroText system competed with LingPipe, SRA’s NetOwl, and Inxight’s tools. In the heyday of natural language processing, there were dozens and dozens of universities and start ups competing for Federal funding. I have mentioned in other articles the importance of the US government in jump starting the craziness in search and content processing.

In 2005, I recall that Lockheed Martin released AeroText 5.1 for Linux, but I have lost track of the open source versions of the system. The point is that AeroText is not particularly new, and as far as I know, the last major upgrade took place in 2007 before Lockheed Martin sold the property to Rocket Software. At the time of the sale, AeroText incorporated a number of subsystems, including a useful time plotting feature. A user could see tagged events on a timeline, a function long associated with the original version of i2’s Analyst’s Notebook. A US government buyer can obtain AeroText via the GSA because Lockheed Martin seems to be a reseller of the technology. Before the sale to Rocket, Lockheed Martin followed SAIC’s push into Australia. Lockheed signed up NetMap Analytics to handle Australia’s appetite for US government accepted systems.

AeroText Functionality

What does AeroText purport to do that caused the person who contacted me to see a 1980s technology as the next best thing to sliced bread?

AeroText is an extraction tool; that is, it has capabilities to identify and tag entities at somewhere between 50 percent and 80 percent accuracy. (See NIST 2007 Automatic Content Extraction Evaluation Official Results for more detail.)

The AeroText approach uses knowledgebases, rules, and patterns to identify and tag pre-specified types of information. AeroText references patterns and templates, both of which assume the licensee knows beforehand what is needed and what will happen to processed content.
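To make “knowledgebases, rules, and patterns” concrete, here is a minimal sketch of the general approach with my own tiny gazetteer and a single regular expression rule. It is not AeroText code; a real system layers on many more rules, templates, and a disambiguation pass, which is where the 50 to 80 percent accuracy figures come from.

```python
# Minimal knowledgebase + pattern entity tagger, illustrative only.
import re

# "Knowledgebase": a small gazetteer of known organizations.
ORGANIZATIONS = ("General Electric", "Lockheed Martin", "Martin Marietta", "Rocket Software")

# "Pattern": a naive rule treating Title Case pairs after "Mr." or "Ms." as person names.
PERSON_RULE = re.compile(r"\b(?:Mr|Ms)\.\s+([A-Z][a-z]+ [A-Z][a-z]+)")

def extract_entities(text):
    entities = []
    for org in ORGANIZATIONS:          # gazetteer lookup
        if org in text:
            entities.append(("ORG", org))
    for match in PERSON_RULE.finditer(text):   # pattern rule
        entities.append(("PERSON", match.group(1)))
    return entities

sample = "Mr. John Smith moved the NLToolset work from General Electric to Martin Marietta."
print(extract_entities(sample))
# [('ORG', 'General Electric'), ('ORG', 'Martin Marietta'), ('PERSON', 'John Smith')]
```

The gazetteer and the rule have to be written before the content is processed, which is exactly the “know beforehand what is needed” constraint noted above.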

In my view, the licensee has to know what he or she is looking for in order to find it. This is a problem captured in the famous snippet, “You don’t know what you don’t know,” and the “unknown unknowns” variation popularized by Donald Rumsfeld. Obviously, without prior knowledge, the utility of an AeroText-type system has to be matched to mission requirements. AeroText pounded the drum for the semantic Web revolution. One of AeroText’s key functions was its ability to perform the type of markup the Department of Defense required for its XML content. The US DoD used a variant called DAML, or DARPA Agent Markup Language. NLToolset, Louella, and AeroText collected the dust of SPARQL, unifying logic, RDF, OWL, ontologies, and other semantic baggage as the system evolved through time.

Also, staff (headcount) and ongoing services are required to keep a Louella/AeroText-type system generating relevant and usable outputs. AeroText can find entities, figure out relationships like person to person and person to organization, and tag events like a merger or an arrest “event.” In one briefing about AeroText I attended, I recall that the presenter emphasized that AeroText did not require training. (The subtext for those in the know was that Autonomy required training to deliver actionable outputs.) The presenter did not dwell on the need for manual fiddling with AeroText’s knowledgebases, and I did not raise this issue.

