April 7, 2015
Cyber OSINT continues to reshape information access. Traditional keyword search has been supplanted by higher value functions. One of the keystones for systems that push “beyond search” is technology patented and commercialized by BrightPlanet.
A search on Google often returns irrelevant or stale results. How can an organization obtain access to current, in-depth information from Web sites and services not comprehensively indexed by Bing, Google, ISeek, or Yandex?
The answer to the question is to turn to the leader in content harvesting, BrightPlanet. The company was one of the first, if not the first, to develop systems and methods for indexing information ignored by Web indexes which follow links. Founded in 2001, BrightPlanet has emerged as a content processing firm able to make accessible structured and unstructured data ignored, skipped, or not indexed by Bing, Google, and Yandex.
In the BrightPlanet seminar open to law enforcement, intelligence, and security professionals, BrightPlanet said the phrase “Deep Web” is catchy but does not explain what type of information is available to a person with a Web browser. A familiar example is querying a dynamic database, such as an airline’s flight schedule. Other types of “Deep Web” content may require the user to register. Once logged into the system, users can query the content available to a registered user. A service like Bitpipe requires registration and a user name and password each time I want to pull a white paper from the Bitpipe system. BrightPlanet can handle both types of indexing tasks and many more. BrightPlanet’s technology is used by governmental agencies, businesses, and service firms to gather information pertinent to people, places, events, and other topics.
In an exclusive interview, William Bushee, the chief executive officer at BrightPlanet, reveals the origins of the BrightPlanet approach. He told Cyber Wizards Speak:
I developed our initial harvest engine. At the time, little work was being done around harvesting. We filed a number of US patent applications for our unique systems and methods. We were awarded eight, primarily around the ability to conduct Deep Web harvesting, a term BrightPlanet coined.
The BrightPlanet system is available as a cloud service. Bushee noted:
We have migrated from an on-site license model to a SaaS [software as a service] model. However, the biggest change came after realizing we could not put our customers in charge of conducting their own harvests. We thought we could build the tools and train the customers, but it just didn’t work well at all. We now harvest content on our customers’ behalf for virtually all projects and it has made a huge difference in data quality. And, as I mentioned, we provide supporting engineering and technical services to our clients as required. Underneath, however, we are the same sharply focused, customer centric, technology operation.
The company also offers data as a service. Bushee explained:
We’ve seen many of our customers use our Data-as-a-Service model to increase revenue and customer share by adding new datasets to their current products and service offerings. These additional datasets develop new revenue streams for our customers and allow them to stay competitive maintaining existing customers and gaining new ones altogether. Our Data-as-a-Service offering saves time and money because our customers no longer have to invest development hours into maintaining data harvesting and collection projects internally. Instead, they can access our harvesting technology completely as a service.
The company has accelerated its growth through a partnering program. Bushee stated:
We have partnered with K2 Intelligence to offer a full end-to-end service to financial institutions, combining our harvest and enrichment services with additional analytic engines and K2’s existing team of analysts. Our product offering will be a service monitoring various Deep Web and Dark Web content enriched with other internal data to provide a complete early warning system for institutions.
BrightPlanet has emerged as an excellent resource to specialized content services. In addition to providing a client-defined collection of information, the firm can provide custom-tailored solutions to special content needs involving the Deep Web and specialized content services. The company has an excellent reputation among law enforcement, intelligence, and security professionals. The BrightPlanet technologies can generate a stream of real-time content to individuals, work groups, or other automated systems.
BrightPlanet has offices in Washington, DC, and can be contacted via the BrightPlanet Web site at www.brightplanet.com.
The complete interview is available at the Cyber Wizards Speak web site at www.xenky.com/brightplanet.
Stephen E Arnold, April 7, 2015
December 9, 2014
The article titled To Bing and Beyond on IDM provides an interview with Dave Hawking, an award-winner in the field of information retrieval and currently a Partner Architect for Bing. In the somewhat lengthy interview, Hawking answers questions on his own history, his work at Bing, natural language search, Watson, and Enterprise Search, among other things. At one point he describes how he arrived in the field of information retrieval after studying computer science at the Australian National University, where the first search engine he encountered was the library’s card catalogue. He says,
“I worked in a number of computer infrastructure support roles at ANU and by 1991 I was in charge of a couple of supercomputers…In order to do a good job of managing a large-scale parallel machine I thought I needed to write a parallel program so I built a kind of parallel grep… I wrote some papers about parallelising text retrieval on supercomputers but I pretty soon decided that text retrieval was more interesting.”
When asked about the challenges of Enterprise Search, Hawking went into detail about the complications that arise due to the “diversity of repositories” as well as issues with access controls. The importance of Hawking’s work in search technology can’t be overstated: it spans his contributions to the Text Retrieval Conferences, CSIRO, and FunnelBack, in addition to his academic achievements.
Chelsea Kerwin, December 09, 2014
July 2, 2014
Making money from search and content processing is difficult. One company has made a breakthrough. You can learn how Mark Brandon, one of the founders of QBox, is using the darling of the open source search world to craft a robust findability business.
I interviewed Mr. Brandon, a graduate of the University of Texas at Austin, shortly after my return from a short trip to Europe. Compared with the state of European search businesses, Elasticsearch and QBox are on to what diamond miners call a “pipe.”
In the interview, which is part of the Search Wizards Speak series, Mr. Brandon said:
We offer solutions that work and deliver the benefits of open source technology in a cost-effective way. Customers are looking for search solutions that actually work.
Simple enough, but I have ample evidence that dozens and dozens of search and content processing vendors are unable to generate sufficient revenue to stay in business. Many well known firms would go belly up without continual infusions of cash from addled folks with little knowledge of search’s history and a severe case of spreadsheet fever.
Qbox’s approach pivots on Elasticsearch. Mr. Brandon said:
When our previous search product proved to be too cumbersome, we looked for an alternative to our initial system. We tested Elasticsearch and built a cluster of Elasticsearch servers. We could tell immediately that the Elasticsearch system was fast, stable, and customizable. But we love the technology because of its built-in distributed nature, and we felt like there was room for a hosted provider, just as Cloudant is for CouchDB, Mongolab and MongoHQ are for MongoDB, Redis Labs is for Redis, and so on. Qbox is a strong advocate for Elasticsearch because we can tailor the system to customer requirements, confident the system makes information more findable for users.
When I asked where Mr. Brandon’s vision for functional findability came from, he told me about an experience he had at Oracle. Oracle owns numerous search systems, ranging from the late 1980s Artificial Linguistics’ system to somewhat newer systems like the late 1990s Endeca system, and the newer technologies from Triple Hop. Combine these with the SES technology and the hybrid InQuira formed from two faltering NLP systems, and Oracle has some hefty investments.
Here’s Mr. Brandon’s moment of insight:
During my first week at Oracle, I asked one of my colleagues if they could share with me the names of the middleware buyer contacts at my 50 or so named accounts. One colleague said, “certainly”, and moments later an Excel spreadsheet popped into my inbox. I was stunned. I asked him if he was aware that “Excel is a Microsoft technology and we are Oracle.” He said, “Yes, of course.” I responded, “Why don’t you just share it with me in the CRM System?” (the CRM was, of course, Siebel, an Oracle product). He chortled and said, “Nobody uses the CRM here.” My head exploded. I gathered my wits to reply back, “Let me get this straight. We make the CRM software and we sell it to others. Are you telling me we don’t use it in-house?” He shot back, “It’s slow and unusable, so nobody uses it.” As it turned out, with around 10 million corporate clients and about 50 million individual names, if I had to filter for “just middleware buyers”, “just at my accounts”, “in the Northeast”, I could literally go get a cup of coffee and come back before the query was finished. If I added a fourth facet, forget it. The CRM system would crash. If it is that bad at one of the world’s biggest software companies, how bad is it throughout the enterprise?
Stephen E Arnold, July 2, 2014
May 22, 2014
The interview titled Text Analytics 2014: Jeff Catlin, Lexalytics on Breakthrough Analysis may be overstating its case when it is billed as a breakthrough analysis. Most of the questions cover state-of-the-industry topics and Lexalytics promotion. Catlin offers insight into the world of enterprise data and the future of the industry. For example, when asked about new features for 2014 and the near future, Catlin responded,
“As a company, Lexalytics is tackling both the basic improvements and the new features with a major new release, Salience 6.0, which will be landing sometime in the second half of the year. The core text processing and grammatic parsing of the content will improve significantly, which will in turn enhance all of our core features of the engine. Additionally, this improved grammatic understanding will allow us to be the key to detecting intention, which is the big new feature in Salience 6.0.”
Catlin repeats in several of his answers that the industry is in flux, and that vendors can only scramble to keep up, even going so far as to compare 2013 and 2014 enterprise data to the Berlin Wall. He describes two “fronts”, one involving improving core technology, and the other focused on vertical market prospects.
Chelsea Kerwin, May 22, 2014
November 25, 2013
With Google becoming more difficult to use, many professionals need a reliable way to locate, filter, and obtain high-value information. Silobreaker is an online service and system that delivers actionable information.
The co-founder of Silobreaker said in an exclusive interview for Search Wizards Speak:
I learned that in most of the organizations, information was locked in separate silos. The information in those silos was usually kept under close control by the silo manager. My insight was that if software could make available to employees the information in different silos, the organization would reap an enormous gain in productivity. So the idea was to “break” down the information and knowledge silos that exist within companies, organizations and mindsets.
And knock down barriers the system has. Silobreaker’s popularity is surging. The most enthusiastic supporters of the system come from the intelligence community, law enforcement, analysts, and business intelligence professionals. A user’s query retrieves up-to-the-minute information from Web sources, commercial services, and open source content. The results are available as a series of summaries, full text documents, relationship maps among entities, and other report formats. The user does not have to figure out which item is an advertisement. The Silobreaker system delivers muscle, not fatty tissue.
Mr. Bjore, a former intelligence officer, adds:
Silobreaker is an Internet and a technology company that offers products and services which aggregate, analyze, contextualize and bring meaning to the ever-increasing amount of digital information.
Underscoring the difference between Silobreaker and other online systems, Mr. Bjore points out:
What sets us apart is not only the Silobreaker technology and our commitment to constant innovation. Silobreaker embodies the long term and active experience of having a team of users and developers who can understand the end user environment and challenges. Also, I want to emphasize that our technology is one integrated technology that combines access, content, and actionable outputs.
The ArnoldIT team uses Silobreaker in our intelligence-related work. We include a profile of the system in our lectures about next-generation information gathering and processing systems.
Stephen E Arnold, November 25, 2013
November 4, 2013
We posted a Search Wizards Speak with SearchYourCloud. You can locate the interview at this link. There are more than 60 interviews with experts in search, content processing, and analytics. The collection is available without charge. Why pay the azure chip crowd when you can get information from the folks who bring you information retrieval software and systems?
Stephen E Arnold, November 4, 2013
October 28, 2013
Semantria is a company focused on providing text and sentiment analysis to anyone. The company’s approach is to streamline the analysis of content so that in less than three minutes and for a nominal $1,000, the power of content processing can help answer tough business questions.
The firm’s founder is Oleg Rogynskyy, who has worked at Nstein (now part of Open Text) and Lexalytics. The idea for Semantria blossomed from Mr. Rogynskyy’s insight that text analytics technology was sufficiently mature to be useful to almost any organization or business professional.
I interviewed Mr. Rogynskyy on October 24, 2013. He told me:
At Semantria, we want to simplify and democratize access to text analytics technology. We want people to be able to get up and running in no time, with a small budget, and actually derive value from our technology. The classic story is you buy a system worth $100k and don’t deploy it.
Semantria focuses on a class of problems that a few years ago would have been outside the reach of many firms. He said:
We make it simple for our clients to solve the following problems: First, some organizations have too much text to read. For example, a Twitter stream or surveys with many responses. Also, there is the need to move quickly and reduce the time to get to market. Many survey results come with an expiry date before they’re irrelevant. Then there is reporting the information. Anyone can use their Excel smarts to build simple/interesting reports and visuals out of unstructured data. But that can take some time, and Semantria accelerates this step. Finally, users need to analyze text with the same impartiality each time. A human might see a glass as half full or half empty, but Semantria will always see a glass with water.
One of the most interesting aspects of Semantria is that the company delivers its solution as a cloud service. Mr. Rogynskyy observed:
We are happily in the cloud, and in the cloud we trust. We have Android and iOS software development kits in the works, so whoever wants to talk to our API from mobile devices will be doing it with ease very soon.
You can get more information about Semantria at https://semantria.com.
This interview is one of more than 60 full-text interviews with individuals who are deeply involved in search, content processing, and analytics. You can find the full series at www.arnoldit.com/search-wizards-speak.
Stephen E Arnold, October 28, 2013
July 23, 2013
Stephen E Arnold, July 23, 2013
Sponsored by Xenky
July 3, 2013
SLI Systems is now listed on the New Zealand Exchange. CEO Shaun Ryan shares his thoughts on this and the enterprise search market in this Double Shot Interview that Interest.co.nz has posted to YouTube. In the 17-minute conversation with interviewer Andrew Patterson, Ryan is full of confidence as he shares his thoughts on the future of his company and his industry.
See the interview for more, but here are a few highlights. Ryan acknowledges that his company’s biggest competition is Endeca, which he says is the only company to surpass SLI. They actually found it helpful when Oracle bought Endeca, saying that move opened a “hole in the market.” Interesting.
Customer service is a priority for SLI. Since their business follows a SaaS (software-as-a-service) model, customer retention is key, so taking good care of the best ones is “vital,” says Ryan. Besides, the company has gotten some of their best ideas from listening to customer suggestions.
SLI’s decision to go public comes after an average of 30 percent annual growth over the last five years. The company considered going the private-venture-capital route, but the best options there would have required a move to the U.S. Though Ryan describes the process of becoming publicly listed as difficult (and stresses the importance of a good CFO), he says it was worth it. Patterson asks, How big could the company grow? Ryan responds:
“We see there’s a lot of room for growth in ecommerce. Ecommerce is growing globally, in every country. The U.S. is the world’s largest ecommerce market, but it’s also growing in every country in the world. And you’ll find this, you’re shopping more online, your friends and family are shopping more and more online, that’s just a worldwide phenomenon, so we see there’s a lot of potential. And I’m sure, once you look at it, you’ll notice that search on a lot of websites is really poor, and you’ll sort of get a feel if you go and have a look at a few different websites, you’ll get a feel for how much of a need there is for our sort of services.”
We agree: there is no shortage of retail sites crying for improved search functionality. When asked what SLI hopes to achieve over the next five years, Ryan replies quite sensibly that they hope to continue to grow, pushing into more markets since “the whole world needs better search.” At the moment, SLI serves customers in the U.S., the U.K., Australia, New Zealand, and Brazil. Next in their sights is Japan, but Ryan emphasizes that they get customer requests from a number of other countries.
The interview concludes with Ryan’s thoughts on cultivating New Zealand’s tech industry. His two suggestions: turn out more qualified computer science graduates (that sounds familiar), and celebrate the success of companies that have done well. That is a category in which SLI Systems is happy to claim membership, and they show no signs of slowing down now.
Cynthia Murrell, July 03, 2013
June 17, 2013
The developer of Oorace is Search’XPR. The company has set up operations in New York to complement its two offices in France. You can read an exclusive interview with Jean-Luc Marini. I will explore the idea of software which goes beyond key word retrieval and facets in an upcoming KMWorld column. In the meantime, check out the interview on Search Wizards Speak. SWS is the largest collection of first-person explanations of concepts in search, content processing, and analytics. The entire collection is available from the index at http://arnoldit.com/wordpress/wizards-index/.
Stephen E Arnold, June 17, 2013
Sponsored by Xenky, the portal to ArnoldIT