Oracle Teams with ekiwi

September 8, 2008

ekiwi, based in Provo, Utah, has formed a relationship with Oracle. Founded in 2002, the company focuses on Web-based data extraction. The firm’s Screen-Scraper technology is, the news release asserts, “platform-independent and designed to integrate with virtually any existing information technology system.”

The company describes Screen-Scraper this way here:

It consists of a proxy server that allows the contents of HTTP and HTTPS requests to be viewed, and an engine that can be configured to extract information from Web sites using special patterns and regular expressions. It handles authentication, redirects, and cookies, and contains an embedded scripting engine that allows extracted data to be manipulated, written out to a file, or inserted into a database. It can be used with PHP, .NET, ColdFusion, Java, or any COM-friendly language such as Visual Basic or Active Server Pages.
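
To make that description concrete, here is a minimal sketch of the fetch-and-extract pattern in Java. The URL, the cookie value, and the regular expression are placeholders of my own, and none of this is ekiwi’s actual API; it simply illustrates pulling a page over HTTP and extracting fields with a pattern.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexScrapeSketch {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://example.com/products.html");        // placeholder page
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Cookie", "session=abc123");          // cookies can be carried along
            StringBuilder html = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    html.append(line).append('\n');
                }
            }
            // Pull out anything wrapped in a hypothetical <span class="price"> element.
            Pattern price = Pattern.compile("<span class=\"price\">([^<]+)</span>");
            Matcher m = price.matcher(html);
            while (m.find()) {
                System.out.println("price: " + m.group(1));               // write to a file or database here
            }
        }
    }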

Oracle’s revenues are in the $18 billion to $20 billion range. ekiwi’s revenues may be more modest. Oracle, however, has turned to ekiwi for screen scraping technology to enhance the content acquisition capabilities of Oracle’s flagship enterprise search system, Secure Enterprise Search 10g, or SES10g. In May 2008, one of Oracle’s senior executives told me that SES10g was a key player in the enterprise search arena and that SES10g sold because it was secure. Security, I recall being told, was the key differentiator.

This deal suggests that Oracle has to turn to an up-and-coming screen scraping vendor to expand the capabilities of SES10g. I’m still puzzling over this deal, but that’s clearly my inability to understand the sophisticated management thinking that propels SES10g to its lofty position among the search and content processing vendors.

The news release makes it clear that ekiwi can access content from the “deep Web”. To me, this buzzword means dynamic, database-driven sites. Google has its own “deep Web” technologies, which may be described in part in its five Programmable Search Engine patent applications, published by the USPTO in February 2007.
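
The practical wrinkle is that these pages do not exist until a query is submitted, so a crawler that only follows links never reaches them. Here is a small, hypothetical Java sketch of the kind of form submission involved; the site, field names, and values are invented for illustration.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class DeepWebQuerySketch {
        public static void main(String[] args) throws Exception {
            // No static page lists every record; the results page is generated
            // from a database only in response to a posted search form.
            String form = "category=" + URLEncoder.encode("cameras", "UTF-8")
                        + "&maxPrice=" + URLEncoder.encode("500", "UTF-8");

            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://example.com/catalog/search").openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(form.getBytes("UTF-8"));
            }
            // The dynamic, database-driven results page comes back in the response.
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }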

ekiwi, which offers a very useful Web log here, is:

…a member of the Oracle PartnerNetwork, has worked with Oracle to develop an adaptor that integrates ekiwi’s Screen Scraper with Oracle Secure Enterprise Search to help significantly expand the amount of enterprise content that can be searched while maintaining existing information access and authorization policies. The Oracle Secure Enterprise Search product provides a secure, easy-to-use enterprise search platform that connects to a broad range of enterprise applications and data sources.
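
As a rough illustration of what “maintaining existing information access and authorization policies” usually involves in enterprise search: each document’s access list is captured at crawl time, and results are filtered against the searcher’s groups at query time. The classes below are a hypothetical sketch of that idea, not Oracle SES code.

    import java.util.List;
    import java.util.Set;

    public class SecureSearchSketch {
        static class IndexedDoc {
            final String title;
            final Set<String> allowedGroups;    // captured from the source system at crawl time
            IndexedDoc(String title, Set<String> allowedGroups) {
                this.title = title;
                this.allowedGroups = allowedGroups;
            }
        }

        // Return only the documents the user's group memberships entitle them to see.
        static List<IndexedDoc> authorizedResults(List<IndexedDoc> hits, Set<String> userGroups) {
            return hits.stream()
                       .filter(d -> d.allowedGroups.stream().anyMatch(userGroups::contains))
                       .toList();
        }

        public static void main(String[] args) {
            List<IndexedDoc> hits = List.of(
                    new IndexedDoc("Q3 sales forecast", Set.of("sales", "execs")),
                    new IndexedDoc("Benefits handbook", Set.of("all-employees")));
            System.out.println(authorizedResults(hits, Set.of("all-employees")).size()
                    + " result(s) visible to this user");
        }
    }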

The release continues:

The two technologies have already been coupled in a number of cases that demonstrate their ability to work together. In one instance cell phones from many of the major providers were crawled by Screen-Scraper and indexed by Oracle Secure Enterprise Search. A user shopping for cell phones is then able to search, filter, and browse from a single location the various cell phone models by attributes such as price, form factor, and manufacturer. In yet another case, Screen-Scraper was used to extract forum postings from various photography aficionado web sites. This information was then made available through Oracle Secure Enterprise Search, which made it easy to conduct internal marketing analysis on recently released cameras.
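
The cell phone example amounts to turning scraped attributes into structured records that a search system can index and expose as filters. A toy Java sketch follows; the field names and the indexDocument stub are my own stand-ins for the real extraction and indexing steps.

    public class PhoneFacetSketch {
        static class PhoneRecord {
            final String manufacturer;
            final String model;
            final String formFactor;
            final double price;
            PhoneRecord(String manufacturer, String model, String formFactor, double price) {
                this.manufacturer = manufacturer;
                this.model = model;
                this.formFactor = formFactor;
                this.price = price;
            }
        }

        // Stand-in for handing a record and its filterable attributes to the search index.
        static void indexDocument(PhoneRecord p) {
            System.out.println("indexing " + p.manufacturer + " " + p.model
                    + " (" + p.formFactor + ", $" + p.price + ")");
        }

        public static void main(String[] args) {
            // In practice these values would come from the extraction step, not literals.
            PhoneRecord[] scraped = {
                new PhoneRecord("Nokia", "N95", "slider", 549.00),
                new PhoneRecord("Motorola", "RAZR2", "flip", 299.00)
            };
            for (PhoneRecord p : scraped) {
                indexDocument(p);
            }
        }
    }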

I did some poking around, taking a quick look at my files and running a couple of Web searches, and came up short. Information is located, according to the news story about the deal, here. The URL given is http//:www.screen-scraper.com/ss4ses/. The link redirected for me to http://www.w3.org/Protocols/. The company’s Web site is at http://www.screen-scraper.com, and it looks like this on September 7, 2008, at 8 pm Eastern:

[Screenshot: the screen-scraper.com splash page]

I am delighted that SES10g can acquire Web-based content in dynamic systems. I remain confused about the functions included with SES10g. My understanding was that SES10g was easily extensible and compatible with Oracle Applications, Fusion, and other Oracle technologies. If this were true, giving SES10g the ability to pull content from databased services should be trivial for the firm’s engineering team. I was hoping for an upgrade to SES10g, but that seems not to be in the cards at this time. Scraping Web pages seems to be a higher priority than getting a new release out the door. What’s your understanding of Oracle’s enterprise search strategy? I’m confused. Help me out, please.

Stephen Arnold, September 8, 2008

Comments

2 Responses to “Oracle Teams with ekiwi”

  1. Todd Wilson on September 12th, 2008 4:06 pm

    Hi,

    Todd Wilson from screen-scraper here. Thanks much for taking the time to comment on this. I thought I’d continue the conversation a bit.

    Not being a part of Oracle I can’t speak much to the long-term strategy for Oracle Secure Enterprise Search, but I can talk about how I think our technology fits with theirs. In partnering with us I believe the primary objective is simply to expand the sources that Oracle SES can index and make searchable. Right now Oracle SES is great at indexing local file systems and certain industry-standard web-based applications, such as Business Objects repositories and Documentum content. Ideally, though, Oracle SES would be able to index any web-based content. That’s where we come in. Our tool is flexible enough to work with most any web-based content out there, which essentially results in a universal web connector for Oracle SES. Without technology like ours, it would likely mean hand-coding a separate connector each time a group wanted to index web-based content. Our software still requires customization, but it dramatically cuts down the time required to index a site.
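
    A minimal sketch of that “universal web connector” idea, assuming a hypothetical connector interface rather than the actual Oracle SES or Screen-Scraper plug-in API: one configurable, scraping-based connector stands in for a hand-coded class per source.

        import java.util.Arrays;
        import java.util.List;

        // One interface the indexer calls, regardless of where documents come from.
        interface SearchConnector {
            List<String> fetchDocuments();
        }

        // The hand-coded approach: a new connector class for each repository.
        class DocumentumConnector implements SearchConnector {
            public List<String> fetchDocuments() {
                return Arrays.asList("document pulled via the Documentum connector");
            }
        }

        // The scraping approach: one connector class, configured per site.
        class ScrapingConnector implements SearchConnector {
            private final String startUrl;
            private final String extractionPattern;    // regex or template for this particular site

            ScrapingConnector(String startUrl, String extractionPattern) {
                this.startUrl = startUrl;
                this.extractionPattern = extractionPattern;
            }

            public List<String> fetchDocuments() {
                // A real implementation would crawl startUrl and apply extractionPattern.
                return Arrays.asList("document scraped from " + startUrl);
            }
        }

        public class ConnectorSketch {
            public static void main(String[] args) {
                SearchConnector[] connectors = {
                    new DocumentumConnector(),
                    new ScrapingConnector("http://example.com/forum/", "<div class=\"post\">(.*?)</div>")
                };
                for (SearchConnector c : connectors) {
                    System.out.println(c.fetchDocuments());
                }
            }
        }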

    Regarding your comment on the ease with which Oracle should be able to pull content from databased services, I agree with you, but there are a few points you might consider. First, in a number of cases the Oracle SES administrator may not have direct access to the database containing the content to be indexed (e.g., in a hosted app like Salesforce.com). If no API to the data is available, screen-scraping is a possible way to get at the content. Second, it’s often the case that data displayed in a web page is actually located in several tables across a database. Consider the simple example of an address book where a person has multiple addresses and is affiliated with multiple organizations. It may require traversing multiple tables and their relationships to get data that’s visible in a single location on a page. Also, data pulled from a database and displayed on a web page has often gone through a series of business and formatting rules that get it ready for human consumption. When content is indexed, it should be indexed in a way that makes it usable to people. If you grab the data directly from a web page, all of those rules and that aggregation have already been applied.
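
    To picture the address book point, here is a rough JDBC sketch with made-up table names and a placeholder connection string (a real driver and database would be needed to run it). The single view a person sees on the page corresponds to a join across several tables, which the page’s own code has already performed and formatted.

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.Statement;

        public class AddressBookJoinSketch {
            public static void main(String[] args) throws Exception {
                String sql =
                    "SELECT p.name, a.street, a.city, o.org_name " +
                    "FROM person p " +
                    "JOIN address a ON a.person_id = p.id " +          // a person may have several addresses
                    "JOIN affiliation af ON af.person_id = p.id " +    // ...and belong to several organizations
                    "JOIN organization o ON o.id = af.org_id";
                try (Connection db = DriverManager.getConnection("jdbc:example:addressbook");  // placeholder URL
                     Statement stmt = db.createStatement();
                     ResultSet rs = stmt.executeQuery(sql)) {
                    while (rs.next()) {
                        // The rendered web page has already done this aggregation and formatting.
                        System.out.println(rs.getString("name") + " | "
                                + rs.getString("street") + ", " + rs.getString("city") + " | "
                                + rs.getString("org_name"));
                    }
                }
            }
        }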

    There are actually a few other advantages to this approach, which we delineate in a bit more detail on this page: http://www.screen-scraper.com/ss4ses/ (by the way, you might double-check the two links you have to this URL in your blog entry as they appear to be malformed: in the first case the URL is prepended with the URL to your blog entry, and in the second you’ve inadvertently placed the colon after the forward slashes in the URL).

    Thanks again for taking a look at this. I’ll keep an eye on the comments section of this entry, and would be happy to provide other details.

    Kind regards,

    Todd Wilson

  2. Sara on October 27th, 2009 3:20 pm

    Nice post on screen scrapers, simple and to the point :). For simple screen scraping I use Python, but for larger projects I used the extractingdata.com screen scraper, which worked great. They build custom screen scrapers and data extraction programs.
