Featured
Yahoo: A Portion of Its Fantastical Search History
I have a view of Yahoo. Sure, it was formed when I was part of the team that developed The Point (Top 5% of the Internet). Yahoo had a directory. We had a content processing system. We spoke with Yahoo’s David Filo. Yahoo had a vision, he said. We said, No problem.
The Point became part of Lycos, embracing Fuzzy and his round ball chair. Yahoo, well, Yahoo just got bigger and generally went the way of general purpose portals. CEOs came and went. Stakeholders howled and then sulked.
I read or rather looked at “Yahoo. Semantic Search From Document Retrieval to Virtual Assistants.” You can find the PowerPoint “essay” or “revisionist report” on SlideShare. The deck was assembled by the director of research at Yahoo Labs. I don’t think this outfit is into balloons, self driving automobiles, and dealing with complainers at the European Commission. Here’s the link. Keep in mind you may have to sign up with the LinkedIn service in order to do anything nifty with the content.
The premise of the slide deck is that Yahoo is into semantic search. After some stumbles, semantic search started to become a big deal with Google and rich snippets, Bing and its tiles, and Facebook with its Like button and the magical Open Graph Protocol. The OGP has some fascinating uses. My book CyberOSINT can illuminate some of these uses.
And where is Yahoo in the 2008 to 2010 interval when semantic search was abloom? Patience, grasshopper.
Yahoo was chugging along with its Knowledge Graph. If this does not ring a bell, here’s the illustration used in the deck:
The date is 2013, so Yahoo has been busy since Facebook, Google, and Microsoft were semanticizing their worlds. Yahoo has a process in place. Again from the slide deck:
I was reminded of the diagrams created by other search vendors. These particular diagrams echo the descriptions of the now defunct Siderean Software server setup. But most content processing systems are more alike than different.
Interviews
Cyber Wizards Speak Publishes Exclusive BrightPlanet Interview with William Bushee
Cyber OSINT continues to reshape information access. Traditional keyword search has been supplanted by higher value functions. One of the keystones for systems that push “beyond search” is technology patented and commercialized by BrightPlanet.
A search on Google often returns irrelevant or stale results. How can an organization obtain access to current, in-depth information from Web sites and services not comprehensively indexed by Bing, Google, ISeek, or Yandex?
The answer to the question is to turn to the leader in content harvesting, BrightPlanet. The company was one of the first, if not the first, to develop systems and methods for indexing information ignored by Web indexes which follow links. Founded in 2001, BrightPlanet has emerged as a content processing firm able to make accessible structured and unstructured data ignored, skipped, or not indexed by Bing, Google, and Yandex.
In the BrightPlanet seminar open to law enforcement, intelligence, and security professionals, BrightPlanet said the phrase “Deep Web” is catchy but it does not explain what type of information is available to a person with a Web browser. A familiar example is querying a dynamic database, like an airline for its flight schedule. Other types of “Deep Web” content may require the user to register. Once logged into the system, users can query the content available to a registered user. A service like Bitpipe requires registration and a user name and password each time I want to pull a white paper from the Bitpipe system. BrightPlanet can handle both types of indexing tasks and many more. BrightPlanet’s technology is used by governmental agencies, businesses, and service firms to gather information pertinent to people, places, events, and other topics.
In an exclusive interview, William Bushee, the chief executive officer at BrightPlanet, reveals the origins of the BrightPlanet approach. He told Cyber Wizards Speak:
I developed our initial harvest engine. At the time, little work was being done around harvesting. We filed for a number of US Patents applications for our unique systems and methods. We were awarded eight, primarily around the ability to conduct Deep Web harvesting, a term BrightPlanet coined.
The BrightPlanet system is available as a cloud service. Bushee noted:
We have migrated from an on-site license model to a SaaS [software as a service] model. However, the biggest change came after realizing we could not put our customers in charge of conducting their own harvests. We thought we could build the tools and train the customers, but it just didn’t work well at all. We now harvest content on our customers’ behalf for virtually all projects and it has made a huge difference in data quality. And, as I mentioned, we provide supporting engineering and technical services to our clients as required. Underneath, however, we are the same sharply focused, customer centric, technology operation.
The company also offers data as a service. Bushee explained:
We’ve seen many of our customers use our Data-as-a-Service model to increase revenue and customer share by adding new datasets to their current products and service offerings. These additional datasets develop new revenue streams for our customers and allow them to stay competitive maintaining existing customers and gaining new ones altogether. Our Data-as-a-Service offering saves time and money because our customers no longer have to invest development hours into maintaining data harvesting and collection projects internally. Instead, they can access our harvesting technology completely as a service.
The company has accelerated its growth through a partnering program. Bushee stated:
We have partnered with K2 Intelligence to offer a full end-to-end service to financial institutions, combining our harvest and enrichment services with additional analytic engines and K2’s existing team of analysts. Our product offering will be a service monitoring various Deep Web and Dark Web content enriched with other internal data to provide a complete early warning system for institutions.
BrightPlanet has emerged as an excellent resource to specialized content services. In addition to providing a client-defined collection of information, the firm can provide custom-tailored solutions to special content needs involving the Deep Web and specialized content services. The company has an excellent reputation among law enforcement, intelligence, and security professionals. The BrightPlanet technologies can generate a stream of real-time content to individuals, work groups, or other automated systems.
BrightPlanet has offices in Washington, DC, and can be contacted via the BrightPlanet Web site at www.brightplanet.com.
The complete interview is available at the Cyber Wizards Speak web site at www.xenky.com/brightplanet.
Stephen E Arnold, April 7, 2015
Latest News
Quote to Note: Search and Its Infancy
Navigate to “Moving Search Forward.” Here’s the Marissa Mayer quote which I highlighted: We firmly believe that search is still in its infancy – and this...
France Cooks Boeuf Google Be Gone
I read “French Senate Backs Bid to Force Google to Disclose Search Algorithm Workings.” The Google is going to be Googley. My hunch is that the GOOG...
ttwick Deal Search
At lunch on Friday, one of the 20 somethings who gnaw at me like locusts in an Illinois corn field, I learned about a “revolutionary”, “Google killing,”...
New Age Fortune Teller Reveals the Secret Google in 2015
I enjoy pundits, poobahs, self appointed experts. I had a heck of a good time reading about what one wizard foretells as Google’s search results trajectory in...
The Law of Moore: Is Information Retrieval an Exception?
I read “Moore’s Law Is Dead, Long Live Moore’s Law.” The “law” cooked up by a chip company suggests that in technology stuff gets better, faster, and...
Gartner VP Claims Researching “Ethical Programming” Necessary for Future of Smart Machines
The article on TweakTown titled Gartner: Smart Machines Must Include Ethical Programming Protocols briefly delves into the necessity of developing ethical programming...
Improving the Preservica Preservation Process
Preservica is a leading program for use in digital preservation, consulting, and research, and now it is compatible with Microsoft SharePoint. ECM Connection has...
Quote to Note: Putting LA Sports News in the NY Times
Here’s a keeper for my quotes to note folder. The source is the New York Times, April 16, 2015, page 8 in the business section (where else?). The article has the...
Digital Reasoning Goes Cognitive
A new coat of paint is capturing some tire kickers’ attention. IBM’s Watson is one of the dray horses pulling the cart containing old school indexing functions...
Exorbyte Pivots and Slows Twitter Stream
I was doing a routine check of search vendor Web sites. I noticed that Exorbyte, a search vendor recognized as a Deloitte Technology Fast 50 company in 2010, has...