C. Matuszek, M. Witbrock, R. Kahlert, J. Cabral, D. Schneider, P. Shah and D.B. Lenat. Searching for Common Sense: Populating Cyc from the Web. In Proceedings of the Twentieth National Conference on Artificial Intelligence, Pittsburgh, Pennsylvania, July 2005.
The paper proposes a mechanism for automatically gathering common-sense knowledge for the Cyc knowledge base (KB) from the World Wide Web. Compared with the traditional approach, in which experts enter common-sense knowledge into the Cyc KB by hand, the proposed scheme is more efficient.
Over the last twenty years, ontologists have manually entered more than 3 million facts and rules into the Cyc knowledge base. Continuing to gather knowledge this way clearly demands a great deal of effort from human experts and is not very efficient. Since the World Wide Web has become increasingly popular and holds a large amount of human knowledge, a mechanism that automatically retrieves common-sense knowledge from the Web seems a feasible way to populate the Cyc KB.
The automated gathering process can be described as follows. First, queries that the Cyc KB cannot answer are chosen and turned into search strings using predefined templates. Second, all the search strings are submitted to Google, which returns a large number of search results. Third, the search results are parsed into GAFs (ground atomic formulas), and inference is used to check the new GAFs for consistency with the facts already in the KB; inconsistent ones are discarded. Next, each consistent GAF is converted back into a search string and submitted to Google again, and any GAF that returns no search results is discarded. Finally, the Google-verified GAFs are passed to human experts, who review them for correctness, and the correct facts are asserted into the Cyc KB.
The work is immediately useful as a tool that makes human knowledge entry faster, easier, and more effective, but it is a pity that the queries this mechanism can generate are limited: there are only 233 predefined search templates covering 134 binary predicates. This means the knowledge that can be gathered is restricted to those 134 pre-selected binary predicates. Finally, the question I keep pondering is why the experiment in the paper compares the full (non-sampled) set of verified GAFs with only a sample of the unverified GAFs, since the sampled unverified GAFs could be chosen selectively to make the mechanism's performance look better than it is.
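As a rough illustration of the gathering process summarized in this review, here is a minimal Python sketch of the template, search, parse, consistency-check, and re-search steps. It is not the paper's implementation. Every name in it (GAF, TEMPLATES, VERIFY_TEMPLATES, ARG2_CHECKS, fake_search, gather_candidates) is hypothetical, the "search engine" is a substring match over a tiny in-memory corpus rather than Google, and the consistency check is a toy argument-type test standing in for Cyc's inference. The final human-review step is represented only by the returned review queue.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class GAF:
    """Toy stand-in for a binary ground atomic formula: (predicate arg1 arg2)."""
    predicate: str
    arg1: str
    arg2: str


# Hypothetical search templates, one list per binary predicate.
TEMPLATES = {"foundingDate": ["{arg1} was founded in"]}

# Hypothetical templates for turning a candidate GAF back into a search string.
VERIFY_TEMPLATES = {"foundingDate": "{arg1}, founded in {arg2}"}

# Toy consistency test (assumption): a type check on the second argument,
# standing in for inference against the existing knowledge base.
ARG2_CHECKS = {"foundingDate": str.isdigit}

# Tiny in-memory corpus used instead of a real web search.
FAKE_WEB = [
    "Acme Corp was founded in 1947 by two engineers.",
    "Acme Corp, founded in 1947, makes anvils.",
    "Some say Acme Corp was founded in Ohio.",
    "Acme Corp was founded in 1952 according to one forum post.",
]


def fake_search(query: str) -> List[str]:
    """Return every corpus snippet that contains the query string."""
    return [doc for doc in FAKE_WEB if query in doc]


def extract_candidates(predicate: str, arg1: str, search_string: str,
                       snippets: Iterable[str]) -> List[GAF]:
    """Take the first token after the search string as the candidate arg2."""
    candidates = []
    for snippet in snippets:
        idx = snippet.find(search_string)
        if idx == -1:
            continue
        rest = snippet[idx + len(search_string):].split()
        if rest:
            candidates.append(GAF(predicate, arg1, rest[0].strip(".,;")))
    return candidates


def gather_candidates(queries: Iterable[Tuple[str, str]],
                      search: Callable[[str], List[str]]) -> List[GAF]:
    """Run the search, parse, consistency, and re-search steps.

    Returns the GAFs that would be handed to a human reviewer.
    """
    review_queue: List[GAF] = []
    for predicate, arg1 in queries:
        for template in TEMPLATES.get(predicate, []):
            search_string = template.format(arg1=arg1)
            snippets = search(search_string)
            for gaf in extract_candidates(predicate, arg1, search_string, snippets):
                # Discard candidates that fail the (toy) consistency check.
                if not ARG2_CHECKS[predicate](gaf.arg2):
                    continue
                # Re-render the GAF as a search string; discard it if no hits.
                verify = VERIFY_TEMPLATES[predicate].format(arg1=gaf.arg1, arg2=gaf.arg2)
                if not search(verify):
                    continue
                if gaf not in review_queue:
                    review_queue.append(gaf)
    return review_queue


print(gather_candidates([("foundingDate", "Acme Corp")], fake_search))
```

With this toy corpus, the "Ohio" candidate is dropped by the consistency check and the "1952" candidate is dropped because its verification string returns no results, so only the 1947 candidate reaches the review queue. The sketch also makes the limitation noted above concrete: a predicate with no entry in TEMPLATES yields no search strings at all, so nothing can be gathered for it.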