
TruData: A trust-based method for truth data discovery
YU Zu-kun, XU Jing-nan, ZHENG Xiao-lin, CHEN De-ren
Systems Engineering - Theory & Practice ›› 2013, Vol. 33 ›› Issue (9) : 2404-2414.
The Internet has become an important platform for obtaining information, but not all information on the Internet is true (or correct). On the one hand, many data sources obtain their data from fallible providers or copy it from other data sources. On the other hand, data sources inevitably make mistakes when providing data, so different data sources frequently provide conflicting data. Untrue data can cause losses to users, so it is important to identify the true data. This work proposes an algorithm that computes a trust network of data sources based on the degree of agreement among them. Building on this trust network, it proposes a trust-based method, called "TruData", that discovers true data by computing trust values for data sources and data items. Experiments show that TruData performs well at identifying true data in large data sets.
trust / truth data discovery / conflict data / data mining
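The abstract describes the method only at a high level, so the following is a minimal illustrative sketch of the general idea, not the TruData algorithm itself. It assumes, purely for illustration, that agreement between two sources is the fraction of shared objects on which they give the same value, that source trust is obtained by a PageRank-style iteration over the agreement-weighted network, and that each candidate value is scored by the summed trust of the sources asserting it. The function names, the `damping` parameter, and the input layout are all hypothetical.

```python
from collections import defaultdict


def agreement(claims_a, claims_b):
    """Fraction of shared objects on which two sources give the same value (assumed measure)."""
    shared = claims_a.keys() & claims_b.keys()
    if not shared:
        return 0.0
    return sum(claims_a[k] == claims_b[k] for k in shared) / len(shared)


def source_trust(claims, damping=0.85, iterations=50):
    """Trust value per source from a PageRank-style walk on the agreement network.

    `claims` maps source name -> {object: claimed value}. The damping factor and
    the iteration count are illustrative defaults, not values from the paper.
    """
    sources = sorted(claims)
    n = len(sources)
    if n == 0:
        return {}
    # Edge weight s -> t is the degree of agreement between s and t.
    weights = {
        s: {t: agreement(claims[s], claims[t]) for t in sources if t != s}
        for s in sources
    }
    trust = {s: 1.0 / n for s in sources}
    for _ in range(iterations):
        new = {s: (1.0 - damping) / n for s in sources}
        for s in sources:
            total = sum(weights[s].values())
            if total == 0:
                # A source that agrees with nobody spreads its trust uniformly.
                for t in sources:
                    new[t] += damping * trust[s] / n
            else:
                for t, w in weights[s].items():
                    new[t] += damping * trust[s] * w / total
        trust = new
    return trust


def discover_truth(claims):
    """For each object, pick the value backed by the largest total source trust."""
    trust = source_trust(claims)
    scores = defaultdict(lambda: defaultdict(float))
    for source, items in claims.items():
        for obj, value in items.items():
            scores[obj][value] += trust[source]
    return {obj: max(vals, key=vals.get) for obj, vals in scores.items()}


if __name__ == "__main__":
    # Hypothetical toy input: three sources agree, one disagrees on everything.
    toy_claims = {
        "A": {"x": 1, "y": "p"},
        "B": {"x": 1, "y": "p"},
        "C": {"x": 1, "y": "p"},
        "D": {"x": 2, "y": "q"},
    }
    print(source_trust(toy_claims))
    print(discover_truth(toy_claims))
```

In this toy input, source D agrees with no other source, so it receives no trust through agreement edges and ends up with a lower trust value than A, B, or C. A sketch like this does not detect copying between sources: sources that copy from one another would reinforce each other's trust.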