Publications in 2014

Bin Bi, Yuanyuan Tian, Yannis Sismanis, Andrey Balmin, Junghoo Cho, Scalable Topic-Specific Influence Analysis on Microblogs, In Proceedings of the International ACM WSDM Conference (WSDM), February 2014.

Eric Yi Liu, Andrew P Morgan, Elissa J Chesler, Wei Wang, Gary A Churchill, Fernando Pardo-Manuel de Villena, Starting at the ends: high-resolution sex-specific linkage maps of the mouse reveal polarized distribution of recombination in male germline, Genetics, 2014.

Wei Wang and Zhenyuan Wang, Total Orderings Defined on the Set of All Fuzzy Numbers, Fuzzy Sets and Systems, 2014.

Jason Phillippi, Yuying Xie, Darla R Miller, Timothy A Bell, Zhaojun Zhang, Alan B Lenarcic, David L Aylor, S Harsha Krovi, David W Threadgill, Fernando Pardo-Manuel de Villena, Wei Wang, William Valdar, and Jeffrey A Frelinger, Using the Emerging Collaborative Cross to Probe the Immune System, Genes and Immunology, 2014.

Wei Cheng, Xiaoming Jin, Jian-Tao Sun, Xuemin Lin, Xiang Zhang, and Wei Wang, Searching Dimension Incomplete Databases, IEEE Transactions on Knowledge and Data Engineering (TKDE), 2014.

 

Publications in 2010

Michael J. Welch, Junghoo Cho, Walter Chang, Generating Advertising Keywords from Video Content, In Proceedings of the 19th International Conference on Information and Knowledge Management (CIKM), October 2010.

Jun-Seok Heo, Junghoo Cho, Kyu-Young Whang, The Hybrid-Layer Index: A Synergic Approach to Answering Top-k Queries in Arbitrary Subspaces, In Proceedings of the 26th IEEE International Conference on Data Engineering (ICDE), March 2010.

Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy, and Russell Sears. “MapReduce Online.” In Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2010.

Wang, Jeremy, Fernando Pardo-Manuel de Villena, Wei Wang, and Leonard McMillan, Genome-wide compatible SNP intervals and their properties, Proceedings of the ACM International Conference on Bioinformatics and Computational Biology (ACMBCB), pp. 43-52, 2010.

Pakatci, Isa, Wei Wang, and Leonard McMillan, Gene set analysis using principal components, Proceedings of the ACM International Conference on Bioinformatics and Computational Biology (ACMBCB), pp. 330-333, 2010.

Eric Yi Liu, Qi Zhang, Leonard McMillan, Fernando Pardo-Manuel de Villena, and Wei Wang, Efficient genome ancestry inference in complex pedigrees with inbreeding, Proceedings of the 18th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB), Special Issue of Bioinformatics, vol. 26, no. 12, pp. 199-207, 2010.

Xiang Zhang, Shunping Huang, Fei Zou, and Wei Wang, TEAM: Efficient two-locus epistasis tests in human genome-wide association study, Proceedings of the 18th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB), Special Issue of Bioinformatics, vol. 26, no. 12, pp. 217-227, 2010.

Ning Jin, Calvin Young, and Wei Wang, GAIA: Graph classification using evolutionary computation, Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 879-890, 2010.

Xiang Zhang, Feng Pan, Yuying Xie, Fei Zou, and Wei Wang, COE: a general approach for efficient genome-wide two-locus epistasis test in disease association study, Journal of Computational Biology (JCB), vol. 17, no. 3, pp. 401-415, 2010.

Carlo A. Curino, Hyun J. Moon, Alin Deutsch and Carlo Zaniolo, Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++. PVLDB 4(2): 117-128 (2010).

Barzan Mozafari, Kai Zeng, Carlo Zaniolo, From Regular Expressions to Nested Words: Unifying Languages and Query Execution for Relational and XML Sequences. PVLDB 3(1): 150-161 (2010).

Hyun Jin Moon, Carlo Curino and Carlo Zaniolo, Scalable Architecture and Query Optimization for Transaction-time DBs with Evolving Schemas. SIGMOD Conference Indianapolis, Indiana, June 6-11, 2010: 207-218.

Arnold C.W., El-Saden S.M., Bui A.A., Taira R., “Clinical Case-based Retrieval Using Latent Topic Analysis,” AMIA Annu Symp Proc. 2010 Nov 13;2010:26-30.

Publications in 2011

Michael Welch, Junghoo Cho, Christopher Olston, Search Result Diversity for Informational Queries, In Proceedings of the 20th International World Wide Web Conference (WWW), March 2011.

Michael J. Welch, Uri Schonfeld, Dan He, Junghoo Cho, Topical Semantics of Twitter Links, In Proceedings of the 4th ACM International Conference on Web Search and Data Mining (WSDM), February 2011.

Tuan M. V. Le, Tru H. Cao, Son M. Hoang, Junghoo Cho, Ontology-based proximity search, In Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services (iiWAS), December 2011.

Markus Weimer, Tyson Condie and Raghu Ramakrishnan. Machine learning in a higher order cloud computing language. In BigLearn workshop on parallel and large-scale machine learning (NIPS), 2011.

Niketan Pansare, Vinayak R. Borkar, Chris Jermaine, Tyson Condie. Online Aggregation for Large MapReduce Jobs. In International Conference on Very Large Data Bases (VLDB), 2011.

Wei Cheng, Xiaochuan Ni, Jian-Tao Sun, Xiaoming Jin, Hye-Chung Kum, Xiang Zhang, and Wei Wang, Measuring opinion relevance in latent topic space, Proceedings of the IEEE International Conference on Social Computing (SocialCom), pp. 323-330, 2011.

Summer G Goodson, Zhaojun Zhang, James K Tsuruta, Wei Wang, and Deborah A O’Brien, Classification of mouse sperm motility patterns using an automated multiclass support vector machines model, Biology of Reproduction, vol. 84, no. 6, pp. 1207-1215, 2011.

Eric Yi Liu, Zhaojun Zhang, and Wei Wang, Clustering with relative constraints, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 947-955, 2011.

Aylor DL, Valdar W, Foulds-Mathes W, Buus RJ, Verdugo RA, Baric RS, Ferris MT, Frelinger JA, Heise M, Frieman MB, Gralinski LE, Bell TA, Didion JD, Hua K, Nehrenberg DL, Powell CL, Steigerwalt J, Xie Y, Kelada SN, Collins FS, Yang IV, Schwartz DA, Branstetter LA, Chesler EJ, Miller DR, Spence J, Liu EY, McMillan L, Sarkar A, Wang J, Wang W, Zhang Q, Broman KW, Korstanje R, Durrant C, Mott R, Iraqi FA, Pomp D, Threadgill D, Pardo-Manuel de Villena F, Churchill GA. Genetic analysis of complex traits in the emerging collaborative cross, Genome Research, vol. 21, pp. 1213-1222, 2011.

Ning Jin and Wei Wang, LTS: Discriminative subgraph mining by learning from search history, Proceedings of the 27th IEEE International Conference on Data Engineering (ICDE), pp. 207-218, 2011.

Xiang Zhang, Shunping Huang, Fei Zou, and Wei Wang, Tools for efficient epistasis detection in genome-wide association study, Source Code for Biology and Medicine, vol. 6, no. 1, pp. 1-3, 2011.

Yan-Nei Law, Haixun Wang, and Carlo Zaniolo, Relational Languages and Data Models for Continuous Queries on Sequences and Data Streams. ACM Trans. Datab. Syst. 36, 2, Article 8 (May 2011).

Hamid Mousavi, Carlo Zaniolo, Fast and Accurate Computation of Equi-Depth Histograms over Data Streams. EDBT 2011: 69-80.

Hetal Thakkar, Nikolay Laptev, Hamid Mousavi, Barzan Mozafari, Vincenzo Russo, Carlo Zaniolo, SMM: A data stream management system for Knowledge Discovery. ICDE 2011: 757-768.

Singleton K.W., Lan M., Arnold C., Vahidi M., Arangua L., Gelberg L., Bui A.A., “Wireless Data Collection of Self-administered Surveys using Tablet Computers,” AMIA Annu Symp Proc. 2011;2011:1261-9. Epub 2011 Oct 22.

Publications in 2012

Youngchul Cha, Junghoo Cho, Social-network analysis using topic models. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), August 2012.

Chu-Cheng Hsieh, Junghoo Cho, Finding similar items by leveraging social tag clouds. In Proceedings of the ACM Symposium on Applied Computing (SAC), March 2012.

Yingyi Bu, Vinayak Borkar, Michael J. Carey, Joshua Rosen, Neoklis Polyzotis, Tyson Condie, Markus Weimer and Raghu Ramakrishnan. “Scaling Datalog for Machine Learning on Big Data.” Tech Report (arXiv:1203.0160), 2012.

Xiang Zhang, Shunping Huang, Zhaojun Zhang, Wei Wang, Mining Genome-Wide Genetic Markers, PLOS Computational Biology, vol. 8, no. 12, e1002828, 2012.

Eric Yi Liu, Zhishan Guo, Xiang Zhang, Vladimir Jojic, and Wei Wang, Metric learning from relative comparisons by minimizing squared residual, Proceedings of the 12th IEEE International Conference on Data Mining (ICDM), pp. 978-983, 2012.

Wei Cheng, Xiang Zhang, Feng Pan, and Wei Wang, Hierarchical co-clustering based on entropy splitting, Proceedings of the 21st ACM Conference on Information and Knowledge Management (CIKM), pp. 1472-1476, 2012.

Wei Cheng, Xiang Zhang, Wei Wang, Yubao Wu, Xiaolin Yin, Jing Li and David Heckerman, Inferring novel associations between SNP sets and gene sets in eQTL study using sparse graphical model, Proceedings of the ACM International Conference on Bioinformatics and Computational Biology (ACMBCB), pp. 466-472, 2012.

Xiang Zhang, Wei Cheng, Jennifer Listgarten, Carl Kadie, Shunping Huang, Wei Wang, and David Heckerman, Learning transcriptional regulatory relationships using sparse graphical models, PLoS ONE, vol. 7, no. 5, e35762, 2012.

Eric Yi Liu, Steven Buyske, Aaron K. Aragaki, Ulrike Peters, Eric Boerwinkle, Chris Carlson, Cara Carty, Dana C. Crawford, Jeff Haessler, Lucia A. Hindorff, Loic Le Marchand, Teri A. Manolio, Tara Matise, Wei Wang, Charles Kooperberg, Kari E. North, and Yun Li, Genotype imputation of Metabochip SNPs using a study-specific reference panel of ~4,000 haplotypes in African Americans from the Women's Health Initiative, Genetic Epidemiology, vol. 36, no. 2, pp. 107-117, 2012.

Xiang Zhang, Shunping Huang, Wei Sun, and Wei Wang, Rapid and robust resampling-based multiple testing correction with application in genome-wide eQTL study, Genetics, vol. 190, no. 4, pp. 1511-1520, 2012.

James J Crowley, Yunjung Kim, Jin Peng Szatkiewicz, Amanda L Pratt, Corey R Quackenbush, Daniel E Adkins, Edwin van den Oord, Molly A Bogue, Hyuna Yang, Wei Wang, David W Threadgill, Fernando Pardo-Manuel de Villena, Howard L McLeod, and Patrick F Sullivan, Genome-wide association mapping of loci for antipsychotic-induced extrapyramidal symptoms in mice, Mammalian Genome, vol. 23, no. 5-6, pp. 322-335, 2012.

Mingsheng Long, Jianmin Wang, Guiguang Ding, Wei Cheng, Xiang Zhang, and Wei Wang, Dual transfer learning, Proceedings of the 12th SIAM International Conference on Data Mining (SDM), pp. 540-551, 2012.

Kai Xia, Andrey A Shabalin, Shunping Huang, Vered Madar, Yi-Hui Zhou, Wei Wang, Fei Zou, Wei Sun, Patrick F Sullivan, Fred A Wright, seeQTL: a searchable database for human eQTLs, Bioinformatics, vol. 28, no. 3, pp. 451-452, 2012.

Collaborative Cross Consortium, The genome architecture of the Collaborative Cross mouse genetic reference population, Genetics, vol. 190, no. 2, pp. 389-401, 2012.

Zhaojun Zhang, Xiang Zhang, and Wei Wang, HTreeQA: using semi-perfect phylogeny trees in quantitative trait loci study on genotype data, G3: Genes, Genomes, Genetics, vol. 2, no. 2, pp. 175-189, 2012.

Mirjana Mazuran, Edoardo Serra, Carlo Zaniolo, Extending the Power of Datalog Recursion. VLDB Journal, accepted November 2012.

Nikolay Laptev, Carlo Zaniolo and Tsai-Ching Lu, BOOT-TS: A Scalable Bootstrap for Massive Time-Series Data. Big Learning: NIPS 2012 Workshop. December 8, Lake Tahoe, Nevada, USA.

Nikolay Laptev, Kai Zeng and Carlo Zaniolo, Early Accurate Results for Advanced Analytics on MapReduce. PVLDB 5(10): 1028-1039 (2012).

Shi Gao, Carlo Zaniolo, Supporting Database Provenance under Schema Evolution. ER Workshops 2012: 67-77.

Carlo Zaniolo, Logical Foundations of Continuous Query Languages for Data Streams. Datalog 2012: 177-189.

Shi Gao and Carlo Zaniolo, Provenance Management in Databases Under Schema Evolution. 4th USENIX Workshop on the Theory and Practice of Provenance (TaPP), June 14-15, Boston, MA.

Braverman, Rafail Ostrovsky, Carlo Zaniolo, Optimal Sampling From Sliding Windows. J. Comput. Syst. Sci., 78(1): 260-272 (2012).

Maurizio Atzori, Carlo Zaniolo, SWiPE: searching wikipedia by example. WWW (Companion Volume) 2012: 309-312.

Barzan Mozafari, Kai Zeng, Carlo Zaniolo, High-performance complex event processing over XML streams. SIGMOD Conference 2012: 253-264.

Nikolay Laptev, Carlo Zaniolo, Optimization of Massive Pattern Queries by Dynamic Configuration Morphing. ICDE 2012: 917-928.

Publications in 2013

Jun-Seok Heo, Junghoo Cho, Kyu-Young Whang, Subspace top-k query processing using the hybrid-layer index with a tight bound. Data and Knowledge Engineering, 83: 1-19 (2013).

Youngchul Cha, Bin Bi, Chu-Cheng Hsieh, Junghoo Cho, Incorporating Popularity in Topic Models for Social Network Analysis, In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), July 2013.

Wei Cheng, Wei Wang, and Sandra Batista, Grid-based Clustering, Chapter 6 in Data Clustering: Algorithms and Applications, edited by Charu C. Aggarwal and Chandan K. Reddy, CRC Press, 2013.

Zhaojun Zhang, Shunping Huang, Jack Wang, Xiang Zhang, Fernando Pardo-Manuel de Villena, Leonard McMillan, and Wei Wang, GeneScissors: a comprehensive approach to detecting and correcting spurious transcriptome inference due to RNAseq reads misalignment, Proceedings of the 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB), Special Issue of Bioinformatics, 2013.

Wei Cheng, Xiaoming Jin, Jian-Tao Sun, Xuemin Lin, Xiang Zhang, and Wei Wang, Searching Dimension Incomplete Databases, IEEE Transactions on Knowledge and Data Engineering (TKDE), 2013.

Jin Szatkiewicz, Weibo Wang, Patrick Sullivan, Wei Wang, and Wei Sun, Improving detection of copy number variation by simultaneous bias correction and read-depth segmentation, Nucleic Acids Research, vol. 41, no. 3, pp. 1519-1532, 2013.

Eric Yi Liu, Mingyao Li, Wei Wang, and Yun Li, MaCH-Admix: genotype imputation for admixed populations, Genetic Epidemiology, vol. 37, no. 1, pp. 25-37, 2013.

Alexander Shkapsky, Kai Zeng, Carlo Zaniolo, Graph Queries in a Next-Generation Datalog System. PVLDB 6(12): 1258-1261 (2013).

Hamid Mousavi, Shi Gao, Carlo Zaniolo, IBminer: A Text Mining Tool for Constructing and Populating InfoBox Databases and Knowledge Bases. PVLDB 6(12): 1330-1333 (2013).

Mirjana Mazuran, Edoardo Serra, Carlo Zaniolo, A declarative extension of horn clauses, and its significance for datalog and its applications. TPLP 13(4-5): 609-623 (2013).

Carlo Curino, Hyun Jin Moon, Alin Deutsch, Carlo Zaniolo, Automating the database schema evolution process. VLDB J. 22(1): 73-98 (2013).

Mirjana Mazuran, Edoardo Serra, Carlo Zaniolo, Extending the power of Datalog recursion. VLDB J. 22(4): 471-493 (2013).

Nikolay Laptev, Kai Zeng, Carlo Zaniolo, Very fast estimation for result and accuracy of big data analytics: The EARL system. ICDE 2013: 1296-1299.

Kai Zeng, Mohan Yang, Barzan Mozafari, Carlo Zaniolo, Complex pattern matching in complex structures: The XSeq approach. ICDE 2013: 1328-1331.

Elio Masciari, Shi Gao, Carlo Zaniolo, Sequential pattern mining from trajectory data. IDEAS 2013: 162-167.

Elio Masciari, Giuseppe Mazzeo and Carlo Zaniolo, A New, Fast and Accurate Algorithm for Hierarchical Clustering on Euclidean Distances. PAKDD (2) 2013: 111-122.

Hamid Mousavi, Carlo Zaniolo, Fast computation of approximate biased histograms on sliding windows over Data Streams. SSDBM 2013: 13 (Best Paper Award).

 

Welcome to the Scalable Analytics Institute!

The vast volume of data produced every day is creating major transformative opportunities in science and industry. The bottleneck limiting progress in science and industrial productivity has shifted from generating massive data sets to interpreting and exploiting them with the sophisticated analytical applications needed to extract actionable knowledge. High-performance scalable analytics are needed to tame the ever-growing volume of data and application complexity. In 2013, the UCLA Henry Samueli School of Engineering and Applied Science (SEAS) launched the Scalable Analytics Institute (ScAI) to address the research challenges and opportunities in the new technology area of Big Data. The institute currently has more than 20 Ph.D. students and research scientists.