BD-UCLA (Big Data – UCLA) Seminar

Time: 12:00pm-1:00pm Fridays; Room: 3551P Boelter Hall

*To invite a guest speaker or to schedule a talk, contact Mohan Yang (yang at cs dot ucla dot edu)

Schedule for past years.

Schedule for 2014:

1. Evolving Internet into the Future via Named Data Networking (3 parts)

Speaker: Prof. Lixia Zhang

Time: December 6, January 17, January 31


The success of the TCP/IP protocol architecture has brought an explosive growth of Internet applications. As applications operate in terms of data and more end points become mobile, however, it becomes increasingly difficult and inefficient to satisfy IP's requirement of determining exactly where (at which IP address) to find the desired data. The Named Data Networking (NDN) project aims to carry the Internet into the future through a conceptually simple yet transformational architectural shift: from today's focus on where (addresses and hosts) to what (the data that users and applications care about). By naming data instead of their locations, NDN makes data a first-class entity, enabling direct security of the data itself rather than of data containers, as well as radically scalable communication mechanisms such as multicast delivery and in-network storage.
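The core idea, retrieving data by name rather than by host address, can be sketched in a few lines. This is an illustrative toy, not the NDN wire protocol; the `Node` class, the example name, and the caching step are assumptions made for the sketch.

```python
# Toy sketch of name-based retrieval: a consumer expresses an Interest
# for a name, and ANY node whose content store holds that name can
# answer, decoupling data from the host that produced it.

class Node:
    def __init__(self):
        self.content_store = {}  # name -> data packet (in-network cache)

    def publish(self, name, data):
        self.content_store[name] = data

    def on_interest(self, name):
        # Satisfy the Interest from the local cache, regardless of
        # who originally produced the data; None models a cache miss.
        return self.content_store.get(name)

producer, router = Node(), Node()
producer.publish("/ucla/cs/seminar/slides", b"slide deck bytes")

# Once a router has cached the data, it can satisfy future Interests
# itself; consumers never need the producer's address.
router.content_store.update(producer.content_store)
```

The point of the sketch is that `on_interest` never consults an address: the name alone identifies the data, which is what enables in-network storage and multicast-style delivery.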

2. Principles behind ResearchMaps: a web tool for integrating and planning experiments in neuroscience

Speaker: Prof. Alcino Silva

Time: February 21


The growth of the biological literature in the last 30 years has been astronomical. The National Library of Medicine now includes more than 20 million articles; our own discipline (neuroscience) includes nearly two million research articles with an estimated 15 million experiments, most published in the last 20 years. There is therefore a great need to develop maps (simplified abstractions) of published information that can be used to characterize what is known and to guide research decisions. With that goal in mind, our laboratory developed a web tool (ResearchMaps) to help biologists integrate and plan experiments. I will discuss the principles behind this web tool, including the concept of weighted causal networks, with the hope that we may be able to establish collaborations that could accelerate its development.
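A weighted causal network of the kind mentioned above can be sketched minimally: each published experiment contributes signed evidence for a causal edge, and evidence accumulates per edge. The node names, the evidence records, and the additive scoring scheme below are illustrative assumptions, not the actual ResearchMaps implementation.

```python
from collections import defaultdict

# Hypothetical evidence records: (agent, target, effect, score), where
# effect is +1 (excitatory) or -1 (inhibitory) and score is the
# evidential weight assigned to one experiment.
evidence = [
    ("CREB", "LTP", +1, 0.9),
    ("CREB", "LTP", +1, 0.7),
    ("CREB", "memory", +1, 0.8),
    ("inhibitor-X", "LTP", -1, 0.6),
]

def build_map(records):
    """Aggregate per-edge evidence into a signed cumulative weight.

    A positive weight means the net evidence is excitatory; a negative
    weight means it is net inhibitory.
    """
    edges = defaultdict(float)
    for agent, target, effect, score in records:
        edges[(agent, target)] += effect * score
    return dict(edges)

research_map = build_map(evidence)
```

The design choice illustrated here is that repeated experiments on the same edge strengthen (or weaken) it rather than creating duplicate edges, which is what makes such a map a compact abstraction of a large literature.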

3. Learning Generative Models for Natural Image Patterns

Speaker: Prof. Ying Nian Wu

Time: February 28


Images of natural scenes contain rich varieties of patterns. Knowledge of these image patterns can be represented by statistical models that can generate such patterns. Such generative models can be learned from training images with minimal supervision, and the learned models can be useful for object recognition and scene understanding. In this talk, I shall present our recent work on a class of generative models of object patterns and explain their connections to sparse linear regression and Markov random fields. The talk is based on joint work with Jianwen Xie, Wenze Hu and Song-Chun Zhu.

4. Hypothesis Exploration across Disciplines

Speaker: Prof. Stott Parker

Time: March 7


A consequence of the abundance of data of all forms is that scientific research efforts are increasingly cutting across disciplines. Interdisciplinary research is difficult for many reasons, but among these are the difficulties of analyzing heterogeneous data and the lack of methods for collaborative construction of hypotheses. This is particularly true in fields like neuroscience, where the data is complex and ranges over many orders of magnitude in scale — and no single individual can hope to master it all.

In this talk I describe a system for exploration of hypotheses in phenotype data, implemented with a database obtained from several studies at UCLA. ViVA is a web-based system for analyzing hypotheses about variance structure, permitting exploratory analysis of GLMs. It permits visual identification of phenotype profiles (patterns of values across phenotypes) that characterize groups (subpopulations), and includes a variety of methods for visualization of variance. Visualization supports interdisciplinary collaboration, and enables screening and refinement of hypotheses about sets of phenotypes. With several examples we illustrate how this approach supports “natural selection” on a pool of hypotheses, and permits deeper understanding of the statistical architecture of the data.
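The phenotype profiles described above, patterns of values across phenotypes that characterize subpopulations, can be sketched with a toy computation. The data layout, group labels, and phenotype names below are hypothetical stand-ins, not ViVA's schema; the sketch only shows the shape of the per-group summary that such a tool visualizes.

```python
from statistics import mean, variance

# Hypothetical phenotype table: subject -> (group, {phenotype: value}).
subjects = {
    "s1": ("control", {"memory": 1.2, "inhibition": 0.8}),
    "s2": ("control", {"memory": 1.0, "inhibition": 0.9}),
    "s3": ("case",    {"memory": 0.4, "inhibition": 1.5}),
    "s4": ("case",    {"memory": 0.6, "inhibition": 1.7}),
}

def group_profiles(table):
    """Per-group phenotype profile: (mean, sample variance) per phenotype.

    The variance component is the quantity a variance-structure
    hypothesis would compare across groups.
    """
    by_group = {}
    for group, values in table.values():
        by_group.setdefault(group, {})
        for pheno, v in values.items():
            by_group[group].setdefault(pheno, []).append(v)
    return {
        g: {p: (mean(vs), variance(vs)) for p, vs in phenos.items()}
        for g, phenos in by_group.items()
    }
```

Comparing the resulting profiles across groups is the kind of screening step that visual tools make fast, before a formal GLM is fit to the candidate hypothesis.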

ViVA was designed for investigation of data concerning the biological bases of traits such as memory and response inhibition phenotypes — to explore whether they can aid in moving from traditional categorical approaches for psychiatric syndromes towards more quantitative approaches based on large-scale analysis of the space of human variation. The hypotheses and data are increasingly trans-disciplinary and sophisticated, and the impact of better methods can be enormous.

5. Computer Vision meets Big Data: Complexity and Compositionality

Speaker: Prof. Alan L. Yuille

Time: March 14


Big data arises naturally in computer vision because of the enormous number and variety of images and the large range of visual tasks that we want to perform on them. Computer vision researchers must pay increasing attention to complexity issues as they develop algorithms that work on large image datasets. This talk has two parts. The first part describes practical issues that arise when working with large datasets such as PASCAL and ImageNet, including efficient algorithms, parallel implementations (e.g., GPUs), and special-purpose hardware. The second part describes theoretical work that addresses arguably the fundamental problem of vision: how can a visual system store (represent), rapidly access (do inference on), and learn the enormous number and variety of objects, and configurations of objects, that occur in the world? We propose and analyze a simplified hierarchical compositional model that can address many of these issues and that may relate to the structure of the human visual system.

6. Modeling networks when data is missing or sampled

Speaker: Prof. Mark S. Handcock

Time: April 4


Network models are widely used to represent relational information among interacting units and the structural implications of these relations. Recently, social network studies have focused a great deal of attention on random graph models of networks whose nodes represent individual social actors and whose edges represent a specified relationship between the actors.

Most inference for social network models assumes that the presence or absence of all possible links is observed, that the information is completely reliable, and that there are no measurement (e.g., recording) errors. This is clearly not true in practice, as much network data is collected through sample surveys. In addition, even if a census of a population is attempted, individuals and links between individuals are missed (i.e., do not appear in the recorded data).

In this talk we develop the conceptual and computational theory for inference based on partially observed network information. We first review forms of network sampling designs used in practice. We consider inference within the likelihood framework, and develop a typology of network data that reflects their treatment within this framework. We then develop inference for social network models based on information from adaptive network designs.

We motivate and illustrate these ideas by analyzing the effect of link-tracing sampling designs on a collaboration network, and of missing data in a friendship network among adolescents.
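A link-tracing design of the kind analyzed above can be sketched simply: sampling starts from seed actors and follows observed ties outward for a fixed number of waves, so the observed network depends on the design itself. The graph, seed set, and one-wave/two-wave setup below are illustrative assumptions, not the study data from the talk.

```python
# Minimal sketch of wave-by-wave link-tracing (snowball) sampling on an
# undirected graph given as an adjacency dict.

graph = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2, 5}, 5: {4}}

def link_trace(adj, seeds, waves=1):
    """Sample nodes reachable from the seeds within a number of waves.

    Only ties incident to sampled actors are observed; everything else
    is missing data, which the likelihood-based inference must model.
    """
    sampled = set(seeds)
    frontier = set(seeds)
    for _ in range(waves):
        # Next wave: neighbors of the current frontier not yet sampled.
        frontier = {v for u in frontier for v in adj[u]} - sampled
        sampled |= frontier
    return sampled
```

Because the inclusion of a node depends on the ties of earlier waves, naive statistics computed on the sampled subgraph are biased, which is precisely why design-aware inference is needed.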

This is joint work with Krista J. Gile (University of Massachusetts, Amherst) and Ian Fellows (University of California, Los Angeles).

7. The Declarative Imperative: A logic-based approach to better algorithms

Speaker: Prof. Carlo Zaniolo

Time: April 25

The rise of multicore processors and cloud computing is putting enormous pressure on the software community to find solutions to the difficulty of parallel and distributed programming. At the same time, there is more, and more varied, interest in data-centric programming languages than at any time in computing history, in part because these languages parallelize naturally. This juxtaposition raises the possibility that the theory of declarative database query languages can provide a foundation for the next generation of parallel and distributed programming languages. Our recent Datalog-inspired results show that logic can lead to new, more efficient big-data algorithms over a wide range of computing platforms.
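The declarative style the talk refers to can be illustrated with the textbook Datalog example, transitive closure, evaluated bottom-up. The rules are `tc(X, Y) :- edge(X, Y).` and `tc(X, Z) :- tc(X, Y), edge(Y, Z).`; the sketch below implements their semi-naive fixpoint evaluation in Python on an illustrative edge set (not an example from the talk).

```python
# Semi-naive bottom-up evaluation of Datalog transitive closure:
#   tc(X, Y) :- edge(X, Y).
#   tc(X, Z) :- tc(X, Y), edge(Y, Z).

edge = {(1, 2), (2, 3), (3, 4)}

def transitive_closure(edges):
    """Iterate to a fixpoint, joining only the newly derived facts.

    Joining the delta (new facts) rather than the whole relation each
    round is the semi-naive optimization that makes bottom-up Datalog
    evaluation efficient, and each round is a join that parallelizes
    naturally.
    """
    tc = set(edges)      # base rule: tc(X, Y) :- edge(X, Y).
    delta = set(edges)   # facts derived in the previous round
    while delta:
        # recursive rule: tc(X, Z) :- tc(X, Y), edge(Y, Z).
        new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = new - tc
        tc |= delta
    return tc
```

Note that nothing in the rules says *how* to iterate; the evaluation order and the parallelization strategy are left to the engine, which is the sense in which declarative languages "parallelize naturally."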
