Speaker: Paul Horn, Harvard University
Abstract: Kronecker graphs, originally proposed by Leskovec et al., are a graph model designed to capture the structure of real-world graphs. These graphs have been shown to exhibit a wide variety of the nice properties observed in graph data, and have been successfully fit to many real-world graphs. In this talk, we will look at a specific random variation of the original Kronecker graph and examine the rise of the giant component in this model, obtaining sharp estimates for the size of the component when it exists. An interesting aspect the analysis must overcome is that Kronecker graphs do not expand as nicely and uniformly as Erdős–Rényi random graphs, in analogy with community clustering in real-world networks. A combination of spectral and probabilistic tools is used to establish the result.
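For concreteness, here is a minimal sketch of stochastic Kronecker graph generation, assuming a 2x2 initiator matrix of edge probabilities; the function names and the specific initiator values below are my own illustration, not the speaker's construction:

```python
import random

def kronecker_edge_prob(initiator, k, u, v):
    """Edge probability between vertices u and v in the k-th Kronecker
    power of a 2x2 initiator matrix: multiply one initiator entry per
    bit position of (u, v)."""
    p = 1.0
    for _ in range(k):
        p *= initiator[u & 1][v & 1]
        u >>= 1
        v >>= 1
    return p

def sample_kronecker_graph(initiator, k, rng):
    """Sample each potential (undirected, loop-free) edge independently
    with its Kronecker probability; n = 2^k vertices."""
    n = 2 ** k
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if rng.random() < kronecker_edge_prob(initiator, k, u, v)]
    return n, edges

# Example: a commonly used "core-periphery" style initiator.
n, edges = sample_kronecker_graph([[0.9, 0.5], [0.5, 0.2]], 6, random.Random(0))
```

Because the initiator entries differ, edge density varies sharply across vertex pairs, which is exactly the non-uniform expansion the abstract alludes to.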
Speaker: Jeffry Gaston
Abstract: Traditional supervised learning trains a classifier on a static set of fully-labeled data. However, labels can be costly to obtain. Active learning often achieves lower misclassification rates for an equivalent number of labeled datapoints by allowing the learner to choose particularly important points for labeling. Standard paradigms for choosing points to label include requesting points near the decision boundary, or points where uncertainty is maximal. We show how to directly optimize the target criterion by estimating error reduction in a probabilistic framework (the Gaussian mixture model), and show that in several cases this can require significantly fewer labeled training examples than the maximum-uncertainty approach.
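To make the selection criterion concrete, here is a toy sketch of expected-error-reduction active learning in a one-dimensional, two-class mixture of unit-variance Gaussians with equal priors. This is my own simplification for illustration, not the talk's actual framework, and all names are hypothetical:

```python
import math

def gauss(x, mu, var=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(x, means):
    """P(y | x) under two unit-variance Gaussians with equal priors."""
    likes = [gauss(x, m) for m in means]
    z = sum(likes)
    return [l / z for l in likes]

def fit_means(labeled):
    """Maximum-likelihood class means from the labeled set."""
    return [sum(x for x, lab in labeled if lab == y) /
            len([1 for _, lab in labeled if lab == y]) for y in (0, 1)]

def expected_pool_error(means, pool):
    """Expected misclassification over the pool: sum of 1 - max_y P(y|x)."""
    return sum(1.0 - max(posterior(x, means)) for x in pool)

def pick_query(labeled, pool):
    """Expected-error-reduction selection: choose the pool point whose
    label (marginalized over the current posterior) yields the retrained
    model with the lowest expected pool error."""
    cur_means = fit_means(labeled)
    best_x, best_err = None, float("inf")
    for x in pool:
        post = posterior(x, cur_means)
        err = 0.0
        for y in (0, 1):
            new_means = fit_means(labeled + [(x, y)])
            err += post[y] * expected_pool_error(new_means,
                                                [p for p in pool if p != x])
        if err < best_err:
            best_x, best_err = x, err
    return best_x
```

Note the contrast with uncertainty sampling, which would simply pick the x minimizing max_y P(y|x); here the criterion is the downstream error itself.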
Speaker: Mithun Chakraborty
Abstract: Prediction markets – betting platforms designed for aggregating information about some uncertain future event from a population of informants (traders) into a quantitative forecast – have gone from novelties to serious platforms that can be useful for policy and decision-making by scientists, the government, the military, etc. One of the key challenges in running a prediction market is providing liquidity, which, roughly speaking, refers to the open interest or ongoing activity level in the market. Traditionally this problem has been tackled by using an algorithmic agent called a market maker: an intermediary that is always willing to take one side of a trade at prices it dictates. The talk, which is based on my review of the relevant literature, is organized in three parts. First, I will discuss the state of the art in prediction markets: inventory-based or cost-function-based market making. This is exemplified by Hanson's Logarithmic Market Scoring Rule (LMSR), which is by far the most widely used prediction market mechanism in practice. I will also touch upon an interesting real-life application (the Gates Hillman Prediction Market) and other market makers influenced by LMSR. Next, I will present some important results on the operational and informational aspects of speculative markets (which subsume prediction markets) and discuss a novel information-based market making paradigm that builds on these concepts: Bayesian Market Making. Finally, I will point out how market making can be thought of as a problem of online learning from censored noisy data, and will describe a few similar problems from diverse fields such as medicine, finance, and computer vision, since this line of research has the potential to influence and be influenced by the area of information-based market making.
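For reference, Hanson's LMSR can be sketched in a few lines: the market maker maintains an outstanding-share vector q, charges C(q) = b * ln(sum_i exp(q_i / b)) as its cost function, prices a trade from q to q + delta at C(q + delta) - C(q), and quotes instantaneous prices that form a probability distribution over outcomes. The liquidity parameter b = 100 below is an arbitrary choice for illustration:

```python
import math

def lmsr_cost(q, b=100.0):
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b))."""
    return b * math.log(sum(math.exp(qi / b) for qi in q))

def lmsr_price(q, i, b=100.0):
    """Instantaneous price of outcome i: exp(q_i/b) / sum_j exp(q_j/b).
    Prices are positive and sum to 1, so they read as probabilities."""
    z = sum(math.exp(qj / b) for qj in q)
    return math.exp(q[i] / b) / z

def trade_cost(q, delta, b=100.0):
    """A trader pays C(q + delta) - C(q) to buy the share bundle delta."""
    q_new = [qi + di for qi, di in zip(q, delta)]
    return lmsr_cost(q_new, b) - lmsr_cost(q, b)

# Buying 20 shares of outcome 0 in a fresh two-outcome market:
q = [0.0, 0.0]
cost = trade_cost(q, [20.0, 0.0])   # a bit more than 20 * 0.5, since the
                                    # price rises as the trader buys
```

The larger b is, the less prices move per share traded, which is precisely how the mechanism subsidizes liquidity.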
Subset Selection for Linear Regression
Abstract: One of the central problems in many data-driven sciences is the right selection of attributes to observe in order to accurately predict a variable of interest. Applications abound in areas as diverse as medical sciences (predicting medical conditions based on observable attributes), social sciences (predicting future behaviors or outcomes) and sensor networks, among others.
In many of these cases, time or cost constraints prohibit sampling more than a few attributes. Also, many application domains use linear regression as a method of prediction, and evaluate the quality of attributes in terms of the fit with the quantity to be predicted. This motivates the following formal problem definition: “Given the covariances between observable variables X_1, …, X_n and a target variable Z, select k of the variables such that the selected set has the best possible fit with Z.”
The main result presented in this talk is that so long as the covariance matrix between the variables is far from singular, greedy algorithms frequently used in practice are provably constant-factor approximations. The proof is based on extending the widely used concept of submodularity to a notion of approximate submodularity, and relating it to the spectrum of the covariance matrix. Furthermore, we will investigate various graph-theoretical properties of covariance matrices which allow for efficient exact or approximate algorithms. We conclude with several exciting open questions.
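The greedy forward-selection heuristic analyzed in the talk can be sketched directly from covariances: for a candidate set S, the explained fraction of Var(Z) is b_S^T C_S^{-1} b_S / Var(Z), where C_S is the covariance matrix of the selected variables and b_S their covariances with Z. The names and the small elimination helper below are mine, assuming this standard formulation:

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def r_squared(C, b, var_z, S):
    """Fraction of Var(Z) explained by the variables in S, computed
    only from covariances: b_S^T C_S^{-1} b_S / Var(Z)."""
    if not S:
        return 0.0
    CS = [[C[i][j] for j in S] for i in S]
    bS = [b[i] for i in S]
    w = solve(CS, bS)
    return sum(wi * bi for wi, bi in zip(w, bS)) / var_z

def greedy_select(C, b, var_z, k):
    """Forward greedy: repeatedly add the variable with the largest
    marginal gain in explained variance."""
    S = []
    for _ in range(k):
        rest = [i for i in range(len(b)) if i not in S]
        S.append(max(rest, key=lambda i: r_squared(C, b, var_z, S + [i])))
    return S
```

When C is far from singular, the talk's result says this greedy procedure is a constant-factor approximation; nothing in the code requires C to be diagonal.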
The Price of Civil Society
Abstract: Most work in algorithmic game theory assumes that players ignore the costs incurred by their fellow players. In this talk, we'll consider superimposing a social network over a game, where players are concerned with minimizing not only their own costs, but also the costs of their neighbors in the network. Our goal is to understand how properties of the underlying game are affected by this alteration to the standard model. In particular, we'll consider the ratio of the social cost of the worst Nash equilibrium when each player cares about both herself and her friends to that of the worst Nash equilibrium under standard selfish play. We initiate this study in the context of a simple class of games. Counterintuitively, we show that when players become less selfish (optimizing over both themselves and their friends), the resulting outcomes can in fact be worse than they would have been in the base game. We give tight bounds on this degradation in a simple class of load-balancing games, over arbitrary social networks, and present some extensions.
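To illustrate the friend-aware cost model, here is a small sketch of best-response dynamics in a load-balancing game where each player's perceived cost is her own machine's load plus the loads on her friends' machines. The specific cost form, the iteration cap, and the names are my own simplification, not necessarily the model from the talk:

```python
def loads(assign, jobs, m):
    """Total job weight on each of the m machines."""
    L = [0.0] * m
    for p, mach in enumerate(assign):
        L[mach] += jobs[p]
    return L

def perceived_cost(p, assign, jobs, m, friends):
    """Player p's cost: own machine's load plus her friends' loads."""
    L = loads(assign, jobs, m)
    return L[assign[p]] + sum(L[assign[q]] for q in friends[p])

def best_response_dynamics(jobs, m, friends, rounds=100):
    """Let players sequentially move to the machine minimizing their
    perceived cost; stop at a fixed point (a Nash equilibrium of the
    friend-aware game) or after a fixed number of rounds."""
    n = len(jobs)
    assign = [0] * n
    for _ in range(rounds):
        moved = False
        for p in range(n):
            cur, costs = assign[p], {}
            for mach in range(m):
                assign[p] = mach
                costs[mach] = perceived_cost(p, assign, jobs, m, friends)
            best = min(costs, key=costs.get)
            assign[p] = best if costs[best] < costs[cur] else cur
            moved = moved or assign[p] != cur
        if not moved:
            break
    return assign
```

With empty friend lists this reduces to the standard selfish game; comparing the social cost (e.g. the maximum load) of the equilibria reached under the two cost functions gives exactly the kind of ratio the talk studies.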
An FPTAS for the Weighted Optimal Length Resolution Refutations of Difference Constraint Systems
Abstract: We study the problem of determining the weighted optimal length resolution refutation (WOLRR) of a system of difference constraints over an integral domain using the Fourier-Motzkin (FM) elimination procedure. Although FM elimination cannot be applied to integer programs in general, it is a sound and complete procedure for difference constraint systems (DCS), since the constraint matrix of a DCS is totally unimodular. It is already established in the literature that every DCS has a short refutation; however, computing the WOLRR is NP-complete. In this work, we present both a pseudo-polynomial time algorithm and an efficient fully polynomial-time approximation scheme (FPTAS) for computing the WOLRR of a given weighted difference constraint system (WDCS). We also present some interesting special cases where the WOLRR of a DCS can be computed in polynomial time. For an unsatisfiable WDCS, the WOLRR is related to the Minimum Weight Unsatisfiable Subset (MWUS) of the WDCS. Our results directly imply the existence of a pseudo-polynomial time algorithm as well as an FPTAS for the MWUS of a given WDCS. Our work has applications in various areas of research, including program verification, proof theory and operations research.
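For background, feasibility of a difference constraint system reduces to negative-cycle detection: each constraint x_j - x_i <= c becomes an edge i -> j of weight c in the constraint graph, and the system is unsatisfiable exactly when that graph has a negative cycle (whose constraints sum to a contradiction, i.e. a refutation). A minimal Bellman-Ford check, with names of my choosing:

```python
def dcs_satisfiable(n, constraints):
    """Feasibility of a difference constraint system on x_0..x_{n-1}.
    Each constraint (i, j, c) encodes x_j - x_i <= c.  Starting all
    distances at 0 simulates a virtual source with 0-weight edges to
    every node; n - 1 relaxation rounds then suffice."""
    dist = [0.0] * n
    for _ in range(n - 1):
        for i, j, c in constraints:
            if dist[i] + c < dist[j]:
                dist[j] = dist[i] + c
    for i, j, c in constraints:
        if dist[i] + c < dist[j]:
            return False, None   # negative cycle: a refutation exists
    return True, dist            # dist is a satisfying assignment
```

The hard problem the talk addresses starts where this check ends: among the (possibly many) negative cycles of an unsatisfiable weighted system, find the refutation of minimum total weight.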
Strategic Pricing in Next-Hop Routing with Elastic Demands
Abstract: We consider a model of next-hop routing by self-interested agents. In this model, nodes in a graph (representing ISPs, Autonomous Systems, etc.) make pricing decisions of how much to charge for forwarding traffic from each of their upstream neighbors, and routing decisions of which downstream neighbors to forward traffic to (i.e., choosing the next hop). Traffic originates at a subset of these nodes that derive a utility when the traffic is routed to its destination node; the traffic demand is elastic, and the utility derived from it can differ across source nodes. Our next-hop routing and pricing model is in sharp contrast with the more common source routing and pricing models, in which the source of traffic determines the entire route from source to destination. For our model, we begin by showing sufficient conditions for prices to result in a Nash equilibrium, and in fact give an efficient algorithm to compute a Nash equilibrium that is as good as the centralized optimum, thus proving that the price of stability is 1. When only a single source node exists, the price of anarchy is 1 as well, as long as some minor assumptions on player behavior are made. The above results hold for arbitrary convex pricing functions, under the assumption that the utilities derived from getting traffic to its destination are linear. When utilities can be non-linear functions, we show that a Nash equilibrium may not exist, even with simple discrete pricing models.