The IEEE provisional copyright policies require the following message be posted:
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Disclaimer

Working Papers


Selected Journal Publications:


  1. (2009) Ali Civril, Malik Magdon-Ismail
    "On Selecting a Maximum Volume Sub-Matrix of a Matrix and Related Problems",
    Theoretical Computer Science,
    Journal Version: postscript, pdf.
    Summary: We study the problem of selecting a set of columns of a matrix with maximum volume. This is useful in trying to construct representative columns of high "numerical rank", which may be of use in matrix reconstruction. We study the algorithmic problem, showing that it is NP-hard and inapproximable. We then study a natural greedy algorithm for this problem, giving a worst case approximation guarantee, also demonstrating a problem which almost achieves this worst case. The performance of greedy is considerably worse than the inapproximability result we prove, suggesting avenues for further work in strengthening the inapproximability result or finding better algorithms.


  2. (2009) Costas Busch, Malik Magdon-Ismail
    "Atomic Routing Games on Maximum Congestion",
    Theoretical Computer Science, (In Press)
    Journal Version: postscript, pdf. (link to journal web version)
    Conference Version: AAIM 2006, ps, pdf.
    Summary: We study atomic routing games on networks in which players choose a path with the objective of minimizing the maximum congestion along the edges of their path. The social cost is the global maximum congestion over all edges in the network. We show that the price of stability is 1. The price of anarchy, PoA, is determined by topological properties of the network. In particular, PoA=O(L+logn), where L is the length of the longest path in the player strategy sets, and n is the size of the network. Further, K-1<=PoA<=c(K^2+log^2 n), where K is the length of the longest cycle in the network, and c is a constant.


  3. (2009) Hung-Ching (Justin) Chen, Mark Goldberg, Malik Magdon-Ismail, William Wallace
    "Reverse Engineering an Agent-Based Hidden Markov Model for Complex Social Systems",
    International Journal of Neural Systems,
    postscript, pdf.
    Summary: We give heuristics for learning the parameters of a HMM describing dynamics of social groups in social networks. The input data are communications for the social network. The HMM is an agent based micro-law model which is highly interdependent. We apply our methodology to real data from blogs, newsgroups.


  4. (2008) Jeffery Baumes, Hung-Ching (Justin) Chen, Matthew Francisco, Mark Goldberg, Malik Magdon-Ismail, William Wallace
    "ViSAGE: A Virtual Laboratory for Simulation and Analysis of Social Group Evolution",
    ACM Transactions on Autonomous and Adaptive Systems (TAAS),
    postscript, pdf.
    Summary: We give a parameterised generative HMM model for social group evolution based on social capital theory, and demonstrate its potential for modeling of social groups in social networks. In particular, one may observe certain abrupt changes in the behavior of macroscopic properties such as the group size distribution as one continuously changes some of the parameters. Within this model we can reproduce many observed phenemona within the social science literature.


  5. (2008) Nathan Cole, Heidi Joe Newberg, Malik Magdon-Ismail, Travis Desell, Kristopher Dawsey, Warren Hayashi, Xinyang (Fred) Liu, Jonathan Purnell, Boleslaw Szymanski, Carlos Varela, James Wisniewski,
    "Maximum Likelihood Fitting of Tidal Streams with application to the Sagittarius Dwarf Tidal Tails",
    the Astrophysical Journal, to appear (2008).
    Journal Version: postscript, pdf.
    Conference Versions:
    eScience 2007 (pdf) . ISMIS 2005 (pdf).
    Summary: We give a maximum likelihood algorithm to identify streams in from SDSS like data. In particular, we use a cylindrical profile for the stream over small length scales and a Hernquist profile for the Milky Way halo. The algorithm finds the stream and separates it from the background halo, thus producing more accurate parameters of the stream and giving the first catalogue of stars extracted from the data to have the stream density profile. In general, we are studying the problem of separating geometric objects in spatial data bases.


  6. (2007) Costas Busch, Malik Magdon-Ismail, Marios Mavronicolas,
    "Efficient Bufferless Packet Switching on Trees and Leveled Networks",
    Journal of Parallel and Distributed Computing, Volume 67 (2007), pages 1168-1186.
    Journal Version: postscript, pdf.
    Conference Versions: EUROPAR 2005 (Leveled), EUROPAR 2004 (Trees),
    ps, pdf. (Leveled)
    ps, pdf. (Trees)
    Summary: We give bufferless hot-potato style scheduling algorithms for trees and leveled networks. For leveled networks, we give centralized and distributed algorithms. The centralized algorithm has routing time within a logarithmic factor of optimal, and the distributed algorithm is a logarithmic factor worse than the centralized. For trees, we give deterministic and randomized algorithms. The deterministic algorithm applies to bounded degree networks and has routing time within a logaritmic factor of optimal. The randomized algorithm applies to arbitrary networks, and has routing time within a log-squared factor from optimal.


  7. (2007) Costas Busch, Malik Magdon-Ismail, Jing Xi,
    "Optimal Oblivious Path Selection on the Mesh",
    (accepted to appear in) IEEE Transactions on Computers.
    Journal Version: postscript, pdf.
    Conference Version: International Parallel and Distributed Processing Symposium (IPDPS), 2005, postscript, pdf.
    Summary: We give an oblivious algorithm for the d-dimensional mesh in which packets select their paths independently of each other. The stretch and congestion are both within O(d^2) from the optimal stretch and congestion attainable by oblivious algorithms, and for oblivious algorithms, the congestion is within a logarithmic factor from the optimal congestion attainable by non-oblivious algorithms. Our algorithm is randomized, and we show that significant randomization is required for any algorithm which attains near optimal congestion. In particular, we show that our algorithm uses at most a factor O(d) more bits per packet than any algorithm which attains a comparable congestion.


  8. (2007) Costas Busch, Malik Magdon-Ismail Fikret Sivrikaya, Bulent Yener
    "Contention-free MAC protocols for asynchronous wireless sensor networks",
    (accepted to appear in) Journal of Distributed Computing
    Journal Version: ps, pdf.
    Conference Version: 18th Annual Conference on Distributed Computing (DISC), 2004, postscript, pdf.
    Summary: We study TDMA-based MAC protocols for asynchronous wireless sensor networks in very harsh environments. Specifically, the protocols are contention-free (avoid collisions), distributed and self-stabilize to topological changes in the network; any topological changes are contained, namely, affect only the nodes in the vicinity of the change; our protocols do not assume that nodes have a global time reference, that is, nodes may not be time-synchronized, and nodes may wake up in an arbitrary order. We also discuss how to accomodate clock skew, slot misalignment and inability for collision detection at a node.


  9. (2007) Volkan Isler, Malik Magdon-Ismail
    "Sensor Selection in Arbitrary Dimension",
    (accepted to appear in) IEEE Transactions on Automation Science and Engineering (TASE)
    ps, pdf.
    Summary: We address the problem of localizing an object in many dimensions by selecting a small number of sensors from which to obtain a measurement. We abstract a sensor measurement as a convex set in R^n and a localization as the intersection of all measurments available to the object. We show that a constant number of sensors is all that is required to obtain a constant factor approximation to the localization error obtained from using all the sensors. Instrumental in our proof is a new construction of an enclosing simplex of a convex polygon with bounded volume.


  10. (2007) Costas Busch, Malik Magdon-Ismail, Marios Mavronicolas,
    "Universal Bufferless Packet Switching",
    Siam Journal on Computing, Volume 37, Issue 4, pages 1139-1162, 2007.
    Journal Version: ps, pdf.
    Conference Version: Workshop in Approximate and On-line Algorithms (WAOA), 2004, postscript, pdf.
    Summary: We give universal packet switching algorithms for bufferless networks, settling an open question regarding the possibility of optimal (to within poly-log factors) bufferless packet switching algorithms. Our argument is constructive and the algorithm is polynomial time. The main idea is to convert a bufferless scheduling problem with prespecified paths in a graph G to a new routing problem in a related graph G'. The problem in G' is solved using buffers, and the solution in G' is emulated in a deterministic bufferless manner in G to give the final bufferless schedule. Buffering in G' is emulated by packet circulation in regions of G.


  11. (2007) Malik Magdon-Ismail, and Joseph Sill
    "A Linear Fit Gets the Correct Monotonicity Directions",
    Machine Learning, Volume 70, Number 1 / January, 2008, pages 21-43.
    Journal Version: postscript , pdf.
    Conference Version: Conference on Learning Theory (COLT), 2003, postscript, pdf.
    Summary: It is often the case that learning with a monotonicity constraint is appropriate, however the directions of the monotonicity constraints may not be available in a general d-dimensional setting. One approach is to use a simple learning model to learn the monotonicity constraints, which can then be applied to constrain a more complex learning model. We show that the optimal linear fit can extract the correct monotonicity directions in one dimension, which remains true in multi-dimensions provided that the input distribution is of a Mahalanobis type. The same results remain true asymptotically when the the OLS estimator is used instead of the optimal linear fit.


  12. (2007) Sibel Adali, Brandeis Hill, Malik Magdon-Ismail,
    "Information vs. Robustness in Rank Aggregation: Models, Algorithms and a Statistical Framework for Evaluation",
    (accepted to appear in) Journal of Digital Information Management (JDIM), special issue on Web Information Retrieval
    pdf.
    Summary: We develop a statistical framework for studying ranker aggregation and a new algorithm for ranker aggregation based on a Kernigan-Lin style iterative best flip approach. We give experimental results for varying amounts of noise in the rankers as well as dis-similarity in the rankers (mis-information). In this space the optimal ranker can vary significantly, illustrating an information vs. robustness tradeoff for ranker aggregation.


  13. (2006) Victor Boyarshinov, Malik Magdon-Ismail,
    "Optimal Boosting of A Pair of Classifiers",
    (accepted to appear in) IEEE Transactions on Neural Networks, November 2006.
    postscript, pdf.
    Summary: We relate the problem of boosting a pair of classifiers to an optimal weighted linear separation problem in two dimensions. We give efficient algorithms to perform this separation and compute the leave one out error. The main idea is to carefully enumerate the possible separators to finally obtain an O(n^2logn) algorithm, where n is the number of data points.


  14. (2006) Costas Busch, Malik Magdon-Ismail, Marios Mavronicolas, Paul Spirakis
    "Direct Routing: Algorithms and Complexity",
    Algorithmica, Volume 45, Number 1, Pages 45--68, June 2006. (Invited submission from ESA 2004).
    Journal Version: postscript , pdf.
    Conference Version: European Symposium on Algorithms (ESA), 2004, postscript , pdf.
    Summary: We consider the direct routing problem, in which packets must follow pre-specified paths in a bufferless network. In this routing model, the only parameter to be determined for a packet is the injection time. We give a general greedy algorithm which has routing time O(C D), and we show that there are routing problems where this is the best achievable by a direct algorithm. We show that optimal direct routing is NP-hard to approximate by gap preserving reduction from graph coloring. We give versions of the greedy algorithm which have optimal or near optimal routing time for trees, multi-dimensional meshes, hypercubes and the butterfly. Finally, we use hard direct routing problems to obtain lower bounds on the buffering requirements of any algorithm that requires packets to follow pre-specified paths. If such algorithms obtain near optimal routing time, then packets must be buffered Omega(N^{1/3}) times (on average), where N is the number of packets.


  15. (2005) Malik Magdon-Ismail, Fikret Sivrikaya, Bulent Yener
    "Joint Problem of Power Optimal Connectivity and Coverage in Wireless Sensor Networks",
    (accepted, to appear in) Wireless Networks.
    postscript , pdf.
    Summary: We model a node using a Markovian automaton which transitions between various modes which offer different capabilities and use power at different states. We develop a mean field theoretic analysis for the behavior of the system, which we optimize with respect to power consumption using connectivity and coverage as constraints. We apply the methodology to a 3 state automaton with off, listen and transmit states. We give results comparing the mean field theory analysis with a simulation, and we compare our approach with existing power saving approaches, which do not necessarily offer connectivity and coverage.


  16. (2005) Victor Boyarshinov, Malik Magdon-Ismail
    "Linear Time Isotonic and Unimodal Regression in the L1 and Linfinity Norms",
    (accepted, to appear in) Journal of Discrete Algorithms.
    postscript , pdf.
    Summary: We give linear time algorithms for (unweighted) isotonic and unimodal regression in the Linfinity norm. For the L1 norm, we give linear time algorithms when the output values are in a bounded, finite set, and an approximation algorithm when the output values are in a bounded range. An open question that remains is to construct linear time isotonic regression in the L1 norm for the arbitrary case.


  17. (2005) Costas Busch, Malik Magdon-Ismail and Mukkai Krishnamoorthy
    "Hardness Results for Cake-Cutting",
    Bulletin of the European Association for Theoretical Computer Science, Number 86, pp. 85-106, June 2005.
    Journal Version: postscript , pdf.
    Conference Version: Symposium on the Theoretical Aspects of Computing (STACS), 2003, postscript, pdf.
    Summary: We consider the algorithm aspects of fair division of a resource, in which limited input from the users may be obtained regarding how much they value a certain part of the resource. If there are n users, it is known that a fair division of the resource (one in which no user thinks they have a less than 1/n of the resource) can be obtained using n logn cuts of the resource (which is modeled as the unit interval). No matching lower bound is available. We take a first step in this direction by showing (through reduction from sorting) that Omega(n logn) comparisons are made by any fair division algorithm. We also consider strong envy-free division (in which no user thinks she has a lesser share than any other user), and give lower bounds of Omega(n^2) on the number of cuts that must be made.


  18. (2004) Malik Magdon-Ismail, Amir F. Atiya,
    "Maximum Drawdown",
    Risk Magazine, Volume 17, Number 10, pages 99-102, October, 2004.
    postscript (preprint) , pdf (preprint).
    Print Version.
    Summary: We use some recent results on the behavior of the Maximum Drawdown of a Brownian motion to develop a scaling law for the Maximum Drawdown which allows one to compare the Sterling/Calmar ratios of trading strategies when the historical data available for the different trading strategies extends over different lengths of time.


  19. (2004) Malik Magdon-Ismail, Amir F. Atiya, Amrit Pratap, and Yaser S. Abu-Mostafa
    "On the Maximum Drawdown of a Brownian Motion",
    Journal of Applied Probability, Volume 41, Number 1, pages 147-161, March, 2004.
    Journal Version: postscript (preprint) , pdf (preprint).
    Print Version.
    Conference Version: Conference on Computational Intelligence for Financial Engineering (CIFEr), 2003, postscript, pdf.
    Summary: We develop the distribution for the Maximum Drawdown (largest drop from a peak to a bottom) of a Brownian motion. We compute the expected value as an infinite integral series. We reduce this function to a single "universal" function, which can be computed numerically once, and used to compute the expected drawdown for any Brownian motion. We compute the asymptotic behavior of this infinite series to reveal three regimes for the behavior of the MDD -- logarithmic, square-root and linear, depending on the sign of the drift parameter in the Brownian motion.


  20. (2003) Malik Magdon-Ismail and Amir F. Atiya
    "A Maximum Likelihood Approach to Variance Estimation of a Brownian Motion Using the High, Low and Close",
    Quantitative Finance, volume 3, issue 5, pages 376 - 384, August 2003.
    Pre-print versions: postscript , pdf.
    Print version
    Summary: Since joint distribution of the high, low and close (conditioned on the open, drift and variance parameters) can be computed, we use this distribution to develop a likelihood for observing the high, low and close conditioned on the drift and variance parameters. By maximizing this likelihood, we construct an estimate of the variance parameter, which we compare to other well known estimators of variance parameter. Maximizing the likelihood yields a systematic gain in performance.


  21. (2002) Malik Magdon-Ismail and Amir F. Atiya
    "Density Estimation and Random Variate Generation Using Multi-Layer Networks",
    IEEE Transactions on Neural Networks, Volume 13, Number 3, pp 497--520, May 2002.
    Journal Version: postscript , pdf.
    Conference Versions:
    Advances in Neural Information Processing Systems (NIPS), 1998, postscript, pdf. (Density Estimation)
    Neural Networks for Signal Processing (NNSP99), 1999 postscript, pdf. (Control Theory Formulation of Random Variate Generation)
    Summary: We consider density estimation and random variate generation using neural networks. For density estimation, we give two methods, both based on learning the sample cumulative distribution function, one is randomized (SLC) and one is a deterministic interpolation using a smoothness constraint. We prove that the randomized algorithm converges to the deterministic version, asymptotically, however practically, randomization offers an additional smoothness property. We prove convergence of the deterministic algorithm to the true density given bounded derivative constraints on the true density.


  22. (2001)Malik Magdon-Ismail
    "The Equivalent Martingale Measure: An Introduction to Pricing Using Expectations",
    IEEE Transactions on Neural Networks, Volume 12, Number 4, pp 684-693, July 2001.
    postscript , pdf.
    Summary: We give an introduction to the Martingale measure, a tool for pricing options, which forml a link between pricing of financial derivatives and Monte Carlo simulation. We use this approach together with a limiting argument to give an elementary derivation of the price of the European call option, and discuss American options.


  23. (2001) Eric Breimer, Mark K. Goldberg, Brian Kolstad and Malik Magdon-Ismail
    "On the Height of a Random Set of Points in a d-Dimensional Unit Cube",
    Journal of Experimental Mathematics, Volume 10, Number 4, pp 583-597, 2001.
    Journal Version: postscript , pdf, source code.
    Conference Version: Workshop on Algorithm Engineering and Experiments (ALENEX), 2001, postscript, pdf.
    Summary: We give give efficient algorithms for estimating the height (maximum antichain) of a set of random points in a d-dimensional unit cube. The algorithms are based on a result we show which states that it suffices to consider only points near the diagonal of the cube. We develop efficient algorithms for generating these points directly, together with an efficient algorithm for constructing the height of this set. We use the co-convergence of estimates from different sized regions around the diagonal to the same value too develop an improved estimate. As a result, we give the first accurate estimates of c_d, the coefficients governing the convergence rate.


  24. (2000) Malik Magdon-Ismail and Amir F. Atiya
    "The Early Restart Algorithm",
    Neural Computation, Volume 12, Number 6, pp 1303-1313, June 2000.
    postscript , pdf.
    Summary: We give an analysis of the early restart phenomenon which can be used to speed up an algorithm whose running time is a function of the initial conditions. For some initial conditions, the runtime may be excessively long or unbounded (for example local minima in an optimization). Given some probability distribution over which the initial conditions are sampled, there is some distribution for the stopping time of the process. We give a analysis of how using restart can improve the expected runtime, and develop a condition for selecting the optimal restart time. Essentially, if the algorithm has run for long enough, it may be favorable to restart the algorithm.


  25. (2000) Malik Magdon-Ismail
    "No Free Lunch for Noise Prediction",
    Neural Computation, Volume 12, Number 3, pp547-565, March 2000.
    postscript , pdf.
    Summary: We show that if the noise is additive, then the prior and posterior distributions of the noise are equal when the prior over target functions is uniform. There is no free lunch in the sense that if no useful assumptions can be made about the target function, then it is not possible gain any extra information regarding the nature of the noise.


  26. (1998) Malik Magdon-Ismail, Alexander Nicholson and Yaser Abu-Mostafa,
    "Financial Markets, Very Noisy Information Processing",
    Proceedings of the IEEE, Special Issue on Intelligent Signal Processing, Volume 86, Number 11, pp 2184-2195, November 1998.
    postscript , pdf.
    addendum to definition A.1, postcript , pdf.
    Summary: We give a general characterization of the performance of a general class of learning systems. Specifically, when the learning system is stable (satisfies certain regularity conditions) and the noise is additive with bounded variance and the input distribution has compact support, the test performance approaches the optimal test performance at a universal rate of O(1/N), where N is the number of data points.


  27. (1999) Zehra Cataltepe, Yaser Abu-Mostafa and Malik Magdon-Ismail
    "No Free Lunch for Early Stopping",
    Neural Computation, Volume 11, Number 4, pp 995-1010, May 1998.
    postscript , pdf.
    Summary: We show that the success of early stopping is an artifact of training dynamics which tends to initialize the weights at smaller values and so the resulting function tends to have smaller magnitude weights, and hence tends to be smoother, which corresponds to having a smoothness prior on the target functions. When no such hidden prior assumption is made, i.e., when early stopping chooses uniformily among all hypotheses with the same (higher than optimal) error, we show that the resulting classifier has performance which is monotonically decreasing in the value of the early stopping error. Thus there is No Free Lunch for early stopping, in that it can only be succesfull if some bias is given toward the target function prior -- which means that some prior on the target function is necessary.


  28. (1998) Malik Magdon-Ismail and Yaser Abu-Mostafa,
    "Validation of Volatility Models",
    Journal of Forecasting, Volume 17, pp 349-368, 1998.
    Journal Version: postscript , pdf.
    Conference Version: Conference on Neural Networks in the Capital Markets (NNCM), 1996. postscript, pdf.
    Summary: We analyse some systematic errors in the prediction of volatility using Maximum Likelihood methods. In particular, there is a systematic underprediction even when the mean is predicted well. Further, Maximum Likelihood methods can systematically select the wrong model over the correct model, when faced with a choice of two models. We develop an expression for a correction factor for systematically correcting for this systematic underprediction.



Working Papers:


  1. () Ali Civril, Malik Magdon-Ismail, Eli Bocek-Rivele
    "SDE: Graph Drawing Using Spectral Distance Embedding",
    Submitted to International Journal on Foundations of Computer Science.
    postscript , pdf.
    Conference Version: Graph Drawing 2005, postscript , pdf.
    Technical Report: postscript, pdf.
    Summary: We give a spectral graph drawing algorithm using a spectral distance matrix reconstruction algorithm. We use a Multi-Dimensional Scaling (MDS) technique to center the coordinates and SVD to recover the coordinates. We show that when the distance matrix is nearly embeddable, the algorithm recovers (up to rotation) a close approximation to the optimal embedding. We show results on a variety of graphs of varying sizes up to about 20,000 nodes. The complexity of the algorithm is in O(|V||E|), which is dominated by an APSP algorithm.


  2. () Victor Boyarshinov, Malik Magdon-Ismail
    "Efficient Computation of Optimal Trading Strategies",
    submitted to Quantitative Finance.
    postscript , pdf.
    Summary: We show how, given two instruments, one can compute return, Sterling and Sharpe optimal strategies efficiently. We also give efficient algorithms for return and Sterling optimal strategies under constraints on the number of trades. For optimizing the risk adjusted measures, we develop a novel approach for optimizing quotients over intervals by relating the problem to convex hull operations on the plane. The ability to efficiently construct optimal strategies allows one to benchmark trading strategies on historical data, benchmark different markets with respect to profitability, and, if one has a model of future market behavior, develop optimal strategies ex-ante.


  3. () Jeffery Baumes, Mark Goldberg, Mykola Hayvanovich Malik Magdon-Ismail, William Wallace, Mohammed Zaki
    "Algorithms for Finding Hidden Group Structure from Communication Data",
    submitted,
    postscript, pdf.
    Conference Versions: Symposium on Intelligence and Security Informatics (ISI), 2004, 2006:
    postscript, pdf (2004); postscript, pdf (2006).
    Summary: We introduce the notion of a planning "hidden" group as one whose members persistently need to communicate, in order to plan. We address the problem of finding such hidden groups based only on the communication data of the form where information on the communication content is not required. Given some measure of the duration of the communication cycle (cyclic hidden groups) over which the hidden group communicates, there are two issues which we study. Efficient algorithms for detecting the hidden group. The statistical significance of a discovered hidden group, based on the fact that the hidden group communications are embedded within a "random" background communication of other members. We give efficient algorithms for discovering hidden groups, even when the interval of activity of the hidden group is not known. Further, find (experimentally) that the statistical reliability of the discovered groups follows the phase transitions in the connectivity of random graphs, as would be expected in assuming that the background is made up of random communications. We then extend the algorithms to streaming hidden groups which have no characteristic cycle. We discuss the more general problem of finding hidden groups that are almost always persistent, and show that it is NP-Hard.