| This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. |
|
    
Publication List (pdf)
     Talks (and Posters)      |
2009; 2008; 2007; 2006; 2005; 2004; 2003; 2002; 2001; 2000; 1999 and before.
Publications:
2009:
"On Selecting a Maximum Volume Sub-Matrix of a Matrix and Related
Problems",
Theoretical Computer Science,
Journal Version:
postscript,
pdf.
Summary:
We study the problem of selecting a set of columns of a matrix with maximum
volume. This is useful in trying to construct representative columns of
high "numerical rank", which may be of use in matrix reconstruction.
We study the algorithmic problem, showing that it is NP-hard and
inapproximable.
We then study a natural greedy algorithm for this problem, giving a worst
case approximation guarantee, also demonstrating a problem which almost
achieves this worst case. The performance of greedy is considerably
worse than the inapproximability result we prove, suggesting avenues for
further work in strengthening the inapproximability result or
finding better algorithms.
"Atomic Routing Games on Maximum Congestion",
Theoretical Computer Science, (In Press)
Journal Version:
postscript,
pdf.
(link to journal web version)
Conference Version: AAIM 2006,
ps,
pdf.
Summary:
We study atomic routing games on networks in which players choose
a path with the objective of minimizing the maximum congestion along
the edges of their path. The social cost is the global maximum
congestion over all edges in the network. We show that the price of
stability is 1. The price of anarchy, PoA, is determined by topological
properties of the network. In particular, PoA=O(L+logn),
where L is the length of the longest path in the player strategy
sets, and n is the size of the network. Further,
K-1<=PoA<=c(K^2+log^2 n), where
K is the length of the longest cycle in the network,
and c is a constant.
"Learning American English Accents Using Ensemble Learning with GMMs",
Proc. 8th International Conference on Machine Learning and
Applications (ICMLA),
pages (to appear), 2009.
pdf.
Slides:
pdf.
"Pricing the American Option using Reconfigurable Hardware",
Proc. 7th IEEE/IFIP Conference on Embedded and Ubiquituous Computing (EUC-2009)
pages 532-536, 2009.
pdf.
Slides:
pdf.
"Models of Communication Dynamics for Simulation of Information Diffusion",
Proceedings of Advances in Social Networks Analysis and Mining (ASONAM 2009),
pages 44-49, 2009.
pdf.
Slides:
ppt.
"graphOnt: An Ontology Based Library for Conversion from Semantic Graphs to JUNG",
Proceedings 2009 IEEE International Conference on Intelligence and Security Informatics, pages 170-172, 2009.
pdf.
Slides:
pdf.
"Stability of Individual and Group Behavior in a Blog Network",
Proceedings 2009 IEEE International Conference on Intelligence and Security Informatics, pages 7-12, 2009.
pdf.
Slides:
pdf.
"The Impact of Changes in Network Structure on
Diffusion of Warnings",
Proc. Workshop on Analysis of
Dynamic Networks (SIAM International Conference on Data Mining),
pages, 2009.
pdf.
Slides:
pdf.
2008:
"Reverse Engineering an Agent-Based Hidden Markov Model for
Complex Social Systems",
International Journal of Neural Systems,
postscript,
pdf.
Summary:
We give heuristics for learning the parameters of a HMM describing
dynamics of social groups in social networks. The input data are
communications for the social network. The HMM is an agent based micro-law
model which is highly interdependent. We apply our methodology
to real data from blogs, newsgroups.
"ViSAGE:
A Virtual Laboratory for Simulation and Analysis of Social Group Evolution",
ACM Transactions on Autonomous and Adaptive Systems (TAAS),
postscript,
pdf.
Summary:
We give a parameterised
generative HMM model for social group evolution based on social
capital theory,
and demonstrate its potential for modeling of social groups in social
networks. In particular, one may
observe certain abrupt changes in the behavior of macroscopic properties
such as the group size distribution as one continuously changes some of
the parameters. Within this model we can reproduce many observed phenemona
within the social science literature.
"Maximum Likelihood Fitting of Tidal Streams with application to the
Sagittarius Dwarf Tidal Tails",
the Astrophysical Journal, to appear (2008).
Journal Version:
postscript,
pdf.
Conference Versions:
eScience 2007 (pdf) .
ISMIS 2005 (pdf).
Summary:
We give a maximum likelihood algorithm to identify streams in from
SDSS like data. In particular, we use a cylindrical profile
for the stream over small length scales and a Hernquist profile for the
Milky Way halo. The algorithm finds the stream and separates it from the
background halo, thus producing more accurate parameters of the
stream and giving the first catalogue of stars extracted from the
data to have the stream density profile. In general, we are studying the
problem of separating geometric objects in spatial data bases.
"Deterministic Sparse Column Based
Matrix Reconstruction via Greedy Approximation of SVD",
Proc. 19th International Symposium on Algorithms and
Computation (ISAAC), pages, December
2008.
pdf.
Slides:
pdf.
"Collective Wisdom: Information growth in Wikis and Blogs",
NIPS Workshop
"Beyond Search: Computational Intelligence for the Web", December
2008.
ps,
pdf.
Poster:
pdf.
"Adapting to a Market Shock: Optimal Sequential Market-Making",
Proc. Advances in Neural Information Processing Systems (NIPS), pages, December
2008.
pdf.
Poster:
pdf.
"Communication Dynamics of Blog Networks",
Proc. SIGKDD Workshop on Social Network Mining and Analysis,
pages
August, 2008.
pdf.
Slides:
pdf.
"Stable Statistics of the Blogograph",
Proc.Interdisciplinary Studies in
Information Privacy and Security (ISIPS),
pages, 2008.
pdf.
Slides:
pdf.
"Micro-Simulation of Diffusion on Warnings",
Proc. 5th Int. Conf. on Information Systems
for Crisis Response and Management
ISCRAM, pages 424-430, 2008.
pdf.
Slides:
pdf.
"Discovery, analysis and monitoring
of hidden social networks and their evolution",
Proc. IEEE Conference on Technologies for Homeland Security,
pages 1--6
May, Boston, 2008.
pdf.
Slides:
pdf.
"A Generative Model for Statistical Determination of Information Content
from Conversation Threads",
Proc. International Conference on Intelligencs and Security Informatics
(ISI), pages 331--342
June 17-20, Taipei, Taiwan, 2008.
pdf.
Slides:
pdf.
"A Locality Model of the Evolution of Blog Networks",
Proc. International Conference on Intelligencs and Security Informatics
(ISI), pages 191--193
June 17-20, Taipei, Taiwan, 2008.
pdf.
Slides:
ppt.
"Discovering Hidden Groups in Communication Networks",
Invited Book Chapter, in "Security Informatics and Terrorism: Patrolling the Web",NATO
Science for Peace and Security Series, Sub-series D: Information and
Communication Security, eds.Cecilia S. Gal and Paul B. Kantor and Bracha Shapira, NATO, pages 82--108.
postscript
, pdf.
2007:
"Efficient Bufferless Packet Switching on Trees and Leveled Networks",
Journal of Parallel and Distributed Computing, Volume 67 (2007), pages
1168-1186.
Journal Version:
postscript,
pdf.
Conference Versions: EUROPAR 2005 (Leveled), EUROPAR 2004 (Trees),
ps,
pdf.
(Leveled)
ps,
pdf.
(Trees)
Summary:
We give bufferless hot-potato style scheduling algorithms for
trees and leveled networks.
For leveled networks, we give centralized and distributed
algorithms. The centralized algorithm
has routing time within a logarithmic factor of optimal,
and the distributed algorithm is a logarithmic factor worse
than the centralized. For trees, we give deterministic and
randomized algorithms. The deterministic algorithm applies to bounded
degree networks and has routing time within a logaritmic factor
of optimal. The randomized algorithm applies to arbitrary networks,
and has routing time within a log-squared factor from optimal.
"Optimal Oblivious Path Selection on the Mesh",
(accepted to appear in) IEEE Transactions on Computers.
Journal Version:
postscript,
pdf.
Conference Version:
International Parallel and Distributed Processing Symposium (IPDPS), 2005,
postscript,
pdf.
Summary:
We give an oblivious algorithm for the d-dimensional mesh in
which packets select their paths
independently of each other. The stretch and congestion are both
within O(d^2) from the
optimal stretch and congestion attainable by oblivious algorithms,
and for oblivious algorithms, the congestion
is within a logarithmic factor
from the optimal congestion attainable by non-oblivious algorithms.
Our algorithm is randomized, and we show that
significant randomization is required for any algorithm which attains
near optimal congestion. In particular, we show
that our algorithm uses at most a factor O(d) more bits per
packet than any algorithm which attains a comparable congestion.
"Contention-free MAC protocols for asynchronous wireless sensor networks",
(accepted to appear in)
Journal of Distributed Computing
Journal Version:
ps,
pdf.
Conference Version:
18th Annual Conference on Distributed Computing (DISC),
2004,
postscript,
pdf.
Summary:
We study TDMA-based MAC protocols for
asynchronous wireless sensor networks in very harsh environments.
Specifically, the protocols are contention-free
(avoid collisions),
distributed
and self-stabilize to
topological changes in the network;
any
topological changes
are contained, namely,
affect only the nodes in the vicinity of the change;
our protocols do not assume that nodes have a global time reference,
that is, nodes may not be time-synchronized, and nodes may wake up
in an arbitrary order.
We also discuss how to accomodate clock skew, slot misalignment and inability
for collision detection at a node.
"Sensor Selection in Arbitrary Dimension",
(accepted to appear in)
IEEE Transactions on Automation Science and Engineering (TASE)
ps,
pdf.
Summary:
We address the problem of localizing an object in many dimensions
by selecting a small number of sensors from which to obtain a measurement.
We abstract a sensor measurement as a convex set in R^n and a localization
as the intersection of all measurments available to the object.
We show that a constant number of sensors is all that is required
to obtain a constant factor approximation to the localization error
obtained from using all the sensors. Instrumental in our
proof is a new construction of an enclosing simplex
of a convex polygon with bounded volume.
"Universal Bufferless Packet Switching",
Siam Journal on Computing, Volume 37, Issue 4, pages 1139-1162, 2007.
Journal Version:
ps,
pdf.
Conference Version:
Workshop in Approximate and On-line Algorithms (WAOA),
2004,
postscript,
pdf.
Summary:
We give universal packet switching algorithms for bufferless networks,
settling an open question regarding the possibility of
optimal (to within poly-log factors) bufferless packet switching algorithms.
Our argument is constructive and the algorithm is polynomial time.
The main idea is to convert a bufferless scheduling problem with
prespecified paths in a graph G to a
new routing problem in a related graph G'.
The problem in G' is solved using buffers, and the
solution in G'
is emulated in a deterministic bufferless manner in G
to give the final bufferless schedule. Buffering in G' is emulated by
packet circulation in regions of G.
"A Linear Fit Gets the Correct Monotonicity Directions",
Machine Learning,
Volume 70, Number 1 / January, 2008, pages 21-43.
Journal Version:
postscript
, pdf.
Conference Version: Conference on Learning Theory (COLT), 2003,
postscript,
pdf.
Summary:
It is often the case that learning with a monotonicity constraint is
appropriate, however the directions of the monotonicity constraints
may not be available in a general d-dimensional setting. One approach is
to use a simple learning model to learn the monotonicity constraints, which can
then be applied to constrain a more complex learning model. We show that
the optimal linear fit can extract the correct monotonicity directions in one
dimension, which remains true in multi-dimensions provided that the
input distribution is of a Mahalanobis type. The same results remain true
asymptotically when the the OLS estimator is used instead of the
optimal linear fit.
"Information vs. Robustness in Rank Aggregation:
Models, Algorithms and a Statistical Framework for
Evaluation",
(accepted to appear in)
Journal of Digital Information Management (JDIM), special issue on
Web Information Retrieval
pdf.
Summary:
We develop a statistical framework for studying ranker aggregation
and a new algorithm for ranker aggregation based on a Kernigan-Lin
style iterative best flip approach. We give experimental results
for varying amounts of noise in the rankers as well as dis-similarity in the
rankers (mis-information). In this space the optimal ranker can
vary significantly, illustrating an information vs. robustness tradeoff for
ranker aggregation.
"Reverse Engineering an Agent-Based
Hidden Markov Model for Complex Social Systems",
Proc. International Conference on Intelligent
Data Engineering and Automated Learning (IDEAL), pages 940-949,
December 16-19, Birmingham, UK, 2007.
postscript,
pdf.
Slides:
postscript,
pdf.
"Discover The Power of Social and Hidden Curriculum
to Decision Making: Experiments with Enron Email and Movie Newsgroups",
Proc. International Conference on Machine
Learning and Applications (ICMLA), pages 154--159,
December 13-15, 2007, Cincinnati, Ohio, 2007.
postscript,
pdf.
Slides:
postscript,
pdf.
"Distributed and Generic Maximum Likelihood Evaluation",
[Best paper Award, 2nd place]
Proc. 3rd International Conference on e-Science and Grid
Computing, pp ,
Bangalore, India, Dec 10-13, 2007.
pdf.
Slides:
pdf.
"Learning What Makes a Society Tick",
Proc. Worksop on
Optimization-based Data Mining Techniques with Applications
at the Seventh IEEE International Conference on Data Mining (ICDM),
pages 195--200,
October 28-31, Omaha, Nebraska, 2007.
postscript,
pdf.
Slides:
postscript,
pdf.
"ASAND: Asynchronous Slot Assignment and Neighbor Discovery
Protocol for Wireless Networks",
OPNETWORK 2007,
August 27-31,Washington DC.
postscript,
pdf.
Slides:
powerpoint,
"Inferring Agent Dynamics from Social Communication Network",
Proc. Joint 9th Web Mining (WebKDD) and
1st Social Network Analysis (SNA-KDD) Workshop and KDD 2007,
August 12-15, San Jose, CA, 2007.
postscript,
pdf.
Slides:
ppt,
"Extracting Agent Based Models from a Social Communication Network",
Conference of Agent-based Modelers and Agent-based Modeling
Platform Users (SwarmFest 2007), July 12-14, Chicago, IL USA, 2007.
abstract(pdf),
Slides:
ppt,
"Strategies for Cleaning Organizational Emails with an Application to Enron
Email Dataset",
5th Conf. of North American Association for Computational Social
and Organizational Science (NAACSOS 07), Emory - Atlanta, Georgia , USA.
June 7-9, 2007.
postscript,
pdf.
Slides:
ppt,
"SIGHTS: A Software System for Finding Coalitions and Leaders in a
Social Network",
Proc. 5th Conference on Intelligence and Security Informatics
(ISI), May 23-24, Rutgers, NJ, 2007.
postscript,
pdf.
Slides:
postscript,
pdf.
"Identification of Hidden Groups in Communications",
Invited Book Chapter, in "Handbooks in Information
Systems 2 (National Security)", Elsevier,
eds. , pages 209--242.
postscript
, pdf.
Technical Report
2006:
"Optimal Boosting of A Pair of Classifiers",
(accepted to appear in) IEEE Transactions on Neural Networks, November 2006.
postscript,
pdf.
Summary:
We relate the problem of boosting a pair of classifiers to an
optimal weighted linear separation problem in
two dimensions. We give efficient algorithms to perform this
separation and compute the leave one out error.
The main idea is to carefully enumerate the possible separators
to finally obtain an O(n^2logn) algorithm, where
n is the number of data points.
"Direct Routing: Algorithms and Complexity",
Algorithmica, Volume 45, Number 1, Pages 45--68, June 2006.
(Invited submission from ESA 2004).
Journal Version:
postscript
, pdf.
Conference Version: European Symposium on Algorithms (ESA), 2004,
postscript
, pdf.
Summary:
We consider the direct routing problem, in which packets must follow
pre-specified paths in a bufferless network.
In this routing model, the only parameter to be determined for a
packet is the injection time. We give a general
greedy algorithm which has routing time O(C D), and we show
that there are routing problems where this is the best
achievable by a direct algorithm. We show that optimal direct
routing is NP-hard to approximate by gap preserving
reduction from graph coloring. We give versions of the greedy
algorithm which have optimal or near optimal routing time
for trees, multi-dimensional meshes, hypercubes and the butterfly.
Finally, we use hard direct routing problems to obtain lower
bounds on the buffering requirements of any algorithm that requires
packets to follow pre-specified paths. If such algorithms
obtain near optimal routing time, then packets must be
buffered Omega(N^{1/3}) times (on average),
where N is the number of packets.
"Learning Martingale Measures From High Frequency Finiancial Data
to Help Option Pricing",
5th International Conference on Computational
Intelligence in Economics and Finance (CIEF 2006)
in conjunction with
9th Joint Conference on Information Sciences
(JCIS 2006)
October 8 - 11, Kaohsiung, Taiwan.
postscript,
pdf.
Slides:
postscript,
pdf.
"NN-OPT: Neural Networks for Option Pricing Using Multinomial Trees",
The 13th International Conference on Neural Information Processing
(ICONIP2006),
October 3-6, Hong Kong.
postscript,
pdf.
Slides:
postscript,
pdf.
"Using Agent-Based Modeling to Traverse Frameworks in Theories of the Social",
International Conference on Complex Systems (ICCS06),
Boston, MA, June 25-30, 2006.
postscript,
pdf.
Slides:
postscript,
pdf.
"Modeling the Cultural Subjectivity: Towards Computational Critique",
4th Conf. of North American Association for Computational Social and Organizational Science (NAACSOS 06), Notre-Dame, Indiana, June 22-23, 2006.
postscript,
pdf.
Slides:
postscript,
pdf.
"Distance Matrix Reconstruction from Incomplete Distance Information
for Sensor Network Localization",
Third Annual IEEE Communications Society Conference on Sensor,
Mesh and Ad Hoc Communications and Networks (IEEE SECON06),
Sep. 25-28, Reston, VA, USA.
postscript,
pdf.
"SSDE: Fast Graph Drawing Using Sampled Spectral Distance Embedding",
14th International Symposium on Graph Drawing
(GD2006),
Karlsruhe, Germany, Sept 18-20, 2006.
postscript,
pdf.
Slides:
powerpoint.
"Social Capital Experiments",
4th Conf. of North American Association for Computational
Social and Organizational Science (NAACSOS 06),
Notre-Dame, Indiana, June 22-23, 2006.
postscript,
pdf.
Slides:
ps,
pdf,
powerpoint,
"Finding Hidden Group Structure in a Stream of Communications",
[Top 3 Paper Award]
Proceedings of the 4th
Symposium on Intelligence and Security Informatics (ISI 06),
San Diego, CA, May 23-24 2006.
postscript,
pdf.
Slides:
ps,
pdf,
"Atomic Routing Games on Maximum Congestion",
Proceedings of the
2nd International Conference on Algorithmic
Aspects in Information and Management (AAIM), Hong Kong, June 22-23, 2006.
postscript,
pdf.
Slides:
postscript,
pdf.
"The Impact of Ranker Quality on Ranker Aggregation Algorithms: Information
vs. Robustness",
Proc. ICDE Workshop on Challenges in Web Information
Retrieval and Integration (WIRI), pp 10-19,
Atlanta, Georgia, April 3, 2006.
postscript,
pdf.
Slides:
postscript,
pdf.
2005:
"Joint Problem of Power Optimal Connectivity and Coverage in Wireless
Sensor Networks",
(accepted, to appear in)
Wireless Networks.
postscript
, pdf.
Summary:
We model a node using a Markovian automaton which transitions between
various modes which offer different capabilities and use power at
different states. We develop a
mean field theoretic analysis for the behavior of the system,
which we optimize with respect to
power consumption using connectivity and coverage as constraints.
We apply the methodology to a
3 state automaton with off, listen and transmit states.
We give results comparing the mean field theory
analysis with a simulation, and we compare our approach
with existing power saving approaches, which do not
necessarily offer connectivity and coverage.
"Linear Time Isotonic and Unimodal Regression in the L1 and Linfinity
Norms",
(accepted, to appear in)
Journal of Discrete Algorithms.
postscript
, pdf.
Summary:
We give linear time algorithms for (unweighted)
isotonic and unimodal regression in the Linfinity norm.
For the L1 norm, we give linear time algorithms when the output values
are in a bounded, finite set, and an approximation algorithm
when the output values are in a bounded range.
An open question that remains is to construct linear time
isotonic regression in the L1 norm for the arbitrary case.
"Hardness Results for Cake-Cutting",
Bulletin of the
European
Association for Theoretical Computer Science,
Number 86, pp. 85-106, June 2005.
Journal Version:
postscript
, pdf.
Conference Version: Symposium on the Theoretical Aspects of Computing
(STACS), 2003,
postscript,
pdf.
Summary:
We consider the algorithm aspects of fair division of a resource,
in which limited input from the users may be obtained regarding
how much they value a certain part of the resource. If there
are n users, it is known that a fair division of the resource
(one in which no user thinks they have a less than 1/n
of the resource) can be obtained using n logn cuts of the
resource (which is modeled as the unit interval). No
matching lower bound is available. We take a first step in this direction
by showing (through reduction from sorting) that Omega(n logn)
comparisons are made by any fair division algorithm.
We also consider strong envy-free division (in which no user thinks
she has a lesser share than any other user), and give lower bounds
of Omega(n^2) on the number of cuts that must be made.
"Learning Martingale Measures to Price Options",
1st Workshop on Machine Learning in Finance at NIPS 2005
Vancouver/Whistler, Dec 9, 2005.
postscript,
pdf.
Slides:
postscript,
pdf.
"SDE: Graph Drawing Using Spectral Distance Embedding",
13th International Symposium on Graph Drawing (GD05),
Limerick, Ireland, Sept 12-14, 2005.
postscript,
pdf.
Slides:
postscript,
pdf.
powerpoint.
Technical Report:
postscript,
pdf.
"Optimal Link Bombs are Uncoordinated",
First International Workshop on Adversarial Information Retrieval on the Web
(AIRWeb 05) at the
14th International World Wide Web Conference (WWW2005),
Chiba, Japan, 10-14 May, 2005.
postscript,
pdf.
Slides:
ps,
pdf,
powerpoint,
"Dynamics of Bridging and Bonding in Social Groups, a Multi-Agent
Model",
3rd Conf. of North American Association for Computational
Social and Organizational Science (NAACSOS 05),
Notre-Dame, Indiana, June 26-28, 2005.
postscript,
pdf.
Slides:
ps,
pdf,
powerpoint,
"Efficient Identification of Overlapping Communities",
IEEE International Conference
on Intelligence and Security Informatics (ISI 05),
Atlanta, Georgia, May 19-20 2005.
postscript,
pdf.
Slides:
ps,
pdf,
powerpoint,
"Oblivious Routing
on Geometric Networks",
17th ACM Symp. on Parallelism in Alg. and Arch. (SPAA) 2005,
July 17-20, Las Vegas, Nevada, USA.
postscript,
pdf.
Slides:
ps,
pdf.
"Efficient Bufferless Routing on Leveled Networks",
EURO-PAR 2005,
Aug 30-Sep 2, Lisboa, Portugal.
postscript,
pdf.
Slides:
postscript,
pdf.
"A Probabilistic Approach to Finding Geometric
Objects in Spatial Datasets of the Milky Way",
15th International Symposium on Methodoligies for
Intelligent Systems (ISMIS 2005),
May 25-28, Saratoga Springs, NY, USA.
postscript,
pdf.
Slides:
ps,
pdf,
powerpoint,
"Experimental Evaluation of the Greedy and Random Algorithms for Finding
Independent Sets in Random Graphs",
4th International Workshop on Efficient and Experimental
Algorithms (WEA 2005),
May 10-13, Santorini, Greece.
postscript,
pdf.
Slides:
postscript,
pdf.
"Optimal Oblivious Path Selection on the Mesh",
International Parallel and Distributed Processing Symposium (IPDPS 2005),
Apr 3-8, Denver, Colorado, USA.
postscript,
pdf.
Slides:
postscript,
pdf.
"Finding Communities by Clustering a Graph into Overlapping Subgraphs",
International Conference on Applied Computing (IADIS 2005),
Feb 22-25, Algarve, Portugal.
postscript,
pdf,
doc.
Slides:
ps,
pdf,
powerpoint,
"Detecting Conversing Groups of Chatters: A Model, Algorithms and Tests",
International Conference on Applied Computing (IADIS 2005),
Feb 22-25, Algarve, Portugal.
postscript,
pdf,
doc.
Slides:
ps,
pdf,
powerpoint,
2004:
"Maximum Drawdown",
Risk Magazine, Volume 17, Number 10, pages 99-102, October, 2004.
postscript (preprint)
, pdf (preprint).
Print Version.
Summary:
We use some recent results on the behavior of the Maximum Drawdown of
a Brownian motion to develop a scaling law for the Maximum Drawdown which
allows one to compare the Sterling/Calmar ratios of trading strategies
when the historical data available for the different trading strategies
extends over different lengths of time.
"On the Maximum Drawdown of a Brownian Motion",
Journal of Applied Probability, Volume 41, Number 1, pages 147-161,
March, 2004.
Journal Version:
postscript (preprint)
, pdf (preprint).
Print Version.
Conference Version: Conference on Computational Intelligence
for Financial Engineering (CIFEr), 2003,
postscript,
pdf.
Summary:
We develop the distribution for the Maximum Drawdown (largest drop from a
peak to a bottom) of a Brownian motion. We compute the expected value
as an infinite integral series.
We reduce this function to a single "universal" function, which
can be computed numerically once, and used to compute the expected drawdown
for any Brownian motion.
We compute the asymptotic behavior of this
infinite series to reveal three regimes for the behavior of the
MDD -- logarithmic, square-root and linear, depending on the
sign of the drift parameter in the Brownian motion.
"Contention-Free MAC Protocols for Wireless Sensor Networks",
18th Annual Conference on Distributed Computing (DISC 2004),
Oct 4-7, Amsterdam, the Netherlands.
postscript,
pdf.
Slides:
ps,
pdf,
powerpoint,
"Universal Bufferless Routing",
Workshop in On-line Algorithms (with
ESA 2004 in conjunction with ALGO 2004),
Sept 14-17, Bergen, Norway.
postscript,
pdf.
Slides:
ps,
pdf,
powerpoint,
"Direct Routing: Algorithms and Complexity",
12th European Symposium on Algorithms
(ESA 2004 in conjunction with ALGO 2004),
Sept 14-17, Bergen, Norway.
postscript,
pdf.
Slides:
postscript,
pdf.
"Near-Optimal Hot Potato Routing on Trees",
EURO-PAR 2004,
31 Aug-3 Sept, Pisa, Italy.
postscript,
pdf.
Slides:
ps,
pdf,
powerpoint,
"Identifying Multi-ID users in Open Forums",
2nd NSF/NIJ Symposium on Intelligence and Security Informatics (ISI 04),
Tucson, AZ, June 11-12 2004.
postscript,
pdf.
Slides:
ps ,
pdf ,
powerpoint ,
"Discovering Hidden Groups in Communication Networks",
2nd NSF/NIJ Symposium on Intelligence and Security Informatics (ISI 04),
Tucson, AZ, June 11-12 2004.
postscript,
pdf.
Slides:
ps,
pdf,
powerpoint,
2003:
"A Maximum Likelihood Approach to Variance Estimation of a Brownian
Motion Using the High, Low and Close",
Quantitative Finance, volume 3, issue 5, pages 376 - 384, August 2003.
Pre-print versions: postscript
, pdf.
Print version
Summary:
Since joint distribution of the high, low and close (conditioned on the
open, drift and variance parameters) can be computed, we use this
distribution to develop a likelihood for observing the
high, low and close conditioned on the drift and variance parameters.
By maximizing this likelihood, we construct an estimate of the variance
parameter,
which we compare to other well known estimators of variance parameter.
Maximizing the likelihood yields a systematic gain in performance.
"Using a Linear Fit to Determine Monotonicity Directions",
The 16th Annual Conference on Learning Theory (COLT 2003),
Washington, DC, USA, August 24-27 2003.
ps
pdf
Slides:
postscript,
pdf.
"Statistical Modeling of Social Groups on Communication Networks",
First conference of the North American Association for Computational
Social and Organizational Science (CASOS 03),
Pittsburgh PA, June 22-25, 2003.
postscript,
pdf.
Slides:
postscript,
pdf.
"Locating Hidden Groups in Communication Networks Using Hidden Markov Models ",
NSF/NIJ Symposium on Intelligence and Security Informatics (ISI 03),
Tucson, AZ, June 2-3 2003.
postscript,
pdf.
Slides:
postscript,
pdf.
"The Maximum Drawdown of the Brownian Motion",
International Conference on Computational Intelligence
for Financial Engineering (CIFEr 03), Hong Kong, March 2003.
postscript,
pdf.
Slides:
postscript,
pdf.
"Pricing the American Put Using a New Class of Tight Lower Bounds",
International Conference on Computational Intelligence
for Financial Engineering
(CIFEr 03), Hong Kong, March 2003.
postscript,
pdf.
Slides:
postscript,
pdf.
"Cake-Cutting is Not A Piece of Cake",
Symposium on the Theoretical Aspects of Computer Science (STACS 03),
Berlin,
February 27 - March 1, 2003.
postscript,
pdf.
Slides:
ps,
pdf,
powerpoint,
2002:
"Density Estimation and Random Variate Generation
Using Multi-Layer Networks",
IEEE Transactions on Neural Networks, Volume 13, Number 3, pp 497--520,
May 2002.
Journal Version:
postscript
, pdf.
Conference Versions:
Advances in Neural Information Processing Systems (NIPS),
1998,
postscript,
pdf. (Density Estimation)
Neural Networks for Signal Processing (NNSP99), 1999
postscript,
pdf.
(Control Theory Formulation of Random Variate Generation)
Summary:
We consider density estimation and random variate generation using
neural networks. For density estimation, we give two methods, both
based on learning the sample cumulative distribution function,
one is randomized (SLC) and one is a deterministic interpolation
using a smoothness constraint.
We prove that the randomized algorithm converges to the
deterministic version, asymptotically, however practically, randomization
offers an additional smoothness property. We prove convergence
of the deterministic algorithm to the true density given bounded derivative
constraints on the true density.
"The Multilevel Classification Problem and a Monotonicity Hint",
Intelligent Data Engineering and Learning, Third International Conference,
Manchester, UK, August, 2002. Springer.
postscript,
pdf.
Slides:
postscript,
pdf.
"Compressing Protein Conformational Space",
ISMB02: Tenth International Conference on Intelligent Systems for
Molecular Biology, August 2002.
postscript,
pdf.
(submitted, rejected)
"Compressing Protein Conformational Space",
RECOMB02: Sixth International Conference on Research in Computational
Molecular Biology, April 2002 (poster).
postscript,
pdf.
(abstract)
"Maximum Likelihood Estimation (MLE)",
Book Chapter, to appear in
"Encyclopedia of Financial Engineering and Risk Management",
commissioning editor Gillian Lindsey, Fitzroy Dearborn Publishers, 2002.
postscript
, pdf.
"Ordinary Least Squares (OLS)",
Book Chapter, to appear in
"Encyclopedia of Financial Engineering and Risk Management",
commissioning editor Gillian Lindsey, Fitzroy Dearborn Publishers, 2002.
postscript
, pdf.
"Expected Value / Mathematical Expectation",
Book Chapter, to appear in
"Encyclopedia of Financial Engineering and Risk Management",
commissioning editor Gillian Lindsey, Fitzroy Dearborn Publishers, 2002.
postscript
, pdf.
2001:
"The Equivalent Martingale Measure:
An Introduction to Pricing Using Expectations",
IEEE Transactions on Neural Networks, Volume 12, Number 4, pp 684-693,
July 2001.
postscript
, pdf.
Summary:
We give an introduction to the Martingale measure, a tool for
pricing options, which forml a link between pricing of financial derivatives
and Monte Carlo simulation.
We use this approach together with a limiting argument to give an elementary
derivation of the price of the European call option, and discuss American
options.
"On the Height of a Random Set of Points in a d-Dimensional Unit Cube",
Journal of Experimental Mathematics, Volume 10, Number 4, pp 583-597, 2001.
Journal Version:
postscript
, pdf,
source code.
Conference Version: Workshop on Algorithm Engineering and Experiments (ALENEX),
2001,
postscript,
pdf.
Summary:
We give give efficient algorithms for estimating the height (maximum
antichain) of a set of
random points in a d-dimensional unit cube. The algorithms are
based on a result we show which states that it suffices to consider
only points near the diagonal of the cube. We develop efficient algorithms
for generating these points directly, together with an efficient
algorithm for constructing the height of this set. We use the co-convergence
of estimates from different
sized regions around the diagonal to the same value too develop an improved
estimate. As a result, we give the first accurate estimates of
c_d, the coefficients governing the convergence rate.
"A Learning Algorithm for String Assembly",
BIOKDD01: Workshop on Data Mining in Bioinformatics
(with SIGKDD01 Conference), August 2001.
postscript,
pdf.
Slides:
postscript,
pdf.
"Experimental Evaluation of the Height of a Random Set of Points in a d-Dimensional Cube",
3rd Workshop on Algorithm Engineering and Experiments (ALENEX), January 2001.
postscript,
pdf.
Slides:
postscript,
pdf.
2000:
"The Early Restart Algorithm",
Neural Computation, Volume 12, Number 6, pp 1303-1313, June 2000.
postscript
, pdf.
Summary:
We give an analysis of the early restart phenomenon which can be used to
speed up an algorithm whose running time is a function of the
initial conditions. For some initial conditions, the runtime may be excessively
long or unbounded (for example local minima in an optimization).
Given some probability distribution over which the initial conditions
are sampled, there is some distribution for the stopping time of
the process. We give a analysis of how using restart can improve the
expected runtime, and develop a condition for selecting the optimal
restart time. Essentially, if the algorithm has run for long enough, it
may be favorable to restart the algorithm.
"No Free Lunch for Noise Prediction",
Neural Computation, Volume 12, Number 3, pp547-565, March 2000.
postscript
, pdf.
Summary:
We show that if the noise is additive, then the prior and posterior
distributions of the noise are equal when the prior over target functions
is uniform. There is no free lunch in the sense that if no useful
assumptions can be made about the target function, then it is not possible
gain any extra information regarding the nature of the noise.
"Volatility Estimation Using High, Low and Close Data - A Maximum Likelihood Approach",
Computational Finance (CF2000), Proceedings, June 2000.
postscript,
pdf.
"Pricing the quality option for the bond futures contract in a
multifactor Vasicek framework",
Proceedings of the 16th IMACS World Congress, Lausanne, Switzerland,
August 2000.
postscript,
pdf.
1999 and before:
"Financial Markets, Very Noisy Information Processing",
Proceedings of the IEEE, Special Issue on Intelligent
Signal Processing, Volume 86, Number 11, pp 2184-2195, November 1998.
postscript
, pdf.
addendum to definition A.1, postcript
, pdf.
Summary:
We give a general characterization of the performance
of a general class
of learning systems. Specifically, when the learning system is stable
(satisfies certain regularity conditions) and the noise is
additive with bounded variance and the input distribution has
compact support, the test performance
approaches the optimal test performance at a universal rate of
O(1/N), where N is the number of data points.
"No Free Lunch for Early Stopping",
Neural Computation, Volume 11, Number 4, pp 995-1010, May 1998.
postscript
, pdf.
Summary:
We show that the success of early stopping is an artifact of training
dynamics which tends to initialize the weights at smaller values and so
the resulting function tends to have smaller magnitude weights, and
hence tends to be smoother, which corresponds to having a smoothness
prior on the target functions. When no such hidden
prior assumption is made, i.e.,
when early stopping chooses uniformily among all hypotheses with the same
(higher than optimal) error, we show that the resulting classifier
has performance which is monotonically decreasing in the value of the
early stopping error. Thus there is No Free Lunch for early stopping, in that
it can only be succesfull if some bias is given toward the target function
prior -- which means that some prior on the target function is necessary.
"Validation of Volatility Models",
Journal of Forecasting, Volume 17, pp 349-368, 1998.
Journal Version:
postscript
, pdf.
Conference Version: Conference on Neural Networks in the
Capital Markets (NNCM), 1996.
postscript,
pdf.
Summary:
We analyse some systematic errors in the prediction of
volatility using Maximum Likelihood methods. In particular, there is a
systematic underprediction even when the mean is predicted well.
Further, Maximum Likelihood methods can systematically select the wrong
model over the correct model, when faced with a choice of two models.
We develop an expression for a correction factor for systematically
correcting for this systematic underprediction.
"A Control Formulation for Random Variate Generation",
Neural Networks for Signal Processing (NNSP99) IX, Proceedings of the 1999
IEEE Workshop, eds. Yu-Hen Hu, Jan Larsen, Elizabeth Wilson,
Scott Douglas, August 1999.
postscript,
pdf.
"A Bayesian Approach to Estimating Mutual Fund Returns",
Computational Finance (CF99), eds Yaser S. Abu-Mostafa, Blake LeBaron, Andrew W. Lo and Andreas S. Weigend, MIT Press, 1999.
postscript,
pdf.
"Neural Networks for Density Estimation",
Advances in Neural Information Processing Systems (NIPS) 11,
Proceedings of the 1998 Conference, pp522-528, eds.
Michael S. Kearns, Sara A. Solla and David A. Cohen, MIT Press, 1998.
postscript,
pdf.
"Neural Networks for Density Estimation in Financial Markets",
Intelligent Data Engineering and Learning, First International Symposium,
pp 171-178,
eds. L. Xu, L. W. Chan, I. King and A. Fu, Springer, October, 1998.
postscript,
pdf.
"Estimating Model Limitation in Financial Markets",
Intelligent Data Engineering and Learning (IDEAL), First International Symposium, pp 19-26, eds. L. Xu, L. W. Chan, I. King and A. Fu, Springer, October, 1998.
postscript,
pdf.
"Incorporating Test Inputs into Learning",
Advances in Neural Information Processing Systems (NIPS) 10, Proceedings of the 1997 Conference, pp 437-443, eds. Michael I. Jordan, Michael J. Kearns and
Sara A. Solla, MIT Press, 1997.
postscript,
pdf.
"Systematic Underprediction of Volatility in Maximum Likelihood Methods",
Decision Technologies for Financial Engineering, Proceedings of the Fourth International Conference on Neural Networks in the Capital Markets, NNCM 96 (currently Computational Finance), pp 125-137, eds. Andreas S. Weigend, Yaser S. Abu-Mostafa and A.-Paul N. Refenes, World Scientific, 1996.
postscript,
pdf.
"Learning in the Presence of Noise",
Book Chapter, in "Intelligent Signal Processing", eds. Simon Haykin and
Bart Kosko, IEEE Press, 2001.
postscript
, pdf.
Working Papers:
"On the Diversity Within Internet Search Engines: A Clustering Approach Using
Query Based Metrics",
Second International Workshop on
Adversarial Information Retrieval on the Web (at ACM SIGIR),
Seattle, Washington, USA, 6-11 August 2006.
postscript,
pdf.
Slides:
postscript,
pdf.
"SDE: Graph Drawing Using Spectral Distance Embedding",
Submitted to International Journal on Foundations of Computer
Science.
postscript
, pdf.
Conference Version: Graph Drawing 2005,
postscript
, pdf.
Technical Report:
postscript,
pdf.
Summary:
We give a spectral graph drawing algorithm using a spectral distance
matrix reconstruction algorithm.
We use a Multi-Dimensional Scaling (MDS) technique to center the
coordinates and SVD to recover the coordinates.
We show that when the distance matrix is nearly embeddable, the
algorithm recovers (up to rotation) a close approximation
to the optimal embedding. We show results on a variety of graphs
of varying sizes up to about 20,000 nodes. The complexity of the
algorithm is in O(|V||E|), which is dominated by an APSP algorithm.
"Efficient Computation of Optimal Trading Strategies",
submitted to Quantitative Finance.
postscript
, pdf.
Summary:
We show how, given two instruments, one can compute return,
Sterling and Sharpe optimal strategies efficiently. We also give
efficient algorithms for return and Sterling optimal strategies
under constraints on the number of trades.
For optimizing the risk adjusted measures, we develop a novel
approach for optimizing quotients over intervals by relating the problem to
convex hull operations on the plane. The ability to efficiently
construct optimal strategies allows one to benchmark trading strategies on
historical data, benchmark different markets with respect to
profitability, and, if one has a model of future market behavior,
develop optimal strategies ex-ante.
"Algorithms for Finding Hidden Group Structure
from Communication Data",
submitted,
postscript,
pdf.
Conference Versions: Symposium on Intelligence and Security Informatics (ISI),
2004, 2006:
postscript,
pdf (2004);
postscript,
pdf (2006).
Summary:
We introduce the notion of a planning "hidden" group as one whose members
persistently need to communicate, in order to
plan. We address the problem of finding such hidden groups based only on
the communication data of the form