.. Graduate Skills Seminar documentation master file, created by
   sphinx-quickstart on Wed Oct 7 09:21:20 2015.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Fall 2016 Computer Science Graduate Skills Seminar
====================================================

.. list-table:: Fall 2016 Schedule: Wednesdays at 10 AM in Darrin (DCC) 239
   :header-rows: 1
   :widths: 8, 8, 8, 40, 40, 20
   :stub-columns: 1

   * - Date
     - Time
     - Location
     - Topic
     - Speaker
     - Slides
   * - Aug 31
     - 10 AM
     - DCC 239
     - Graduate school
     - `Sibel Adali `_
     - `PDF <_static/fall2016/Graduate_School_Fall2016.pdf>`_
   * - Aug 31
     - 10 AM
     - DCC 239
     - Fellowship applications
     - `Alice Broussard `_
     - `CS Fellowships <_static/fall2016/CSFellowships.pdf>`_

       `OGE Info Sessions <_static/fall2016/OGESessions.pdf>`_
   * - Sep 14
     - 10 AM
     - DCC 239
     - What is research?
     - `M. Zaki `_
     - `PDF <_static/fall2016/WhatIsResearch_ZakiFall2016.pdf>`_
   * - Oct 12
     - 10 AM
     - DCC 239
     - Reading research papers
     - `Stacy Patterson `_
     - `PDF <_static/fall2016/Reading_PattersonFall2016.pdf>`_
   * - Oct 19
     - 10 AM
     - DCC 239
     - Giving talks
     - `Elliot Anshelevich `_
     - `PDF <_static/fall2016/GivingTalks_Anshelevichfall2016.pdf>`_
   * - Oct 26
     - 10 AM
     - DCC 239
     - Writing papers
     - `Fran Berman `_
     - `PDF <_static/fall2016/Writing_BermanFall2016.pdf>`_
   * - Nov 2
     - 10 AM
     - DCC 239
     - Career paths after graduate school
     - `Bolek Szymanski `_
     - `PDF <_static/fall2016/Jobs_Szymanski2016.pdf>`_
   * - Nov 16
     - 10 AM
     - DCC 239
     - Graduate Student Panel
     - See speakers below
     -
   * - Dec 7
     - 10 AM
     - DCC 239
     - Writing proposals
     - `Jeff Trinkle `_
     - `PDF <_static/fall2016/Proposal_Trinkle2016.pdf>`_

Graduate Student Panel
----------------------

3-minute research presentations followed by a question-and-answer session.

Speakers:

- Shigeru Imai, "Cost-Efficient Elastic Stream Processing Using Application-Agnostic Performance Prediction"

  Increasing demand for real-time data processing has led to the successful development of scalable stream data processing systems such as Apache Storm, Flink, and Spark Streaming. Cloud computing can give these systems on-demand elasticity to cope with fluctuating computing demand; however, guaranteeing Service Level Agreements (SLAs) while keeping virtual machine (VM) usage costs low is challenging, because both the target application's scalability and its future computing demand are unknown. When creating application performance models, we can take either an application-specific (white-box) approach that uses detailed domain knowledge or an application-agnostic (black-box) approach that uses application-independent information such as the number of machines and network utilization. In this thesis, we take the latter approach, which applies more widely across streaming applications and systems, and investigate its cost-effectiveness for stream data processing systems running in public clouds.

  Shigeru Imai is a fifth-year PhD student supervised by Prof. Carlos A. Varela. Before joining RPI, he received his B.E. from Tokyo Institute of Technology, Japan, and worked as a research engineer at Mitsubishi Electric Corp. He has been working on cost-efficient cloud computing and fault-tolerant streaming.

- Jonathan Crall, "Working on Individual Animal Identification as a Grad Student"

  In this talk I will introduce the problem I've been working on for my PhD thesis: identification of individual animals. Given a new image of an animal, the task is to search a database for any other images of that same individual. This technology can help estimate animal population sizes. It replaces expensive and invasive methods (e.g., ear tagging) by using the animal's visual appearance as a distinguishing identifier.

  My name is Jon Crall. Before RPI, I interned at Kitware, where I learned various programming languages and gained experience developing and building software systems. During this internship I worked on several computer vision projects and took an interest in the subject: I wanted to learn how to instruct a computer to make sense of a grid of pixels. I joined RPI in 2010 as a graduate student under Professor Stewart. There I started learning the basics and was introduced to important ecological problems that computer vision might address. Eventually I learned enough to settle on the problem of individual animal identification. I've learned much since then; I am currently finishing my thesis and plan to graduate sometime next year.

- Amar Viswanathan, "A Schema- and Data-Aware Reformulation Methodology for Knowledge Graphs"

  When the results of a query fail to satisfy the user, the query is rewritten, or reformulated, to generate more results. In web search systems, the user manually reformulates the input query based on their knowledge of the query's different interpretations; in databases, by contrast, the user reformulates the query based on their knowledge of the schema. This is a well-established paradigm in traditional information retrieval and databases, and it is also the basis for result aggregation in many cooperative answering systems. Knowledge graphs such as DBpedia, YAGO, and NELL differ from traditional data sources in that they encode relationships between extracted data, which lets users express more complex and precise queries with ease. SPARQL is one of the query languages used to query such RDF- and RDFS-based knowledge graphs. When a SPARQL query fails, the onus is on the user to understand the schema semantics and the instance data distribution to find plausible reasons for the failure. Given the lack of a system-aided mechanism to help the user recover from a failed SPARQL query, we tackle this problem with a Schema- and Data-Aware Reformulation Methodology.

  My work augments a failed user query with reformulations that are similar to the original query and do not return zero results or fewer than k results, where k is a given minimum. My system demonstrates this: it takes a failed user query as input and returns alternative, ranked reformulations that align with both the schema and the instance data distribution. These reformulations are data-aware (i.e., they account for data availability and data distribution) and schema-aware (i.e., they use the concept hierarchy and apply a set of RDFS entailment rules).

  Amar Viswanathan is a fifth-year graduate student at the Tetherless World Constellation under Prof. James A. Hendler. He has previously worked on sentiment analysis and question answering with Watson. His current research interests include query reformulation in knowledge graphs and discourse pragmatics in search systems.

- Yao Dong, "Static Analysis and Program Transformation for Secure Computation on the Cloud"

  Cloud computing allows clients to upload data and computation to untrusted servers, which can violate the confidentiality of client data. We propose a static program analysis that transforms a Java program into an equivalent one that performs its computation over encrypted data, preserving data confidentiality.

  My name is Yao Dong, and I joined RPI in 2012. My advisor is Prof. Ana Milanova. Our research focuses on software engineering, compilers, and programming languages, in particular static and dynamic program analysis and its applications to software verification, testing, and understanding.

- Bassem Makni, "Deep Learning of RDFS rules"

  Recurrent neural networks are used successfully for natural language processing tasks. We extend their use to rule learning, specifically the learning of RDFS rules. On the LUBM benchmark, we reach a materialization level similar to that of OWLIM, a state-of-the-art semantic reasoner.

  Bassem Makni joined TWC in February 2012 as a research assistant and PhD student of Professor Jim Hendler. His research interests include semantic technologies, linked data, natural language processing, and deep learning.

- Konstantin Kuzmin, "Synergy Landscapes: Supporting novel biomedical research via multilayer collaboration networks"

  The value of research involving novel combinations of molecules can be seen in many innovative and award-winning research programs. Despite calls for innovative approaches to common diseases, an increasing majority of research funding goes toward "safe" incremental research. Counteracting this trend by nurturing novel and potentially transformative scientific research is challenging, because such research must compete for support with established research programs. We therefore propose a tool that helps resolve the tension between safe, fundable research and high-risk, potentially transformational research. It does this by identifying hidden overlapping interests around novel molecular research topics: specifically, it identifies paths of molecular interactions that connect research topics and hypotheses that would not typically be associated, as a basis for scientific collaboration. Because these collaborations build on the scientists' present research trajectories, they are low-risk and can be initiated rapidly. Unlike most incremental steps, however, they have the potential for leaps in understanding, because they reposition research toward novel disease applications. We demonstrate the tool by identifying scientists who could contribute to understanding the cellular role of genes with novel associations with Alzheimer's disease; these genes have not been thoroughly characterized, in part because of the funding emphasis on established research.
- Noah Wolfe, "Extreme-Scale Network Simulation: Methods for Effective Utilization of Next Generation HPC Interconnection Networks"

  There is considerable uncertainty about the optimal selection, configuration, and interconnection of the compute architectures needed to achieve extreme-scale computing. Two approaches are scheduled to be tested in the next line of supercomputers, each with unique and relatively unknown network performance requirements under its parallel HPC workloads. One is a system composed of many homogeneous nodes, each using one or more Intel Xeon Phi (MIC) processors, which yields relatively little compute power per node. The other is a system with a small number of dense heterogeneous nodes, each composed of multiple GPUs and CPUs. To keep up with increasing compute power, HPC system designers must weigh many options for improving network performance to match. Configurable parameters include network topology (Fat-Tree, Dragonfly, Slim Fly, Torus), switch technology (FDR, EDR, HDR InfiniBand), and additional network rails. Using real-world HPC application traces from the DOE Design Forward suite, this work seeks to quantify and analyze methods for efficiently utilizing interconnection networks in HPC systems.

  Noah Wolfe is a fourth-year Ph.D. student in the Computer Science Department. Noah started at RPI in August 2013 and is working with Professor Carothers on parallel discrete event simulation, HPC network simulation, and application development and optimization for non-CPU hardware architectures, including Intel many-core processors, Nvidia GPUs, and the IBM TrueNorth neuromorphic processor.

Resources
----------

- CRA-W workshops on graduate skills, with presentations on topics such as Master's vs. Ph.D., publishing your research, finding an advisor, balancing personal and professional life, career paths, building self-confidence, building a professional persona, and finding a research topic (look for slides under the different workshops and cohort programs)
  http://cra.org/cra-w/resources/resources-from-past-events/
- UToronto Graduate Skills seminar
  http://www.dgp.toronto.edu/~hertzman/courses/gradSkills/2010/
- How to do good research and get it published and cited (Eamonn Keogh)
  http://www.cs.ucr.edu/~eamonn/Keogh_SIGKDD09_tutorial.pdf
- Critical Questions for Research Proposals (a.k.a. Heilmeier questions)
  http://www.design.caltech.edu/erik/Misc/Heilmeier_Questions.html
- How to choose a research topic
  http://www.cs.waikato.ac.nz/GradConf/talks/bruce/ChoosingTopic.pdf

Fall 2015 Schedule
-------------------

====== ==== ========== ========================== ==================== ============================================
Date   Time Location   Topic                      Speaker              Slides
====== ==== ========== ========================== ==================== ============================================
Oct 9  4 PM Lally 104  What is research?          `Mohammed Zaki `_    `PDF <_static/fall2015/research.pdf>`_
Oct 30 4 PM Lally 104  Graduate school and beyond `Petros Drineas `_   `PDF <_static/fall2015/graduate_life.pdf>`_
Nov 10 4 PM Eaton 214  Writing papers             `Fran Berman `_      `PDF <_static/fall2015/writing.pdf>`_
Dec 8  4 PM Lally 102  Giving talks               `Jim Hendler `_
====== ==== ========== ========================== ==================== ============================================