Data Systems Group

  • Stefanos Baziotis

    Stefanos Baziotis (2024 Data Systems Intern)

    I am a third-year CS PhD candidate at the University of Illinois Urbana-Champaign (UIUC). My advisor is Prof. Charith Mendis. I also work closely with Daniel Kang. My research aims at accelerating data systems through compilers/programming languages. More broadly, my interests extend to performance engineering and distributed systems. I love composing music (mostly for electric guitar) and reading books (of any kind as long as it’s good).

     

    Maximilian Kuschewski

    Maximilian Kuschewski (2024 Data Systems Intern)

    Maximilian Kuschewski is a third-year Ph.D. student at the Technical University of Munich working with Viktor Leis. He completed his bachelor’s degree in computer science and engineering in 2018, and his master’s degree in software engineering in 2021. His research interests include efficient query processing, modern NVMe SSDs, and cloud-native data analytics.

     

    Wan Shen Lim

    Wan Shen Lim (2024 Data Systems Intern)

    Hi! I’m Wan. I’m finishing my fourth year as a PhD student in the Carnegie Mellon University Database Group, advised by Prof. Andy Pavlo. My research focuses on database systems and machine learning for systems. Recently, I’ve been thinking about where the application of machine learning weakens accuracy requirements (e.g., in executing queries to obtain training data). My current life goal is to visit a capybara onsen.

     

    Jie Liu

    Jie Liu (2024 Data Systems Intern)

    I am Jie Liu, a 5th-year PhD candidate in the Computer Systems Laboratory (CSL) at Cornell University, working under the supervision of Prof. Zhiru Zhang. My research mainly focuses on programming abstractions and compiler optimizations for processing sparse data on heterogeneous computing platforms.

     

    Elena Milkai

    Elena Milkai (2024 Data Systems Intern)

    Hello! I’m Elena Milkai, a PhD student at the University of Wisconsin-Madison where I am fortunate to be advised by Prof. Xiangyao Yu and Prof. Jignesh M. Patel. My research focuses on real-time analytics for cloud-native databases and benchmarking HTAP databases. In my free time I enjoy running and cooking. Sometimes I even combine the two, sprinting to the kitchen, hoping my cake hasn’t turned into coal.

     

    Hubert Mohr-Daurat

    Hubert Mohr-Daurat (2024 Data Systems Intern)

    Hi. I am Hubert, a 3rd-year PhD student at Imperial College London, advised by Holger Pirk. My research revolves around high-performance data processing systems with a focus on DBMS extensibility and composability. In particular, I have worked on the use cases of in-kernel data imputation and CPU-GPU co-processing. In a previous life, I was a video game programmer specialized in animation technologies. In my free time, I like to play music, repair old game consoles, practice taekwondo, and play Fortnite with my daughter.

     

    Nikhil Pimpalkhare

    Nikhil Pimpalkhare (2024 Data Systems Intern)

    Hey! I’m Nikhil, a 3rd-year PhD student at Princeton University, where I’m researching program analysis techniques with Professor Zachary Kincaid. Outside of work, I like running, playing chess, and playing piano.

     

    Danrui Qi

    Danrui Qi (2024 Data Systems Intern)

    I am Danrui Qi, a 4th-year Ph.D. candidate at Simon Fraser University under the supervision of Prof. Jiannan Wang. Before joining Simon Fraser University, I completed my Bachelor’s and Master’s degrees at Tsinghua University, working with Prof. Shaoxu Song. My research interests primarily focus on good data for AI and AI for good data, encompassing:

    • Automated data preparation, with a specific focus on the automatic augmentation of features for complex relational tables.
    • Text2SQL methodologies utilizing Large Language Models (LLMs).
    • Automated Machine Learning (AutoML), especially automating the feature preprocessing part in the AutoML workflow.
    • The application of Bayesian Optimization and Reinforcement Learning techniques in the realm of Data Preparation.

    Recently, I have also developed a keen interest in Business Intelligence powered by Large Language Models (LLMs).

     

    Aneesh Raman

    Aneesh Raman (2024 Data Systems Intern)

    I am Aneesh Raman, a PhD student at Boston University advised by Prof. Manos Athanassoulis. My research focuses on harnessing the benefits of inherent data order in database systems: exploiting near-sortedness or other pre-existing structure in data as a “resource” to accelerate indexing and joins, and reduce redundant effort.

     

    Richard Wen

    Richard Wen (2024 Data Systems Intern)

    I’m a first-year PhD student at the University of Maryland under the guidance of Laxman Dhulipala. My primary area of research is in databases, with a recent focus on vector similarity search.

     

    Haike Xu

    Haike Xu (2024 Data Systems Intern)

    Hi! My name is Haike Xu, and I am a second-year PhD student in EECS at MIT, advised by Prof. Piotr Indyk. My research interests include algorithm design and machine learning. Currently, I am focused on designing and analyzing nearest neighbor search algorithms, as well as exploring their applications in large language models.

     

    Chenhao Ye

    Chenhao Ye (2024 Data Systems Intern)

    I am a Ph.D. student at the University of Wisconsin-Madison, advised by Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau. I am broadly interested in computer systems and databases. My recent work focuses on building reliable and high-performance storage systems.

     

    Jiongli Zhu

    Jiongli Zhu (2024 Data Systems Intern)

    Hi! I am Jiongli. I am a second-year Ph.D. student at the University of California San Diego, advised by Prof. Babak Salimi. My current research focuses on understanding and mitigating the effect of data quality issues on downstream ML models. I enjoy playing badminton, board games and video games in my spare time.

     
  • Jiashen Cao

    Jiashen Cao (2023 Data Systems Intern)

    Hi! I am Jiashen. I am a third-year Ph.D. student at Georgia Tech. I am currently co-advised by Prof. Hyesoon Kim and Joy Arulraj. I am broadly interested in using modern hardware to accelerate database workloads. I am particularly excited about leveraging the power of GPUs for both relational database workloads and emergent workloads like video analytics. During my spare time, I like playing basketball and spending time with my westie puppy.

     

    Xinyi Chen

    Xinyi Chen (2023 Data Systems Intern)

    Hi! I am Xinyi. Currently, I am a fourth-year PhD student at the University of Pennsylvania advised by Prof. Vincent Liu. My interests include both database and distributed systems. Recently, I have been working on reducing network expense within cloud database systems. I am new to hiking, and I am happy to join any hiking trips. I also enjoy trying different food in new restaurants.

     

    Grace Fan

    Grace Fan (2023 Data Systems Intern)

    Hi everyone, I am Grace Fan, a third-year PhD student at Northeastern University. I am advised by Professor Renee J. Miller and my research largely focuses on dataset discovery in data lakes. Outside of research, I enjoy hiking and playing with my cat.

     

    Victor Giannakouris

    Victor Giannakouris (2023 Data Systems Intern)

    Hi everyone! I am Victor, a second-year PhD student at Cornell, fortunate to be advised by Immanuel Trummer. My research is focused on applying machine learning to develop adaptive query optimizers for distributed, federated query engines. I also like climbing mountain summits/peaks, trail running, lifting weights and exploring breweries.

     

    Xiangpeng Hao

    Xiangpeng Hao (2023 Data Systems Intern)

    I’m Xiangpeng Hao, a third-year PhD student at the University of Wisconsin-Madison, advised by Xiangyao Yu. My research focuses on larger-than-memory database systems with a recent emphasis on two-tier memory (i.e., remote memory). I enjoy building reliable systems, and I have two cats 🐈‍⬛🐈.

     

    Yongjun He

    Yongjun He (2023 Data Systems Intern)

    I am a second-year PhD student in the Systems Group of the Department of Computer Science at ETH Zürich under the supervision of Prof. Ce Zhang, Prof. Dr. Gustavo Alonso and Dr. Theodoros Rekatsinas. My research interests lie at the intersection of data management, machine learning, and systems, with a focus on efficient training, inference, and fine-tuning of large language models. Before joining ETH Zürich, I worked on database storage engines and data preparation in the Data Science Research Group at Simon Fraser University, and was co-advised by Prof. Tianzheng Wang and Prof. Jiannan Wang. I’m excited to be interning at Microsoft Research this summer and working on Approximate Nearest Neighbor Search with Dr. Jonathan Goldstein, Dr. Arvind Arasu, and Dr. Silu Huang!

     

    Eugenie Lai

    Eugenie Lai (2023 Data Systems Intern)

    I’m a 3rd-year PhD student advised by Prof. Michael Cafarella in the Data Systems Group at MIT CSAIL. My current research interests are in data preparation and preprocessing to improve data quality, with downstream tasks in mind, such as ML modelling and data analysis.

     

    Peng Li

    Peng Li (2023 Data Systems Intern)

    Hi! My name is Peng Li. I am a 4th year PhD student at Georgia Tech advised by Prof. Xu Chu and Prof. Kexin Rong. My research focuses on machine learning and data management. Outside of research, I enjoy traveling and hiking.

     

    Wan Shen Lim

    Wan Shen Lim (2023 Data Systems Intern)

    Hi! I’m Wan. I’m finishing my third year as a PhD student in the Carnegie Mellon University Database Group, advised by Prof. Andy Pavlo. My research focuses on database systems and machine learning for systems, with a recent emphasis on optimizing the training data generation process. I’m fond of trying to poach interesting techniques from other fields. In my free time, I love watching videos of cute animals (mostly cats, dogs, and capybaras). My favorite subreddit is currently /r/OneOrangeBrainCell. I also enjoy experimenting with new food combinations in the kitchen.

     

    Ziyun Wei

    Ziyun Wei (2023 Data Systems Intern)

    Hi! My name is Ziyun Wei, and I am a fifth-year Ph.D. candidate at Cornell University, advised by Prof. Immanuel Trummer. My research focuses on parallelizing adaptive query processing and applying optimization models to interactive data exploration. I am passionate about finding ways to make data analysis more efficient and accessible. When I’m not working, I enjoy playing soccer and exploring new restaurants. Additionally, I am an experienced cat babysitter and would be happy to help take care of your furry friends when you’re away.

     

    Xinjing Zhou

    Xinjing Zhou (2023 Data Systems Intern)

    Hi! I am Xinjing Zhou, a second-year PhD student in the MIT Data Systems Group advised by Prof. Michael Stonebraker. My research interests include database systems and operating systems. Recently, I have focused on making database systems more cost-effective and secure. Outside of work, I like petting cats, watching soccer games, and hiking.

     

  • Ishtiyaque Ahmad

    Ishtiyaque Ahmad (Data Systems Intern 2022)

    Hi, I am Ishtiyaque, a third-year PhD student at the University of California Santa Barbara. I am co-supervised by Professors Divy Agrawal, Amr El Abbadi, and Trinabh Gupta. My research interests lie at the intersection of distributed systems, data management, and applied cryptography. More specifically, my research goal is to break the prevailing tension between scalability and privacy in present-day technological solutions.

     

    Maureen Daum

    Maureen Daum (Data Systems Intern 2022)

    My name is Maureen, and I am an upcoming 5th year PhD student at the University of Washington advised by Magda Balazinska. My research interests include database systems, and I focus on systems to support efficient video analytics. Outside of work I enjoy running, hiking, and reading thriller novels of questionable literary value.

     

    Tanmay Gautam

    Tanmay Gautam (Data Systems Intern 2022)

    Hello! My name is Tanmay and I am a Ph.D. student in EECS at UC Berkeley, where I am fortunate to be advised by Professor Somayeh Sojoudi. My research interests are broadly focused on applying concepts from optimization theory to tackle various challenges in the domains of reinforcement learning and deep learning. Within reinforcement learning, some topics that excite me include safe RL and working on establishing theoretical guarantees (e.g. convergence, sample efficiency) for current RL algorithms. Within the realm of deep learning, I’m particularly fascinated by the notion of implicit layers.

     

    Zeyuan Hu

    Zeyuan Hu (Data Systems Intern 2022)

    My name is Zeyuan Hu. I’m an upcoming 3rd-year PhD student at UT-Austin advised by Daniel P. Miranker. My research focuses on query optimization and processing in both the relational and graph models. I like reading papers with practical impact guided by strong theoretical insights. I have worked with DB2 at IBM and Presto at AWS in my career. I’m fairly familiar with federated database architectures. Outside of work, I like playing board games (I have a full collection of Arkham Horror: the Card Game) and enjoy spending time with my wife and our Australian cattle dog, Nico. We recently moved to the Seattle area. Marymoor Park is our favorite spot so far. Looking forward to getting to know all of you.

     

    Peng Li

    Peng Li (Data Systems Intern 2022)

    My name is Peng Li. I am an upcoming 4th-year PhD student at Georgia Tech advised by Prof. Xu Chu. My research interests include machine learning, data cleaning and data integration. Outside of research, I enjoy traveling and hiking.

     

    Yiming Lin

    Yiming Lin (Data Systems Intern 2022)

    I am Yiming Lin, a PhD student at the University of California, Irvine, advised by Prof. Sharad Mehrotra. My research focuses on data cleaning and query processing. During my internship at DMX, I will work closely with Yeye on the data preparation project to automatically find joinable column pairs in large datasets. Outside of research, I love playing badminton, hiking, traveling and taking pictures.

     

    Yin Lin

    Yin Lin (Data Systems Intern 2022)

    My name is Yin Lin. I am a 3rd-year Ph.D. student at the University of Michigan, Ann Arbor advised by Prof. H. V. Jagadish. My research focuses on data equity systems preventing misuse and misinterpretation of big data. In my spare time, I enjoy outdoor activities and playing badminton.

     

    Jianan Lu

    Jianan Lu (Data Systems Intern 2022)

    My name is Jianan. I am a third-year PhD student at Princeton University, advised by Prof. Michael J. Freedman. My research focuses on building fast and cost-efficient storage systems using emerging memory and storage technologies.

     

    Amine Mhedhbi

    Amine Mhedhbi (Data Systems Intern 2022)

    I am a Ph.D. student in the Data Systems Group at the University of Waterloo advised by Prof. Semih Salihoglu. My research focuses on developing novel query processing, optimization, and storage techniques for queries over graph-structured relations. Specifically, I’ve focused on optimizing worst-case optimal join orderings and query processing on succinct representations.

     

    Sumit Kumar Monga

    Sumit Kumar Monga (Data Systems Intern 2022)

    I am a Ph.D. student at Virginia Tech, advised by Prof. Changwoo Min. My research interests primarily include distributed systems and certain aspects of operating systems. I am currently working on composing the superfast network and storage stacks to rethink the design of datacenter services.

     

    Karan Newatia

    Karan Newatia (Data Systems Intern 2022)

    I am Karan Newatia, a second-year PhD student at the University of Pennsylvania, where I am co-advised by Prof. Andreas Haeberlen and Prof. Linh Phan. My research interests include distributed systems, privacy, and applied cryptography. In my spare time, I enjoy playing cricket and table tennis, watching soccer, and cooking vegan food.

     

    Soujanya Ponnapalli

    Soujanya Ponnapalli (Data Systems Intern 2022)

    I am a PhD student at UT Austin advised by Prof. Vijay Chidambaram. My research interests are in distributed storage systems. Previously, I have worked on building efficient key-value stores and file systems for emerging applications and hardware like public blockchains and Persistent Memory. Outside of research, I have the most fun outdoors or doing anything that involves dancing or singing! I enjoy playing tennis and badminton, and am always up for board games 🙂

     

    Sajjad Rahnama

    Sajjad Rahnama (Data Systems Intern 2022)

    I am Sajjad, a fourth-year Ph.D. student at UC Davis advised by Mohammad Sadoghi. My research focuses on Scalable Byzantine Fault Tolerant protocols, Permissioned Blockchains, and Distributed Transaction Processing. I enjoy cooking Persian food and playing soccer (and FIFA too).

     

    Sambhav Satija

    Sambhav Satija (Data Systems Intern 2022)

    I am Sambhav, a second year PhD student at UW-Madison where I am advised by Prof. Andrea Arpaci-Dusseau and Prof. Remzi Arpaci-Dusseau. I focus on analyzing bottlenecks in existing distributed storage systems. In my spare time, I sleep, cycle, and mess with my sleep cycle.

     

    Nikolaos Tziavelis

    Nikolaos Tziavelis (Data Systems Intern 2022)

    I am Nikos, a fourth-year PhD student at Northeastern University, advised by Prof. Mirek Riedewald and Prof. Wolfgang Gatterbauer. My research spans both theoretical and practical aspects of data systems, with the goal of extending them with improved algorithms that achieve non-trivial guarantees. I am currently working on query optimization.

     

    Xiaoying Wang

    Xiaoying Wang (Data Systems Intern 2022)

    My name is Xiaoying Wang. I am a second-year PhD student at Simon Fraser University, advised by Prof. Jiannan Wang. My research interests are query optimization and speeding up data preparation.

     

    Weiyuan Wu

    Weiyuan Wu (Data Systems Intern 2022)

    Weiyuan Wu is a PhD student at SFU. His research interests include ML Debugging, Data Intensive Systems and blockchain. He likes to play video games and watch movies in his spare time.

     

  • Wentao Cai

    Wentao Cai (Data Systems Intern 2021)

    My name is Wentao Cai. I’m a research intern working on recoverability of distributed systems. I’m currently working remotely from Palo Alto. I’m a PhD candidate at the University of Rochester advised by Prof. Michael Scott. My research interests are concurrent data structures and persistent memory.

     

    Saehan Jo

    Saehan Jo (Data Systems Intern 2021)

    My name is Saehan Jo and I am a Ph.D. student at Cornell University, advised by Prof. Immanuel Trummer. My research focuses on compression and approximation of database objects, e.g., relational data or query workloads. In my spare time, I enjoy dancing, reading novels, and watching soccer.

     

    Konstantinos Kanellis

    Konstantinos Kanellis (Data Systems Intern 2021)

    I am Konstantinos (or Kostas for short), a PhD student at the University of Wisconsin-Madison, advised by Shivaram Venkataraman. My current research focuses on instance-optimized systems: exploring ways to “squeeze” more performance out of existing data systems, by devising faster, more robust, and more practical automatic tuning methods.

     

    Moe Kayali

    Moe Kayali (Data Systems Intern 2021)

    I’m Moe, a PhD student with the databases group at UW, working under Dan Suciu. My research focuses on data synopses and approximate queries. In the past, I’ve also worked on causal inference. I’m located in Seattle at the moment.

     

    Sekwon Lee

    Sekwon Lee (Data Systems Group Intern 2021)

    My name is Sekwon Lee, and I am a third-year Ph.D. student at the University of Texas at Austin, advised by Prof. Vijay Chidambaram. My research aims at building next-generation storage systems on emerging storage technologies and architectures with a focus on high performance and crash consistency. In particular, I have been working on designing high-performance crash-consistent index structures on Persistent Memory (PM) and data stores for PM disaggregation.

     

    Tianyu Li

    Tianyu Li (Data Systems Intern 2021)

    My name is Tianyu, and I am a second-year PhD student at MIT working with Prof. Sam Madden. I work on new storage guarantees and programming models in the cloud.

     

    Gang Liao

    Gang Liao (Data Systems Intern 2021)

    I’m Gang, a third-year PhD student at the University of Maryland working with Prof. Daniel Abadi. My research focuses on schema evolution, distributed file systems, and serverless streaming.

     

    Amine Mhedhbi

    Amine Mhedhbi (Data Systems Intern 2021)

    I am a fourth-year Ph.D. student at the University of Waterloo, advised by Semih Salihoglu. My research aims at designing novel relational query processing and optimization techniques for large-scale graph data management. Specifically, I’ve focused on optimizing worst-case optimal join orderings and query processing on succinct representations. I’ve integrated my work in a new DBMS I am co-creating called GraphflowDB (graphflow.io).

     

    Zhengjie Miao

    Zhengjie Miao (Data Systems Intern 2021)

    My name is Zhengjie Miao. I am a PhD candidate at Duke University advised by Prof. Sudeepa Roy. My research focuses on explanations in databases that reduce user effort in getting insights into their data. I am also interested in machine learning techniques for data integration and data preparation.

     

    Supun Nakandala

    Supun Nakandala (Data Systems Intern 2021)

    My name is Supun Nakandala. I am a Ph.D. candidate at University of California San Diego advised by Prof. Arun Kumar. My research interest lies broadly in the intersection of Systems and Machine Learning, an emerging area that is increasingly referred to as Machine Learning Systems. My Ph.D. research work focuses on developing query optimization-inspired abstractions, algorithms, and systems to improve efficiency, scalability, and usability of deep learning workloads.

     

    Parimarjan Negi

    Parimarjan Negi (Data Systems Intern 2021)

    I am a fourth year PhD student at MIT advised by Professor Mohammad Alizadeh, and I also work closely with Professor Tim Kraska. I have been working on introducing learned components to database query optimizers, such as cardinality estimators, or the query planner.

     

    Tarikul Islam Papon

    Tarikul Islam Papon (Data Systems Intern 2021)

    Hello! I am Tarikul Islam Papon, a second-year PhD student at Boston University, advised by Prof. Manos Athanassoulis. My research interests broadly include Data Systems, specifically focusing on hardware-aware data management challenges, stemming from the evolution of storage and memory devices. Currently I am working on proposing a new I/O model which can capture the read/write asymmetry and access concurrency of contemporary devices.

     

    Leonhard Spiegelberg

    Leonhard Spiegelberg (Data Systems Intern 2021)

    Hello world! My name is Leonhard, and I am a PhD candidate at Brown University, co-advised by Malte Schwarzkopf and Tim Kraska. My research focuses on making big data systems more robust and less painful to use, as well as faster and more efficient using speculative compilation techniques.

     

    Nihal Vatandas

    Nihal Vatandas (Data Systems Intern 2021)

    I’m a cryptography researcher and a PhD student in computer science at the City University of New York. I’m working under the supervision of Prof. Rosario Gennaro. Areas in which I have past research experience include Computational Entropy, Lattice-based Cryptography, Authenticated Key Exchange, and Deniable Key Exchange. My most recent study focuses on the deniability of the Signal Protocol.

     

    Ziyun Wei

    Ziyun Wei (Data Systems Intern 2021)

    Hi! My name is Ziyun Wei, and I am a third-year Ph.D. candidate at Cornell University, advised by Prof. Immanuel Trummer. My research focuses on parallelizing adaptive query processing in database systems and applying optimization models to make data analysis more efficient and accessible.

     

    Jing Nathan Yan

    Jing Nathan Yan (Data Systems Intern 2021)

    My name is Jing Nathan Yan, and I am a second-year PhD student at Cornell working on improving machine learning pipelines through building relational data representations, robust machine learning fairness assessment, and efficient data preparation.

     

    Qizhen Zhang

    Qizhen Zhang

    I am a fifth-year PhD student at the University of Pennsylvania where I am co-advised by Vincent Liu and Boon Thau Loo. My research is centered around the clouds/data centers, involving both data systems and the network, two layers that are processing and transferring data at unprecedented scales.

     

  • Yiru Chen

    Yiru Chen

    Hi, everyone! My name is Yiru Chen. I am a second-year Ph.D. student at Columbia University, advised by Prof. Eugene Wu. I am broadly interested in technologies that help users at all technical levels quickly and effectively make sense of their data. Specifically, I am interested in the areas of databases, human data interaction, data explanation, visualization, and machine learning. This is my first time with the DMX group and my second time with Microsoft Research. When I am not at work, I enjoy playing the violin, watching shows, and playing sports.

     

    Nadiia Chepurko

    Nadiia Chepurko

    My name is Nadiia, and I am a PhD student at MIT. I recently joined the Computer Vision group, where I am advised by Antonio Torralba. My research interests include theoretical and applied aspects of machine learning. Currently, I am working on the interpretability and robustness of deep learning. Prior to that, I worked with David Karger in the HCI group and Theory of Computation group on projects related to applied machine learning and algorithms for randomized numerical linear algebra.

     

    Yannis Chronis

    Yannis Chronis

    My name is Yannis Chronis and I am a Ph.D. student at the University of Wisconsin-Madison, advised by Prof. Jignesh M. Patel. My research focuses on query optimization and improving query processing by using emerging hardware technologies.

     

    Aarati Kakaraparthy

    Aarati Kakaraparthy

    Hello! My name is Aarati (pronounced RT) and I’m advised by Prof. Jignesh Patel at the Department of Computer Sciences, University of Wisconsin-Madison. The core theme of my research is to make database systems adaptive using techniques from (but not limited to) machine learning. Our goal is to promote adaptivity to different workload properties, the execution environment (the cloud for instance), and emerging hardware technologies such as non-volatile memory and solid-state drives.

     

    Beibin Li

    Beibin Li

    I am Beibin Li, a third-year Ph.D. student at the University of Washington. My research interest is applying machine learning to solve real-world problems. I have worked on biomarker discovery for autism, oculomotor behavior analyses, medical imaging for cancer diagnosis, and feature exploration.

     

    Alana Marzoev

    Alana Marzoev

    I’m Alana Marzoev, a second year PhD student at MIT, where I work with Sam Madden, Jacob Andreas, and Frans Kaashoek on problems at the intersection of databases and NLP. Prior to grad school, I did two internships with the Cloud Infrastructure team at MSR Cambridge.

     

    Kapil Vaidya

    Kapil Vaidya

    My name is Kapil Vaidya and I am a Ph.D. student at the Massachusetts Institute of Technology (MIT), advised by Prof. Tim Kraska. My research focuses on improving data structures and algorithms using learning techniques.

     

    Junxiong Wang

    Junxiong Wang

    My name is Junxiong Wang, and I am a PhD student at Cornell University. My research focuses on leveraging reinforcement learning approaches (e.g., MCTS, bandits) for database optimization problems (e.g., query plan optimization and index selection). During my spare time, I enjoy traveling, hiking, and playing board games.

     

    Zuozhi Wang

    Zuozhi Wang

    My name is Zuozhi Wang and I am a third-year Computer Science PhD student at the University of California, Irvine, advised by Professor Chen Li. My research interests are in the areas of query optimization, incremental query processing, and distributed query processing with a focus on debuggability. During my spare time, I like playing piano and guitar.

     

    Jinfeng Xiao

    Jinfeng Xiao

    Hi! My name is Jinfeng Xiao, and I am a 2nd-year Computer Science Ph.D. student at the University of Illinois at Urbana-Champaign working with Dr. Jiawei Han in text mining. My current research focus is question answering and information retrieval, including related feature-generating methods like text classification, summarization, and knowledge graphs. In my spare time, I like traveling, playing badminton, reading novels, watching shows and enjoying pop music.

     

    Junwen Yang

    Junwen Yang

    I am a fourth-year PhD student in the systems group at the Department of Computer Science, the University of Chicago. I am working with Prof. Shan Lu on the performance and scalability problems of data processing and data analytics systems.

     

    Qizhen Zhang

    Qizhen Zhang

    My name is Qizhen Zhang, and I am a fourth-year PhD student at the University of Pennsylvania, co-advised by Vincent Liu and Boon Thau Loo. My research is centered around the clouds/data centers, including designing large-scale data systems for the cloud and new cloud architectures.

     

  • Firas Abuzaid

    Firas Abuzaid

    Hi, I’m Firas Abuzaid; I’m a 4th-year PhD student at Stanford, working with Profs. Peter Bailis and Matei Zaharia. My research primarily focuses on the intersection of systems and machine learning: how to take high-level machine learning tasks and build software systems to improve the efficiency and usability of these tasks without sacrificing their accuracy. This is my first year at Microsoft Research, and I’m excited to be working with Srikanth Kandula. When I’m not working, you can find me playing basketball, hiking, going for a run, or solving puzzles.

    Walter Cai

    Walter Cai

    My name is Walter Cai. I just finished my 3rd year at the University of Washington, where I focus on query optimization and the integration of open world knowledge in analytical systems. This is my first internship with DMX and I am working with Phil Bernstein and Wentao Wu on multiquery optimization in analytical streaming workloads. Like most Washingtonians I am an avid hiker, mountain climber, ultimate [frisbee] player, and overall outdoor enthusiast. Most weekends I will be forsaking my apartment and driving with abandon directly into the mountains.

    Beidi (Betty) Chen

    Beidi (Betty) Chen

    My name is Beidi (Betty) Chen, a fourth-year PhD student at Rice University working with Dr. Anshumali Shrivastava, who was an intern in DMX a while ago :). I have been working on large-scale machine learning algorithms, involving designing algorithms for efficient, accurate, and secure representation of data. Recently I also built a system, which blends smart randomized algorithms with multi-core parallelism and workload optimization, to train fully connected neural network models on a CPU faster than the best available GPU for some applications. I received my B.S. in EECS from UC Berkeley and did some research in log mining and sensor networks at that time.

    In my spare time, I play all kinds of sports: basketball, badminton, ping-pong, etc. Specifically, I play Dota 2 A LOT. I also enjoy exploring all kinds of food, especially now that I have this good opportunity to spend this summer in the Greater Seattle Area.

    Zhiwei (Hacker) Fan

    My name is Zhiwei (Hacker) Fan and I am a PhD Student in the Database Group at University of Wisconsin-Madison, advised by Prof. Paraschos Koutris. My primary research interests are in building scalable systems for complex data analytics with both analytical and learning-based approaches.

    Pedro Thiago Timbo Holanda

    I’m Pedro Holanda, and I’m a Ph.D. candidate in the Database Architectures group at the Centrum Wiskunde & Informatica (CWI). My research focuses on progressive/adaptive indexing techniques and in-database analytics. At MSR, I’m working on a compiled query execution engine for SQL Server. Hobby-wise, as a regular Brazilian, I enjoy investing my free time in producing memes.

    Xiao Huang
     
    My name is Xiao Huang. I am a Ph.D. student at Texas A&M University, supervised by Dr. Xia Hu. My research interests lie in network embedding, attributed network analysis, and knowledge graphs & analytics. During my internship at DMX, I will work closely with Dr. Yao Lu on the sketch project. In my spare time, I like playing ping-pong and swimming.

     

    Zhongjun Jin 

    I am Zhongjun, and you can also call me “Mark”. I am a PhD student at the University of Michigan, co-advised by Mike Cafarella and H. V. Jagadish. My research interest lies in improving self-service data preparation using a combination of AI, PL and HCI techniques. When not researching, I like trying all kinds of sports for fun. Fun fact about me: my friend Ankit said I might be the only Chinese person in Ann Arbor who can play cricket.

    Sulekha Kulkarni

    I am Sulekha Kulkarni, a fifth-year PhD student at the University of Pennsylvania, advised by Prof. Mayur Naik. My research interests are in making static analyses that find bugs in industry-strength software more scalable and more effective. At DMX, I am working with Dr. Suman Nath on static analysis techniques to automatically infer fault-injection scenarios in the tool Torch, which helps developers find obscure bugs. I find it very exciting to be here, in the midst of people who do high-impact research. Outside of research, I enjoy music, cooking and home-making.

    Guangpu Li

    My name is Guangpu Li. I am a fourth-year PhD student at the University of Chicago, working with Prof. Shan Lu. This is my second time as an intern here. My primary research is on attacking concurrency bugs in distributed systems. Concurrency bugs are a big threat to system reliability, which has become increasingly important. Developers need help to detect and fix these bugs. Besides this specific bug type, my broader interest is in applying programming language techniques to automatically extract software semantics.

    I am a big basketball fan, both playing and watching.

    Tianyu Liu

    I am a PhD candidate at the University of Wisconsin—Madison, advised by Professor Jin-Yi Cai. My research mainly focuses on the computational complexity of approximate counting and sampling. This summer, I am working in the DMX group on approximate query processing.

    Christina Pavlopoulou

    My name is Christina Pavlopoulou. I was born in Athens, the capital of Greece. I am a PhD candidate at the University of California, Riverside, where I am currently working on adaptive query optimization for Big Data Systems under the supervision of Professor Vassilis Tsotras. In the past, I have also worked in the fields of data mining and pattern recognition. This is my first time at Microsoft Research, where I am excited to be working with Phil Bernstein and Wentao Wu. In my free time, I enjoy going to the movies and cooking.

    Kexin Rong

    My name is Kexin Rong, and I am a fourth-year PhD student at Stanford working with Prof. Peter Bailis and Prof. Philip Levis. I am broadly interested in designing systems and algorithms to improve the efficiency and effectiveness of common data analytics such as visualization, similarity search and density estimation. This is my first visit to DMX and I am excited to work with Dr. Srikanth Kandula and Dr. Yao Lu this summer on approximate query processing.

    Tarique Ashraf Siddiqui  

    My name is Tarique Siddiqui, and I’m a PhD student in the Databases and Information System (DAIS) Group at the University of Illinois, Urbana Champaign, advised by Aditya Parameswaran. My research interests broadly lie in data management and analytics, with a focus on building expressive and scalable systems for interactive data exploration. Recently, I’ve also been working on applying learning-based techniques to improve the performance of big data analytics. This is my first time at DMX, where I’m excited to be working with Surajit and Vivek. Previously, I’ve interned twice at Cloud and Information Services Lab (CISL) at Microsoft as well as at Tableau Research. During my free time, I like reading poetry, swimming, and watching cricket.

    Jie Song

    My name is Jie Song, and I am a PhD student at the University of Michigan, Ann Arbor. I work with Prof. H. V. Jagadish on data usability, focusing on helping users with little to no technical proficiency make use of dataset search and preparation tools in an automated manner. This is my third internship in Seattle but my first one here at Microsoft. I love math and statistics and, needless to say, coding. Besides work, I enjoy books and movies. I also love traveling and do a lot of sports such as swimming, tennis, yoga, kickboxing and rock climbing. Recently, I started taking guitar classes.

    Qingyun Wu

    My name is Qingyun Wu and I am a fifth year PhD student from the University of Virginia. My research interests include multi-armed bandits, interactive online learning, sequential decision optimization and their applications to a wide spectrum of information service systems, including recommendation, online learning to rank and ads placement.  During my internship at DMX, I will be working closely with Dr. Chi Wang on exciting projects about autoML. Besides research, I enjoy reading books and outdoor activities.

    Cong Yan
     
    My name is Cong Yan and I’m a PhD student at the University of Washington. My research focuses on using program analysis to help improve the performance of in-memory database systems and help developers better reuse existing code in data preparation systems. This is my second time in DMX, where I work on the latter topic with Yeye He. I like the project and am excited about hacking open source code! When I’m not at work, I enjoy music and sports.

    Qizhen Zhang
     
    My name is Qizhen Zhang, and I am a third-year PhD student at the University of Pennsylvania, co-advised by Vincent Liu and Boon Thau Loo. My primary research interest is building systems for data processing at LARGE scale, e.g., in data centers. My past and ongoing work includes graph systems, query execution/optimization and data center networks. Outside of research, I like hiking, movies and working out.

     
     
     
    Tobias Ziegler
     
    My name is Tobias Ziegler, but everyone calls me Tobi. I am a PhD student at TU Darmstadt (Germany) advised by Prof. Binnig. My research mainly focuses on how to leverage remote direct memory access (RDMA) and fast networks in distributed database systems.

    Besides research, I am interested in all kinds of sports, from hiking to different team sports, especially handball. I am very excited to be a part of DMX and looking forward to learning from all the amazing people at MSR.

    Vasileios Zois

    My name is Vasileios Zois and I am a PhD Candidate at University of California – Riverside, advised by Vassilis Tsotras and Walid Najjar. I earned my B.S. in Computer Engineering and Informatics from University of Patras and my M.S. in Computer Science from University of Southern California. My research interests lie at the intersection of parallel processing/hardware acceleration and efficient query processing. In my spare time I like playing soccer, listening to music, playing video games, reading comics and watching movies.

  • Stella Giannakopoulou

    My name is Stella and I am a third-year PhD student at EPFL working with Anastasia Ailamaki. My research focuses on the efficiency and coverage problems of data cleaning. I got my B.S. and M.S. in Computer Science from National and Kapodistrian University of Athens. Besides research, I enjoy running and doing acrobatics.

    Anna Fariha

    I am a 2nd-year PhD student at the University of Massachusetts Amherst. I work in the Database group there with Prof. Alexandra Meliou. At UMass, I work on SQuID (squid.cs.umass.edu), a semantic-similarity-aware system for query intent discovery from user-provided examples. This is my first time doing an internship at MSR and I am very excited about it! My interests outside research include piano (unfortunately I don’t get a lot of time to practice), travelling, hiking, watching drama series, and reading books.

    Brian Hentschel

    My name is Brian Hentschel. I’m a graduate student at Harvard in the data systems laboratory, and my work there has alternated between time biased sampling, sketching algorithms for table scans, and predicting the performance of index structures. When not at work, I enjoy creative writing and improv theater, as well as hiking and biking. I’m excited to be here for the summer and looking forward to meeting everybody.

    Ashwin Naresh Kumar

    I’m Ashwin, a master’s student from the Language Technologies Institute at Carnegie Mellon University where I am advised by Florian Metze. Broadly, I am interested in Natural Language Understanding. I’m also a big fan of adventure sports and love to hike, play badminton and swim.

    I will be working closely with the Bing Answer experience team for my internship.

    Lin Ma

    My name is Lin Ma. I’m finishing my third year as a PhD student in the Database Group at Carnegie Mellon University, advised by Andy Pavlo.

    My research now focuses on autonomous optimization for relational database systems. The goal is to reduce the cost of human database administration by automating different aspects of the system, such as physical design, knob configuration, and data layout. I have been working on SQL workload prediction for RDBMSs, and database system design for autonomous operations. In my previous projects, I’ve also worked on larger-than-memory data management for In-Memory DBMSs with new hardware technologies.

    I got my bachelor’s degree from Peking University in China. I like reading, watching NBA games, and swimming when I’m free. Recently I have been enjoying the Westworld TV show. I’m very excited to work with everyone for the summer. I think it will be lots of fun!

    Yanqing Peng

    My name is Yanqing Peng. I am a PhD student at the University of Utah, advised by Feifei Li (Professor @ Utah). My research interest is in query processing and optimization in large-scale data management systems.

    My favorite sport is soccer, although I’m not a good player. I enjoy visual novels, photography and playing piano. I’m very happy to work as an intern at MSR. Nice to meet you!

    Matt Perron

    Matt Perron is a first year PhD student in the Database Group at MIT working with Michael Stonebraker and Samuel Madden. His research interests include query optimization and cloud data warehousing. Matt received his M.S. in Computer Science from Carnegie Mellon University in 2016 and his B.S. in Computer Science from Rochester Institute of Technology in 2013. Outside of research he enjoys reading, hiking, travel, and brewery or distillery tours.

    Tianze Shi

    I am a PhD student at Cornell University, where I am advised by Lillian Lee. My research interests span all aspects of human languages. Currently, I am focusing on syntactic structures. I combine deep learning-based representations and dynamic programming parsing algorithms to construct simple, efficient and effective dependency parsers.

    Outside research, I enjoy hiking, movies and novels.

    Ios Kotsogiannis Teftsoglou

    I am a PhD candidate in Computer Science at Duke University working under Ashwin Machanavajjhala. I am a data privacy researcher and my main focus is to create tools that bridge the gap between current innovations in privacy research and real-world systems where privacy is desired but not yet applied.

    Alejandro Tomsic

    My name is Alejandro and I am from Argentina. I just finished a PhD at Sorbonne University, Paris, advised by Marc Shapiro. My research is in distributed systems and, in particular, in distributed and replicated transactional storage. During this summer, I’ll be part of a project whose goal is to enrich the stream processing capabilities of Orleans, Microsoft’s framework to ease the development of distributed applications.

    I’m also into running, cycling and electronic music production.

    Pei Wang

    I am Pei Wang, a PhD student from Simon Fraser University. I am passionate about data integration topics. This summer I work with Dr. Yeye He on data cleaning.

    Apart from papers, I enjoy spending time on podcasts, Netflix, books, music and the gym.

    Taining Wang

    I am a PhD student at the National University of Singapore, advised by Professor Chee-Yong Chan. My research interests are in query processing, data sampling, and learning from data. I also obtained my Bachelor’s degree from NUS. I enjoy travelling and jogging, as well as playing board games with friends.

    Chen Zhao

    I’m Chen Zhao, a second year PhD student from University of Maryland, College Park, working with Jordan Boyd-Graber. My research interests are Natural Language Processing and Machine Learning. This summer I work with Dr. Yeye He on record conflation.

    I spend most of my spare time playing bridge, with several achievements in recent national tournaments.

    Zhuoyue Zhao

    My name is Zhuoyue Zhao. I am a second-year Ph.D. student from University of Utah and my advisor is Feifei Li. My research interest is in large-scale data management systems; in particular, approximate query processing and query optimization, OLTP and OLAP engines.

  • Matteo Brucato

    I’m a PhD candidate at the University of Massachusetts, Amherst, advised by Alexandra Meliou (UMass Amherst) and Azza Abouzied (NYU Abu Dhabi). My research interests include Data Management, Information Retrieval, and Natural Language Processing, with a focus on query languages, natural interfaces to databases, and algorithms for scalable query processing. Prior to joining UMass, I received my M.S. and B.S. degrees in Computer Science from the University of Bologna, Italy. When not doing research, I like traveling, cooking, playing guitar, and playing soccer.

    Lingjiao Chen

    I am Lingjiao Chen, a graduate student at the University of Wisconsin, Madison, who is broadly interested in data management, machine learning, and mathematical optimization. This summer I will be at MSR focusing on query optimization using machine learning techniques. In my spare time, I play ping-pong, video games, and chess, and I go hiking.

    Adam Dziedzic

    I’m a 2nd-year PhD student at the University of Chicago, advised by Prof. Aaron Elmore. I’m passionate about empowering users with data-driven decisions. Currently, my research is focused on data migration between diverse database systems and I work on the BigDAWG project. I obtained my Bachelor’s and Master’s degrees from Warsaw University of Technology. I also studied at DTU in Denmark and worked in the DIAS group at EPFL. Previously, I had internships at CERN and Barclays Investment Bank in London.

    I spend the rest of my waking hours reading books, taking MOOC courses (especially on Machine & Deep Learning), enjoying many sports and practicing salsa.

    Mohamed S. Hassan

    I am Mohamed S. Hassan, a PhD candidate and teaching fellow at Purdue University in Computer Science, advised by Prof. Walid G. Aref. In my PhD thesis, I work on dynamic graph data management and stream processing, where I extend real relational database systems with native graph support. Last summer, I worked with the Systems Research Group in MSR to design and deliver (in production) a high-scale system for automatic cloud troubleshooting based on stream processing. In my second internship at MSR, I am glad to join the DMX group to work on the AutoIndexer project, where I focus on tuning cloud databases automatically and adaptively based on dynamic query workloads. My homepage is https://www.cs.purdue.edu/homes/msaberab/index.html.

    Silu Huang

    I am a third-year PhD student at UIUC, working with Prof. Aditya Parameswaran. My research at UIUC focuses on the data versioning problem, which is rapidly emerging from collaborative data construction, curation and analytics among team members. Here in DMX I will be mainly working with Bolin and Chi on some AQP problems. This is my second time interning in the DMX group and I really love it!

    Zhipeng Huang

    I am Zhipeng Huang, a second year PhD student from the University of Hong Kong, supervised by Prof. Nikos Mamoulis. I am broadly interested in data management, data cleaning and data mining. During this summer I will be working with Yeye He on a problem of error detection on relational tables.

    Ryan Marcus

    I’m a Ph.D. student at Brandeis University, advised by Olga Papaemmanouil. My research focuses on applying machine learning techniques to problems in cloud databases and systems. Prior to joining Brandeis, I worked at Los Alamos National Laboratory, optimizing fluid dynamics and neutron transport codes for next-gen hardware.

    Christopher Meiklejohn

    Hello, I am Christopher Meiklejohn, a second year PhD student at two universities: Universite catholique de Louvain in Belgium supervised by Peter Van Roy and Instituto Superior Technico supervised by Rodrigo Rodrigues.  My research interests are distributed systems and programming languages, specifically the intersection between the two: how can we make it easier to build large-scale distributed applications with replicated, shared state.  Most of my work in the first year of my PhD has been around developing (and performing a large-scale evaluation of) a new model of distributed programming based on CRDTs: a distributed data structure used to ensure deterministic conflict resolution of concurrent updates to replicated state.  Before starting my PhD, I spent about 15 years in industry, most recently working on various distributed systems and databases at Basho Technologies, Mesosphere, and Machine Zone.

    Holger Pirk

    Until recently, Holger was a Postdoc in the Database group at MIT CSAIL. He received his PhD from the University of Amsterdam in 2015, where he worked in the Database Architectures group at CWI. His research interests lie in the optimization of data processing systems for modern hardware. In particular, he studies the efficient use of CPU features such as speculative execution, SIMD or transactional memory as well as emerging technologies such as GPGPUs and flash memory for analytical data processing. In addition to new algorithms and optimizations, Holger develops abstractions that allow the effective use of these low-level techniques in data processing systems.

    Utku Sirin

    Hi, I am a third year PhD student at EPFL, Switzerland, working with Anastasia Ailamaki. I am interested in data management on modern hardware. Here at DMX, I will be working with Vivek on multi-tenant databases and performance isolation. I like playing soccer, hiking and jogging. In my free time, I learn French!

    Dong Xie

    I am a first-year Ph.D. student at the University of Utah, advised by Prof. Feifei Li. Prior to joining U of U, I got my Bachelor’s degree from the ACM Honored Class of Shanghai Jiao Tong University. My research interests span data management (especially for spatio-temporal data), distributed systems, main-memory databases, and data privacy. In my spare time, I enjoy hiking and playing video games.

    Cong Yan

    I’m Cong Yan, a graduate student at the University of Washington advised by Alvin Cheung. My interest lies at the intersection of databases, programming languages and systems, and I now mainly focus on leveraging program analysis to help automate database optimizations.

    In my off-work hours, I devote most of my time to arts and sports. I enjoy playing the piano, learning architecture, playing softball, badminton and all kinds of sports, and, above all, making friends!

    Sam Zhao

    I am a PhD student at Brown University. I am interested in bringing together theories and systems to help people analyze and manage their data both efficiently and safely. To battle false discovery in data analysis, I help develop statistical safeguards in visualization software. I am also involved in building searchable encryption for end-to-end mobile text messaging in Signal to promote privacy. I enjoy hiking, skiing, and playing soccer and tennis.

    Zhuoyue Zhao

    I have been a Ph.D. student in the School of Computing at the University of Utah since August 2016. My advisor is Prof. Feifei Li and I’m a member of the data group. I hold a Bachelor’s degree from the ACM Class of Shanghai Jiao Tong University. My current research interest is in large-scale data management systems; in particular, approximate query processing and query optimization, OLTP and OLAP engines, and graph analytics and graph query processing systems.

  • Interns

    Dana Van Aken

    My name is Dana Van Aken and I am a third year PhD student at Carnegie Mellon University, advised by Andy Pavlo. My research interest is in database management systems, specifically on automatic database tuning. In my spare time I enjoy playing board games, traveling, and drinking coffee.

     Zhao Chang

    Hi, everyone. My name is Zhao Chang. I am a second year PhD student at University of Utah. My advisor is Prof. Feifei Li. My research interests include large-scale data management and data privacy. I like doing sports and reading...

    Amit Chavan

    Hi, I am Amit Chavan and I am a PhD student working on large scale data management at the University of Maryland, College Park. My research is about building sustainable and scalable tools for data analysts when they interact with data. I have worked on problems related to version control of large datasets and processing queries on said versioned data. Besides database research, I enjoy sci-fi (in any form :)) and photography. More info: http://www.cs.umd.edu/~amitc/.

    Yeounoh Chung

    Hello, I am Yeounoh Chung, a PhD student from Brown. My general research interests span a variety of topics in data exploration; at Brown, my work has been on quantifying uncertainty in data exploration. At MSR, I will be working closely with Christian Konig and Wentao Wu. For more information, please feel free to visit my page at https://cs.brown.edu/~yeounoh/.

    Mohammad Dashti

    I’m a 4th-year PhD student at EPFL University in Switzerland, advised by Prof. Christoph Koch. I am originally from Iran, where I got my bachelor’s and master’s degrees, both from Sharif University of Technology in Tehran. I’m interested in database systems (in particular, transaction processing) and programming languages (in particular, compilation techniques). Achieving high throughput for transaction processing with low latency while keeping the state strongly consistent is a hard problem. In my PhD, I am pushing the limits on this problem, mainly by applying compilation techniques in this context.

    Kolya Malkin

    I’m a second-year PhD student at Yale University. My Bachelor’s degree is from the University of Washington. Although most of my waking hours are spent studying algebraic geometry and graph theory, you’re also likely to find me in a coffee shop or at the top of a mountain.

    Lukas Maas

    My name is Lukas Maas and I’m a second-year PhD Student at Harvard University, where I am advised by Stratos Idreos. My research interests lie in the intersection of databases, compilers and software engineering. In particular, I am interested in how computer programs can help us design more robust and flexible data processing systems. To learn more about my research, please see my website: www.lukasmaas.com.

    Abolfazl Asudeh Naee

    I am a CS Ph.D. candidate at the University of Texas at Arlington and a member of UTA-DBXLAB, supervised by Dr. Gautam Das. My research interests include Query Reformulation, Top-k Indices, and Hidden Web Databases.

    Volleyball is what I love to play. More info: http://asudeh.github.io.

    Azade Nazi

    I am a 5th-year PhD student in the Database Exploration Lab (DBXLAB) at the University of Texas at Arlington. I am interested in different areas like Data Mining, Information Retrieval & Web Mining, Hidden Graph, Database, and Social Network Analysis. This is my second internship with the DMX group and I am really excited about it. In my free time, I like to play volleyball, ping pong, badminton, or attend group exercise classes.

    Tim Kiefer

    I finished my PhD in the Database Systems Group at TU Dresden, Germany last October… and afterwards immediately went to New Zealand for three months to enjoy the beautiful countryside while travelling by (and living in) a car. When I am not pursuing my research in the areas of load balancing, workload placement, or distributed data management systems in general, I love trampoline gymnastics and rock climbing (actually anything outdoors or sports-related).

    Yi Lu

    I am a first-year PhD student from MIT. I am working with Prof. Sam Madden on adaptive data partitioning. I obtained my master’s degree from the Chinese University of Hong Kong with a focus on distributed graph processing systems. Besides research, I enjoy hiking, playing ping-pong and watching movies.

    Bruhathi Sundarmurthy 

    Hello, my name is Bruhathi. I am a PhD Student at the University of Wisconsin-Madison. I am interested in both database theory and database systems. At school I work on problems related to uncertain and incomplete databases and at MSR I will be working on offline query scheduling. Besides work, I love watching and playing tennis, and I also enjoy playing the violin.

    Yue Wang

    Hi, I’m a PhD candidate at the University of Massachusetts Amherst, supervised by Prof. Gerome Miklau and Prof. Alexandra Meliou. I focus on data cleaning and completion time estimation.

    I love swimming and hiking. Reading and movie watching also bring me a lot of fun. More about me? https://people.cs.umass.edu/~yuewang/.

    Yudian Zheng

    Hi, all! My name is Yudian Zheng, a 3rd-year Ph.D. candidate from the University of Hong Kong. My research interests include a variety of topics such as leveraging human intelligence to solve complex tasks (crowdsourcing), cleaning dirty data (data cleaning) and mining patterns from web data (data mining).

    Besides research, I love watching sports games (especially football) and films. I also like playing computer games with Chinese martial characters. You may refer to my website (http://i.cs.hku.hk/~ydzheng2/) to know more about me.

    Erkang (Eric) Zhu

    I am a PhD student in Computer Science at the University of Toronto under the supervision of Professor Renée J. Miller.

    I am interested in data management (searching, integration, and analytics) techniques for data on the Web and Open Data. I enjoy programming and maintain a number of open source projects.

    In addition to computer related activities, I love traveling, cooking and hanging out with friends and family.

    Honglei Zhuang

    I am Honglei. I am a PhD student at the University of Illinois at Urbana-Champaign, supervised by Professor Jiawei Han. My research interests include outlier detection in networks and text data. I also work on crowdsourcing and social network analysis. Besides research, I am also enthusiastic about badminton.

  • Ahmed El-Kishky

    I am a second-year PhD student in the Data Mining Group at the University of Illinois at Urbana-Champaign where I’m advised by Professor Jiawei Han. Before joining UIUC, I obtained my Bachelor’s degree in Computer Science and Mathematics from The University of Tulsa.

    Generally I’m interested in unstructured data mining, more particularly deriving insight and uncovering hidden structure from large quantities of unstructured text.

    In my spare time I enjoy playing racquetball, going to the gym, hiking, traveling to new places, and of course reading.

    Anja Gruenheid

    I’m a third year PhD student at ETH Zurich, Switzerland, supervised by Donald Kossmann. My research focus is on data management and data integration, specifically on how data changes affect integration tasks. Here at MSR, I work in the related area of data cleaning.

    I like to travel a lot and I’m an enthusiastic photographer. I’m also a novice foodie who’s just learning to appreciate all the great cuisines out there.

    Ashish Tapdiya

    I am Ashish. I am currently a PhD student at Vanderbilt University, where I work with Dan Fabbri. This summer, I am interning in the DMX group and will be working with Vivek Narasayya to automate the performance management of SQL Server in the cloud. In my free time I like to hike, bike, watch movies, play tennis, etc.

    Azade Nazi

    I am a 4th-year PhD student in the Database Exploration Lab (DBXLAB) at the University of Texas at Arlington under the supervision of Dr. Gautam Das. I am interested in different areas like Data Mining, Information Retrieval & Web Mining, Hidden Graph, Database, and Social Network Analysis. In my free time, I like to play volleyball, ping pong, badminton, or attend group exercise classes.

    Fotios Psallidas

    Hello, everyone. I’m a third-year PhD student at Columbia University under the supervision of Prof. Luis Gravano. My research interests include (near) real-time structured knowledge discovery and exploration from noisy and possibly disparate sources. In my second internship at the DMX group I will be working with Surajit Chaudhuri and Vivek Narasayya on exciting and challenging problems of enterprise-centric knowledge discovery and searching.

    More details on my research interests @ http://www.cs.columbia.edu/~fotis/

    Keqian Li

    I’m Keqian Li, a master’s student at the University of British Columbia interested in both the analytics and system sides of big data. My advisor is Prof. Laks Lakshmanan. You can get a sample of my research interests from my homepage: http://www.cs.ubc.ca/~keqianli/. My industry internship experience has often involved large-scale data analysis. This summer at MSR I will be working with Kris and Yeye in the area of data cleaning. I like sports, good food, and traveling.

    Mayuresh Kunjir

    Mayuresh Kunjir

    Hi everyone! I’m Mayuresh Kunjir, a third-year PhD student at Duke University, advised by Dr. Shivnath Babu. At Duke, I work on big data analytics, specifically on cluster resource allocation and job scheduling. Here at Microsoft, I will be working on automated physical database tuning. I enjoy playing volleyball and tennis in my spare time. On weekends here, I would love to go hiking nearby.

    Shumo Chu

    Shumo Chu

    I am a PhD student in the Database Group at the University of Washington, Seattle. I am working with Dan Suciu and Magdalena Balazinska on parallel query processing. I previously worked with the Spanner group at Google as an intern.

    I enjoy dining around Seattle and indoor climbing/bouldering with friends in my spare time.

    Silu Huang

    Silu Huang

    My name is Silu Huang. I’m a first-year PhD student at UIUC, under the supervision of Prof. Aditya Parameswaran. My research interests lie in data analytics and data management. I obtained my master’s degree from the Chinese University of Hong Kong with a focus on graph algorithms. In my spare time, I like doing sports, reading books and watching movies.

    Vasileios Verroios

    Vasileios Verroios

    Hi everyone!! I am a PhD student at Stanford University and my advisor is Hector Garcia-Molina. My research lies at the intersection of crowdsourcing and data integration/exploration/mining.

    Xu Chu

    Xu Chu

    Hello, everyone. My name is Xu Chu. I am a PhD student in the Database Group at the University of Waterloo. My advisor is Ihab Ilyas. My research focuses on various aspects of data quality management. Example topics are entity resolution, data quality rule discovery and enforcement, human-involved data cleaning, and scalable data cleaning. In my free time, I like to watch some TV, hit the gym, and play badminton. This is my second internship with the DMX group, working with Yeye He!

    My homepage is: https://cs.uwaterloo.ca/~x4chu/

  • Interns

    Bailu Ding

    Bailu Ding

    I am a PhD student from Computer Science Department at Cornell. I am working with Johannes Gehrke on database systems.

    Xu Chu

    Xu Chu

    I am currently a 3rd year PhD student in Database Group at University of Waterloo, Canada. I am working with Prof. Ihab Ilyas. I am generally interested in structured data management. Recently I have been focusing on data cleaning, schema discovery, and data integration. My homepage is: https://cs.uwaterloo.ca/~x4chu/

    In my free time, I like to watch some television, play ping pong, badminton, or hit the gym.

    Yuliang Li

    Yuliang Li

    My name is Yuliang Li. I am a 2nd-year PhD student at UC San Diego. At UCSD, I am working with Alin Deutsch and Victor Vianu on databases and database theory. I also enjoy solving puzzles, formal logic and home cooking.

    Jennifer Ortiz

    Jennifer Ortiz

    My name is Jennifer Ortiz. I just finished my second year as a PhD student at the University of Washington, advised by Magdalena Balazinska. The main focus of my work is thinking about new ways to help users choose a Cloud service when they wish to explore their data. On the data science side, I have also been collaborating with astronomers to provide the tools they need to analyze and understand the merging history behind the galaxies that exist today. Other things I enjoy doing: drinking coffee, watching movies and spending time with my family.

    Fotis Psallidas

    Fotis Psallidas

    My name is Fotis Psallidas and I am a second year PhD student at Columbia University, working with Prof. Luis Gravano. Research-wise, I am interested in combining disparate sources under the goal of extracting interesting patterns. Sometimes I just give up waiting for exact solutions and I try to approximate them. Besides research, I spend time walking// watching movies-tv series// drinking coffee// and going places.

    Xiang Ren

    Xiang Ren

    I’m currently a 2nd year PhD student in the Data Mining Group at the University of Illinois at Urbana-Champaign, working with Prof. Jiawei Han. Before joining UIUC, I got my bachelor’s degree in Computer Science from Zhejiang University, China. My research mainly focuses on mining and constructing text-rich information networks, including applications like search, recommendation and structure enrichment in heterogeneous information networks. My home page is: http://web.engr.illinois.edu/~xren7/

    In my spare time, I like to play basketball and foosball, go to the gym, travel around and do some hiking. I’m also a food lover who will check out local restaurants for all kinds of great food :).

    Jayanta Mondal

    Jayanta Mondal

    I am a fourth year PhD student at the University of Maryland. I am a student of Prof. Amol Deshpande and I work in the area of processing real-time queries on large-scale graph-structured data. This is my second internship at DMX/MSR and I will be exploring physical database design with Sudipto Das this summer. Besides computer science, I enjoy trying out different types of sports, a recent addition being boxing. I also like travelling (visited 12 national parks so far), photography, and anything related to food (from exploring new ingredients to doing online courses).

    Saravanan Thirumuruganathan

    Saravanan Thirumuruganathan

    My name is Saravanan Thirumuruganathan (you can also call me Sara). I am a fourth-year PhD student from University of Texas at Arlington. My advisor is Prof. Gautam Das. I am interested in data exploration, analytics over hidden web databases and social content mining. In my spare time, I love reading books, writing code (and poems!) and taking MOOC courses from other fields. I’m excited to be a part of MSR this summer and sitting in 112/3325.

    Jingjing Wang

    Jingjing Wang

    I am a 3rd year PhD student at the University of Washington, working with Professor Magdalena Balazinska on databases. Before joining UW, I obtained my bachelor’s degree in Computer Science from Fudan University, China. I also interned at Microsoft Research Asia in 2010, working on web data extraction with Haixun Wang. My research interest generally lies in the area of database systems.

    In my free time, I enjoy listening to various kinds of music, watching anime and reading novels. I’m also a fan of sports: I play basketball, Ping-Pong, and other ball games, and also go hiking sometimes.

  • Visitors

    Eli Cortez

    Eli Cortez

    I am a Visiting Researcher in the Data Management, Exploration and Mining (DMX) group, which is part of the eXtreme Computing Group (XCG) of Microsoft Research. I received my Ph.D. degree in Computer Science from Federal University of Amazonas/Brazil in December 2012. My dissertation, titled “Unsupervised Information Extraction by Text Segmentation”, was awarded Best PhD Thesis defended in 2012 by the Brazilian Computer Society. During my PhD I founded, with some friends, a startup that provides e-commerce technologies such as search, classification and extraction. My broad research interest lies in the area of databases and data mining, more specifically, data exploration and information extraction.

    Interns

    Renata Borovica

    Renata Borovica

    I am a PhD student in the Data-intensive Applications and Systems Laboratory (DIAS) at Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland, supervised by Professor Anastasia Ailamaki. Before joining EPFL, I worked for several years for an IT company as a member of a database team, while obtaining my Master’s degree from the school of Computer Science and Automatics at the University of Novi Sad, Serbia.

    My research interests span the area of autonomous database management systems and scientific data management. In particular, I find my passion in the topic of robust query processing. Toward that end, I envision database systems as adaptive systems with the ability to heal themselves in order to provide robust and predictable query execution performance.

    I like spending my free time outdoors, either cycling, running, hiking, or simply enjoying a wonderful view. I am also a yoga and karate fan.

    More information can be found at http://people.epfl.ch/renata.borovica/bio?lang=en&cvlang=en

    Fabian Hüske

    Fabian Hüske

    Hi, my name is Fabian. I am a PhD student in the Database Systems and Information Management (DIMA) group at Technische Universität Berlin working with Volker Markl. I received a master’s degree in computer science from the University of Ulm, Germany. My research interests include massively parallel data processing and query optimization. Apart from research I like to spend time with my kids, try out new recipes, and play field hockey. Visit my website for more details: http://www.user.tu-berlin.de/fabian.hueske/.

    Feng Li

    Feng Li

    I am currently a 4th year PhD Student in the Department of Computer Science, National University of Singapore. I was honored to join the database group in 2009, supervised by Prof. Ooi Beng Chin. From 2005 to 2009, I studied in Peking University, Beijing, China and obtained my BSc degree from the Department of Electronic Engineering and Computer Science. This is my third internship in the DMX group and I am proud of this. My research interests are mainly in MapReduce, cloud computing and database systems, including indexing and query processing. I am also interested in microblog data processing.

    In my free time, I like to play Ping-Pong and swim. I also started to play tennis recently. My home page is http://www.comp.nus.edu.sg/~li-feng/.

    Lanyue Lu

    Lanyue Lu

    I’m a PhD student of Computer Sciences at University of Wisconsin – Madison. I work under the guidance of Prof. Andrea C. Arpaci-Dusseau and Prof. Remzi H. Arpaci-Dusseau, as a member of ADvanced Systems Lab (ADSL) and Wisconsin Institute on Software-defined Datacenters in Madison (WISDoM).

    My research interests include file & storage systems, operating systems, and cloud computing. You may reach my webpage here: http://pages.cs.wisc.edu/~ll/. I play basketball regularly to keep energetic.

    Jayanta Mondal

    Jayanta Mondal

    I am Jayanta Mondal, a PhD student at the University of Maryland, College Park, working with Professor Amol Deshpande. My primary research focus has been real-time processing on large volumes of graph-structured data, with the high-level goal of building end-to-end scalable systems in the cloud. Before I started my graduate study, I was involved with a start-up, and hence like to look at research problems from the point of view of both system developers and application users. Personally, I get excited by cool things and find myself hopping from one hobby to another. My recent passion has been bouldering. A couple of my longer-term hobbies have been photography (some of the photographs can be found here) and cooking.

    Markus Pilman

    Markus Pilman

    I am a 2nd year PhD student at ETH located in the nicest city worldwide (Zurich – for the very unlikely case you were wondering which one this might be). I am supervised by Donald Kossmann and I am working in the field of distributed databases. I am working there on a distributed shared memory database.

    Besides work, I am a fairly good skier (I grew up in a town in the Alps and it was a five minute walk from my home to the nearest ski resort) and I do bouldering (a kind of climbing).

    Sudip Roy

    Sudip Roy

    I am a 4th year PhD candidate in the Big Red Data Group at Cornell University. I am advised by Prof. Johannes Gehrke. My broad research interest lies in the area of databases and data mining. At Cornell, I am currently working on exploiting transaction semantics to improve performance of geo-replicated data stores. Further details about me and my research can be found at www.cs.cornell.edu/~sudip.

    I play different racquet sports, including squash, badminton, tennis and ping-pong, at different skill levels. I also enjoy hiking, kayaking, and am generally interested in exploring new places and activities.

    Yanyan Shen

    Yanyan Shen

    Hi, my name is Yanyan Shen, and I am working with Kaushik Chakrabarti & Surajit Chaudhuri on the relational data search project this summer.

    I am a PhD student from National University of Singapore, supervised by Prof. Ooi Beng Chin. I am interested in various aspects of databases, from web data management to cloud computing.

    It is so great for me to come to a place that has cool weather! And I really love it here! If anyone loves movies, music, shopping, hiking, or table tennis, I am very happy to talk and play with you. No basketball please, although I am a tall girl. 🙂

    Akash Das Sarma

    Akash Das Sarma

    I am a first year PhD student with Prof. Jennifer Widom at Stanford University. I have interned at MSR in the past (2011 summer with the Theory group) and am excited at the prospect of working in the DMX group this summer. I have most recently been working on crowd algorithms for human computing problems for my PhD, though I have dabbled in a number of other areas during my undergrad years. I enjoy playing chess, basketball and soccer and am always looking for new people to play with!

    Thrivikrama Taula

    Thrivikrama Taula

    I have finished the first year of my Master’s at the University of Illinois at Urbana-Champaign, where I did my research in Data Mining under Prof. Jiawei Han.

    My other interests include following various tech blogs and posts, financial markets, soccer (die-hard Manchester United fan!), and the singularity. In my free time, I like to play badminton, do yoga, or hit the gym.

    Tomasz Tylenda

    Tomasz Tylenda

    I am a PhD student at the Max Planck Institute for Informatics in Saarbrücken, Germany, where I work with Prof. Gerhard Weikum. My research interests include knowledge base exploration and information extraction. Prior to joining the Max Planck Institute, I studied in Poland at the University of Wrocław.

    In my free time I do sports that don’t use a ball, in particular cycling and rock climbing. I watch alternative movies and read mostly non-fiction.

    You can find my homepage at: http://www.mpi-inf.mpg.de/~ttylenda/

    Chi Wang

    Chi Wang

    I am a PhD candidate in my 5th year at the University of Illinois at Urbana-Champaign, advised by Jiawei Han. I interned at MSR Redmond/XCG in the last two summers and at MSR Asia in 2009.

    I am interested in mining latent entity structures to better organize linked information.

    Find more at http://web.engr.illinois.edu/~chiwang1/

    Sheng Wang

    Sheng Wang

    I am a PhD student in Computer Science at the National University of Singapore. I joined the database group in 2011, supervised by Prof. Beng Chin OOI. My research interests are mainly in cloud computing and database systems, including indexing, query processing and data management, especially for supporting write-intensive workloads. Details can be found on my web page: http://www.comp.nus.edu.sg/~wangsh/

    I like to play all kinds of sports in my spare time. I particularly enjoy billiard sports: pool, snooker, carom and any possible variants.

    Mohan Yang

    Mohan Yang

    I am a second year Ph.D. student supervised by Professor Carlo Zaniolo in the Department of Computer Science at UCLA. I obtained my B.E. degree in computer science from Shanghai Jiao Tong University in 2010. Prior to joining UCLA, I worked in a startup company with my friends. My research interests include database systems and data mining. I enjoy kayaking and cycling in my free time.

  • Post-Doc

    Vadim Savenkov

    Vadim Savenkov

    I just finished my PhD at Vienna University of Technology, under the supervision of Reinhard Pichler. My research focus is on information integration. In particular, I’ve been working on management and optimization of schema mappings and on algorithms for data exchange (visit my homepage to learn more). I also hold a Master’s degree in Computational Logic (Vienna University of Technology and Dresden University of Technology) and an Engineer’s degree from Bauman Moscow State Technical University. Here at Microsoft Research, I will be working with Phil Bernstein on extending the ADO.NET Entity Framework.

    Interns

    Aaron Elmore

    Aaron Elmore

    I am a third year PhD student at UC Santa Barbara, under the supervision of Divy Agrawal and Amr El Abbadi. I have a BS from DePaul University and an MS from the University of Chicago. My primary research focus is on building tools and primitives for elastic multitenant databases. I am also exploring data solutions for ecologists at the National Center for Ecological Analysis and Synthesis (NCEAS), and data consistency in multi-datacenter and mobile environments. More information can be found at http://cs.ucsb.edu/~aelmore.

    Jeffrey Jestes

    Jeffrey Jestes

    Hello everybody! I am a fourth year PhD student studying in the School of Computing at the University of Utah with Feifei Li. This is my second internship at Microsoft Research, and I am very excited to work with Kris Ganjam in the DMX group. My general research interests are summarizing massive data in distributed and parallel frameworks (such as MapReduce); ranking, monitoring, and tracking big data; and scalable query processing in large databases. I am also interested in text processing and uncertainty in data. To learn more about my research you can visit my homepage at http://www.cs.utah.edu/~jestes/.

    Feng Li

    Feng Li

    I am currently a 3rd year PhD Student in the Department of Computer Science, National University of Singapore. I was honored to join the database group in 2009, supervised by Prof. Ooi Beng Chin. From 2005 to 2009, I studied in Peking University, Beijing, China and obtained my BSc degree from the Department of Electronic Engineering and Computer Science. My research interests are mainly in cloud computing and database systems, including indexing and query processing. I am also interested in microblog data processing. My home page is http://www.comp.nus.edu.sg/~li-feng/.

    Semih Salihoglu

    Semih Salihoglu

    I am a third year PhD student at Stanford University, advised by Jennifer Widom. I’m mainly interested in systems and algorithms for distributed/parallel graph computations. Before my PhD, I was a software engineer at Google. Outside of work, I enjoy playing soccer and going running.

    Tallat Shafaat

    Tallat Shafaat

    I am a final year Ph.D. student at KTH – Royal Institute of Technology, Sweden. I’m working on large-scale distributed systems, with a focus on P2P/decentralized techniques and distributed key-value stores, under the supervision of Ali Ghodsi and Seif Haridi. I am originally from Pakistan; I completed my undergraduate degree at GIK Institute.

    My homepage is located at: http://www.sics.se/~tallat/.

    Bilyana Taneva

    Bilyana Taneva

    My name is Bilyana Taneva and I am a PhD student at the Max-Planck Institute for Informatics in Germany. I am advised by Gerhard Weikum and I am working on analyzing and extracting data about entities. My general research interests are data mining and information retrieval.

    In my free time I enjoy skiing, climbing, and hiking, and I would love to explore the mountains in the region.

    Chi Wang

    Chi Wang

    I am a PhD candidate about to finish my 3rd year at the University of Illinois at Urbana-Champaign, advised by Jiawei Han. I am interested in mining information networks, especially finding latent roles and relations of linked objects. I interned at MSR Redmond in 2011 and at MSR Asia in 2009.

    Visit my home page http://www.cs.illinois.edu/homes/chiwang1 for more information.

    I play basketball and baseball in my leisure time, and I watch pro Starcraft games.

    Mohan Yang

    Mohan Yang

    I am a first year Ph.D. student supervised by Professor Carlo Zaniolo in the Department of Computer Science at UCLA. I obtained my B.E. degree in computer science from Shanghai Jiao Tong University in 2010. Prior to joining UCLA, I worked in a startup company with my friends. My research interests include data stream management systems and data mining. I enjoy kayaking and cycling in my free time. My home page is http://www.mhyang.com.

    Meihui Zhang

    Meihui Zhang

    My name is Meihui, a fourth year PhD student from National University of Singapore (NUS) supervised by Prof. Beng Chin OOI. I am glad to have this opportunity to join the DMX group. My research interests mainly focus on database issues. I am currently working on database exploration, which is to design algorithms that analyze database instances to efficiently and accurately discover database schema elements, such as keys, meaningful join paths, and implicit relationships inherent in the data.

    Please visit my homepage at http://www.comp.nus.edu.sg/~zmeihui for more about me.

  • Visiting Researchers

    Guillem Rull Fort

    Guillem Rull Fort

    I got my PhD in January 2011 in the Department of Software at the Technical University of Catalonia (UPC), in Barcelona, Spain. My thesis was about the application of query containment techniques to the validation of schema mappings, and was sponsored by Microsoft Research under the MSR European PhD Scholarship program. My supervisor was Dr. Ernest Teniente. My research interests so far have focused on reasoning on database/conceptual schemas and mappings. Besides research, I enjoy reading a good book, watching movies, and playing videogames.

    Interns

    Bolin Ding

    Bolin Ding

    I’m a PhD student in the Department of Computer Science at UIUC. My advisor is Prof. Jiawei Han. I’m interested in efficient algorithms and index structures for data mining and databases in general. Besides efficiency, I also care about people’s privacy. Before joining UIUC, I got my MPhil degree in System Engineering from the Chinese University of Hong Kong under the supervision of Prof. Jeffrey Xu Yu, and my BS degree in Math and Applied Mathematics from Renmin University of China. I play Go, Pingpong (table tennis), and basketball. My homepage: https://netfiles.uiuc.edu/bding3/www/.

    Daniel Fabbri

    Daniel Fabbri

    I am a fifth year Ph.D. student at the University of Michigan, working with Kristen LeFevre. My research interests are database systems and database security. My current research is focused on analyzing access patterns in electronic health record systems and explaining why these accesses occur. Outside of research, I enjoy playing a variety of sports ranging from soccer to volleyball.

    You can find my homepage at http://www.eecs.umich.edu/~dfabbri.

    Yupeng Fu

    Yupeng Fu

    Hi, everyone, I am Yupeng, a third-year PhD student from UC San Diego working with Prof. Yannis Papakonstantinou. My research interest is at the intersection of database and web technologies.

    Outside of research, I play basketball and tennis, swim, and do many outdoor activities.

    Manish Gupta

    Manish Gupta

    I completed my bachelors (TR) in computer science from Mumbai University in 2005. After finishing my Masters (TR) under the supervision of Dr. Soumen Chakrabarti in computer science at IIT Bombay, I worked at HotJobs, Yahoo! Bangalore from 2005 to 2007. I began a PhD in data mining under the guidance of Dr. Jiawei Han at the University of Illinois at Urbana-Champaign in Aug 2009. In summer 2010, I interned at IBM T. J. Watson Research Center under Dr. Charu Aggarwal. My research interests are in research and development in the areas of Data Mining, Information Retrieval and Web Mining, Social Computing and Algorithms.

    WebPage: http://www.cs.illinois.edu/homes/gupta58/.

    Willis Lang

    Willis Lang

    I am a PhD student from the University of Wisconsin-Madison advised by Prof. Jignesh Patel. I am also a graduate student member of the Microsoft Gray Systems Lab in Madison. My area of research within data management focuses on reducing datacenter operating costs through efficient cluster configuration, provisioning, and energy management as well as effective workload scheduling and load balancing.

    Additional details can be found at pages.cs.wisc.edu/~wlang.

    Feng Li

    Feng Li

    I am currently a 2nd year PhD Student in the Department of Computer Science, National University of Singapore. I was honored to join the database group in 2009, supervised by Prof. Ooi Beng Chin. From 2005 to 2009, I studied in Peking University, Beijing, China and obtained my BSc degree from the Department of Electronic Engineering and Computer Science. My research interests are mainly in cloud computing and database systems, including indexing and query processing.

    My home page is http://www.comp.nus.edu.sg/~li-feng/.

    Jiexing Li

    Jiexing Li

    Hi everyone. My name is Jiexing Li (Jessie). I am a 2nd year PhD student in the Department of Computer Sciences, University of Wisconsin – Madison, under the supervision of Prof. Jeff Naughton. I am interested in database research, with a focus on query progress indicators, query optimization and parallel databases. This summer, I am working on admission control for database queries with Surajit Chaudhuri and Vivek Narasayya.

    My homepage can be found at http://pages.cs.wisc.edu/~jxli/.

    Ndapandula Nakashole

    Ndapandula Nakashole

    I am a PhD student at the Max Planck Institute for Informatics, Germany, where I work with Prof. Gerhard Weikum. Before that I completed a BSc and an MSc at the University of Cape Town, South Africa.

    I am interested in Information Extraction and more broadly, in Large-Scale Data Analytics.

    I enjoy reading a good book. I also like to work up a sweat, outdoors or at the gym.

    Hyunjung Park

    Hyunjung Park

    I am a third year Ph.D. student at Stanford University working with Prof. Jennifer Widom. My research interests include database systems and cloud computing. In particular, I am currently working on a database system that can fetch data from various external sources like crowdsourcing and web. Besides work, I enjoy traveling and scuba diving.

    For more information about me, please visit my homepage at http://infolab.stanford.edu/~hyunjung/.

    Vamsidhar Thummala

    Vamsidhar Thummala

    I go by Vamsi for short. I’m a PhD student at Duke University working with Prof. Shivnath Babu and Prof. Jeff Chase. My research interests include database systems and energy-efficient computing. In particular, I work on improving query optimizers to deal with the dynamic nature of resources in virtualized environments.

    Besides research, I enjoy cooking, reading, and volleyball. More at: http://www.cs.duke.edu/~vamsi

    Chi Wang

    Chi Wang

    I am a PhD candidate in my 2nd year at the University of Illinois at Urbana-Champaign, advised by Jiawei Han. I work in Data Mining and I am especially interested in mining information networks, such as finding latent roles and relations of linked objects. Now I work in the DMX group in Redmond and my mentor is Kaushik Chakrabarti. In 2009 I worked in the Theory group of MSRA with Wei Chen and Yajun Wang on developing a scalable algorithm for influence maximization in social networks.

    Visit my home page http://www.cs.illinois.edu/homes/chiwang1 for more information.

    Mohamed Yakout

    Mohamed Yakout

    Hi, my name is Mohamed Yakout. I am a PhD candidate in the computer science department at Purdue University. My research focuses on data cleaning and data integration, including situations where data privacy is a concern. Specifically, I focus on user-centric techniques to improve data quality. My advisor is Prof. Ahmed K. Elmagarmid. I have just received the Bilsland Dissertation Fellowship from the Purdue Graduate School.

    My home page: http://www.cs.purdue.edu/homes/myakout

    In my free time, I enjoy playing with my little princess Jasmine (20 months old) and I go to the gym.

    Tao Zou

    Tao Zou

    I am Tao Zou, a second year PhD student at Cornell University working with Johannes Gehrke. I am interested in database systems and cloud computing. In particular, my research focuses on building scalable data-driven systems and validating their properties through extensive experiments.

    Besides research, I like playing table tennis, basketball, computer games, and traveling.

  • Interns

    Bahman Bahmani

    Bahman Bahmani

    I am a PhD student at Stanford University, working with Prof. Ashish Goel. My main research interest is in algorithmic aspects of large scale web applications. Recently, I have been focusing on recommendation and personalization over social networks.

    I listen to music almost all the time! Other than that, in my spare time, I enjoy a wide range of activities, including but absolutely not limited to going to the gym, practicing martial arts (Kenpo Karate), reading books or papers on brain sciences (especially evolutionary psychology and neuroscience), dining out, etc.

    Klaus Berberich

    Klaus Berberich

    Hi, my name is Klaus Berberich. I am a PhD student at the Max-Planck Institute for Informatics working with Srikanta Bedathur and Gerhard Weikum.

    The focus of my PhD thesis has been on improving search in web archives. My general research interests lie on the border between data management and information retrieval.

    When not at work, I enjoy travelling, reading a good book, and listening to music. Beyond that, I am interested in photography and guitars.

    For more information about me please visit: http://www.mpi-inf.mpg.de/~kberberi.

    Bolin Ding

    Bolin Ding

    I’m a PhD student in the Department of Computer Science at UIUC. My advisor is Prof. Jiawei Han. I’m interested in efficient algorithms and index structures for data mining and databases in general. Besides efficiency, I also care about people’s privacy. Before joining UIUC, I got my MPhil degree in System Engineering from the Chinese University of Hong Kong under the supervision of Prof. Jeffrey Xu Yu, and my BS degree in Math and Applied Mathematics from Renmin University of China. I play Go, Pingpong (table tennis), and basketball.

    My homepage: https://netfiles.uiuc.edu/bding3/www/

    Yeye He

    Yeye He

    My name is Yeye He. I am a PhD student at University of Wisconsin-Madison working with Professor Jeff Naughton on database privacy. Prior to returning to school for PhD I worked at Oracle Corporation on data warehousing performance. This summer I will be working with Dong Xin on data exploration projects. I like traveling and watching movies in my spare time.

    For more information please visit my homepage at http://www.cs.wisc.edu/~heyeye.

    Hideaki Kimura

    Hideaki Kimura

    My name is Hideaki Kimura. I’m a 3rd year PhD student in the database research group at Brown University, advised by Stan Zdonik. My research topics are query optimization and automatic physical database design.

    Besides research and coding, I love biking.

    For more information about me please visit: http://www.cs.brown.edu/people/hkimura/

    Jian Li

    My name is Jian Li. I am a Ph.D. student at the University of Maryland, College Park.

    My research interests include databases and algorithms. In particular, I am working on ranking over probabilistic databases and some stochastic optimization problems.

    I got my BSc degree from Sun Yat-sen (Zhongshan) University, China, and my MSc degree in computer science from Fudan University, China.

    I like playing soccer, table tennis, and traveling.

    My homepage: www.cs.umd.edu/~lijian

    Kristi Morton

    Kristi Morton is a third year PhD student in the department of Computer Science and Engineering at the University of Washington (UW). At UW, Kristi works with her advisors, Magdalena Balazinska in the Databases Group and Dan Grossman in the Programming Languages Group, on improving the software tools in data-intensive, distributed systems.

    Details of her research projects can be found here: http://www.cs.washington.edu/homes/kmorton/

    In her spare time, she plays drums, sings, and writes Computer Science-themed parodies in the UW Computer Science and Engineering Band (also known as Parody Bits). Their music can be found here: http://www.youtube.com/user/CSEBand.

    Aditya Ganesh Parameswaran

    I’m a third year PhD student in the Infolab at Stanford University, working with Prof. Hector Garcia-Molina. I’m broadly interested in algorithmic questions underlying the management and utilization of large data. I graduated from the Indian Institute of Technology (IIT) Bombay in 2007. Apart from work, I enjoy trying out new restaurants and cuisines (averaging 2 or so a week), tennis, geocaching, playing the drums (badly) and watching inspiring YouTube videos.

    My homepage is located at http://www.stanford.edu/~adityagp/.

    Hyunjung Park

    I am Hyunjung Park, a second year Ph.D. student at Stanford University working with Jennifer Widom. My research at the Stanford InfoLab focuses on data provenance. Besides work, I enjoy traveling and scuba diving.

    My homepage is at http://infolab.stanford.edu/~hyunjung/.

    Hyunjung recently won the ACM SIGMOD 2010 Programming Contest, where he (single-handedly) built a distributed query execution engine that was almost twice as fast as the runner-up team’s system!

    Senjuti Basu Roy

    I am a PhD candidate at UT Arlington and have just completed my 3rd year in the PhD program there. My PhD research mainly focuses on different exploration techniques for large databases. I have been part of the Database Exploration Lab at UT Arlington, and I happily call it my second home :].

    Besides that, I love dancing, debating and traveling (a lot :]).

    My homepage: http://dbxlab.uta.edu/dbxlab/senjuti.html

    Ian Rae

    Hi, my name is Ian Rae, and I’ve just completed my second year of Ph.D. studies at the University of Wisconsin—Madison, advised by Jeff Naughton. In my studies there, I work with the Microsoft Jim Gray Systems Lab to improve parallel database support for handling unstructured text data.

    In my somewhat non-existent spare time, I read science fiction and fantasy novels, play computer games, and plan my wedding (October!). I also occasionally take pictures that David DeWitt calls “goofy,” an example of which is available to the right.

    In a past life, before becoming a graduate student, I used to go indoor rock climbing. I’ve been told that there are several nice climbing gyms around here, so I hope to go again before the end of the summer.

    Gurgen Tumanyan

    Hi, my name is Gurgen Tumanyan, and I am a Master’s student at San Francisco State University, advised by Dragutin Petkovic. I work with the Helix group at the Stanford Bioengineering department, where I employ computer science black and white magic to predict protein function from the 3D structure.

    When I am not pondering bioengineering problems or ruminating on software engineering challenges 🙂, I can usually be found with my camera on the streets of the nearest big city, or playing pool. In my previous life I used to play tennis and would love to play again.

    Guozhang Wang

    Hi, my name is Guozhang. I’m a second year PhD student at Cornell University working with Johannes Gehrke. I am interested in data management and cloud computing. In particular, I am working on applying query optimization and parallel processing techniques to large scale behavioral simulations. Before that, I worked on privacy-preserving data publishing. You can find out more about me at www.cs.cornell.edu/~guoz.

    I am a big fan of the NBA (Celtics rock!) and I also play basketball a lot myself. I also like playing video games (NBA Live 2005, 2006, 2007…) and hiking in my free time.

    Mohamed Yakout

    Hi, my name is Mohamed Yakout. I am a third year PhD student at Purdue University. My advisor is Prof. Ahmed K. Elmagarmid. My research focuses on data cleaning and data integration, including situations where data privacy is a concern.

    My home page: http://www.cs.purdue.edu/homes/myakout.

    I am originally from Egypt and received my Master’s degree from Alexandria University. Prior to pursuing my PhD, I worked for Bibliotheca Alexandrina (the new Library of Alexandria), where I participated in several digital library projects.

  • Parag Agrawal

    I’m a PhD student in the Department of Computer Science at Stanford University. My advisor is Jennifer Widom. I work in the Stanford InfoLab. I graduated from IIT Bombay in 2005 with a Bachelors in Computer Science.

    Bolin Ding

    My name is Bolin Ding. I’m finishing the 2nd year of my Ph.D. program at UIUC. My advisor is Prof. Jiawei Han. I’m interested in IR techniques in databases and data warehouses, and pattern-based query processing and mining. In general, I’m interested in efficient algorithms for database and data mining problems. Before joining UIUC, I got my M.S. in System Engineering from the Chinese University of Hong Kong under the supervision of Prof. Jeffrey Xu Yu, and my B.S. in Math and Applied Mathematics from Renmin University of China. I play Go, Pingpong (table tennis), and basketball. My homepage: https://netfiles.uiuc.edu/bding3/www/

    Michaela Goetz

    Hi, my name is Mila. I’m a second year PhD student at Cornell University working with Johannes Gehrke and Christoph Koch. I am interested in data management. In particular, I am working on uncertain databases and privacy-preserving data publishing. You can find out more about me here.

    In my free time, I enjoy playing tennis during the summer and ice hockey during the winter.

    Yeye He

    My name is Yeye He. I am a PhD student at the University of Wisconsin-Madison working with Professor Jeff Naughton on database privacy. Prior to returning to school for my PhD, I worked at Oracle Corporation on data warehousing performance. This summer I will be working with Dong Xin on data exploration projects. I like traveling and watching movies in my spare time. Please visit my homepage at http://www.cs.wisc.edu/~heyeye for more information about me.

    Mei Hui

    I’m a PhD candidate in the Computer Science Department at the National University of Singapore. My advisor is Prof. Ooi Beng Chin. My major research interests are Community Database Management, Information Retrieval and Web 2.0.

    In my free time I like to play badminton, swim and watch movies.

    If you want to find more about me, visit my website: http://www.comp.nus.edu.sg/~huimei

    Gjergji Kasneci

    Gjergji Kasneci is a doctoral student at the Max-Planck Institute for Informatics. He received his M.S. degree from the University of Marburg in Germany, where he was awarded the Fellowship of the German National Academic Foundation.

    Most of his research focuses on graph-based Information Retrieval and Semantic Search. His main projects are the NAGA (http://www.mpi-inf.mpg.de/~kasneci/naga/) and YAGO (http://www.mpi-inf.mpg.de/~suchanek/downloads/yago/) systems.

    Additional information about Gjergji can be found at: http://www.mpi-inf.mpg.de/~kasneci/

    Hongrae Lee

    Hi, my name is Hongrae Lee and I’m finishing the 2nd year of my Ph.D. program at the University of British Columbia, Canada. My advisor is Prof. Raymond Ng and I’m interested in approximate query processing in text databases and its optimization. This summer, I’ll be working with Dr. Surajit Chaudhuri on the AutoAdmin project. In my spare time, I like sports such as soccer, tennis, and skiing. I hardly ever say no to coffee or delicious food :) You can find more about me here (http://www.cs.ubc.ca/~xguy).

    Abhijeet Mohapatra

    Hello, my name is Abhijeet Mohapatra. I am a first year PhD student at Stanford University. My advisor is Jennifer Widom. I am interested in Uncertain Data Modeling and Data Mining techniques (especially for Recommendation Systems). I love playing basketball and swimming. I do quite a bit of sketching too. This summer, I am working under Ravi Ramamurthy on the AutoAdmin project.

    For more information, you can visit my homepage.

    Vijendra Singh Purohit

    Homepage: http://purdue.academia.edu/VijendraSinghPurohit

    Zhijun Yin

    My name is Zhijun Yin. I am a second-year PhD student at the University of Illinois at Urbana-Champaign under the supervision of Professor Jiawei Han. Before coming to UIUC, I got my B.S. from Fudan University in 2007. My research interests focus on applying data mining techniques to interesting web applications.

    You can find more about me at www.cs.uiuc.edu/homes/zyin3.

  • Ioannis Antonellis

    My name is Ioannis (Yannis) Antonellis and I am a 2nd year CS PhD student at Stanford University. My advisor is Hector Garcia-Molina and I am working on collaborative techniques and their applications to query log analysis for web and sponsored search.

    This summer I am interning in the Data Management, Exploration and Mining group, working with Christian Konig on analyzing web browsing logs.

    For more information please visit my homepage: http://www.stanford.edu/~antonell

    Also, I regularly (plan to) write on the Stanford Infoblog: http://infoblog.stanford.edu

    Fei Chiang

    I am a PhD student in the Department of Computer Science at the University of Toronto. I am a member of the Database Research Group and my advisor is Prof. Renée J. Miller. My current research interests are in the efficient management of uncertain and inconsistent data, data quality, data mining, and meta-data management.

    In my spare time, I like to play tennis, hike and run.

    This summer I’ll be working with Raghav Kaushik and Vivek Narasayya on problems in data cleaning. You can find out more about me at www.cs.toronto.edu/~fchiang.

    Hicham Elmongui

    Hi, my name is Hicham Elmongui, and I am a PhD candidate in the Computer Science Department at Purdue. My research is in the area of databases. Specifically, I am interested in query optimization for moving-object databases. My advisor is Prof. Walid Aref.

    This summer, I am interning with the DMX group. My manager is Vivek Narasayya, and I will work with Ravi Ramamurthy as well. In 2006 and 2007, I interned with the Database group at MSR, where my manager was Paul Larson. I also worked with Jingren Zhou.

    I received the Frederick N. Andrews Fellowship. I am also a recipient of the Purdue University Outstanding Teaching Award and the Outstanding Service to the Purdue CS Department Award. In addition, I was the college valedictorian when I graduated Summa Cum Laude with my B.S. in Computer Science and Automatic Control from Alexandria University, Egypt.

    In my free time, I enjoy playing with my four year old son, Yahya. My hobbies include reading and travelling.

    Ling Hu

    My name is Ling Hu. I am a second year PhD student at Northeastern University. I work with Prof. Donghui Zhang in the database lab, CCIS. My research interests are query optimization, data warehousing, and database security issues. I like reading and watching movies. I do yoga and go to the gym regularly. You can get to know more about me here.

    Hongrae Lee

    Hi, my name is Hongrae Lee and I’m finishing the 2nd year of my Ph.D. program at the University of British Columbia, Canada. My advisor is Prof. Raymond Ng and I’m interested in approximate query processing in text databases and its optimization. This summer, I’ll be working with Dr. Surajit Chaudhuri on the AutoAdmin project. In my spare time, I like sports such as soccer, tennis, and skiing. I hardly ever say no to coffee or delicious food :) You can find more about me here (http://www.cs.ubc.ca/~xguy).

    Rimma Nehme

    My name is Rimma Nehme. I am a PhD student at Purdue University. My research area is query optimization; more specifically, I am working on the application of machine learning techniques to query optimization and efficient query processing in streaming databases. In my free time, I like to run, draw, ski and watch movies (and I also like to make movies).

    This summer I am working in the DMX group with my mentor Nico Bruno.

    If you would like to find out more about me, visit my website: http://www.cs.purdue.edu/homes/rnehme/

    Christopher Re

    Christopher (Chris) Ré is a graduate student in the department of Computer Science and Engineering at the University of Washington advised by Dan Suciu. Chris’ interests are theoretical and practical problems in data management. Details of his work can be found here. His thesis work in probabilistic data management will be completed at the end of the current academic year (’09).

    Karl Schnaitter

    I am a PhD candidate at the University of California, Santa Cruz… GO SLUGS! My advisor is Alkis Polyzotis, and our main work has dealt with on-line physical tuning and processing top-k join queries. I also have interests in programming languages and algorithm analysis. I like to spend a lot of my free time on trails, either running or hiking. I have also played guitar for 12 years, and I am a member of a rock band called God of Shamisen.

    I am very excited to be visiting Microsoft and working with Nico this Summer! We will be working on problems in physical design tuning.

    My homepage: http://www.soe.ucsc.edu/~karlsch/

    Tianyi Wu

    Hi, my name is Tianyi Wu. I am a third-year Ph.D. student in the Department of Computer Science, University of Illinois at Urbana-Champaign, where I work under the supervision of Dr. Jiawei Han. My research interests include data warehousing and OLAP, ranking queries, and association mining. Before joining U of I, I got my B.S. from Fudan University, China. I am excited to be an intern working with Dr. Kaushik Chakrabarti and other researchers in DMX this summer. My hobbies may be too many to list here. Generally I like various sports, but for now I most enjoy playing pool with my dad.

    Please visit my homepage at http://ews.uiuc.edu/~twu5 to find out more about me.

  • Faculty

    Dan Suciu

    Dan Suciu is a professor in Computer Science at the University of Washington. He received his Ph.D. from the University of Pennsylvania in 1995, spent five years at AT&T Labs, then joined the University of Washington in 2000. Dan is conducting research in data management, with an emphasis on topics that arise from sharing data on the Internet, such as management of semistructured and heterogeneous data, data security, and managing data with uncertainties. He is a co-author of the book Data on the Web: from Relations to Semistructured Data and XML, holds six US patents, received the 2000 ACM SIGMOD Best Paper Award, and is a recipient of the NSF Career Award and of an Alfred P. Sloan Fellowship.

    Interns

    Arjun Dasgupta

    Hi, I am Arjun Dasgupta. I am a Master’s graduate from the University of Texas at Arlington and a part of the Database Exploration Group at UT Arlington headed by Dr. Gautam Das. My research interests include Information Retrieval from databases, Data Mining and Exploration. I am starting my PhD in the fall, when I plan to work on Data Mining from web sources. Wish me luck!

    In my free time I like to fish, bike and cook. I am very excited to be working here at Microsoft with Vivek Narasayya in the DMX group over the summer.

    For more about me, visit my personal webpage at http://arjundasgupta.com/default.aspx.

    Abhay Jha

    I am a first-year graduate student from UW, Seattle. At UW, I was working with Dan Suciu on query evaluation over probabilistic databases with constraints.

    I am going to work with Arvind Arasu and Raghav Kaushik on the Data Cleaning project. Besides working, wasting time on the internet and watching movies, this summer I am planning on exploring this beautiful city and visiting some places I have been meaning to for a long time.

    Bhargav Kanagal

    My name is Bhargav Kanagal. I am currently a 2nd year PhD student at the University of Maryland, College Park.

    I work with Dr. Amol Deshpande on the MauveDB project, which aims at efficiently incorporating Statistical & Probabilistic Models inside relational database systems; this offers accurate and meaningful answers to queries over uncertain / incomplete data.

    I will be working in the Auto-Admin Project with Dr. Ravi Ramamurthy this Summer. My hobbies include Classical music, Tennis and watching movies.

    Kristen LeFevre

    Hi, my name is Kristen LeFevre. I just completed my Ph.D. at the University of Wisconsin – Madison, where my co-advisors were David DeWitt and Raghu Ramakrishnan. My Ph.D. thesis was in the area of database privacy, including techniques for protecting individual anonymity in data publishing. For more information about this work, please visit my Wisconsin website (http://www.cs.wisc.edu/~lefevre).

    I am very pleased to be visiting Microsoft this fall as a post-doctoral intern, where I will be working with Arvind Arasu and Surajit Chaudhuri on the data cleaning project. Following my internship, I will join the University of Michigan as an Assistant Professor.

    Rimma Nehme

    My name is Rimma Nehme.

    I am a PhD student in the Computer Science department at Purdue University. My PhD research is in the area of continuous spatio-temporal query processing and optimization. My advisors are Prof. Walid Aref and Mourad Ouzzani. I obtained a Masters degree in Computer Science from Worcester Polytechnic Institute (WPI) in 2005, where I worked with Prof. Elke Rundensteiner. Prior to that I worked at EMC Corporation for 4 years on the Symmetrix (enterprise storage) Solutions Enabler API (SYMAPI) and CLI for managing data storage arrays. My hobbies include skiing, running, delicious foods, travelling and watching movies. Please check my homepage for more information about me.

    Svetlana Marinova

    I am Svetlana Marinova from Bulgaria, a 3rd year Ph.D. student in Database Systems at Technical University – Sofia.

    I am happy to be back for a second internship at MSR and to be part of the team again. Areas that interest me include cryptography, information systems, programming languages and distributed databases. In my free time I love to travel, read books, go to the movies and watch different kinds of sports (and sometimes take part in them as well).

    If you are interested to know more, please visit my homepage.

    Rares Vernica

    I am a Graduate Student at the School of Information and Computer Sciences, University of California, Irvine. My research interests are in the area of databases. In particular, my focus is on data integration, data uncertainty, and data lineage.

    My advisor is Prof. Chen Li. For more information about my research and the projects I am involved in, please visit my home page at http://www.ics.uci.edu/~rvernica.

    In my spare time I practice Kendo, Japanese fencing, where I hold the rank of 3kyu.

  • Bee-Chung Chen

    My name is Bee-Chung Chen. I am a Ph.D. student at University of Wisconsin – Madison working with Prof. Raghu Ramakrishnan on data mining and database-related topics. I received my B.S. and M.S. degrees in Computer Science and Information Engineering from National Taiwan University. For more information about me, please visit my homepage.

    Eric Chu

    Eric is a 3rd-year PhD student from the University of Wisconsin-Madison, where he works with Prof. Jeff Naughton.

    He returns to DMX for a second summer to work with Sanjay Agrawal and Vivek Narasayya in the AutoAdmin project.

    Eric enjoys swimming and playing badminton in his free time.

    Shantanu Joshi

    I am a PhD student in the Database Center at the University of Florida. My PhD thesis, titled ‘Randomization Techniques for Approximate Query Processing’, is advised by Prof. Chris Jermaine. I obtained a Masters degree in Computer Science from Florida in 2003 and a Bachelors in Engineering from the University of Bombay in 2000.

    My hobbies include music, exploring new places and sports. Please check my homepage for more about me.

    Svetlana Marinova

    My name is Svetlana Marinova and I am a second year PhD student in the area of Database Systems at the Technical University of Sofia.

    The topic of my dissertation is “Organization and program realization of information system for human resources management”.

    I have over 7 years of experience with Windows-oriented programming including ASP/ASP.NET, Visual Basic, C/C++, Visual C++, and MFC. Other areas of interest include DHTML, JavaScript, and HTML.

    My scientific interests are relational databases, information retrieval systems, distributed database systems, cryptographic algorithms and protocols, programming languages, and algorithms. I am also interested in social psychology, creativity and methods for self improvement.

    In my spare time I enjoy sightseeing, shopping, watching movies and recently cooking. I love traveling, swimming and tae-bo.

    Abhijit Pol

    I am a fourth year Ph.D. student at the University of Florida. I am advised by Dr. Chris Jermaine and co-advised by Dr. Alin Dobra. My primary research interests are in the area of Approximate Query Processing and Online Aggregation. I am also interested in Physical Database Design and Indexing. In my PhD thesis, I am investigating different challenges in supporting approximation in subset-based SQL queries. Before this, I did my bachelors in Mechanical Engineering and pursued my masters in Industrial and Systems Engineering at the University of Florida. Away from the computer screen, I play tennis, squash, chess, and pool. I love traveling and camping and am a self-discovered poet.

    Anish Das Sarma

    My name is Anish Das Sarma. I finished my 2nd year as a PhD student in the Computer Science department at Stanford. I will be working on the Data Cleaning project here. It is great to be back here at MSR! Apart from CS, I am interested in Chess, Table Tennis, Foosball, Badminton, Tennis, and every sport! I also love puzzle solving, music, movies, and other activities. So, if any of you are interested in any of these, let me know!

    For more information, you are welcome to visit my homepage.

    Liying Sui

    Before joining UCSD, I got my B.S. degree from Shandong University, P.R. China, and my M.E. degree from the Institute of Computing Technology, Chinese Academy of Science.

    Spending six years in the PhD program in the UCSD database group gave me a lot of experience in logic and databases. I have been working with my advisors, Victor Vianu and Alin Deutsch, on specification and verification of interactive, data-driven Web services/applications and workflows. We make extensive use of database optimization and model checking techniques in our research. I am happy to be here for my internship.

    Dong Xin

    I’m a fourth year PhD student in the data mining group at University of Illinois at Urbana-Champaign.

    I work with Prof. Jiawei Han on scalable data mining algorithms. Before joining UIUC, I received MS and BS in Computer Science from Zhejiang University, China, in 2002 and 1999.

  • Pedro Bizarro

    I am a 4th year PhD student at the University of Wisconsin – Madison. I have been working with Prof. David DeWitt on adaptive query processing both in the context of traditional databases and data stream systems. Currently I am interning with the DMX group at Microsoft Research.

    I am also a Fulbright student and I have an MS in CS from UW-Madison and another from New University of Lisbon in Portugal. I am crazy about coffee, soccer, and tennis. And I was born on April Fools’ Day!

    Bee-Chung Chen

    My name is Bee-Chung Chen. I am a Ph.D. student at University of Wisconsin – Madison working with Prof. Raghu Ramakrishnan on data mining and database-related topics. I received my B.S. and M.S. degrees in Computer Science and Information Engineering from National Taiwan University. For more information about me, please visit my homepage.

    Eric Chu

    Eric Chu is a second-year PhD student at the University of Wisconsin-Madison. His research interests are in database systems and his advisor is Prof. Jeff Naughton. Prior to attending UW-Madison, Eric got his Bachelor’s degree in Computer Engineering at another UW – University of Washington, where he worked with Prof. Alon Halevy. This summer Eric is working in the DMX group with Sanjay Agrawal and Vivek Narasayya on the AutoAdmin project. For more information, visit his homepage.

    Luna Dong

    Luna Dong is a fourth-year student at Univ. of Washington. She’s advised by Alon Halevy and her research interests are personal information management and data integration. Before going to UW, Luna got her Master’s degree in Peking Univ. and her Bachelor’s degree in Nankai Univ. in China.

    Govind Kabra

    I am a PhD student at the University of Illinois, Urbana-Champaign advised by Dr. Kevin Chang. I am working with him on the Metaquerier System for exploration and integration of deep web databases. This summer, I am working on developing a model for fine grained authorization in databases. In my leisure time, I play squash, badminton, or basketball, or watch movies. Visit my homepage for more info.

    Anish Das Sarma

    Anish Das Sarma is a first year CS PhD student at Stanford University. His PhD advisor is Jennifer Widom, and he is working on uncertainty and lineage in databases as part of the Trio project at Stanford (http://www-db.stanford.edu/trio/). This summer he is interning in the DMX group at MSR, working with Vivek Narasayya on the Auto Admin project. Prior to joining Stanford, Anish received his B-Tech. degree in Computer Science and Engineering from IIT-Bombay, where he was also awarded the Dr. Shankar Dayal Sharma Gold medal.

    Anish is keenly interested in various extra-curricular activities. He is an active chess player (Rating, FIDE: 2071, USCF: 2121), and also loves table-tennis, swimming, and various other sports. He is also interested in music and plays the keyboard and tabla (Indian percussion instrument).

    Anish’s homepage contains more information.

    Utkarsh Srivastava

    I am a third year PhD student working with Prof. Jennifer Widom at the InfoLab (formerly database group) at Stanford University. My primary research interests lie in statistics and query optimization for emerging applications such as data streams, web services as well as classical relational databases.

    Dong Xin

    I’m a 3rd year PhD student in the data mining group at University of Illinois at Urbana-Champaign, studying under Prof. Jiawei Han.

    Before joining UIUC, I received MS and BS in Computer Science from Zhejiang University, China, in 2002 and 1999.

    Wei Vivian Zhang

    I am an MS student at the University of Wisconsin – Madison. Currently I am an intern in the Data Management, Exploration and Mining Group at Microsoft Research. I am working on data cleaning. I like music, dancing and badminton.

  • Faculty

    S. Sudarshan

    S. Sudarshan, who is currently here as a Visiting Researcher in the DMX group, is on a year’s sabbatical from IIT Bombay, India, where he is a Professor in the Computer Science and Engineering Department. Sudarshan got his PhD from the Univ. of Wisconsin, Madison, and worked in the database research group at AT&T Bell Laboratories for 3 years before moving to IIT Bombay in 1995. His research interests are in the area of query processing and optimization, and his most recent areas have included keyword querying on databases, parametric and nested query optimization, and fine-grained authorization in databases.

    Interns

    Nilesh Dalvi

    I am a third year Ph.D. student at University of Washington where I work with Dan Suciu on probabilistic models for data integration. At Microsoft, I am working with Surajit Chaudhuri on using user preferences for ranking database query results. In my spare time, I can be found playing tennis, bridge, solving cryptic crosswords or reading books.

    Seung-won Hwang

    I am a PhD student at the University of Illinois at Urbana-Champaign. My primary research interest is ranked query processing. Besides computer science, I love playing the violin and backpacking. I’m excited to spend another summer at MSR and enjoy the Northwest outdoors.

    Shubha Nabar

    Hi! I’m a second year PhD student at Stanford University, where my advisor is Rajeev Motwani. My research interests include algorithms for application areas like networking and databases. This summer I’m interning in the DMX group and working on the AutoAdmin project. My non-academic interests include playing badminton, squash (racquet sports in general), reading and cooking.

    Stratos Papadomanolakis

    I am a 3rd year Ph.D. student at Carnegie Mellon University. I am interested in self-tuning database systems, specifically in automating the design of database structures based on workload and hardware configuration information. For my internship I am working with Vivek Narasayya on the Database Tuning Advisor.

    Alpa Shah

    I am a second year PhD student at Columbia University and my advisor is Luis Gravano. I am working on developing efficient strategies for relational query processing over plain text documents by relying on information extraction and information retrieval techniques.

    Dilys Thomas

    I am a second year PhD student at Stanford, advised by Rajeev Motwani.

    I am interested in Algorithm Design, and am applying it recently to Data Streams, Privacy, and Query Optimization.

    My other interests include outdoor sports, especially soccer.

    Ying Xu

    I am a first year PhD student at Stanford working with Professor Rajeev Motwani. I’m interested in algorithms, especially with database application background. This summer I’m working with Venky on a data cleaning problem. In my free time, hiking and reading are my favorite outdoor/indoor activities.

    Wendy Wang

    I am a second-year PhD student at the University of British Columbia. My advisor is Laks V.S. Lakshmanan. My research area is XML security. This summer I will work with Nico on a top-k ranking problem. My non-academic interests include movies, classical music, and badminton.

  • Faculty

    Prof. Raghu Ramakrishnan

    Raghu Ramakrishnan is a Professor of Computer Sciences and Vilas Associate at the University of Wisconsin-Madison. From 1999 to 2002, he served as Chairman and CTO of QUIQ, a company that developed a novel approach to customer support by facilitating collaboration among customers and capturing that interaction in a reusable knowledge base. His research is in the area of database systems and data mining. He is a Fellow of the Association for Computing Machinery (ACM), and has received a Packard Foundation Fellowship, a Presidential Young Investigator award, and an ACM SIGMOD Contributions Award. He has written the widely used text “Database Management Systems” (with J. Gehrke).

    Prof. Gerhard Weikum

    Gerhard Weikum is a Full Professor in the Department of Computer Science of the University of the Saarland at Saarbruecken, Germany. Gerhard is co-author of more than 100 refereed publications, and he has recently written a textbook on Transactional Information Systems, published by Morgan Kaufmann. He received the 2002 VLDB ten-year award for his work on automatic tuning. Gerhard serves on the editorial boards of ACM TODS and IEEE CS TKDE, and he will be program committee chair for the 2004 SIGMOD conference in Paris.

    Interns

    Eugene Agichtein

    I am a PhD student at Columbia University. I am working on information extraction from unstructured text. Specifically, I like applying machine learning techniques to improve the quality and scalability of information extraction while requiring little or no manual input.

    Brian Babcock

    I’m a third-year PhD student at Stanford, where my advisor is Rajeev Motwani. My research is mostly in the area of algorithms for processing streaming data, and I’ve also done some work on approximate query processing. This is my second summer as an intern in the DMX group. My non-academic interests include soccer, backpacking, and rock climbing.

    Ashish Gupta

    I am a third-year graduate student at the University of Washington. Back at UW, I work with Prof. Dan Suciu on processing of streaming XML. Over the summer, I will be working with Vivek, Nico and Sanjay on the View Merging problem, a part of the Autoadmin project. In my free time, I like to watch movies, go trekking, and try my hand at virtually any outdoor sport whatsoever.

    Vagelis Hristidis

    My name is Vagelis Hristidis and I just finished my 4th year of the PhD program at UC San Diego. My advisor there is Yannis Papakonstantinou. My thesis topic is Keyword Search in Databases, and I expect to complete the thesis within the next year. For more information on my research, visit my web page at www.db.ucsd.edu/people/vagelis.

    In my free time I plan to visit the main sights of the city and explore the Pro Club, about which I have heard good things. The weather may not be as good as in San Diego, but a change is always interesting :).

    By the way, I come from Greece.

    Zheng Huang

    My name is Zheng Huang. I finished my B.E. at Zhejiang University (P.R. China) in 2002, after which I came to the University of Wisconsin-Madison to work with Prof. Raghu Ramakrishnan on data mining and databases. Before I came to UW-Madison, I did some work on image/video processing and content-based image retrieval. I once worked as a visiting student at Microsoft Research Asia with Dr. Hongjiang Zhang, Dr. Mingjing Li and Dr. Lei Zhang.

    BTW, like many guys from China, I like playing ping-pong. I have played tennis and badminton a lot too.

    Seung-won Hwang

    Hi, I am Seung-won Hwang, a third-year Ph.D. student at the University of Illinois at Urbana-Champaign. I received my MS and BS from the University of Illinois and KAIST (Korea Advanced Institute of Science and Technology), respectively. My primary research interest is ranked query processing. Besides computer science, I love playing the violin and backpacking. I’m looking forward to visiting national parks in the Northwest.

    Daniel Kifer

    I am a third-year PhD student in Computer Science at Cornell. My advisor is Johannes Gehrke, and my research interests are in data mining, specifically, interesting problems in data mining. I am also interested in ping pong, backgammon, ping pong, and, of course, making the occasional bad joke.

    Ravi Ramamurthy

    I am a graduate student from UW-Madison and I am hoping to finish my PhD sometime next year. I am generally interested in adaptive query processing and this summer I will be working with Vivek and Surajit on the AutoAdmin project. My other interests include playing badminton and playing the violin.

    Utkarsh Srivastava

    I am a first-year PhD student at Stanford University working with Jennifer Widom. I am primarily interested in approximation techniques for stream processing and in the design of an adaptive query processing architecture for it. In my free time I pursue music, and a game of pool never hurts!

    Qi Su

    I am a first-year PhD student at Stanford working with Professor Jennifer Widom. I did my undergrad at the University of Wisconsin-Madison. I enjoy tennis, golf, volleyball, and Muay Thai kickboxing.