Joseph M. JoySoftware Architect, Microsoft Research India
June 02
Jim Gray Tribute
I attended the tribute to Jim Gray, May 31st 2008, at UC Berkeley. It was inspiring. The way he conducted his life and interacted with people across the industry and academia, the impact he’s had on many, many individuals – these are the things that inspired me most, and I think that we can learn from his example. I think it was Ed Lazowska who said, during the Tribute, that while one cannot hope to become the intellectual giant that Jim was (to go to sleep at night hoping to wake up the next morning intellectually stronger), one can always learn from Jim’s example and strive to be a better human. I’ve written down some things that really struck me. I encourage you to read the “proceedings” from the Tribute, which are published as Volume 37, Number 2 (June 2008) of the SIGMOD Record (the online version of the issue is not up there as I write this).
I did not know Jim Gray personally. I had attended a few talks by him in Redmond, and knew of his pioneering work on Transaction Processing, of his work with Tom Barclay and others on TerraServer, and of his more recent work with the astronomy community. I do know some folks who interacted with him personally, and they always had good things to say about him. It was at the tribute that I realized just how great a person he was in so many respects, and I am glad I attended this event, to see and hear in person accounts from people he had worked with over the years.
Working across the industry. Jim really was at the very center of defining the fundamental properties of transactions and efficient ways to implement them, including several levels of consistency, efficient locking protocols, and innumerable other implementation guidelines. He actually wrote a lot of the code that went into IBM’s System R at the time. Bruce Lindsay, who worked with Jim at IBM Research during the 70s, talked about his contributions there. What I didn’t know is that it was also Jim who moved the entire database industry to adopt standardized transaction processing benchmarks. He defined the original benchmark and encouraged the formation of the Transaction Processing Performance Council (TPC). How he went about this provides a glimpse into his method of working and how wide-ranging his impact has been. As David DeWitt described, Jim wrote a paper in 1984 called “A Measure of Transaction Processing Power” (an old TR version; a .DOC version is here). This paper was deliberately published in a trade magazine, Datamation, and was authored, tongue in cheek, by “Anon Et Al.” Jim had worked with some 24 folk from industry and academia, whom he credited anonymously because of the controversial nature of the results, which for the first time showed the relative performance of several database systems of the day (presented anonymously in the paper, with monikers ranging from “Lean and Mean” through “Funny”) in the stark glare of the new benchmark metrics. In the industry-spanning manner that is classic Jim Gray, he had worked with these individuals from fiercely competing companies to get the data. He had the singular authority and professional integrity to pull this off and to get the companies to move away from their own proprietary benchmarks toward the TPx benchmarks, which in turn have had a far-reaching impact in driving innovation beyond databases – in storage, CPU architectures and software.
As Tom Barclay mentioned in a later talk, Jim was really of the industry at large – it’s just that at any point in time some company signed up to pay his salary :-)
500 special relationships. Ed Lazowska gave a talk he titled “500 Special Relationships: Jim as A Mentor to Faculty and Students,” and many of the things he mentioned were echoed by other speakers, either explicitly or through their personal stories of interacting with Jim. The essence of what they were saying was that Jim had incredibly close relationships with hundreds of people, many outside the database community. Ed himself is not a database person. He talked about how “Jim provided extraordinary guidance to me, my students, and our colleagues in the ocean sciences community…” and how Jim anonymously endowed almost $500K in undergraduate scholarships at the University of Washington (not his alma mater). Professor and astronomer Alexander Szalay of Johns Hopkins University talked of how he met Jim in the late 90s, and of Jim “rolling up his sleeves” in 2000, working multiple 20-hour days (!) converting the Sloan Digital Sky Survey (SDSS) into SQL. He talked about Jim “going native” and becoming very much an astronomer himself and a much-loved member of the astronomy community, so much so that an asteroid is to be named after him. He says “my friendship and collaborations with Jim took my career in new, entirely different directions… He impacted lives of many others around him on the same way.” Others, such as Pat Helland (see his blog), Michael Stonebraker and Gordon Bell, had similar stories to tell. They talked about his great technical insights, his mentorship, especially of students and upcoming researchers, and his habit of freely giving credit. Curtis Wong, when explaining Jim’s role in inspiring the WorldWide Telescope, talked about Jim’s almost embarrassing generosity in giving credit, and how it was instrumental in motivating Curtis to get the WWT project off the ground. As you can imagine, his reputation for integrity, combined with so many personal relationships across the industry, enabled him to serve a powerful networking role.
Rick Rashid called Jim a “Gap Bridger … someone who could connect people, groups, companies and disciplines.” David Vaskevitch talked about Jim’s role as a sort of “transaction coordinator” for people who were making major career change decisions, especially between companies – they would always get his advice and, as David put it humorously in TP terms, Jim would ensure that these transactions went through smoothly and that people were not left “in flight” midway through transitions.
Write it down! Another common theme was the importance Jim gave to writing and presenting ideas. He said in a 2002 interview (re-published in the (37,2) SIGMOD Record) that he learned this from his PhD advisor, Michael Harrison at UCB (who also gave a talk, about Jim’s student days at UCB). Jim would often do the lion’s share of paper writing. Many commented on his ability to write crisply about complex topics. He often encouraged (and goaded) his colleagues to write and present. Andreas Reuter, who with Jim Gray wrote the famous book “Transaction Processing: Concepts and Techniques”, gave a talk about the experience of working with Jim on that book. There are many interesting anecdotes you can read in the SIGMOD Record of the tribute, but Jim’s approach is best summarized in Andreas’s words: “I got the impression that for Jim the whole exercise served as an “upload” of all the things he had learned and thought about for decades, so that other people could pick them up and he was free to take off in to new territories.”
Tidbits. Some other things I noted, from the Tribute as well as from reading the transcript of Jim’s 2002 interview, are jotted down here.
· Alex Szalay mentioned how Jim inquired about the “20 queries” – what are the 20 most important queries that (in this case) the astronomy community would want to make of the SDSS data? This really brought out the underlying requirements and constraints (and precipitated the right kind of dialog and debate about what was really necessary). Alex said that this method of dialog turned out to be very effective in getting to the bottom of requirements when they span multiple disciplines.
· In Jim’s 2002 interview, he talked about how paradigm-changing ideas are often rejected for (first) publication. He says: “The original B-Tree paper was bounced; the data cube paper was bounced. The original transaction paper was bounced. Any paper that is non-linear is going to get bounced”. His advice remains to “go for the home run”, and be persistent.
Jim Gray was a great person. We would do well to emulate even a fraction of his qualities.
February 16
Microsoft Puzzle Hunt and Puzzle Safari
In this post I talk about a series of puzzle contests that my friends and I have participated in, and how they inspired some of us to hold our own contests, including for children.
I have always been interested in puzzles, and have amassed quite a large collection of puzzle books and physical puzzles over the years. Several years ago (1999), my friends and I learned that some puzzle enthusiasts within Microsoft (we were in Redmond at the time) were organizing something called Puzzle Hunt. Teams of up to 12 were invited to participate in a 2-day event that involved solving a series of puzzles, given to us in packets in several stages. We formed a team of some 5 (all IITians, I recall, and somewhat cocky about our expectation of doing rather well in the competition, perhaps even coming in the top three). Some 30 or 40 teams participated.
So early Saturday morning, a whole bunch of puzzle enthusiasts assembled on the Microsoft campus and received our first collection of packets. We had reserved a conference room to serve as a sort of control room (and noticed that many other conference rooms were taken up by other teams, some of these teams quite large – 8 to 12 members). Well guess what? We were completely blindsided by the difficulty of the puzzles. The first thing that flummoxed us was the fact that there were no instructions! A typical puzzle would have a title, and some pictures which didn’t seem to have any rhyme or reason to them, and nothing else. An example from the 2nd Puzzle Hunt is shown below.
Here is another one from one of the later puzzle hunts: we got a small plastic packet with some Jelly Belly® jelly beans in it. That’s it! There was an online system for reporting the status of the various teams as they solved. As we stared uncomprehendingly at our set of puzzles, and the hours passed, we saw in the online system various teams solving puzzles, one after another. This was a 2-day event, and on the 2nd day we were pretty dispirited. I think we solved 2 or 3 puzzles out of perhaps 20, and came in the bottom third. The top three teams, meanwhile, were on a completely different level, having solved pretty much all the puzzles and the meta puzzle as well. We were astounded.
So after participating in the first Microsoft Puzzle Hunt, we were bruised, but also hooked, and we have participated in several since. We got to know some of the “tricks of the trade”: chiefly, to try out lots of things very fast, look for encodings of any kind, and not be fazed by ambiguity. The Internet gets used a lot. We generally fall in the middle of the pack, still far below the few teams at the very top, but at least having a lot of fun. You can find out more about the Microsoft Puzzle Hunt on Wikipedia here: http://en.wikipedia.org/wiki/Microsoft_Puzzle_Hunt. Here's a picture I took at the beginning of the 2002 Puzzle Hunt, and another one of our team hunkering down, solving puzzles...
In 2001, another set of enthusiasts started another version of puzzle solving, called Puzzle Safari. These tended to involve many more and easier puzzles, teams could have at most 4 members, and there was a lot of running around campus. The answer to most puzzles was some location on campus (say a conference room identified by its number), to which one of us had to rush. To confirm that you had found the location, you had to locate a unique stamp pad hidden there and mark your little booklet. This was a one-day affair, and at the end of it we were pretty exhausted, mentally and physically, but it was loads of fun and we participated every year for as long as we were in Redmond (until 2004). Our team would generally come in the middle, sometimes in the top third of the participants. As with Puzzle Hunt, the very top teams were in a league by themselves, and having gone through the gauntlet ourselves, we just couldn’t believe how they could get so much done in the same amount of time. A humbling and at the same time inspiring experience. You can find out more about Puzzle Safari on Wikipedia here: http://en.wikipedia.org/wiki/Microsoft_Puzzle_Safari. A picture I took of the 2002 puzzle safari beginning is below. Unfortunately I couldn't find any pictures of all the mad running around and letterboxing activity.
Our experiences have inspired my wife and me to hold a mini puzzle safari kind of event for our elder daughter’s birthday. We started this when she was very young: just 3 years old(!). At that age the children mostly ran around confused about what was going on. But we were extremely gratified a year later, when many of these same children, now 4ish years old, were eagerly anticipating the next puzzle (treasure) hunt! In this day of eye candy, TV, and videos, “reverse gifts” and short attention spans, these children had remembered something that had happened a year back and wanted more of it!
So we’ve been having “puzzle hunts” every year; the last 3 have been in Bangalore, India. My colleagues and I have also organized a puzzle hunt during our annual Microsoft Research India retreat, and that was a big hit. I will add more information on each event, and hopefully it will entice others to hold their own mini puzzle hunts for children (and grownups too). The following pictures are from four different puzzle solving events I was involved in organizing, covering diverse age groups, as you can see!
February 14
Mining Software Repositories
Software projects can be extremely large. For example, the Microsoft Windows codebase is estimated to be over 100 million lines of code. However, these projects did not spring up overnight. They have evolved over a long period, with many people contributing over the years. The question is, how much information of value to the current developers of the system can be gleaned by examining the various repositories that constitute the software project? Quite a lot, as it turns out.
Dr. Prem Devanbu from UC Davis is one of the authorities in the field of mining software repositories. Dr. Devanbu recently (in Jan 2008) visited MSR India for a few days, and gave a fascinating talk, “Babes in woods”, about the challenges new developers face when taking on complex software projects. Prem went on to describe work he and his collaborators at UC Irvine did on a system that recommends a list of “related functions” that a programmer should look at when she is planning to use a particular function. Relationships amongst functions are useful in other contexts too. For example, they can be used to study the impact of a particular code change in order to estimate the risk of a regression and to decide which tests to run. Dr. Devanbu described their FRAN tool that implements their HEAR algorithm. HEAR analyzes call graphs to come up with related-function recommendations, using linear algebra techniques that simulate infinite random walks in the neighborhood of the query function in the call graph. It was found to be an effective and fast technique. If you are at all interested (as I am) in the general area of mining software repositories for information that can be valuable to software development, I strongly recommend you read their related paper Recommending Random Walks by Saul, Z. M., Filkov, V., Devanbu, P. and Bird, C. In Proc. 15th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE-15), September 2007.
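To give a feel for what "simulating an infinite random walk on a call graph" can look like, here is a minimal sketch. It is not the FRAN tool or the HEAR algorithm from the paper. It swaps in a generic random walk with restart (in the style of personalized PageRank), approximated by power iteration. The function name `related_functions`, the `restart` parameter, and the toy call graph are all hypothetical, made up for illustration.

```python
from collections import defaultdict


def related_functions(call_graph, query, restart=0.15, iterations=100):
    """Rank functions by a random walk with restart from `query`.

    call_graph maps a caller to the list of functions it calls (toy data).
    This is an illustrative stand-in, not the published HEAR algorithm.
    """
    # Treat each call as an undirected edge, so the walk can move
    # from a function to its callees and to its callers.
    neighbors = defaultdict(set)
    for caller, callees in call_graph.items():
        neighbors[caller]  # make sure callers with no callees still appear
        for callee in callees:
            neighbors[caller].add(callee)
            neighbors[callee].add(caller)
    if query not in neighbors:
        raise ValueError(f"unknown function: {query}")

    nodes = sorted(neighbors)
    # Start with all probability mass on the query function.
    score = {n: 1.0 if n == query else 0.0 for n in nodes}

    # Power iteration: each round moves the walker one step. Repeating it
    # approximates the long-run (infinite-walk) visiting probabilities.
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            if not neighbors[n]:
                # Isolated node: send its mass back to the query.
                nxt[query] += (1.0 - restart) * score[n]
                continue
            share = (1.0 - restart) * score[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        # With probability `restart` the walker jumps back to the query.
        nxt[query] += restart
        score = nxt

    ranked = sorted((n for n in nodes if n != query),
                    key=lambda n: (-score[n], n))
    return [(n, score[n]) for n in ranked]


# Hypothetical toy call graph: caller -> callees.
call_graph = {
    "open_file": ["check_path", "alloc_buffer"],
    "read_file": ["check_path", "alloc_buffer"],
    "close_file": ["free_buffer"],
    "render_ui": ["draw_text"],
}

for name, value in related_functions(call_graph, "open_file"):
    print(f"{name}: {value:.3f}")
```

In this toy graph, functions that share callees with `open_file` end up with a positive score, while functions in unrelated parts of the graph (such as `draw_text`) get none. The restart probability keeps the walk in the neighborhood of the query function; a real tool would work on a far larger graph and would likely use sparse matrix routines rather than dictionaries.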
The references in the paper point to other fascinating work in the general area of mining software repositories, for example:
1. D. Cubranic, G. Murphy, J. Singer, and K. Booth. Hipikat: a project memory for software development. IEEE Transactions on Software Engineering, 31(6):446–465, 2005.
2. Z. Li and Y. Zhou. PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code. In Proc. ACM FSE-13, 2005.
3. V. B. Livshits and T. Zimmermann. Dynamine: Finding common error patterns by mining software revision histories. In Proc. ACM FSE-13, 2005.
4. T. Zimmermann, A. Zeller, P. Weissgerber, and S. Diehl. Mining version histories to guide software changes (earlier paper here). IEEE Transactions on Software Engineering, 31(6):429–445, 2005.
These references are all relatively recent, and to me this indicates that the potential of mining software repositories remains largely untapped, with many interesting results still to come.
December 24
On Software Systems Complexity and what we can do about it
We are building increasingly complex software. The tools (languages, infrastructure, development environments) we are building are getting more effective every year. This is great from the perspective of rapidly implementing new functionality, but I claim that our ability to understand the behavior of the systems we are building is reducing. It's not reducing across the board, but I claim that, on the whole, the complexity of the things we build is outpacing the capabilities of the techniques and tools we have to analyze these systems.
Of course, this is a topic that is dear to my heart, and a good chunk of my professional life has been devoted both to increasing the complexity (working on several versions of the Windows OS) and to battling it (through libraries, diagnostics, and more recently, collaboration with researchers on new programming models -- see my MSR India page -- and studying the larger development cycle that includes Sustained Engineering).
It seems that with so much attention paid to functionality, functionality, functionality, the battle against complexity is a losing one, and I would like to turn the tide. I would like to spend the next few years of my career focused on taking some big bites out of software complexity, in collaboration with many others, in particular through a deep engagement with researchers who are experts in the fields of program analysis, formal methods, information retrieval, and large scale data mining. Ideally, I would like to work on some big hairy goals, analogous to the Human Genome Project, where a bunch of us work in a coordinated fashion towards goals that may take a few years.
So what are candidate goals/challenges of sufficient magnitude? Here's my working list...
As I said, I will increasingly focus on the challenge of battling systems complexity, and as I start to take a more concerted look at this space and what people are doing about it, I will attempt to record things here.
August 13
2006 Code4bill Finals
The code4bill finals were on July 21st. There were 19 finalists, who had each spent 4-8 weeks on a project either at Microsoft IDC (Hyderabad) or at MSR India (Bangalore). I had the chance to spend time with many finalists, both in my role in evaluating them and working side-by-side with the MSR India finalists (and mentoring one of them). An impressive bunch, and I was also inspired by the intensity with which they attacked their projects and the camaraderie that developed amongst the finalists over the 2 months.
I placed some pictures I took on the final day in my 2006 Code4bill Finals album.