{"id":955,"date":"2008-10-08T08:06:19","date_gmt":"2008-10-08T07:06:19","guid":{"rendered":"http:\/\/ospublish.constantvzw.org\/?p=955"},"modified":"2008-10-08T10:34:02","modified_gmt":"2008-10-08T09:34:02","slug":"data-analysis-as-a-discourse","status":"publish","type":"post","link":"http:\/\/ospublish.constantvzw.org\/blog\/conversation\/data-analysis-as-a-discourse","title":{"rendered":"Data analysis as a discourse"},"content":{"rendered":"
At the Libre Graphics Meeting 2008<\/a> in Wroclaw, just before Michael Terry<\/a> presents ingimp<\/a> to an audience of curious Gimp developers and users, we meet up to talk more about ‘instrumenting The Gimp’ and about the way Terry thinks data analysis could be done as a form of discourse.<\/p>\n Michael Terry is a computer scientist working at the Human Computer Interaction Lab of the University of Waterloo, Canada, and his main research focus is on improving usability in open source software. We speak about ingimp<\/a>, a clone of the popular image manipulation programme Gimp, but with an important difference: ingimp allows users to record data about their usage into a central database, and subsequently makes this data available to anyone. Femke Snelting [FS]<\/strong> Maybe we could start this conversation with a description of the ingimp project you are developing and why you chose to work on usability for Gimp? <\/em><\/p>\n Michael Terry [MT]<\/strong> So the project is \u2018ingimp\u2019, which is an instrumented version of Gimp: it collects information about how the software is used in practice. The idea is you download it, you install it, and then with the exception of an additional start up screen, you use it just like regular Gimp. So, our goal is to be as unobtrusive as possible to make it really easy to get going with it, and then to just forget about it. We want to get it into the hands of as many people as possible, so that we can understand how the software is actually used in practice. There are plenty of forums where people can express their opinions about how Gimp should be designed, or what\u2019s wrong with it, there are plenty of bug reports that have been filed, there are plenty of usability issues that have been identified, but what we really lack is some information about how people actually apply this tool on a day to day basis. 
What we want to do is elevate discussion above just anecdote and gut feelings, and to say, well, there is this group of people who appear to be using it in this way, these are the characteristics of their environment, these are the sets of tools they work with, these are the types of images they work with and so on, so that we have some real data to ground discussions about how the software is actually used by people.<\/p>\n You asked me why Gimp? I actually used Gimp extensively for my PhD work. I had these little cousins come down and hang out with me in my apartment after school, and I would set them up with Gimp, and quite often they would start off with one picture, they would create a sphere, a blue sphere, and then they played with filters until they got something really different. I would turn to them looking at what they had been doing for the past twenty minutes, and would be completely amazed at the results they were getting just by fooling around with it. And so I thought, this application has lots and lots of power, I’d like to use that power to prototype new types of interface mechanisms. So I created JGimp, which is a Java based extension for the 1.0 Gimp series, that I can use as a back-end for prototyping novel user interfaces. I think that it is a great application, there is a lot of power to it, and I already had an investment in its code base so it made sense to use that as a platform for testing out ideas of open instrumentation.<\/p>\n FS<\/strong> What is special about ingimp is the fact that the data you generate is made as open as the software you are studying. Could you describe how that works?<\/em><\/p>\n MT<\/strong> Every bit of data we collect, we make available: you can go to the website, you can download every log file that we have collected. The intent really is for us to build tools and infrastructure so that the community itself can sustain this analysis, can sustain this form of usability. 
We don\u2019t want to create a situation where we are creating new dependencies on people, or where we are imposing new tasks on existing project members. We want to create tools that follow the same ethos as open source development, where anyone can look at the source code, where anyone can make contributions, from filing a bug to doing something as simple as writing a patch, where they don\u2019t even have to have access to the source code repository, to make valuable contributions. So importantly, we want to have a really low barrier to participation. At the same time, we want to increase the signal-to-noise ratio. Yesterday I talked with Peter Sikking, an information architect working for Gimp, and he and I both had this experience where we work with user interfaces, and since everybody uses an interface, everybody feels they are an expert, so there can be a lot of noise. So, not only did we want to create an open environment for collecting this data, and analysing it, but we also want to increase the chance that we are making valuable contributions, and that the community itself can make valuable contributions. Like I said, there is enough opinion out there. What we really need to do is to better understand how the software is being used. So, we have made a point from the start to try to be as open as possible with everything, so that anyone can really contribute to the project.<\/p>\n FS<\/strong> ingimp has been running for a year now. What are you finding?<\/em><\/p>\n MT<\/strong> I have started analysing the data, and I think one of the things that we realised early on is that it is a very rich data set; we have lots and lots of data. So, after a year we\u2019ve had over 800 installations, and we\u2019ve collected about 5000 log files, representing over half a million commands, representing thousands of hours of the application being used. 
And one of the things you have to realise is that when you have a data set of that size, there are so many different ways to look at it that my particular perspective might not be enough. Even if you sit someone down, and you have him or her use the software for twenty minutes, and you videotape it, then you can spend hours analysing just that twenty minutes of videotape. And so, I think that one of the things we realised is that we have to open up the process so that anyone could easily participate. We have the log files available, but there really wasn\u2019t an infrastructure for analysing them. So, we created this new piece of software called \u201cStatsJam\u201d, an extension to MediaWiki, which allows anyone to go to the website and embed SQL-queries against the ingimp data set and then visualise those results within the Wiki text. So, I\u2019ll be announcing that today and demonstrating that, but I have been using that tool now for a week to complement the existing data analysis we have done. From that group, what we found is that use of ingimp is really short and versatile. So, most sessions are about fifteen minutes or less, on average. There are outliers, there are some people who use it for longer periods of time, but really it boils down to them using it for about fifteen minutes, and they are applying fewer than a hundred operations when they are working on the image. I should probably be looking at my data analysis as I say this, but they are very quick, short, versatile sessions, and when they use it, they use fewer than 10 different tools, or they apply fewer than 10 different commands when they are using it. FS<\/strong> Every time you start up ingimp, a screen comes up asking you to describe what you are planning to do and I am interested in the kind of language users invent to describe this, even when they sometimes don\u2019t know exactly what it is they are going to do. 
So inventing language for possible actions with the software, has in a way become a creative process that is now shared between interface designer, developer and user. If you look at the ‘activity tags’ you are collecting, do you find a new vocabulary developing? MT<\/strong> I think there are 300 to 600 different activity tags that people register within that group of ‘significant users’. I didn\u2019t have time to look at all of them, but it is interesting to see how people are using that as a medium for communicating to us. Some people will say, \u201cJust testing out, ignore this!\u201d Or, people are trying to do things like insert html code, to do like a cross-site scripting attack, because, you have all the data on the website, so they will try to play with that. Some people are very sparse and they say ‘image manipulation’ or ‘graphic design’ or something like that, but then some people are much more verbose, and they give more of a plan, \u201cThis is what I expect to be doing\u201d. So, I think it has been interesting to see how people have adopted that and what\u2019s nice about it, is that it adds a really nice human element to all this empirical data.<\/p>\n Ivan Monroy Lopez [IM]<\/strong>: I wanted to ask you about the data, without getting too technical, could you explain how these data are structured, what do the log files look like? <\/em><\/p>\n MT<\/strong> So the log files are all in XML, and generally we compress them, because they can get rather large. And the reason that they are rather large is that we are very verbose in our logging. We want to be completely transparent with respect to everything, so that if you have some doubts or if you have some questions about what kind of data has been collected, you should be able to look at the log file, and figure out a lot about what that data is. That\u2019s how we designed the xml log files, and it was really driven by privacy concerns and by the desire to be transparent and open. 
On the server side we take that log file and we parse it out, and then we throw it into a database, so that we can query the data set.<\/p>\n FS<\/strong> Now we are talking about privacy\u2026 I was impressed by the work you have done on this; the project is unusually clear about why certain things are logged, and other things not; mainly to prevent the possibility of ‘playing back’ actions so that one could identify individual users from the data set. So, while I understand there are privacy issues at stake I was wondering… what if you could look at the collected data as a kind of scripting for use? Writing a choreography that might be replayed later?<\/em><\/p>\n MT<\/strong> Yes, we have been fairly conservative with the type of information that we collect, because this really is the first instance where anyone has captured such rich data about how people are using software on a day to day basis, and then made all that data publicly available. When a company does this, they will keep the data internally, so you don\u2019t have this risk of someone outside figuring something out about a user that wasn\u2019t intended to be discovered. We have to deal with that risk, because we are trying to go about this in a very open and transparent way, which means that people may be able to subject our data to analysis or data mining techniques that we haven\u2019t thought of and extract information that we didn\u2019t intend to be recording in our file, but which is still there. So there are fairly sophisticated techniques where you can do things like look at audio recordings of typing and the timings between keystrokes, and then work backwards with the sounds made to figure out the keys that people are likely pressing. So, keyboard audio and keystroke timings alone can often give enough information to reconstruct what people are actually typing. So we are always sort of wary about how much information is in there. 
FS<\/strong> It was not meant as a feature request, but as a way to imagine how usability research could flip around and also become productive work.<\/em><\/p>\n MT<\/strong> Yes, totally. I think one of the things that we found when bringing people in to assess the basic usability of the ingimp software and ingimp website, is that people like looking at things like what commands other people are using, what the most frequently used commands are, and part of the reason that they like that, is because of what it teaches them about the application. So they might see a command they were unaware of. So we have toyed with the idea of then providing not only the command name, but then a link from that command name to the documentation \u2013 I didn\u2019t have time to implement it, but certainly there are possibilities like that, you can imagine. <\/p>\n FS<\/strong> Maybe another group can figure something out like that? That\u2019s the beauty of opening up your software plus data set of course. MT<\/strong> I think it is important to keep in mind that whatever instrument you use to study people, you are going to have some kind of bias, you are going to get some information at the cost of other information. So if you do a video taped observation of a user and you just set up a camera, then you are not going to find details about the monitor maybe, or maybe you are not really seeing what their hands are doing. No matter what instrument you use, you are always getting a particular slice. IM<\/strong> I don\u2019t know, I don\u2019t want to get paranoid. 
But if you are doing it, then there is a possibility someone else will do it in a less considerate way.<\/em><\/p>\n MT<\/strong> I think it is only a matter of time before people start doing this, because there are a lot of grumblings about, \u201cwe should be doing instrumentation, someone just needs to sit down and do it.\u201d Now there is an extension out for Firefox that will collect this kind of data as well, so you know\u2026<\/p>\n IM<\/strong> Maybe users could talk with each other, and if they are aware that this type of monitoring could happen, then that would add a different social dimension\u2026<\/em><\/p>\n MT<\/strong> It could. I think it is a matter of awareness, really, so when we bring people into the lab and have them go to the ingimp website, download and install it and use it, and go check out the stats on the website, and then we ask questions like, what kind of data are we collecting? FS<\/strong> So concretely… what information are you recording, and what information are you not recording? <\/em><\/p>\n MT<\/strong> We record every command name that is applied to a document, to an image. Where your privacy is at risk with that, is that if you write a custom script, then that custom script\u2019s name is going to be inserted into a log file. And so if you are working for example for Lucas or DreamWorks or something like that, or ILM, in some Hollywood movie studio and you are using ingimp and you are writing scripts, then you could have a script like \u201cfixing Shrek\u2019s beard\u201d, and then that is getting put into the log file and then people are going to know that the studio uses ingimp. FS<\/strong> As we are talking about this, I am already more aware of what data I would allow to be collected. 
Do you think by opening up this data set and the transparent process of collecting and not collecting, this will help educate users about these kinds of risks?<\/em><\/p>\n MT<\/strong> It might, but honestly I think probably the thing that will educate people the most is if there was a really large privacy error and that it got a lot of news, because then people would become more aware of it because right now \u2013 and this is not to say that we want that to happen with ingimp \u2013 but when we bring people in and we ask them about privacy, \u201cAre you concerned about privacy?\u201d, and they say \u201cNo\u201d, and we say \u201cWhy?\u201d Well, they inherently trust us, but the fact is that open source also lends a certain amount of trust to it, because they expect that since it is open source, the community will in some sense police it and identify potential flaws with it. <\/p>\n FS<\/strong> Is that happening? Are you in dialogue with the Open Source community about this?<\/em><\/p>\n MT<\/strong> No, I think probably five to ten people have looked at the ingimp code \u2013 realistically speaking I don\u2019t think a lot of people looked at it. Some of the Gimp developers took a gander at it to see how could we put this upstream, but I don\u2019t want it upstream, because I want it to always be an opt-in, so that it can\u2019t be turned on by mistake. <\/p>\n FS<\/strong> You mean you have to download ingimp and use it as a separate program? It functions in the same way as Gimp, but it makes the fact that it is a different tool very clear.<\/em><\/p>\n MT<\/strong> Right. You are more aware, because you are making that choice to download that, compared to the regular version. There is this awareness about that. 
FS<\/strong> Can you say something about how this type of research relates to classic usability research and in particular to the usability work that is happening in Gimp?<\/em><\/p>\n MT<\/strong> Instrumentation is not new; commercial software companies and researchers have been doing instrumentation for at least ten years, probably ten to twenty years. So, the idea is not new but what is new, in terms of the research aspects of this, is how do we do this in a way where we can make all the data open? The fact that you make the data open really impacts your decision about the type of data you collect and how you are representing it. And you need to really inform people about what the software does. IM<\/strong> What approach did you take in order to make this project self-sustainable?<\/em><\/p>\n MT<\/strong> Collecting data is not hard. The challenge is to understand the data, and I don\u2019t want to create a situation where the community is relying on only one person to do that kind of analysis, because this is dangerous for a number of reasons. First of all, you are creating a dependency on an external party, and that party might have other obligations and commitments, and might have to leave at some point. If that is the case, then you need to be able to pass the baton to someone else, even if that could take a considerable amount of time and so on. 
When I talked with members of the Gimp project here at the Libre Graphics Meeting, they started asking questions like, \u201cSo how many people are doing this, how many people are doing this and how many this?\u201d They\u2019ll ask me while we are sitting in a caf\u00e9, and I will be able to pop the database open and say, \u201cA certain number of people have done this,\u201d or, \u201cno one has actually used this tool at all.\u201d FS<\/strong> You mean you can\u2019t say that because it is not used, it doesn\u2019t deserve any attention?<\/em><\/p>\n MT<\/strong> Yes, you just can\u2019t jump to conclusions like that, which is again why we want to have this community website, which shows the reasoning behind the analysis. Here are the steps we had to go through to get this result, so you can understand what that means, what the context means, because if you don\u2019t have that context, then it\u2019s sort of meaningless. It\u2019s like asking, what are the most frequently used commands? This is something that people like to ask about. Well really, how do you interpret that? Is it the number of times it has been used across all log files? Is it the number of people that have used it? Is it the number of log files where it has been used at least once? There are lots and lots of ways in which you can interpret this question. So, you really need to approach this data analysis as a discourse, where you are saying, here are my assumptions, here is how I am getting to this conclusion, and this is what it means for this particular group of people. So again, I think it is dangerous if one person does that and you come to rely on that one person. We really want to have lots of people looking at it, and considering it, and thinking about the implications. 
<\/p>\n FS<\/strong> Do you expect that this will impact the kind of interfaces that can be done for Gimp?<\/em><\/p>\n MT<\/strong> I don\u2019t necessarily think it is going to impact interface design, I see it really as a sort of reality check: this is how communities are using the software and now you can take that information and ask, do we want to better support these people or do we\u2026 For example in my data set, most people are working on relatively small images for short periods of time, the images typically have one or two layers, so they are not really complex images. So regarding your question, one of the things you can ask is, should we be creating a simple tool to meet these people\u2019s needs? All these people are doing is cropping and resizing, fairly common operations, so should we create a tool that strips away the rest of the stuff? Or, should we figure out why people are not using any other functionality, and then try to improve the usability of that? FS<\/strong> And do you see a difference in how interface design is done in free software projects, and in proprietary software?<\/em><\/p>\n MT<\/strong> Well, I have been mostly involved in the research community, so I don\u2019t have a lot of exposure to design projects. I mean, in my community we are always trying to look at generating new knowledge, and not necessarily at how to get a product out the door. So, the goals or objectives are certainly different. An interview with Michael Terry (ingimp) At the Libre Graphics Meeting 2008 in Wroclaw, just before Michael Terry presents ingimp to an audience of curious Gimp developers and users, we meet up to talk more about ‘instrumenting The Gimp’ and about the way Terry thinks data analysis could be done as a form of discourse. 
[…]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[18],"tags":[31,137,118,50,122],"_links":{"self":[{"href":"http:\/\/ospublish.constantvzw.org\/blog\/wp-json\/wp\/v2\/posts\/955"}],"collection":[{"href":"http:\/\/ospublish.constantvzw.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/ospublish.constantvzw.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/ospublish.constantvzw.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/ospublish.constantvzw.org\/blog\/wp-json\/wp\/v2\/comments?post=955"}],"version-history":[{"count":57,"href":"http:\/\/ospublish.constantvzw.org\/blog\/wp-json\/wp\/v2\/posts\/955\/revisions"}],"predecessor-version":[{"id":1055,"href":"http:\/\/ospublish.constantvzw.org\/blog\/wp-json\/wp\/v2\/posts\/955\/revisions\/1055"}],"wp:attachment":[{"href":"http:\/\/ospublish.constantvzw.org\/blog\/wp-json\/wp\/v2\/media?parent=955"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/ospublish.constantvzw.org\/blog\/wp-json\/wp\/v2\/categories?post=955"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/ospublish.constantvzw.org\/blog\/wp-json\/wp\/v2\/tags?post=955"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
\n
\n(This conversation will also be included in the forthcoming Constant publication Tracks in electronic fields<\/a><\/em>)<\/small><\/p>\n
\nOne of the first things that we realized is that we have over 800 installations, but then you have to ask, how many of those are really serious users? A lot of people probably just were curious, they downloaded it and installed it, found that it didn\u2019t really do much for them and so maybe they don’t use it anymore. So, the first thing we had to do is figure out which data points we should really pay attention to. We decided that a person should have saved an image, and they should have used ingimp on two different occasions, preferably at least a day apart, where they\u2019d saved an image on both of the instances. We used that as an indication of what a serious user is. So with that filter in place, the \u201c800 installations\u201d drops down to about 200 people. So we had about 200 people using ingimp, and looking at the data this represents about 800 hours of use, about 4000 log files, and again still about half a million commands. So, it\u2019s still a very significant group of people. 200 people is still a lot, and that\u2019s a lot of data, representing about 11000 images they have been working on, there’s just a lot.<\/p>\n
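The serious-user filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the (start_time, saved_image) session format and the sample data are hypothetical, and the one-day gap that the interview calls preferable is treated here as a hard requirement.

```python
from datetime import datetime, timedelta


def is_serious_user(sessions, min_gap=timedelta(days=1)):
    # sessions: list of (start_time, saved_image) pairs for one installation.
    # This record layout is an assumption, not the real ingimp log format.
    # Keep only the start times of sessions in which an image was saved.
    saves = sorted(start for start, saved in sessions if saved)
    if len(saves) < 2:
        return False
    # The earliest and latest saving sessions must be at least min_gap apart.
    return saves[-1] - saves[0] >= min_gap


# Made-up sessions for three hypothetical installations.
curious = [(datetime(2008, 3, 1, 10, 0), False)]
same_day = [(datetime(2008, 3, 1, 10, 0), True), (datetime(2008, 3, 1, 12, 0), True)]
regular = [(datetime(2008, 3, 1, 10, 0), True), (datetime(2008, 3, 4, 18, 30), True)]

installations = [('curious', curious), ('same_day', same_day), ('regular', regular)]
serious = [name for name, sessions in installations if is_serious_user(sessions)]
print(serious)
```

With this sample data only the installation labelled regular passes the filter. Loosening min_gap changes who counts as a serious user, which is one reason the assumptions behind such a filter need to be stated alongside the results.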
\nWhat else did we find? We found that the two most popular monitor resolutions are 1280 by 1024 and 1024 by 768. So, those represent collectively 60% of the resolutions, and really 1280 by 1024 represents pretty much the maximum for most people, although you have some higher resolutions. So one of the things that\u2019s always contentious about Gimp is its window management scheme and the fact that it has multiple windows, right? And some people say, well you know this works fine if you have two monitors, because you can throw out the tools on one monitor and then your images are on another monitor. Well, about 10 to 15% of ingimp users have two monitors, so that design decision is not working out for most of the people, if that is the best way to work. These are things I think that people have been aware of, it\u2019s just now we have some actual concrete numbers you can turn to and say, now this is how people are using it.
\nThere is a wide range of tasks that people are performing with the tool, but they are really short, bursty tasks. <\/p>\n
\n<\/em><\/p>\n
\nWhile it might be nice to be able to do something like record people\u2019s actions and then share that script, I don\u2019t think that that is really a good use of ingimp. That said, I think it is interesting to ask, could we characterize people\u2019s use enough, so that we can start clustering groups of people together and then providing a forum for these people to meet and learn from one another? That\u2019s something we haven\u2019t worked out. I think we have enough work cut out for us right now just to characterize how the community is using it.<\/p>\n
\nWell, just a bit more on what is logged and what not… Maybe you could explain where and why you put the limit and what kind of use you might miss out on as a result?<\/em><\/p>\n
\nI think you have to work backwards and ask what kind of things do you want to learn. And so the data that we collect right now was really driven by what people have done in the past in the area of instrumentation, but also by us bringing people into the lab, observing them as they are using the application, and noticing particular behaviours and saying, hey, that seems to be interesting, so what kind of data could we collect to help us identify those kinds of phenomena, or that kind of performance, or that kind of activity? So again, the data that we were collecting was driven by watching people, and figuring out what information will help us to identify these types of activities.
\nAs I\u2019ve said, this is really the first project that is doing this, and we really need to make sure we don\u2019t poison the well. So if it happens that we collect some bit of information, that then someone can later say, \u201cOh my gosh, here is the person\u2019s file system, here are the names they are using for the files\u201d or whatever, then it\u2019s going to make the normal user population wary of downloading this type of instrumented application. The thing that concerns me most about open source developers jumping into this domain is that they might not be thinking about how you could potentially impact privacy.<\/p>\n
\nWe have a lengthy consent agreement that details the type of information we are collecting and the ways your privacy could be impacted, but people don\u2019t read it. <\/p>\n
\nWe collect command names, we collect things like what windows are on the screen, their positions, their sizes, we take hashes of layer names and file names. We take a string and then we create a hash code for it, and we also collect information about how long is this string, how many alphabetical characters, numbers, things like that, to get a sense of whether people are using the same files, the same layer names time and time again, and so on. But this is an instance where our first pass at this, actually left open the possibility of people taking those hashes and then reconstructing the original strings from that. Because we have the hash code, we have the length of the string, all you have to do is generate all possible strings of that length, take the hash codes and figure out which hashes match. And so we had to go back and create a new scheme for recording this type of information where we create a hash and we create a random number, we pair those up on the client machine but we only log the random number. So, from log to log then, we can track if people use the same image names, but we have no idea of what the original string was.
\nThere are these little \u201cgotchas\u201d (things to look out for) like that, that I don\u2019t think most people are aware of, and this is why I get really concerned about instrumentation efforts right now, because there isn\u2019t this body of experience of what kind of data should we collect, and what shouldn\u2019t we collect.<\/p>\n
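The revised logging scheme described above, where a hash is paired with a random number on the client and only the random number is logged, might look roughly like the following Python sketch. It is not the ingimp implementation: the class name, the choice of SHA-256, the token length and the file names are all assumptions made for illustration.

```python
import hashlib
import secrets


class TokenTable:
    # Hypothetical client-side table pairing a hashed name with a random token.
    # In a real client it would be stored locally and never uploaded.

    def __init__(self):
        self._tokens = {}

    def token_for(self, name):
        # Hash the name so the table itself does not hold the plain string.
        # SHA-256 is an assumption; the interview does not name a hash function.
        digest = hashlib.sha256(name.encode('utf-8')).hexdigest()
        # Reuse the token for a name seen before, otherwise draw a fresh random one.
        if digest not in self._tokens:
            self._tokens[digest] = secrets.token_hex(8)
        return self._tokens[digest]


def log_entry(table, file_name):
    # Only the random token is written to the log, never the hash or the name.
    return {'file_token': table.token_for(file_name)}


table = TokenTable()
first = log_entry(table, 'holiday.xcf')
second = log_entry(table, 'holiday.xcf')
other = log_entry(table, 'poster.xcf')

# The same name yields the same token from log to log; different names differ.
print(first == second, first == other)
```

Because the token is random rather than derived from the string, enumerating every candidate string of a given length and hashing it no longer helps an outside analyst: nothing in the log can be compared against those hashes.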
\nWe have this lengthy text based consent agreement that talks about the data we collect, but less than two percent of the population reads license agreements. And, most of our users are actually non-native English speakers, so there are all these things that are working against us. So, for the past year we have really been focussing on privacy, not only in terms of how we collect the data, but how we make people aware of what the software does.
\nWe have been developing wordless diagrams to illustrate how the software functions, so that we don\u2019t have to worry about localisation errors as much. And so we have these illustrations that show someone downloading ingimp, starting it up, a graph appears, there is a little icon of a mouse and a keyboard on the graph, and they type and you see the keyboard bar go up, and then at the end when they close the application, you see the data being sent to a web server. And then we show snapshots of them doing different things in the software, and then show a corresponding graph change. So, we developed these by bringing in both native and non-native speakers, having them look at the diagrams and then tell us what they meant. We had to go through about fifteen people and continual redesign until most people could understand and tell us what they meant, without giving them any help or prompts. So, this is an ongoing research effort, to come up with techniques that not only work for ingimp but also for other instrumentation efforts, so that people can become more aware of the implications. <\/p>\n
\nBut I think your question is… how does it impact the Gimp\u2019s usability process? Not at all, right now. But that is because we have intentionally been laying off to the side, until we got to the point where we had an infrastructure, where the entire community could really participate with the data analysis. We really want this to be a self-sustaining infrastructure; we don\u2019t want to create a system where you have to rely on just one other person for this to work.<\/p>\n
\nYou also don\u2019t want to have this external dependency, because of the richness in the data, you really need to have multiple people looking at it, and trying to understand and analyse it. So how are we addressing this? It is through this StatsJam extension to MediaWiki that I will introduce today. Our hope is that this type of tool will lower the barrier for the entire community to participate in the data analysis process, whether they are simply commenting on the analysis we made or taking the existing analysis, tweaking it to their own needs, or doing something brand new. <\/p>\n
\nThe danger is that this data is very rich and nuanced, and you can\u2019t really reduce these kinds of questions to an answer of \u201cN people do this\u201d, you have to understand the larger context. You have to understand why they are doing it, why they are not doing it. So, the data helps to answer some questions, but it generates new questions. It gives you some understanding of how the people are using it, but then it generates new questions of, Why is this the case? Is this because these are just the people using ingimp, or is this some more widespread phenomenon?
\nThey asked me yesterday how many people are using this colour picker tool \u2013 I can\u2019t remember the exact name \u2013 so I looked and there was no record of it being used at all in my data set. So I asked them when did this come out, and they said, \u201cWell it has been there at least since 2.4.\u201d And then you look at my data set, and you notice that most of my users are in the 2.2 series, so that could be part of the reason. Another reason could be that they just don\u2019t know that it is there, they don\u2019t know how to use it and so on. So, I can answer the question, but then you have to sort of dig a bit deeper.<\/p>\n
\nThere are so many ways to use data, I don\u2019t really know how it is going to be used, but I know it doesn\u2019t drive design. Design happens from a really good understanding of the users, the types of tasks they perform, the range of possible interface designs that are out there, lots of prototyping, evaluating those prototypes and so on. Our data set really is a small potential part of that process. You can say, well according to this data set, it doesn\u2019t look like many people are using this feature, let\u2019s not focus too much on that, let\u2019s focus on these other features or conversely, let\u2019s figure out why they are not using them\u2026 Or you might even look at things like how big their monitor resolutions are, and say well, given the size of the monitor resolution, maybe this particular design idea is not feasible. But I think it is going to complement the existing practices, in the best case. <\/p>\n
\nI think one of the dangers in your question is that you sort of lump a lot of different projects and project styles into one category of \u201cOpen Source\u201d. \u201cOpen source\u201d ranges from volunteer driven projects to corporate projects, where they are actually trying to make money out of it. There is a huge diversity of projects that are out there; there is a wide diversity of styles, there is as much diversity in the Open Source world as there is in the proprietary world.
\nOne thing you can probably say, is that for some projects that are completely volunteer driven like Gimp, they are resource strapped. There is more work than they can possibly tackle with the number of resources they have. That makes it very challenging to do interface design, I mean, when you look at interface code, it costs you 50 or 75 percent of a code base. That is not insignificant, it is very difficult to hack and you need to have lots of time and manpower to be able to do significant things. And that\u2019s probably one of the biggest differences you see for the volunteer driven projects, it is really a labour of love for these people and so very often the new things interest them, whereas with a commercial software company developers are going to have to do things sometimes they don\u2019t like, because that is what is going to sell the product.<\/p>\n","protected":false},"excerpt":{"rendered":"