One of the things that has surprised me a bit about the recent discourse relating to patents is the increasing number of people commenting about how they feel that perhaps juries are not the best final check on this system. I find this reaction curious, if not a bit frightening. I’ve seen people unhappy with verdicts before, but rarely does that lead to questioning the very use of juries. I’m hoping to understand why people who I believe would otherwise trust juries feel this way about this particular facet of law.
Let’s start with what I hope are uncontroversial statements about patents. I believe these to be uncontroversial because they don’t necessarily help either side of the argument: they could be used either to support or to oppose the current patent system:
A patent effectively creates a “mini law”: it states that the government has granted the patent holder a temporary monopoly on a certain invention. If you can prove someone is encroaching on this monopoly, the legal system will defend you against them. It’s kind of like a new temporary regulation in a market.
Patents can have huge effects on the economy. Again, I’m not judging whether these effects are good or bad. Patent defenders will say that patents have a huge positive effect by protecting inventors, while patent detractors will say they have a huge negative effect by harming smaller companies and stifling innovation.
The point of these two statements is that hopefully we can all at the very least agree that the questions relating to patents are important, and thus worthy of our scrutiny. However, when we scrutinize things, it’s easy to get bogged down in philosophical positions (“can a business method really be owned?”) or incredibly difficult to prove economic theories (“while usually detrimental to the point of requiring anti-trust facilities in the government, a small temporary monopoly can actually help the market in the long run”). We often take for granted the actual system that’s implemented, and consider only its idealized abstraction. For example, we talk about what “Congress” wants, and forget that it’s just 535 human beings with sometimes completely opposing points of view. Similarly, we talk about the “patent system” as some sort of machine that would execute the goals we want, when in reality it too is ultimately just an organization of individuals.
So let’s not look at the ideal our system tries to achieve, but instead at the stark reality of what we currently have in place. Luckily for us, there are a number of statistics published about this organization (the USPTO). So let’s take a look at them:
In 2011, there were 6780 patent examiners (according to Wikipedia, only 99 of these were design examiners!). Taking into account that the USPTO doesn’t work weekends, getting through all the applications filed in a year leaves an average of only about 3 days of scrutiny per patent. In reality, this workload is probably to blame for a backlog of over 700,000 applications.
There is little to no public accountability for patent examiners at the USPTO. By this I mean that the USPTO is obviously not staffed by elected officials. In fact, there is only a tenuous connection between anyone you may vote for and USPTO policy.
According to Wikipedia, patent examiners “are generally newly graduated scientists and engineers, recruited from various universities around the nation.” Unfortunately, no citation is listed for this; however, the USPTO careers page does seem to have an emphasis on recent graduates, and looking at job postings on the USPTO site demonstrates that the requirements do indeed seem to be just having a 4-year degree. Additionally, no law degree is required.
The salary range for a patent examiner seems to be $51,988 to $78,881, well below what an “expert” in the field would make in the industry.
I think by definition the job of a patent examiner is an extremely difficult one: they are meant to analyze, understand, and draw conclusions regarding the latest in bleeding edge technologies. I think most people would want a proven physics expert analyzing a newly invented nuclear engine, not a recent physics graduate. A recent graduate might consider more things novel, or be more easily fooled, than someone experienced in the field, wouldn’t you agree? I consider myself pretty proficient in computer science, having earned a degree and worked on a number of products and at a number of companies, but I only consider myself an expert in a very small part of my field, and only if I’ve recently focused on it. In other words, I don’t know if I’d even trust myself with this job — even if I were meant to only look at the things I directly had experience with. I think all it takes is reading a few actual computer science papers to quickly start doubting your own expertise in this field. This doesn’t even get into the question of whether such experts, if they existed, would feel compelled to join the USPTO instead of actually inventing new things. What’s particularly frightening to me is that I only know about these issues precisely because they’re in a field I understand: since I’m not in fields such as biology, it scares me even more to think what must be going on with those patents. Perhaps 6780 people could be enough to cover all the subfields of computer science, but are they enough for all subfields of all the sciences?
The response to this is that the USPTO is currently attempting to increase its hires by 1000, with “more emphasis on recruiting candidates with significant IP experience”, in contrast with “previous hiring which focused on scientific background and experience”. One of the benefits listed is that “this will result in reduced training time as well as an increased ability to examine applications much sooner than a new hire with little or no IP experience, thereby increasing overall production output.” (source) This shows the Catch-22 and cross-disciplinary nature of the situation: I’m not sure that hiring more people with a law background to get patents through the door quicker is the appropriate response to these problems.
The conclusion I reach from these statistics is that the USPTO is obviously understaffed and, more importantly, necessarily under-qualified, even in the ideal scenario, which its current staffing certainly does not represent.
The reason this is ultimately important is to establish some context for our original question about juries. In other facets of law, we generally have an interesting balance: publicly elected and (ostensibly) professional people make rules which are only then tested in court against the sobering “common sense” of society, which offers a way out of the possible group-think of people in that field. If the rules being created are repeatedly bad, we can ultimately choose someone else through elections. In the world of patents we instead seem to be missing this balance: on the one hand you have an overwhelmed and under-qualified entity establishing rules for commerce. It is difficult to know whom to blame or how to do anything about it if they repeatedly make decisions we disagree with. And then we of course have these rules tested by 12 (effectively random) individuals. So what we are left with is incredibly complex questions being passed first to one group of people with questionable expertise called the USPTO (which we as a voting population can’t realistically affect), and then finally to a second group of individuals who are by design also rarely experts. Regardless of where you fall in the patent dispute, I’m not sure how you can trust the decisions being made today by this arrangement.
Imagine momentarily if other sectors of law worked this way. What if you could apply to the DMV for new traffic codes, which would be approved or denied by DMV staff, but would then be enforced by actual state police? Your only option would then be to hope that a jury found the traffic code unreasonable. I think you would then start to see more people unhappy with the way courts functioned.
If this example seems silly or far-fetched, consider that patents today can determine whether entire product lines can exist, whether a health company can potentially make a life-saving product (or a generic and affordable drug), or increasingly complex questions such as whether we can patent bacteria or genes. Again, the point is not what side you fall on with these questions, but rather that these questions are important and largely beyond accountability to both actual experts and the voting population. When you take this into account, it quickly becomes obvious that our current real-life patent scenario is actually much scarier than the fictional traffic code one.
These have the potential to be some of the most important decisions of our lifetime, and they are for the most part completely out of our control. This is why people are so frustrated by these patent trials – it is a feeling of helplessness. The granting of a patent can be as influential on our careers or our lives as the passing of a law, and yet it’s hard to feel like anything other than a mere spectator in the process. It’s inevitable that there will always be dissatisfaction with verdicts, but juries here don’t feel like some last line of defense in an already accountable and trusted system – they instead seem like the only real chance for input the public had on these individual patents. And so we feel all the more that they had a duty to represent our particular personal opinions.
This feeling, however, is unfair. Clearly the problem is not with these juries, but with the way in which patents are granted in the first place. So how can you create a system that makes appropriate decisions about patents in practice? How many people would it take? How would you attract qualified people, and would it be worth taking them away from industry? How could you make them accountable? And how could we realistically impose such a change and maintain such a system in the near future? I don’t think these are simple matters of staffing or policy changes. Regardless of whether you believe in patents or not, it’s pretty clear to me that the current system is broken and needs some fundamental changes.
Joe Hewitt just published a blog post titled Web Technologies Need an Owner, and it has predictably created quite a contentious discussion. I have had the pleasure of discussing these issues with Joe on a few occasions, and have thought a lot about them myself, so I figure it’s as good a time as any to share some of my thoughts, and some of my interpretations of what he is saying. Before I begin, I would like to state something that should be obvious but is often forgotten: we are all on the same page here. We all want a better, more vibrant web. We share these opinions precisely because we care about the web.
For a little background, JavaScript and the web made up one of my first programming environments. In 2008 I started a company that was all about the web, and the promise I thought it had for the future. We launched our first product, 280 Slides, a presentation authoring tool, to a lot of excitement. Since I’m thus very familiar with this app, I feel that a good place to begin this discussion is with a short observation that I think really sums up my general feelings about the web today:
Had we released 280 Slides today, in 2011, it would generate as much excitement and be considered just as innovative as when it was released 3 and a half years ago.
This is a problem and certainly doesn’t make me happy. I use 280 Slides here not because I believe it was the best app ever released, but because regardless of how good it was, this statement should not be true in a healthy and evolving environment. Anything from 3 and a half years ago should seem kind of old.
When I started 280 North with my cofounders in 2008, I believed deeply that the web held incredible potential: a promise of a future for application development that was both free and exciting. I think the web has failed on that promise in a lot of ways. If nothing else, I certainly think the timing was off.
What’s Important About the Web?
Before I continue, I’d like to take a step back to talk about what I mean by the web, and what I consider to be crucial to the web. The “web” is an overloaded term that gets used to mean a lot of things.
A lot of people think this is a discussion largely about the future of HTML, JavaScript, and CSS. While these are certainly important components which influence the web greatly, I think they actually represent the least important “pieces” of the web. A web composed of XML and Python would in my opinion still be the web. Alternatively, a native platform that happened to use HTML and JavaScript but forced you through a marketplace would hold very few of the virtues of the web.
Take a look at iTunes: it is well known that the iTunes Music Store (iTMS) is written in HTML and JavaScript and hosted in a WebKit View within the app. However, since it is confined to the iTunes cage, it couldn’t behave any less like the web. I can’t search the contents of the iTunes Music Store on Google, it’s incredibly difficult for me to link a friend to an iTMS page, I can’t go to the iTMS on Linux or any new platform that might come out tomorrow, and I can’t even do simple things like view two iTMS pages at once. Don’t get me wrong, the iTunes Music Store is great (you can check my credit card statements if you think I am being facetious), but it is not the web.
That is why I think a lot of the developments we’ve seen with “web technologies” in the past years are completely orthogonal to the future and health of the web. If you are a big proponent of getting JavaScript to run on embedded devices that’s great, but it is not really a “web” issue.
In other words, what I consider most important about the web is the freedom it gives us, the interoperability it provides. The web is unique in that by default web pages behave very well together, as opposed to native platforms, where by default things don’t cooperate very well at all. I think what makes the web is best embodied in the browser. Ultimately what I am saying is that I don’t want the browser to die, since it remains our last outlet for free technological expression. A world where HTML and JavaScript become the dominant languages but the browser takes a more muted role in our lives is not a good one in my opinion.
In What Ways is the Web Failing?
The web is an incredibly complex environment, where the most important issues at stake are in my opinion the most subtle. For this reason, this argument can quickly digress and we often spend too much time focusing on the wrong problems.
I think a great example of this is the history of Flash. Most people would point to Flash’s incredible decline in the last 3 years as a great success of the web. However, I believe that the crusade against Flash was incredibly misguided and ultimately left us in a worse position on the web overall. I am obviously no big fan of Flash (having created many technologies meant to compete directly against it), but what web evangelists fail to recognize is that the object tag is also part of the HTML standard and serves an incredibly democratic and important role in the web. Plugins are the equivalent of California’s ballot initiatives, an opportunity for us developers to shoehorn something into the web when we are unsatisfied with where web standards are going. Remember that had it not been for plugins, YouTube might have taken another 5 or 10 years to come out. Had plugins existed on mobile, perhaps FourSquare would today be a largely mobile web app instead of a native one. By killing Flash, what we actually did was kill plugins. Sure, today we happen to have a large influx of technologies being provided by HTML5, but as things change in the future, we have lost a powerful tool to experiment with. We are now completely at the mercy of the browser vendors and standards writers for access to some really cool technologies, furthering the gap between web and native.
But of course Flash is not the crux of this argument. Simply put, the web is failing because it is failing to “expand its market”. If we were to imagine the web as a corporation, we the stockholders should be growing increasingly worried that it isn’t able to tackle new problems. The web is as good for CRMs and blogs as it was years ago, but I’m honestly surprised that it still can’t compete with desktop applications. For a more concrete example, the web is still a great place for your social networks on the desktop, but is it really the best place to launch a social network on mobile? The data would suggest not: the most successful recent mobile social networks seem to all exist in the app space. Which brings us to a very interesting case study…
The Mobile Web
I find the mobile web incredibly interesting because it really represented the culmination of most web developers’ gripes and fantasies: it had no Flash, it was dominated by one major engine (WebKit), it provided unheard-of lowest-common-denominator support for the fanciest and newest CSS features and HTML 5, and it even gave us hardware acceleration.
And yet… 4 years after the iPhone gave us the most modern web on mobile, what do we have to show for it? Not much, really. The mobile web is at best doing OK at presenting existing desktop content in a mobile-friendly way, but no one has really come out and done anything absolutely amazing with it. Have you ever been as blown away by a mobile web app as by the GarageBand native app? Alright, maybe that’s not fair since even most native apps aren’t as good as GarageBand, but the truth is, I’m not seeing a lot of innovation at all.
But what I find particularly disturbing is that the exact opposite seems to be taking place. Major news publications like the New York Times and the Economist have launched native apps, not to mention Facebook. This is precisely what the web is supposed to be good at, and yet the market is telling us something different.
What Does an “Owner” of Web Technologies have to do with all this?
I think there is some confusion as to what Joe meant by needing an owner. Hearing this, people imagine one person in charge and fear a tyrannical and incompetent owner, much like the old days of Internet Explorer. But ironically, that is precisely the situation today: the infrastructure for unilateral rule is already in place. If any browser were to become dominant today, there is no reason we wouldn’t relive the stagnation of those years again. I am betting that if you, the average web developer, wanted to get an idea into the next HTML spec, you wouldn’t be able to. In fact, very famous and well renowned developers have given up on trying for precisely this reason. That sounds a lot like ownership to me. I think what Joe meant is that HTML, JavaScript, and CSS need to be “demoted”, both in a technological and philosophical sense. After this, anyone and everyone could be the owner, and that is the key. Allow me to explain.
The reason no one thinks twice that jQuery or CoffeeScript have owners is because we do not put them on some untouchable pedestal. Worst case scenario, if the owners of jQuery or CoffeeScript make a mistake, we could always just use an older version, fork it, or switch to a competitor. I think this is incredibly healthy. In other words, what matters is that anyone with vision can really take the reins, and when they do, you can choose not to be affected by it.
When people imagine HTML, CSS, or JavaScript having an owner, they imagine it in the same context as these technologies exist in today, where they are the only and lowest-level technologies we have. So of course the stakes are much higher. A mistake in HTML, CSS, or JavaScript could be devastating: the problems would percolate up the stack and affect everything, and it’s not like you could choose not to use JavaScript on the web.
Now imagine a world where the browser ran something lower level. Where Google, Mozilla, Microsoft, Opera, and Apple’s jobs in providing a browser more resembled Intel’s job of providing a chipset: they simply provided hooks for basic technologies like sound, camera, movies, geolocation, etc. And on top of that, we the developers wrote HTML, CSS, JavaScript, and… anything else we might think up. What if HTML was a file you included in your webpages, a project you could fork on GitHub? Worst case scenario, if the current owners really messed up, you could always just fall back to HTML4 or something else… something we can’t even imagine today because we are not allowed to.
I found an interesting critique via Daring Fireball regarding the latest Cappuccino application, Mockingbird. I’m really glad this has come up because I think the concerns are valid and I’m excited that we can start having a conversation about this. One comment in particular really stuck out to me:
If you load the app, you can see custom scrollbars and navigation, a complete lack of accessibility, non-native controls, and all those other things that cause geeks to hate Flash.
This is a classic programmer’s misunderstanding of a design problem. Listen carefully, guys: pure native controls aren’t what matters; broken and ugly controls are what users actually notice and care about. This should be incredibly obvious from the fact that many of the most popular applications on the Mac use custom “non-native” controls, such as Tweetie, iTunes, QuickTime, every single Apple Pro App, and Acorn, just to name a few. In fact, it’s almost par for the course nowadays.
Now, I haven’t heard a single complaint that Tweetie for the Mac doesn’t have blue Aqua scrollbars. Why? Because the Tweetie scrollbars work well in this setting and look good. The reason people hated Java UIs wasn’t because they were “non-native”, it’s because they were ridiculously horrendous and behaved poorly to boot. Instead of admitting that what was needed was good designers, programmers simply drew the lazy conclusion that every control had to be drawn by the system to give it some sort of magical properties. This is exactly the problem with Flash. What frustrates me about the scrollers in Balsamiq isn’t simply that they’re different, it’s that they don’t work with my scroll wheel mouse and look incredibly out of place.
In Cappuccino we’ve taken two important steps. First, we’ve relentlessly implemented all the “native” features of scrollers (and other controls, of course) that people have come to expect: from command-clicking in the track to respecting horizontal scroll to listening to arrow keys. Have we missed one? Perhaps, in which case you should file a bug. Or better yet, fork the project and ship your fix immediately to your users, something you can’t do with Flash or the built-in controls in HTML. Second, we’ve hired a real design firm, Sofa, to make a UI that truly looks awesome on the web: Aristo. You should take a look at their Cappuccino application EnStore and try to argue that this thing doesn’t feel great:
The second assertion was the following:
Gruber’s definition of “true web app” and mine greatly differ. Clue: If it’s completely unusable on the iPhone Safari browser, it doesn’t matter if it’s built in JavaScript, Flash or Microsoft Visual Fortran 2012. It’s not a “true web app”
Well, for starters:
Mockingbird on the iPhone
But let’s get to the real issue here, because this is once again a misunderstanding of design vs. programming. HTML, JS, and CSS do not magically create wonderful experiences on every platform they run on. As you can see from the above screenshot, they certainly have the nice side effect of working on said platforms, but if you’re expecting HTML to somehow handle the subtle and explicit differences between a handheld multitouch device and a desktop computer, well then you’re doing it wrong. These are completely different environments and they require completely different designs and often implementations. The reason Mockingbird is “completely unusable” on the iPhone despite loading up fine is because it was designed for a large screen. Photoshop written with perfect semantic markup, or however you want to define a “true web app”, won’t work on a small screen either. Clearly though, the nice side effect of using Cappuccino is that you’ll at least be able to share common source code between both versions of these apps.
I think the fundamental conclusion here is that people get really hung up on the “web” part of web apps, when they should be focusing on the “app” part. At the end of the day, you are delivering your customer an experience. I believe that someday all apps will be web apps, and then this will become much clearer. At that point, what will matter in a mobile web app is the mobile part, and what will matter in the desktop web app is the desktop part.
HTML 5 is shaping up to be quite an impressive step up from the capabilities web developers are currently constrained to. One of my favorite new features provided by the spec is support for native drag and drop. Cappuccino and many other JavaScript libraries have had drag and drop support for quite a while now, but with one important caveat: the drag operations were limited to within the browser window. This was not only visually displeasing, but prevented you from being able to share data in a user-friendly way from one web app to another, or even to other desktop apps. HTML 5 aims to change all this by giving us access to the computer’s native drag system and clipboard. I took the last week to really familiarize myself with this API and its various implementations on current browsers so I could start adding support for it in Cappuccino. I feel that this gave me a pretty unique perspective on the current state of this feature which I’d like to share, mainly because I’ve had to make it work in a number of real (sometimes shipping) applications, as opposed to simply creating small demos. The good news is that last night I was able to land my first commit which adds full HTML 5 drag and drop support for Safari and other WebKit-based browsers. Here is a short movie that shows this feature in action in our internal 280 Slides builds:
As you can see, this feature enables you to easily share data, whether it be images and shapes or full slides, from one presentation to another. What’s particularly cool about this is that you won’t have to change your existing code at all since Cappuccino simply detects when you are on a compliant browser and magically “upgrades” to native drag and drop. On older browsers, you will still get the old in-browser implementation. Ah, the beauty of abstraction.
This isn’t to say that working with this feature was all peaches and cream though. For starters, this feature is far from complete in any browser. I experienced a tremendous number of bugs, crashes, and inconsistencies in all the browsers I tried. On the one hand, I got to play with a very exciting new toy, and on the other I was given a glimpse into the future of the bugs I would be dealing with for years to come (just when we thought the whole cross-browser thing was starting to become manageable). This isn’t surprising, of course: it is a very new addition and the spec isn’t even 100% complete yet. For this reason, I’ve decided to split this post up into two pieces. In the following I will be discussing what I believe to be actual and serious design flaws in the current API, as well as a few suggestions I have for how they might be remedied. I will also separately link to a page that has all the bugs and inconsistencies I discovered (as well as the associated tickets I filed on them), and workarounds when I could find them.
I believe the main “theme” of the problems I encountered was that I am trying to build full-blown applications as opposed to dynamic web pages. This however is no excuse, as one of HTML 5’s supposed goals is to usher in an era of web apps that are more competitive with desktop apps. This is precisely why Google is supporting it so heavily.
Lazy Data Loading
One of the key facilities of drag and drop is the ability to provide, and get, multiple representations of the same data. Different web pages, web apps, and desktop apps support different kinds of data, so it is up to your application to give them something they can work with. Take 280 Slides for example: when a user drags slides out of the slides navigator, he may be planning to drop them in any number of locations. If he is dragging them from one instance of 280 Slides to another, then we want to provide a serialized version of these slides so that they can be added to the other presentation. If, however, he drags these slides into a program like Photoshop, then we would want to provide image data. If he were to drag them to his desktop, then perhaps we could provide a PDF version. He could even drag them to his text editor and expect the text contents of his slides to be pasted.
An application can provide multiple data types for dragging.
Multiple Data Types
The way you do this currently is with the setData function, which allows you to specify different types of data:
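Something along these lines, where the slide-serializing helpers are hypothetical stand-ins for real application code:

```js
// Assumes slidesElement has draggable="true". serializeSlides,
// renderSlidesAsPNG, and extractSlideText are hypothetical helpers.
slidesElement.addEventListener("dragstart", function (event)
{
    var dataTransfer = event.dataTransfer;

    // Register every representation up front, since we can't know where
    // the user will eventually drop the slides.
    dataTransfer.setData("application/x-slides", serializeSlides(selectedSlides));
    dataTransfer.setData("image/png", renderSlidesAsPNG(selectedSlides));
    dataTransfer.setData("text/plain", extractSlideText(selectedSlides));
}, false);
```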
This is incredibly common on the desktop, and you’ve probably never noticed it precisely because it works so well: things seem to just do the right thing when you drag and drop them. However, an unfortunate side effect of this feature is that you end up doing a lot of extra unnecessary work. The user only ever drops the item in one location, and so all the other formats you’ve created were wasted processing time. This is not a big deal for simple cases of drag and drop, but it becomes quite noticeable in large applications like 280 Slides. In the example above, creating serialized and image representations of these slides can become quite slow depending on how many elements are in the individual slides and how many slides you are moving. Because of this you may experience a lag when you first drag the slides out. The worst part is, if all you intended to do was reposition the slides in the same presentation, then you didn’t need any of these formats!
This problem was solved in a very simple and intelligent way on the desktop a long time ago: simply delay supplying the actual data until the drop occurs. At the point of the drop, you actually know which of the supplied types the user is interested in, so you can create it then. Not only does this save you from doing unnecessary work, but users generally notice time spent processing after a drop a lot less (because there is no expected user feedback to stutter). I’ve thought a lot about a good way to allow the developer to do this with the existing setData method, and I think it could be done by simply allowing developers to provide functions that could be called when the data is needed:
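For example, a sketch of the proposal (this is not an API any browser implements today):

```js
slidesElement.addEventListener("dragstart", function (event)
{
    // Proposed: hand setData a function instead of a string. The browser
    // would only invoke it if the drop target actually requests this type,
    // so the expensive representation is never built unnecessarily.
    event.dataTransfer.setData("image/png", function ()
    {
        return renderSlidesAsPNG(selectedSlides); // hypothetical helper
    });
}, false);
```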
Perhaps a more backwards-compatible alternative would be:
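One shape this could take (purely a sketch of mine) is an object whose toString method computes the data lazily; an older implementation would simply coerce it to a string right away, while an updated one could defer the call:

```js
event.dataTransfer.setData("image/png",
{
    // An updated implementation could hold on to this object and only
    // call toString when getData is invoked by the drop target; an old
    // implementation would just coerce it to a string immediately.
    toString: function ()
    {
        return renderSlidesAsPNG(selectedSlides); // hypothetical helper
    }
});
```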
I don’t really think this is necessary, though, since this API is so new. Either way, this allows us to keep the existing setData method while not actually calculating the string value until getData is actually called by the drop target.
Initiating Drags
Another major hurdle I encountered was in controlling the way drags are actually started. Currently this is a delicate dance of preventDefaults and interactions between mousedown, mousemove, and dragstart, in combination with the draggable HTML attribute. The basic problem with this is that it leaves the decision to create a drag entirely to the browser. Again, this is just fine for simple cases, but it really starts to break down when you are building full-on applications in the browser. On the other hand, frameworks like Cocoa allow the developer to initiate the actual drag sequence. Let’s look at why this is important with a simple example. It is quite common to want to start a drag on the initial mouse down, instead of waiting for additional mouse move events. In these cases, it would be more confusing if the initial mouse down did nothing. This is currently impossible to achieve with the HTML 5 drag and drop APIs. In Cocoa, this would be quite simple, requiring the developer to simply start the process in mouseDown: instead of mouseDragged:.
This is just a simple example of course. More complex widgets provide even more cases where drag and drop in the browser really works against you. Take tables in Mac OS X, which provide different behaviors depending on what direction the user drags in:
As you can see, when a user drags upwards in a table on Mac OS X, the selection of the table changes (in other words, no drag takes place). On the other hand, if the user drags left, right, or diagonally in any way, then he is allowed to move these files. This is a very intuitive experience when you use it, and is absolutely trivial to implement in Cocoa.
However, this is again basically impossible with the current HTML 5 API, as you can never be a part of the decision as to whether an object is dragged or not. Once you get the drag event, it’s too late. You can imagine that this becomes even more cumbersome in applications like Bespin that revolve less around specific tags and more around content drawn to a canvas element. When a user drags in Bespin, the app has to decide between any number of actions. I think a good solution would be to simply allow the developer to manually kick off a dragging event loop from either a mousedown or mousemove callback. Something like this:
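(A sketch of the idea; startDrag here is a proposed method on the event object, and the predicates are hypothetical application logic.)

```js
element.addEventListener("mousedown", function (event)
{
    // The developer decides that this interaction is a drag and kicks
    // off the drag event loop immediately, on the initial mouse down...
    if (shouldDragImmediately(event)) // hypothetical app logic
        event.startDrag();
}, false);

element.addEventListener("mousemove", function (event)
{
    // ...or waits to see which direction the user moves in first.
    if (isHorizontalDrag(event)) // hypothetical app logic
        event.startDrag();
}, false);
```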
In both these cases, calling startDrag would result in no further mousemoves/mouseups being fired in this event loop, and instead would kick off the drag event loop with a “dragstart” event. A matching cancelDrag() could be provided as well. This would allow you to cancel a drag, but not any other specific behavior such as selection. Currently calling preventDefault cancels both drags and selection. This actually leads to a number of other confusing results. For example, if you place a textfield within a draggable element, it is essentially impossible for text selection to happen in that textfield, even if you set the textfield itself to not be draggable.
Drag Images
One of the nice parts about drag and drop is that you are allowed to set any arbitrary image or element as what is actually rendered during the drag process with the setDragImage method:
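For instance, a sketch using the standard element/x/y form of the call (the preview element is hypothetical):

```js
element.addEventListener("dragstart", function (event)
{
    var preview = document.getElementById("drag-preview"); // hypothetical element

    // Render this element under the cursor for the duration of the drag;
    // the two numbers are the pointer's offset within the element.
    event.dataTransfer.setDragImage(preview, 0, 0);
}, false);
```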
However, on Firefox it is required that this element already be visible. Now, I wasn’t sure whether to list this as simply a bug in Firefox or an actual design flaw, but I chose to list it as a flaw because the documentation at mozilla.org would seem to suggest that they may consider this to be “correct behavior”. Safari does not have this restriction, and in fact Firefox even seems to make an exception for canvas elements. Firefox seems particularly strict about this requirement too, as I tried positioning an element offscreen in a negative position, setting its visibility to hidden, setting the display to none, and even placing the element in an offscreen iframe, anything to prevent having to actually flash the element in some random portion of the screen before dragging it. It seems to me that this method exists for the purpose of showing something different, and thus it’s a bit unreasonable to expect it to already be not only in the document, but visible as well. My request here is simple: that it should simply work the way it does in Safari.
Conclusion
Drag and drop is an incredibly important part of the way we interact with computers, which is why it is so crucial that we get it right from the beginning. I really hope my concerns are heard and that we can come up with some good solutions to the initial problems I faced with this young API, so that we can avoid the windows of incompatibility that plagued the last updates to HTML. In the meanwhile, I’ve filed a bunch of bugs and documented my current experiences here.
I had the pleasure of showing off some of the cool new features we’ve been adding to the WebKit inspector at JSConf last week. It’s no secret that debugging basically sucks in JavaScript, and until recently, it was a little bit worse in Objective-J. Up until now we’ve focused mainly on adding stopgap measures to our own code, but recently we’ve decided to shift gears and attack the problem head on in the browsers themselves. This is why these past couple of weeks I’ve set aside the JavaScript code and instead focused on working with the great guys on the WebKit team on providing a solid debugging experience both in Objective-J and in JavaScript in general. We first decided to focus on profiling, since this is an area of considerable interest for a framework. All the code I’ve committed is now available in the latest WebKit nightly, so if you want you can download it to follow along. I’ve also added to the end of this post links to both the WebKit commits we added, as well as the accompanying code we put in Cappuccino, in an effort to show how to make the best use of these new features and encourage others to take a stab at adding some debugging features to WebKit. Surprisingly enough, the folks over at Joose wasted no time incorporating this into their own library, so I’ve included links to their additions as well.
Anonymous and Poorly Named Functions
Had you run a Cappuccino application through Firebug’s profiler back in September when we originally open sourced the framework, you would have probably seen something that looked like this:
Anonymous functions in Firebug
Anyone who’s done a significant amount of profiling with Firebug has probably run into the dreaded question mark functions at some point or another, but as you can see from above, it used to be particularly egregious in Objective-J. The reason these question marks show up is because somewhere the script in question contains an anonymous function. Anonymous functions, or lambdas as they’re sometimes referred to, are functions that you can declare just about anywhere in your program and not bother naming. Take the following code for example:
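An illustrative snippet (the button element and handler body are stand-ins):

```js
document.getElementById("button").addEventListener("click", function (event)
{
    // An anonymous function: profilers report this as "?".
    performSpecialClickBehavior();
}, false);
```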
Here we’re using an anonymous function to perform special behavior on a mouse click event, and when profiled, this function will show up as a question mark. The obvious workaround is to simply declare this function normally somewhere else in the code, but this isn’t always possible because you might need it inline in the code so as to form a closure. So instead the recommended solution today is to simply give it a name with the following syntax:
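That is, with a named function expression:

```js
document.getElementById("button").addEventListener("click", function clicked(event)
{
    // A named function expression: profilers now report this as "clicked".
    performSpecialClickBehavior();
}, false);
```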
And in this particular case, this will work quite well and allow this function to appear in profiles as clicked. However, there are certain cases where this won’t work. Let’s look at a different snippet of code to see such a case:
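A sketch of such a case:

```js
function generator(aNumber)
{
    // Every call produces a brand new anonymous function.
    return function ()
    {
        return aNumber + 1;
    };
}

var incrementSmall = generator(1),
    incrementBig = generator(100); // both show up as "?" in profiles
```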
Here we’ve created a function called generator that creates other functions when executed. As is, these functions will show up as question marks just as before, but this time we can’t simply name them inline because then all the generated functions would show up with the exact same name:
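In other words, something like this, where every returned function would now be reported under the single name generated:

```js
function generator(aNumber)
{
    // Naming the inner function doesn't help: every function this
    // returns now shows up in the profiler as "generated".
    return function generated()
    {
        return aNumber + 1;
    };
}
```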
Unfortunately, there is really very little we can do to remedy this situation short of using an eval statement, which would change the performance characteristics of this method so drastically that the entire exercise would become moot. It’s not just floating anonymous functions that suffer from poor naming though. Imagine that you have created the following prototypal classes and methods in your application or library:
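Something along these lines (an illustrative sketch):

```js
function MyClass()
{
}

// A "class" method and two instance methods that all share one name.
MyClass.myMethod = function ()
{
    // ...
};

MyClass.prototype.myMethod = function ()
{
    // ...
};

function MyOtherClass()
{
}

MyOtherClass.prototype.myMethod = function ()
{
    // ...
};
```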
Both in Firebug and Safari, this code will generate a largely useless profile:
Profiling object methods in Firebug or Safari
This profile is almost as ambiguous as when it was all question marks. We can’t tell whether MyClass.myMethod, MyClass.prototype.myMethod, or MyOtherClass.prototype.myMethod is the bottleneck here. If we aren’t generating these methods in any special way, we could try to name them inline, but we’d have to mangle the names considerably to pack in all the information we need:
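Something like this, since function names can’t contain dots:

```js
MyClass.myMethod = function MyClass_myMethod()
{
    // ...
};

MyClass.prototype.myMethod = function MyClass_prototype_myMethod()
{
    // ...
};

MyOtherClass.prototype.myMethod = function MyOtherClass_prototype_myMethod()
{
    // ...
};
```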
This is clearly not the most elegant solution, and doesn’t scale given the fact that you have limited visual room in Firebug and Safari (you actually can’t stretch the function name column in either profiler). This also runs the risk, albeit a small one, of clashing with an existing function name. But the important thing to notice here is that it is not necessarily anonymous functions that are the source of the problem, but the fact that functions don’t actually have real names in JavaScript. It is only the variables that are bound to them that are named. So in order to solve this issue once and for all, we decided to define a way to explicitly give functions a name for debugging: the displayName attribute. In WebKit, you can now simply set this property with any arbitrary name you desire. Let’s revisit our generator example from earlier and see what we can do with this slightly modified code:
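A sketch of the modified generator (the displayName format here is arbitrary):

```js
function generator(aNumber)
{
    var generated = function ()
    {
        return aNumber + 1;
    };

    // displayName can be any string computed at runtime, so every
    // generated function gets its own entry in the profiler.
    generated.displayName = "generator(" + aNumber + ")";

    return generated;
}
```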
If we now rerun this profile in a recent WebKit nightly, we should see something like this:
Explicitly named functions in WebKit Profiles
Each function is now clearly identifiable in the results, allowing us to actually make use of this data. We can extend this same approach to our prototypal classes we defined above to achieve a similar effect:
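For example, assigning whatever names we’d like to see in the profiler:

```js
// With explicit displayNames, the profiler can tell all three apart.
MyClass.myMethod.displayName = "MyClass.myMethod";
MyClass.prototype.myMethod.displayName = "MyClass.prototype.myMethod";
MyOtherClass.prototype.myMethod.displayName = "MyOtherClass.prototype.myMethod";
```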
If we were to profile this now, the much more descriptive displayNames would show up instead of simply seeing myMethod() used in every case. This is the basic idea behind what Objective-J does in the latest Cappuccino 0.7 betas, but it takes place completely automatically behind the scenes, so that with no code changes of your own, applications now look something like this when profiled:
Profiling Objective-J in WebKit
As you can see from this profile, Objective-J now has first class profiling support in WebKit. The best part about this though is that it’s not just limited to Objective-J: any language abstraction now has the opportunity to make the same use of these tools. Objective-J happens to be a great candidate because it is such a thin wrapper around JavaScript, but a project such as processing.js could show the actual processing functions instead of their generated JavaScript analogues, or perhaps GWT could have a flag where it shows the Java methods in the profiler instead of the generated JavaScript as well. We’ve actually taken this one step further in Objective-J though, and used this feature to display information that you actually can’t presently see with normal JavaScript scripts. Currently both Safari and Firebug are incapable of profiling code that doesn’t execute explicitly in a function. This means that if a good portion of your profile is taking place at the top level of a script file, it will be completely left out in Firebug and lumped into the overly generic (program) category in Safari. But thanks to the special way we handle files in Objective-J, we are able to tell our users precisely how much time they are spending in a specific file:
Objective-J profiling is smart about files in WebKit
This is actually what I found most exciting about this seemingly simple property addition. In less than a day I was able to apply it in a completely new way to supply WebKit with even more information than we had originally designed it for. I feel that there is something really interesting in the idea that code can interact directly with the debugging tools, and it’s why I believe that despite the debugging situation being so poor in JavaScript today, it has the potential to be much better than that of traditional languages. Expect to see us experiment more with this new kind of debugging here at 280 North in the future, because this is clearly just the tip of the iceberg.
More Fine-Grained Profiling
The other thing we focused heavily on these last couple of weeks was completely rewriting the Bottom Up View of the WebKit profiler. To get a better idea of what this is, let’s first take a look at the other view WebKit currently gives you for analyzing your profiles, known as the Top Down View:
Top Down View in WebKit
The Top Down View shows you a graph of the actual flow of your application, a call stack with the very first functions that were executed as the root nodes and the functions they called as their children. Thus, the data in each row represents the statistics for the call stack starting with the root node, and ending in the child node. I’ve fully expanded all the nodes here to be able to see the entire call graph. If we look at the second to last line of this view, we can see that it represents a recursive call to aFunction that took place from within a call to caller3:
Call stack represented in Top Down View
We’d read this by saying that 0.41% of the time was spent in 5 calls to aFunction with this call stack. While this representation of your profile certainly gives you a very holistic view of what happened in your program and can help you get a better idea of the general flow of functions taking place, it’s harder to answer questions such as which function most of the time is being spent in. To do this, we would need to add up all the individual child times and then compare them to each other. In this simple example this doesn’t seem that daunting, but you can imagine that it can quickly become quite complex.
This is where the Bottom Up View comes in. Let’s take a look at the same profile using this view:
Bottom Up View Collapsed
If we leave the children collapsed, this should look very familiar to Firebug users: it is a flat list of every function called in your program, and how much time was spent in each. However, where things really get interesting is when you expand the children:
Bottom Up View Expanded
Unlike in the Top Down View, the children here represent the parents, or callers, of the root function in question. For example, the second row represents the call stack starting at caller3 and ending at aFunction:
The call stack represented in the Bottom Up View
Because of this, the statistics on each row actually still refer to the original root node, and not the child as in the Top Down View. So on the second row you’d say “1000 calls to aFunction took place originating from caller3”. Essentially we are just flipping the Top Down View on its head. In order to understand why this information is so powerful, let’s take a look at a real-world example I recently ran into in Cappuccino. Now, the following is an Objective-J profile, but the principles are exactly the same in normal JavaScript:
Objective-J Profile in Bottom Up View
If we were using Firebug or any other flat listing tool, the naive interpretation of this profile would be that setFrameSize: is probably something worth tuning since it is third on our list and takes about 4.58% of our profile’s total time. This diagnosis is not wrong in the strict sense, but we may find it difficult to find out exactly why this method is so slow if we simply jump into setFrameSize:’s implementation and start hacking away. Remember that functions can be quite complex internally as well, and you may spend your time needlessly optimizing a code path in this method that was not even reached during the profile. However, we may get a better idea if we instead inspect this further and look at setFrameSize:’s callers:
Examining setFrameSize:
Interestingly enough, after expanding this node we find that it is not necessarily setFrameSize: which is universally slow, but rather some special interaction between setFrameSize: and its caller sizeToFit. We know this because this method usually takes an average of 0.01% to 0.04%, but specifically when called from sizeToFit it takes a whopping 4.34%, over 200 times as long. Not only that, but all this time is concentrated in just 1 actual call, profiling gold! Perhaps there is something in sizeToFit that is purging a cache that setFrameSize: relies on, or perhaps sizeToFit causes setFrameSize: to take a completely different code path internally than normal. It could be any number of reasons, but we are now empowered with a much better understanding of what exactly is happening in the program that is causing this slowdown. In other words, this allows us to profile not only the functions themselves, but the relationship between functions as well.
What’s Next?
Debugging in JavaScript still has a long way to go. These changes are like night and day for frameworks like Cappuccino, but we have a bunch of other ideas we’d like to get implemented in WebKit’s inspector as well. We also think it’s important to try to take some of the work we’ve done here and get it into Firebug. Given that there is no one browser your code will run in, it is important to have a great set of tools on as many browsers as possible. We’ve used a hacked version of Firebug internally before, and if I recall correctly it shouldn’t be too difficult to add support for the displayName property to function objects, so hopefully we’ll get a patch out for that soon.
Addendum
As promised earlier, I have included a list of links to the WebKit, Cappuccino, and Joose commits below. The Cappuccino and Joose commits should help you integrate support for these new WebKit features in your own application or library, and hopefully the WebKit commits will inspire you to report/fix/write new features for JavaScript debugging: