## Wednesday, August 27, 2014

### What is SageMathCloud: let's clear some things up

"You will have to close source and commercialize Sage. It's inevitable." -- Michael Monagan, cofounder of Maple, told me this in 2006.
SageMathCloud (SMC) is a website that I first launched in April 2013, through which you can use Sage and all other open source math software online, edit Latex documents, IPython notebooks, Sage worksheets, track your todo items, and many other types of documents. You can write, compile, and run code in most programming languages, and use a color command line terminal. There is realtime collaboration on everything through shared projects, terminals, etc. Each project comes with a default quota of 5GB disk space and 8GB of RAM.

SMC is fun to use, pretty to look at, frequently backs up your work in many ways, is fault tolerant, encourages collaboration, and provides a web-based way to use standard command-line tools.

### The Relationship with the SageMath Software

The goal of the SageMath software project, which I founded in 2005, is to create a viable free open source alternative to Magma, Mathematica, Maple, and Matlab. SMC is not mathematics software -- instead, SMC is best viewed by analogy as a browser-based version of a Linux desktop environment like KDE or Gnome. The vast majority of the code we write for SMC involves text editor issues (problems similar to those confronted by Emacs or Vim), personal information management, support for editing LaTeX documents, terminals, file management, etc. There is almost no mathematics involved at all.

That said, the main software I use is Sage, so of course support for Sage is a primary focus. SMC is a software environment that is being optimized for its users, who are mostly college students and teachers who use Sage (or Python) in conjunction with their courses. A big motivation for the existence of SMC is to make Sage much more accessible, since growth of Sage has stagnated since 2011, with the number one show-stopper obstruction being the difficulty of students installing Sage.

#### Sage is Failing

Measured by the mission statement, Sage has overall failed. The core goal is to provide similar functionality to Magma (and the other Ma's) across the board, and the Sage development model and community has failed to do this across the board, since after 9 years, based on our current progress, we will never get there. There are numerous core areas of research mathematics that I'm personally familiar with (in arithmetic geometry), where Sage has barely moved in years and Sage does only a few percent of what Magma does. Unless there is a viable plan for the areas to all be systematically addressed in a reasonable timeframe, not just with arithmetic geometry in Magma, but with everything in Mathematica, Maple., etc, we are definitely failing at the main goal I have for the Sage math software project.

I have absolutely no doubt that money combined with good planning and management would make it possible to achieve our mission statement. I've seen this hundreds of times over at a small scale at Sage Days workshops during the last decade. And let's not forget that with very substantial funding, Linux now provides a viable free open source alternative to Microsoft Windows. Just providing Sage developers with travel expenses (and 0 salary) is enough to get a huge amount done, when possible. But all my attempts with foundations and other clients to get any significant funding, at even the level of 1% of the funding that Mathematica gets each year, has failed. For the life of the Sage project, we've never got more than maybe 0.1% of what Mathematica gets in revenue. It's just a fact that the mathematics community provides Mathematica $50+ million a year, enough to fund over 600 fulltime positions, and they won't provide enough to fund one single Sage developer fulltime. But the Sage mission statement remains, and even if everybody else in the world gives up on it, I HAVE NOT. SMC is my last ditch strategy to provide resources and visibility so we can succeed at this goal and give the world a viable free open source alternative to the Ma's. I wish I were writing interesting mathematical software, but I'm not, because I'm sucking it up and playing the long game. ### The Users of SMC During the last academic year (e.g., April 2014) there were about 20K "monthly active users" (as defined by Google Analytics), 6K weekly active users, and usually around 300 simultaneous connected users. The summer months have been slower, due to less teaching. Numerically most users are undergraduate students in courses, who are asked to use SMC in conjunction with a course. There's also quite a bit of usage of SMC by people doing research in mathematics, statistics, economics, etc. -- pretty much all computational sciences. Very roughly, people create Sage worksheets, IPython notebooks, and Latex documents in somewhat equal proportions. ### What SMC runs on Technically, SMC is a multi-datacenter web application without specific dependencies on particular cloud provider functionality. In particular, we use the Cassandra database, and custom backend services written in Node.js (about 15,000 lines of backend code). We also use Amazon's Route 53 service for geographically aware DNS. There are two racks containing dedicated computers on opposites sides of campus at University of Washington with 19 total machines, each with about 1TB SSD, 4TB+ HDD, and 96GB RAM. We also have dozens of VM's running at 2 Google data centers to the east. A substantial fraction of the work in implementing SMC has been in designing and implementing (and reimplementing many times, in response to real usage) a robust replicated backend infrastructure for projects, with regular snapshots and automatic failover across data centers. As I write this, users have created 66677 projects; each project is a self-contained Linux account whose files are replicated across several data centers. ### The Source Code of SMC The underlying source of SMC, both the backend server and frontend client, is mostly written in CoffeeScript. The frontend (which is nearly 20,000 lines of code) is implemented using the "progressive refinement" approach to HTML5/CSS/Javascript web development. We do not use any Javascript single page app frameworks, though we make heavy use of Bootstrap3 and jQuery. All of the library dependencies of SMC, e.g., CodeMirror, Bootstrap, jQuery, etc. for SMC are licensed under very permissive BSD/MIT, etc. libraries. In particular, absolutely nothing in the Javascript software stack is GPL or AGPL licensed. The plan is that any SMC source code that will be open sourced will be released under the BSD license. Some of the SMC source code is not publicly available, and is owned by University of Washington. But other code, e.g., the realtime sync code, is already available. Some of the functionality of SMC, for example Sage worksheets, communicate with a separate process via a TCP connection. That separate process is in some cases a GPL'd program such as Sage, R, or Octave, so the viral nature of the GPL does not apply to SMC. Also, of course the virtual machines are running the Linux operating system, which is mostly GPL licensed. (There is absolutely no AGPL-licensed code anywhere in the picture.) Note that since none of the SMC server and client code links (even at an interpreter level) with any GPL'd software, that code can be legally distributed under any license (e.g., from BSD to commercial). Also we plan to create a fully open source version of the Sage worksheet server part of SMC for inclusion with Sage. This is not our top priority, since there are several absolutely critical tasks that still must be finished first on SMC, e.g., basic course management. ### The SMC Business Model The University of Washington Center for Commercialization (C4C) has been very involved and supportive since the start of the projects. There are no financial investors or separate company; instead, funding comes from UW, some unspent grant funds that were about to expire, and a substantial Google "Academic Education Grant" ($60K). Our first customer is the "US Army Engineer Research and Development Center", which just started a support/license agreement to run their own SMC internally. We don't currently offer a SaaS product for sale yet -- the options for what can be sold by UW are constrained, since UW is a not-for-profit state university. Currently users receive enhancements to their projects (e.g., increased RAM or disk space) in exchange for explaining to me the interesting research or teaching they are doing with SMC.

The longterm plan is to start a separate for-profit company if we build a sufficient customer base. If this company is successful, it would also support fulltime development of Sage (e.g., via teaching buyouts for faculty, support of students, etc.), similar to how Magma (and Mathematica, etc.) development is funded.

In conclusion, in response to Michael Monagan, you are wrong. And you are right.

## Tuesday, August 26, 2014

### You don't really think that Sage has failed, do you?

I just received an email from a postdoc in Europe, and very longtime contributor to the Sage project.  He's asking for a letter of recommendation, since he has to leave the world of mathematical software development (after a decade of training and experience), so that he can take a job at hedge fund.  He ends his request with the question:

> P.S. You don't _really_ think that Sage has failed, do you?

After almost exactly 10 years of working on the Sage project, I absolutely do think it has failed to accomplish the stated goal of the mission statement: "Create a viable free open source alternative to Magma, Maple, Mathematica and Matlab.".     When it was only a few years into the project, it was really hard to evaluate progress against such a lofty mission statement.  However, after 10 years, it's clear to me that not only have we not got there, we are not going to ever get there before I retire.   And that's definitely a failure.

Here's a very rough quote I overheard at lunch today at Sage Days 61, from John Voight, who wrote much quaternion algebra code in Magma: "I'm making a list of what is missing from Sage that Magma has for working with quaternion algebras.  However, it's so incredibly daunting, that I don't want to put in everything.  I've been working on Magma's quaternion algebras for over 10 years, as have several other people.  It's truly daunting how much functionality Magma has compared to Sage."

The only possible way Sage will not fail at the stated mission is if I can get several million dollars a year in money to support developers to work fulltime on implementing interesting core mathematical algorithms.  This is something that Magma has had for over 20 years, and that Maple, Matlab, and Mathematica also have.   That I don't have such funding is probably why you are about to take a job at a hedge fund.    If I had the money, I would try to hire a few of the absolute best people (rather than a bunch of amateurs), people like you, Robert Bradshaw, etc. -- we know who is good. (And clearly I mean serious salaries, not grad student wages!)

So yes, I think the current approach to Sage has failed.    I am going to try another approach, namely SageMathCloud.  If it works, maybe the world will get a free open source alternative to Magma, Mathematica, etc.  Otherwise, maybe the world never ever will.      If you care like I do about having such a thing, and you're teaching course, or whatever, maybe try using SageMathCloud.   If enough people use SageMathCloud for college teaching, then maybe a business model will emerge, and Sage will get proper funding.

## Thursday, July 31, 2014

### SageMathCloud -- history and status

2005: I made first release the SageMath software project, with the goal to create a viable open source free alternative to Mathematica, Magma, Maple, Matlab.

2006: First web-based notebook interface for using Sage, called "sagenb". It was a cutting edge "AJAX" application at the time, though aimed at a small number of users.

2007-2009: Much work on sagenb. But it's still not scalable. Doesn't matter since we don't have that many users.

2011-: Sage becomes "self sustaining" from my point of view -- I have more time to work on other things, since the community has really stepped up.

2012: I'm inspired by the Simons Foundation's (and especially Jim Simon's) "cluelessness" about open source software to create a new online scalable web application to (1) make it easier for people to get access to Sage, especially on Windows, and (2) generate a more longterm sustainable revenue stream to support Sage development. (I was invited to a day-long meeting in NYC at Simon's headquarters.)

2012-2013: Spent much of 2012 and early 2013 researching options, building prototypes, some time talking with Craig Citro and Robert Bradshaw (both at Google), and launched SageMathCloud in April 2013. SMC got some high-profile use, e.g., by UCLA's 400+ student calculus course.

2014: Much development over the last 1.5 years. Usage has also grown. There is some growth information here. I also have useful google analytics data from the whole time, which shows around 4000 unique users per week, with an average session duration of 97 minutes (see attached). Number of users has actually dropped off during the summer, since there is much less teaching going on.

SMC itself is written mostly in CoffeeScript using Node.js on the backend. There's a small amount of Python as well.

It's a highly distributed multi-data center application. The database is Cassandra. The backend server processes are mostly Node.js processes, and also nginx+haproxy+stunnel.

A copy of user data is stored in every data center, and is snapshotted every few minutes, both via :
• ZFS -- for rolling snapshots that vanish after a month -- and via
• bup -- for snapshots that remain forever, and are consistent across data centers.
These snapshots are critical for making it possible to trust collaborators on projects to not (accidentally) destroy your work. It is not possible for users to delete the bup snapshots, by design.
Here's what it does: realtime collaborative editing of Latex docs, IPython notebooks, Sage worksheets; use the command line terminal; have several people collaborate on a project (=a Linux account).
The main applications seem to be:
• teaching courses with a programming or math software components -- where you want all your students to be able to use something, e.g., IPython, Julia, etc, and don't want to have to deal with trying to get them to install said software themselves. Also, you want to easily be able to share files with students, see their work in realtime, etc. It's a much, much easier for people to get going that with naked VM's they have to configure -- and also I provide cross-data center replication.
• collaborative research mathematics -- all co-authors of a paper work together in an SMC project, both writing the paper there and doing computations.
Active development work right now:
• course management for homework (etc)
• administration functionality (mainly motivated by self-hosting and better moderation)
• easy history slider to see all pasts states of a document
• switching from bootstrap2 to bootstrap3.

## Thursday, June 12, 2014

I've added task list functionality to SageMathCloud (SMC), so you can keep track of a list of things to do related to a project, paper, etc. Task lists are saved as files on the filesystem, so they can be backed up in the usual way, automatically generated, etc. I doubt anybody is going to use SMC just for the tasks lists, but for people already using SMC in order to use Sage, write LaTeX documents, use IPython notebooks, etc., having a convenient integrated task list should come in handy.
To create a task list, in a project, click "+New", name the task list, then click the "Task List" button.  Here's what a task list looks like:

## The Design

I used the task list quite a bit when implementing the task list, and significantly modified the interface many, many times. I also tried out numerous other todo list programs for inspiration. I eventually settled on the following key design choices, some of which are different than anything I've seen. In particular, the design targets highly technical users, which is not something I saw with any other todo list programs I tried.
• Markdown editor: The task description is formatted using client-side rendered Github flavored markdown (using marked), including [ ] for checkboxes. I also include full MathJax support, and spent quite a while working around various subtleties of mixing mathjax and markdown. I'll be rewriting Sage worksheets to use this code. The editor itself is Codemirror 4 in markdown mode, so it respects whatever theme you choose, has nice features like multiple cursors, paren matching, vim/emacs modes, etc. Multiple people can edit the same task at once and everybody will see the changes as they are made (note: I haven't implemented showing other cursors.)
• Relative dates and times: All dates and times are shown relative to right now. If something is due in 20 hours, it says "due about 20 hours from now". I also included a sortable column with the last time when a task was edited, also shown relative to the current time. This display uses the timeago jquery plugin. You can of course click on the due date to see the actual date.
• Hashtags: As I was implementing (and removing) features such as priorities, a way to indicate which task you are currently working on, folders, etc, I decided that hashtags can provide every feature. Moreover, they are very text editor/programmer friendly. When editing a task, if you put #foo, then it is interpreted as a hashtag, which you can click on to show only tasks containing that tag. To disambiguate with markdown headings, to make a heading you have to include a whitespace, so # foo. I haven't added autocomplete for hashtags yet, but will. You can easily click on hashtags anywhere to select them, or use the bar at the top.
• User-specified task order: The default sort order for tasks is custom. There is a drag handle so you can explicitly move tasks up and down in order to indicate how important they are to you (or whatever else). You can also click an up hand or down hand to knock the currently selected task to the bottom of the list of displayed tasks.
Of course, I still have an enormous list of functionality I would like to add, but that will have to wait. For example, I need to enable a chat associated to each task list, like the chats associated to Sage worksheets and other files. I also want to make it so one can select a range of tasks and copy them, move them, paste them into another list, etc. It would also be nice to be able to export task lists to HTML, PDF, etc., which should be fairly easy using pandoc.  I'm also considering making a note list, which is like a task list but without the due date or "done" box.  Because of all the infrastructure already there, it would be easy to add code evaluation functionality, thus creating something like worksheets, but from a different point of view (with maybe hashtags determining the Python process).

## Databases and Differential Synchronization

One interesting thing I noticed when implementing task lists is that there are many similarities with the original sagenb.org design (and also IPython notebook), in which a worksheet is a list of cells that get evaluated, can be manually reordered, etc. Similarly, a task list is a list of "cells" that you edit, manually reorder, etc. With sagenb we had longstanding issues involving the order of each cell and assigning an integer numerical id (0, 1, 2, ...) to the cells, which resulted in things like cells seemingly getting randomly reordered, etc. Also, having multiple simultaneous viewers with automatic synchronization is very difficult with that model. For task lists, I've introduced some new models to address these issues.

A natural way to store a task list is in a database, and I initially spent some time coming up with a good database schema and implementing basic lists using Cassandra for the backend. However, I couldn't pursue this approach further, since I was concerned about how to implement realtime synchronization, and also about the lack of easily backing up complete task lists via snapshots, in git, etc. So instead I created an "object database" API built on top of a file that is synchronized across clients (and the server) using differential synchronization. The underlying file format for the database is straightforward -- there is one line in JSON format for each record in the database. When objects are changed, the file is suitably modified, synchronized to other clients, and events are triggered.

Since differential synchronization has no trouble with files that have "a few thousand lines", this approach works well for our purposes (since personal or shared todo lists are typically fairly short). Also, having one line per record is at least more git friendly than something like a sqlite database. I'm considering rewriting my implementation of IPython notebook sync on top of this abstraction.
Since I view the task list as a database, each task has a globally unique uuid. Also, instead of viewing the task order as being defined by an integer 0,1,2,3, which leads to all manner of bugs and programming misery, race conditions, etc., instead we view the order as being determined by floating point positions. So to insert a task between tasks with positions 1 and 2, we just give the task position 1.5.

## Thursday, June 5, 2014

### The Official Go Tutorial as a (41-page) SageMathCloud worksheet

Do you like using interactive SageMathCloud worksheets and want to learn the basics of the Go Programming language? I've added a %go magic to SMC worksheets, and translated the official Go tutorial into a single long worksheet.
1. Open a SageMathCloud project and restart the project server if necessary (project settings --Restart).
2. Click +New and paste in this URL (then press enter): https://github.com/sagemath/cloud-examples.git
3. You'll get a large collection of example worksheets in a directory cloud-examples. Navigate to the "Go" subdirectory and open go.sagews.
You can also directly browse a PDF of the worksheet here: https://github.com/sagemath/cloud-examples/blob/master/go/go.pdf?raw=true

# User interface changes (refresh your browser to get)

## Sage Worksheets

• Don't show spinner unless computation is taking more than a second.
• Streamlined evaluation a little bit (never get stuck with the grey spinner when you're online)
• 2d graphics now display using svg by default, since browser bugs have been fixed

## Upgrade from Codemirror 3.20 to CodeMirror version 4.2, which is much better, faster, and has new features:

• Multiple cursors (use control or command+click to create multiple cursors)
• Sublime keyboard bindings
• New editor features (you can turn these off in account settings):
• Auto brackets: automatically close brackets, quotes, etc.
• Show trailing whitespace: show spaces at ends of lines
Here's a 2-minute screencast that illustrates some of the above UI features: http://youtu.be/ykb12MGHOuk

## Task Lists: There is now a preliminary task list implementation. To use it, make a file that ends in .tasks.

• Task editing now uses full screen width
• Markdown list autocomplete

# Backend changes (restart project server to get):

• Automatically delete project log entries when the log grows beyond 7500 lines. In some cases of very active projects, the log would get enormous (say 30MB+) and just loading it would slow down everything for a while.
• Clear a certain logfile that was getting huge whenever the project server is restarted.

## Tuesday, May 6, 2014

### Update to Differential Synchronization in SageMathCloud

I've just pushed out a major update to how synchronization works in https://cloud.sagemath.com.

This change is pretty significant behind the scenes, but the only difference you should notice is that everything should be better. In particular:

• evaluation of code in Sage worksheet should feel a little snappier and more robust,
• various random and hard to reproduce issues with synchronized editing should be fixed, e.g. chat messages out of order, etc.
• everything should generally be a bit faster and more scalable overall.

Here's a short technical description of what changed. The basic architecture of SageMathCloud is that there are many web browsers connected to many hubs, which are in turn connected to your project (and to many other projects too):

  [web browser] <- websocket ----\/
[web browser] <------------> [hub]<------ tcp -------\/
[project]
[web browser] <------------> [hub]<------------------/\

Until today, the differential synchronization implementation involved having a copy of the document you're editing on:

1. each hub pictured above,
2. in each browser, and
3. in the project itself.

In particular, there were three slightly different implementations of differential synchronization running all over the place. The underlying core code is the same for all three, but the way it is used in each case is different, due to different constraints. The implementations:

• browser: running in a web browser, which mainly has to worry about dealing with the CodeMirror editor and a flakie Internet connection.
• hub: running in a node.js server that's also handling a lot of other stuff, including worrying about auth, permissions, proxying, logging, account creation, etc.
• project: running in the project, which doesn't have to worry about auth or proxying or much else, but does have to worry about the filesystem.

Because we're using Node.js, all three implementations are written in the same language (CoffeeScript), and run the same underlying core code (which I BSD licensed at https://github.com/sagemath/cloud/blob/master/diffsync.coffee). The project implementation was easiest to write, since it's very simple and straightforward, and has minimal constraints. The browser implementation is mainly difficult, since the Internet comes and goes (as laptops suspend/resume), and it this involves patching and diff'ing a CodeMirror editor instance; CodeMirror is difficult, because it views the document as a line instead of a single long string, and we want things to work even for documents with hundreds of thousands of lines, so converting back and forth to a string is not an option! Implementing the hub part of synchronization is the hardest, for various reasons -- and debugging it is particularly hard. Moreover, computing diffs can be computationally expensive if the document is large, so doing anything involving differential sync on the hub can result in nontrivial locking cpu usage, hence slower handling of other user messages (node.js is single threaded). The hub part of the above was so hard to get right that it had some nasty locking code, which shouldn't be needed, and just looked like a mess.

A lot of issues people ran into with sync involved two browsers connected to different hubs, who then connected to the same document in a project. The two hubs' crappy synchronization would appear to work right in this part of the picture "[web browser] <------------> [hub]", but have problems with this part "[hub]<-------------->[project]", which would lead to pain later on. In many cases, the only fix was to restart the hub (to kill its sync state) or for the user to switch hubs (by clearing their browser cookies).

Change: I completely eliminated the hub from the synchronization picture. Now the only thing the hub does related to sync is forward messages back and forth between the web browser and local hub. Implementing this was harder than one might think, because the the project considered each client to be a single tcp connection, but now many clients can connect a project via the same tcp connection, etc.

With this fix, if there are any bugs left with synchronization, they should be much easier to debug. The backend scalability and robustness of sync have been my top priorities for quite a while now, so I'm happy to get this stuff cleaned up, and move onto the next part of the SMC project, which is better collaboration and course support.