The next beta release of Jungle Disk, which should be available tomorrow pending final testing, adds a raft of features, including fast file renaming and copying. As many Jungle Disk users are aware, moving or renaming files with the current version of Jungle Disk can be a painful process. Because S3 does not allow objects to be renamed in place once uploaded, moving or renaming a file requires re-uploading it, which takes time and can cost money as well, since S3 charges for data transfers. While we still hope that Amazon will remove this limitation in the future, we’ve decided to put an interim solution in place to remove this headache and the associated costs.
To enable fast file renaming and copying, we’ve placed servers in Amazon.com data centers using their EC2 service. When you need to rename, move, or copy a file on your Jungle Disk, a request is sent to these servers with the old and new names. The EC2 server copies the data from the old name to the new name without the data ever leaving the Amazon data center. This means that the operation not only completes quickly, but you also aren’t charged for any bandwidth. The request sent from Jungle Disk to the servers is signed such that only that specific operation can be performed. In addition, the file data stays encrypted at all times during the transfer and stays completely within Amazon’s internal network. Your secret key and encryption keys are never sent out of your machine.
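As a rough illustration of what "signed such that only that specific operation can be performed" can mean, here is a minimal Python sketch. It is not Jungle Disk's actual protocol or S3's real signing format: the function names, the string layout, and the `COPY` verb are assumptions made up for this example. The idea it shows is that the client computes an HMAC over the operation details plus an expiry time, and the relay server only ever sees the signature, never the secret key.

```python
import base64
import hashlib
import hmac
import time


def sign_operation(secret_key: bytes, verb: str, source: str, dest: str, expires: int) -> str:
    """Sign one specific operation (hypothetical format, not S3's real one)."""
    # The signature covers the verb, both names, and the expiry time, so
    # changing any of them invalidates it.
    string_to_sign = "\n".join([verb, source, dest, str(expires)])
    digest = hmac.new(secret_key, string_to_sign.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_operation(secret_key: bytes, verb: str, source: str, dest: str,
                     expires: int, signature: str, now=None) -> bool:
    """Check performed by the party holding the secret (the storage service)."""
    now = time.time() if now is None else now
    if now > expires:
        return False  # signed requests are only valid for a short time
    expected = sign_operation(secret_key, verb, source, dest, expires)
    return hmac.compare_digest(expected, signature)
```

In this sketch the client would call `sign_operation` locally and send only the verb, the two names, the expiry, and the signature to the relay. The relay can forward the request, but it cannot alter it or forge a different one, because it never holds the secret key.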
This is the first of several features on our roadmap that will make use of Amazon’s EC2 servers. The ability to host services from within Amazon’s data centers enables us to provide all kinds of functionality that would otherwise be impossible with the basic S3 API. However, if for any reason you’d prefer not to have your Jungle Disk client interact with these EC2 servers, the fast rename feature (as well as future EC2-enabled features) can easily be turned off in the options. In addition, should the Jungle Disk EC2 servers be unavailable for any reason, the Jungle Disk software will gracefully fall back to the old rename method.
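The opt-out and fallback behavior described above could look something like the following Python sketch. This is only an illustration under assumptions: `rename_with_fallback`, `fast_rename`, `slow_rename`, and `fast_enabled` are hypothetical names, not part of the Jungle Disk software.

```python
from typing import Callable


def rename_with_fallback(old: str, new: str,
                         fast_rename: Callable[[str, str], None],
                         slow_rename: Callable[[str, str], None],
                         fast_enabled: bool = True) -> str:
    """Try the server-side rename first; fall back to re-uploading."""
    if fast_enabled:  # the user can turn the fast path off in the options
        try:
            fast_rename(old, new)
            return "fast"
        except (ConnectionError, TimeoutError):
            # Relay servers unavailable: fall through to the old method.
            pass
    slow_rename(old, new)  # download and re-upload under the new name
    return "slow"
```

The return value simply reports which path was taken, which makes the behavior easy to check in a test.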
10 Comments
Avi Flax said,
February 22, 2007 @ 3:23 pm
Sounds impressive, looking forward to it!
Colin Henderson said,
February 22, 2007 @ 10:45 pm
Fantastic job. The only thing you guys need is more promoters. This is the simplest, smoothest backup solution.
Colin
jd said,
February 23, 2007 @ 10:19 am
Dave,
Great use of EC2! I’m a tad confused, however, about how you can set up an EC2 server to rename files in a user account without transferring the user’s secret key. Does Jungle Disk just transfer the user id and password to the EC2 machine to get authority to change the file name?
Any light you could shed on how this is accomplished would be greatly appreciated!
Thanks,
JD
Jungle Dave said,
February 23, 2007 @ 7:46 pm
Sure – it’s actually quite simple. S3 allows you to “sign” a request on one machine, then execute it on another. Jungle Disk simply signs the requests locally, then sends them to the EC2 server to execute. Signed requests are only valid for that specific operation, and only for a short period of time.
jd said,
February 24, 2007 @ 7:56 am
Thanks for the reply Dave. The signing feature sounds cool. I am in line to get an EC2 account and look forward to working with them down the road. It really looks like the Amazon guys have thought this through really well.
Congrats on great software, BTW. I enjoy using it and I like your proposed pricing scheme. Good work.
-JD
Jackal said,
February 27, 2007 @ 9:52 pm
I was planning on getting an EC2 account to do the same thing with my other S3 site (I use Interarchy to publish publicly-downloadable files), and I’m happy to see that you’ve come up with the same solution!
My only question is: since it’s your EC2 account, you’re paying for the EC2 compute time (which shouldn’t be a lot for a simple mv or cp function)–so how are you going to pay for all of this? It seems generous for you to pay out of your pocket for it, but (unless you’ve inherited more money than you know what to do with) that’s not a long-term solution. Do you eventually plan to start charging for JungleDisk? (I love the utility and love even more that it’s free, but I *would* gladly pay a small sum for it…)
Jackal said,
February 27, 2007 @ 9:53 pm
Oops, helps if I actually read your blog in total (instead of just this entry). I see your proposed pricing scheme post–sounds like a winner!
Jackal said,
February 27, 2007 @ 10:43 pm
OK, sorry for the third post in a row, but…
I just did some JungleDisk sorting (moving some directories into a more organized hierarchy). I’m seeing copy rates of anywhere from 10000 kbps to 30000 kbps (roughly 1-3 MBps) on large files. That’s A LOT better than downloading and reuploading the files, so thanks for implementing this, Dave!
Still, it’s a bit slower than I would expect for computers in presumably the same data center as the storage (and presumably using at least 100Mbps NICs if not 1000Mbps NICs). Anyone getting better copy speeds? Should I expect better speeds?
I’ve been hashing out some slower-than-expected data transfer rates with S3 on the AWS forums. On my 3 Mbps home connection, I can only download at ~720 kbps. It’s always exactly 720 kbps, so it almost seems as if there’s a packet shaper or some cap somewhere, even though no downloads from other servers are capped at this speed, and even though I’m only 12-15 hops away from their Seattle POP via my ISP’s OC192 fiber optic cable directly between Anchorage and Seattle. At my university, I get bad speeds, even factoring in the 5 Mbps limit on the packet shaper and the OC3 we have to the commodity Internet. But that’s not really related to this post…I’m just hoping that Amazon increases the performance of their networks…
Jungle Dave said,
February 28, 2007 @ 9:44 am
The speeds are now listed in kilobits/sec, so 10000-30000 kbps corresponds to 10-30 Mbit/sec. I’ve seen it go as high as 70 Mbit/sec on a single large file. When writing the data, it actually goes to multiple datacenters at the same time, so that speed is actually quite good.
Jackal said,
April 2, 2007 @ 1:49 am
Well, I had my ISP conduct some tests from their headquarters. On the cable modem platform, they found the same odd 720kbps limit (so I know it’s not my computer). But on their corporate network (presumably directly connected to their backbone), they got speeds of 5+mbps. So it’s not Amazon. (The techs at my ISP couldn’t figure out what was causing the limit, but they weren’t really motivated to solve it, I guess…)
But that’s not why I’m posting. I finally got my invite to EC2, and I loaded up a test image and played around with it. Fun–but more expensive for my type of usage than I thought.
I was under the impression that I would be charged for hours of CPU time–in other words, since my instance would most likely remain mostly idle, I would only be charged at most for an hour or two of usage per month. After I played with it and read the terms, I understood that I would be charged for each hour the machine remained up, regardless of usage (an instance-hour, not a CPU hour–so, basically, it’s a virtual dedicated server, not really shared hosting). So, leaving a machine up for me to occasionally connect to and play with would cost me $73 per month–not a good bargain for me, although that seems like a fairly good deal for dedicated hosting (of course, data transfer is extra).
Actually, the reason I post all that here is because I didn’t realize it, but JungleDave’s paying that $73 per month just to provide us with this very convenient fast-renaming service! (Well, maybe he’s using it for other things, too…) I hope he’s able to cover that with JD subscription sales…!