Re: Ubuntu Gutsy Patch Management Excellent

On 2007-12-11, Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> wrote:
> I remember Canonical complaining about people who planned to download an RC of
> Ubuntu and then rsync for the diffs when it's ready. That would have brought
> servers to their knees. My experience with rsync over here is that it's CPU
> murder, but a bandwidth miracle.

Has anyone asked the rsync developers if they can address this?  Here's
basically how rsync works:

1. The receiver splits the file it has into non-overlapping fixed-size
chunks of n bytes. For each chunk, it computes two checksums:

    A. A strong checksum (I believe it uses MD4).

    B. A weaker checksum (only 32 bits, I think), but using an algorithm
    that has this important property:

        If you are given the checksum, C0, of a sequence of bytes

            B0 B1 B2 ... Bn-1

        you can compute the checksum, C1,  of the sequence of bytes

            B1 B2 B3 ... Bn

        using only B0, Bn, and C0.

2. The receiver sends the checksums to the sender.

3. The sender computes the weak checksum of the n-byte chunk at every
byte offset of its file.  The special property of the weak checksum
given above lets it do this efficiently: it computes the checksum for
the chunk starting at offset 0, and then rolls through the file,
updating the checksum for each successive offset.

4. The sender looks up each of those checksums in the list from the
receiver. Where it finds a match, it computes the strong checksum to see
whether the match is real (because the weak checksum is weak, there will
be many matches that aren't real, hence the need for a strong checksum).

5. From this, the sender can figure out how to change the receiver's
current file into the sender's file.
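
To make the rolling property from step 1.B concrete, here is a minimal
sketch in Python. It is not rsync's actual code: it assumes an
Adler-style checksum made of two 16-bit sums, and the names
weak_checksum, roll, and rolling_checksums are made up for illustration.

```python
M = 1 << 16  # assumption: each half of the weak checksum is kept modulo 2**16


def weak_checksum(block):
    """Compute the two halves (a, b) of the weak checksum from scratch."""
    n = len(block)
    a = sum(block) % M                                      # plain byte sum
    b = sum((n - i) * x for i, x in enumerate(block)) % M   # position-weighted sum
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window by one byte: drop out_byte (B0), append in_byte (Bn)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M  # uses the already-updated a
    return a, b


def rolling_checksums(data, n):
    """Yield the weak checksum of every n-byte window of data, in offset order."""
    if len(data) < n:
        return
    a, b = weak_checksum(data[:n])  # full computation only once, at offset 0
    yield a, b
    for k in range(1, len(data) - n + 1):
        a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
        yield a, b
```

Each call to roll is a handful of additions no matter how large n is,
which is what makes checking every offset in step 3 affordable.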

It seems to me that this could work just as well with the sender doing
the non-overlapped checksums, and the receiver doing the rolling
checksum, the matching, the verification by strong checksum, and the
determination of what data to ask for.  That would put most of the CPU
work on the receiver side.
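
Here is a rough sketch of that reversed arrangement, just to show the
division of labor. Everything in it is hypothetical: the function names
are mine, the weak checksum is an Adler-style stand-in, and MD5 stands
in for the strong checksum because MD4 is not reliably available in
Python's hashlib. It is not how rsync is implemented.

```python
import hashlib

M = 1 << 16  # assumption: weak checksum halves are kept modulo 2**16


def weak(block):
    """Weak checksum of one block, computed from scratch."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def block_signatures(new_file, n):
    """Sender side: one cheap pass over the non-overlapping n-byte blocks."""
    table = {}
    for index in range(len(new_file) // n):  # a trailing partial block is skipped
        block = new_file[index * n:(index + 1) * n]
        strong = hashlib.md5(block).digest()
        table.setdefault(weak(block), []).append((strong, index))
    return table


def blocks_already_held(old_file, table, n):
    """Receiver side: roll through the old file, map sender block index -> local offset."""
    found = {}
    if len(old_file) < n:
        return found
    a, b = weak(old_file[:n])
    offset = 0
    while True:
        candidates = table.get((a, b))
        if candidates:  # weak match: confirm with the strong checksum
            strong = hashlib.md5(old_file[offset:offset + n]).digest()
            for digest, index in candidates:
                if digest == strong:
                    found.setdefault(index, offset)
        if offset + n >= len(old_file):
            break
        out_byte, in_byte = old_file[offset], old_file[offset + n]
        a = (a - out_byte + in_byte) % M  # rolling update, as in step 1.B
        b = (b - n * out_byte + a) % M
        offset += 1
    return found


def blocks_to_request(new_len, found, n):
    """Receiver side: indices of sender blocks that must still be downloaded."""
    total = -(-new_len // n)  # ceiling division, so a partial last block counts
    return [i for i in range(total) if i not in found]
```

In this sketch the sender's work is one pass over its own file, and the
result does not depend on who is asking, so it could presumably be
computed once and handed to every receiver.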

Presumably, there is an advantage to the way it is done now, but I bet
they could add an option to rsync to do it the other way, for use when
rsync serves large software distributions.
