SpookyET
Sun May 23 17:35:36 CDT 2004
I rewrote the algorithm that TheShadow mentioned to make it more clearly.
/*
* 1) A piece is received for storage.
* 2) Allocate space to hold the piece.
* 3) If the new allocated space is where the piece is supposed to be,
write it there. Return!
* 4) If the new allocated space is not where the received piece is
supposed to be, search for
* the piece that is supposed to there.
* 5) If the piece is not found, search for the location where the
received piece is supposed to be.
* 6) Move the piece found there into the new allocated space, and write
the received piece in the
* old location. Return!
* 6) If the piece that is supposed to be in the new allocated space was
found, move it there.
* 7) Search for the location where the received piece is supposed to be.
If there is a piece
* there, move it into the new empty space, and write the received
piece there. Return!
*
* The result is a maximum of 2 moves and 3 writes.
* If you have 8 (0-7) pieces and you get 6,4,3,5,7,1,0,2:
* 0) You get 6; 6 goes in 0.
* 1) You get 4; 4 goes in 1.
* 2) You get 3; 3 goes in 2.
* 3) You get 5; 3 goes in 3; 5 goes in 2.
* 4) You get 7; 4 goes in 4; 7 goes in 1.
* 5) You get 1; 5 goes in 5; 1 goes in 1; 7 goes in 2.
* 6) You get 0; 6 goes in 6; 0 goes in 0.
* 7) You get 2; 7 goes in 7; 2 goes in 2.
*/
I'm not very good at writing with the fewest words possible.
If you can rewrite what I have written above in a more clear way with less
words, I would appreciate if you post it.
You still have to deal with pieces that span multiple files.
You have to create a virtual file system that puts the files end to end,
like a chain.
If you wish to see how it works, draw some boxes on a paper that represent
pieces and play with it
On Sat, 22 May 2004 15:36:31 -0500, Daniel O'Connell [C# MVP]
<onyxkirx@--NOSPAM--comcast.net> wrote:
>
> "SpookyET" <not4_u@hotmail.com> wrote in message
> news:opr8e324rb1s9n15@saturn...
>> Thanks! The problem with bittorrent is that a piece can go over file
>> boundaries. Some people care more about disk space usage than file
>> fragmentation, thus I have to implement this. Sorting is horrible if
>> you
>> have to go through all the files to sort pieces.
>>
>> After I have mentioned your algorithm, I got this in return:
>>
>> [16:12] (TheSHAD0W): Heh.
>> [16:12] (TheSHAD0W): No.
>> [16:12] (TheSHAD0W): That'd work, but you'd wind up with spikes in
>> usage.
>
> He's right, it has a potential for spiking.
>
>> [16:12] (TheSHAD0W): Here's how it's done, and Bram devised it...
>> [16:12] (TheSHAD0W): You get data for a new chunk.
>> [16:12] (TheSHAD0W): Allocate space on the end.
>> [16:12] *** blay (~hydroscce@user-0cej7gg.cable.mindspring.com) has
>> joined
>> #BitTorrent
>> [16:12] (TheSHAD0W): If that space allocated is that chunk, write it
>> there. Done.
>> [16:13] (TheSHAD0W): If there already exists data for that piece in
>> previously allocated space, move it there, and its old spot is your new
>> open space.
>> [16:14] (TheSHAD0W): Then, check if the data you got the piece for is in
>> allocated space but something else is taking up its spot. If so, move
>> that data to the open spot.
>> [16:14] (TheSHAD0W): Finally, write your data into that spot.
>> [16:14] (TheSHAD0W): Results in a maximum of two moves in order to get
>> your data out.
> However, the problem with this, IMHO, is after writing a slot out you
> moved
> data that *will* be moved again. Whereas above it might be. I think this
> appraoch will result in more overal disk IO than my algorithm, but it
> will
> probably be spread out further.
>
> In this case, I would probably benchmark a number of different
> approaches.
> Consider allocating pages(perhaps 4 chunks per page) and try to only
> permit
> half of a page to be used as scratch at any given time. Or perhaps
> others.
> Now I'm curious, lets see what I can think up...
>>
>> On Sat, 22 May 2004 14:40:16 -0500, Daniel O'Connell [C# MVP]
>> <onyxkirx@--NOSPAM--comcast.net> wrote:
>>
>>>
>>> "SpookyET" <not4_u@hotmail.com> wrote in message
>>> news:opr8ewqaqi1s9n15@saturn...
>>>> Thanks for the responses. How would you do this in an efficient
>>>> way. I
>>>> have to deal with files from hundreds of mebibytes to tens of
>>>> gibibytes.
>>> I assume this is for your BT client, in which case you need to either
>>> use
>>> sparse files(which you've already suggested won't work) or perhaps a
>>> lazy
>>> sorting system.
>>>
>>> Considering what I know about BT, a file is split into multiple, fixed
>>> sized
>>> chunks(zero extend the last chunk and deal with explicitly). Because of
>>> this, you can use this algorithm.
>>>
>>> 1. Get a chunk
>>> 2. If that chunk would be beyond the end of the current file, write it
>>> to
>>> the end,
>>> otherwise, locate the place where the chunk belongs. Read the chunk
>>> in
>>> its spot out, write the original chunk and repeat this step with
>>> the
>>> newly read chunk until you finally find one that belongs at the
>>> end.
>>> 3. Repeat until file is complete.
>>> 4. Final pass continuing sorting until all chunks are in the
>>> appropriate
>>> slots
>>>
>>> You will need to maintain a map between chunk IDs to slot IDs, but it
>>> might
>>> result in considerably less IO with large files than if you were to
>>> copy
>>> the
>>> end out every time. There is no guarentee considering that worst case
>>> will
>>> be as bad in either situation and best case should be pretty much as
>>> good in
>>> either situation. (I havn't tested this, so I don't know what the
>>> average
>>> case will be, but its worth a shot). IF you choose to implement it,
>>> please
>>> let me know if it performs and scales well enough for you.
>>>
>>>
>>
>>
>>
>> --
>> Using Opera's revolutionary e-mail client:
http://www.opera.com/m2/
>
>
--
Using Opera's revolutionary e-mail client:
http://www.opera.com/m2/