How do you delete some data from a file? The file size should shrink.
How do you insert data at a specific offset without overwriting?
Something like this:
xy xaabcy
01 012345

--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/

Re: FILE I/O by Chad

Chad
Sat May 22 11:57:01 CDT 2004

SpookyET <not4_u@hotmail.com> wrote in news:opr8eqjrn21s9n15@saturn:
> How do you delete some data from a file? The file size should shrink.

You have to copy the end parts down and shrink it.

> How do you insert data at a specific offset without overwriting?

You have to move the end part out, and then write in the gap.


--
Chad Z. Hower (a.k.a. Kudzu) - http://www.hower.org/Kudzu/
"Programming is an art form that fights back"

Get your ASP.NET in gear with IntraWeb!
http://www.atozed.com/IntraWeb/

Re: FILE I/O by Eric

Eric
Sat May 22 12:38:30 CDT 2004

Here is how you insert some data:

fs = New FileStream("c:\News\fileio.txt", FileMode.Open,
FileAccess.ReadWrite)

iInsert = fs.Length / 2 ' just for a beginning location

iLen = sToWrite.Length

Dim temp(iLen - 1) As Byte

iPos = fs.Length - iLen


Do Until iPos = iInsert

iPos -= i

If iPos < iInsert Then iPos = iInsert

fs.Position = iPos

i = fs.Read(temp, 0, iLen)

fs.Write(temp, 0, i)

Loop

fs.Position = iInsert

fs.Write(ASCIIEncoding.ASCII.GetBytes(sToWrite), 0, iLen)

fs.Flush()

fs.Close()



and here is how you remove some data:



fs = New FileStream("c:\News\fileio.txt", FileMode.Open,
FileAccess.ReadWrite)

iRemove = fs.Length / 2 ' just for a beginning location

iLen = 3

Dim temp(iLen - 1) As Byte

iPos = iRemove + iLen

Do Until iPos = fs.Length - iLen

iPos += i

fs.Position = iPos

i = fs.Read(temp, 0, iLen)

fs.Position = iPos - iLen

fs.Write(temp, 0, i)

Loop

fs.SetLength(fs.Length - iLen)

fs.Flush()

fs.Close()


--
Eric Marvets
Principal Consultant

the bang project

<shameless self promotion>

Email sales@bangproject.com for Information on Our Architecture and
Mentoring Services

</shameless self promotion>



Re: FILE I/O by SpookyET

SpookyET
Sat May 22 12:40:48 CDT 2004

Thanks for the responses. How would you do this in an efficient way. I
have to deal with files from hundreds of mebibytes to tens of gibibytes.

Re: FILE I/O by Eric

Eric
Sat May 22 13:16:14 CDT 2004

The sample I sent you does not read and write in efficient chunks. I would
play with different read and write blocks until you find the which one will
work the fastest. You don't want it too big or small.

Without knowing anything else about what you are doing, its hard to offer
any other advice. I can suggest you look at what others have done, for
example garbage collection, it reclaims space in memory at certain
intervals, doing more work at those intervals, but reducing the total amount
of work done overall. Or SQL for example writes in pages, and does not
immediatly reclaim space, instead it leaves the space open so that it can
write to it later, and if need be split a page.


--
Eric Marvets
Principal Consultant

the bang project

<shameless self promotion>

Email sales@bangproject.com for Information on Our Architecture and
Mentoring Services

</shameless self promotion>



Re: FILE I/O by Daniel

Daniel
Sat May 22 14:40:16 CDT 2004


"SpookyET" <not4_u@hotmail.com> wrote in message
news:opr8ewqaqi1s9n15@saturn...
> Thanks for the responses. How would you do this in an efficient way. I
> have to deal with files from hundreds of mebibytes to tens of gibibytes.
I assume this is for your BT client, in which case you need to either use
sparse files(which you've already suggested won't work) or perhaps a lazy
sorting system.

Considering what I know about BT, a file is split into multiple, fixed sized
chunks(zero extend the last chunk and deal with explicitly). Because of
this, you can use this algorithm.

1. Get a chunk
2. If that chunk would be beyond the end of the current file, write it to
the end,
otherwise, locate the place where the chunk belongs. Read the chunk in
its spot out, write the original chunk and repeat this step with the
newly read chunk until you finally find one that belongs at the
end.
3. Repeat until file is complete.
4. Final pass continuing sorting until all chunks are in the appropriate
slots

You will need to maintain a map between chunk IDs to slot IDs, but it might
result in considerably less IO with large files than if you were to copy the
end out every time. There is no guarentee considering that worst case will
be as bad in either situation and best case should be pretty much as good in
either situation. (I havn't tested this, so I don't know what the average
case will be, but its worth a shot). IF you choose to implement it, please
let me know if it performs and scales well enough for you.



Re: FILE I/O by SpookyET

SpookyET
Sat May 22 15:19:42 CDT 2004

Thanks! The problem with bittorrent is that a piece can go over file
boundaries. Some people care more about disk space usage than file
fragmentation, thus I have to implement this. Sorting is horrible if you
have to go through all the files to sort pieces.

After I have mentioned your algorithm, I got this in return:

[16:12] (TheSHAD0W): Heh.
[16:12] (TheSHAD0W): No.
[16:12] (TheSHAD0W): That'd work, but you'd wind up with spikes in usage.
[16:12] (TheSHAD0W): Here's how it's done, and Bram devised it...
[16:12] (TheSHAD0W): You get data for a new chunk.
[16:12] (TheSHAD0W): Allocate space on the end.
[16:12] *** blay (~hydroscce@user-0cej7gg.cable.mindspring.com) has joined
#BitTorrent
[16:12] (TheSHAD0W): If that space allocated is that chunk, write it
there. Done.
[16:13] (TheSHAD0W): If there already exists data for that piece in
previously allocated space, move it there, and its old spot is your new
open space.
[16:14] (TheSHAD0W): Then, check if the data you got the piece for is in
allocated space but something else is taking up its spot. If so, move
that data to the open spot.
[16:14] (TheSHAD0W): Finally, write your data into that spot.
[16:14] (TheSHAD0W): Results in a maximum of two moves in order to get
your data out.

On Sat, 22 May 2004 14:40:16 -0500, Daniel O'Connell [C# MVP]
<onyxkirx@--NOSPAM--comcast.net> wrote:

>
> "SpookyET" <not4_u@hotmail.com> wrote in message
> news:opr8ewqaqi1s9n15@saturn...
>> Thanks for the responses. How would you do this in an efficient way. I
>> have to deal with files from hundreds of mebibytes to tens of gibibytes.
> I assume this is for your BT client, in which case you need to either use
> sparse files(which you've already suggested won't work) or perhaps a lazy
> sorting system.
>
> Considering what I know about BT, a file is split into multiple, fixed
> sized
> chunks(zero extend the last chunk and deal with explicitly). Because of
> this, you can use this algorithm.
>
> 1. Get a chunk
> 2. If that chunk would be beyond the end of the current file, write it to
> the end,
> otherwise, locate the place where the chunk belongs. Read the chunk
> in
> its spot out, write the original chunk and repeat this step with the
> newly read chunk until you finally find one that belongs at the
> end.
> 3. Repeat until file is complete.
> 4. Final pass continuing sorting until all chunks are in the appropriate
> slots
>
> You will need to maintain a map between chunk IDs to slot IDs, but it
> might
> result in considerably less IO with large files than if you were to copy
> the
> end out every time. There is no guarentee considering that worst case
> will
> be as bad in either situation and best case should be pretty much as
> good in
> either situation. (I havn't tested this, so I don't know what the average
> case will be, but its worth a shot). IF you choose to implement it,
> please
> let me know if it performs and scales well enough for you.
>
>



--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/

Re: FILE I/O by Daniel

Daniel
Sat May 22 15:36:31 CDT 2004


"SpookyET" <not4_u@hotmail.com> wrote in message
news:opr8e324rb1s9n15@saturn...
> Thanks! The problem with bittorrent is that a piece can go over file
> boundaries. Some people care more about disk space usage than file
> fragmentation, thus I have to implement this. Sorting is horrible if you
> have to go through all the files to sort pieces.
>
> After I have mentioned your algorithm, I got this in return:
>
> [16:12] (TheSHAD0W): Heh.
> [16:12] (TheSHAD0W): No.
> [16:12] (TheSHAD0W): That'd work, but you'd wind up with spikes in usage.

He's right, it has a potential for spiking.

> [16:12] (TheSHAD0W): Here's how it's done, and Bram devised it...
> [16:12] (TheSHAD0W): You get data for a new chunk.
> [16:12] (TheSHAD0W): Allocate space on the end.
> [16:12] *** blay (~hydroscce@user-0cej7gg.cable.mindspring.com) has joined
> #BitTorrent
> [16:12] (TheSHAD0W): If that space allocated is that chunk, write it
> there. Done.
> [16:13] (TheSHAD0W): If there already exists data for that piece in
> previously allocated space, move it there, and its old spot is your new
> open space.
> [16:14] (TheSHAD0W): Then, check if the data you got the piece for is in
> allocated space but something else is taking up its spot. If so, move
> that data to the open spot.
> [16:14] (TheSHAD0W): Finally, write your data into that spot.
> [16:14] (TheSHAD0W): Results in a maximum of two moves in order to get
> your data out.
However, the problem with this, IMHO, is after writing a slot out you moved
data that *will* be moved again. Whereas above it might be. I think this
appraoch will result in more overal disk IO than my algorithm, but it will
probably be spread out further.

In this case, I would probably benchmark a number of different approaches.
Consider allocating pages(perhaps 4 chunks per page) and try to only permit
half of a page to be used as scratch at any given time. Or perhaps others.
Now I'm curious, lets see what I can think up...
>
> On Sat, 22 May 2004 14:40:16 -0500, Daniel O'Connell [C# MVP]
> <onyxkirx@--NOSPAM--comcast.net> wrote:
>
>>
>> "SpookyET" <not4_u@hotmail.com> wrote in message
>> news:opr8ewqaqi1s9n15@saturn...
>>> Thanks for the responses. How would you do this in an efficient way. I
>>> have to deal with files from hundreds of mebibytes to tens of gibibytes.
>> I assume this is for your BT client, in which case you need to either use
>> sparse files(which you've already suggested won't work) or perhaps a lazy
>> sorting system.
>>
>> Considering what I know about BT, a file is split into multiple, fixed
>> sized
>> chunks(zero extend the last chunk and deal with explicitly). Because of
>> this, you can use this algorithm.
>>
>> 1. Get a chunk
>> 2. If that chunk would be beyond the end of the current file, write it to
>> the end,
>> otherwise, locate the place where the chunk belongs. Read the chunk
>> in
>> its spot out, write the original chunk and repeat this step with the
>> newly read chunk until you finally find one that belongs at the
>> end.
>> 3. Repeat until file is complete.
>> 4. Final pass continuing sorting until all chunks are in the appropriate
>> slots
>>
>> You will need to maintain a map between chunk IDs to slot IDs, but it
>> might
>> result in considerably less IO with large files than if you were to copy
>> the
>> end out every time. There is no guarentee considering that worst case
>> will
>> be as bad in either situation and best case should be pretty much as
>> good in
>> either situation. (I havn't tested this, so I don't know what the average
>> case will be, but its worth a shot). IF you choose to implement it,
>> please
>> let me know if it performs and scales well enough for you.
>>
>>
>
>
>
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/m2/



Re: FILE I/O by SpookyET

SpookyET
Sun May 23 17:35:36 CDT 2004

I rewrote the algorithm that TheShadow mentioned to make it more clearly.

/*
* 1) A piece is received for storage.
* 2) Allocate space to hold the piece.
* 3) If the new allocated space is where the piece is supposed to be,
write it there. Return!
* 4) If the new allocated space is not where the received piece is
supposed to be, search for
* the piece that is supposed to there.
* 5) If the piece is not found, search for the location where the
received piece is supposed to be.
* 6) Move the piece found there into the new allocated space, and write
the received piece in the
* old location. Return!
* 6) If the piece that is supposed to be in the new allocated space was
found, move it there.
* 7) Search for the location where the received piece is supposed to be.
If there is a piece
* there, move it into the new empty space, and write the received
piece there. Return!
*
* The result is a maximum of 2 moves and 3 writes.
* If you have 8 (0-7) pieces and you get 6,4,3,5,7,1,0,2:
* 0) You get 6; 6 goes in 0.
* 1) You get 4; 4 goes in 1.
* 2) You get 3; 3 goes in 2.
* 3) You get 5; 3 goes in 3; 5 goes in 2.
* 4) You get 7; 4 goes in 4; 7 goes in 1.
* 5) You get 1; 5 goes in 5; 1 goes in 1; 7 goes in 2.
* 6) You get 0; 6 goes in 6; 0 goes in 0.
* 7) You get 2; 7 goes in 7; 2 goes in 2.
*/

I'm not very good at writing with the fewest words possible.
If you can rewrite what I have written above in a more clear way with less
words, I would appreciate if you post it.

You still have to deal with pieces that span multiple files.
You have to create a virtual file system that puts the files end to end,
like a chain.
If you wish to see how it works, draw some boxes on a paper that represent
pieces and play with it



On Sat, 22 May 2004 15:36:31 -0500, Daniel O'Connell [C# MVP]
<onyxkirx@--NOSPAM--comcast.net> wrote:

>
> "SpookyET" <not4_u@hotmail.com> wrote in message
> news:opr8e324rb1s9n15@saturn...
>> Thanks! The problem with bittorrent is that a piece can go over file
>> boundaries. Some people care more about disk space usage than file
>> fragmentation, thus I have to implement this. Sorting is horrible if
>> you
>> have to go through all the files to sort pieces.
>>
>> After I have mentioned your algorithm, I got this in return:
>>
>> [16:12] (TheSHAD0W): Heh.
>> [16:12] (TheSHAD0W): No.
>> [16:12] (TheSHAD0W): That'd work, but you'd wind up with spikes in
>> usage.
>
> He's right, it has a potential for spiking.
>
>> [16:12] (TheSHAD0W): Here's how it's done, and Bram devised it...
>> [16:12] (TheSHAD0W): You get data for a new chunk.
>> [16:12] (TheSHAD0W): Allocate space on the end.
>> [16:12] *** blay (~hydroscce@user-0cej7gg.cable.mindspring.com) has
>> joined
>> #BitTorrent
>> [16:12] (TheSHAD0W): If that space allocated is that chunk, write it
>> there. Done.
>> [16:13] (TheSHAD0W): If there already exists data for that piece in
>> previously allocated space, move it there, and its old spot is your new
>> open space.
>> [16:14] (TheSHAD0W): Then, check if the data you got the piece for is in
>> allocated space but something else is taking up its spot. If so, move
>> that data to the open spot.
>> [16:14] (TheSHAD0W): Finally, write your data into that spot.
>> [16:14] (TheSHAD0W): Results in a maximum of two moves in order to get
>> your data out.
> However, the problem with this, IMHO, is after writing a slot out you
> moved
> data that *will* be moved again. Whereas above it might be. I think this
> appraoch will result in more overal disk IO than my algorithm, but it
> will
> probably be spread out further.
>
> In this case, I would probably benchmark a number of different
> approaches.
> Consider allocating pages(perhaps 4 chunks per page) and try to only
> permit
> half of a page to be used as scratch at any given time. Or perhaps
> others.
> Now I'm curious, lets see what I can think up...
>>
>> On Sat, 22 May 2004 14:40:16 -0500, Daniel O'Connell [C# MVP]
>> <onyxkirx@--NOSPAM--comcast.net> wrote:
>>
>>>
>>> "SpookyET" <not4_u@hotmail.com> wrote in message
>>> news:opr8ewqaqi1s9n15@saturn...
>>>> Thanks for the responses. How would you do this in an efficient
>>>> way. I
>>>> have to deal with files from hundreds of mebibytes to tens of
>>>> gibibytes.
>>> I assume this is for your BT client, in which case you need to either
>>> use
>>> sparse files(which you've already suggested won't work) or perhaps a
>>> lazy
>>> sorting system.
>>>
>>> Considering what I know about BT, a file is split into multiple, fixed
>>> sized
>>> chunks(zero extend the last chunk and deal with explicitly). Because of
>>> this, you can use this algorithm.
>>>
>>> 1. Get a chunk
>>> 2. If that chunk would be beyond the end of the current file, write it
>>> to
>>> the end,
>>> otherwise, locate the place where the chunk belongs. Read the chunk
>>> in
>>> its spot out, write the original chunk and repeat this step with
>>> the
>>> newly read chunk until you finally find one that belongs at the
>>> end.
>>> 3. Repeat until file is complete.
>>> 4. Final pass continuing sorting until all chunks are in the
>>> appropriate
>>> slots
>>>
>>> You will need to maintain a map between chunk IDs to slot IDs, but it
>>> might
>>> result in considerably less IO with large files than if you were to
>>> copy
>>> the
>>> end out every time. There is no guarentee considering that worst case
>>> will
>>> be as bad in either situation and best case should be pretty much as
>>> good in
>>> either situation. (I havn't tested this, so I don't know what the
>>> average
>>> case will be, but its worth a shot). IF you choose to implement it,
>>> please
>>> let me know if it performs and scales well enough for you.
>>>
>>>
>>
>>
>>
>> --
>> Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
>
>



--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/

Re: FILE I/O by Daniel

Daniel
Wed May 26 11:29:52 CDT 2004


Sorry about the delay, I've been busy.
"SpookyET" <not4_u@hotmail.com> wrote in message
news:opr8g41mja1s9n15@saturn...
>I rewrote the algorithm that TheShadow mentioned to make it more clearly.
>
> /*
> * 1) A piece is received for storage.
> * 2) Allocate space to hold the piece.
> * 3) If the new allocated space is where the piece is supposed to be,
> write it there. Return!
> * 4) If the new allocated space is not where the received piece is
> supposed to be, search for
> * the piece that is supposed to there.
> * 5) If the piece is not found, search for the location where the
> received piece is supposed to be.
> * 6) Move the piece found there into the new allocated space, and write
> the received piece in the
> * old location. Return!
> * 6) If the piece that is supposed to be in the new allocated space was
> found, move it there.
> * 7) Search for the location where the received piece is supposed to be.
> If there is a piece
> * there, move it into the new empty space, and write the received
> piece there. Return!
> *
> * The result is a maximum of 2 moves and 3 writes.
> * If you have 8 (0-7) pieces and you get 6,4,3,5,7,1,0,2:
> * 0) You get 6; 6 goes in 0.
> * 1) You get 4; 4 goes in 1.
> * 2) You get 3; 3 goes in 2.
> * 3) You get 5; 3 goes in 3; 5 goes in 2.
> * 4) You get 7; 4 goes in 4; 7 goes in 1.
> * 5) You get 1; 5 goes in 5; 1 goes in 1; 7 goes in 2.
> * 6) You get 0; 6 goes in 6; 0 goes in 0.
> * 7) You get 2; 7 goes in 7; 2 goes in 2.
> */
>
> I'm not very good at writing with the fewest words possible.
> If you can rewrite what I have written above in a more clear way with less
> words, I would appreciate if you post it.
>
Your above sounds pretty fair. All I would do is reword it(and probably
lengthen it). If you understand it don't worry about the length.

> You still have to deal with pieces that span multiple files.
> You have to create a virtual file system that puts the files end to end,
> like a chain.
> If you wish to see how it works, draw some boxes on a paper that represent
> pieces and play with it
>

Multiple files...now there is a problem. For .NET I would probably just
create a stream that aggregates the files and does the writting to the
proper one(I assume thats what you are doing). If you do that you can just
generalize the above algorithm into that stream and forget about the
underlying files.