Hardware Forum / Motherboards / ASUS / May 2007
Skybuck's Universal Code 4 (memory efficient)
|
|
Thread rating:  |
Skybuck Flying - 26 May 2007 08:01 GMT Skybuck's Universal Code 4:
The first version of Skybuck's Universal Code describes the basic idea but it's memory inefficient for large ammounts of data/information.
For each data bit a marker bit was necessary.
Also the interleaving of bits could be cumbersome for software and maybe even hardware ;)
Thus I present to you a somewhat more efficient universal code, which is still variable bit-based but more efficient, though I am not sure if I like the bitness of it... maybe I'll think up a byte version for intel system which are octal/byte based :)
For now however the basic concept for version 4 is:
Length1 + Length2 + Data
This represent 3 fields stuck together in the information stream.
The first field describes the length of the second field. The second field describes the length of the data.
The first field is like a marker for the second field, and the second field is length for the data in binary encoding.
So I could also be described as:
Marker + Length + Data
But this could confuse people with version 1... where the term marker ment a full marker...
So maybe describe it better as:
LengthMarker + Length + Data
or
LengthOfLength + Length + Data.
The first field consists out of zero's with a 1 terminator to describe the length of the second field, the second field uses binary encoding, and the data can be anything.
An example:
Suppose the following data has to be encoded/stored:
1234567890123 Hello World !
13 characters.
Length = 13 + 1 for null terminator = 14.
Bits needed to encoded 14 in binary: 0x1 + 1x2 + 1x4 + 1x8 = 4 bits.
Thus also 4 bits needed for the marker.
The information stream as stored as follows:
length marker: 0001
length: 0111
data: xxxxxxetc.
00010111xxxxxxetc
the length field describes the data length in bits to make it bit flexible.
So now software can read the information stream as follows:
// read length marker:
LengthMarkerCount := 0; repeat if ReadBit( Bit ) then begin LengthMarkerCount := LengthMarkerCount + 1; end else begin break; // error. end; until Bit = 1;
// allocate length field.
SetLengthInBits( Length, LengthMarkerCount );
// read length: LengthCount := 0; repeat if ReadBit( Bit ) then begin SetBit( Length, LengthCount, Bit ); LengthCount := LengthCount + 1; end else begin break; // error, or retry, inform, etc. end; until LengthCount = LengthMarkerCount;
// allocate data field in bits.
SetLengthInBits( Data, Length );
// read data DataCount := 0; repeat if ReadBit( Bit ) then begin SetBit( Data, DataCount, Bit ); DataCount := DataCount + 1; end else begin break; // error or retry. end; until DataCount = Length;
// ofcourse the example would not be complete without a write example how to encode information so let's do that too:
For example take the string as example.
DataCountInBits := Length( 'Hello World !'+#0 ) * 8; LengthCountInBits := BitNeededFor( DataCountInBits ); // use log functions to determine bits needed to encode the value in binary. LengthMarkerCountInBits := LengthCountInBits; // same
// now first write the marker:
if LengthMarkerCountInBits = 0 then exit; // strange error which should not happen.
LengthMarkerCount := 0;
repeat if LengthMarkerCount = LengthMarkerCountInBits then begin // write terminating bit. if WriteBit( 1 ) then begin LengthMarkerCount := LengthMarkerCount + 1; end else begin // error, or retry, inform, etc. end; end else begin // write leading bit(s) if WriteBit( 0 ) then begin LengthMarkerCount := LengthMarkerCount + 1; end else begin // error, or retry, inform, etc. end; end;
until LengthMarkerCount = LengthMarkerCountInBits;
// now write the binary length field.
if LengthCountInBits = 0 then exit; // strange error which should not happen.
LengthCount := 0;
repeat if LengthCount = LengthCountInBits then begin if GetBit( Length, LengthCount, Bit ) then begin // write binary length bit. if WriteBit( Bit ) then begin LengthCount := LengthCount + 1; end else begin // error, or retry, inform, etc. end; end else begin break; // memory error. end; end;
until LengthCount = LengthCountInBits;
// now write the data field:
if DataCountInBits = 0 then exit; // strange error which should not happen.
DataCount := 0;
repeat if DataCount = DataCountInBits then begin if GetBit( Data, DataCount, Bit ) then begin // write data bit. if WriteBit( Bit ) then begin DataCount := DataCount + 1; end else begin // error, or retry, inform, etc. end; end else begin break; // memory error. end; end;
until DataCount = DataCountInBits;
There, that should give you an idea of how to do it. Ofcourse this is only concept code... not really tested it... but should work nicely.
Some random and some crappy thoughts about using it in software development source codes (can be skipped for reading):
(* I am not sure if I like working with bits like this.. so as I wrote before... I might think up a byte version of the universal code... which might be more easy to program and handle...
not sure yet... working with bits can be straight forward to... but then again sometimes not... depends on the source/classes etc... don't know really, haven't tried yet...
Extra classess, and methods definetly needed... would be handy to have some code which can simply encoded any type of data really... like a pointer and a size or so... probably easy to make... but also everything has to be "serialized" and "deserialized" which is a bit weird and such... also offsets go out the window ;) <- can almost no longer be used since you don't know where you were or going to etc.. or maybe it's still possible... probably not though...
First you have to serialize the previous fields before you can add the next fields... that could be a drawback... but then again.. it doesn't really have to be like that if you work with seperate fields in memory... and then finally those fields can be encoded and decoded seperately or they could be copied if they already serialized / deserialized... so this kinda creates a problem... since many possible design decissions possible...
Ofcourse it would be handy if the software works directly on universal codes... this would make the software scale better automatically without much or even any code changes necessary.
Working with encoded fields should therefore be recommanded with special algorithm which take adventage of it, to create more flexiblity... and finally for network and storage purposes the fields could simply be copied... the serialize them after each other as a sort of simple compression...
since on current hardware the fields will have some unused bits because the memory works with bytes... etc. *)
Though this code is complexer I like it better since it's more memory efficient and actually also more speed efficient... less marker bits to process.
Even for small numbers like 32 bits or 64 bits it's not that bad: for example:
64 bits require 1x1,1x2,1x4,1x8,1x16,1x32,1x64 = 7 bits.
So that's a total of 7 markers bits, 7 length bits, 64 data bits = 78 bits.
Another extreme example, the opposite a large file:
20 Gigabyte = 20 * 8 = 160 gigabits.
8 bits for marker, 8 bits for length field, 160 gigabits = 160 gigabits plus a little.
Compare that with 320 gigabits for version 1.
So a major efficiency improvement and still very probably just as flexible ! ;)
Well a bit long and messy post, but I am lazy nowadays =D
But thinking about starting to use this stuff to make my software more future-ready =DDD
BYYYYYYYYYEEEEEE, BYEEEEEEEEEEEEEEEEEEEEEE, Skybuck.
P.S.: My you enjoy universal codes, and maybe the Skybuck's Power be with you =D
MooseFET - 26 May 2007 14:49 GMT [...]
> Length1 + Length2 + Data > [quoted text clipped - 5 lines] > The first field is like a marker for the second field, and the second field > is length for the data in binary encoding. If you go back and look in the responces to the earlier posts you made on this subject, you will find a couple far more efficient and less limiting suggestions were made back then. They would have allowed you to transmit 8 bit data in far less than the 14 bits your method is suggesting.
This method you are now posting is nothing new. I was something like it implemented in the 1980s.
BTW: You can do better if you leave the LSBs off the numbers.
Skybuck Flying - 26 May 2007 15:57 GMT > [...] >> Length1 + Length2 + Data [quoted text clipped - 12 lines] > on this subject, you will find a couple far more efficient and less > limiting suggestions were made back then. I read all reactions and I can not remember seeing such a suggestion.
> They would have allowed you > to transmit 8 bit data in far less than the 14 bits your method is > suggesting. It's funny to see you did not get the method correct or maybe you just made a sloppy mistake.
For 8 bits the encoding will be:
8 bit for data.
Prefixed with:
The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits.
Prefixed with:
The marker for the length field which is: 4 bits as well.
For a total of 8 + 8 = 16 bits :)
> This method you are now posting is nothing new. I was something like > it implemented in the 1980s. Maybe, maybe not.
You made same claims without backing them up with evidence :)
> BTW: You can do better if you leave the LSBs off the numbers. I do not understand what you are hinting at.
Leaving out the least significant bit leads to inprecision ?
I don't want that !
Thanks anyway, funny to see america wake up and see my post around this time
:) Bye, Skybuck.
MooseFET - 26 May 2007 20:40 GMT > "MooseFET" <kensm...@rahul.net> wrote in message [....]
> > [...] > >> Length1 + Length2 + Data [quoted text clipped - 21 lines] > It's funny to see you did not get the method correct or maybe you just made > a sloppy mistake. No, it appears I gave your credit for understanding something you didn't.
> For 8 bits the encoding will be: > [quoted text clipped - 3 lines] > > The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits. That is in effiecient. You never need to code for a zero length so the length field would only need 3 bits.
> Prefixed with: > > The marker for the length field which is: 4 bits as well. Again it is not needed to allow for the zero length case so this is only 3 bits.
> For a total of 8 + 8 = 16 bits :) So it is worse than I thought.
> > This method you are now posting is nothing new. I was something like > > it implemented in the 1980s. > > Maybe, maybe not. > > You made same claims without backing them up with evidence :) Oh well, you can decide not to believe me if you like. Its up to you.
> > BTW: You can do better if you leave the LSBs off the numbers. > > I do not understand what you are hinting at. > > Leaving out the least significant bit leads to inprecision >? There is no need to have an LSB on the length specifier. The number can have an extra MSB in it for the case when needed to round it up to the next length that can be specified. This reduces the length field and the length of the length field.
It is still quite an inefficient way to do it. It is also limited in length. There are far better options. Take a look at Huffman codes you will see how they handle using differing numbers of bits. Any of several systems based on those ideas would work better for you.
Skybuck Flying - 27 May 2007 03:18 GMT Ok,
Now I understand what you're saying.
Marker:
1
Length
0
Data
Missing.
Encoding an empty universal code could be handy for software which uses classess to implement universal codes and simply uses this encoding to describe "no data".
You mention other systems, are they universal codes, meaning they describe their own length and scalable indefinetly ?
I know static huffman does not scale, it's static afterall ;)
Nor will dynamic huffman probably. How will you encode the new character ?
Again you do not seem to crash the powerfull concept of a universal code =D or you mistakenly think other systems are universal codes ;)
Nor will I modify the universal code just to save one bit and create a mismatch in binary system... cause that what your suggestion would mean.
0 would no longer represent zero... but 1 <- not gonna happen ;) way to confusing and I am lazy :)
Thanks anyway, it did show you have nice ideas ;)
Bye, Skybuck.
MooseFET - 27 May 2007 04:07 GMT > Ok, > [quoted text clipped - 11 lines] > > Missing. You've used more than 2 bits to say "no data" and 3 bits to say 1 or zero.
Better coding based on the idea behind Huffman :
00 - Nodata 01 - True 10 - False 11 - All other cases
Now we have the three simple cases done now we can decide how most efficient path goes from here. I'm lazy so I will only state a possible path that stays more efficent than your suggestion:
At 2 bits, your method would take over 4 bits so we can step directly to the 4 bit groups and always be better than your way:
1100 - 0 plus the following bit as a pair 1101 - 1 plus the following bit as a pair 1110 - Three bits follow 1111 - All other cases
This method can be extended without limit and will always be better.
The last time you posted this, someone pointed out a better pattern than this but this one is good enough to prove the point.
Skybuck Flying - 27 May 2007 06:08 GMT >> Ok, >> [quoted text clipped - 21 lines] > 10 - False > 11 - All other cases Seems like you need 2 bits to describe true.
While my encoding only need 1 bit, sure there is overhead, but that overhead is not part of the original data.
The original data can have any encoding that's the beauty of Skybuck's universal code, existing encodings can be integrated into the universal encoding.
No other special encodings are needed to represent data.
The data can stay in it's original form.
Only the marker and the length fields are prefixed to it.
Alternatively new encodings for the data section can be used as well for example to represent larger numbers for arbitrary precision math.
> Now we have the three simple cases done now we can decide how > most efficient path goes from here. I'm lazy so I will only state a [quoted text clipped - 12 lines] > The last time you posted this, someone pointed out a better pattern > than this but this one is good enough to prove the point. I do not completely understand why you are making a big deal out of trying to come up with a better encoding for the 1 bit of data case...
Also it seems you are trying to use a new encoding... that's not too shabby... it can probably be integrated into the universal code data field as well lol... but not really necessary.. that would just be double.
So for now I do not agree that "it's better" or even "more efficient" and I definelty do not see how this would be "more flexible" or "self describing length ?".
That's a good question: where is the length field in your scenerio ? ;)
Seems like you need the universal code after all ;)
The data of the universal code can use any other encoding... isn't that beautifull ? =D
Also what did you describe: mangled encoding ?
Or did you describe a new encoding for one of the fields: Marker, Length, Data ?
I like keeping those fields nicely seperated ;)
So I definetly do not want a mangled encoding. <- bleh .
Please clearify or try again and then come again =D
Bye, Skybuck.
MooseFET - 27 May 2007 15:26 GMT [....]
> > You've used more than 2 bits to say "no data" and 3 bits to say 1 or > > zero. [quoted text clipped - 10 lines] > While my encoding only need 1 bit, sure there is overhead, but that overhead > is not part of the original data. You have more than one bit of overhead so you need more than two bits to describe true. You aren't using your system until you add your overhead.
I only need two bits to describe true so I've used fewer bits.
> The original data can have any encoding that's the beauty of Skybuck's > universal code, existing encodings can be integrated into the universal > encoding. That is also true of what I suggested. Any number of bits can be encoded and those bits can have any meaning.
> No other special encodings are needed to represent data. > > The data can stay in it's original form. If you look arefully at what I did, you will see that that is also true of my method. The only difference is that my method is much better.
> Only the marker and the length fields are prefixed to it. > > Alternatively new encodings for the data section can be used as well for > example to represent larger numbers for arbitrary precision math. My method imposed no limits on the payload data. It just used less bits in the overhead to get the job done.
> > Now we have the three simple cases done now we can decide how > > most efficient path goes from here. I'm lazy so I will only state a [quoted text clipped - 15 lines] > I do not completely understand why you are making a big deal out of trying > to come up with a better encoding for the 1 bit of data case... I'm pointing out that from zero to an infinite number of bits that my method is better. It always has less overhead than your method. I did the 0, 1 , 2, and 3 bit cases to show how the pattern goes.
I figured that showing how the first cases go would be enough to show the pattern to you.
[....]
> So for now I do not agree that "it's better" or even "more efficient" and I > definelty do not see how this would be "more flexible" or "self describing > length ?". You don't agree only because you haven't understood the method. The system I showed have every advantage you could ever imagine yours has but uses far few bits to do it.
> That's a good question: where is the length field in your scenerio ? ;) Go back ad reread what I wrote. What I did is straight forward enough.
[...]
> Or did you describe a new encoding for one of the fields: Marker, Length, > Data ? I showed how to do everything you are doing but to use far fewer bits to do it.
Skybuck Flying - 27 May 2007 16:20 GMT How would you encode a larger example with your system, for example the following data bits:
111010101010100110100101101110011
Maybe that would clearify your idea a bit :)
Bye, Skybuck.
MooseFET - 27 May 2007 17:37 GMT > How would you encode a larger example with your system, for example the > following data bits: > > 111010101010100110100101101110011 33 Bits by my count.
> Maybe that would clearify your idea a bit :) I stopped after 3 bits, so I'll pick up at 4 bits and again.
The first 4 bits of the prefix are already stated as being 1111 so we only need to deal with the next 6 bits.
00XXXX - The 4 bit value XXXX 010XXX - Two more bits follow making a total of 5 bits 0110XX - Four more bits follow making 6 bits 01110X - Six bits follow making 7 bits 011110 - 8 bits follow 10NNNN - (9+NNNN) bits follow 9..24 110NNN - Two more of N follow then (25+N) bits * 1110NN - 4 more bits of N follow then (57+N) bits 11110N - 6 more bits of N follow then (121+N) bits 111111 - All other cases
* Your question is answered here.
Notice how my system is not limited and has a smaller overhead whereas your system is limited to only handle lengths up to 32768 bits and takes more.
Skybuck Flying - 28 May 2007 05:56 GMT >> How would you encode a larger example with your system, for example the >> following data bits: [quoted text clipped - 25 lines] > your system is limited to only handle lengths up to 32768 bits and > takes more. Why do you think my system is limited ? It's not limited at all.
I have a question for your system:
After you have encoded, how do you decode ?
How do you detect where the prefixes start and the data begins ?
Bye, Skybuck.
Skybuck Flying - 28 May 2007 06:23 GMT >>> How would you encode a larger example with your system, for example the >>> following data bits: [quoted text clipped - 8 lines] >> The first 4 bits of the prefix are already stated as being 1111 so we >> only need to deal with the next 6 bits. from previous post:
> > 1100 - 0 plus the following bit as a pair > > 1101 - 1 plus the following bit as a pair > > 1110 - Three bits follow > > 1111 - All other cases
>> 00XXXX - The 4 bit value XXXX >> 010XXX - Two more bits follow making a total of 5 bits [quoted text clipped - 6 lines] >> 11110N - 6 more bits of N follow then (121+N) bits >> 111111 - All other cases Your encoding starts either with a zero, followed by ones, followed by a terminating zero, followed by the data.
Or
Your encoding starts with a one, followed by a terminating zero, followed by the data.
This way the length of the prefix can be determined.
Knowing the length of the prefix how to determine how many data bits follow ?
(Also the 33 bits I described are data only, no overhead bits included, the overhead bits would be 12 bits, total: 45 bits).
Bye, Skybuck.
MooseFET - 28 May 2007 14:44 GMT > >> How would you encode a larger example with your system, for example the > >> following data bits: [quoted text clipped - 27 lines] > > Why do you think my system is limited ? It's not limited at all. How many bits are in the length of length field? If that number is finite, the maximum length is finite.
> I have a question for your system: > > After you have encoded, how do you decode ? > > How do you detect where the prefixes start and the data begins ? You know how long the one prefix+data group is so you know when to look for the next.
> Bye, > Skybuck. Skybuck Flying - 28 May 2007 16:03 GMT >> >> How would you encode a larger example with your system, for example >> >> the [quoted text clipped - 30 lines] > > How many bits are in the length of length field? Ah !
That's the trick right there.
The encoding for the length of length is:
1. Variable.
2. Zero's followed by a terminating 1.
To figure out the length of length field:
Count the zero's and count the final 1.
And that's your length of length right there ! =D
See, unlimited, just like a null terminator for c strings.
> If that number is finite, the maximum length is finite. Nope, it's not :)
>> I have a question for your system: >> [quoted text clipped - 4 lines] > You know how long the one prefix+data group is so you know when to > look for the next. I provided some simple pseudo code for my encoding/decoding.
Can you provide some pseudo code for your encoding/decoding ?
Bye, Skybuck.
MooseFET - 28 May 2007 18:45 GMT [...]
> > How many bits are in the length of length field? > [quoted text clipped - 9 lines] > > To figure out the length of length field: Oh, that's a hopelessly inefficient way to do it.
[....]
> I provided some simple pseudo code for my encoding/decoding. > > Can you provide some pseudo code for your encoding/decoding ? Yes I can. Am I so lazy that it is unlikely that I will> Also yes.
Skybuck Flying - 28 May 2007 20:18 GMT > [...] >> > How many bits are in the length of length field? [quoted text clipped - 12 lines] > > Oh, that's a hopelessly inefficient way to do it. I got a newsflash for ya:
Let's say we want to encode
18.446.744.073.709.551.616 data bits.
The overhead will be:
64 bits for the length. 64 bits for the marker.
Overhead: 128 bits.
Minor overhead.
> [....] >> [quoted text clipped - 5 lines] > Am I so lazy that it is unlikely that I will> > Also yes. My overhead requires few instructions, which is good, since one must remember cpu usage plays a role as well.
Now what was your encoding scheme ? hmmmm ?
Bye, Skybuck.
Rudy Velthuis - 28 May 2007 20:34 GMT > I got a newsflash for ya: > > Let's say we want to encode > > 18.446.744.073.709.551.616 data bits. How often would you have to do that? What about an integer of, say, a maximum of 32 bits (they are pretty common, and happen much more often than your super-long integers). What is the overhead for an integer of 32 bits?
 Signature Rudy Velthuis http://rvelthuis.de
"Gigerenzer's Law of Indispensable Ignorance: The world cannot function without partially ignorant people." -- Gerd Gigerenzer
Skybuck Flying - 27 May 2007 06:19 GMT Also,
Your reasoning must be inheritly flawed.
Huffman compression is based on the idea that some characters or data will occur more often then others.
Binary encoding is the most efficient "general" encoding where no occurence statistics are known before hand.
Then there is also dynamic huffman compression which learns the occurence as it views the data... however this requires describing new characters...
And a very important question is how do you describe/encode the new characters ?
Fixed bits for new characters/data is ofcourse flawed, because it's not variable bit, thus not an universal code :)
Some without completely understand what your little tiny encoding thingy was about.. it's probably based on flawed reasoning in the first place :)
Bye, Skybuck ;)
P.S.: I like my universal encoding to be efficient for every possible case/scenerio ;) =D not optimized for certain scenerio's and it definetly should self describe the length of itself/scale/be flexible ofcourse otherwise it wouldn't be a universal code ! DUH ;) =D
>> Ok, >> [quoted text clipped - 38 lines] > The last time you posted this, someone pointed out a better pattern > than this but this one is good enough to prove the point. Rudy Velthuis - 27 May 2007 11:09 GMT > Also, > > Your reasoning must be inheritly flawed. Yours is. Read up on Huffman coding and be amazed.
 Signature Rudy Velthuis http://rvelthuis.de
"Sometimes a scream is better than a thesis." -- Ralph Waldo Emerson (1803-1882)
MooseFET - 26 May 2007 21:19 GMT > "MooseFET" <kensm...@rahul.net> wrote in message [....]
> > [...] > >> Length1 + Length2 + Data [quoted text clipped - 21 lines] > It's funny to see you did not get the method correct or maybe you just made > a sloppy mistake. No, it appears I gave your credit for understanding something you didn't.
> For 8 bits the encoding will be: > [quoted text clipped - 3 lines] > > The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits. That is in effiecient. You never need to code for a zero length so the length field would only need 3 bits.
> Prefixed with: > > The marker for the length field which is: 4 bits as well. Again it is not needed to allow for the zero length case so this is only 3 bits.
> For a total of 8 + 8 = 16 bits :) So it is worse than I thought.
> > This method you are now posting is nothing new. I was something like > > it implemented in the 1980s. > > Maybe, maybe not. > > You made same claims without backing them up with evidence :) Oh well, you can decide not to believe me if you like. Its up to you.
> > BTW: You can do better if you leave the LSBs off the numbers. > > I do not understand what you are hinting at. > > Leaving out the least significant bit leads to inprecision >? There is no need to have an LSB on the length specifier. The number can have an extra MSB in it for the case when needed to round it up to the next length that can be specified. This reduces the length field and the length of the length field.
It is still quite an inefficient way to do it. It is also limited in length. There are far better options. Take a look at Huffman codes.
> I don't want that ! > [quoted text clipped - 3 lines] > Bye, > Skybuck. Skybuck Flying - 28 May 2007 07:37 GMT "Skybuck Flying" <spam@hotmail.com> wrote in message news:...
>>> On May 27, 8:20 am, "Skybuck Flying" <s...@hotmail.com> wrote: >>>> How would you encode a larger example with your system, for example the [quoted text clipped - 40 lines] > Knowing the length of the prefix how to determine how many data bits > follow ? Well the first part is easy to figure out:
// decode:
If encoding starts with a zero then begin length of data = number of ones in prefix + 4; end;
However what about the second part ?:
if encoding starts with a one then begin length of data = ?; end;
// encode:
Also how to encode data using your system ?
Bye, Skybuck.
Skybuck Flying - 28 May 2007 07:48 GMT "Skybuck Flying" <spam@hotmail.com> wrote in message news:...
> "Skybuck Flying" <spam@hotmail.com> wrote in message news:... >> [quoted text clipped - 30 lines] >>>> 11110N - 6 more bits of N follow then (121+N) bits >>>> 111111 - All other cases Ok, I guess this table doesn't show the whole picture.
This table is supposed to be:
1100 0X 1101 1X 1110 XXX 1111 00XX XX 1111 010X XXXX 1111 0110 XXXX XX 1111 0111 0XXX XXXX 1111 0111 10XX XXXX XX * 1111 10NN NNNN NNNN NNN 1111 110N NNNN 1111 1110 NNNN NN 1111 1111 0NNN NNNN 1111 1111 11 - All other cases
* 10 bits overhead for only 8 bits.
Doesn't seem like your encoding is more efficient.
My encoding needs:
0x1 + 0x2 + 0x4 + 1x8 = 4 bits + 4 for markers = 8 bits overhead.
Total bits for my encoding: 16. Total bits for your encoding: 18.
Your encoding uses 2 more bits !
Since I use a binary encoding system for the length, the longer the data bits are the more efficient my encoding system will prove to be.
Binary is exponential after all ;)
The "unary-like" marker bits for the binary length bits will prove to be a very small overhead =D
Bye, Bye, Skybuck.
Skybuck Flying - 28 May 2007 20:40 GMT "Skybuck Flying" <spam@hotmail.com> wrote in message news:...
>> On May 28, 8:03 am, "Skybuck Flying" <s...@hotmail.com> wrote: >> [...] [quoted text clipped - 26 lines] > > Overhead: 128 bits. Hmm I was off by 1 bit.
The correct way to calculate this:
Ceil( Log10( 1 + Numbers of Bits ) / Log10( 2 ) );
Or simply:
Ceil( Log2( 1+ Number of Bits ) );
Also if result = 0 then result = 1;
One needs 1 bit to encode zero after all ;)
So for the number of bits mentioned:
18.446.744.073.709.551.616 data bits: Overhead: 130 bits.
18.446.744.073.709.551.616 - 1 data bits: Overhead: 128 bits ;)
Sometimes I like being accurate on usenet :)
Bye, Skybuck.
|
|
|