Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion GroupsPC HardwareCPUMotherboardsVideo CardsStorageNetworkingPeripheralsBrand Name Systems
Related Topics
Video GamesWindowsMS Server ProductsMS OfficeMore Topics ...

Hardware Forum / Motherboards / ASUS / May 2007

Tip: Looking for answers? Try searching our database.

Skybuck's Universal Code 4 (memory efficient)

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Skybuck Flying - 26 May 2007 08:01 GMT
Skybuck's Universal Code 4:

The first version of Skybuck's Universal Code describes the basic idea but
it's memory inefficient for large ammounts of data/information.

For each data bit a marker bit was necessary.

Also the interleaving of bits could be cumbersome for software and maybe
even hardware ;)

Thus I present to you a somewhat more efficient universal code, which is
still variable bit-based but more efficient, though I am not sure if I like
the bitness of it... maybe I'll think up a byte version for
intel system which are octal/byte based :)

For now however the basic concept for version 4 is:

Length1 + Length2 + Data

This represent 3 fields stuck together in the information stream.

The first field describes the length of the second field. The second field
describes the length of the data.

The first field is like a marker for the second field, and the second field
is length for the data in binary encoding.

So I could also be described as:

Marker + Length + Data

But this could confuse people with version 1... where the term marker ment a
full marker...

So maybe describe it better as:

LengthMarker + Length + Data

or

LengthOfLength + Length + Data.

The first field consists out of zero's with a 1 terminator to describe the
length of the second field, the second field uses binary encoding, and the
data can be anything.

An example:

Suppose the following data has to be encoded/stored:

1234567890123
Hello World !

13 characters.

Length = 13 + 1 for null terminator = 14.

Bits needed to encoded 14 in binary: 0x1 + 1x2 + 1x4 + 1x8 = 4 bits.

Thus also 4 bits needed for the marker.

The information stream as stored as follows:

length marker:
0001

length:
0111

data:
xxxxxxetc.

00010111xxxxxxetc

the length field describes the data length in bits to make it bit flexible.

So now software can read the information stream as follows:

// read length marker:

LengthMarkerCount := 0;
repeat
if ReadBit( Bit ) then
begin
 LengthMarkerCount := LengthMarkerCount + 1;
end else
begin
 break; // error.
end;
until Bit = 1;

// allocate length field.

SetLengthInBits( Length, LengthMarkerCount );

// read length:
LengthCount := 0;
repeat
if ReadBit( Bit ) then
begin
 SetBit( Length, LengthCount, Bit );
 LengthCount := LengthCount + 1;
end else
begin
 break; // error, or retry, inform, etc.
end;
until LengthCount = LengthMarkerCount;

// allocate data field in bits.

SetLengthInBits( Data, Length );

// read data
DataCount := 0;
repeat
if ReadBit( Bit ) then
begin
 SetBit( Data, DataCount, Bit );
 DataCount := DataCount + 1;
end else
begin
 break; // error or retry.
end;
until DataCount = Length;

// ofcourse the example would not be complete without a write example how to
encode information so let's do that too:

For example take the string as example.

DataCountInBits := Length( 'Hello World !'+#0 ) * 8;
LengthCountInBits := BitNeededFor( DataCountInBits ); // use log functions
to determine bits needed to encode the value in binary.
LengthMarkerCountInBits := LengthCountInBits; // same

// now first write the marker:

if LengthMarkerCountInBits = 0 then exit; // strange error which should not
happen.

LengthMarkerCount := 0;

repeat
if LengthMarkerCount = LengthMarkerCountInBits then
begin
 // write terminating bit.
 if WriteBit( 1 ) then
 begin
  LengthMarkerCount := LengthMarkerCount + 1;
 end else
 begin
  // error, or retry, inform, etc.
 end;
end else
begin
 // write leading bit(s)
 if WriteBit( 0 ) then
 begin
  LengthMarkerCount := LengthMarkerCount + 1;
 end else
 begin
  // error, or retry, inform, etc.
 end;
end;

until LengthMarkerCount = LengthMarkerCountInBits;

// now write the binary length field.

if LengthCountInBits = 0 then exit; // strange error which should not
happen.

LengthCount := 0;

repeat
if LengthCount = LengthCountInBits then
begin
 if GetBit( Length, LengthCount, Bit ) then
 begin
  // write binary length bit.
  if WriteBit( Bit ) then
  begin
   LengthCount := LengthCount + 1;
  end else
  begin
   // error, or retry, inform, etc.
  end;
 end else
 begin
  break; // memory error.
 end;
end;

until LengthCount = LengthCountInBits;

// now write the data field:

if DataCountInBits = 0 then exit; // strange error which should not happen.

DataCount := 0;

repeat
if DataCount = DataCountInBits then
begin
 if GetBit( Data, DataCount, Bit ) then
 begin
  // write data bit.
  if WriteBit( Bit ) then
  begin
   DataCount := DataCount + 1;
  end else
  begin
   // error, or retry, inform, etc.
  end;
 end else
 begin
  break; // memory error.
 end;
end;

until DataCount = DataCountInBits;

There, that should give you an idea of how to do it. Ofcourse this is only
concept code... not really tested it... but should work nicely.

Some random and some crappy thoughts about using it in software development
source codes (can be skipped for reading):

(*
I am not sure if I like working with bits like this.. so as I wrote
before... I might think up a byte version of the universal code... which
might be more easy to program and handle...

not sure yet... working with bits can be straight forward to... but then
again sometimes not... depends on the source/classes etc... don't know
really, haven't tried yet...

Extra classess, and methods definetly needed... would be handy to have some
code which can simply encoded any type of data really... like a pointer and
a size or so... probably
easy to make... but also everything has to be "serialized" and
"deserialized" which is a bit weird and such... also offsets go out the
window ;) <- can almost no longer be used
since you don't know where you were or going to etc.. or maybe it's still
possible... probably not though...

First you have to serialize the previous fields before you can add the next
fields... that could be a drawback... but then again.. it doesn't really
have to be like that if you work
with seperate fields in memory... and then finally those fields can be
encoded and decoded seperately or they could be copied if they already
serialized / deserialized... so this kinda
creates a problem... since many possible design decissions possible...

Ofcourse it would be handy if the software works directly on universal
codes... this would make the software scale better automatically without
much or even any code changes necessary.

Working with encoded fields should therefore be recommanded with special
algorithm which take adventage of it, to create more flexiblity... and
finally for network and storage purposes
the fields could simply be copied... the serialize them after each other as
a sort of simple compression...

since on current hardware the fields will have some unused bits because the
memory works with bytes... etc.
*)

Though this code is complexer I like it better since it's more memory
efficient and actually also more speed efficient... less marker bits to
process.

Even for small numbers like 32 bits or 64 bits it's not that bad: for
example:

64 bits require 1x1,1x2,1x4,1x8,1x16,1x32,1x64 = 7 bits.

So that's a total of 7 markers bits, 7 length bits, 64 data bits = 78 bits.

Another extreme example, the opposite a large file:

20 Gigabyte = 20 * 8 = 160 gigabits.

8 bits for marker, 8 bits for length field, 160 gigabits = 160 gigabits plus
a little.

Compare that with 320 gigabits for version 1.

So a major efficiency improvement and still very probably just as flexible !
;)

Well a bit long and messy post, but I am lazy nowadays =D

But thinking about starting to use this stuff to make my software more
future-ready =DDD

BYYYYYYYYYEEEEEE,
 BYEEEEEEEEEEEEEEEEEEEEEE,
   Skybuck.

P.S.: My you enjoy universal codes, and maybe the Skybuck's Power be with
you =D
MooseFET - 26 May 2007 14:49 GMT
[...]
> Length1 + Length2 + Data
>
[quoted text clipped - 5 lines]
> The first field is like a marker for the second field, and the second field
> is length for the data in binary encoding.

If you go back and look in the responces to the earlier posts you made
on this subject, you will find a couple far more efficient and less
limiting suggestions were made back then.  They would have allowed you
to transmit 8 bit data in far less than the 14 bits your method is
suggesting.

This method you are now posting is nothing new.  I was something like
it implemented in the 1980s.

BTW:  You can do better if you leave the LSBs off the numbers.
Skybuck Flying - 26 May 2007 15:57 GMT
> [...]
>> Length1 + Length2 + Data
[quoted text clipped - 12 lines]
> on this subject, you will find a couple far more efficient and less
> limiting suggestions were made back then.

I read all reactions and I can not remember seeing such a suggestion.

> They would have allowed you
> to transmit 8 bit data in far less than the 14 bits your method is
> suggesting.

It's funny to see you did not get the method correct or maybe you just made
a sloppy mistake.

For 8 bits the encoding will be:

8 bit for data.

Prefixed with:

The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits.

Prefixed with:

The marker for the length field which is: 4 bits as well.

For a total of 8 + 8 = 16 bits :)

> This method you are now posting is nothing new.  I was something like
> it implemented in the 1980s.

Maybe, maybe not.

You made same claims without backing them up with evidence :)

> BTW:  You can do better if you leave the LSBs off the numbers.

I do not understand what you are hinting at.

Leaving out the least significant bit leads to inprecision ?

I don't want that !

Thanks anyway, funny to see america wake up and see my post around this time
:)

Bye,
 Skybuck.
MooseFET - 26 May 2007 20:40 GMT
> "MooseFET" <kensm...@rahul.net> wrote in message
[....]
> > [...]
> >> Length1 + Length2 + Data
[quoted text clipped - 21 lines]
> It's funny to see you did not get the method correct or maybe you just made
> a sloppy mistake.

No, it appears I gave your credit for understanding something you
didn't.

> For 8 bits the encoding will be:
>
[quoted text clipped - 3 lines]
>
> The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits.

That is in effiecient.  You never need to code for a zero length so
the length field would only need 3 bits.

> Prefixed with:
>
> The marker for the length field which is: 4 bits as well.

Again it is not needed to allow for the zero length case so this is
only 3 bits.

> For a total of 8 + 8 = 16 bits :)

So it is worse than I thought.

> > This method you are now posting is nothing new.  I was something like
> > it implemented in the 1980s.
>
> Maybe, maybe not.
>
> You made same claims without backing them up with evidence :)

Oh well, you can decide not to believe me if you like.  Its up to you.

> > BTW:  You can do better if you leave the LSBs off the numbers.
>
> I do not understand what you are hinting at.
>
> Leaving out the least significant bit leads to inprecision >?

There is no need to have an LSB on the length specifier.  The number
can have an extra MSB in it for the case when needed to round it up to
the next length that can be specified.  This reduces the length field
and the length of the length field.

It is still quite an inefficient way to do it.  It is also limited in
length.  There are far better options. Take a look at Huffman codes
you will see how they handle using differing numbers of bits.  Any of
several systems based on those ideas would work better for you.
Skybuck Flying - 27 May 2007 03:18 GMT
Ok,

Now I understand what you're saying.

Marker:

1

Length

0

Data

Missing.

Encoding an empty universal code could be handy for software which uses
classess to implement universal codes and simply uses this encoding to
describe "no data".

You mention other systems, are they universal codes, meaning they describe
their own length and scalable indefinetly ?

I know static huffman does not scale, it's static afterall ;)

Nor will dynamic huffman probably. How will you encode the new character ?

Again you do not seem to crash the powerfull concept of a universal code =D
or you mistakenly think other systems are universal codes ;)

Nor will I modify the universal code just to save one bit and create a
mismatch in binary system... cause that what your suggestion would mean.

0 would no longer represent zero... but 1 <- not gonna happen ;) way to
confusing and I am lazy :)

Thanks anyway, it did show you have nice ideas ;)

Bye,
 Skybuck.
MooseFET - 27 May 2007 04:07 GMT
> Ok,
>
[quoted text clipped - 11 lines]
>
> Missing.

You've used more than 2 bits to say "no data" and 3 bits to say 1 or
zero.

Better coding based on the idea behind Huffman :

00  -  Nodata
01  -  True
10  -  False
11  -  All other cases

Now we have the three simple cases done now we can decide how
most efficient path goes from here.  I'm lazy so I will only state a
possible path that stays more efficent than your suggestion:

At 2 bits, your method would take over 4 bits so we can step directly
to the 4 bit groups and always be better than your way:

1100 - 0 plus the following bit as a pair
1101 - 1 plus the following bit as a pair
1110 - Three bits follow
1111 - All other cases

This method can be extended without limit and will always be better.

The last time you posted this, someone pointed out a better pattern
than this but this one is good enough to prove the point.
Skybuck Flying - 27 May 2007 06:08 GMT
>> Ok,
>>
[quoted text clipped - 21 lines]
> 10  -  False
> 11  -  All other cases

Seems like you need 2 bits to describe true.

While my encoding only need 1 bit, sure there is overhead, but that overhead
is not part of the original data.

The original data can have any encoding that's the beauty of Skybuck's
universal code, existing encodings can be integrated into the universal
encoding.

No other special encodings are needed to represent data.

The data can stay in it's original form.

Only the marker and the length fields are prefixed to it.

Alternatively new encodings for the data section can be used as well for
example to represent larger numbers for arbitrary precision math.

> Now we have the three simple cases done now we can decide how
> most efficient path goes from here.  I'm lazy so I will only state a
[quoted text clipped - 12 lines]
> The last time you posted this, someone pointed out a better pattern
> than this but this one is good enough to prove the point.

I do not completely understand why you are making a big deal out of trying
to come up with a better encoding for the 1 bit of data case...

Also it seems you are trying to use a new encoding... that's not too
shabby... it can probably be integrated into the universal code data field
as well lol... but not really necessary.. that would just be double.

So for now I do not agree that "it's better" or even "more efficient" and I
definelty do not see how this  would be "more flexible" or "self describing
length ?".

That's a good question: where is the length field in your scenerio ? ;)

Seems like you need the universal code after all ;)

The data of the universal code can use any other encoding... isn't that
beautifull ? =D

Also what did you describe: mangled encoding ?

Or did you describe a new encoding for one of the fields: Marker, Length,
Data ?

I like keeping those fields nicely seperated ;)

So I definetly do not want a mangled encoding. <- bleh .

Please clearify or try again and then come again =D

Bye,
 Skybuck.
MooseFET - 27 May 2007 15:26 GMT
[....]
> > You've used more than 2 bits to say "no data" and 3 bits to say 1 or
> > zero.
[quoted text clipped - 10 lines]
> While my encoding only need 1 bit, sure there is overhead, but that overhead
> is not part of the original data.

You have more than one bit of overhead so you need more than two bits
to describe true.  You aren't using your system until you add your
overhead.

I only need two bits to describe true so I've used fewer bits.

> The original data can have any encoding that's the beauty of Skybuck's
> universal code, existing encodings can be integrated into the universal
> encoding.

That is also true of what I suggested.  Any number of bits can be
encoded and those bits can have any meaning.

> No other special encodings are needed to represent data.
>
> The data can stay in it's original form.

If you look arefully at what I did, you will see that that is also
true of my method.  The only difference is that my method is much
better.

> Only the marker and the length fields are prefixed to it.
>
> Alternatively new encodings for the data section can be used as well for
> example to represent larger numbers for arbitrary precision math.

My method imposed no limits on the payload data.  It just used less
bits in the overhead to get the job done.

> > Now we have the three simple cases done now we can decide how
> > most efficient path goes from here.  I'm lazy so I will only state a
[quoted text clipped - 15 lines]
> I do not completely understand why you are making a big deal out of trying
> to come up with a better encoding for the 1 bit of data case...

I'm pointing out that from zero to an infinite number of bits that my
method is better.  It always has less overhead than your method.  I
did the 0, 1 , 2, and 3 bit cases to show how the pattern goes.

I figured that showing how the first cases go would be enough to show
the pattern to you.

[....]
> So for now I do not agree that "it's better" or even "more efficient" and I
> definelty do not see how this  would be "more flexible" or "self describing
> length ?".

You don't agree only because you haven't understood the method.  The
system I showed have every advantage you could ever imagine yours has
but uses far few bits to do it.

> That's a good question: where is the length field in your scenerio ? ;)

Go back ad reread what I wrote.  What I did is straight forward
enough.

[...]
> Or did you describe a new encoding for one of the fields: Marker, Length,
> Data ?

I showed how to do everything you are doing but to use far fewer bits
to do it.
Skybuck Flying - 27 May 2007 16:20 GMT
How would you encode a larger example with your system, for example the
following data bits:

111010101010100110100101101110011

Maybe that would clearify your idea a bit :)

Bye,
 Skybuck.
MooseFET - 27 May 2007 17:37 GMT
> How would you encode a larger example with your system, for example the
> following data bits:
>
> 111010101010100110100101101110011
33 Bits by my count.

> Maybe that would clearify your idea a bit :)

I stopped after 3 bits, so I'll pick up at 4 bits and again.

The first 4 bits of the prefix are already stated as being 1111 so we
only need to deal with the next 6 bits.

00XXXX   - The 4 bit value XXXX
010XXX   - Two more bits follow making a total of 5 bits
0110XX   - Four more bits follow making 6 bits
01110X   - Six bits follow making 7 bits
011110   - 8 bits follow
10NNNN   - (9+NNNN) bits follow 9..24
110NNN   - Two more of N follow then (25+N) bits *
1110NN   - 4 more bits of N follow then (57+N) bits
11110N   - 6 more bits of N follow then (121+N) bits
111111   - All other cases

* Your question is answered here.

Notice how my system is not limited and has a smaller overhead whereas
your system is limited to only handle lengths up to  32768 bits and
takes more.
Skybuck Flying - 28 May 2007 05:56 GMT
>> How would you encode a larger example with your system, for example the
>> following data bits:
[quoted text clipped - 25 lines]
> your system is limited to only handle lengths up to  32768 bits and
> takes more.

Why do you think my system is limited ? It's not limited at all.

I have a question for your system:

After you have encoded, how do you decode ?

How do you detect where the prefixes start and the data begins ?

Bye,
 Skybuck.
Skybuck Flying - 28 May 2007 06:23 GMT
>>> How would you encode a larger example with your system, for example the
>>> following data bits:
[quoted text clipped - 8 lines]
>> The first 4 bits of the prefix are already stated as being 1111 so we
>> only need to deal with the next 6 bits.

from previous post:

> > 1100 - 0 plus the following bit as a pair
> > 1101 - 1 plus the following bit as a pair
> > 1110 - Three bits follow
> > 1111 - All other cases

>> 00XXXX   - The 4 bit value XXXX
>> 010XXX   - Two more bits follow making a total of 5 bits
[quoted text clipped - 6 lines]
>> 11110N   - 6 more bits of N follow then (121+N) bits
>> 111111   - All other cases

Your encoding starts either with a zero, followed by ones, followed by a
terminating zero, followed by the data.

Or

Your encoding starts with a one, followed by a terminating zero, followed by
the data.

This way the length of the prefix can be determined.

Knowing the length of the prefix how to determine how many data bits follow
?

(Also the 33 bits I described are data only, no overhead bits included, the
overhead bits would be 12 bits, total: 45 bits).

Bye,
 Skybuck.
MooseFET - 28 May 2007 14:44 GMT
> >> How would you encode a larger example with your system, for example the
> >> following data bits:
[quoted text clipped - 27 lines]
>
> Why do you think my system is limited ? It's not limited at all.

How many bits are in the length of length field?
If that number is finite, the maximum length is finite.

> I have a question for your system:
>
> After you have encoded, how do you decode ?
>
> How do you detect where the prefixes start and the data begins ?

You know how long the one prefix+data group is so you know when to
look for the next.

> Bye,
>   Skybuck.
Skybuck Flying - 28 May 2007 16:03 GMT
>> >> How would you encode a larger example with your system, for example
>> >> the
[quoted text clipped - 30 lines]
>
> How many bits are in the length of length field?

Ah !

That's the trick right there.

The encoding for the length of length is:

1. Variable.

2. Zero's followed by a terminating 1.

To figure out the length of length field:

Count the zero's and count the final 1.

And that's your length of length right there ! =D

See, unlimited, just like a null terminator for c strings.

> If that number is finite, the maximum length is finite.

Nope, it's not :)

>> I have a question for your system:
>>
[quoted text clipped - 4 lines]
> You know how long the one prefix+data group is so you know when to
> look for the next.

I provided some simple pseudo code for my encoding/decoding.

Can you provide some pseudo code for your encoding/decoding ?

Bye,
 Skybuck.
MooseFET - 28 May 2007 18:45 GMT
[...]
> > How many bits are in the length of length field?
>
[quoted text clipped - 9 lines]
>
> To figure out the length of length field:

Oh, that's a hopelessly inefficient way to do it.

[....]

> I provided some simple pseudo code for my encoding/decoding.
>
> Can you provide some pseudo code for your encoding/decoding ?

Yes I can.
Am I so lazy that it is unlikely that I will>
Also yes.
Skybuck Flying - 28 May 2007 20:18 GMT
> [...]
>> > How many bits are in the length of length field?
[quoted text clipped - 12 lines]
>
> Oh, that's a hopelessly inefficient way to do it.

I got a newsflash for ya:

Let's say we want to encode

18.446.744.073.709.551.616 data bits.

The overhead will be:

64 bits for the length.
64 bits for the marker.

Overhead: 128 bits.

Minor overhead.

> [....]
>>
[quoted text clipped - 5 lines]
> Am I so lazy that it is unlikely that I will>
> Also yes.

My overhead requires few instructions, which is good, since one must
remember cpu usage plays a role as well.

Now what was your encoding scheme ? hmmmm ?

Bye,
 Skybuck.
Rudy Velthuis - 28 May 2007 20:34 GMT
> I got a newsflash for ya:
>
> Let's say we want to encode
>
> 18.446.744.073.709.551.616 data bits.

How often would you have to do that? What about an integer of, say, a
maximum of 32 bits (they are pretty common, and happen much more often
than your super-long integers). What is the overhead for an integer of
32 bits?

Signature

Rudy Velthuis        http://rvelthuis.de

"Gigerenzer's Law of Indispensable Ignorance: The world cannot
function without partially ignorant people." -- Gerd Gigerenzer

Skybuck Flying - 27 May 2007 06:19 GMT
Also,

Your reasoning must be inheritly flawed.

Huffman compression is based on the idea that some characters or data will
occur more often then others.

Binary encoding is the most efficient "general" encoding where no occurence
statistics are known before hand.

Then there is also dynamic huffman compression which learns the occurence as
it views the data... however this requires describing new characters...

And a very important question is how do you describe/encode the new
characters ?

Fixed bits for new characters/data is ofcourse flawed, because it's not
variable bit, thus not an universal code :)

Some without completely understand what your little tiny encoding thingy was
about.. it's probably based on flawed reasoning in the first place :)

Bye,
 Skybuck ;)

P.S.: I like my universal encoding to be efficient for every possible
case/scenerio ;) =D not optimized for certain scenerio's and it definetly
should self describe the length of itself/scale/be flexible ofcourse
otherwise it wouldn't be a universal code ! DUH ;) =D

>> Ok,
>>
[quoted text clipped - 38 lines]
> The last time you posted this, someone pointed out a better pattern
> than this but this one is good enough to prove the point.
Rudy Velthuis - 27 May 2007 11:09 GMT
> Also,
>
> Your reasoning must be inheritly flawed.

Yours is. Read up on Huffman coding and be amazed.
Signature

Rudy Velthuis        http://rvelthuis.de

"Sometimes a scream is better than a thesis."
-- Ralph Waldo Emerson (1803-1882)

MooseFET - 26 May 2007 21:19 GMT
> "MooseFET" <kensm...@rahul.net> wrote in message
[....]
> > [...]
> >> Length1 + Length2 + Data
[quoted text clipped - 21 lines]
> It's funny to see you did not get the method correct or maybe you just made
> a sloppy mistake.

No, it appears I gave your credit for understanding something you
didn't.

> For 8 bits the encoding will be:
>
[quoted text clipped - 3 lines]
>
> The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits.

That is in effiecient.  You never need to code for a zero length so
the length field would only need 3 bits.

> Prefixed with:
>
> The marker for the length field which is: 4 bits as well.

Again it is not needed to allow for the zero length case so this is
only 3 bits.

> For a total of 8 + 8 = 16 bits :)

So it is worse than I thought.

> > This method you are now posting is nothing new.  I was something like
> > it implemented in the 1980s.
>
> Maybe, maybe not.
>
> You made same claims without backing them up with evidence :)

Oh well, you can decide not to believe me if you like.  Its up to you.

> > BTW:  You can do better if you leave the LSBs off the numbers.
>
> I do not understand what you are hinting at.
>
> Leaving out the least significant bit leads to inprecision >?

There is no need to have an LSB on the length specifier.  The number
can have an extra MSB in it for the case when needed to round it up to
the next length that can be specified.  This reduces the length field
and the length of the length field.

It is still quite an inefficient way to do it.  It is also limited in
length.  There are far better options. Take a look at Huffman codes.

> I don't want that !
>
[quoted text clipped - 3 lines]
> Bye,
>   Skybuck.
Skybuck Flying - 28 May 2007 07:37 GMT
"Skybuck Flying" <spam@hotmail.com> wrote in message news:...

>>> On May 27, 8:20 am, "Skybuck Flying" <s...@hotmail.com> wrote:
>>>> How would you encode a larger example with your system, for example the
[quoted text clipped - 40 lines]
> Knowing the length of the prefix how to determine how many data bits
> follow ?

Well the first part is easy to figure out:

// decode:

If encoding starts with a zero then
begin
   length of data  = number of ones in prefix + 4;
end;

However what about the second part ?:

if encoding starts with a one then
begin
 length of data = ?;
end;

// encode:

Also how to encode data using your system ?

Bye,
 Skybuck.
Skybuck Flying - 28 May 2007 07:48 GMT
"Skybuck Flying" <spam@hotmail.com> wrote in message news:...

> "Skybuck Flying" <spam@hotmail.com> wrote in message news:...
>>
[quoted text clipped - 30 lines]
>>>> 11110N   - 6 more bits of N follow then (121+N) bits
>>>> 111111   - All other cases

Ok, I guess this table doesn't show the whole picture.

This table is supposed to be:

1100 0X
1101 1X
1110 XXX
1111 00XX  XX
1111 010X  XXXX
1111 0110  XXXX   XX
1111 0111  0XXX   XXXX
1111 0111  10XX   XXXX   XX     *
1111 10NN NNNN NNNN NNN
1111 110N NNNN
1111 1110 NNNN  NN
1111 1111 0NNN  NNNN
1111 1111 11   - All other cases

* 10 bits overhead for only 8 bits.

Doesn't seem like your encoding is more efficient.

My encoding needs:

0x1 + 0x2 + 0x4 + 1x8 = 4 bits + 4 for markers = 8 bits overhead.

Total bits for my encoding: 16.
Total bits for your encoding: 18.

Your encoding uses 2 more bits !

Since I use a binary encoding system for the length, the longer the data
bits are the more efficient my encoding system will prove to be.

Binary is exponential after all ;)

The "unary-like" marker bits for the binary length bits will prove to be a
very small overhead =D

Bye,
 Bye,
   Skybuck.
Skybuck Flying - 28 May 2007 20:40 GMT
"Skybuck Flying" <spam@hotmail.com> wrote in message news:...

>> On May 28, 8:03 am, "Skybuck Flying" <s...@hotmail.com> wrote:
>> [...]
[quoted text clipped - 26 lines]
>
> Overhead: 128 bits.

Hmm I was off by 1 bit.

The correct way to calculate this:

Ceil( Log10( 1 + Numbers of Bits ) / Log10( 2 ) );

Or simply:

Ceil( Log2( 1+ Number of Bits ) );

Also if result = 0 then result = 1;

One needs 1 bit to encode zero after all ;)

So for the number of bits mentioned:

18.446.744.073.709.551.616 data bits:
Overhead: 130 bits.

18.446.744.073.709.551.616 - 1 data bits:
Overhead: 128 bits ;)

Sometimes I like being accurate on usenet :)

Bye,
 Skybuck.
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2010 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.