Commit 674c364

lsleonard committed
Version 2 extends compression up to 512 bytes.
1. In td512.c, for 128 and more values, text and extended string modes are called for checked data. For other data, and for any remaining values from calls for 128 or more values, td64 is called. A minimum of 16 characters are compressed.
2. In tdString.c, extended string mode was modified to stop on the 65th unique value. This value is the last value output and the number of values at that point is returned.
3. In main.c, after decompression, the input file is verified against the decompressed output file.
1 parent 9ae70bb commit 674c364

10 files changed: 577 additions and 354 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -7,9 +7,9 @@ td512 filename [loopCount]
 
 loopCount (default 1) is the loop count to use for performance testing. Also see BENCHMARK_LOOP_COUNT macro in main.c.
 
-Tiny data compression is not usually supported by compression programs. Now with td512 you can compress data from 6 to 512 bytes. td512 is available under the GPL-3.0 License at https://github.com/lsleonard/tiny-data-compression. Although for some types of data, programs QuickLZ, Zstandard and Snappy can get better compression at 512 bytes than td512, the performance of td512 is very close. All these programs steadily decline in compression ratio as the number of bytes decreases to 128. At 64 bytes, none of these programs produces compression. td512 combines the compressed output of td64 for each block of 64 bytes in the input, meaning that the compression achieved at 512 bytes is the same as that for 64 bytes. The td512 algorithm emphasizes speed, and based on data in this paper, gets 26.5% average compression at 272 Mbytes per second on the Squash benchmark test data (see https://quixdb.github.io/squash-benchmark/#) running on a 2 GHz processor.
+Tiny data compression is not supported by standard compression programs. Now with td512 you can reasonably compress data from 16 to 512 bytes. td512 is available under the GPL-3.0 License at https://github.com/lsleonard/tiny-data-compression. Compared with QuickLZ, a fast compression program that is designed to compress smaller data sets, td512 gets as good or better compression for 512-byte blocks of most data types. QuickLZ steadily declines in compression ratio as the number of bytes decreases to 128, and at 64 bytes, produces no compression. td512 has good compression at 64 bytes with the td64 interface. td512 combines extended text and string modes for 128 to 512 bytes with the td64 interface to compress any remaining bytes in the input. The td512 algorithm emphasizes speed, and based on data in this paper, gets 31.6% average compression for 512-byte blocks at 250 Mbytes per second on the Squash benchmark test data (see https://quixdb.github.io/squash-benchmark/#) running on a 2 GHz quad-core processor. For 64-byte blocks on this benchmark data, td512 gets 25.3% average compression at 250 MBytes per second.
 
-You can call the td512 and td512d functions to compress and decompress 1 to 512 bytes. The td512 interface performs compression of 6 to 512 bytes, but accepts 1 to 5 bytes and stores them without compression. td512 acts as a wrapper that uses the td64 interface to compress blocks of 64 bytes until the final block of 64 or fewer bytes is compressed. Along with the number of bytes processed, a pass/fail bit is stored for each 64-byte (or smaller) block compressed, and the compressed or uncompressed data is output.
+You can call the td512 and td512d functions to compress and decompress 1 to 512 bytes. The td512 interface performs compression of 16 to 512 bytes, but accepts 1 to 15 bytes and stores them without compression. Along with its extended text and string modes, td512 acts as a wrapper that uses the td64 interface to compress blocks of 64 bytes until the final block of 64 or fewer bytes is compressed. Along with the number of bytes processed, a pass/fail bit is stored for each block compressed, and the compressed or uncompressed data is output.
 
 With td64, you can call the td5 and td5d functions to compress and decompress 1 to 5 values. This interface is not used by td512 because the number of bytes generated is often more than the number of values to compress. Or you can call td64 and td64d functions to compress and decompress 6 to 64 values. The td64 interface returns pass (number of compressed bits) or fail (0) and outputs only compressed values. Decompression requires input of the number of original values.
 
Tiny Data Compression with td512.docx

13.8 KB
Binary file not shown.
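
The size limits stated in the updated README.md (1 to 15 bytes stored without compression, 16 to 512 bytes compressed, extended text and string modes considered from 128 bytes, td64 handling blocks of 64 or fewer bytes) can be summarized in a small sketch. The enum and both function names are hypothetical and do not come from td512.h. The thresholds are taken only from the README text and the commit message; the actual mode selection in td512.c also depends on checks of the data itself, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size classes; the names are illustrative, not from td512.h. */
typedef enum {
    SIZE_INVALID,     /* 0 or more than 512 bytes: outside the interface */
    SIZE_STORED,      /* 1 to 15 bytes: accepted, stored without compression */
    SIZE_TD64_ONLY,   /* 16 to 127 bytes: compressed through td64 blocks */
    SIZE_EXTENDED_OK  /* 128 to 512 bytes: extended modes may be tried first */
} SizeClass;

/* Classify an input length using the thresholds quoted in the README. */
SizeClass classify_size(uint32_t nBytes)
{
    if (nBytes == 0 || nBytes > 512)
        return SIZE_INVALID;
    if (nBytes < 16)
        return SIZE_STORED;
    if (nBytes < 128)
        return SIZE_TD64_ONLY;
    return SIZE_EXTENDED_OK;
}

/* Number of td64 blocks (64 bytes each, last one possibly shorter)
   needed to cover nBytes of remaining input. */
uint32_t td64_block_count(uint32_t nBytes)
{
    assert(nBytes <= 512);
    return (nBytes + 63) / 64;
}
```

For example, a 130-byte input falls in the extended-mode range, and if it were passed entirely to td64 it would need three blocks (64, 64 and 2 bytes).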

main.c

Lines changed: 57 additions & 25 deletions
@@ -1,39 +1,43 @@
 //
 // main.c
 // high-speed lossless tiny data compression for 1 to 512 bytes based on td512
-// version 1.1
 //
 // file-based test bed outputs .td512 file with encoded values
 // then reads in that file and generates .td512d file with
 // original values.
 //
-// Created by Stevan Leonard on 10/30/21.
-// Copyright © 2021 Oxford House Software. All rights reserved.
-//
+// Created by L. Stevan Leonard on 10/31/21.
+// Copyright © 2021-2022 L. Stevan Leonard. All rights reserved.
 /*
-This program is free software: you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation, either version 3 of the License, or
-(at your option) any later version.
-
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program. If not, see <https://www.gnu.org/licenses/>.//
-*/
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+*/
 #include "td512.h" // td512 functions
 
 #include <stdio.h>
 #include <stdlib.h>
 #include <time.h>
 #include <string.h>
 
-#define BENCHMARK_LOOP_COUNT // // special loop count for benchmarking
+#define BENCHMARK_LOOP_COUNT // special loop count for benchmarking
 //#define TEST_TD512 // invokes test_td512_1to512
 
+#ifdef TD512_TEST_MODE
+extern uint32_t gExtendedStringCnt;
+extern uint32_t gExtendedTextCnt;
+extern uint32_t gtd64Cnt;
+#endif
+
 int32_t test_td512_1to512(void)
 {
     // generate data then run through compress and decompress and compare for 1 to 512 values
// generate data then run through compress and decompress and compare for 1 to 512 values
@@ -63,7 +67,7 @@ int32_t test_td512_1to512(void)
     }
     if (textData[0] == 'i')
     {
-        // set all values to 0x83 and run again
+        // set all values to same value and run again
         memset(textData, 0x91, sizeof(textData));
         goto RUN_512;
     }
@@ -90,7 +94,7 @@ int main(int argc, char* argv[])
     int loopCnt; // argv[4] option: default is 1
     uint32_t blockSize=512; // block size to use when iterating through file
 
-    printf("tiny data compression td512 %s block size: %d\n", TD64_VERSION, blockSize);
+    printf("tiny data compression td512 %s block size: %d\n", TD512_VERSION, blockSize);
 #ifdef TEST_TD512
     int32_t retVal;
     if ((retVal=test_td512_1to512()) != 0) // do check of 1 to 512 values
@@ -124,7 +128,7 @@ int main(int argc, char* argv[])
     fclose(ifile);
 
     // allocate "uncompressed size" + 3 bytes per block for the destination buffer
-    dst = (unsigned char*) malloc(len + 3 * (len / blockSize + 1));
+    dst = (unsigned char*) malloc(len + 4 * (len / blockSize + 1));
     if (argc >= 3)
     {
         sscanf(argv[2], "%d", &loopCnt);
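
The hunk above raises the per-block allowance in the destination buffer from 3 to 4 bytes, although the unchanged context comment still says 3 bytes per block. A minimal sketch of the same sizing arithmetic follows; the function name is hypothetical, and whether 4 bytes is the true worst-case overhead per block is an assumption taken from the allocation, not something the diff states.

```c
#include <assert.h>
#include <stddef.h>

/* Size allocated for the compressed output in main.c after this commit:
   the input length plus 4 bytes for every block, counting one extra
   block to cover a partial final block. Hypothetical helper name. */
size_t dst_alloc_size(size_t len, size_t blockSize)
{
    assert(blockSize > 0);
    return len + 4 * (len / blockSize + 1);
}
```

With a 512-byte block size, a 1000-byte file is given 1000 + 4 * 2 = 1008 bytes, and a file of exactly 512 bytes is given 520 bytes because the extra block is always counted.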
@@ -135,9 +139,9 @@ int main(int argc, char* argv[])
     {
         loopCnt = 1;
     }
-#ifdef BENCHMARK_LOOP_COUNT // // special loop count for benchmarking
+#ifdef BENCHMARK_LOOP_COUNT // special loop count for benchmarking
     loopCnt = 100000000 / len;
-    loopCnt = (loopCnt < 20) ? 20 : loopCnt;
+    loopCnt = (loopCnt < 20) ? 10 : loopCnt;
     loopCnt = (loopCnt > 2000) ? 2000 : loopCnt;
 #endif
     loopNum = 0;
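
The benchmark loop count in the hunk above is derived from the file length and then bounded. After this commit, any computed count below 20 is set to 10 rather than raised to 20, so the result is not a plain clamp: it steps from 20 down to 10. The sketch below reproduces that arithmetic in a standalone function so the behavior can be checked; the function name is hypothetical and the guard against a zero length is an addition that main.c does not show in this hunk.

```c
#include <assert.h>

/* Mirrors the BENCHMARK_LOOP_COUNT arithmetic shown in the diff.
   Hypothetical helper; len must be nonzero. */
int benchmark_loop_count(unsigned long len)
{
    assert(len > 0);
    int loopCnt = (int)(100000000UL / len);
    loopCnt = (loopCnt < 20) ? 10 : loopCnt;     /* small counts become 10 */
    loopCnt = (loopCnt > 2000) ? 2000 : loopCnt; /* upper bound of 2000 */
    return loopCnt;
}
```

A 1 KB file is capped at 2000 loops, a 100,000-byte file runs 1000 loops, a 5,000,000-byte file runs exactly 20, and anything larger than that runs 10.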
@@ -154,7 +158,7 @@ int main(int argc, char* argv[])
         while (nBytesRemaining > 0)
         {
             uint32_t nBlockBytes=(uint32_t)nBytesRemaining>=blockSize ? blockSize : (uint32_t)nBytesRemaining;
-            nCompressedBytes = td512(src+srcBlockOffset, dst+dstBlockOffset, nBlockBytes);
+            nCompressedBytes = td512(src+srcBlockOffset, dst+dstBlockOffset, nBlockBytes);
             if (nCompressedBytes < 0)
                 exit(nCompressedBytes); // error occurred
             nBytesRemaining -= nBlockBytes;
@@ -173,12 +177,17 @@ int main(int argc, char* argv[])
     }
     timeSpent = minTimeSpent;
     printf("compression=%.02f%% %.00f bytes per second inbytes=%lu outbytes=%u\n", (float)100*(1.0-((float)totalCompressedBytes/(float)len)), (float)len/(float)timeSpent, len, totalCompressedBytes);
+#ifdef TD512_TEST_MODE
+    double totalBlocks=gExtendedTextCnt+gExtendedStringCnt+gtd64Cnt;
+    printf("TD512_TEST_MODE\n Extended text mode=%.01f%% Extended string mode= %.01f%% td64 =%.01f%%\n", (float)gExtendedTextCnt/totalBlocks*100, (float)gExtendedStringCnt/totalBlocks*100, (float)gtd64Cnt/totalBlocks*100);
+#endif
 
     fwrite(dst, totalCompressedBytes, 1, ofile);
     fclose(ofile);
     free(src);
     free(dst);
 
+    // **********************
     // decompress
     ifile = fopen(ofileName, "rb");
     strcpy(ofileName, argv[1]);
@@ -194,6 +203,7 @@ int main(int argc, char* argv[])
     fread(src, 1, len3, ifile);
     len2 = len; // output==input
     dst = (unsigned char*) malloc(len2);
+    fclose(ifile);
 
     minTimeSpent=600;
     loopNum = 0;
@@ -208,6 +218,8 @@ int main(int argc, char* argv[])
         {
             int32_t nRetBytes;
             nRetBytes = td512d(src+srcBlockOffset, dst+dstBlockOffset, &bytesProcessed);
+            if (nRetBytes != 512)
+                nRetBytes = nRetBytes;
             if (nRetBytes < 0)
                 return nRetBytes;
             nBytesRemaining -= bytesProcessed;
@@ -227,7 +239,27 @@ int main(int argc, char* argv[])
     timeSpent = minTimeSpent;
     printf("decompression=%.00f bytes per second inbytes=%lu outbytes=%lu\n", (float)len/(float)timeSpent, len3, len);
     fwrite(dst, len, 1, ofile);
-    fclose(ifile);
     fclose(ofile);
+    free(src);
+    // verify original input file with decompressed output
+    ifile = fopen(argv[1], "rb");
+    if (!ifile)
+    {
+        printf("td512 error: file not found to verify with decompressed output file: %s\n", argv[1]);
+        return 9;
+    }
+    // allocate source buffer and read file
+    src = (unsigned char*) malloc(len);
+    fread(src, 1, len, ifile);
+    fclose(ifile);
+    if ((memcmp(src, dst, len)) != 0)
+    {
+        printf("td512 error: decompressed file differs from original input file\n");
+        free(src);
+        free(dst);
+        return 11;
+    }
+    free(src);
+    free(dst);
     return 0;
 }
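
The verification step added in the last hunk re-reads the original file and compares it with the decompressed buffer using memcmp. The sketch below shows the same idea as two standalone helpers with the return values of malloc and fread checked and the lengths compared before the contents. Both function names are hypothetical; the added error checks are a suggestion and are not part of this commit.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a newly allocated buffer. Returns NULL on any
   failure; on success stores the length in *lenOut. Caller frees the buffer.
   Hypothetical helper, not part of main.c. */
unsigned char *read_whole_file(const char *path, size_t *lenOut)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);
    /* allocate at least 1 byte so an empty file still yields a valid pointer */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) { fclose(f); return NULL; }
    size_t nRead = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (nRead != (size_t)size) { free(buf); return NULL; }
    *lenOut = (size_t)size;
    return buf;
}

/* Returns 1 when both buffers have the same length and the same bytes. */
int buffers_match(const unsigned char *a, size_t aLen,
                  const unsigned char *b, size_t bLen)
{
    assert(a != NULL && b != NULL);
    return aLen == bLen && memcmp(a, b, aLen) == 0;
}
```

In a test bed like main.c, the original file could be loaded with read_whole_file and compared against the decompressed buffer with buffers_match before either buffer is freed.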
