Skip to content

Commit 5b83d5e

Browse files
committed
Merge branch 'release'
2 parents 6eeb3e3 + 978410b commit 5b83d5e

File tree

10 files changed

+60
-36
lines changed

10 files changed

+60
-36
lines changed

.Rbuildignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@
77
LZ4/LICENSE$
88
\.md$
99
^docs$
10+
\.TMP$
1011
\.png$
1112
\.yml$
1213
dataset\.fst$

DESCRIPTION

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,8 +5,8 @@ Description: Multithreaded serialization of compressed data frames using the
55
'fst' format. The 'fst' format allows for random access of stored data and
66
compression with the LZ4 and ZSTD compressors created by Yann Collet. The ZSTD
77
compression library is owned by Facebook Inc.
8-
Version: 0.8.6
9-
Date: 2018-05-15
8+
Version: 0.8.8
9+
Date: 2018-06-06
1010
Authors@R: c(
1111
person("Mark", "Klik", email = "[email protected]", role = c("aut", "cre", "cph")),
1212
person("Yann", "Collet", role = c("ctb", "cph"),

NEWS.md

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,9 @@
11

2-
# fst 0.8.6
2+
# fst 0.8.8 (June 6, 2018)
3+
4+
Version 0.8.8 of the `fst` package is an intermediate release designed to fix valgrind warnings reported on CRAN builds (per request of CRAN maintainers). These warnings were due to `fst` writing uninitialized data buffers to file, which was done to maximize speed. To fix these warnings (and for safety), all memory blocks are now initialized to zero before being written to disk.
5+
6+
# fst 0.8.6 (May 15, 2018)
37

48
Version 0.8.6 of the `fst` package brings clearer printing of `fst_table` objects. It also includes optimizations for controlling the number of threads used by the package during reads and writes and after a fork has ended. The `LZ4` and `ZSTD` compression libraries are updated to their latest (and fastest) releases. UTF-8 encoded column names are now correctly stored in the `fst` format.
59

cran-comments.md

Lines changed: 3 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11

22
## Submission
33

4-
In this submission of fst, build errors reported in the CRAN check results for version 0.8.4 are addressed (thanks Kurt Hornik for the warning). These errors were due to failing unit tests and have been traced back to changes in the data.table code base (for the ITime type) between version 1.10.4-3 and 1.11.0. All issues have been resolved in this release.
4+
This submission of fst adresses valgrind warnings that are reported on the v0.8.6 package build on CRAN. These warnings are caused by writing uninitialized (meta-data) buffers to file (to increase write performance). With this submission, all allocated memory is initialized before writing.
55

66
## Test environments
77

@@ -25,17 +25,12 @@ The install size on different platforms varies significantly, from 1.42 MB (wind
2525

2626
## Valgrind
2727

28-
The following warnings are generated with valgrind when tests are run:
29-
30-
* Syscall param write(buf) points to uninitialised byte(s)
31-
* Conditional jump or move depends on uninitialised value(s)
32-
* Syscall param writev(vector[...]) points to uninitialised byte(s)
33-
34-
Like in previous fst versions, all warnings are generated in source file 'src/fstcore/interface/fststore.cpp' and are caused by writing uninitialised data to file. This is done intentionally (to increase performance) and the specific on-disk data is overwritten at a later point with initialised values.
28+
To reproduce the CRAN valgrind report, an instrumented (level 2) build of R was constructed on a fresh Ubuntu 16.04 image using config.site and configure parameters as specified in the memtests README file on CRAN. That build shows no valgrind warnings using the current submision.
3529

3630
## Downstream dependencies
3731

3832
I have run R CMD check on downstream dependencies and found no issues:
3933

4034
* heims: runs without warnings or errors.
4135
* rio: runs without warnings or errors.
36+
* grattan: runs without warnings or errors.

src/fstcore/blockstreamer/blockstreamer_v2.cpp

Lines changed: 13 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -196,6 +196,12 @@ void fdsStreamUncompressed_v2(ofstream& myfile, char* vec, unsigned long long ve
196196
}
197197

198198

199+
// header structure
200+
//
201+
// 4 | unsigned int | maximum compressed size of block
202+
// 4 | unsigned int | number of elements in block
203+
204+
199205
// Method for writing column data of any type to a stream.
200206
void fdsStreamcompressed_v2(ofstream& myfile, char* colVec, unsigned long long nrOfRows, int elementSize,
201207
StreamCompressor* streamCompressor, int blockSizeElems, std::string annotation, bool hasAnnotation)
@@ -225,9 +231,12 @@ void fdsStreamcompressed_v2(ofstream& myfile, char* colVec, unsigned long long n
225231

226232
// Blocks meta information
227233
// Aligned at 8 byte boundary
228-
std::unique_ptr<char[]> blockIndexP(new char[(2 + nrOfBlocks) * 8]);
234+
unsigned int block_index_size = (2 + nrOfBlocks) * 8;
235+
std::unique_ptr<char[]> blockIndexP(new char[block_index_size]);
229236
char* blockIndex = blockIndexP.get(); // 1 long file pointer with 2 highest bytes indicating algorithmID
230237

238+
memset(blockIndex, 0, block_index_size);
239+
231240
unsigned int* maxCompSize = reinterpret_cast<unsigned int*>(&blockIndex[0]); // maximum uncompressed block length
232241
unsigned int* blockSizeElements = reinterpret_cast<unsigned int*>(&blockIndex[4]); // number of elements per block
233242

@@ -259,6 +268,8 @@ void fdsStreamcompressed_v2(ofstream& myfile, char* colVec, unsigned long long n
259268
std::unique_ptr<char[]> threadBufferP(new char[nrOfThreads * MAX_COMPRESSBOUND * batchSize]);
260269
char* threadBuffer = threadBufferP.get();
261270

271+
// TODO: possibly memset to zero to avoid valgrind warnings
272+
262273
int nrOfBatches = nrOfBlocks / batchSize; // number of complete batches with complete blocks
263274

264275
if (nrOfBatches > 0)
@@ -352,7 +363,7 @@ void fdsStreamcompressed_v2(ofstream& myfile, char* colVec, unsigned long long n
352363
// Might be usefull in future implementation
353364
*maxCompSize = maxCompressionSize;
354365

355-
// Write last block position
366+
// Write last block position, note that nrOfBlocks is previously decreased by 1
356367
blockPosition = reinterpret_cast<unsigned long long*>(&blockIndex[COL_META_SIZE + 8 + nrOfBlocks * 8]);
357368
*blockPosition = blockIndexPos;
358369

src/fstcore/character/character_v6.cpp

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@
2828

2929
#include <fstream>
3030
#include <memory>
31+
#include <cstring> // memset
3132

3233

3334
// #include <boost/unordered_map.hpp>
@@ -114,6 +115,9 @@ void fdsWriteCharVec_v6(ofstream& myfile, IStringWriter* stringWriter, int compr
114115
std::unique_ptr<char[]> metaP(new char[metaSize]);
115116
char* meta = metaP.get();
116117

118+
// clear memory for safety
119+
memset(meta, 0, metaSize);
120+
117121
// Set column header
118122
unsigned int* isCompressed = reinterpret_cast<unsigned int*>(meta);
119123
unsigned int* blockSizeChar = reinterpret_cast<unsigned int*>(&meta[4]);
@@ -151,6 +155,9 @@ void fdsWriteCharVec_v6(ofstream& myfile, IStringWriter* stringWriter, int compr
151155
std::unique_ptr<char[]> metaP(new char[metaSize]);
152156
char* meta = metaP.get();
153157

158+
// clear memory for safety
159+
memset(meta, 0, metaSize);
160+
154161
// Set column header
155162
unsigned int* isCompressed = reinterpret_cast<unsigned int*>(meta);
156163
unsigned int* blockSizeChar = reinterpret_cast<unsigned int*>(&meta[4]);
@@ -204,7 +211,7 @@ void fdsWriteCharVec_v6(ofstream& myfile, IStringWriter* stringWriter, int compr
204211

205212
stringWriter->SetBuffersFromVec(block * BLOCKSIZE_CHAR, (block + 1) * BLOCKSIZE_CHAR);
206213
unsigned long long totSize = storeCharBlockCompressed_v6(myfile, stringWriter, block * BLOCKSIZE_CHAR,
207-
(block + 1) * BLOCKSIZE_CHAR, streamCompressInt, streamCompressChar, *algoInt, *algoChar, *intBufSize, block);
214+
(block + 1) * BLOCKSIZE_CHAR, streamCompressInt, streamCompressChar, *algoInt, *algoChar, *intBufSize, block);
208215

209216
fullSize += totSize;
210217
*blockPos = fullSize;
@@ -218,7 +225,7 @@ void fdsWriteCharVec_v6(ofstream& myfile, IStringWriter* stringWriter, int compr
218225

219226
stringWriter->SetBuffersFromVec(nrOfBlocks * BLOCKSIZE_CHAR, vecLength);
220227
unsigned long long totSize = storeCharBlockCompressed_v6(myfile, stringWriter, nrOfBlocks * BLOCKSIZE_CHAR,
221-
vecLength, streamCompressInt, streamCompressChar, *algoInt, *algoChar, *intBufSize, nrOfBlocks);
228+
vecLength, streamCompressInt, streamCompressChar, *algoInt, *algoChar, *intBufSize, nrOfBlocks);
222229

223230
fullSize += totSize;
224231
*blockPos = fullSize;

src/fstcore/compression/compression.cpp

Lines changed: 13 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -657,13 +657,14 @@ void LogicDecompr64(char* logicalVec, const unsigned long long* compBuf, int nrO
657657
}
658658

659659

660-
// Compression buffer should be at least 1 + (nrOfLogicals - 1) / 256 elements in length (factor 32)
660+
// Compression buffer should be at least 1 + (nrOfLogicals - 1) / 256 elements (long ints) in length (factor 32)
661661
void LogicCompr64(const char* logicalVec, unsigned long long* compress, int nrOfLogicals)
662662
{
663663
const unsigned long long* logicals = (const unsigned long long*) logicalVec;
664-
int nrOfLongs = nrOfLogicals / 32;
664+
int nrOfLongs = nrOfLogicals / 32; // number of full longs
665665

666666
// Define filters
667+
// TODO: define these as constants
667668
unsigned long long BIT = (1LL << 32) | 1LL;
668669
unsigned long long BIT0 = (BIT << 16) | (BIT << 15);
669670
unsigned long long BIT1 = (BIT << 17) | (BIT << 14);
@@ -710,15 +711,20 @@ void LogicCompr64(const char* logicalVec, unsigned long long* compress, int nrOf
710711

711712

712713
// Process remainder
713-
int remain = nrOfLogicals % 32;
714+
int remain = nrOfLogicals % 32; // nr of logicals remaining
714715
if (remain == 0) return;
715716

717+
unsigned long long remainLongs[16]; // at maximum nrOfRemainLongs equals 16
718+
int* remain_ints = reinterpret_cast<int*>(remainLongs);
719+
716720
// Compress the remainder in identical manner as the blocks here (for random access) !!!!!!
717721
logics = &logicals[16 * nrOfLongs];
718722

719-
int nrOfRemainLongs = 1 + (remain - 1) / 2; // per 2 logicals
720-
unsigned long long remainLongs[16]; // at maximum nrOfRemainLongs equals 16
723+
const int nrOfRemainLongs = 1 + (remain - 1) / 2; // per 2 logicals
724+
725+
// please valgrind: only use initialized bytes for calculations
721726
memcpy(remainLongs, logics, remain * sizeof(int));
727+
memset(&remain_ints[remain], 0, (32 - remain) * 4); // clear remaining ints
722728

723729
unsigned long long compRes = 0;
724730
for (int remainNr = 0; remainNr < nrOfRemainLongs; ++remainNr)
@@ -1030,8 +1036,8 @@ unsigned int ZSTD_INT_TO_SHORT_SHUF2_C(char* dst, unsigned int dstCapacity, cons
10301036

10311037
unsigned int ZSTD_INT_TO_SHORT_SHUF2_D(char* dst, unsigned int dstCapacity, const char* src, unsigned int compressedSize)
10321038
{
1033-
int nrOfLongs = 1 + (dstCapacity - 1) / 16; // srcSize is processed in blocks of 32 bytes
1034-
int nrOfDstInts = dstCapacity / 4;
1039+
unsigned int nrOfLongs = 1 + (dstCapacity - 1) / 16; // srcSize is processed in blocks of 32 bytes
1040+
unsigned int nrOfDstInts = dstCapacity / 4;
10351041

10361042
// Compress buffer
10371043
char buf[MAX_SIZE_COMPRESS_BLOCK_HALF];

src/fstcore/interface/fststore.cpp

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -236,6 +236,10 @@ void FstStore::fstWrite(IFstTable &fstTable, const int compress) const
236236
// size of fst file header
237237
const unsigned long long metaDataSize = tableHeaderSize + keyIndexHeaderSize + chunksetHeaderSize + colNamesHeaderSize;
238238
char * metaDataWriteBlock = new char[metaDataSize]; // fst metadata
239+
240+
// clear memory for safety (avoids valgrind warnings)
241+
memset(metaDataWriteBlock, 0, metaDataSize);
242+
239243
std::unique_ptr<char[]> metaDataPtr = std::unique_ptr<char[]>(metaDataWriteBlock);
240244

241245

@@ -375,6 +379,10 @@ void FstStore::fstWrite(IFstTable &fstTable, const int compress) const
375379
// Size of chunkset index header plus data chunk header
376380
const unsigned long long chunkIndexSize = CHUNK_INDEX_SIZE + DATA_INDEX_SIZE + 8 * nrOfCols;
377381
char* chunkIndex = new char[chunkIndexSize];
382+
383+
// clear memory for safety
384+
memset(chunkIndex, 0, chunkIndexSize);
385+
378386
std::unique_ptr<char[]> chunkIndexPtr = std::unique_ptr<char[]>(chunkIndex);
379387

380388
// Chunkset data index [node D, leaf of C] [size: 96]

src/fstcore/logical/logical_v10.cpp

Lines changed: 2 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -36,19 +36,7 @@ using namespace std;
3636
void fdsWriteLogicalVec_v10(ofstream &myfile, int* boolVector, unsigned long long nrOfLogicals, int compression,
3737
std::string annotation, bool hasAnnotation)
3838
{
39-
// TODO: create multi-threaded code for a fixed ratio compressor
40-
41-
//if (compression == 0)
42-
//{
43-
// FixedRatioCompressor* compressor = new FixedRatioCompressor(CompAlgo::LOGIC64); // compression level not relevant here
44-
// fdsStreamUncompressed_v2(myfile, (char*) boolVector, nrOfLogicals, 4, BLOCKSIZE_LOGICAL, compressor, annotation, hasAnnotation);
45-
46-
// delete compressor;
47-
48-
// return;
49-
//}
50-
51-
int blockSize = 4 * BLOCKSIZE_LOGICAL; // block size in bytes
39+
const int blockSize = 4 * BLOCKSIZE_LOGICAL; // block size in bytes
5240

5341
if (compression <= 50) // compress 1 - 50
5442
{
@@ -57,7 +45,7 @@ void fdsWriteLogicalVec_v10(ofstream &myfile, int* boolVector, unsigned long lon
5745
StreamCompressor* streamCompressor = new StreamCompositeCompressor(defaultCompress, compress2, 2 * compression);
5846
streamCompressor->CompressBufferSize(blockSize);
5947

60-
fdsStreamcompressed_v2(myfile, (char*) boolVector, nrOfLogicals, 4, streamCompressor, BLOCKSIZE_LOGICAL, annotation, hasAnnotation);
48+
fdsStreamcompressed_v2(myfile, reinterpret_cast<char*>(boolVector), nrOfLogicals, 4, streamCompressor, BLOCKSIZE_LOGICAL, annotation, hasAnnotation);
6149

6250
delete defaultCompress;
6351
delete compress2;

tests/testthat/test_lintr.R

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,10 @@ library(lintr)
88
# * RcppExports not excluded
99

1010
test_that("Package Style", {
11+
12+
# lintr throws a lot of valgrind warnings, so skip on CRAN for now
13+
skip_on_cran()
14+
1115
lints <- with_defaults(line_length_linter = line_length_linter(120))
1216
lints <- lints[!(names(lints) %in%
1317
c("object_usage_linter", "camel_case_linter", "commas_linter", "multiple_dots_linter"))]

0 commit comments

Comments
 (0)