Commit 421c51f

support reading extended timestamp information (#160)
1 parent b15f865 commit 421c51f

5 files changed: +138 -35 lines changed

README.md

Lines changed: 44 additions & 24 deletions
@@ -147,26 +147,14 @@ See `open()` for the meaning of the options and callback.
 
 ### dosDateTimeToDate(date, time)
 
-Converts MS-DOS `date` and `time` data into a JavaScript `Date` object.
-Each parameter is a `Number` treated as an unsigned 16-bit integer.
-Note that this format does not support timezones.
-The returned `Date` object will be constructed using the local timezone.
+*Deprecated*. Since yauzl 3.2.0, it is highly recommended to call [`entry.getLastModDate()`](#getlastmoddateoptions)
+instead of this function due to enhanced support for reading third-party extra fields.
+If you ever have a use case for calling this function directly, please
+[open an issue against yauzl](https://github.com/thejoshwolfe/yauzl/issues/new)
+requesting that this function be properly supported again.
 
-In order to interpret the parameters in UTC time instead of local time, you can convert with the following snippet:
-
-```js
-var timestampInterpretedAsLocal = yauzl.dosDateTimeToDate(date, time); // or entry.getLastModDate()
-var timestampInterpretedAsUTCInstead = new Date(
-  timestampInterpretedAsLocal.getTime() -
-  timestampInterpretedAsLocal.getTimezoneOffset() * 60 * 1000
-);
-```
-
-Note that there is an ECMAScript proposal to add better timezone support to JavaScript called the `Temporal` API.
-Last I checked, it is at stage 3. https://github.com/tc39/proposal-temporal
-
-Once that new API is available and stable, better timezone handling should be possible here somehow.
-Feel free to open a feature request against this library when the time comes.
+This function only remains exported in order to maintain compatibility with older versions of yauzl.
+It will be removed in yauzl 4.0.0 unless someone asks for it to remain supported.
 
 ### getFileNameLowLevel(generalPurposeBitFlag, fileNameBuffer, extraFields, strictFileNames)
 

@@ -437,8 +425,8 @@ These fields are of type `Number`:
 * `versionNeededToExtract`
 * `generalPurposeBitFlag`
 * `compressionMethod`
-* `lastModFileTime` (MS-DOS format, see [`getLastModDate()`](#getlastmoddate))
-* `lastModFileDate` (MS-DOS format, see [`getLastModDate()`](#getlastmoddate))
+* `lastModFileTime` (MS-DOS format, see [`getLastModDate()`](#getlastmoddateoptions))
+* `lastModFileDate` (MS-DOS format, see [`getLastModDate()`](#getlastmoddateoptions))
 * `crc32`
 * `compressedSize`
 * `uncompressedSize`
@@ -507,14 +495,46 @@ Prior to yauzl version 2.7.0, this field was erroneously documented as `comment`
 For compatibility with any code that uses the field name `comment`,
 yauzl creates an alias field named `comment` which is identical to `fileComment`.
 
-#### getLastModDate()
+#### getLastModDate([options])
+
+Returns the modification time of the file as a JavaScript `Date` object.
+The timezone situation is a mess; read on to learn more.
+
+Due to the zip file specification having lackluster support for specifying timestamps natively,
+there are several third-party extensions that add better support.
+yauzl supports these encodings:
+
+1. InfoZIP "universal timestamp" extended field (`0x5455` aka `"UT"`): signed 32-bit seconds since `1970-01-01 00:00:00Z`, which supports the years 1901-2038 (partially inclusive) with 1-second precision. The value is timezone agnostic, i.e. always UTC.
+2. NTFS extended field (`0x000a`): 64-bit signed 100-nanoseconds since `1601-01-01 00:00:00Z`, which supports the approximate years 20,000BCE-20,000CE with precision rounded to 1-millisecond (due to the JavaScript `Date` type). The value is timezone agnostic, i.e. always UTC.
+3. DOS `lastModFileDate` and `lastModFileTime`: supports the years 1980-2107 (inclusive) with 2-second precision. Timezone is interpreted either as the local timezone or UTC depending on the `timezone` option documented below.
 
-Effectively implemented as the following. See [`dosDateTimeToDate()`](#dosdatetimetodatedate-time).
+If both the InfoZIP "universal timestamp" and NTFS extended fields are found, yauzl uses one of them, but which one is unspecified.
+If neither is found, yauzl falls back to the built-in DOS `lastModFileDate` and `lastModFileTime`.
+Every possible bit pattern of every encoding can be represented by a JavaScript `Date` object,
+meaning this function cannot fail (barring parameter validation), and will never return an `Invalid Date` object.
+
+`options` may be omitted or `null`, and has the following defaults:
 
 ```js
-return dosDateTimeToDate(this.lastModFileDate, this.lastModFileTime);
+{
+  timezone: "local", // or "UTC"
+  forceDosFormat: false,
+}
 ```
 
+Set `forceDosFormat` to `true` (and do not set `timezone`) to enable pre-yauzl 3.2.0 behavior
+where the InfoZIP "universal timestamp" and NTFS extended fields are ignored.
+
+The `timezone` option is only used in the DOS fallback.
+If `timezone` is omitted, `null` or `"local"`, the `lastModFileDate` and `lastModFileTime` are interpreted in the system's current timezone (using `new Date(year, ...)`).
+If `timezone` is `"UTC"`, the interpretation is in UTC+00:00 (using `new Date(Date.UTC(year, ...))`).
+
+The JavaScript `Date` object has several inherent limitations surrounding timezones.
+There is an ECMAScript proposal to add better timezone support to JavaScript called the `Temporal` API.
+Last I checked, it was at stage 3. https://github.com/tc39/proposal-temporal
+Once that new API is available and stable, better timezone handling should be possible here somehow.
+If you notice that the new API has become widely available, please open a feature request against this library to add support for it.
+
 #### isEncrypted()
 
 Returns whether this entry is encrypted with "Traditional Encryption".
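
The `timezone` option documented in this README change only affects the DOS fallback. As a rough illustration of the difference between the two interpretations, here is a minimal, self-contained sketch. It assumes the standard MS-DOS bit layout, and `decodeDosDateTime` is a hypothetical name used only for this example, not part of yauzl's API.

```javascript
// Hypothetical helper for illustration only; not exported by yauzl.
function decodeDosDateTime(date, time, timezone) {
  var day = date & 0x1f;                  // 1-31
  var month = ((date >> 5) & 0xf) - 1;    // stored as 1-12, Date wants 0-11
  var year = ((date >> 9) & 0x7f) + 1980; // 7-bit offset from 1980
  var second = (time & 0x1f) * 2;         // stored in 2-second units
  var minute = (time >> 5) & 0x3f;        // 0-59
  var hour = (time >> 11) & 0x1f;         // 0-23
  if (timezone == null || timezone === "local") {
    // Interpret the wall-clock fields in the system's current timezone.
    return new Date(year, month, day, hour, minute, second, 0);
  }
  if (timezone === "UTC") {
    // Interpret the same wall-clock fields as UTC+00:00.
    return new Date(Date.UTC(year, month, day, hour, minute, second, 0));
  }
  throw new Error("unrecognized timezone: " + timezone);
}

// 1980-01-01 00:00:00 is encoded as year offset 0, month 1, day 1, time 0.
var dosDate = (0 << 9) | (1 << 5) | 1;
var asUtc = decodeDosDateTime(dosDate, 0, "UTC");
var asLocal = decodeDosDateTime(dosDate, 0, "local");
console.log(asUtc.toISOString()); // 1980-01-01T00:00:00.000Z
console.log(asLocal.getFullYear(), asLocal.getMonth() + 1, asLocal.getDate());
```

The two results describe the same wall-clock reading but generally different instants, which is why the README says the DOS encoding is the only one affected by `timezone`.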

index.js

Lines changed: 74 additions & 5 deletions
@@ -589,8 +589,69 @@ ZipFile.prototype.readLocalFileHeader = function(entry, options, callback) {
 
 function Entry() {
 }
-Entry.prototype.getLastModDate = function() {
-  return dosDateTimeToDate(this.lastModFileDate, this.lastModFileTime);
+Entry.prototype.getLastModDate = function(options) {
+  if (options == null) options = {};
+
+  if (!options.forceDosFormat) {
+    // Check extended fields.
+    for (var i = 0; i < this.extraFields.length; i++) {
+      var extraField = this.extraFields[i];
+      if (extraField.id === 0x5455) {
+        // InfoZIP "universal timestamp" extended field (`0x5455` aka `"UT"`).
+        // See the InfoZIP source code unix/unix.c:set_extra_field() and zipfile.c:ef_scan_ut_time().
+        var data = extraField.data;
+        if (data.length < 5) continue; // Too short.
+        // The flags define which of the three fields are present: mtime, atime, ctime.
+        // We only care about mtime.
+        // Also, ctime is never included in practice.
+        // And also, atime is only included in the local file header for some reason
+        // despite the flags lying about its inclusion in the central header.
+        var flags = data[0];
+        var HAS_MTIME = 1;
+        if (!(flags & HAS_MTIME)) continue; // This will realistically never happen.
+        // Although the positions of all of the fields shift around depending on the presence of other fields,
+        // mtime is always first if present, and that's the only one we care about.
+        var posixTimestamp = data.readInt32LE(1);
+        return new Date(posixTimestamp * 1000);
+      } else if (extraField.id === 0x000a) {
+        var data = extraField.data;
+        // 4 bytes reserved
+        var cursor = 4;
+        while (cursor + 4 <= data.length) {
+          // 2 bytes Tag
+          var tag = data.readUInt16LE(cursor);
+          cursor += 2;
+          // 2 bytes Size
+          var size = data.readUInt16LE(cursor);
+          cursor += 2;
+          if (tag !== 1) {
+            // Wrong tag. This will realistically never happen.
+            cursor += size;
+            continue;
+          }
+          // Tag1 is actually the only defined Tag.
+          if (size < 8 || cursor + size > data.length) break; // Invalid. Ignore.
+          // 8 bytes Mtime
+          var hundredNanoSecondsSince1601 = 4294967296 * data.readInt32LE(cursor + 4) + data.readUInt32LE(cursor);
+          // Convert from NTFS to POSIX milliseconds.
+          // The big number below is the milliseconds between year 1601 and year 1970
+          // (i.e. the negative POSIX timestamp of 1601-01-01 00:00:00Z)
+          var millisecondsSince1970 = hundredNanoSecondsSince1601 / 10000 - 11644473600000;
+          // Note on numeric precision: JavaScript Number objects lose precision above Number.MAX_SAFE_INTEGER,
+          // and NTFS timestamps are typically much bigger than that limit.
+          // (MAX_SAFE_INTEGER would represent 1629-07-17T23:58:45.475Z.)
+          // However, we're losing precision in the conversion from 100nanosecond units to millisecond units anyway,
+          // and the time at which we also lose 1-millisecond precision is just past the JavaScript Date limit (by design).
+          // Up through the year 2057, this conversion only drops 4 bits of precision,
+          // which is well under the 13-14 bits ratio between the milliseconds and 100nanoseconds.
+          return new Date(millisecondsSince1970);
+        }
+      }
+    }
+  }
+
+  // Fallback to non-extended encoding.
+  return dosDateTimeToDate(this.lastModFileDate, this.lastModFileTime, options.timezone);
 };
 Entry.prototype.isEncrypted = function() {
   return (this.generalPurposeBitFlag & 0x1) !== 0;
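
To make the `0x5455` branch in the hunk above easier to follow, here is a small standalone sketch that decodes hand-built "universal timestamp" payloads. The helper name `readUniversalTimestampMtime` is hypothetical and exists only for this illustration; the byte layout (one flags byte followed by a signed 32-bit little-endian mtime) is taken from the comments in the diff.

```javascript
// Hypothetical helper mirroring the 0x5455 branch; not part of yauzl's API.
function readUniversalTimestampMtime(data) {
  if (data.length < 5) return null; // need 1 flags byte + 4 bytes of mtime
  var HAS_MTIME = 1;
  if (!(data[0] & HAS_MTIME)) return null; // mtime not present
  // mtime is a signed 32-bit little-endian count of seconds since the POSIX epoch.
  return new Date(data.readInt32LE(1) * 1000);
}

// flags = 0x01 (mtime present), mtime = 0 seconds.
var epochField = Buffer.from([0x01, 0x00, 0x00, 0x00, 0x00]);
// flags = 0x01, mtime = -1 second (0xffffffff read as a signed value).
var beforeEpochField = Buffer.from([0x01, 0xff, 0xff, 0xff, 0xff]);
// Too short to contain an mtime.
var truncatedField = Buffer.from([0x01]);

console.log(readUniversalTimestampMtime(epochField).toISOString());       // 1970-01-01T00:00:00.000Z
console.log(readUniversalTimestampMtime(beforeEpochField).toISOString()); // 1969-12-31T23:59:59.000Z
console.log(readUniversalTimestampMtime(truncatedField));                 // null
```

Returning `null` here stands in for the `continue` statements in the real loop, which move on to the next extra field.
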
@@ -602,7 +663,7 @@ Entry.prototype.isCompressed = function() {
 function LocalFileHeader() {
 }
 
-function dosDateTimeToDate(date, time) {
+function dosDateTimeToDate(date, time, timezone) {
   var day = date & 0x1f; // 1-31
   var month = (date >> 5 & 0xf) - 1; // 1-12, 0-11
   var year = (date >> 9 & 0x7f) + 1980; // 0-127, 1980-2107
@@ -612,7 +673,13 @@ function dosDateTimeToDate(date, time) {
   var minute = time >> 5 & 0x3f; // 0-59
   var hour = time >> 11 & 0x1f; // 0-23
 
-  return new Date(year, month, day, hour, minute, second, millisecond);
+  if (timezone == null || timezone === "local") {
+    return new Date(year, month, day, hour, minute, second, millisecond);
+  } else if (timezone === "UTC") {
+    return new Date(Date.UTC(year, month, day, hour, minute, second, millisecond));
+  } else {
+    throw new Error("unrecognized options.timezone: " + timezone);
+  }
 }
 
 function getFileNameLowLevel(generalPurposeBitFlag, fileNameBuffer, extraFields, strictFileNames) {
@@ -843,9 +910,11 @@ function decodeBuffer(buffer, isUtf8) {
 }
 
 function readUInt64LE(buffer, offset) {
-  // there is no native function for this, because we can't actually store 64-bit integers precisely.
+  // There is no native function for this, because we can't actually store 64-bit integers precisely.
   // after 53 bits, JavaScript's Number type (IEEE 754 double) can't store individual integers anymore.
   // but since 53 bits is a whole lot more than 32 bits, we do our best anyway.
+  // As of 2020, Node has added support for BigInt, which obviates this whole function,
+  // but yauzl hasn't been updated to depend on BigInt (yet?).
   var lower32 = buffer.readUInt32LE(offset);
   var upper32 = buffer.readUInt32LE(offset + 4);
   // we can't use bitshifting here, because JavaScript bitshifting only works on 32-bit integers.
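
The new comment notes that BigInt would obviate this kind of manual 64-bit arithmetic. As a sketch of what that could look like for the NTFS timestamp conversion in this commit, the following uses Node's `Buffer.prototype.readBigInt64LE`. This is not what yauzl does; `ntfsTimestampToDate` is a hypothetical name, and the code assumes a Node version that ships the BigInt buffer methods.

```javascript
// Hypothetical helper; assumes Buffer has readBigInt64LE/writeBigInt64LE.
function ntfsTimestampToDate(buffer, offset) {
  // Signed 64-bit count of 100-nanosecond intervals since 1601-01-01 00:00:00Z.
  var hundredNanosSince1601 = buffer.readBigInt64LE(offset);
  // BigInt division truncates toward zero, discarding sub-millisecond digits.
  var millisSince1601 = hundredNanosSince1601 / 10000n;
  // 11644473600000 ms separate 1601-01-01 from 1970-01-01.
  var millisSince1970 = millisSince1601 - 11644473600000n;
  // The result fits comfortably in a Number, so the conversion is exact.
  return new Date(Number(millisSince1970));
}

// The POSIX epoch expressed as an NTFS timestamp.
var ntfsUnixEpoch = Buffer.alloc(8);
ntfsUnixEpoch.writeBigInt64LE(116444736000000000n, 0);
// The NTFS epoch itself is all zero bytes.
var ntfsZero = Buffer.alloc(8);

console.log(ntfsTimestampToDate(ntfsUnixEpoch, 0).toISOString()); // 1970-01-01T00:00:00.000Z
console.log(ntfsTimestampToDate(ntfsZero, 0).toISOString());      // 1601-01-01T00:00:00.000Z
```

Unlike the floating-point version in the commit, the intermediate values here stay exact, at the cost of requiring BigInt support.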

test/success/unix-epoch.zip

178 Bytes
Binary file not shown.

test/success/unix-epoch/unix-epoch.txt

Whitespace-only changes.

test/test.js

Lines changed: 20 additions & 6 deletions
@@ -9,12 +9,12 @@ var child_process = require("child_process");
 var Readable = require("stream").Readable;
 var Writable = require("stream").Writable;
 
-// this is the date i made the example zip files and their content files,
-// so this timestamp will be earlier than all the ones stored in these test zip files
-// (and probably all future zip files).
-// no timezone awareness, because that's how MS-DOS rolls.
+// This is a month before I made the example zip files and their content files,
+// so this timestamp will be earlier than all the ones stored in these test zip files unless otherwise noted.
 var earliestTimestamp = new Date(2014, 7, 18, 0, 0, 0, 0);
 
+if (Date.now() / 1000 >= 2147483648) throw new Error("The year is 2038. The Epochalypse is upon us. Signed 32-bit POSIX timestamps have collapsed. TODO: fix.");
+
 var pend = new Pend();
 // 1 thing at a time for better determinism/reproducibility
 pend.max = 1;
@@ -98,8 +98,22 @@ listZipFiles([path.join(__dirname, "success"), path.join(__dirname, "wrong-entry
 if (fileComment !== "") throw new Error(testId + "expected empty fileComment");
 var messagePrefix = testId + fileName + ": ";
 var timestamp = entry.getLastModDate();
-if (timestamp < earliestTimestamp) throw new Error(messagePrefix + "timestamp too early: " + timestamp);
-if (timestamp > new Date()) throw new Error(messagePrefix + "timestamp in the future: " + timestamp);
+if (fileName === "unix-epoch.txt") {
+  if (timestamp.getTime() !== 0) throw new Error(messagePrefix + "expected timestamp to be 0. found: " + timestamp);
+  var dosTimestamp = entry.getLastModDate({forceDosFormat:true});
+  if (dosTimestamp.getTime() === 0) throw new Error(messagePrefix + "DOS timestamp can't encode this time: " + dosTimestamp);
+  var dosTimestampUTC = entry.getLastModDate({forceDosFormat:true, timezone: "UTC"});
+  // This test is only meaningful when the system's local UTC offset is not 0.
+  if (dosTimestamp - dosTimestampUTC !== dosTimestamp.getTimezoneOffset() * 60 * 1000) {
+    throw new Error(messagePrefix +
+      "expected UTC timezone to be different by the current timezone offset. " +
+      "local: " + dosTimestamp + ", UTC: " + dosTimestampUTC
+    );
+  }
+} else {
+  if (timestamp < earliestTimestamp) throw new Error(messagePrefix + "timestamp too early: " + timestamp);
+  if (timestamp > new Date()) throw new Error(messagePrefix + "timestamp in the future: " + timestamp);
+}
 
 var fileNameKey = fileName.replace(/\/$/, "");
 var expectedContents = expectedArchiveContents[fileNameKey];
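
The 2038 guard added near the top of `test/test.js` refers to the limit of the signed 32-bit seconds value used by the InfoZIP "universal timestamp" field. A short sketch of where that range begins and ends, using only built-in `Date` and `Buffer` calls (the variable names are illustrative):

```javascript
var INT32_MAX = 2147483647;
var INT32_MIN = -2147483648;

// The earliest and latest instants a signed 32-bit seconds counter can hold.
var earliest = new Date(INT32_MIN * 1000);
var latest = new Date(INT32_MAX * 1000);
console.log(earliest.toISOString()); // 1901-12-13T20:45:52.000Z
console.log(latest.toISOString());   // 2038-01-19T03:14:07.000Z

// One second later no longer fits: the same 4 bytes read back as a negative number.
var wrapped = Buffer.alloc(4);
wrapped.writeUInt32LE(INT32_MAX + 1, 0);
console.log(wrapped.readInt32LE(0)); // -2147483648
```

This is why the README describes the supported years for this encoding as 1901-2038, only partially inclusive at both ends.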
