Skip to content
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
78568e5
Begin implementation of ByteArrayBytecodeGenerator
austnwil Oct 17, 2025
b921269
Implement readBigIntegerReference
austnwil Oct 17, 2025
c56816b
Implement readShortTimestampReference
austnwil Oct 20, 2025
b5b0c29
Rename UTF8 decoder
austnwil Oct 20, 2025
d60554c
Add tests; fix bug in readBytesReference
austnwil Oct 22, 2025
115c3f0
Add comments
austnwil Oct 22, 2025
7de8637
Add test for bytecode generator; refactor opcode handler test cases
austnwil Oct 23, 2025
6c6e377
Fix broken refill logic; adjust comments
austnwil Oct 23, 2025
b1ae1f4
Minor fixes
austnwil Oct 23, 2025
bdda6f3
Remove unnecessary test cases causing OOM errors
austnwil Oct 23, 2025
bcdf133
Apply suggestions from code review
austnwil Oct 27, 2025
5e1983a
Implement more PR suggestions
austnwil Oct 27, 2025
b95ea51
PR suggestion: Revert refactoring of opcode handler test cases
austnwil Oct 29, 2025
2cac7a2
Don't emit I_REFILL in generator
austnwil Oct 29, 2025
39294f3
Add generator tests
austnwil Oct 29, 2025
a6960eb
Replace UTF-8 test string with known source
austnwil Oct 30, 2025
6f571dc
Apply suggestion from @popematt
austnwil Oct 30, 2025
50045b7
Apply suggestion from @popematt
austnwil Oct 30, 2025
ad746ca
Apply suggestion from @popematt
austnwil Oct 30, 2025
785e1dd
Fix broken bytecode generator test
austnwil Oct 30, 2025
0ad98d6
Add various test cases to timestamp decoder
austnwil Oct 30, 2025
558a2b6
Fix bug in day-precision timestamp decoder when day has high bit set
austnwil Oct 30, 2025
ec2a964
Merge branch 'ion11' into austnwil/ion11-bytearray-bytecode-generator
austnwil Oct 30, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions src/main/java/com/amazon/ion/bytecode/BytecodeEmitter.kt
Original file line number Diff line number Diff line change
Expand Up @@ -87,4 +87,14 @@ internal object BytecodeEmitter {
fun emitShortTimestampReference(destination: BytecodeBuffer, precisionAndOffsetMode: Int, dataPosition: Int) {
destination.add2(Instructions.I_SHORT_TIMESTAMP_REF.packInstructionData(precisionAndOffsetMode), dataPosition)
}

@JvmStatic
fun emitRefill(destination: BytecodeBuffer) {
destination.add(Instructions.I_REFILL)
}

@JvmStatic
fun emitEndOfInput(destination: BytecodeBuffer) {
destination.add(Instructions.I_END_OF_INPUT)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker.

We don't need to have helper functions for everything. In this case, it's worth asking what is this actually helping with? Some of the other helper functions in the class handle things like spreading the bits for a long value or converting from float to the int bits.

If you compare these two, I think the version with BytecodeHelper doesn't add value, and maybe even obscures what's going on.

destination.add(Instructions.I_REFILL)

BytecodeHelper.emitRefill(destination)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

}
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,99 @@
// SPDX-License-Identifier: Apache-2.0
package com.amazon.ion.bytecode.bin11

import com.amazon.ion.Decimal
import com.amazon.ion.IonException
import com.amazon.ion.Timestamp
import com.amazon.ion.bytecode.BytecodeEmitter
import com.amazon.ion.bytecode.BytecodeGenerator
import com.amazon.ion.bytecode.bin11.bytearray.OpcodeHandlerTable
import com.amazon.ion.bytecode.bin11.bytearray.PrimitiveDecoder.readFixedIntAsBigInteger
import com.amazon.ion.bytecode.bin11.bytearray.ShortTimestampDecoder
import com.amazon.ion.bytecode.util.AppendableConstantPoolView
import com.amazon.ion.bytecode.util.ByteSlice
import com.amazon.ion.bytecode.util.BytecodeBuffer
import com.amazon.ion.bytecode.util.unsignedToInt
import com.amazon.ion.impl.bin.utf8.Utf8StringDecoder
import com.amazon.ion.impl.bin.utf8.Utf8StringDecoderPool
import edu.umd.cs.findbugs.annotations.SuppressFBWarnings
import java.math.BigInteger
import java.nio.ByteBuffer

@SuppressFBWarnings("EI_EXPOSE_REP2", justification = "constructor does not make a defensive copy of source as a performance optimization")
internal class ByteArrayBytecodeGenerator11
@SuppressFBWarnings("URF_UNREAD_FIELD", justification = "field will be read once this class is implemented")
constructor(
internal class ByteArrayBytecodeGenerator11(
private val source: ByteArray,
private var i: Int,
) {
// TODO: This should implement BytecodeGenerator
private var currentPosition: Int,
) : BytecodeGenerator {
private val utf8Decoder: Utf8StringDecoder = Utf8StringDecoderPool.getInstance().orCreate

override fun refill(
destination: BytecodeBuffer,
constantPool: AppendableConstantPoolView,
macroSrc: IntArray,
macroIndices: IntArray,
symTab: Array<String?>
) {
var opcode = 0
Comment thread
austnwil marked this conversation as resolved.
Outdated
while (currentPosition < source.size && !isSystemValue(opcode)) {
opcode = source[currentPosition++].unsignedToInt()
val handler = OpcodeHandlerTable.handler(opcode)
currentPosition += handler.convertOpcodeToBytecode(
opcode,
source,
currentPosition,
destination,
constantPool,
macroSrc,
macroIndices,
symTab
)
}

if (currentPosition < source.size) {
BytecodeEmitter.emitRefill(destination)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attempted to work on writing tests for a different BytecodeGenerator implementation, and I found that it added a lot of noise to have the REFILL instruction added by the BytecodeGenerator. It's also something that's the same regardless of the implementation, and it's a concern of the reader rather than being part of the data, so I think it makes more sense for the BytecodeIonReader to handle adding in the REFILL instruction.

} else {
BytecodeEmitter.emitEndOfInput(destination)
}
}

override fun readBigIntegerReference(position: Int, length: Int): BigInteger {
return readFixedIntAsBigInteger(source, position, length)
}

override fun readDecimalReference(position: Int, length: Int): Decimal {
TODO("Not yet implemented")
}

override fun readShortTimestampReference(position: Int, opcode: Int): Timestamp {
return ShortTimestampDecoder.readTimestamp(source, position, opcode)
}

override fun readTimestampReference(position: Int, length: Int): Timestamp {
TODO("Not yet implemented")
}

override fun readTextReference(position: Int, length: Int): String {
val buffer = ByteBuffer.wrap(source, position, length)
return utf8Decoder.decode(buffer, length)
}

override fun readBytesReference(position: Int, length: Int): ByteSlice {
return ByteSlice(source, position, position + length)
}

override fun ionMinorVersion(): Int {
return 1
}
Comment thread
austnwil marked this conversation as resolved.
Outdated

override fun getGeneratorForMinorVersion(minorVersion: Int): BytecodeGenerator {
return when (minorVersion) {
1 -> ByteArrayBytecodeGenerator11(source, currentPosition)
// TODO: update with ByteArrayBytecodeGenerator10 once it implements BytecodeGenerator
else -> throw IonException("Minor version $minorVersion not yet implemented for ByteArray-backed data sources.")
}
}

private fun isSystemValue(opcode: Int): Boolean {
return opcode in 0xE0..0xE8
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,17 @@ internal object PrimitiveDecoder {
}
}

@JvmStatic
fun readFixedIntAsBigInteger(source: ByteArray, start: Int, length: Int): BigInteger {
// TODO: ion-java#1114
if (source.size < start + length) throw IonException("Incomplete data: start=$start, length=$length, limit=${source.size}")
val bytes = ByteArray(length)
for (i in 0 until length) {
bytes[i] = source[start + length - i - 1]
}
return BigInteger(bytes)
}

@JvmStatic
fun readFixedUInt16(source: ByteArray, position: Int): UShort {
// TODO: ion-java#1114
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,221 @@
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
package com.amazon.ion.bytecode.bin11.bytearray

import com.amazon.ion.Timestamp
import com.amazon.ion.bytecode.bin11.bytearray.PrimitiveDecoder.readFixedInt16
import com.amazon.ion.bytecode.bin11.bytearray.PrimitiveDecoder.readFixedInt32
import com.amazon.ion.bytecode.bin11.bytearray.PrimitiveDecoder.readFixedInt8AsShort
import com.amazon.ion.bytecode.bin11.bytearray.PrimitiveDecoder.readFixedIntAsLong
import java.math.BigDecimal

/**
* Helper class for decoding the various short timestamp encoding variants from a [ByteArray].
Comment thread
austnwil marked this conversation as resolved.
Outdated
*/
internal object ShortTimestampDecoder {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just TimestampDecoder? There's not really a need to have separate short and long timestamp decoder classes.

private const val MASK_4 = 0b1111
private const val MASK_5 = 0b11111
private const val MASK_6 = 0b111111
private const val MASK_7 = 0b1111111
private const val MASK_4L = 0b1111L
private const val MASK_5L = 0b11111L
private const val MASK_6L = 0b111111L
private const val MASK_7L = 0b1111111L
private const val MASK_10L = 0b1111111111L
private const val MASK_20L = 0b11111111111111111111L
private const val MASK_30L = 0b111111111111111111111111111111L
private const val MASK_UTC_OR_UNKNOWN_BIT = 0b1000_00000000_00000000_00000000
private const val MASK_UTC_OR_UNKNOWN_BITL = 0b1000_00000000_00000000_00000000L

private val opcodeToDecoderFunctionTable = arrayOf(
ShortTimestampDecoder::readTimestampToYear,
ShortTimestampDecoder::readTimestampToMonth,
ShortTimestampDecoder::readTimestampToDay,
ShortTimestampDecoder::readTimestampToMinuteUTCOrUnknown,
ShortTimestampDecoder::readTimestampToSecondUTCOrUnknown,
ShortTimestampDecoder::readTimestampToMillisecondUTCOrUnknown,
ShortTimestampDecoder::readTimestampToMicrosecondUTCOrUnknown,
ShortTimestampDecoder::readTimestampToNanosecondUTCOrUnknown,
ShortTimestampDecoder::readTimestampToMinuteWithOffset,
ShortTimestampDecoder::readTimestampToSecondWithOffset,
ShortTimestampDecoder::readTimestampToMillisecondWithOffset,
ShortTimestampDecoder::readTimestampToMicrosecondWithOffset,
ShortTimestampDecoder::readTimestampToNanosecondWithOffset,
)

fun readTimestampToYear(source: ByteArray, position: Int): Timestamp {
val year = readFixedInt8AsShort(source, position).toInt()
return Timestamp.forYear(year + 1970)
}

fun readTimestampToMonth(source: ByteArray, position: Int): Timestamp {
val yearAndMonth = readFixedInt16(source, position).toInt()
val year = yearAndMonth.and(MASK_7)
val month = yearAndMonth.shr(7)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, com.amazon.ion.impl.bin.Ion_1_1_Constants has constants for all of these offsets so that we don't need to be using "magic" numbers here.


return Timestamp.forMonth(year + 1970, month)
}

fun readTimestampToDay(source: ByteArray, position: Int): Timestamp {
val yearMonthAndDay = readFixedInt16(source, position).toInt()
val year = yearMonthAndDay.and(MASK_7)
val month = yearMonthAndDay.shr(7).and(MASK_4)
val day = yearMonthAndDay.shr(11)

return Timestamp.forDay(year + 1970, month, day)
}

fun readTimestampToMinuteUTCOrUnknown(source: ByteArray, position: Int): Timestamp {
val data = readFixedInt32(source, position)
val year = data.and(MASK_7)
val month = data.shr(7).and(MASK_4)
val day = data.shr(11).and(MASK_5)
val hour = data.shr(16).and(MASK_5)
val minute = data.shr(21).and(MASK_6)
val isUTC = data.and(MASK_UTC_OR_UNKNOWN_BIT) != 0

return Timestamp.forMinute(year + 1970, month, day, hour, minute, if (isUTC) 0 else null)
}

fun readTimestampToSecondUTCOrUnknown(source: ByteArray, position: Int): Timestamp {
val data = readFixedIntAsLong(source, position, 5)
val year = data.and(MASK_7L).toInt()
val month = data.shr(7).and(MASK_4L).toInt()
val day = data.shr(11).and(MASK_5L).toInt()
val hour = data.shr(16).and(MASK_5L).toInt()
val minute = data.shr(21).and(MASK_6L).toInt()
val second = data.shr(28).and(MASK_6L).toInt()
val isUTC = data.and(MASK_UTC_OR_UNKNOWN_BITL) != 0L

return Timestamp.forSecond(year + 1970, month, day, hour, minute, second, if (isUTC) 0 else null)
}

fun readTimestampToMillisecondUTCOrUnknown(source: ByteArray, position: Int): Timestamp {
val data = readFixedIntAsLong(source, position, 6)
val year = data.and(MASK_7L).toInt()
val month = data.shr(7).and(MASK_4L).toInt()
val day = data.shr(11).and(MASK_5L).toInt()
val hour = data.shr(16).and(MASK_5L).toInt()
val minute = data.shr(21).and(MASK_6L).toInt()
val second = data.shr(28).and(MASK_6L)
val fractionalSecond = data.shr(34).and(MASK_10L)
val isUTC = data.and(MASK_UTC_OR_UNKNOWN_BITL) != 0L

val secondBigDecimal = BigDecimal.valueOf(second)
val fractionalSecondBigDecimal = BigDecimal.valueOf(fractionalSecond, 3)
return Timestamp.forSecond(year + 1970, month, day, hour, minute, secondBigDecimal.add(fractionalSecondBigDecimal), if (isUTC) 0 else null)
}

fun readTimestampToMicrosecondUTCOrUnknown(source: ByteArray, position: Int): Timestamp {
val data = readFixedIntAsLong(source, position, 7)
val year = data.and(MASK_7L).toInt()
val month = data.shr(7).and(MASK_4L).toInt()
val day = data.shr(11).and(MASK_5L).toInt()
val hour = data.shr(16).and(MASK_5L).toInt()
val minute = data.shr(21).and(MASK_6L).toInt()
val second = data.shr(28).and(MASK_6L)
val fractionalSecond = data.shr(34).and(MASK_20L)
val isUTC = data.and(MASK_UTC_OR_UNKNOWN_BITL) != 0L

val secondBigDecimal = BigDecimal.valueOf(second)
val fractionalSecondBigDecimal = BigDecimal.valueOf(fractionalSecond, 6)
return Timestamp.forSecond(year + 1970, month, day, hour, minute, secondBigDecimal.add(fractionalSecondBigDecimal), if (isUTC) 0 else null)
}

fun readTimestampToNanosecondUTCOrUnknown(source: ByteArray, position: Int): Timestamp {
val data = readFixedIntAsLong(source, position, 8)
val year = data.and(MASK_7L).toInt()
val month = data.shr(7).and(MASK_4L).toInt()
val day = data.shr(11).and(MASK_5L).toInt()
val hour = data.shr(16).and(MASK_5L).toInt()
val minute = data.shr(21).and(MASK_6L).toInt()
val second = data.shr(28).and(MASK_6L)
val fractionalSecond = data.ushr(34).and(MASK_30L)
val isUTC = data.and(MASK_UTC_OR_UNKNOWN_BITL) != 0L

val secondBigDecimal = BigDecimal.valueOf(second)
val fractionalSecondBigDecimal = BigDecimal.valueOf(fractionalSecond, 9)
return Timestamp.forSecond(year + 1970, month, day, hour, minute, secondBigDecimal.add(fractionalSecondBigDecimal), if (isUTC) 0 else null)
}

fun readTimestampToMinuteWithOffset(source: ByteArray, position: Int): Timestamp {
val data = readFixedIntAsLong(source, position, 5)
val year = data.and(MASK_7L).toInt()
val month = data.shr(7).and(MASK_4L).toInt()
val day = data.shr(11).and(MASK_5L).toInt()
val hour = data.shr(16).and(MASK_5L).toInt()
val minute = data.shr(21).and(MASK_6L).toInt()
val offset = data.shr(27).and(MASK_7L).toInt()

return Timestamp.forMinute(year + 1970, month, day, hour, minute, (offset - 56) * 15)
}

fun readTimestampToSecondWithOffset(source: ByteArray, position: Int): Timestamp {
val data = readFixedIntAsLong(source, position, 5)
val year = data.and(MASK_7L).toInt()
val month = data.shr(7).and(MASK_4L).toInt()
val day = data.shr(11).and(MASK_5L).toInt()
val hour = data.shr(16).and(MASK_5L).toInt()
val minute = data.shr(21).and(MASK_6L).toInt()
val offset = data.shr(27).and(MASK_7L).toInt()
val second = data.shr(34).and(MASK_6L).toInt()

return Timestamp.forSecond(year + 1970, month, day, hour, minute, second, (offset - 56) * 15)
}

fun readTimestampToMillisecondWithOffset(source: ByteArray, position: Int): Timestamp {
val data = readFixedIntAsLong(source, position, 7)
val year = data.and(MASK_7L).toInt()
val month = data.shr(7).and(MASK_4L).toInt()
val day = data.shr(11).and(MASK_5L).toInt()
val hour = data.shr(16).and(MASK_5L).toInt()
val minute = data.shr(21).and(MASK_6L).toInt()
val offset = data.shr(27).and(MASK_7L).toInt()
val second = data.shr(34).and(MASK_6L)
val fractionalSecond = data.shr(40).and(MASK_10L)

val secondBigDecimal = BigDecimal.valueOf(second)
val fractionalSecondBigDecimal = BigDecimal.valueOf(fractionalSecond, 3)
return Timestamp.forSecond(year + 1970, month, day, hour, minute, secondBigDecimal.add(fractionalSecondBigDecimal), (offset - 56) * 15)
}

fun readTimestampToMicrosecondWithOffset(source: ByteArray, position: Int): Timestamp {
val data = readFixedIntAsLong(source, position, 8)
val year = data.and(MASK_7L).toInt()
val month = data.shr(7).and(MASK_4L).toInt()
val day = data.shr(11).and(MASK_5L).toInt()
val hour = data.shr(16).and(MASK_5L).toInt()
val minute = data.shr(21).and(MASK_6L).toInt()
val offset = data.shr(27).and(MASK_7L).toInt()
val second = data.shr(34).and(MASK_6L)
val fractionalSecond = data.shr(40).and(MASK_20L)

val secondBigDecimal = BigDecimal.valueOf(second)
val fractionalSecondBigDecimal = BigDecimal.valueOf(fractionalSecond, 6)
return Timestamp.forSecond(year + 1970, month, day, hour, minute, secondBigDecimal.add(fractionalSecondBigDecimal), (offset - 56) * 15)
}

fun readTimestampToNanosecondWithOffset(source: ByteArray, position: Int): Timestamp {
val data = readFixedIntAsLong(source, position, 8)
val highFractionalSecondByte = readFixedInt8AsShort(source, position + 8).toLong().and(MASK_6L)
val year = data.and(MASK_7L).toInt()
val month = data.shr(7).and(MASK_4L).toInt()
val day = data.shr(11).and(MASK_5L).toInt()
val hour = data.shr(16).and(MASK_5L).toInt()
val minute = data.shr(21).and(MASK_6L).toInt()
val offset = data.shr(27).and(MASK_7L).toInt()
val second = data.shr(34).and(MASK_6L)
val fractionalSecond = data.ushr(40).or(highFractionalSecondByte.shl(24))

val secondBigDecimal = BigDecimal.valueOf(second)
val fractionalSecondBigDecimal = BigDecimal.valueOf(fractionalSecond, 9)
return Timestamp.forSecond(year + 1970, month, day, hour, minute, secondBigDecimal.add(fractionalSecondBigDecimal), (offset - 56) * 15)
}

fun readTimestamp(source: ByteArray, position: Int, precisionAndOffsetMode: Int): Timestamp {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is precisionAndOffsetMode the opcode?

Copy link
Copy Markdown
Contributor Author

@austnwil austnwil Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just the low nibble. Looks like the bytecode reference does call for the entire opcode, so I made that change

// TODO: calling function references like this might be slower than just using a conditional or other solutions.
// Might be worth looking into.
val decoder = opcodeToDecoderFunctionTable[precisionAndOffsetMode]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it will be. A when on the opcode is probably going to be faster because it will be compiled as a tableswitch instruction in the java bytecode instead of having to load the array, and then load a value from the array (both of which require fetching objects from the heap), and then the JVM potentially still has to perform a vtable lookup to find the correct method implementation for the interface.

The extra indirection here also limits the JVM's ability to perform method inlining. For example, if an application only reads data that uses 0x85 opcode, the JVM could inline that particular method into this method, eliminating the overhead of the function call. However, when you're using method references like this, hinders that ability.

FYI, I think I wrote a note about that here: https://github.com/popematt/ion-java/blob/6f5274ecc5e5812f08884a6d1aa1c4d7546861fe/src/main/java/com/amazon/ion/v8/TimestampHelper.kt#L34-L38

So why are use using an array of function references for the opcode lookup? The main reasons are that (1) the opcode handler table is already going to be hard for the JVM to predict anyway because there's going to be lots of different opcodes in normal Ion data, (2) the when would make the method very long and hard to read, and (3) the compiled method is going to be quite large and unable to be inlined at its own call sites.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great explanation!

return decoder(source, position)
}
}
Loading
Loading