Package htsjdk.samtools.cram.build
Class ContainerFactory
java.lang.Object
htsjdk.samtools.cram.build.ContainerFactory
Aggregates SAMRecord objects into one or more Containers, each composed of one or more Slices,
based on a set of rules implemented by this class in combination with the parameter values provided via a
CRAMEncodingStrategy object.
The general call pattern is to pass records in one at a time, and process Containers as they are returned:
long containerOffset = initialOffset; // after writing header, etc.
ContainerFactory containerFactory = new ContainerFactory(...);
// retrieve input records and obtain/emit Containers as they are produced by the factory...
while (inputSAM.hasNext()) {
    Container container = containerFactory.getNextContainer(inputSAM.next(), containerOffset);
    if (container != null) {
        containerOffset = writeContainer(container...);
    }
}
// if there is a final Container, retrieve and emit it
Container finalContainer = containerFactory.getFinalContainer(containerOffset);
if (finalContainer != null) {
    containers.add(finalContainer);
}
Multiple slices are only aggregated into a single container if slices/container is > 1, *and* all of the
slices are SINGLE_REFERENCE and have the same (mapped) reference context. MULTI_REFERENCE slices are never
aggregated with other slices into a single container, no matter how many slices/container are requested,
since it can be very inefficient to do so (the spec requires that if any slice in a container is
multiple-reference, all slices in the container must also be MULTI_REFERENCE).
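The aggregation rule above can be sketched as a small predicate. This is a hypothetical model written for illustration, not htsjdk code: the class name, the method name canShareContainer, and its parameters are assumptions. The value -2 is the reference sequence ID that the CRAM specification reserves for multiple-reference slices. Treating every negative ID (including unmapped) as non-aggregatable is an assumption based on reading "(mapped)" literally.

```java
// Hypothetical model of the slice aggregation rule; not part of htsjdk.
final class SliceAggregationSketch {
    // Reference sequence ID reserved by the CRAM specification for multiple-reference slices.
    static final int MULTIPLE_REFERENCE_ID = -2;

    /**
     * Returns true if slices with the given reference context IDs could be placed
     * together in one container under the rule described in the text.
     */
    static boolean canShareContainer(final int slicesPerContainer, final int... sliceContextIDs) {
        if (sliceContextIDs.length == 0 || sliceContextIDs.length > slicesPerContainer) {
            return false;
        }
        if (sliceContextIDs.length == 1) {
            return true; // a lone slice of any kind fits in its own container
        }
        final int first = sliceContextIDs[0];
        if (first < 0) {
            // Assumption: negative IDs (unmapped or multi-reference) are never aggregated.
            return false;
        }
        for (final int id : sliceContextIDs) {
            if (id != first) {
                return false; // different reference contexts cannot share a container
            }
        }
        return true;
    }

    public static void main(final String[] args) {
        System.out.println(canShareContainer(2, 3, 3));
        System.out.println(canShareContainer(2, 3, 4));
        System.out.println(canShareContainer(2, MULTIPLE_REFERENCE_ID, MULTIPLE_REFERENCE_ID));
    }
}
```

In this sketch, two slices mapped to reference 3 may share a container when two slices per container are requested, while slices on different references, or any pair of multiple-reference slices, may not.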
For coordinate sorted inputs, a MULTI_REFERENCE slice is only created when there are not enough reads mapped
to a single reference sequence to reach the MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD. This usually only happens
near the end of the reads mapped to a given sequence. When that happens, a small MULTI_REFERENCE slice is
started with the remaining reads mapped to the previous sequence; subsequent records are then accumulated until
MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD is hit, and the resulting MULTI_REFERENCE slice is emitted into
its own container.
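To make the coordinate-sorted behavior concrete, the following is a simplified model of how slice boundaries might fall around a reference change. It is a sketch under stated assumptions, not the htsjdk implementation: the class, the partition method, and the readsPerSlice and minSingleRefThreshold parameters are hypothetical names, and unmapped reads and container assembly are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of slice boundaries for coordinate-sorted input; not part of htsjdk.
final class SliceBoundarySketch {
    // Reference sequence ID reserved by the CRAM specification for multiple-reference slices.
    static final int MULTIPLE_REFERENCE_ID = -2;

    /** Returns one "context:recordCount" entry per slice, in emission order. */
    static List<String> partition(final int[] referenceIndexes,
                                  final int readsPerSlice,
                                  final int minSingleRefThreshold) {
        final List<String> slices = new ArrayList<>();
        int context = 0; // reference context of the slice being accumulated
        int count = 0;   // records accumulated in the current slice
        for (final int refIndex : referenceIndexes) {
            if (count > 0) {
                final boolean close;
                if (context == MULTIPLE_REFERENCE_ID) {
                    // A multi-ref slice is closed once it reaches the threshold.
                    close = count >= minSingleRefThreshold;
                } else if (refIndex == context) {
                    // Same reference: close only when the slice is full.
                    close = count >= readsPerSlice;
                } else {
                    // Reference change: a slice that is too small becomes multi-ref.
                    close = count >= minSingleRefThreshold;
                    if (!close) {
                        context = MULTIPLE_REFERENCE_ID;
                    }
                }
                if (close) {
                    slices.add(context + ":" + count);
                    count = 0;
                }
            }
            if (count == 0) {
                context = refIndex; // start a new slice on this record's reference
            }
            count++;
        }
        if (count > 0) {
            slices.add(context + ":" + count); // final, possibly short, slice
        }
        return slices;
    }

    public static void main(final String[] args) {
        final int[] refs = {0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1};
        System.out.println(partition(refs, 4, 3)); // prints [0:4, -2:3, 1:4, 1:1]
    }
}
```

With six reads on reference 0 followed by six on reference 1, four reads per slice, and a threshold of three, the two leftover reads on reference 0 are combined with the first read on reference 1 into one small multiple-reference slice, after which single-reference slices resume.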
Constructor Summary
Constructors
Constructor | Description
ContainerFactory(SAMFileHeader samFileHeader, CRAMEncodingStrategy encodingStrategy, CRAMReferenceSource referenceSource) |
Method Summary
Modifier and Type | Method | Description
Container | getFinalContainer(long containerByteOffset) | Obtain a Container from any remaining accumulated SAMRecords, if any.
final Container | getNextContainer(SAMRecord samRecord, long containerByteOffset) |
boolean | shouldEmitContainer(int currentReferenceContextID, int nextRecordIndex, int numberOfSliceEntries) | Determine if a Container should be emitted based on the current reference context and the reference context for the next record to be processed, and the encoding strategy parameters.
Constructor Details
ContainerFactory
public ContainerFactory(SAMFileHeader samFileHeader, CRAMEncodingStrategy encodingStrategy, CRAMReferenceSource referenceSource)
Parameters:
samFileHeader - the SAMFileHeader (used to determine sort order and resolve read groups)
encodingStrategy - the CRAMEncodingStrategy parameters to use
referenceSource - the CRAMReferenceSource to use for containers created by this factory
Method Details
getNextContainer
getFinalContainer
Obtain a Container from any remaining accumulated SAMRecords, if any.
shouldEmitContainer
public boolean shouldEmitContainer(int currentReferenceContextID, int nextRecordIndex, int numberOfSliceEntries)
Determine if a Container should be emitted based on the current reference context, the reference context for the next record to be processed, and the encoding strategy parameters. A container is emitted if:
- the requested number of slices per container has been reached, or
- a multi-reference slice has been accumulated (a multi-ref slice is always emitted into its own container as soon as it is generated, since we don't want to confer multi-ref-ness on the next slice, which might otherwise be single-ref), or
- we haven't reached the requested number of slices, but we're changing reference contexts and we don't want to create a MULTI_REFERENCE container out of two or more SINGLE_REFERENCE slices with different contexts, since by the spec we'd be forced to call that container MULTI_REFERENCE, and thus the slices would have to be multi-ref. In that case a single-reference container is emitted instead.
Parameters:
currentReferenceContextID -
nextRecordIndex -
numberOfSliceEntries -
Returns:
true if a Container should be emitted, otherwise false
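The three conditions listed for shouldEmitContainer can be sketched as a standalone function. This is a hypothetical model, not the htsjdk method body: the class name is invented, slicesPerContainer is passed explicitly here (in the real class it would come from the CRAMEncodingStrategy), and the sketch assumes it is only called once at least one slice has been accumulated. The value -2 is the reference sequence ID the CRAM specification reserves for multiple-reference slices.

```java
// Hypothetical model of the container emission decision; not part of htsjdk.
final class EmitDecisionSketch {
    // Reference sequence ID reserved by the CRAM specification for multiple-reference slices.
    static final int MULTIPLE_REFERENCE_ID = -2;

    static boolean shouldEmitContainer(final int currentReferenceContextID,
                                       final int nextRecordIndex,
                                       final int numberOfSliceEntries,
                                       final int slicesPerContainer) { // assumption: explicit parameter
        // Condition 1: the requested number of slices per container has been reached.
        if (numberOfSliceEntries >= slicesPerContainer) {
            return true;
        }
        // Condition 2: a multi-reference slice always goes into its own container.
        if (currentReferenceContextID == MULTIPLE_REFERENCE_ID) {
            return true;
        }
        // Condition 3: the next record changes the reference context, so emit now rather
        // than mix single-reference slices with different contexts in one container.
        return currentReferenceContextID != nextRecordIndex;
    }

    public static void main(final String[] args) {
        System.out.println(shouldEmitContainer(0, 0, 1, 2));  // false: keep accumulating
        System.out.println(shouldEmitContainer(0, 1, 1, 2));  // true: reference change
        System.out.println(shouldEmitContainer(MULTIPLE_REFERENCE_ID, 1, 1, 2)); // true: multi-ref
    }
}
```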