Package com.actelion.research.chem.io
Class DWARFileParser
- java.lang.Object
-
- com.actelion.research.chem.io.CompoundFileParser
-
- com.actelion.research.chem.io.DWARFileParser
-
- All Implemented Interfaces:
DescriptorConstants,CompoundTableConstants
public class DWARFileParser extends CompoundFileParser implements DescriptorConstants, CompoundTableConstants
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class):
class DWARFileParser.SpecialField
-
Field Summary
Fields (Modifier and Type, Field):
static int MODE_BUFFER_HEAD_AND_TAIL
static int MODE_COORDINATES_PREFER_2D
static int MODE_COORDINATES_PREFER_3D
static int MODE_COORDINATES_REQUIRE_2D
static int MODE_COORDINATES_REQUIRE_3D
static int MODE_EXTRACT_DETAILS
-
Fields inherited from class com.actelion.research.chem.io.CompoundFileParser
mReader
-
Fields inherited from interface com.actelion.research.chem.io.CompoundTableConstants
cAllowLogModeForNegativeOrZeroValues, cAutoStartMacro, cColumnName, cColumnNameRowList, cColumnProperty, cColumnPropertyBinBase, cColumnPropertyBinIsDate, cColumnPropertyBinIsLog, cColumnPropertyBinSize, cColumnPropertyCommentDepartment, cColumnPropertyCommentUploadStatus, cColumnPropertyCyclicDataMax, cColumnPropertyDataMax, cColumnPropertyDataMin, cColumnPropertyDescriptorVersion, cColumnPropertyDetailCount, cColumnPropertyDetailName, cColumnPropertyDetailSeparator, cColumnPropertyDetailSource, cColumnPropertyDetailType, cColumnPropertyDisplayGroup, cColumnPropertyEnd, cColumnPropertyFormula, cColumnPropertyGroupName, cColumnPropertyImagePath, cColumnPropertyIsClusterNo, cColumnPropertyIsDisplayable, cColumnPropertyIsFragment, cColumnPropertyLaunchAllowMultiple, cColumnPropertyLaunchCommand, cColumnPropertyLaunchCount, cColumnPropertyLaunchDecoration, cColumnPropertyLaunchName, cColumnPropertyLaunchOption, cColumnPropertyLookupCount, cColumnPropertyLookupDetailURL, cColumnPropertyLookupEncode, cColumnPropertyLookupFilter, cColumnPropertyLookupFilterRemoveMinus, cColumnPropertyLookupName, cColumnPropertyLookupURL, cColumnPropertyOpenExternalName, cColumnPropertyOpenExternalPath, cColumnPropertyOrbitType, cColumnPropertyParentColumn, cColumnPropertyReactionPart, cColumnPropertyReferencedColumn, cColumnPropertyReferenceStrengthColumn, cColumnPropertyReferenceType, cColumnPropertyReferenceTypeRedundant, cColumnPropertyReferenceTypeTopDown, cColumnPropertyRelatedCatalystColumn, cColumnPropertyRelatedIdentifierColumn, cColumnPropertySpecialType, cColumnPropertyStart, cColumnPropertySuperpose, cColumnPropertySuperposeAlign, cColumnPropertySuperposeMolecule, cColumnPropertyUseThumbNail, cColumnRelationTypes, cColumnType2DCoordinates, cColumnType3DCoordinates, cColumnTypeAtomColorInfo, cColumnTypeIDCode, cColumnTypeReactionMapping, cColumnTypeReactionObjects, cColumnTypeRXNCode, cColumnUnassignedCode, cColumnUnassignedItemText, cDataDependentPropertiesEnd, 
cDataDependentPropertiesStart, cDataTypeAutomatic, cDataTypeCode, cDataTypeDate, cDataTypeFloat, cDataTypeInteger, cDataTypeString, cDataTypeText, cDefaultDetailSeparator, cDetailDataEnd, cDetailDataStart, cDetailID, cDetailIndexSeparator, cEntrySeparator, cEntrySeparatorBytes, cExtensionNameFileExplanation, cExtensionNameMacroList, cFileExplanationEnd, cFileExplanationStart, cHitlistData, cHitlistDataEnd, cHitlistDataStart, cHitlistName, cLineSeparator, cLineSeparatorByte, cMacroListEnd, cMacroListStart, cMaxDateOrDoubleCategoryCount, cMaxTextCategoryCount, cNativeFileCreated, cNativeFileHeaderEnd, cNativeFileHeaderStart, cNativeFileRowCount, cNativeFileVersion, cParentSpecialColumnTypes, cPropertiesEnd, cPropertiesStart, cRangeNotAvailable, cRangeSeparation, cReactionHiliteModeCode, cReactionHiliteModeNone, cReactionHiliteModeReactionCenter, cReactionHiliteModeText, cReactionPartDelimiter, cReactionPartProducts, cReactionPartReactants, cReactionPartReaction, cStructureHiliteModeCode, cStructureHiliteModeCurrentRow, cStructureHiliteModeFilter, cStructureHiliteModeNone, cStructureHiliteModeText, cSummaryModeCode, cSummaryModeMaximum, cSummaryModeMean, cSummaryModeMedian, cSummaryModeMinimum, cSummaryModeNormal, cSummaryModeSum, cSummaryModeText, cSuperposeAlignValueShape, cSuperposeValueReferenceRow, cTemplateTagName, cTextExclusionTypeContains, cTextExclusionTypeEndsWith, cTextExclusionTypeEquals, cTextExclusionTypeRegEx, cTextExclusionTypeStartsWith, cTextMultipleCategories, cViewConfigTagName, cViewNameEnd, cViewNameStart, NEWLINE_REGEX, NEWLINE_STRING, TAB_STRING
-
Fields inherited from interface com.actelion.research.chem.descriptor.DescriptorConstants
DESCRIPTOR_BINARY_SKELETONSPHERES, DESCRIPTOR_CenteredSkeletonFragments, DESCRIPTOR_EXTENDED_LIST, DESCRIPTOR_FFP512, DESCRIPTOR_Flexophore, DESCRIPTOR_FULL_FRAGMENT_SET, DESCRIPTOR_HashedCFp, DESCRIPTOR_IntegerVector, DESCRIPTOR_LIST, DESCRIPTOR_MAX_COMMON_SUBSTRUCT, DESCRIPTOR_OrganicFunctionalGroups, DESCRIPTOR_PFP512, DESCRIPTOR_PhysicoChemicalProperties, DESCRIPTOR_PTREE, DESCRIPTOR_ReactionFP, DESCRIPTOR_ShapeAlign, DESCRIPTOR_ShapeAlignSingleConf, DESCRIPTOR_SkeletonSpheres, DESCRIPTOR_SUBSTRUCT_QUERY_IN_BASE, DESCRIPTOR_TopoPPHistDist, DESCRIPTOR_TYPE_MOLECULE, DESCRIPTOR_TYPE_REACTION, DESCRIPTOR_TYPE_UNKNOWN
-
-
Constructor Summary
Constructors (Constructor: Description):
DWARFileParser(java.io.File file): Constructs a DWARFileParser from a File with coordinate mode MODE_COORDINATES_PREFER_2D.
DWARFileParser(java.io.File file, int mode): Constructs a DWARFileParser from a File with the specified coordinate mode.
DWARFileParser(java.io.Reader reader): Constructs a DWARFileParser from a Reader with coordinate mode MODE_COORDINATES_PREFER_2D.
DWARFileParser(java.io.Reader reader, int mode): Constructs a DWARFileParser from a Reader with the specified coordinate mode.
DWARFileParser(java.lang.String fileName): Constructs a DWARFileParser from a file name with coordinate mode MODE_COORDINATES_PREFER_2D.
DWARFileParser(java.lang.String fileName, int mode): Constructs a DWARFileParser from a file name with the specified coordinate mode.
-
Method Summary
All Methods / Instance Methods / Concrete Methods (Modifier and Type, Method: Description):
protected boolean advanceToNext(): Don't call this method directly.
int getChildFieldIndex(java.lang.String parentColumnName, java.lang.String childType)
java.util.Properties getColumnProperties(java.lang.String columnName): Returns the original column properties of any source column by column name.
java.lang.String getCoordinates(): Returns encoded atom coordinates according to the defined mode.
java.lang.String getCoordinates2D()
java.lang.String getCoordinates3D()
java.lang.Object getDescriptor(java.lang.String shortName): If the file source contains encoded descriptors, then override this method to save the calculation time.
java.util.HashMap<java.lang.String,byte[]> getDetails(): Provided that the mode contains MODE_EXTRACT_DETAILS, this method returns a map of all embedded detail objects of the DWAR file.
java.lang.String getFieldData(int no): Returns the cell content of the current row.
java.lang.String[] getFieldNames(): Compiles all column names that contain alpha-numerical information.
java.util.ArrayList<java.lang.String> getHeadOrTail(): Provided that the mode contains MODE_BUFFER_HEAD_AND_TAIL, this method returns a list of all header/footer rows of the DWAR file.
java.lang.String getIDCode(): Either this method and getCoordinates(), or getMolecule(), must be overridden.
java.lang.String getIndex()
java.lang.String getMoleculeName()
java.lang.String getRow(): Returns the entire line containing all row data.
int getRowCount(): Depending on the data source, returns the total row count or -1 if unknown.
java.lang.String getSpecialFieldData(int fieldIndex)
int getSpecialFieldIndex(java.lang.String columnName)
java.util.TreeMap<java.lang.String,DWARFileParser.SpecialField> getSpecialFieldMap(): Returns a columnName->SpecialField map of all non-alphanumerical columns.
java.lang.String getStructureCoordinates3DColumnName()
boolean hasStructureCoordinates()
boolean hasStructureCoordinates2D()
boolean hasStructureCoordinates3D()
boolean hasStructures(): If you don't read any records after calling this method, don't forget to call close() to close the underlying file.
-
Methods inherited from class com.actelion.research.chem.io.CompoundFileParser
close, createParser, getDescriptorHandlerFactory, getFieldIndex, getMolecule, isOpen, next, setDescriptorHandlerFactory
-
-
-
-
Field Detail
-
MODE_COORDINATES_PREFER_2D
public static final int MODE_COORDINATES_PREFER_2D
- See Also:
- Constant Field Values
-
MODE_COORDINATES_PREFER_3D
public static final int MODE_COORDINATES_PREFER_3D
- See Also:
- Constant Field Values
-
MODE_COORDINATES_REQUIRE_2D
public static final int MODE_COORDINATES_REQUIRE_2D
- See Also:
- Constant Field Values
-
MODE_COORDINATES_REQUIRE_3D
public static final int MODE_COORDINATES_REQUIRE_3D
- See Also:
- Constant Field Values
-
MODE_BUFFER_HEAD_AND_TAIL
public static final int MODE_BUFFER_HEAD_AND_TAIL
- See Also:
- Constant Field Values
-
MODE_EXTRACT_DETAILS
public static final int MODE_EXTRACT_DETAILS
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
DWARFileParser
public DWARFileParser(java.lang.String fileName)
Constructs a DWARFileParser from a file name with coordinate mode MODE_COORDINATES_PREFER_2D.
- Parameters:
fileName -
-
DWARFileParser
public DWARFileParser(java.io.File file)
Constructs a DWARFileParser from a File with coordinate mode MODE_COORDINATES_PREFER_2D.
- Parameters:
file -
-
DWARFileParser
public DWARFileParser(java.io.Reader reader)
Constructs a DWARFileParser from a Reader with coordinate mode MODE_COORDINATES_PREFER_2D.
- Parameters:
reader -
-
DWARFileParser
public DWARFileParser(java.lang.String fileName, int mode)
Constructs a DWARFileParser from a file name with the specified coordinate mode.
- Parameters:
fileName -
mode - one of the four MODE_COORDINATES_... modes
-
DWARFileParser
public DWARFileParser(java.io.File file, int mode)
Constructs a DWARFileParser from a File with the specified coordinate mode.
- Parameters:
file -
mode - one of the four MODE_COORDINATES_... modes
-
DWARFileParser
public DWARFileParser(java.io.Reader reader, int mode)
Constructs a DWARFileParser from a Reader with the specified coordinate mode.
- Parameters:
reader -
mode - one of the four MODE_COORDINATES_... modes
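The constructors take a single int mode, while getHeadOrTail() and getDetails() speak of the mode "containing" MODE_BUFFER_HEAD_AND_TAIL or MODE_EXTRACT_DETAILS. That wording suggests option flags combined with a coordinate mode in one integer. The following is a minimal sketch of that idea only: the class ModeSketch, the COORDINATE_MASK and all constant values are assumptions made for illustration, since the real values are listed under Constant Field Values and are not reproduced on this page.

```java
/** Illustrative stand-in; the values below are hypothetical, not those of DWARFileParser. */
final class ModeSketch {
    // Hypothetical coordinate modes, stored in the low bits of the mode value.
    static final int PREFER_2D = 1;
    static final int PREFER_3D = 2;
    static final int REQUIRE_2D = 3;
    static final int REQUIRE_3D = 4;
    static final int COORDINATE_MASK = 7; // assumption: low three bits hold the coordinate mode

    // Hypothetical option flags, each occupying its own bit above the mask.
    static final int BUFFER_HEAD_AND_TAIL = 8;
    static final int EXTRACT_DETAILS = 16;

    /** Extracts the coordinate mode from a combined mode value. */
    static int coordinateMode(int mode) {
        return mode & COORDINATE_MASK;
    }

    /** Tests whether a combined mode value contains the given option flag. */
    static boolean contains(int mode, int flag) {
        return (mode & flag) != 0;
    }

    public static void main(String[] args) {
        int mode = PREFER_3D | EXTRACT_DETAILS;
        System.out.println(coordinateMode(mode) == PREFER_3D);     // true
        System.out.println(contains(mode, EXTRACT_DETAILS));       // true
        System.out.println(contains(mode, BUFFER_HEAD_AND_TAIL));  // false
    }
}
```

With the real class, the equivalent call would combine the documented constants by name, for example MODE_COORDINATES_PREFER_3D together with MODE_EXTRACT_DETAILS, provided the library does encode them as combinable bits.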
-
-
Method Detail
-
hasStructures
public boolean hasStructures()
If you don't read any records after calling this method, don't forget to call close() to close the underlying file.
- Returns:
- whether the file contains chemical structures
-
hasStructureCoordinates
public boolean hasStructureCoordinates()
- Returns:
- whether the file contains chemical structures with explicit atom coordinates
-
hasStructureCoordinates2D
public boolean hasStructureCoordinates2D()
- Returns:
- whether the file contains chemical structures with explicit 2D atom coordinates
-
hasStructureCoordinates3D
public boolean hasStructureCoordinates3D()
- Returns:
- whether the file contains chemical structures with explicit 3D atom coordinates
-
getStructureCoordinates3DColumnName
public java.lang.String getStructureCoordinates3DColumnName()
-
getFieldNames
public java.lang.String[] getFieldNames()
Description copied from class: CompoundFileParser
Compiles all column names that contain alpha-numerical information. Columns containing chemistry objects, coordinates or descriptors don't appear in the list.
- Specified by:
getFieldNames in class CompoundFileParser
- Returns:
- column name array in the order of appearance
-
getSpecialFieldIndex
public int getSpecialFieldIndex(java.lang.String columnName)
- Parameters:
columnName -
- Returns:
- field index for special fields, e.g. to be used for getSpecialFieldData()
-
getChildFieldIndex
public int getChildFieldIndex(java.lang.String parentColumnName, java.lang.String childType)
- Parameters:
parentColumnName -
childType -
- Returns:
- field index for special fields, e.g. to be used for getSpecialFieldData()
-
getRowCount
public int getRowCount()
Description copied from class: CompoundFileParser
Depending on the data source, returns the total row count or -1 if unknown.
- Specified by:
getRowCount in class CompoundFileParser
- Returns:
- number of rows or -1
-
getHeadOrTail
public java.util.ArrayList<java.lang.String> getHeadOrTail()
Provided that the mode contains MODE_BUFFER_HEAD_AND_TAIL, this method returns a list of all header/footer rows of the DWAR file. If this method is called before all rows have been read, then the header lines including column properties and the column title line are returned. If this method is called after all rows have been read, then all lines after the data table, i.e. the runtime properties, are returned.
- Returns:
-
getDetails
public java.util.HashMap<java.lang.String,byte[]> getDetails()
Provided that the mode contains MODE_EXTRACT_DETAILS, this method returns a map of all embedded detail objects of the DWAR file. This method must not be called before all rows have been read.
- Returns:
-
getRow
public java.lang.String getRow()
Returns the entire line containing all row data.
- Returns:
-
advanceToNext
protected boolean advanceToNext()
Description copied from class: CompoundFileParser
Don't call this method directly. Use next() instead.
- Specified by:
advanceToNext in class CompoundFileParser
- Returns:
- false if there is no next row
-
getIDCode
public java.lang.String getIDCode()
Description copied from class: CompoundFileParser
Either this method and getCoordinates(), or getMolecule(), must be overridden.
- Overrides:
getIDCode in class CompoundFileParser
- Returns:
- the row content of the first column containing chemical structures
-
getCoordinates
public java.lang.String getCoordinates()
Returns encoded atom coordinates according to the defined mode. If the compound file does not contain atom coordinates, then null is returned. If the mode is one of MODE_COORDINATES_REQUIRE... and the required coordinate dimensionality (2D or 3D) is not available, then null is returned. If the mode is one of MODE_COORDINATES_PREFER... and the preferred coordinate dimensionality (2D or 3D) is not available, then coordinates in the other dimensionality are returned.
- Overrides:
getCoordinates in class CompoundFileParser
- Returns:
- idcoords of first chemical structure column of the current row
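The PREFER/REQUIRE rules described for getCoordinates() can be sketched as a small selection function. This is an illustration of the documented behaviour, not the library's implementation: the class CoordinateSelectionSketch, the enum Mode (standing in for the int MODE_COORDINATES_... constants) and the placeholder strings are all assumptions, and a null argument stands for "no coordinates of that dimensionality in the file".

```java
/** Illustrative stand-in for the documented coordinate selection rules. */
final class CoordinateSelectionSketch {
    /** Hypothetical enum mirroring the four MODE_COORDINATES_... constants. */
    enum Mode { PREFER_2D, PREFER_3D, REQUIRE_2D, REQUIRE_3D }

    /**
     * Returns the coordinates chosen for the given mode, or null if nothing qualifies.
     * REQUIRE modes never fall back; PREFER modes fall back to the other dimensionality.
     */
    static String select(Mode mode, String coords2D, String coords3D) {
        switch (mode) {
            case REQUIRE_2D:
                return coords2D;
            case REQUIRE_3D:
                return coords3D;
            case PREFER_3D:
                return coords3D != null ? coords3D : coords2D;
            case PREFER_2D:
            default:
                return coords2D != null ? coords2D : coords3D;
        }
    }

    public static void main(String[] args) {
        // "c2d" is a placeholder, not a real encoded coordinate string.
        System.out.println(select(Mode.PREFER_3D, "c2d", null));  // falls back to 2D
        System.out.println(select(Mode.REQUIRE_3D, "c2d", null)); // no fallback: null
    }
}
```

Callers that cannot work with the other dimensionality would pick a REQUIRE mode and handle the null result; callers that only need some depiction can use a PREFER mode.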
-
getCoordinates2D
public java.lang.String getCoordinates2D()
-
getCoordinates3D
public java.lang.String getCoordinates3D()
-
getMoleculeName
public java.lang.String getMoleculeName()
- Specified by:
getMoleculeNamein classCompoundFileParser- Returns:
- name/id of (primary) chemical structure of the current row
-
getDescriptor
public java.lang.Object getDescriptor(java.lang.String shortName)
Description copied from class: CompoundFileParser
If the file source contains encoded descriptors, then override this method to save the calculation time.
- Overrides:
getDescriptor in class CompoundFileParser
- Returns:
- descriptor as int[] or whatever the descriptor's binary format is
-
getIndex
public java.lang.String getIndex()
- Returns:
- the String encoded FragFp descriptor of the first column containing chemical structures
-
getFieldData
public java.lang.String getFieldData(int no)
Description copied from class: CompoundFileParser
Returns the cell content of the current row. Multi-line cell entries are separated by a '\n' character.
- Specified by:
getFieldData in class CompoundFileParser
- Parameters:
no - refers to alpha-numerical columns only, as in getFieldNames()
- Returns:
-
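Because multi-line cell entries arrive as one string with '\n' separators, a caller usually splits the value before using it. The helper below is a minimal sketch of that step; the class CellLinesSketch and the method lines are hypothetical, and treating null or an empty string as "no entries" is an assumption, since this page does not say what getFieldData() returns for an empty cell.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative helper for cell values as described for getFieldData(int). */
final class CellLinesSketch {
    /** Splits a cell value into its entries; null or empty input yields an empty list (assumption). */
    static List<String> lines(String cell) {
        if (cell == null || cell.isEmpty()) {
            return Collections.emptyList();
        }
        // A negative limit keeps trailing empty entries instead of silently dropping them.
        return Arrays.asList(cell.split("\n", -1));
    }

    public static void main(String[] args) {
        System.out.println(lines("first entry\nsecond entry").size()); // 2
        System.out.println(lines("").size());                          // 0
    }
}
```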
getSpecialFieldMap
public java.util.TreeMap<java.lang.String,DWARFileParser.SpecialField> getSpecialFieldMap()
Returns a columnName->SpecialField map of all non-alphanumerical columns. SpecialField.type is one of the types defined in CompoundTableConstants (cColumnTypeIDCode, cColumnTypeRXNCode, cColumnType2DCoordinates, cColumnType3DCoordinates, cColumnTypeAtomColorInfo) or a descriptor shortName.
- Returns:
- special fields
-
getSpecialFieldData
public java.lang.String getSpecialFieldData(int fieldIndex)
- Parameters:
fieldIndex - is available from the special-field TreeMap via getSpecialFieldMap().get(columnName).fieldIndex
- Returns:
- String encoded data content of special field, e.g. idcode
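The lookup path described here (column name to SpecialField, then SpecialField.fieldIndex to getSpecialFieldData()) can be sketched with a stand-in type. Everything below is illustrative: the nested SpecialField class only mirrors the two members this page mentions (type and fieldIndex), the type strings are placeholders for the CompoundTableConstants values, and returning -1 for a missing type is an assumption of this sketch rather than documented behaviour.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative lookup over a columnName->SpecialField map. */
final class SpecialFieldLookupSketch {
    /** Stand-in for DWARFileParser.SpecialField, reduced to the members named on this page. */
    static final class SpecialField {
        final String type;
        final int fieldIndex;

        SpecialField(String type, int fieldIndex) {
            this.type = type;
            this.fieldIndex = fieldIndex;
        }
    }

    /** Returns the fieldIndex of the first column (in map order) with the given type, or -1 (assumption). */
    static int firstIndexOfType(Map<String, SpecialField> fields, String type) {
        for (SpecialField field : fields.values()) {
            if (type.equals(field.type)) {
                return field.fieldIndex;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Placeholder column names and type strings, for illustration only.
        TreeMap<String, SpecialField> fields = new TreeMap<>();
        fields.put("Structure", new SpecialField("idcode", 0));
        fields.put("Structure coordinates", new SpecialField("coords2D", 1));

        System.out.println(firstIndexOfType(fields, "idcode"));   // 0
        System.out.println(firstIndexOfType(fields, "coords3D")); // -1
    }
}
```

With the real parser, the index found this way would be passed to getSpecialFieldData(int) for each row.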
-
getColumnProperties
public java.util.Properties getColumnProperties(java.lang.String columnName)
Returns the original column properties of any source column by column name.
- Parameters:
columnName -
- Returns:
-
-