Code archives/File Utilities/JPEG reference parser
This code has been declared by its author to be Public Domain code.
This is mainly intended as a reference for parsing a JPEG file correctly/safely. It includes workarounds for common invalid/unexpected/truncated data, checks all byte reads (I think!) and deliberately parses the entire JPEG file.

Why? I wrote a simple JPEG parser as part of another Code Archives entry, "Retrieve image information without loading entire image", but it just aborts after reading the relevant information, which is fine for what it does. However, on trying to expand on this to read EXIF data recently, although I had some success and now know how to do this, my EXIF reader crashed on files with unexpected data, as EXIF requires you to seek ahead/back, loop around all over the place, etc.

(That's also why this code, unlike the above, will only read local files, since you can't seek back in online streams; in fact, this limitation turns out to be a deliberate part of the TIFF specification, and EXIF uses the TIFF format, within a JPEG file, to store its data! Oh, what fun!)

So, this code tries to read each and every marker/section in a JPEG file properly and should serve as a reasonable basis for anyone trying to read JPEGs (in any language)... and for me to try and write a crash-free EXIF reader.

This is not optimised for speed: skipping scan data is done byte-by-byte and would be better done by reading chunks of the file for proper usage, but this brings in some complications, so it's not done here.

TIP: You can add an "Exit" at the end of each Case statement under "Frame markers (image data)" to just show the width and height information for the main image in each file and ignore the rest of the file.

Tested on 15000+ images! If you find any JPEG files that cause it to spew, let me know... Prepare for disappointment unless you're writing your own JPEG/EXIF reader!

It was interesting to see just how many JPEG files are badly written, and how much wastage there is in many files too. JPEGs can contain thumbnail images (that's why Windows, digital cameras, etc, can display previews so quickly), but my very own camera has produced JPEG photos of 3072 x 2048 @ 24-bit that also contained a 'hidden' secondary image of 1536 x 1024 @ 24-bit. That's not a thumbnail! Don't know what that is! Kind of explains the 2 MB size... no need to worry about storage space on modern SD cards, I guess...

To self: "Right... deep breath... EXIF."
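To illustrate the remark that EXIF stores its data in TIFF format inside the JPEG, here is a minimal sketch (not part of the original listing) that peeks at the start of an APP1 block. It assumes the usual layout of a six-byte "Exif" identifier followed by a TIFF header (byte order "II" or "MM", the magic number 42, then the offset of the first IFD, measured from the start of the TIFF header). The function names `ReadTiffShort`, `ReadTiffInt` and `DescribeExifHeader` are made up for this example. It restores the stream position before returning, which is exactly the kind of seeking back that rules out non-seekable streams.

```blitzmax
' Hypothetical helpers, NOT part of the original listing.
' Values are assembled byte-by-byte so the result does not depend on
' whether the stream was wrapped with BigEndianStream.
Function ReadTiffShort:Int (s:TStream, littleendian:Int)
	Local b0:Int = ReadByte (s)
	Local b1:Int = ReadByte (s)
	If littleendian Then Return b0 | (b1 Shl 8)
	Return (b0 Shl 8) | b1
End Function

Function ReadTiffInt:Int (s:TStream, littleendian:Int)
	Local b0:Int = ReadByte (s)
	Local b1:Int = ReadByte (s)
	Local b2:Int = ReadByte (s)
	Local b3:Int = ReadByte (s)
	If littleendian Then Return b0 | (b1 Shl 8) | (b2 Shl 16) | (b3 Shl 24)
	Return (b0 Shl 24) | (b1 Shl 16) | (b2 Shl 8) | b3
End Function

' Call with the stream positioned on the first data byte of an APP1 block
' (just after the two length bytes). Returns a description and puts the
' stream back where it started.
Function DescribeExifHeader:String (jpeg:TStream, datalength:Int)
	Local payloadstart:Int = StreamPos (jpeg)
	Local info:String = "APP1 block is too short to hold an EXIF header"

	' 6-byte identifier + 8-byte TIFF header = 14 bytes minimum
	If datalength >= 14 And StreamSize (jpeg) - payloadstart >= 14
		info = "APP1 block does not start with the Exif identifier"
		Local id:String = ReadString (jpeg, 4)
		Local pad0:Int = ReadByte (jpeg)
		Local pad1:Int = ReadByte (jpeg)
		If id = "Exif" And pad0 = 0 And pad1 = 0
			Local tiffstart:Int = StreamPos (jpeg) ' All TIFF offsets count from here
			Local order:String = ReadString (jpeg, 2)
			If order = "II" Or order = "MM"
				Local little:Int = (order = "II")
				Local magic:Int = ReadTiffShort (jpeg, little)
				Local ifdoffset:Int = ReadTiffInt (jpeg, little)
				If magic = 42
					info = "EXIF/TIFF header, byte order " + order + ", first IFD at file offset " + (tiffstart + ifdoffset)
				Else
					info = "Bad TIFF magic number"
				EndIf
			Else
				info = "Unknown TIFF byte order"
			EndIf
		EndIf
	EndIf

	SeekStream jpeg, payloadstart ' Seek back so the caller can still skip the block
	Return info
End Function
```

One way to try it would be to add `Print DescribeExifHeader (jpeg, datalength)` just before the `SkipData jpeg, datalength` line in the `Case $E1` section of the listing below.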
' -----------------------------------------------------------------------------
' JPEG reference parser...
' -----------------------------------------------------------------------------
SuperStrict
' -----------------------------------------------------------------------------
' READ THIS! Change to your own image folder for demo...
' -----------------------------------------------------------------------------
' Demo code at bottom includes single-file test...
Local folder$
folder = "H:\Docs\My Pictures\"
' -----------------------------------------------------------------------------
' Utility functions...
' -----------------------------------------------------------------------------
Function StreamRemainder:Int (jpeg:TStream)
Return StreamSize (jpeg) - StreamPos (jpeg)
End Function
Function SkipData (jpeg:TStream, datalength:Int)
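' Never pass a zero or negative count to ReadString (a bogus length field can produce one)...
Local count:Int = Min (datalength, StreamRemainder (jpeg))
If count > 0 Then ReadString jpeg, count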
End Function
Function PrintImageData (jpeg:TStream, datalength:Short)
' Check the frame header holds the six bytes read below and that enough bytes are left in file, abort if not...
If datalength < 6 Or StreamRemainder (jpeg) < datalength
SkipData jpeg, datalength
Return
EndIf
Local bpp:Int = ReadByte (jpeg) ' Sample precision (bits per colour component, not per pixel)
Local height:Int = ReadShort (jpeg) ' Height
Local width:Int = ReadShort (jpeg) ' Width
Local components:Int = ReadByte (jpeg) ' Components per pixel (1 for grayscale, 3 for RGB)
Local depth:Int = bpp * components
Local colors:Int = 2 ^ depth
SkipData jpeg, datalength - 6 ' Skip rest of frame header after reading above six bytes...
Print "Image details: " + width + " x " + height + " @ " + depth + "-bit (" + Int (2 ^ depth) + " colours)"
End Function
' -----------------------------------------------------------------------------
' JPEG parser...
' -----------------------------------------------------------------------------
Function ParseJPEG (f:String)
Print "Info for " + f + " (file size: " + FileSize (f) + " bytes)"
Print ""
Local jpeg:TStream = BigEndianStream (ReadStream (f))
If jpeg And StreamSize (jpeg) > 1 ' Next two bytes are safe!
Try
' Start of Image (SOI) marker ($FFD8) -- MUST BE PRESENT!
If ReadByte (jpeg) = $FF And ReadByte (jpeg) = $D8
Print ""
Print "--------------------------------------------------------------------------------"
Print "Start of Image marker $D8 found at byte offset 0"
Print "--------------------------------------------------------------------------------"
Print "Assuming JPEG file"
Local loop:Int ' For byte seek loops
Local datalength:Int ' Block length store
Local checkff:Byte ' Byte to be tested for $FF (start of block)...
Local marker:Byte ' Block marker code
Local startofblock:Int ' Record marker location
Local startofdata:Int ' Record data location after marker
Local markerinfo:String
' OK, start reading the file...
While Not Eof (jpeg)
' Searching for blocks beginning with $FF, then single byte marker, then data...
' |FFxx|length_of_block|data_data_data...
' |FFxx|length_of_block| is four bytes total, two each...
' ---------------------------------------------------------
' You are here --> |FFxx|length_of_block|data_data_data...
' ---------------------------------------------------------
startofblock = StreamPos (jpeg) ' Tracker for bytes read...
' Looking for FF first...
Repeat
checkff = ReadByte (jpeg) ' Some Photoshop 7 files have a huge string of zeroes directly after block's stated data length
Until (checkff = $FF) Or (Eof (jpeg))
' Used later...
startofdata = 0
datalength = 0
markerinfo = ""
If Not Eof (jpeg) And checkff = $FF
' ... then xx, the byte AFTER the FF block marker, skipping if FF (padding)...
Repeat
marker = ReadByte (jpeg)
Until (marker <> $FF) Or (Eof (jpeg))
' -----------------------------------------------------
' We are now here --> |length_of_block|data_data_data...
' -----------------------------------------------------
' Grab next two bytes (length of block) before proceeding, unless marker is standalone...
Select marker
Case $D0, $D1, $D2, $D3, $D4, $D5, $D6, $D7, $D8, $D9, $0, $FF
' Standalone markers with no following data.
Default
datalength = 0
If StreamRemainder (jpeg) > 1
datalength = ReadShort (jpeg) - 2 ' The 2 subtracted bytes store the length itself...
EndIf
End Select
' -----------------------------------------------------
' Now here --> |data_data_data...
' -----------------------------------------------------
' Record start of data so we can deduce how many bytes are read in each Case afterwards...
startofdata = StreamPos (jpeg)
Select marker
' ------------------------------------------------
' Padding
' ------------------------------------------------
Case $0, $FF
' Ignore these...
' ------------------------------------------------
' Frame decoding table markers
' ------------------------------------------------
Case $C4
markerinfo = "Define Huffman Table"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
Case $CC
markerinfo = "Define Arithmetic Table"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' ------------------------------------------------
' Frame markers (image data)
' ------------------------------------------------
' NB. Printing data for ALL image frames found, but the first one listed is always the main image...
' You can add an "Exit" at the end of each of these cases to only read the main image
' information...
Case $C0, $C1, $C2, $C3
markerinfo = "Start of Frame (non-differential Huffman coding)"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
PrintImageData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
Case $C5, $C6, $C7
markerinfo = "Start of Frame (differential Huffman coding)"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
PrintImageData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
Case $C8, $C9, $CA, $CB
markerinfo = "Start of Frame (non-differential arithmetic coding)"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
PrintImageData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
Case $CD, $CE, $CF
markerinfo = "Start of Frame (differential arithmetic coding)"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
PrintImageData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' ------------------------------------------------
' Restart markers (only used when decoding images)
' ------------------------------------------------
Case $D0, $D1, $D2, $D3, $D4, $D5, $D6, $D7
markerinfo = "Restart"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
Print "No data for this type of marker"
' Standalone marker, no following data...
' ------------------------------------------------
' Start of JPEG data
' ------------------------------------------------
Case $D8
markerinfo = "Start of Image"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' Now going through scan (picture) data and ignoring it because it's bloody complicated...
Local newff:Byte
Local foundmarker:Int
Local startdatascan:Int = StreamPos (jpeg)
Local bytesread:Int
While Not Eof (jpeg) And (Not foundmarker)
If ReadByte (jpeg) = $FF
If Not Eof (jpeg)
' See if it's a new block marker...
newff = ReadByte (jpeg)
If (newff <> 0)
foundmarker = newff
Exit
EndIf
EndIf
EndIf
Wend
If Eof (jpeg)
If foundmarker
Print "~nMarker $" + Right (Hex (foundmarker), 2) + " found at end of file"
Else
Print "~nFile ends with extraneous data"
EndIf
Else
' Go back two bytes if we ran into a marker so that it can be processed in main loop...
If foundmarker Then SeekStream jpeg, StreamPos (jpeg) - 2
EndIf
' ------------------------------------------------
' End of JPEG data
' ------------------------------------------------
Case $D9
markerinfo = "End of Image"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' Now seeking through any data after the End of Image marker (padding, or possibly another embedded image)...
Local newff:Byte
Local foundmarker:Int
Local startdatascan:Int = StreamPos (jpeg)
Local bytesread:Int
While Not Eof (jpeg) And (Not foundmarker)
If ReadByte (jpeg) = $FF
If Not Eof (jpeg)
' See if it's a new block marker...
newff = ReadByte (jpeg)
If (newff <> 0) And (newff <> $FF)
foundmarker = newff
Exit
EndIf
EndIf
EndIf
Wend
If Eof (jpeg)
If foundmarker
Print "~nMarker $" + Right (Hex (foundmarker), 2) + " found at end of file"
Else
Print "~nFile ends with extraneous data"
EndIf
Else
' Go back two bytes if we ran into a marker so that it can be processed in main loop...
If foundmarker Then SeekStream jpeg, StreamPos (jpeg) - 2
EndIf
' ------------------------------------------------
' Image data
' ------------------------------------------------
Case $DA
markerinfo = "Start of Scan"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' Now seeking through scan (picture) data...
Local newff:Byte
Local foundmarker:Int
Local startdatascan:Int = StreamPos (jpeg)
Local bytesread:Int
While Not Eof (jpeg) And (Not foundmarker)
If ReadByte (jpeg) = $FF
If Not Eof (jpeg)
' See if it's a new block marker...
newff = ReadByte (jpeg)
Select newff
' Ignore 0 (means valid FF value in scan data), FF (possible padding data), D0-D7 (restart markers)...
Case $0, $FF, $D0, $D1, $D2, $D3, $D4, $D5, $D6, $D7
' Ignore these and move on...
Default
' Valid marker; break out of the scan data loop so the main loop can process it...
foundmarker = newff
Exit
End Select
EndIf
EndIf
Wend
If Eof (jpeg)
If foundmarker
Print "~nMarker $" + Right (Hex (foundmarker), 2) + " found at end of file"
Else
Print "~nFile ends with extraneous data"
EndIf
Else
' Go back two bytes if we ran into a marker so that it can be processed in main loop...
If foundmarker Then SeekStream jpeg, StreamPos (jpeg) - 2
EndIf
' ------------------------------------------------
' Quantization table, ignored
' ------------------------------------------------
Case $DB
markerinfo = "Define Quantization Table"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' ------------------------------------------------
' Number of lines in scan, ignored
' ------------------------------------------------
Case $DC
markerinfo = "Define Number of Lines"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' ------------------------------------------------
' Restart interval, ignored
' ------------------------------------------------
Case $DD
markerinfo = "Define Restart Interval"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' ------------------------------------------------
' Hierarchical progression, ignored
' ------------------------------------------------
Case $DE
markerinfo = "Define Hierarchical Progression"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' ------------------------------------------------
' Expand reference components, ignored
' ------------------------------------------------
Case $DF
markerinfo = "Expand reference components"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' ------------------------------------------------
' APP0 marker (mainly to state JFIF-compatibility)
' ------------------------------------------------
Case $E0 ' JFIF marker
markerinfo = "JFIF/APP0"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' ------------------------------------------------
' APP1 marker (mainly used for EXIF data)
' ------------------------------------------------
Case $E1 ' EXIF information
markerinfo = "EXIF/APP1"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' ------------------------------------------------
' Application-specific markers
' ------------------------------------------------
Case $E2, $E3, $E4, $E5, $E6, $E7, $E8, $E9, $EA, $EB, $EC, $EE, $EF
markerinfo = "Application-specific"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' ------------------------------------------------
' Application-specific, but usually Photoshop
' ------------------------------------------------
Case $ED
markerinfo = "Photoshop/APP13"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' ------------------------------------------------
' Comment marker
' ------------------------------------------------
Case $FE
markerinfo = "Comment"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
' ------------------------------------------------
' Unknown marker. This shouldn't appear!
' ------------------------------------------------
Default
markerinfo = "UNIMPLEMENTED"
Print ""
Print "--------------------------------------------------------------------------------"
Print markerinfo + " marker $" + Right (Hex (marker), 2) + " found at byte offset " + startofblock
Print "--------------------------------------------------------------------------------"
Print ""
If datalength
Print "Data starts at byte offset " + startofdata + " and is " + datalength + " bytes long"
SkipData jpeg, datalength
Else
Print "No data for this type of marker"
EndIf
End Select
Else
' We reached end of file or read an invalid byte (should be an $FF marker)
' so ignore the rest of the file...
Exit
EndIf
Wend
Else
Print "Not a JPEG file!"
EndIf
Catch ReadFail:Object
Notify "Read error in " + f + "; " + StreamPos (jpeg)
End Try
CloseStream jpeg
Else
Print "File not found, or shorter than two required bytes!"
EndIf
End Function
' -----------------------------------------------------------------------------
' D E M O . . .
' -----------------------------------------------------------------------------
' Uncomment these 4 lines to test a single picture...
'Local img$
'img = "CHANGE ME"
'ParseJPEG img
'End
' -----------------------------------------------------------------------------
' Or name a local folder (sub-folders will be read too)...
' -----------------------------------------------------------------------------
ParseFolder folder
' -----------------------------------------------------------------------------
' Test function to iterate through all sub-folders...
' -----------------------------------------------------------------------------
Global ImageCount:Long
Function ParseFolder (dir:String)
If Right (dir:String, 1) <> "\" And Right (dir:String, 1) <> "/"
dir:String = dir:String + "/"
EndIf
Local folder:Int = ReadDir (dir:String)
If folder
Repeat
Local entry:String = NextFile (folder)
If entry = "" Then Exit
If entry <> "." And entry <> ".."
Local file:String
Local full:String
If FileType (dir + entry) = FILETYPE_FILE
file = entry
full = dir
If Right (full, 1) <> "\" And Right (full, 1) <> "/"
full = full + "\"
EndIf
full = full + file
ImageCount = ImageCount + 1
Print ""
Print "--------------------------------------------------------------------------------"
Print "Reading image number " + ImageCount + "..."
Print "--------------------------------------------------------------------------------"
ParseJPEG full
Else
If FileType (dir + entry) = FILETYPE_DIR
file = entry
If file <> "." And file <> ".."
Local ffolder:String = dir
If Right (ffolder, 1) <> "\" And Right (ffolder, 1) <> "/"
ffolder = ffolder + "\"
EndIf
ffolder = ffolder + file
ParseFolder (ffolder)
EndIf
EndIf
EndIf
EndIf
Forever
EndIf
End Function
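The description mentions that skipping scan data byte-by-byte would be better done by reading chunks, and that this brings complications. The main one is a `$FF` landing on the last byte of one chunk with its marker code at the start of the next. Below is a rough sketch of one way to handle that; it is not the author's method. `SkipScanData` and its `chunksize` parameter are hypothetical names, and the sketch assumes the `Read` method of `TStream` accepts a Byte array as its buffer, which is worth checking against your BlitzMax version. Unlike the loop in the listing, it treats a run of `$FF` fill bytes as still waiting for a marker code.

```blitzmax
' Hypothetical helper, NOT part of the original listing.
' Scans forward through scan data one chunk at a time. Returns the marker
' byte that ends the scan data and leaves the stream positioned on the $FF
' that introduces it, or returns 0 if the file ends first.
Function SkipScanData:Int (jpeg:TStream, chunksize:Int = 4096)
	Local buffer:Byte [] = New Byte [chunksize]
	Local pendingff:Int = False ' True if the previous byte was $FF

	While Not Eof (jpeg)
		Local chunkstart:Int = StreamPos (jpeg)
		Local got:Int = jpeg.Read (buffer, chunksize)
		If got <= 0 Then Exit

		For Local i:Int = 0 Until got
			Local value:Int = buffer [i]
			If pendingff
				If value = $FF
					' Fill byte: keep waiting for the marker code...
				ElseIf value = 0 Or (value >= $D0 And value <= $D7)
					' Stuffed zero or restart marker: still scan data...
					pendingff = False
				Else
					' Real marker: go back to its $FF, which may sit in the previous chunk...
					SeekStream jpeg, chunkstart + i - 1
					Return value
				EndIf
			Else
				pendingff = (value = $FF)
			EndIf
		Next
	Wend

	Return 0
End Function
```

If tried in the `Case $DA` section, the `While`/`Wend` loop would become `foundmarker = SkipScanData (jpeg)`, and the later line that seeks back two bytes would have to go, since the helper has already rewound the stream.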