What is SegyTool?


SegyTool is a full-featured interactive viewer and editor for SEGY format files.
It is written in Java and will run on any platform that supports the Java Runtime Environment.

Features of SegyTool include


    A brief history of formats


Data can be stored in a computer in various formats. The SEGY format itself can contain
several formats within the same file depending on what data is being represented.

A reminder:
There are 8 bits in a byte regardless of the computer.
1 Kilobyte is 1024 bytes, or 2**10.
1 Megabyte is 1,048,576 bytes, or 1024 kilobytes, or 2**20.
1 Gigabyte is 1,073,741,824 bytes, or 1024 megabytes, or 2**30.

The formats most likely to be encountered are:

EBCDIC Stands for "Extended Binary Coded Decimal Interchange Code". A format for representing text data that was in use at the time the SEGY standard was developed. It has been largely supplanted by ASCII (see below). Text data in SEGY files is still written in EBCDIC for reasons of backward compatibility, although some PC-based systems use ASCII.
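As an illustration of the difference between the two text formats, the following is a minimal Python sketch that converts EBCDIC bytes to readable text. It assumes the common EBCDIC code page 037 (Python codec `cp037`); some files may have been written with a different EBCDIC variant. The function names are hypothetical, and the first-byte test is only a heuristic based on the convention that SEGY text header lines begin with the letter "C".

```python
# Assumption: the text was written with EBCDIC code page 037.
EBCDIC_CODEC = "cp037"


def ebcdic_to_text(raw: bytes) -> str:
    """Decode EBCDIC bytes into an ordinary Python string."""
    return raw.decode(EBCDIC_CODEC)


def guess_text_encoding(raw: bytes) -> str:
    """Heuristic only: look at the first byte of the text.

    The letter "C" is 0xC3 in EBCDIC and 0x43 in ASCII.
    """
    if raw[:1] == b"\xC3":
        return EBCDIC_CODEC
    if raw[:1] == b"\x43":
        return "ascii"
    return "unknown"


sample = "C 1 CLIENT".encode(EBCDIC_CODEC)   # build some EBCDIC test bytes
print(guess_text_encoding(sample))           # cp037
print(ebcdic_to_text(sample))                # C 1 CLIENT
```

The same bytes decoded as ASCII would be unreadable, which is why a viewer has to decide on the text format before displaying the header.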

ASCII Stands for "American Standard Code for Information Interchange". This was the standard format for text information in all North American and European computers. ASCII uses 7 bits to represent all the letters of the alphabet, the numbers 0-9, and all special characters and punctuation. The 8th bit is not a sign bit; it is not part of the code and has been either left as zero or used as a parity bit. With 7 bits ASCII can represent only 128 characters, which is enough for English and most Western European languages but is inadequate for Asian languages with much larger character sets. ASCII is now being replaced by Unicode character sets, which use up to 32 bits per character and can represent the characters of all the world's writing systems. It is unlikely, however, that we will see these character sets in SEGY for some time to come.

16 bit integer or short integer. A 16 bit (2 byte) integer. Short integers are now largely obsolete. Modern computers deal with numbers 32 bits (or more) at a time. In fact modern computers can take longer to deal with 16 bit numbers than with 32 bit numbers. A 16 bit integer can represent a range of values from -32768 to +32767 (-2**15 to 2**15 - 1). The 16th bit is the sign bit. Obviously a 16 bit integer cannot represent a UTM coordinate or a trace number in a large 3D data volume. It was used primarily to save space and because the most powerful computers in existence at the time worked with data in 16 bit chunks.

32 bit or long integer. The format used for integers in modern computers. This format can represent values from -2,147,483,648 to +2,147,483,647 (-2**31 to 2**31 - 1). 32 bit integers have been used since the inception of the SEGY standard for large values such as UTM coordinates.
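The ranges quoted for the two integer formats can be checked with a short Python sketch. The helper names and the UTM northing value are hypothetical and chosen only for illustration; two's-complement storage is assumed.

```python
import struct


def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest values of a two's-complement integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(value: int, bits: int) -> bool:
    """True if the value can be stored in a signed integer of this width."""
    low, high = signed_range(bits)
    return low <= value <= high


# Hypothetical UTM northing in metres, for illustration only.
utm_northing = 5_650_300

print(signed_range(16))        # (-32768, 32767)
print(signed_range(32))        # (-2147483648, 2147483647)
print(fits(utm_northing, 16))  # False
print(fits(utm_northing, 32))  # True

# Packing the value as a big-endian 32 bit integer works;
# packing it as a 16 bit integer (">h") would raise struct.error.
packed = struct.pack(">i", utm_northing)
print(len(packed))             # 4
```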

IBM floating point. A 32 bit (4 byte) floating point format. This was the standard floating point format at the time the SEGY standard was set down. It is no longer the native format of most modern computers (even most of those made by IBM). Most SEGY data is still written in IBM float, however, and any program dealing with SEGY must be able to make the conversion.
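The conversion mentioned above can be sketched in a few lines of Python. This is a minimal illustration and not SegyTool's own code. It assumes the usual layout of a 32 bit IBM float: 1 sign bit, a 7 bit exponent in base 16 with a bias of 64, and a 24 bit fraction, stored high order byte first. The function name is hypothetical.

```python
import struct


def ibm32_to_float(raw: bytes) -> float:
    """Convert 4 bytes of big-endian IBM floating point to a Python float."""
    (word,) = struct.unpack(">I", raw)
    sign = -1.0 if word & 0x80000000 else 1.0
    exponent = (word >> 24) & 0x7F              # power of 16, biased by 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)
    return sign * fraction * 16.0 ** (exponent - 64)


print(ibm32_to_float(bytes.fromhex("41100000")))  # 1.0
print(ibm32_to_float(bytes.fromhex("C276A000")))  # -118.625
```

A production reader would normally convert whole traces at once rather than one sample at a time, but the arithmetic is the same.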

IEEE floating point. A 32 bit (4 byte) floating point format. The modern standard for floating point values. Used internally by most computers. Some SEGY data, particularly that which is written for a PC-based system, contains IEEE floating point data.
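Because IEEE is the native format, reading it needs no arithmetic, only the standard library. The short Python sketch below (the function name is hypothetical) also shows why the format must be known in advance: the bytes 41 10 00 00 are a valid number in both formats, but they mean 1.0 as an IBM float and something quite different as an IEEE float.

```python
import struct


def ieee32_to_float(raw: bytes, byte_order: str = ">") -> float:
    """Read 4 bytes as an IEEE single precision float.

    byte_order is ">" for high order byte first, "<" for low order first.
    """
    return struct.unpack(byte_order + "f", raw)[0]


print(ieee32_to_float(bytes.fromhex("3F800000")))  # 1.0
print(ieee32_to_float(bytes.fromhex("C2ED4000")))  # -118.625

# The IBM float encoding of 1.0, misread as IEEE:
print(ieee32_to_float(bytes.fromhex("41100000")))  # 9.0
```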

A further consideration is byte order. Personal computers use "little endian" or "low order byte first". Sun and other workstations use "big endian" or "high order byte first".

To understand the difference consider the number "1" written in binary as a 16 bit (2 byte) integer. In low byte order it would appear as

00000001 00000000

In high byte order it would appear as

00000000 00000001

Obviously reading a low order byte number as high order would result in an error. Instead of "1" the number would be interpreted as "256". Most SEGY data has been written with the high order byte first but this is changing as PCs are used more and more. Any program dealing with SEGY data should be able to work with either byte order.
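The example above can be reproduced in Python using only built-in functions. This is a small sketch for illustration; the variable names are arbitrary.

```python
import struct

# The two bytes from the low byte order example: 00000001 00000000
raw = bytes([0b00000001, 0b00000000])

as_low_order = int.from_bytes(raw, byteorder="little", signed=True)
as_high_order = int.from_bytes(raw, byteorder="big", signed=True)

print(as_low_order)   # 1
print(as_high_order)  # 256

# The struct module expresses the same choice with "<" and ">".
print(struct.unpack("<h", raw)[0])  # 1
print(struct.unpack(">h", raw)[0])  # 256
```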


    SEGY format overview


The SEGY format has been adopted by the SEG as a standard for trace sequential seismic data. The SEGY format is widely supported and is in fact used almost exclusively for the exchange of seismic data. All geophysical interpretation workstations read SEGY and some even use SEGY as their internal format.

With a standard so widely used there are, of course, millions of tapes and disk files in existence containing SEGY data.

SegyTool does not read SEGY data from tape so the procedures for reading SEGY data from tape will not be covered here. The essential layout of a SEGY data set is the same whether on disk or tape.

A SEGY data set is made up of:

  1. An EBCDIC format header of exactly 3200 bytes. There is one EBCDIC header per SEGY file. This header contains text which (hopefully) describes the area name, line name, shotpoint range, recording parameters, and processing history. Not all EBCDIC headers are so informative, but most do contain the area name and line name. Information is usually written in 40 lines of 80 characters each.

  2. Binary header of exactly 400 bytes. There is only one binary header per file. This header contains the number of samples, sample rate, and format code. The layout of the binary header is as follows.



           There are many other data fields in the binary header but the above represent the critical values for viewing and editing.


  3. Trace header of exactly 240 bytes. There is one trace header per trace. The header contains information about the trace such as shotpoint number, CDP, and survey locations. The number of samples and sample rate for each trace are also written in the header.
  4. Data samples. Each trace consists of a trace header followed by n data samples, where n is the number of samples per trace as defined in the trace header. Note that most programs that read SEGY disk files, including SegyTool, set the number of samples by the value in the binary header and assume a consistent number of bytes for each trace. The number of bytes per sample is dependent upon the format of the data samples. Floating point and 32 bit integers use 4 bytes per sample, 16 bit integer uses 2 bytes, and 8 bit integer uses only one. The most common sample formats are IBM float and 16 bit integer, although SeisX and SeisWare use IEEE float instead of IBM. This is for performance reasons, as IEEE is the native floating point format for the computers these applications run on. 32 bit and 8 bit integer samples are rarely seen. Note that the sample format has nothing to do with the format of trace header values such as shotpoints or XY coordinates.
  5. Items 3 and 4 above are repeated for each trace in the file.
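The layout described in the list above can be illustrated with a short Python sketch that splits the EBCDIC header into 40 lines and picks the three critical values out of the binary header. This is not SegyTool's code. The byte positions used (sample interval at file bytes 3217-3218, number of samples at 3221-3222, format code at 3225-3226) and the format codes are taken from the published SEG-Y rev 1 standard. The function name, the dictionary keys, the `cp037` EBCDIC codec, and the high order byte first default are assumptions. Note that the binary header stores the sample interval in microseconds.

```python
import struct

TEXT_HEADER_BYTES = 3200
BINARY_HEADER_BYTES = 400

# Bytes per sample for the common SEG-Y format codes:
# 1 = IBM float, 2 = 32 bit integer, 3 = 16 bit integer,
# 5 = IEEE float, 8 = 8 bit integer.
BYTES_PER_SAMPLE = {1: 4, 2: 4, 3: 2, 5: 4, 8: 1}


def parse_file_headers(data: bytes, byte_order: str = ">") -> dict:
    """Decode the EBCDIC header and the critical binary header values."""
    text = data[:TEXT_HEADER_BYTES].decode("cp037")  # assumed EBCDIC variant
    lines = [text[i:i + 80] for i in range(0, TEXT_HEADER_BYTES, 80)]

    start = TEXT_HEADER_BYTES
    binary = data[start:start + BINARY_HEADER_BYTES]
    # Offsets are 0-based within the 400 byte binary header.
    (interval_us,) = struct.unpack_from(byte_order + "h", binary, 16)
    (n_samples,) = struct.unpack_from(byte_order + "h", binary, 20)
    (format_code,) = struct.unpack_from(byte_order + "h", binary, 24)

    return {
        "text_lines": lines,
        "sample_interval_us": interval_us,
        "n_samples": n_samples,
        "format_code": format_code,
        "bytes_per_sample": BYTES_PER_SAMPLE.get(format_code),
    }


# Build a small synthetic header in memory so the sketch runs without a file.
text_header = "".join(
    f"C{n:2d} EXAMPLE LINE".ljust(80) for n in range(1, 41)
).encode("cp037")
binary_header = bytearray(BINARY_HEADER_BYTES)
struct.pack_into(">h", binary_header, 16, 4000)  # 4000 microseconds = 4 ms
struct.pack_into(">h", binary_header, 20, 1500)  # samples per trace
struct.pack_into(">h", binary_header, 24, 1)     # format code 1 = IBM float

headers = parse_file_headers(text_header + bytes(binary_header))
print(headers["text_lines"][0].rstrip())   # C 1 EXAMPLE LINE
print(headers["sample_interval_us"])       # 4000
print(headers["n_samples"])                # 1500
print(headers["bytes_per_sample"])         # 4
```

For a real file the same function could be given the first 3600 bytes read from disk in binary mode.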

    The number of samples multiplied by the sample rate in milliseconds yields the record length. The number of bytes per trace can be computed from the number of samples multiplied by the bytes per sample plus 240 bytes for the trace header. The overall size of the file will be exactly the number of bytes per trace times the number of traces plus 400 bytes for the binary header and 3200 bytes for the EBCDIC header.
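The arithmetic in the paragraph above can be written out as a small Python sketch. The function names and the example values (1500 samples at 4 ms, 4 bytes per sample, 1000 traces) are hypothetical; the header sizes are those given in the text. Working backwards from the file size to the number of traces is a useful sanity check, since a remainder suggests a truncated file or traces of varying length.

```python
TEXT_HEADER_BYTES = 3200
BINARY_HEADER_BYTES = 400
TRACE_HEADER_BYTES = 240


def record_length_ms(n_samples: int, sample_interval_ms: float) -> float:
    """Record length as number of samples times sample interval."""
    return n_samples * sample_interval_ms


def bytes_per_trace(n_samples: int, bytes_per_sample: int) -> int:
    """Trace header plus the data samples of one trace."""
    return TRACE_HEADER_BYTES + n_samples * bytes_per_sample


def file_size(n_traces: int, n_samples: int, bytes_per_sample: int) -> int:
    """Expected size in bytes of a file with fixed-length traces."""
    headers = TEXT_HEADER_BYTES + BINARY_HEADER_BYTES
    return headers + n_traces * bytes_per_trace(n_samples, bytes_per_sample)


def trace_count(size_bytes: int, n_samples: int, bytes_per_sample: int) -> int:
    """Number of traces implied by a file size, or raise if it does not fit."""
    data_bytes = size_bytes - TEXT_HEADER_BYTES - BINARY_HEADER_BYTES
    count, remainder = divmod(
        data_bytes, bytes_per_trace(n_samples, bytes_per_sample)
    )
    if data_bytes < 0 or remainder != 0:
        raise ValueError("file size is not a whole number of traces")
    return count


print(record_length_ms(1500, 4))        # 6000
print(bytes_per_trace(1500, 4))         # 6240
print(file_size(1000, 1500, 4))         # 6243600
print(trace_count(6_243_600, 1500, 4))  # 1000
```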