Lossy compression discards some original content to create smaller files Quizlet

Compression

Computer files and other kinds of data are compressed to smaller sizes for storage and transmission. They are later decompressed and used in their original form.

- Since disks are now large, compression is mainly used for quicker access and transmission
- Used in phone calls and high-quality video
- Helps eliminate buffering and poor quality
- Reduces storage and transmission costs

ZIP

The same-as-earlier and shorter-symbol tricks are the only things you need to produce these files.

- Most popular format for compressed files on personal computers
- Broken into chunks

Uncompressed file =>
1. Compressed using same-as-earlier, so repeated data is replaced by a shorter instruction
2. The new file is scanned to see which symbols are the most frequent, and then the appropriate table is made (the most common symbols get the shortest numeric codes).
3. The numeric codes from step 2 are applied and the table is stored in the metadata; otherwise it would be impossible to decode.
- Different files or even different parts of files have different tables
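
The steps above are roughly what the deflate method does. A minimal sketch using Python's standard `zlib` module, which implements deflate; the sample data is made up for illustration and is not a real ZIP archive:

```python
import zlib

# Made-up sample data with lots of repetition (an assumption for illustration).
original = b"FGAC-BDCF-" * 200

compressed = zlib.compress(original)    # same-as-earlier plus shorter-symbol (deflate)
restored = zlib.decompress(compressed)  # lossless: the exact bytes come back

print(len(original), "bytes before,", len(compressed), "bytes after")
print("round trip identical:", restored == original)
```

Highly repetitive data like this shrinks a lot; data with little repetition shrinks much less.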

Type: Files, folders

Compression: Lossless

Works: Stores more than one type of file. Removes redundant data. Dictionary-based, but mostly deflate (see above). Most popular way to compress files.

Same-as-Earlier Trick

Write out new portions of data as they are. When a portion repeats something written earlier, encode an instruction to go back a certain number of characters and copy a certain number of characters from there. Any characters that have not appeared earlier in that order must be written out in full.

- Example: FGAC-BDCF-FGAC-BD = FGACBDCF-b8c6

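Can also be used to compress repetitions. The number of characters copied can be larger than the number of characters you go back, because you copy from the message as it is being regenerated. Each pass produces 2 more characters, which you then copy in turn, until the pattern has been copied 3 times.
- Example: ABABABAB = AB-b2c6 (go back 2, copy 6 characters, which is 3 copies of AB)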
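
A minimal sketch of how one "go back, copy" instruction can be expanded. The function name `expand` is hypothetical, and the copy count is assumed to be measured in characters, so ABABABAB is AB followed by "back 2, copy 6":

```python
def expand(literal, back, copy):
    """Expand one 'go back, copy' instruction that follows already-written text."""
    out = list(literal)
    for _ in range(copy):
        # Copy one character at a time so the copy may overlap
        # with characters that were only just written.
        out.append(out[-back])
    return "".join(out)

print(expand("FGACBDCF", back=8, copy=6))  # FGACBDCFFGACBD
print(expand("AB", back=2, copy=6))        # ABABABAB
```

Copying one character at a time is what lets the copy length exceed the back distance.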

LZ77, Lempel, Ziv, 77

Same-as-earlier trick (references previous instances of a pattern in the data), invented by two Israeli computer scientists, Abraham _________ and Jacob ___, and published in 19__.

Could use one byte to define how far back to go and then how many characters to repeat or copy.
Common notation that humans, not computers, use: <11110, 101> = <30, 5> = Go back 30, copy 5
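
A small sketch of the notation above. The split of one byte into 5 bits for the distance and 3 bits for the length is an assumption made for illustration, not something the notes specify:

```python
back = int("11110", 2)   # binary 11110 -> 30
copy = int("101", 2)     # binary 101   -> 5
print(back, copy)        # 30 5

# Assumed layout: upper 5 bits = how far back, lower 3 bits = how many to copy.
packed = (back << 3) | copy
print(format(packed, "08b"))   # 11110101

# Unpacking reverses the two steps.
unpacked_back = packed >> 3
unpacked_copy = packed & 0b111
print(unpacked_back, unpacked_copy)  # 30 5
```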

Claude Shannon, Robert Fano

Developed error coding, helped the emergence of compression, founded the field of information theory. Error coding adds redundancy while compression takes out redundancy. Often error coding happens after compression. Described one of the earliest compression techniques in his seminal paper. Proved that better compression techniques existed. __________ ____ also discovered the technique at about the same time. Gave his MIT students the option to find better compression for their term paper and one solved the problem.

David Huffman

Student of Robert Fano who came up with the Huffman tree, which is a form of the shorter-symbol trick. Still used today.

Shannon-Fano coding

Robert Fano and Shannon discovered one of the earliest compression techniques, which is similar to the shorter-symbol trick. It is occasionally still used in ZIP, but there are better ones. They thought they could do better.

Shorter-Symbol Trick

Uses "abbreviations." Translate a sentence into numbers for the computer using abbreviations and string the numbers together for the computer to read. The result must not be open to interpretation, meaning that in the simple version each character uses 2 digits, since the codes are stored with no separation. Usually ends with a shorter total length.

- Everything in a computer gets stored as a binary number and then translated before being displayed on the screen or printed
- The idea is that since e and t are used most often, they should be represented by single digits
- Example: A=01, B=02, L=12. ALB = 011202 not 1122
- Since mixing in one-digit values makes the code ambiguous, the first digit of each code is reserved to tell the computer how many digits the code takes.
- Example: t=1, e=9, u=38, r=39, f=233, j=234. All 1-digit codes are either 1 or 9, 2-digit codes start with 3-8, 3-digit codes start with 2
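
The length rule in the example above can be sketched as a decoder. The table holds only the six letters from the example, and the function names are hypothetical:

```python
# Code table from the example; the first digit of each code reveals its length.
CODES = {"1": "t", "9": "e", "38": "u", "39": "r", "233": "f", "234": "j"}

def code_length(first_digit):
    if first_digit in ("1", "9"):
        return 1          # 1-digit codes are 1 or 9
    if first_digit == "2":
        return 3          # 3-digit codes start with 2
    return 2              # 2-digit codes start with 3-8

def decode_digits(digits):
    out, i = [], 0
    while i < len(digits):
        n = code_length(digits[i])
        out.append(CODES[digits[i:i + n]])
        i += n
    return "".join(out)

print(decode_digits("139389"))   # true
print(decode_digits("2333991"))  # fret
```

No separators are needed because the decoder always knows where the current code ends.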

Lossless

Something for nothing. This compression algorithm can take a data file and compress it to a fraction of the original size and later decompress it to exactly the same thing. Decreases size substantially for most common types of files, but not always.

- Used for text, and by people who work with audio or video and need all the detailed information

run-length encoding

Look for adjacent repetitions in the data, then state the pattern and how many times it repeats. Encodes a run of repetitions with the length of the run.

- Usually used along with Huffman or other techniques
- The data repeated must have a pattern with no gaps
- Example: ABABABAB = 4AB
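
A minimal sketch of run-length encoding for runs of a single repeated symbol. Treating a repeated pair such as AB as one unit would need an extra step that is not shown here:

```python
from itertools import groupby

def rle_encode(text):
    """Return (run length, symbol) pairs for each run of identical symbols."""
    return [(len(list(group)), symbol) for symbol, group in groupby(text)]

def rle_decode(runs):
    return "".join(symbol * count for count, symbol in runs)

runs = rle_encode("AAAABBBCCD")
print(runs)              # [(4, 'A'), (3, 'B'), (2, 'C'), (1, 'D')]
print(rle_decode(runs))  # AAAABBBCCD
```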

Leave-it-out trick/Discarding data

Since pictures take up a lot more space than text files, we can reduce their size by taking out every other row and every other column of pixels. Both dimensions are reduced by 50%, so the photo shrinks to 25% of its original size. You can repeat this again, but quality goes down.

- The pixels taken out become the same color as one of their neighbors
- Quality is definitely lower in detailed areas
- Never actually this simple; JPEG compresses in a much better way that can make compressed files nearly indistinguishable from the original image. Still suffers from compression artifacts if compression is too extreme.
- Basically, for JPEG the image is divided into 8-by-8 pixel squares, and each square is compressed: it is described by a single number for its most common color, or split into several colors if it contains something like a gradient.
- For music files, anything we can't hear is generally taken out; the rest is split into chunks, which are searched for patterns and described with fewer numbers.
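
A minimal sketch of the simple leave-it-out idea on a tiny made-up grid of pixel values (not how JPEG works). The function names are hypothetical:

```python
def leave_out(pixels):
    """Keep every other row and every other column."""
    return [row[::2] for row in pixels[::2]]

def fill_in(small):
    """Rebuild full size: each dropped pixel takes the color of a kept neighbor."""
    full = []
    for row in small:
        wide = [p for p in row for _ in range(2)]  # repeat each pixel sideways
        full.append(wide)
        full.append(list(wide))                    # repeat the row downwards
    return full

image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 test image
small = leave_out(image)
print(small)           # [[0, 2], [8, 10]]
print(fill_in(small))  # [[0, 0, 2, 2], [0, 0, 2, 2], [8, 8, 10, 10], [8, 8, 10, 10]]
```

Only 4 of the 16 pixels are stored, and the rebuilt image is visibly blockier than the original.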

Just remove data that is not needed (the data cannot be recovered). For example, group colors that are similar and discard the rest. It is important to find an appropriate balance between size and quality. Can be used along with run-length encoding to compress further.
We can see around 10 million not 16.7 million colors.

Pixelation/Colorbanding

Happens when algorithms for lossy compression, like leave-it-out, are too aggressive.

Lossy

This compression algorithm leads to slight changes in the original file after decompression takes place. As long as it looks or sounds the same to humans, it doesn't matter. If used too aggressively, you will still be able to recognize the content, but the quality will be very bad.

- Used often on images, video, audio
- Smaller size, worse quality
- bigger size, better quality

megapixel

one million pixels; describes the size of the images captured by a camera
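
A quick arithmetic sketch. The 4000 x 3000 resolution is a hypothetical camera, and 24 bits per pixel is assumed for the uncompressed size:

```python
width, height = 4000, 3000        # hypothetical camera resolution
pixels = width * height
megapixels = pixels / 1_000_000
raw_bytes = pixels * 3            # 24 bits = 3 bytes per pixel, uncompressed

print(megapixels)                 # 12.0
print(raw_bytes / 1_000_000)      # 36.0 (megabytes)
```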

Compression artifacts

Not only loss of detail, but noticeable new features that are introduced by a particular method of lossy compression followed by decompression. Like pixelated edges.

BMP

Bitmap
Type: Raster graphic, bitmap digital images, photos, scans, wallpapers

Compression: Lossless/Uncompressed

Works: Run-length (optional). Stores color data for each pixel (24 bits) without compression. Largely supported by Windows, but works with other operating systems.

JPEG/JPG

Type: Raster graphic digital photos

Compression: Lossy
(block-based, Huffman)

Works: Splits the image into 8-by-8 pixel blocks and uses the discrete cosine transform so that high-frequency information can be removed during quantization. Samples color information at lower resolution to make it smaller, then applies Huffman coding. Small file size for good quality. Compresses more than PNG, but with lower quality.

WAV

Type: Audio, CDs

Compression: Uncompressed

Works: Holds audio. A stream of binary data made from raw audio, assigned to one or multiple channels. Headers hold metadata, including sample rate and bit rate. Max size of 4GB. Used commonly in Windows. Like .aiff, this format packages pulse-code modulation (PCM) streams, which convert analog signals into digital form. Can contain compressed audio, as described by the metadata.

PNG

Type: High quality raster images

Compression: Lossless

Works: Uses deflate, a combination of LZ77 and Huffman coding.

MP3

Type: Music, CD

Compression: Lossy

Works: Tries not to lose noticeable quality. A 32MB song can be compressed to 3MB. Eliminates sounds we can't hear. Typical bit rates range from 96 to 320 kilobits per second.

GIF

Type: Raster image, multiple BMP images in single file

Compression: Lossless

Works: Dictionary-based, using LZW (Welch's refinement of LZ78, the successor to LZ77). Can hold animated graphics. Used for reactions, animations. Max of 256 colors per image. Metadata can define different color palettes.

RAW

Type: Images (digital cameras and scanners), ?videos?

Compression: Uncompressed

Works: Raw data, directly from the source. Can be edited non-destructively before these files are processed.

Uncompressed

All information from original file will be kept in same format without changing any bits. Some data will be lost from what is happening in real life.

Codec

A computer program that enCOdes or DECodes data.

psychophysics, psychoacoustics

Branch of psychology devoted to knowing what the human eye or ear cannot detect. When we know what we cannot detect, we can take it out in lossy compression. Studies the relationship between stimulation and sensation. Includes extensive research on how humans see colors and how many shades they can distinguish.

- We cannot tell the difference between similar shades of green, so the computer changes similar colors to the same color.
- Audio files may reduce sample rate or bit rate without us realizing

___________ is a branch of this that deals with sound. We can hear from 20 Hz - 20,000 Hz, delete other frequencies using lossy.

sample rate

Number of values taken per second when converting an analog signal to a digital one, that is, how often the analog signal is sampled during conversion to a digital representation. Measured in kilohertz. Most humans can't tell the difference past 60 kHz.

Bit rate

The total number of bits processed per second. Typical home wireless connections can handle 20 Mbps, but without compression 4K TV would need well over 200 Mbps. For audio, determined by multiplying bit depth by sample rate. If there are multiple channels, multiply this by the number of channels.
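
A worked example of the multiplication, using the standard CD audio figures (44,100 samples per second, 16 bits per sample, 2 channels). The 3-minute song length is an assumption for illustration:

```python
def bit_rate(sample_rate_hz, bit_depth, channels=1):
    """Uncompressed audio bit rate in bits per second."""
    return sample_rate_hz * bit_depth * channels

cd = bit_rate(44_100, 16, channels=2)
print(cd)                      # 1411200 bits per second

seconds = 180                  # assumed 3-minute song
size_bytes = cd * seconds // 8
print(size_bytes)              # 31752000 bytes, roughly 32 MB
```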

bit depth

Number of bits used for each sample taken of the analog wave's amplitude.

metadata

Data about the data, comes at the beginning of the file. Usually this is needed otherwise the bits could mean anything.

- May include: Title, author, keywords, date created, location where created, file size, height, width, etc

Text

- All data must be retained.
- Doesn't take as much storage as other things, but there is so much of it.

Fixed-length code

Blocks of code that are always the same size.
- Problem: Wasted bits (0's in front)
- Example: ASCII, the standard for encoding text in binary. Each character takes 1 byte, so it is easy to locate any character since everything is a block of 8 bits. ASCII is 7 bits; extended ASCII is 8.
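
A small sketch of a fixed-length code, writing each character as an 8-bit block. The function name is hypothetical, and it assumes plain ASCII text:

```python
def to_fixed_length(text, width=8):
    """Write each character's code as a zero-padded binary block."""
    return " ".join(format(ord(ch), f"0{width}b") for ch in text)

print(to_fixed_length("Hi"))  # 01001000 01101001
```

The leading 0 in each block is the kind of wasted bit mentioned above.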

Variable-length code

Each data block can be a different length.
- Letters that are more common have fewer bits.
- Example: Morse code

Prefix-free code

Since computers have no pauses or spaces, this code works by ensuring that no character's code is the beginning (prefix) of any other character's code.

- Example: If A starts with 0 then nothing else can start with 0, if B starts with 10 nothing else can start with that
- Type of variable length coding (not all same length).
- Less common symbols = more bits.
- Used in Huffman trees
- Can be used wherever there is redundancy. Frequent colors, repeated sounds, patterns of bits.
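
A minimal sketch of checking and decoding a prefix-free code. The four-symbol table is made up for illustration:

```python
def is_prefix_free(codes):
    """True if no code word is the beginning of a different code word."""
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode_bits(bits, codes):
    lookup = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:        # a complete code word has been read
            out.append(lookup[current])
            current = ""
    return "".join(out)

codes = {"A": "0", "B": "10", "C": "110", "D": "111"}  # made-up table
print(is_prefix_free(codes))                 # True
print(decode_bits("010110111", codes))       # ABCD
print(is_prefix_free({"A": "0", "B": "01"})) # False
```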

Binary trees

A data structure in which each node can have at most 2 child nodes.

Huffman Tree

David Huffman discovered the most efficient way to generate a prefix-free code using a binary tree. It is the most efficient way to compress text at the level of individual characters.

How it works: Scans all characters in the file and builds a tree, starting with the least used characters at the bottom and working its way up to the most used at the top. The most frequent characters are near the top, so the path to reach them is shortest. Since there are only two options at each node, each path can be represented with 0's and 1's. Each node holds either a character along with its frequency or the sum of all the frequencies below it.

To compress: Locate the character in the tree, then start at the top and trace the path down to it. Left = 0, right = 1

To decompress: Follow the pattern of bits down the tree until a character is reached, then start back at the top for the next character.
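
A minimal sketch of building such a tree with Python's `heapq`. This is one common way to do it, not necessarily how any particular tool does; the function name and sample text are made up:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Map each character to a bit string; frequent characters get shorter ones."""
    if not text:
        return {}
    freq = Counter(text)
    # Heap entries: (frequency, tie_breaker, tree). A tree is either a single
    # character or a (left, right) pair of smaller trees.
    heap = [(count, i, sym) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least frequent trees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))  # node = sum below
        next_id += 1
    codes = {}
    def walk(tree, path):
        if isinstance(tree, tuple):
            walk(tree[0], path + "0")        # left = 0
            walk(tree[1], path + "1")        # right = 1
        else:
            codes[tree] = path or "0"        # single-character input edge case
    walk(heap[0][2], "")
    return codes

text = "aaaabbc"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(len(encoded), "bits instead of", len(text) * 8)  # 10 bits instead of 56
```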

Dictionary

Stored in the metadata to explain which words or letter groups were swapped with which codes. The key gives the instructions to encode or decode the compressed data. Combined with Huffman trees, this creates a prefix-free code where the most common patterns have the shortest codes.
- Example: Swap "and" and "th" with symbols that represent bits.

Images

There are way more bytes than in text.
Lossless: Find patterns/runs then convert to binary using bits to represent length of run and bits to represent the pattern
Lossy: Take out unimportant information
Metadata: Includes height and width, predetermined by the file type
Pixel data: Start from top left and fill in appropriate colors

Video, bandwidth

Usually 24-30 images per second. Compressed using intraframe or interframe compression.
Important for streaming, since digital connections do not have enough ________ (amount of bit rate available) to process an uncompressed image.

Intraframe

Spatial compression.
Compress each frame of video independently using same algorithms as other images.

Interframe

Temporal compression. Reuses redundant pixels from one frame to the next, so if background is the same you can just leave those pixels as is or slightly change them and only really change the pixels that need updating.
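
A minimal sketch of the interframe idea on two tiny made-up frames, each stored as a flat list of pixel values. Real video codecs are far more elaborate; the function names are hypothetical:

```python
def frame_delta(previous, current):
    """List (index, new_value) only for the pixels that changed."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]

def apply_delta(previous, delta):
    """Rebuild the next frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame

frame1 = [5, 5, 5, 5, 9, 9, 5, 5]   # background 5, object 9
frame2 = [5, 5, 5, 5, 5, 9, 9, 5]   # object moved one pixel to the right
delta = frame_delta(frame1, frame2)
print(delta)                                 # [(4, 5), (6, 9)]
print(apply_delta(frame1, delta) == frame2)  # True
```

Only 2 of the 8 pixels need to be stored for the second frame.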

Audio

Lossy: Discard data we can't hear or redundant data. Change sample rate or bit depth which changes bit rate.

Lossless: Basically take anything else and apply to sound. Run-length encoding used for silence. Same-as-earlier used for repetition, dictionary.

redundancy

Repeated or predictable content in data. Compression works by finding frequencies or patterns in the code and removing this repetition.

Does lossy compression discard some original content to create smaller files?

Lossy data compression permanently discards some of the original data. It exploits the fact that human beings cannot detect subtle differences in sounds and colours. Key data is retained and less important data is discarded when lossy data compression takes place.

What is lossy compression quizlet?

Lossy compression: reduction of a file's size by removing some of the data that is not noticeable by human senses. Lossless compression: reduction of a file's size where no data is lost.

What is lossy compression used for?

Lossy compression can help you speed up your site, particularly if you have image-heavy content. You can use this compression type on various file formats, including Joint Photographic Experts Group (JPEG) and Graphics Interchange Format (GIF). You can also apply lossy compression to video and audio files.

How does lossy reduce file size?

Lossy compression reduces file size by removing unnecessary bits of information. This type of compression is most commonly used on image, video, and audio files, where a perfect representation of the source media is not required.