Optimizing custom JPEG decompression
The aim of my code is to decode an image format that is based on the JPEG compression/decompression chain. As far as I know it is not compatible with the standard JPEG flow, since every library I have tried fails to decode the data properly. I am only interested in decompression in this case. The decoder follows the standard pattern:
- Read Huffman values -> Like normal JPEG
- unzigzag -> Like normal JPEG
- Dequantize -> Like normal JPEG
- IDCT -> Almost like normal JPEG, but different range/clamping
- Color Space conversion -> Custom, not YCbCr
For one 8x8 block, everything except the last step currently looks like this:
int16_t processBlock(int16_t prevDc, BitStream &stream, const tHuffTable &dcTable, const tHuffTable &acTable,
                     float *quantTable, bool isLuminance, int16_t *outBlock) {
    int16_t workBlock[64] = {0};
    int16_t curDc = decodeBlock(stream, workBlock, dcTable, acTable, prevDc); 
    unzigzag(workBlock);
    dequantize(workBlock, quantTable);
    idct(outBlock, workBlock, isLuminance);
    return curDc;
}
After this, the outBlock goes through the color space conversion, which depends on the image type.
What I want to optimize is the overall performance. The entire image is decompressed in 16x16 pixel tiles, each with 4 luminance blocks for component 1, 1 chrominance block for component 2, and 1 chrominance block for component 3. There are 4 more blocks for another luminance component, but I don't know what it is used for, so we can ignore it. The code looks like this:
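The `unzigzag` and `dequantize` steps are not shown, so the following is only a minimal sketch of how the two could be fused into a single pass over the block. It assumes the standard JPEG zigzag order, a quantization table indexed in row-major order (as implied by calling `unzigzag` before `dequantize`), and rounding to nearest. The name `unzigzagDequantize` is hypothetical and not part of the code above.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Standard JPEG zigzag order: kZigzag[i] is the row-major position of the
// i-th coefficient as it appears in the bitstream.
constexpr std::array<uint8_t, 64> kZigzag = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63};

// Hypothetical helper: reorders and scales the coefficients in one pass.
// Assumption: quantTable is indexed in row-major order and rounding to
// nearest matches what the real dequantize() does.
inline void unzigzagDequantize(const int16_t *zigzagBlock, const float *quantTable,
                               int16_t *outBlock) {
    for (int i = 0; i < 64; ++i) {
        const int pos = kZigzag[i];
        outBlock[pos] = static_cast<int16_t>(std::lround(zigzagBlock[i] * quantTable[pos]));
    }
}
```

Since the IDCT dominates the profile, this is unlikely to matter much by itself, but it removes one full pass over the block and the in-place shuffle.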
void decodeImageType0(
        uint32_t width,
        uint32_t height,
        std::vector<uint8_t> &outData,
        BitStream &stream,
        const tHuffTable &dcLumTable,
        const tHuffTable &acLumTable,
        const tHuffTable &dcCromTable,
        const tHuffTable &acCromTable,
        float *lumQuant[4],
        float *cromQuant[4]) {
    int16_t lum0[4][64]{};
    int16_t lum1[4][64]{};
    int16_t crom0[64]{};
    int16_t crom1[64]{};
    uint32_t colorBlock[16 * 16]{};
    const auto actualHeight = ((height + 15) / 16) * 16;
    const auto actualWidth = ((width + 15) / 16) * 16;
    int16_t prevDc[4] = {0};
    for (auto y = 0; y < (actualHeight / 16); ++y) {
        for (auto x = 0; x < (actualWidth / 16); ++x) {
            for (auto &lum : lum0) {
                prevDc[0] = processBlock(prevDc[0], stream, dcLumTable, acLumTable, lumQuant[0], true, lum);
            }
            prevDc[1] = processBlock(prevDc[1], stream, dcCromTable, acCromTable, cromQuant[1], false, crom0);
            prevDc[2] = processBlock(prevDc[2], stream, dcCromTable, acCromTable, cromQuant[2], false, crom1);
            for (auto &lum : lum1) {
                prevDc[3] = processBlock(prevDc[3], stream, dcLumTable, acLumTable, lumQuant[3], true, lum);
            }
            decodeColorBlockType0(lum0, lum1, crom0, crom1, colorBlock);
            for (auto row = 0; row < 16; ++row) {
                if(y * 16 + row >= height || x * 16 >= width) {
                    continue;
                }
                const auto numPixels = std::min(16u, width - x * 16);
                memcpy(outData.data() + (y * 16 + row) * width * 4 + x * 16 * 4, &colorBlock[row * 16], numPixels * 4);
            }
        }
    }
}
My measurements show that over 80% of the time is spent inside the idct function, so this is where I want to optimize. The function below is the result of the optimizations I could think of. I created a cache of the static coefficients used in the IDCT, which improved performance significantly, but I hope there is still room for more: for example, nanojpg is 3 times faster (although with invalid results).
float idctHelper(const int16_t *inBlock, int32_t u, int32_t v, int32_t blockWidth, int32_t blockHeight) {
    glm::vec<4, float, glm::packed_lowp> vec3{};
    float result = 0.0f;
    for (auto y = 0; y < blockHeight; ++y) {
        for (auto x = 0; x < blockWidth; x += 4) {
            const auto idx = (v * 8 + u) * 64 + y * 8 + x;
            vec3 = glm::vec<4, float, glm::packed_lowp>(inBlock[y * blockWidth + x], inBlock[y * blockWidth + x + 1], inBlock[y * blockWidth + x + 2], inBlock[y * blockWidth + x + 3]) *
                    glm::vec<4, float, glm::packed_lowp>(idctLookup[idx], idctLookup[idx + 1], idctLookup[idx + 2], idctLookup[idx + 3]);
            result += vec3.x + vec3.y + vec3.z + vec3.w;
        }
    }
    return result;
}
template<typename T, typename U = T>
U clamp(T value, T min, T max) {
    return static_cast<U>(std::min<T>(std::max<T>(value, min), max));
}
void idct(int16_t *outBlock, int16_t *inBlock, bool isLuminance, int32_t blockWidth = 8, int32_t blockHeight = 8) {
    for (auto y = 0; y < blockHeight; ++y) {
        for (auto x = 0; x < blockWidth; ++x) {
            auto value = static_cast<int16_t>(std::round(
                    0.25f * idctHelper(inBlock, x, y, blockWidth, blockHeight)));
            if (isLuminance) {
                value = clamp<int16_t>(static_cast<int16_t>(value + 128), 0, 255);
            } else {
                value = clamp<int16_t>(value, -256, 255);
            }
            outBlock[y * blockWidth + x] = value;
        }
    }
}
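One cheap shortcut that applies to the `idct` above: when all 63 AC coefficients of a block are zero, every output sample equals `0.25 * alpha(0) * alpha(0) * DC`, which is `DC / 8`. The sketch below shows that special case for fixed 8x8 blocks. The name `tryIdctDcOnly` is hypothetical, and the result may differ by one from the full loop at exact .5 ties because of float rounding in the lookup table.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical fast path: returns true and fills outBlock when the block
// contains only a DC coefficient; returns false so the caller can fall back
// to the full IDCT otherwise.
inline bool tryIdctDcOnly(int16_t *outBlock, const int16_t *inBlock, bool isLuminance) {
    for (int i = 1; i < 64; ++i) {
        if (inBlock[i] != 0) {
            return false;
        }
    }
    // 0.25 * (1/sqrt(2)) * (1/sqrt(2)) * cos(0) * cos(0) * DC == DC / 8
    int value = static_cast<int>(std::lround(inBlock[0] / 8.0f));
    if (isLuminance) {
        value = std::min(std::max(value + 128, 0), 255);   // same range as idct()
    } else {
        value = std::min(std::max(value, -256), 255);
    }
    std::fill(outBlock, outBlock + 64, static_cast<int16_t>(value));
    return true;
}
```

A caller would try this first and only run the full transform when it returns false. How much it helps depends on how many blocks in the image are flat, which has not been measured here.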
This is the cache that is created once:
float alphaFunction(int32_t n) {
    static float INV_SQRT_2 = 1.0f / sqrtf(2.0f);
    if (n == 0) {
        return INV_SQRT_2;
    } else {
        return 1;
    }
}
        for (auto u = 0; u < 8; ++u) {
            for (auto v = 0; v < 8; ++v) {
                for (auto x = 0; x < 8; ++x) {
                    for (auto y = 0; y < 8; ++y) {
                        idctLookup[(v * 8 + u) * 64 + y * 8 + x] = alphaFunction(x) * alphaFunction(y) *
                                                                   cosf((2 * u + 1) * x * (float) M_PI / 16.0f) *
                                                                   cosf((2 * v + 1) * y * (float) M_PI / 16.0f);
                    }
                }
            }
        }
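Each entry of this cache is a product of one factor that depends only on (u, x) and one that depends only on (v, y), so the 2-D transform can be split into two 1-D passes (the usual row-column decomposition). That needs an 8x8 table instead of 4096 entries and about 2 * 64 * 8 = 1024 multiply-adds per block instead of 64 * 64 = 4096. The following is a minimal sketch assuming fixed 8x8 blocks; `IdctBasis` and `idctSeparable` are hypothetical names, and it has not been benchmarked against the version above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

namespace {
// c[pos][freq] = alpha(freq) * cos((2 * pos + 1) * freq * pi / 16)
struct IdctBasis {
    float c[8][8];
    IdctBasis() {
        const float pi = 3.14159265358979323846f;
        for (int pos = 0; pos < 8; ++pos) {
            for (int freq = 0; freq < 8; ++freq) {
                const float alpha = (freq == 0) ? 1.0f / std::sqrt(2.0f) : 1.0f;
                c[pos][freq] = alpha * std::cos((2 * pos + 1) * freq * pi / 16.0f);
            }
        }
    }
};
}  // namespace

// Hypothetical replacement for idct(): same scaling and clamping, but done
// as a horizontal 1-D pass followed by a vertical 1-D pass.
inline void idctSeparable(int16_t *outBlock, const int16_t *inBlock, bool isLuminance) {
    static const IdctBasis basis;  // built once on first call
    float tmp[64];
    // Pass 1: horizontal frequencies -> x positions, one coefficient row at a time.
    for (int fy = 0; fy < 8; ++fy) {
        for (int x = 0; x < 8; ++x) {
            float sum = 0.0f;
            for (int fx = 0; fx < 8; ++fx) {
                sum += inBlock[fy * 8 + fx] * basis.c[x][fx];
            }
            tmp[fy * 8 + x] = sum;
        }
    }
    // Pass 2: vertical frequencies -> y positions, then scale, round and clamp.
    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            float sum = 0.0f;
            for (int fy = 0; fy < 8; ++fy) {
                sum += tmp[fy * 8 + x] * basis.c[y][fy];
            }
            int value = static_cast<int>(std::lround(0.25f * sum));
            value = isLuminance ? std::min(std::max(value + 128, 0), 255)
                                : std::min(std::max(value, -256), 255);
            outBlock[y * 8 + x] = static_cast<int16_t>(value);
        }
    }
}
```

Because the summation order differs from the 4096-entry lookup, individual samples can differ by one at rounding boundaries. The inner loops are plain dot products over contiguous floats, which compilers can usually vectorize without explicit vector types.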
c++ performance image
asked 3 mins ago by Cromon