Entity Framework 6: Efficient Filtering Based on Large In-Memory Parameters Sets

When there is a variable number of parameters which need to be matched against a single database column in Linq2Entities, a common approach is to use the „Contains“ syntax in a where clause:

Int32[] arrayOfIDs = [...];
var query = 
    from t1 in ctx.table1
    where arrayOfIDs.Contains( t1.ID )
    select t1.Name;

In that case, the query engine will render the SQL query based on a „IN ( … )“ predicate, which becomes very inefficient the larger the input array is. When dealing with T-SQL only, a common approach to solve this issue would be to store the filter data in a temporary table and perform a join against the table to be filtered. However since there are not direct ways of storing filter data within temp tables in EF6, there is an efficient workaround based on XML and table valued functions.

CREATE FUNCTION dbo.SelectInt( @x xml ) RETURNS @t TABLE ( ID INT PRIMARY KEY ) AS
BEGIN
    INSERT @t ( ID, Translation ) 
        SELECT 
            n.x.value( N'@id', N'INT' ) AS ID
            FROM @x.nodes( '//int/i' ) n(x);
    RETURN;

Based on this table valued function, it is possible to pump hundreds of thousands of records into the database before performing a join against the table valued function.

Int32[] arrayOfIDs = [...];
XElement xml = new XElement( "int" );
arrayOfIDs.Distinct().ToList().Foreach( x => xml.Add( new XElement( "i", new XAttribute( "ID", x ) ) ) );
string filter = xml.ToString();
 
var query = 
    from t1 in ctx.table1
    join f1 in ctx.SelectInt( filter ) on t1.ID equals f1.ID
    select t1.Name;

By adding a primary key to the output of the table valued function, the submitted data can be efficiently joined against the tabular data.

Except for filtering, this technique may also be used for joining more complex data to existing tables.

T-SQL: Dropping All Tables in Correct Order (Recursively)

If we want to drop all tables from a larger database schema which uses Declarative Referential Integrity, the order in which the tables are dropped is essential if integrity related errors need to be avoided. Tables which do not have dependencies on other tables need to be dropped first, next the remaining tables without dependencies and so on.

Instead of manually resolving dependencies or using a tool such as SSDT, we can make use of the following recursive query to determine the correct order of drops or deletion of data.

 
WITH cte_fk AS (
    SELECT 
        tbl.object_id,
        fkc.referenced_object_id
        FROM sys.tables tbl
        LEFT JOIN sys.foreign_key_columns fkc ON tbl.object_id = fkc.parent_object_id
        WHERE tbl.type = N'U'
), cte_rec AS (
    SELECT 
        1 AS hierarchy,
        cte.object_id,
        cte.referenced_object_id
        FROM cte_fk cte
    UNION ALL
    SELECT
        cte.hierarchy + 1 AS hierarchy,
        cte.object_id,
        cfk.referenced_object_id
        FROM cte_rec cte
        JOIN cte_FK cfk ON cfk.object_id = cte.referenced_object_id
)
SELECT 
    tbl.name,
    MAX( cte.hierarchy ) AS hierarchy,    
    N'delete from ' + quotename( tbl.name ) + N';' AS delete_statement,
    N'drop table '  + quotename( tbl.name ) + N';' AS drop_statement
    FROM cte_rec cte
    JOIN sys.tables tbl ON cte.object_id = tbl.object_id
    GROUP BY tbl.name
    ORDER BY hierarchy DESC, tbl.name ASC
    OPTION( maxrecursion 32767 )

The same rules apply when data in all tables needs to be deleted.

Using Memory Destination Manager in libjpeg

To continue my series about libjpeg, here is a full working compression sample which takes an RGB byte array and compresses it in JPEG format, returning the result inside a byte array/vector.

bool SaveJPEG( unsigned char * const rgb, const unsigned int dwWidth, const unsigned int dwHeight, std::vector<unsigned char> * jpeg, const int quality )
{
    jpeg_error_mgr jerr;
    jpeg_destination_mem_mgr dst_mem;
    jpeg_compress_struct_wrapper cinfo;
    j_compress_ptr pcinfo = cinfo;
 
    jmp_buf setjmp_buffer;
    pcinfo->client_data = &setjmp_buffer;
    pcinfo->err = jpeg_std_error( &jerr );
    pcinfo->err->error_exit = error_exit_custom;
    pcinfo->err->output_message = output_message;
 
    if ( setjmp( setjmp_buffer ) )
        return false;
 
    jpeg_mem_dst( cinfo, &dst_mem );
 
    pcinfo->image_width = dwWidth;
    pcinfo->image_height = dwHeight;
    pcinfo->input_components = 3;    
    pcinfo->in_color_space = JCS_RGB;     
 
    jpeg_set_defaults( cinfo );
    jpeg_set_quality( cinfo, quality, TRUE );
 
    jpeg_start_compress( cinfo, TRUE );
 
    int row_stride = pcinfo->image_width * pcinfo->input_components;
    row_stride = ( ( row_stride + 3 ) / 4 ) * 4;
 
    while ( pcinfo->next_scanline < pcinfo->image_height ) 
    {
        JSAMPROW row_pointer = &rgb[ pcinfo->next_scanline * row_stride ];
        jpeg_write_scanlines( cinfo, &row_pointer, 1 );
    }
 
    jpeg_finish_compress( cinfo );
 
    jpeg->resize( dst_mem.data.size() );
    CopyMemory( jpeg->data(), dst_mem.data.data(), dst_mem.data.size() );
 
    return true;
}

We make use of the custom destination memory manager jpeg_destination_mem_mgr (RAII wrapper class) and the cinfo RAII wrapper jpeg_compress_struct_wrapper.
Both wrappers are used to make sure that there is no memory leakage in case of an encoder crash (which results in a longjmp call).

Custom Memory Destination Manager for libjpeg

Similar to creating an in memory source manager for libjpeg decompression, we can also create a destination manager which will store the compressed data in memory.
FOr this purpose, the destination manager contains a byte vector:

typedef struct _jpeg_destination_mem_mgr
{
    jpeg_destination_mgr mgr;
    std::vector<unsigned char> data;
} jpeg_destination_mem_mgr;

Initialization takes place in the callback mem_init_destination():

static void mem_init_destination( j_compress_ptr cinfo )
{
    jpeg_destination_mem_mgr* dst = (jpeg_destination_mem_mgr*)cinfo->dest;
    dst->data.resize( JPEG_MEM_DST_MGR_BUFFER_SIZE );
    cinfo->dest->next_output_byte = dst->data.data();
    cinfo->dest->free_in_buffer = dst->data.size();
}

When the compression has finished, we need to resize the buffer to the actual size:

static void mem_term_destination( j_compress_ptr cinfo )
{
    jpeg_destination_mem_mgr* dst = (jpeg_destination_mem_mgr*)cinfo->dest;
    dst->data.resize( dst->data.size() - cinfo->dest->free_in_buffer );
}

When the buffer size is not large enough, the library requests more data in the following callback:

static boolean mem_empty_output_buffer( j_compress_ptr cinfo )
{
    jpeg_destination_mem_mgr* dst = (jpeg_destination_mem_mgr*)cinfo->dest;
    size_t oldsize = dst->data.size();
    dst->data.resize( oldsize + JPEG_MEM_DST_MGR_BUFFER_SIZE );
    cinfo->dest->next_output_byte = dst->data.data() + oldsize;
    cinfo->dest->free_in_buffer = JPEG_MEM_DST_MGR_BUFFER_SIZE;
    return true;
}

All callbacks are configured in jpeg_mem_dest():

static void jpeg_mem_dest( j_compress_ptr cinfo, jpeg_destination_mem_mgr * dst )
{
    cinfo->dest = (jpeg_destination_mgr*)dst;
    cinfo->dest->init_destination = mem_init_destination;
    cinfo->dest->term_destination = mem_term_destination;
    cinfo->dest->empty_output_buffer = mem_empty_output_buffer;
}

Usage:

jpeg_destination_mem_mgr dst_mem;
jpeg_compress_struct_wrapper cinfo;
j_compress_ptr pcinfo = cinfo;
jpeg_mem_dest( cinfo, &dst_mem);

After encoding has finished, data can be retrieved from the data member of the destination manager.

unsigned char * pjpeg_data = dst_mem.data.data();
size_t data_size = dst_mem.data.size();

Here is a link to the entire code: jpeg_mem_dst.h

Custom Memory Source Manager for libjpeg

libjpeg-turbo is an optimized plug-in replacement for libjpeg, which delivers 2-4 times the performance of the original library. However currently the current libjpeg-turbo release version have a memory source manager included when running in libjpeg v6b compatibility mode.

The following implementation fills the gap:

The callback mem_fill_input_buffer() is invoked when the library expects more data than supplied in the buffer. Since we already hold all data in the memory buffer, this can only be due to the image being prematurely terminated. There are two different ways of handling such a circumstance, either allow the library to finish processing and display the partial data, or abort processing by raising an error. The first option is activated by defining PROCESS_TRUNCATED_IMAGES.

static boolean mem_fill_input_buffer( j_decompress_ptr cinfo )
{
#ifdef PROCESS_TRUNCATED_IMAGES
    jpeg_source_mgr* src = cinfo->src;
 
    static const JOCTET EOI_BUFFER[ 2 ] = { (JOCTET)0xFF, (JOCTET)JPEG_EOI };
    src->next_input_byte = EOI_BUFFER;
    src->bytes_in_buffer = sizeof( EOI_BUFFER );
#else
    ERREXIT( cinfo, JERR_INPUT_EOF );
#endif
    return TRUE;
}

mem_skip_input_data() is called when the library wants to skip over a certain part of the data. If the data to be skipped is less than the remaining bytes in the buffer, we simple anjust the buffer pointer. If there is not enough input data, we either raise an exception or set the input data length to zero, which will result in a call to mem_fill_input_buffer(), where an EOI marker is returned.

static void mem_skip_input_data( j_decompress_ptr cinfo, long num_bytes )
{
    jpeg_source_mgr* src = (jpeg_source_mgr*)cinfo->src;
 
    if ( 1 > num_bytes )
        return;
 
    if ( num_bytes < src->bytes_in_buffer )
    {
        src->next_input_byte += (size_t)num_bytes;
        src->bytes_in_buffer -= (size_t)num_bytes;
    }
    else
    {
#ifdef PROCESS_TRUNCATED_IMAGES
        src->bytes_in_buffer = 0;
#else
        ERREXIT( cinfo, JERR_INPUT_EOF );
#endif
    }
}

In our case, there is no need to perform initialization or termination, since the buffer which is used is managed by the caller.

static void mem_init_source( j_decompress_ptr cinfo ) 
{
}
 
static void mem_term_source( j_decompress_ptr cinfo ) 
{
}

The following function performs initialization of the memory buffer. It is important to note that the bytes supplied are not being copied, and therefore the buffer must not be freed before the image processing has finished.

static void jpeg_mem_src( j_decompress_ptr cinfo, jpeg_source_mgr* src, void* buffer, long nbytes )
{
    src->init_source = mem_init_source;
    src->fill_input_buffer = mem_fill_input_buffer;
    src->skip_input_data = mem_skip_input_data;
    src->resync_to_restart = jpeg_resync_to_restart;
    src->term_source = mem_term_source;
    src->bytes_in_buffer = nbytes;
    src->next_input_byte = (JOCTET*)buffer;
    cinfo->src = src;
}

With the above manager definition, we can easily initialize and work with the memory source manager.

jpeg_decompress_struct cinfo;
jpeg_source_mgr src_mem;
jpeg_create_decompress( &cinfo );
jpeg_mem_src( &cinfo, &src_mem, (PVOID)content, (long)dwSize );
jpeg_read_header( &cinfo, TRUE );

Here is the complete header file: jpeg_mem_src.h.

T-SQL ISNUMERIC() and Input Validation/SQL Injection

SQL injection is still a common threat which is easily overlooked when applications are only tested with regards to end user functionality. Fortunately, there are various ways of eliminating the necessity of unsafe dynamic SQL code within SQL Server.

In some cases, it may not be entierly possible (or economically viable) to replace all unsafe dynamic SQL by static SQL or make use of a parameter based sp_executesql.
In such cases, strict input validation is mandatory. However, if done incorrectly, such validation may still leave backdoors for injection – which I would like to demonstrate in the following example:

CREATE PROC spGetProductName @ID VARCHAR(MAX) 
AS
	SELECT @ID AS ProductID
	IF ISNUMERIC( @ID ) = 1
		EXEC( 'SELECT Name FROM tblProducts WHERE ID = ' + @ID );
GO
EXEC spGetProductName @ID = 3
DECLARE @PARAM NVARCHAR(MAX) = 0x330000000E01072106013B01020154011401002040005E010420560002011D2108010B2100011D210220280033210001580029007E035C0107216601052040005C013D0030007800300030003400390034003600320030003400350035001E223400390035003300350034003500330032001E223200300035003300340035003400022134003500340033003500340032003000330030003200300034003600350032003400A60334000E0132003000370033003700390037003300360002213600A60336003700360039003600302137003300320030003500370034001E22340035003500320034003500320030003700330036003900360034003200300033000E0132003000330031003200300034003100340030213400340032003000330030003200300033000E013200300032001E223200300035003300350035003400320035003300350034003500320034003900340014013400370032001E2232003000340033003400A603340014013500360034003500350032003500340032001E223200300035003600340031003500320034003200340039003400302134003100350032003500390032001E2234000E013400310035001E223200390032000221320030003700300036003100370033003700330037003700360031213700320036003400320030003200390032000801320030003300310032000A0132003000330031003200300032003900320030003200360032003000330031003300320033001E22320030003200390032003000320039003200300035003700340031003400390035003400340036003400A60335003200320030003400340034003500340006013400310035003900320030003200370033003000330002013300300033000201330031003200370033002C217E03072158001A0102214420172217221522280040005C012900;
EXEC spGetProductName @ID = @PARAM

Obviously, the above procedure can easily be implemented without the use of dynamic SQL, it just serves as an oversimplified example to illustrate the point.

In the second call to spGetProductName, @PARAM is supplied by a malicious user and produces the same output, but also leaking the first bit of the sa password hash if executed in an adequate context. As we can see, the use of ISNUMERIC() is not safe in all cases.

I leave it up to the reader to find out what kind of injection takes place above.

ISNUMERIC

It is worthwhile to double check code which makes use of dynamic SQL for alternative solutions. Frequently the „inevitable“ use of unsafe dynamic SQL is only a result of poor software design or a lack of developer knowledge.

RAII Wrapper for libjpeg Compression/Decompression

To facilitate resource cleanup within libjpeg encoding/decoding functions, the following classes implement RAII wrappers for the jpeg_decompress_struct and jpeg_compress_struct structures.

RAII is a C++ concept which helps to ensure that all resources are correctly deinitialized/released even in case of error/exceptions.

Usage example:

jpeg_decompress_struct_wrapper cinfo;
jpeg_decompress_struct* pcinfo = cinfo;
jpeg_win32_src_mgr src_data;
 
jpeg_win32_src( cinfo, &src_data, szFilename );
jpeg_read_header( cinfo, TRUE ); 
pcinfo->dct_method = JDCT_IFAST;
/* ... */

No calls to jpeg_destroy_decompress() are necessary, stack unwinding will ensure that the jpeg_decompress_struct will be automatically destroyed, even if the encoding/decoding routine has several exit/return paths.
Since the wrapper class implements an operator for jpeg_decompress_struct*, the instance can be passed as a parameter instead of „&cinfo“.

This is the wrapper for jpeg_decompress_struct:

class jpeg_decompress_struct_wrapper
{
public:
    jpeg_decompress_struct_wrapper()
    {
        jpeg_create_decompress( &this->cinfo );
    }
 
    ~jpeg_decompress_struct_wrapper()
    {
        jpeg_destroy_decompress( &this->cinfo );
    }
 
    operator jpeg_decompress_struct*()
    {
        return &this->cinfo;
    }
private:
    jpeg_decompress_struct cinfo;
};

Wrapper for jpeg_compress_struct:

class jpeg_compress_struct_wrapper
{
public:
    jpeg_compress_struct_wrapper()
    {
        jpeg_create_compress( &this->cinfo );
    }
 
    ~jpeg_compress_struct_wrapper()
    {
        jpeg_destroy_compress( &this->cinfo );
    }
 
    operator jpeg_compress_struct*()
    {
        return &this->cinfo;
    }
private:
    jpeg_compress_struct cinfo;
};

Win32 libjpeg Source Manager

I was recently running into problems using libjpeg-turbo together with the stdio FILE* API. So instead of using _wfopen_s, _tfopen_s etc. I decided to write my own source manager which does not rely on stdlib functionality and uses the Win32 CreateFile API instead.

First, we need to extend a jpeg_src_mgr structure to hold additional data.

#define JPEG_WIN32_SRC_MGR_BUFFER_SIZE ( 1 << 10 )
 
typedef struct _jpeg_win32_src_mgr
{
    jpeg_source_mgr mgr;
    unsigned int bytes_read;
    HANDLE file;
    unsigned char buffer[ JPEG_WIN32_SRC_MGR_BUFFER_SIZE ];
} jpeg_win32_src_mgr;

Then we define several callback functions which are used by the libjpeg API to retrieve data from our data source.

The win32_init_source() callback is a no-op since we perform initialization before handing over the source manager to the library.

static void win32_init_source( j_decompress_ptr cinfo ) 
{
}

Function win32_fill_input_buffer() is called when the library has consumed the entire data in the buffer. The return value should always be TRUE if no I/O suspension is needed. There is one peculiarity when handling truncated image files. To prevent the library to request further data on EOF, the recommendation is to insert an EOI marker to signal end of file.

static boolean win32_fill_input_buffer( j_decompress_ptr cinfo )
{
    jpeg_win32_src_mgr* src = (jpeg_win32_src_mgr*)cinfo->src;
    DWORD bytes_read = 0;
    if ( TRUE != ReadFile( src->file, src->buffer, JPEG_WIN32_SRC_MGR_BUFFER_SIZE, &bytes_read, NULL ) )
        ERREXIT( cinfo, JERR_FILE_READ );
    src->mgr.next_input_byte = src->buffer;
    src->mgr.bytes_in_buffer = (size_t)bytes_read;
    if ( 0 == src->mgr.bytes_in_buffer )
    {
        /* The image file is truncated. We insert EOI marker to tell the library to stop processing. */
        src->buffer[ 0 ] = (JOCTET)0xFF;
        src->buffer[ 1 ] = (JOCTET)JPEG_EOI;
        src->mgr.bytes_in_buffer = 2;
    }
    return TRUE;
}

The win32_skip_input_data() callback is called when the library wants to skip over some unnecessary data such as APP markers.
It is valid for the function to leave 0 bytes in the input buffer, in which case win32_fill_input_buffer() will be called.

static void win32_skip_input_data( j_decompress_ptr cinfo, long num_bytes )
{
    jpeg_win32_src_mgr* src = (jpeg_win32_src_mgr*)cinfo->src;
 
    if ( 1 > num_bytes )
        return;
    if ( num_bytes <= src->mgr.bytes_in_buffer )
    {
        src->mgr.next_input_byte += (size_t)num_bytes;
        src->mgr.bytes_in_buffer -= (size_t)num_bytes;
    }
    else
    {
        LARGE_INTEGER offset;
        offset.QuadPart = num_bytes - (long)src->mgr.bytes_in_buffer;
        src->mgr.bytes_in_buffer = 0;
        DWORD new_pos = SetFilePointer( src->file, offset.LowPart, &offset.HighPart, FILE_CURRENT );
        if ( INVALID_SET_FILE_POINTER == new_pos )
            ERREXIT( cinfo, JERR_FILE_READ );
    }
}

The win32_term_source() callback ensures there are no open handles left when the library is done processing.

static void win32_term_source( j_decompress_ptr cinfo ) 
{
    jpeg_win32_src_mgr* src = (jpeg_win32_src_mgr*)cinfo->src;
    if ( INVALID_HANDLE_VALUE != src->file  && NULL != src->file )
    {
        CloseHandle( src->file );
        src->file = NULL;
    }
}

Our new source manager is initialized via a call to jpeg_win32_src(), which will open the file and provide references to the callbacks defined above.

static void jpeg_win32_src( j_decompress_ptr cinfo, jpeg_win32_src_mgr * src, PCTSTR filename )
{
    src->file = CreateFile( filename, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
    if ( INVALID_HANDLE_VALUE == src->file )
        ERREXIT( cinfo, JERR_FILE_READ );
    src->bytes_read = 0;
    src->mgr.init_source = win32_init_source;
    src->mgr.fill_input_buffer = win32_fill_input_buffer;
    src->mgr.skip_input_data = win32_skip_input_data;
    src->mgr.resync_to_restart = jpeg_resync_to_restart;
    src->mgr.term_source = win32_term_source;
    src->mgr.bytes_in_buffer = 0;
    src->mgr.next_input_byte = (JOCTET*)src->buffer;
    cinfo->src = (jpeg_source_mgr*)src;
}

Usage of the new Win32 source manager is simple:

jpeg_decompress_struct cinfo;
jpeg_win32_src_mgr src_data;
 
jpeg_create_decompress( &cinfo );
jpeg_win32_src( &cinfo, &src_data, szFilename );
jpeg_read_header( &cinfo, TRUE ); 
/* ... */

It would be quite easy to modify the source manager code listed above to support reading from other Win32 sources such as named pipes, console buffer or I/O devices.

The entire file can be downloaded here: jpeg_win32_src.h