How 2 Bytes can fit into 1 Byte
Posted by Matthias Rogel on 20. February 2014
Chris Saxon posted a nice quiz regarding a pitfall when creating tables with VARCHAR2-columns: when you are not explicit in specifying the length-semantics ( CHAR or BYTE ), a session parameter, which may vary, is used.
This is a short follow-up of his story, which shows more pitfalls when creating views and selecting from those.
We end up in a structure which can – by definition – contain only 1 byte, but actually shows containing two bytes !
sokrates@11.2 > alter session set nls_length_semantics=CHAR; Session altered. sokrates@11.2 > create table i_can_insert ( 2 into_this varchar2(1) 3 ); Table created. sokrates@11.2 > insert into i_can_insert (into_this) 2 values ('£'); 1 row created. sokrates@11.2 > create view i_can_select (from_that) 2 as 3 select cast(into_this as varchar2(1)) 4 from i_can_insert; View created. sokrates@11.2 > describe i_can_insert Name Null? Type ----------------------------------------- -------- ---------------------------- INTO_THIS VARCHAR2(1) sokrates@11.2 > describe i_can_select Name Null? Type ----------------------------------------- -------- ---------------------------- FROM_THAT VARCHAR2(1) sokrates@11.2 > select * from i_can_select; FROM_THAT ---------- £ sokrates@11.2 > REM now we change the length semantics ... sokrates@11.2 > alter session set nls_length_semantics=BYTE; Session altered. sokrates@11.2 > REM note that the output of describe changes ! sokrates@11.2 > describe i_can_insert Name Null? Type ----------------------------------------- -------- ---------------------------- INTO_THIS VARCHAR2(1 CHAR) sokrates@11.2 > describe i_can_select Name Null? Type ----------------------------------------- -------- ---------------------------- FROM_THAT VARCHAR2(1 CHAR) sokrates@11.2 > create view can_i_select (from_that) 2 as 3 select cast(into_this as varchar2(1)) 4 from i_can_insert; View created. sokrates@11.2 > describe can_i_select Name Null? Type ----------------------------------------- -------- ---------------------------- FROM_THAT VARCHAR2(1) sokrates@11.2 > REM now, can I select ? sokrates@11.2 > set null nada sokrates@11.2 > select * from can_i_select; FROM_THAT ---------- nada sokrates@11.2 > REM no I cannot, output would be too long in my current semantics, so it is trimmed ( nothing left ) sokrates@11.2 > sokrates@11.2 > REM now, can I select from i_can_select ? sokrates@11.2 > select * from i_can_select; FROM_THAT ---------- nada sokrates@11.2 > REM no sokrates@11.2 > select * from i_can_insert; INTO_THIS ---------- £ sokrates@11.2 > REM but still from the table itself, I can select ! sokrates@11.2 > sokrates@11.2 > REM let us change back the semantics sokrates@11.2 > alter session set nls_length_semantics=CHAR; Session altered. sokrates@11.2 > describe can_i_select Name Null? Type ----------------------------------------- -------- ---------------------------- FROM_THAT VARCHAR2(1 BYTE) sokrates@11.2 > select * from can_i_select; FROM_THAT ---------- £ sokrates@11.2 > select lengthb(from_that) from can_i_select; LENGTHB(FROM_THAT) ------------------ 2
So, CAN_I_SELECT.FROM_THAT can only hold 1 byte, but the byte-length of its content is 2 !
-> looks like a bug to me.
Note, that USER_VIEWS.TEXT shows the same for both views, even DBMS_METADATA is not explicit , only COLS shows the difference
sokrates@11.2 > select text from user_views where view_name='CAN_I_SELECT'; TEXT -------------------------------------------------------------------------------- select cast(into_this as varchar2(1)) from i_can_insert sokrates@11.2 > select text from user_views where view_name='I_CAN_SELECT'; TEXT -------------------------------------------------------------------------------- select cast(into_this as varchar2(1)) from i_can_insert sokrates@11.2 > select dbms_metadata.get_ddl('VIEW', 'CAN_I_SELECT') from dual; DBMS_METADATA.GET_DDL('VIEW','CAN_I_SELECT') -------------------------------------------------------------------------------- CREATE OR REPLACE FORCE VIEW "SOKRATES"."CAN_I_SELECT" ("FROM_THAT") AS select cast(into_this as varchar2(1)) from i_can_insert sokrates@11.2 > select dbms_metadata.get_ddl('VIEW', 'I_CAN_SELECT') from dual; DBMS_METADATA.GET_DDL('VIEW','I_CAN_SELECT') -------------------------------------------------------------------------------- CREATE OR REPLACE FORCE VIEW "SOKRATES"."I_CAN_SELECT" ("FROM_THAT") AS select cast(into_this as varchar2(1)) from i_can_insert sokrates@11.2 > select table_name, column_name, data_length from cols where table_name in ('I_CAN_SELECT', 'CAN_I_SELECT'); TABLE_NAME COLUMN_NAME DATA_LENGTH ------------------------------ ------------------------------ ----------- CAN_I_SELECT FROM_THAT 1 I_CAN_SELECT FROM_THAT 4
Dom Brooks said
Test case requires a multibyte characterset to start with – obvious I know but…
I categorize this not as a bug, but user error.
A view contains a dynamic SQL expression.
Because BYTE vs CHAR is not defined in the CAST expression in the view, the exact semantics of the CAST is during the runtime evaluation of the view SQL based on your session settings.
The same thing could affect a selected column or a join which relies on implicit datatype conversion, etc.
NLS_LENGTH_SEMANTICS affects object definitions, expressions and variables.
Of course you can select from i_can_select when the nls_length_semantics is byte – that should not be a surprise.
The definition of the table is already fixed to CHAR.
And obviously you can’t CAST a two-byte character to one-byte so NULL, or unknown, is the correct answer.
Really what this is about is whether CHAR_USED together with DATA_LENGTH are considered 100% relevant/reliable for such a view.
And clearly they are not.
It’s possible this might have interesting repercussions for any driver code etc which might use such definitions to define internal length restrictions, memory allocations etc.
But to me the demo above is expected behaviour and the moral of the story is when dealing with multi-byte charactersets, pay careful attention to length semantics.
Matthias Rogel said
“…Really what this is about is whether CHAR_USED together with DATA_LENGTH are considered 100% relevant/reliable for such a view. …”
exactly
“…And clearly they are not. …”
I always thought they would be.
“…It’s possible this might have interesting repercussions for any driver code etc which might use such definitions to define internal length restrictions, memory allocations etc….”
Absolutely.
Any idea how to correctly find the upper bound for the byte length of a column then when the data dictionary is allowed to lie ?
Multiply it by 4 “just to be save” ? Looks very strange to me. And who can tell if it would be save this way ?
Matthias Rogel said
looks like the data dictionary is a notorious liar anyway:
Chris Saxon said
Interesting – just reinforces my belief that you should always specify char or byte explicitly for the varchar rather than relying on session settings!