Sokrates on Oracle

How 2 Bytes can fit into 1 Byte

Posted by Matthias Rogel on 20. February 2014

Chris Saxon posted a nice quiz regarding a pitfall when creating tables with VARCHAR2-columns: when you are not explicit in specifying the length-semantics ( CHAR or BYTE ), a session parameter, which may vary, is used.

This is a short follow-up of his story, which shows more pitfalls when creating views and selecting from those.
We end up in a structure which can – by definition – contain only 1 byte, but actually shows containing two bytes !

sokrates@11.2 > alter session set nls_length_semantics=CHAR;

Session altered.

sokrates@11.2 > create table i_can_insert (
  2    into_this varchar2(1)
  3  );

Table created.

sokrates@11.2 > insert into i_can_insert (into_this)
  2  values ('£');

1 row created.

sokrates@11.2 > create view i_can_select (from_that)
  2  as
  3  select cast(into_this as varchar2(1))
  4  from i_can_insert;

View created.

sokrates@11.2 > describe i_can_insert
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 INTO_THIS                                          VARCHAR2(1)

sokrates@11.2 > describe i_can_select
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 FROM_THAT                                          VARCHAR2(1)

sokrates@11.2 > select * from i_can_select;

FROM_THAT
----------
£

sokrates@11.2 > REM now we change the length semantics ...
sokrates@11.2 > alter session set nls_length_semantics=BYTE;

Session altered.

sokrates@11.2 > REM note that the output of describe changes !
sokrates@11.2 > describe i_can_insert
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 INTO_THIS                                          VARCHAR2(1 CHAR)

sokrates@11.2 > describe i_can_select
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 FROM_THAT                                          VARCHAR2(1 CHAR)

sokrates@11.2 > create view can_i_select (from_that)
  2  as
  3  select cast(into_this as varchar2(1))
  4  from i_can_insert;

View created.

sokrates@11.2 > describe can_i_select
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 FROM_THAT                                          VARCHAR2(1)

sokrates@11.2 > REM now, can I select ?
sokrates@11.2 > set null nada
sokrates@11.2 > select * from can_i_select;

FROM_THAT
----------
nada

sokrates@11.2 > REM no I cannot, output would be too long in my current semantics, so it is trimmed ( nothing left )
sokrates@11.2 >
sokrates@11.2 > REM now, can I select from i_can_select ?
sokrates@11.2 > select * from i_can_select;

FROM_THAT
----------
nada

sokrates@11.2 > REM no
sokrates@11.2 > select * from i_can_insert;

INTO_THIS
----------
£

sokrates@11.2 > REM but still from the table itself, I can select !
sokrates@11.2 >
sokrates@11.2 > REM let us change back the semantics
sokrates@11.2 > alter session set nls_length_semantics=CHAR;

Session altered.

sokrates@11.2 > describe can_i_select
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 FROM_THAT                                          VARCHAR2(1 BYTE)

sokrates@11.2 > select * from can_i_select;

FROM_THAT
----------
£

sokrates@11.2 > select lengthb(from_that) from can_i_select;

LENGTHB(FROM_THAT)
------------------
                 2

So, CAN_I_SELECT.FROM_THAT can only hold 1 byte, but the byte-length of its content is 2 !
-> looks like a bug to me.

Note, that USER_VIEWS.TEXT shows the same for both views, even DBMS_METADATA is not explicit , only COLS shows the difference

sokrates@11.2 > select text from user_views where view_name='CAN_I_SELECT';

TEXT
--------------------------------------------------------------------------------
select cast(into_this as varchar2(1))
from i_can_insert


sokrates@11.2 > select text from user_views where view_name='I_CAN_SELECT';

TEXT
--------------------------------------------------------------------------------
select cast(into_this as varchar2(1))
from i_can_insert


sokrates@11.2 > select dbms_metadata.get_ddl('VIEW', 'CAN_I_SELECT') from dual;

DBMS_METADATA.GET_DDL('VIEW','CAN_I_SELECT')
--------------------------------------------------------------------------------

  CREATE OR REPLACE FORCE VIEW "SOKRATES"."CAN_I_SELECT" ("FROM_THAT") AS
  select cast(into_this as varchar2(1))
from i_can_insert


sokrates@11.2 > select dbms_metadata.get_ddl('VIEW', 'I_CAN_SELECT') from dual;

DBMS_METADATA.GET_DDL('VIEW','I_CAN_SELECT')
--------------------------------------------------------------------------------

  CREATE OR REPLACE FORCE VIEW "SOKRATES"."I_CAN_SELECT" ("FROM_THAT") AS
  select cast(into_this as varchar2(1))
from i_can_insert


sokrates@11.2 > select table_name, column_name, data_length from cols where table_name in ('I_CAN_SELECT', 'CAN_I_SELECT');

TABLE_NAME                     COLUMN_NAME                    DATA_LENGTH
------------------------------ ------------------------------ -----------
CAN_I_SELECT                   FROM_THAT                                1
I_CAN_SELECT                   FROM_THAT                                4

4 Responses to “How 2 Bytes can fit into 1 Byte”

  1. Dom Brooks said

    Test case requires a multibyte characterset to start with – obvious I know but…

    I categorize this not as a bug, but user error.

    A view contains a dynamic SQL expression.
    Because BYTE vs CHAR is not defined in the CAST expression in the view, the exact semantics of the CAST is during the runtime evaluation of the view SQL based on your session settings.
    The same thing could affect a selected column or a join which relies on implicit datatype conversion, etc.

    NLS_LENGTH_SEMANTICS affects object definitions, expressions and variables.

    Of course you can select from i_can_select when the nls_length_semantics is byte – that should not be a surprise.
    The definition of the table is already fixed to CHAR.

    And obviously you can’t CAST a two-byte character to one-byte so NULL, or unknown, is the correct answer.

    Really what this is about is whether CHAR_USED together with DATA_LENGTH are considered 100% relevant/reliable for such a view.
    And clearly they are not.
    It’s possible this might have interesting repercussions for any driver code etc which might use such definitions to define internal length restrictions, memory allocations etc.

    But to me the demo above is expected behaviour and the moral of the story is when dealing with multi-byte charactersets, pay careful attention to length semantics.

    • “…Really what this is about is whether CHAR_USED together with DATA_LENGTH are considered 100% relevant/reliable for such a view. …”

      exactly

      “…And clearly they are not. …”

      I always thought they would be.

      “…It’s possible this might have interesting repercussions for any driver code etc which might use such definitions to define internal length restrictions, memory allocations etc….”

      Absolutely.
      Any idea how to correctly find the upper bound for the byte length of a column then when the data dictionary is allowed to lie ?
      Multiply it by 4 “just to be save” ? Looks very strange to me. And who can tell if it would be save this way ?

      • looks like the data dictionary is a notorious liar anyway:

        sokrates@11.2 > insert into i_can_select values(null);
        insert into i_can_select values(null)
        *
        ERROR at line 1:
        ORA-01733: virtual column not allowed here
        
        
        sokrates@11.2 > select virtual_column from user_tab_cols where table_name='I_CAN_SELECT';
        
        VIRTUAL_COLUMN
        --------------------
        NO
        
  2. Interesting – just reinforces my belief that you should always specify char or byte explicitly for the varchar rather than relying on session settings!

Leave a comment