Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 443 Vote(s) - 3.47 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Counting DISTINCT over multiple columns

#11
To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.

SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

In MySQL you can do the same thing without the concatenation step as follows:

SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;

This feature is mentioned in the MySQL documentation:

[To see links please register here]

Reply

#12
Here's a shorter version without the subselect:

SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems

It works fine in MySQL, and I think that the optimizer has an easier time understanding this one.

Edit: Apparently I misread MSSQL and MySQL - sorry about that, but maybe it helps anyway.
Reply

#13
It works for me. In oracle:

SELECT SUM(DECODE(COUNT(*),1,1,1))
FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;

In jpql:

SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;
Reply

#14
How about this,

Select DocumentId, DocumentSessionId, count(*) as c
from DocumentOutputItems
group by DocumentId, DocumentSessionId;

This will get us the count of all possible combinations of DocumentId, and DocumentSessionId
Reply

#15
You can just use the Count Function Twice.

In this case, it would be:

SELECT COUNT (DISTINCT DocumentId), COUNT (DISTINCT DocumentSessionId)
FROM DocumentOutputItems
Reply

#16
I had a similar question but the query I had was a sub-query with the comparison data in the main query. something like:

Select code, id, title, name
(select count(distinct col1) from mytable where code = a.code and length(title) >0)
from mytable a
group by code, id, title, name
--needs distinct over col2 as well as col1

ignoring the complexities of this, I realized I couldn't get the value of a.code into the subquery with the double sub query described in the original question

Select count(1) from (select distinct col1, col2 from mytable where code = a.code...)
--this doesn't work because the sub-query doesn't know what "a" is

So eventually I figured out I could cheat, and combine the columns:

Select count(distinct(col1 || col2)) from mytable where code = a.code...

This is what ended up working

Reply

#17
This code uses distinct on 2 parameters and provides count of number of rows specific to those distinct values row count. It worked for me in MySQL like a charm.

select DISTINCT DocumentId as i, DocumentSessionId as s , count(*)
from DocumentOutputItems
group by i ,s;


Reply

#18
I have used this approach and it has worked for me.

SELECT COUNT(DISTINCT DocumentID || DocumentSessionId)
FROM DocumentOutputItems

For my case, it provides correct result.
Reply

#19
If you're working with datatypes of fixed length, you can cast to `binary` to do this very easily and very quickly. Assuming `DocumentId` and `DocumentSessionId` are both `int`s, and are therefore 4 bytes long...

```SQL
SELECT COUNT(DISTINCT CAST(DocumentId as binary(4)) + CAST(DocumentSessionId as binary(4)))
FROM DocumentOutputItems
```

My specific problem required me to divide a `SUM` by the `COUNT` of the distinct combination of various foreign keys and a date field, grouping by another foreign key and occasionally filtering by certain values or keys. The table is very large, and using a sub-query dramatically increased the query time. And due to the complexity, statistics simply wasn't a viable option. The `CHECKSUM` solution was also far too slow in its conversion, particularly as a result of the various data types, and I couldn't risk its unreliability.

However, using the above solution had virtually no increase on the query time (comparing with using simply the `SUM`), and should be completely reliable! It should be able to help others in a similar situation so I'm posting it here.
Reply

#20
Some SQL databases can work with a tuple expression so you can just do:
```
SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId))
FROM DocumentOutputItems;
```
If your database doesn't support this, it can be simulated as per @oncel-umut-turer's suggestion of CHECKSUM or other scalar function providing good uniqueness e.g.
```COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId))```.

[MySQL specifically supports `COUNT(DISTINCT expr, expr, ...)`](

[To see links please register here]

) which is non-SQL standard syntax. It also notes `In standard SQL, you would have to do a concatenation of all expressions inside COUNT(DISTINCT ...).`

A related use of tuples is performing `IN` queries such as:
```
SELECT * FROM DocumentOutputItems
WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));
```
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through