/* ADCAMPAIGN-16: Travel clients with lost cities in extended targeting


Selection is threshold-based: we count the number of hits in a region.
The threshold on the minimum number of hits per region is an expert estimate.
If a client already had X hits with a given geo, the client is evidently fine with that geo.
If we guessed right, we target it explicitly through display conditions with a geo suffix.
If not, that is a reason to improve the negative keywords.
*/

$SUBSPACE = "pub";
$PERIOD_CALC = 30;
$INSIGHT_ID = 4;
$NOW = EvaluateExpr(CurrentUtcTimestamp());
$format_date = DateTime::Format("%Y-%m-%d");

$DIMENSION_1 = "client_id";
$DIMENSION_2 = "counterparty";
$YT_DST_1 = "home/vipplanners/insights/" || $SUBSPACE || '/' || CAST($INSIGHT_ID AS String) || '/' || $DIMENSION_1 || '/' || $format_date($NOW);
$YT_DST_2 = "home/vipplanners/insights/" || $SUBSPACE || '/' || CAST($INSIGHT_ID AS String) || '/' || $DIMENSION_2 || '/' || $format_date($NOW);

USE hahn;

-- Tables used for the calculation: the last $PERIOD_CALC daily tables.
$hit_tables = (
SELECT TOP_BY(Path, CAST(TableName(Path) AS Date), $PERIOD_CALC)
FROM FOLDER(`home/logfeller/logs/bs-hit-log/1d`)
);

$event_tables = (
SELECT TOP_BY(Path, CAST(TableName(Path) AS Date), $PERIOD_CALC)
FROM FOLDER(`cooked_logs/bs-chevent-cooked-log/1d`)
);

$client_tables = (
SELECT TOP_BY(Path, CAST(TableName(Path) AS Date), $PERIOD_CALC)
FROM FOLDER(`home/comdep-cubes/direct/production/hypercubes/clients`)
);


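-- Region types: https://doc.yandex-team.ru/lib/libgeobase5/concepts/region-types.html
-- Rounds a region id to its city; yields NULL when the rounded region is not of type 6.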
$get_city = ($id) -> {
  $geo = Geo::RoundRegionById($id, "city");
  RETURN IF($geo.type = 6, $geo.id, Nothing(Int32?));
};

$hit_region = (
SELECT
  pageid
, hitlogid
, city_from_query
FROM (
  SELECT 
    CAST(pageid AS Int64) AS pageid
  , CAST(hitlogid AS Uint64) AS hitlogid
  , $get_city(CAST(IF(bestgeoregionid != "0", bestgeoregionid, regionfromquery) AS Int32)) AS city_from_query
  FROM EACH($hit_tables)
  WHERE bestgeoregionid != "0" OR regionfromquery != "0"
)
WHERE city_from_query IS NOT NULL
);

$good_banner = (
SELECT 
  BannerID
, Category AS MediaGroupID
FROM `home/comdep-analytics/banner-categories/BannerIDMediaGroup`
WHERE Category IN ( 
  -- category set chosen by expert judgment
  -- https://yql.yandex-team.ru/Operations/XL3Semim9axsnvDH608vKrtWUqflxZSjIxe0jsA42fM=
  -- the ul suffix is deliberate: type inference did not work correctly without it
  983491328489533790ul,   -- RESIDENTIAL RENTAL SERVICES
  17347395858893381749ul, -- HOTELS
  12518978136597314482ul, -- PASSENGER TRANSPORTATION
  1199824593919247438ul,  -- RAIL PASSENGER TRANSPORT SERVICES
  3201796442998457704ul,  -- RESIDENTIAL PROPERTY TRANSACTION SERVICES
  5947381187149658357ul,  -- BUS TICKETS
  8925645584749811233ul,  -- AIR PASSENGER TRANSPORT SERVICES
  10448206104281745186ul, -- CAR RENTAL
)
);

INSERT INTO @event_prefiltered WITH TRUNCATE 
SELECT
  e.hitlogid AS hitlogid
, e.bannerid AS bannerid
, e.billingexportid AS cid
FROM EACH($event_tables) AS e
LEFT SEMI JOIN $hit_region AS qr
  ON e.pageid = qr.pageid
  AND e.hitlogid = qr.hitlogid
LEFT SEMI JOIN $good_banner AS gb
  ON e.bannerid = gb.BannerID
; COMMIT;

$cid_client = (
SELECT
  c.cid AS cid
, c.ClientID AS ClientID
FROM `//home/direct/db/campaigns` AS c
);

INSERT INTO @joined_event WITH TRUNCATE 
SELECT
  h.hitlogid AS hitlogid
, h.city_from_query AS city_from_query
, h.ClientID AS ClientID
, h.MediaGroupID AS MediaGroupID
, crm.curr_counterparty_name AS counterparty
FROM (
  SELECT 
    hitlogid
  , SOME(h.city_from_query) AS city_from_query
  , AGGREGATE_LIST(DISTINCT c.ClientID) AS ClientID
  , AGGREGATE_LIST(DISTINCT b.MediaGroupID) AS MediaGroupID
  FROM @event_prefiltered AS e
  JOIN $hit_region AS h
    ON e.hitlogid = h.hitlogid
  JOIN $cid_client AS c
    ON e.cid = c.cid
  JOIN $good_banner AS b
    ON e.bannerid = b.BannerID
  GROUP BY e.hitlogid AS hitlogid
)  AS h
FLATTEN BY (MediaGroupID, ClientID)
JOIN `//home/comdep-analytics/public/client_tiers/fact/latest` AS crm
  ON h.ClientID = crm.client_id
; COMMIT;

$region_mg_stat = (
SELECT
  city_from_query
, MediaGroupID
, COUNT(DISTINCT hitlogid) AS Hits
FROM @joined_event
GROUP BY 
  city_from_query
, MediaGroupID
);

$region_client_stat = (
SELECT 
  ClientID
, city_from_query
FROM (
  SELECT
    city_from_query
  , ClientID
  , COUNT(DISTINCT hitlogid) AS Hits
  FROM @joined_event
  GROUP BY 
    city_from_query
  , ClientID
)
-- Threshold set by expert judgment
WHERE Hits > 100
);

$region_counterparty_stat = (
SELECT 
  counterparty
, city_from_query
FROM (
  SELECT
    city_from_query
  , counterparty
  , COUNT(DISTINCT hitlogid) AS Hits
  FROM @joined_event
  GROUP BY 
    city_from_query
  , counterparty
)
-- Threshold set by expert judgment
WHERE Hits > 100
);

$mg_client_stat = (
SELECT 
  ClientID
, MediaGroupID
FROM (
  SELECT
    MediaGroupID
  , ClientID
  , COUNT(DISTINCT hitlogid) AS Hits
  FROM @joined_event
  GROUP BY 
    MediaGroupID
  , ClientID
)
-- Threshold set by expert judgment
WHERE Hits > 100
);

$mg_counterparty_stat = (
SELECT 
  counterparty
, MediaGroupID
FROM (
  SELECT
    MediaGroupID
  , counterparty
  , COUNT(DISTINCT hitlogid) AS Hits
  FROM @joined_event
  GROUP BY 
    MediaGroupID
  , counterparty
)
-- Threshold set by expert judgment
WHERE Hits > 100
);

$client_region_mg_stat_1 = (
SELECT
  ClientID
, city_from_query
, MediaGroupID
, COUNT(DISTINCT e.hitlogid) AS Hits
FROM @joined_event AS e
LEFT SEMI JOIN $region_client_stat AS wl1
   ON e.ClientID = wl1.ClientID
  AND e.city_from_query = wl1.city_from_query
LEFT SEMI JOIN $mg_client_stat AS wl2
   ON e.ClientID = wl2.ClientID
  AND e.MediaGroupID = wl2.MediaGroupID
GROUP BY 
  e.ClientID AS ClientID
, e.city_from_query AS city_from_query
, e.MediaGroupID AS MediaGroupID
);

$counterparty_region_mg_stat_1 = (
SELECT
  counterparty
, city_from_query
, MediaGroupID
, COUNT(DISTINCT e.hitlogid) AS Hits
FROM @joined_event AS e
LEFT SEMI JOIN $region_counterparty_stat AS wl1
   ON e.counterparty = wl1.counterparty
  AND e.city_from_query = wl1.city_from_query
LEFT SEMI JOIN $mg_counterparty_stat AS wl2
   ON e.counterparty = wl2.counterparty
  AND e.MediaGroupID = wl2.MediaGroupID
GROUP BY 
  e.counterparty AS counterparty
, e.city_from_query AS city_from_query
, e.MediaGroupID AS MediaGroupID
);

$client_region_mg_stat_2 = (
SELECT 
  c.ClientID AS client_id
, c.city_from_query AS city_from_query
, mg.MediaGroup AS mediagroup
, c.Hits AS client_hits
, t.Hits AS total_hits
FROM $client_region_mg_stat_1 AS c  -- client in segment
JOIN $region_mg_stat AS t  -- total of segment
   ON c.city_from_query = t.city_from_query
  AND c.MediaGroupID = t.MediaGroupID
LEFT JOIN `home/comdep-analytics/common/MediaGroups` AS mg
  ON c.MediaGroupID = mg.MediaGroupID
);

$counterparty_region_mg_stat_2 = (
SELECT 
  c.counterparty AS counterparty
, c.city_from_query AS city_from_query
, mg.MediaGroup AS mediagroup
, c.Hits AS client_hits
, t.Hits AS total_hits
FROM $counterparty_region_mg_stat_1 AS c  -- counterparty in segment
JOIN $region_mg_stat AS t  -- total of segment
   ON c.city_from_query = t.city_from_query
  AND c.MediaGroupID = t.MediaGroupID
LEFT JOIN `home/comdep-analytics/common/MediaGroups` AS mg
  ON c.MediaGroupID = mg.MediaGroupID
);

$cube_stat = (
SELECT 
  client_id
, SUM(search_shows) AS search_shows
, SUM(clicks) AS clicks
, SUM(cost) AS cost_rub_wo_nds
FROM EACH($client_tables)
GROUP BY client_id
HAVING SUM(cost) > 100
);

$counterparty_cube_stat = (
SELECT 
  counterparty
, SUM(c.search_shows) AS search_shows
, SUM(c.clicks) AS clicks
, SUM(c.cost) AS cost_rub_wo_nds
FROM EACH($client_tables) AS c
JOIN `//home/comdep-analytics/public/client_tiers/fact/latest` AS crm
  ON c.client_id = crm.client_id
GROUP BY crm.curr_counterparty_name AS counterparty
HAVING SUM(c.cost) > 100
);

$client_data = (
SELECT
  a.client_id AS client_id
, lost_hits
, badness_pct
, client_hits
, segments_top_bad
, region_mediagroup_count
, mediagroup_count
, s.search_shows ?? 0 AS client_search_shows
, s.clicks ?? 0 AS client_clicks
, s.cost_rub_wo_nds ?? 0.0 AS client_cost_rub_wo_nds
FROM (
  SELECT 
    client_id
  , SUM(total_hits - client_hits) AS lost_hits                                      -- Number of lost hits in extended-geo segments
  , (SUM(total_hits - client_hits) * 1.0) / SUM(total_hits) * 100.0 AS badness_pct  -- Share of lost hits in extended-geo segments, %
  , SUM(client_hits) AS client_hits                                                 -- Approximate number of hits in extended-geo segments; inflated because a hit is counted once per category
  , TOP_BY(
      AsTuple(
        mediagroup, Geo::RegionById(city_from_query).name, Geo::RegionById(Geo::FindCountry(city_from_query)).name, total_hits - client_hits
      ), total_hits - client_hits, 100) AS segments_top_bad  -- Top segments by lost hits in extended geo
  , COUNT(*) AS region_mediagroup_count                      -- Number of extended-geo segments with potential
  , COUNT(DISTINCT mediagroup) AS mediagroup_count           -- Number of categories worth working on
  FROM $client_region_mg_stat_2
  WHERE (total_hits - client_hits) > 1
  GROUP BY client_id
) AS a
JOIN $cube_stat AS s
  ON a.client_id = s.client_id
); COMMIT;

$counterparty_data = (
SELECT
  a.counterparty AS counterparty
, lost_hits
, badness_pct
, client_hits
, segments_top_bad
, region_mediagroup_count
, mediagroup_count
, s.search_shows ?? 0 AS client_search_shows
, s.clicks ?? 0 AS client_clicks
, s.cost_rub_wo_nds ?? 0.0 AS client_cost_rub_wo_nds
FROM (
  SELECT 
    counterparty
  , SUM(total_hits - client_hits) AS lost_hits                                      -- Number of lost hits in extended-geo segments
  , (SUM(total_hits - client_hits) * 1.0) / SUM(total_hits) * 100.0 AS badness_pct  -- Share of lost hits in extended-geo segments, %
  , SUM(client_hits) AS client_hits                                                 -- Approximate number of hits in extended-geo segments; inflated because a hit is counted once per category
  , TOP_BY(
      AsTuple(
        mediagroup, Geo::RegionById(city_from_query).name, Geo::RegionById(Geo::FindCountry(city_from_query)).name, total_hits - client_hits
      ), total_hits - client_hits, 100) AS segments_top_bad  -- Top segments by lost hits in extended geo
  , COUNT(*) AS region_mediagroup_count                      -- Number of extended-geo segments with potential
  , COUNT(DISTINCT mediagroup) AS mediagroup_count           -- Number of categories worth working on
  FROM $counterparty_region_mg_stat_2
  WHERE (total_hits - client_hits) > 1
  GROUP BY counterparty
) AS a
JOIN $counterparty_cube_stat AS s
  ON a.counterparty = s.counterparty
); COMMIT;

$_ltv = ($data) -> {
  -- the value of a finding depends on how significant the total lost volume is relative to the client's own statistics
  $criticality = MIN_OF($data.lost_hits * 1.0 / $data.client_search_shows, 1.0);
  -- the maximum effect is set by expert judgment
  $MAX = 0.25;
  $coef = $MAX * $criticality;
  RETURN $coef * $data.client_cost_rub_wo_nds;
};
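-- Worked example for $_ltv, using made-up inputs (hypothetical numbers, not real data):
--   lost_hits = 500, client_search_shows = 2000, client_cost_rub_wo_nds = 10000.0
--   criticality = 500 * 1.0 / 2000    = 0.25   (below the cap of 1)
--   coef        = 0.25 * 0.25         = 0.0625
--   result      = 0.0625 * 10000.0    = 625.0
-- When lost_hits is at least client_search_shows, criticality is capped at 1,
-- so the result is 25% of client_cost_rub_wo_nds.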
$_hours = ($data) -> {
  -- 2 hours per category
  RETURN $data.mediagroup_count * 2;
};

$add_forecasts = ($data) -> {
  $data = AddMember($data, "diff_ltv30", $_ltv($data));
  $data = AddMember($data, "task_est_hours", $_hours($data));
  RETURN $data;
};

INSERT INTO $YT_DST_1 WITH TRUNCATE
SELECT
  client_id
, $INSIGHT_ID AS insight_type
, DateTime::ToSeconds($NOW) AS insight_time
, Yson::SerializePretty(
    Yson::From(
      $add_forecasts(data)
    )
  ) AS data
FROM (
  SELECT 
    client_id
  , AsStruct(
    lost_hits AS lost_hits,
    badness_pct AS badness_pct,
    client_hits AS client_hits,
    segments_top_bad AS segments_top_bad,
    region_mediagroup_count AS region_mediagroup_count,
    mediagroup_count AS mediagroup_count,
    client_search_shows AS client_search_shows,
    client_clicks AS client_clicks,
    client_cost_rub_wo_nds AS client_cost_rub_wo_nds
  ) AS data
  FROM $client_data
)
;

INSERT INTO $YT_DST_2 WITH TRUNCATE
SELECT
  counterparty
, $INSIGHT_ID AS insight_type
, DateTime::ToSeconds($NOW) AS insight_time
, Yson::SerializePretty(
    Yson::From(
      $add_forecasts(data)
    )
  ) AS data
FROM (
  SELECT 
    counterparty
  , AsStruct(
    lost_hits AS lost_hits,
    badness_pct AS badness_pct,
    client_hits AS client_hits,
    segments_top_bad AS segments_top_bad,
    region_mediagroup_count AS region_mediagroup_count,
    mediagroup_count AS mediagroup_count,
    client_search_shows AS client_search_shows,
    client_clicks AS client_clicks,
    client_cost_rub_wo_nds AS client_cost_rub_wo_nds
  ) AS data
  FROM $counterparty_data
)
;
